Re: Best strategy migrate indexes

Matt Davis Sat, 29 Oct 2022 14:31:04 -0700

In the Zulia search engine, the object being indexed is always a
JSON/BSON object, and we store the BSON in the index as a stored byte
field.  This makes internal reindexing easy when the searchable fields
change, and it also lets us move to the latest Lucene version.  Combined
with lucene-backward-codecs, an index older than the current major
version can be opened and reindexed.  If you have stored all the fields
(or a JSON/BSON document) in the index, it should be easy to reindex in
the new format.  If you have not, opening the index with
lucene-backward-codecs may be enough for your use case.

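As a rough illustration of which indexes can be opened at all, here is a
toy check in plain Java (no Lucene dependency). The error quoted further
down says that Lucene 7 only supports indexes created with release 6.0 and
later; the sketch assumes the same "current or previous major version"
rule applies to other releases, with lucene-backward-codecs on the
classpath for the previous major. IndexCompat, majorOf and canOpen are
made-up names for this example, not Lucene APIs.

```java
/** Toy check: can a Lucene runtime of a given major version open an index? */
final class IndexCompat {

    /** Extracts the major version from a version string such as "5.5.0". */
    static int majorOf(String version) {
        int dot = version.indexOf('.');
        String major = dot < 0 ? version : version.substring(0, dot);
        return Integer.parseInt(major.trim());
    }

    /**
     * Assumption (not read from any index file): a runtime with major
     * version N opens indexes created with N or N-1, and nothing older
     * or newer. N-1 is assumed to need lucene-backward-codecs.
     */
    static boolean canOpen(String indexCreatedVersion, int runtimeMajor) {
        int indexMajor = majorOf(indexCreatedVersion);
        return indexMajor <= runtimeMajor && indexMajor >= runtimeMajor - 1;
    }

    public static void main(String[] args) {
        System.out.println(canOpen("5.5.0", 7)); // false, matches the quoted error
        System.out.println(canOpen("5.5.0", 6)); // true
        System.out.println(canOpen("5.5.0", 9)); // false
    }
}
```

Under that assumption, a 5.5.0 index is out of reach for both 7.x and 9.x,
which is why having the original content in stored fields matters so much
here.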

Thanks,
Matt

On Sat, Oct 29, 2022 at 2:30 PM Baris Kazar <[email protected]> wrote:

> It is always good practice to retain the original (non-indexed)
> data: whenever the Lucene version changes,
> even for a minor version, I always reindex.
>
> Best regards
> ________________________________
> From: Gus Heck <[email protected]>
> Sent: Saturday, October 29, 2022 2:17 PM
> To: [email protected] <[email protected]>
> Subject: Re: Best strategy migrate indexes
>
> Hi Pablo,
>
> The deafening silence is probably nobody wanting to give you the bad news.
> You are on a mission that may not be feasible, and even if you can get it
> to "work", the end result likely won't be equivalent to indexing the
> original data with Lucene 9.x. The indexing process is fundamentally lossy
> and information originally used to produce non-stored fields will have been
> thrown out. A simple example is stop words, or anything analyzed with
> subclasses of FilteringTokenFilter. If the stop word list changed, or
> the details of one of these filters changed (bugfix?), you will end up with
> a different result than indexing with 9.x. This is just one example;
> another would be stemming, where the index likely contains only the
> stem, not the whole word. Other folks who are more interested in the
> details of our codecs than I am can probably provide further examples on a
> more fundamental level. Lucene is not a database, and the source documents
> should always be retained in a form that can be reindexed. If you have
> inherited a system where source material has not been retained, you have a
> difficult project and may have some potentially painful expectation setting
> to perform.
>
> Best,
> Gus
>
>
>
> On Fri, Oct 28, 2022 at 8:01 AM Pablo Vázquez Blázquez <[email protected]>
> wrote:
>
> > Hi all,
> >
> > I have some indices created with Lucene 5.5.0. I have updated my
> > dependencies and code to Lucene 7 (but my final goal is to use Lucene 9)
> > and when trying to work with them I get the exception:
> > org.apache.lucene.index.IndexFormatTooOldException: Format version is not
> > supported (resource
> > BufferedChecksumIndexInput(MMapIndexInput(path=".......\tests\segments_b"))):
> > this index is too old (version: 5.5.0). This version of Lucene only
> > supports indexes created with release 6.0 and later.
> >
> > I want to migrate from Lucene 5.x to Lucene 9.x. Which is the best
> > strategy? Is there any tool to migrate the indices? Is it mandatory to
> > reindex? In this case, how can I deal with this when I do not have the
> > sources of documents that generated my current indices (I mean, I just
> > have the indices themselves)?
> >
> > Thanks,
> >
> > --
> > Pablo Vázquez
> > ([email protected])
> >
>
>
> --
>
> https://urldefense.com/v3/__http://www.needhamsoftware.com__;!!ACWV5N9M2RV99hQ!PVR-c0gAs5FpIrnotHWeo3sEWScxV8oFJrVpGdItGZictcDbRvnp5aZSqCRhglMCYqQsewQOuio4iIYARA$
>  (work)
>
> https://urldefense.com/v3/__http://www.the111shift.com__;!!ACWV5N9M2RV99hQ!PVR-c0gAs5FpIrnotHWeo3sEWScxV8oFJrVpGdItGZictcDbRvnp5aZSqCRhglMCYqQsewQOuirxfFWpEQ$
>  (play)
>
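Gus's point above that analysis is lossy can be shown with a toy sketch in
plain Java. It is not Lucene's analysis chain: the stop word list and the
crude suffix-stripping "stemmer" are invented for the example. It only
shows that two different texts can produce the same indexed terms, so the
terms alone cannot tell you what the original text was.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analyzer: lowercases, drops stop words, strips a few suffixes. */
final class LossyAnalysisDemo {

    // Hypothetical stop word list, chosen only for this example.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "is", "to", "of"));

    /** Crude stand-in for a stemmer: removes a trailing "ing" or "s". */
    static String stem(String token) {
        if (token.length() > 4 && token.endsWith("ing")) {
            return token.substring(0, token.length() - 3);
        }
        if (token.length() > 3 && token.endsWith("s")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    /** Returns the terms that would end up in the (toy) index. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (raw.isEmpty() || STOP_WORDS.contains(raw)) {
                continue; // stop words are discarded, not stored
            }
            terms.add(stem(raw)); // only the stem is kept
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The dog is barking")); // [dog, bark]
        System.out.println(analyze("Dogs bark"));          // [dog, bark]
    }
}
```

Both inputs yield the same terms. If the stop word list or the stemming
rules differ between versions, reindexing the original text could give
different terms, and that difference cannot be recreated from the old
terms alone.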
