Re: Best strategy migrate indexes

Trejkaz Mon, 31 Oct 2022 16:35:49 -0700

Well...

There's a way, but I wouldn't necessarily recommend it.


You can write custom migration code against some version of Lucene
which supports doc values, to create doc values fields. It's going to
involve writing a FilterCodecReader which wraps your real index and
then pretends to also have doc values, which you'll build in a custom
class which works similarly to UninvertingReader. Then you pass those
CodecReaders to IndexWriter.addIndexes to create a new index which
really has those doc values.
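
To make the "works similarly to UninvertingReader" part a bit more
concrete, here is a toy, Lucene-free sketch of the uninversion step for a
single-valued field. It is not our actual migration code. The names
UninvertSketch, SortedValues and uninvert are made up for illustration,
and the SortedMap stands in for the field's postings. In a real migration
the postings would roughly come from the wrapped reader's terms, and the
result would be exposed as sorted doc values by the wrapping reader
before handing it to IndexWriter.addIndexes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Toy, Lucene-free illustration of "uninverting" a single-valued field. */
final class UninvertSketch {

    /** Sorted term dictionary plus one ordinal per document (-1 = no value). */
    static final class SortedValues {
        final List<String> terms;   // ordinal -> term, in sorted term order
        final int[] ordByDoc;       // docId -> ordinal, or -1 if the doc has no term

        SortedValues(List<String> terms, int[] ordByDoc) {
            this.terms = terms;
            this.ordByDoc = ordByDoc;
        }

        /** Returns the value for a document, or null if it has none. */
        String valueOf(int docId) {
            int ord = ordByDoc[docId];
            return ord < 0 ? null : terms.get(ord);
        }
    }

    /**
     * Walks the postings (term -> doc ids) in term order and records, for
     * each document, the ordinal of the term it contains.
     */
    static SortedValues uninvert(SortedMap<String, int[]> postings, int maxDoc) {
        List<String> terms = new ArrayList<>(postings.size());
        int[] ordByDoc = new int[maxDoc];
        Arrays.fill(ordByDoc, -1);

        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            int ord = terms.size();          // next ordinal in sorted order
            terms.add(entry.getKey());
            for (int docId : entry.getValue()) {
                if (ordByDoc[docId] != -1) {
                    // This sketch only handles fields with one term per doc.
                    throw new IllegalStateException(
                        "doc " + docId + " has more than one term");
                }
                ordByDoc[docId] = ord;
            }
        }
        return new SortedValues(terms, ordByDoc);
    }
}
```

Calling uninvert with the postings of one field and the segment's maxDoc
gives you a per-document lookup, which is the shape doc values need.
Multi-valued and numeric fields need a different layout, and an analyzed
field will only give you back the indexed tokens, not the original text.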

We did that ourselves when we had the same issue. The only painful
thing about it is having to keep older versions of Lucene around to do
that migration. Forever. Luckily we were already using lucenemigrator,
which has the older versions baked into it with package prefixes. So
that library will get fatter and fatter over time, but at least our own
code only gets fatter at the rate migrations are added.

The same approach works for any other kind of ad-hoc migration you
might want to perform, e.g., creating points, or removing or adding
an index for a field.

TX


On Tue, 1 Nov 2022 at 02:57, Pablo Vázquez Blázquez <[email protected]> wrote:
>
> Hi all,
>
> Thank you all for your responses.
>
> So, when updating to a newer (major) Lucene version that modifies its
> codecs, there is no way to ensure everything keeps working properly, unless
> re-indexing, right?
>
> Apart from not having some of the original sources that were indexed (which
> I will try to solve by using the *IndexUpgrader* tool), I have another
> problem: I was using the org.apache.lucene.uninverting.UninvertingReader to
> perform queries against the index, mainly using the grouping API. But it has
> since been removed (as of Lucene 7.0). So, again, do I have any other
> alternative, apart from re-indexing to use docValues?
>
> To give you more context, I am a developer of a tool that multiple
> customers can use to index their data (currently, with Lucene 5.5.5). We
> are planning to upgrade to Lucene 9 (because of some vulnerabilities
> affecting Lucene 5.5.5) and I think asking them to reindex will not go down
> well :(
>
> Regards,
>
> On Sat, 29 Oct 2022 at 23:31, Matt Davis (<[email protected]>) wrote:
>
> > Inside of Zulia search engine, the object being indexed is always a
> > JSON/BSON object and we store the BSON as a stored byte field in the
> > index.  This allows easy internal reindexing when the searchable fields
> > change but also allows us to update to the latest lucene version.
> > Combined with using lucene-backward-codecs an older index than the current
> > major version can be opened and reindexed.  If you have stored all the
> > fields (or a json/bson) in the index, it would be easy to reindex in the
> > new format.  If you have not, maybe opening with lucene-backward-codecs
> > will be enough for your use case.
> >
> > Thanks,
> > Matt
> >
> > On Sat, Oct 29, 2022 at 2:30 PM Baris Kazar <[email protected]>
> > wrote:
> >
> > > It is always good practice to retain the original (non-indexed)
> > > data. Whenever Lucene changes version,
> > > even a minor version, I always reindex.
> > >
> > > Best regards
> > > ________________________________
> > > From: Gus Heck <[email protected]>
> > > Sent: Saturday, October 29, 2022 2:17 PM
> > > To: [email protected] <[email protected]>
> > > Subject: Re: Best strategy migrate indexes
> > >
> > > Hi Pablo,
> > >
> > > The deafening silence is probably nobody wanting to give you the bad news.
> > > You are on a mission that may not be feasible, and even if you can get it
> > > to "work", the end result won't likely be equivalent to indexing the
> > > original data with Lucene 9.x. The indexing process is fundamentally lossy
> > > and information originally used to produce non-stored fields will have been
> > > thrown out. A simple example is things like stopwords or anything analyzed
> > > with subclasses of FilteringTokenFilter. If the stop word list changed, or
> > > the details of one of these filters changed (bugfix?), you will end up with
> > > a different result than indexing with 9.x. This is just one
> > > example, another would be stemming where the index likely only contains the
> > > stem, not the whole word. Other folks who are more interested in the
> > > details of our codecs than I am can probably provide further examples on a
> > > more fundamental level. Lucene is not a database, and the source documents
> > > should always be retained in a form that can be reindexed. If you have
> > > inherited a system where source material has not been retained, you have a
> > > difficult project and may have some potentially painful expectation setting
> > > to perform.
> > >
> > > Best,
> > > Gus
> > >
> > >
> > >
> > > On Fri, Oct 28, 2022 at 8:01 AM Pablo Vázquez Blázquez <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I have some indices indexed with Lucene 5.5.0. I have updated my
> > > > dependencies and code to Lucene 7 (but my final goal is to use Lucene 9)
> > > > and when trying to work with them I get the exception:
> > > > org.apache.lucene.index.IndexFormatTooOldException: Format version is not
> > > > supported (resource
> > > > BufferedChecksumIndexInput(MMapIndexInput(path=".......\tests\segments_b"))):
> > > > this index is too old (version: 5.5.0). This version of Lucene only
> > > > supports indexes created with release 6.0 and later.
> > > >
> > > > I want to migrate from Lucene 5.x to Lucene 9.x. Which is the best
> > > > strategy? Is there any tool to migrate the indices? Is it mandatory to
> > > > reindex? In this case, how can I deal with this when I do not have the
> > > > sources of documents that generated my current indices (I mean, I just have
> > > > the indices themselves)?
> > > >
> > > > Thanks,
> > > >
> > > > --
> > > > Pablo Vázquez
> > > > ([email protected])
> > > >
> > >
> > >
> > > --
> > >
> > >
> > https://urldefense.com/v3/__http://www.needhamsoftware.com__;!!ACWV5N9M2RV99hQ!PVR-c0gAs5FpIrnotHWeo3sEWScxV8oFJrVpGdItGZictcDbRvnp5aZSqCRhglMCYqQsewQOuio4iIYARA$
> > >  (work)
> > >
> > >
> > https://urldefense.com/v3/__http://www.the111shift.com__;!!ACWV5N9M2RV99hQ!PVR-c0gAs5FpIrnotHWeo3sEWScxV8oFJrVpGdItGZictcDbRvnp5aZSqCRhglMCYqQsewQOuirxfFWpEQ$
> > >  (play)
> > >
> >
>
>
> --
> Pablo Vázquez
> ([email protected])
