Re: Best strategy migrate indexes

Trejkaz Wed, 02 Nov 2022 14:57:51 -0700

That was a typo; I meant to say luceneupgrader.

And by itself, it won't do any kind of work to convert fields between
different types.
For that, you have to do what I described.
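
To make the idea concrete, here is a toy sketch in plain Java of the
"uninverting" step that such a custom reader has to perform. It is not
Lucene API: the class name, method name and the Map-based postings are
made up for illustration, and it assumes a single-valued field. It walks
a field's postings (term to doc IDs) and rebuilds a per-document column
of values, which is roughly what doc values store.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model, not part of Lucene.
public class UninvertSketch {

    /**
     * Turns postings (term -> IDs of the docs containing it) into a
     * per-document column. Assumes a single-valued field; documents
     * without a term keep a null entry.
     */
    public static String[] uninvert(Map<String, int[]> postings, int maxDoc) {
        String[] column = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                if (column[doc] != null) {
                    // A second term for the same doc breaks the single-valued assumption.
                    throw new IllegalStateException("doc " + doc + " has more than one value");
                }
                column[doc] = entry.getKey();
            }
        }
        return column;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("blue", new int[] {1, 3});
        postings.put("red", new int[] {0});
        // Docs 2 and 4 have no value for this field.
        System.out.println(Arrays.toString(uninvert(postings, 5)));
        // prints [red, blue, null, blue, null]
    }
}
```

In a real migration this logic would live inside the FilterCodecReader
wrapper, reading the postings of the old index, and the wrapped readers
would then go to IndexWriter.addIndexes so the new index really contains
the doc values.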


TX

On Thu, 3 Nov 2022 at 07:14, Pablo Vázquez Blázquez <pabl...@gmail.com> wrote:
>
> Hi,
>
> > Luckily we were already using lucenemigrator
>
> What do you mean by "lucenemigrator"? Is it a public tool?
>
> I am trying to create a tool to read docs from a lucene5 index and generate
> lucene9 documents from them (with docValues). That might work, right? I am
> shading both lucene5 and lucene9 to avoid package conflicts.
>
> Thanks!
>
> On Tue, 1 Nov 2022 at 0:35, Trejkaz (<trej...@trypticon.org>) wrote:
>
> > Well...
> >
> > There's a way, but I wouldn't necessarily recommend it.
> >
> > You can write custom migration code against some version of Lucene
> > which supports doc values, to create doc values fields. It's going to
> > involve writing a FilterCodecReader which wraps your real index and
> > then pretends to also have doc values, which you'll build in a custom
> > class which works similarly to UninvertingReader. Then you pass those
> > CodecReaders to IndexWriter.addIndexes to create a new index which
> > really has those doc values.
> >
> > We did that ourselves when we had the same issue. The only painful
> > thing about it is having to keep around older versions of lucene to do
> > that migration. Forever. Luckily we were already using lucenemigrator,
> > which has the older versions baked into it with package prefixes. So
> > that library will get fatter and fatter over time but at least our own
> > code only gets fatter at the rate migrations are added.
> >
> > The same approach works for any other kind of ad-hoc migration you
> > might want to perform. e.g., you might want to create points. Or
> > remove an index for a field. Or add an index for a field.
> >
> > TX
> >
> >
> > On Tue, 1 Nov 2022 at 02:57, Pablo Vázquez Blázquez <pabl...@gmail.com>
> > wrote:
> > >
> > > Hi all,
> > >
> > > Thank you all for your responses.
> > >
> > > So, when updating to a newer (major) Lucene version that modifies its
> > > codecs, there is no way to ensure everything keeps working properly
> > > without re-indexing, right?
> > >
> > > Apart from not having some of the original sources that were indexed
> > > (which I will try to solve by using the *IndexUpgrader* tool), I have
> > > another problem: I was using the
> > > org.apache.lucene.uninverting.UninvertingReader to perform queries
> > > against the index, mainly using the grouping API. But it has been
> > > removed (since Lucene 7.0). So, again, do I have any alternative,
> > > apart from re-indexing to use docValues?
> > >
> > > To give you more context, I am a developer of a tool that multiple
> > > customers can use to index their data (currently, with Lucene 5.5.5). We
> > > are planning to upgrade to Lucene 9 (because of some vulnerabilities
> > > affecting Lucene 5.5.5) and I think asking them to reindex will not go
> > > down well :(
> > >
> > > Regards,
> > >
> > > On Sat, 29 Oct 2022 at 23:31, Matt Davis (<kryptonics...@gmail.com>)
> > > wrote:
> > >
> > > > Inside the Zulia search engine, the object being indexed is always a
> > > > JSON/BSON object, and we store the BSON as a stored byte field in the
> > > > index. This allows easy internal reindexing when the searchable fields
> > > > change, and it also lets us update to the latest Lucene version.
> > > > Combined with lucene-backward-codecs, an index older than the current
> > > > major version can be opened and reindexed. If you have stored all the
> > > > fields (or a JSON/BSON document) in the index, it would be easy to
> > > > reindex in the new format. If you have not, maybe opening with
> > > > lucene-backward-codecs will be enough for your use case.
> > > >
> > > > Thanks,
> > > > Matt
> > > >
> > > > On Sat, Oct 29, 2022 at 2:30 PM Baris Kazar <baris.ka...@oracle.com>
> > > > wrote:
> > > >
> > > > > It is always great practice to retain the original, non-indexed
> > > > > data: whenever Lucene changes version, even a minor version,
> > > > > I reindex.
> > > > >
> > > > > Best regards
> > > > > ________________________________
> > > > > From: Gus Heck <gus.h...@gmail.com>
> > > > > Sent: Saturday, October 29, 2022 2:17 PM
> > > > > To: java-user@lucene.apache.org <java-user@lucene.apache.org>
> > > > > Subject: Re: Best strategy migrate indexes
> > > > >
> > > > > Hi Pablo,
> > > > >
> > > > > The deafening silence is probably nobody wanting to give you the bad
> > > > > news. You are on a mission that may not be feasible, and even if you
> > > > > can get it to "work", the end result won't likely be equivalent to
> > > > > indexing the original data with Lucene 9.x. The indexing process is
> > > > > fundamentally lossy, and information originally used to produce
> > > > > non-stored fields will have been thrown out. A simple example is
> > > > > things like stopwords or anything analyzed with subclasses of
> > > > > FilteringTokenFilter. If the stop word list changed, or the details
> > > > > of one of these filters changed (bugfix?), you will end up with a
> > > > > different result than indexing with 9.x. This is just one example;
> > > > > another would be stemming, where the index likely only contains the
> > > > > stem, not the whole word. Other folks who are more interested in the
> > > > > details of our codecs than I am can probably provide further examples
> > > > > on a more fundamental level. Lucene is not a database, and the source
> > > > > documents should always be retained in a form that can be reindexed.
> > > > > If you have inherited a system where source material has not been
> > > > > retained, you have a difficult project and may have some potentially
> > > > > painful expectation setting to perform.
> > > > >
> > > > > Best,
> > > > > Gus
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Oct 28, 2022 at 8:01 AM Pablo Vázquez Blázquez
> > > > > <pabl...@gmail.com> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I have some indices indexed with Lucene 5.5.0. I have updated my
> > > > > > dependencies and code to Lucene 7 (but my final goal is to use
> > > > > > Lucene 9), and when trying to work with them I get this exception:
> > > > > >
> > > > > > org.apache.lucene.index.IndexFormatTooOldException: Format version
> > > > > > is not supported (resource
> > > > > > BufferedChecksumIndexInput(MMapIndexInput(path=".......\tests\segments_b"))):
> > > > > > this index is too old (version: 5.5.0). This version of Lucene only
> > > > > > supports indexes created with release 6.0 and later.
> > > > > >
> > > > > > I want to migrate from Lucene 5.x to Lucene 9.x. Which is the best
> > > > > > strategy? Is there any tool to migrate the indices? Is it mandatory
> > > > > > to reindex? In this case, how can I deal with this when I do not
> > > > > > have the sources of the documents that generated my current indices
> > > > > > (I mean, I just have the indices themselves)?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > --
> > > > > > Pablo Vázquez
> > > > > > (pabl...@gmail.com)
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > http://www.needhamsoftware.com (work)
> > > > > http://www.the111shift.com (play)
> > > > >
> > >
> > >
> > > --
> > > Pablo Vázquez
> > > (pabl...@gmail.com)
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
>
> --
> Pablo Vázquez
> (pabl...@gmail.com)
