Re: Best strategy migrate indexes

Pablo Vázquez Blázquez Tue, 08 Nov 2022 01:06:41 -0800

Yes, it looks like that, but I am able to open that index with Luke 7.7.0,
and it reports version 7.7.3.

I have triple-checked my program, and all Lucene classes come from Lucene 5
and Lucene 7.

Although my final goal is to migrate to Lucene 9, I want to do it
progressively, so that I can test the API changes in my code and keep my
tests passing. So I am currently migrating to Lucene 7. Since Luke 7.7.0 can
open the migrated index, I am moving from Lucene 7.0.0 to Lucene 7.7.0 to
see if that works.
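
A way to cross-check this independently of Luke is to read the header of
the segments_N file directly. As far as I understand it, the number in the
error ("9 (needs to be between 6 and 7)") is the format version of the
segments file, checked against the range the reading Lucene release
supports, and it is not necessarily the Lucene major version. The sketch
below is stdlib-only and rests on one assumption: that the file starts with
the usual codec header (big-endian magic 0x3fd76c17, a length-prefixed codec
name, then a big-endian version int). The class and method names are
hypothetical and not part of Lucene.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not a Lucene class: reads the format version from
// the codec header at the start of a segments_N file.
class SegmentsHeaderProbe {
    // Assumed magic number at the start of a Lucene codec header.
    static final int CODEC_MAGIC = 0x3fd76c17;

    static int readFormatVersion(byte[] fileBytes) {
        ByteBuffer buf = ByteBuffer.wrap(fileBytes); // big-endian by default
        int magic = buf.getInt();
        if (magic != CODEC_MAGIC) {
            throw new IllegalArgumentException(
                "not a codec header, magic=0x" + Integer.toHexString(magic));
        }
        // The codec name is stored as a variable-length int followed by UTF-8
        // bytes; for a short name such as "segments" the length is one byte.
        int nameLength = buf.get() & 0xFF;
        if (nameLength > 127) {
            throw new IllegalArgumentException("multi-byte name length not handled");
        }
        byte[] name = new byte[nameLength];
        buf.get(name);
        String codec = new String(name, StandardCharsets.UTF_8);
        if (!codec.equals("segments")) {
            throw new IllegalArgumentException("unexpected codec name: " + codec);
        }
        return buf.getInt(); // the number reported in the exception message
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SegmentsHeaderProbe <path to segments_N>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("segments format version: " + readFormatVersion(bytes));
    }
}
```

If this prints 9 for the migrated index, the next thing to verify is which
format versions each 7.x release writes and accepts (the SegmentInfos source
of the releases involved should say). I believe later 7.x releases write a
higher number than early ones, which would explain Luke 7.7.0 opening the
index while Luke 7.1.0 rejects it, but treat that as an assumption to check.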

Regards.

On Tue, 8 Nov 2022 at 1:25, Michael Sokolov (<msoko...@gmail.com>)
wrote:

> The error you got
>
>
> BufferedChecksumIndexInput(MMapIndexInput(path="tests_small_index-7.x-migrator\segments_1"))):
> 9 (needs to be between 6 and 7)
>
> indicates that the index you are reading was written by Lucene 9, so
> things are not set up the way you described (writing using Lucene 7)
>
>
> > Thanks TX for your response.
> >
> > > I would check that the Luke version matches the Lucene version - if
> > > the two match, it shouldn't be possible to get issues like this.
> > > That is, the precise versions of Lucene each is using.
> >
> > Yes, I am using
> > https://github.com/DmitryKey/luke/releases/tag/luke-7.1.0
> >
> > It works fine with my newly generated indexes, but not with the
> > "migrated" ones.
> >
> > On Mon, 7 Nov 2022 at 12:18, Trejkaz (<trej...@trypticon.org>) wrote:
> >
> > > The process itself sounds like it should work (it's basically a
> > > reindex so it should be safer than trying to migrate directly.)
> > >
> > > I would check that the Luke version matches the Lucene version - if
> > > the two match, it shouldn't be possible to get issues like this.
> > > That is, the precise versions of Lucene each is using.
> > >
> > > TX
> > >
> > >
> > > On Mon, 7 Nov 2022 at 22:09, Pablo Vázquez Blázquez <pabl...@gmail.com>
> > > wrote:
> > > >
> > > > Hi!
> > > >
> > > > > I am trying to create a tool to read docs from a lucene5 index and
> > > > > generate lucene9 documents from them (with docValues). That might work,
> > > > > right? I am shading both lucene5 and lucene9 to avoid package conflicts.
> > > >
> > > > I am doing the following steps:
> > > >
> > > > - create IndexReader with lucene5 package over a lucene5 index
> > > > - create IndexWriter with lucene7 package
> > > > - iterate over reader.numDocs() to process each Document (lucene5)
> > > >     - convert each Document (lucene5) to lucene7 Document
> > > >         - for each IndexableField (lucene5) from Document (lucene5), convert
> > > >           it to create an IndexableField (lucene7)
> > > >             - create a SortedDocValuesField (lucene7) and add it to the
> > > >               Document (lucene7)
> > > >             - add the field to the Document (lucene7)
> > > >     - add each converted Document to the writer
> > > > - close IndexReader and IndexWriter
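
One thing to watch in the loop described above: in Lucene, doc IDs run from
0 to maxDoc() - 1, while numDocs() excludes deleted documents. If the source
index has deletions, looping up to numDocs() can skip live documents near
the end of the ID range. The following is only a stdlib model of that
iteration, not Lucene code: a java.util.BitSet stands in for the live-docs
bits, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Stdlib-only model of doc-ID iteration; BitSet stands in for the
// live-docs bits that a Lucene reader exposes.
class LiveDocIteration {
    // Returns the IDs a migration loop should visit: every ID below maxDoc
    // that is still live.
    static List<Integer> liveDocIds(int maxDoc, BitSet liveDocs) {
        List<Integer> ids = new ArrayList<>();
        for (int docId = 0; docId < maxDoc; docId++) {
            // A null bitset models "no deletions in this index".
            if (liveDocs == null || liveDocs.get(docId)) {
                ids.add(docId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        int maxDoc = 5;
        BitSet live = new BitSet(maxDoc);
        live.set(0, maxDoc);
        live.clear(1); // document 1 was deleted
        int numDocs = live.cardinality();
        // Looping over 0..numDocs-1 would visit 0,1,2,3 and never reach 4.
        System.out.println("numDocs=" + numDocs
            + ", ids to visit=" + liveDocIds(maxDoc, live));
    }
}
```

In the real tool the same shape would apply to the lucene5 reader: bound the
loop by maxDoc() and skip IDs that the reader's live-docs bits mark as
deleted.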
> > > >
> > > > When I open the resulting migrated lucene7 index with Luke, I get an error:
> > > > org.apache.lucene.index.IndexFormatTooNewException: Format version is not
> > > > supported (resource
> > > > BufferedChecksumIndexInput(MMapIndexInput(path="tests_small_index-7.x-migrator\segments_1"))):
> > > > 9 (needs to be between 6 and 7)
> > > >
> > > > When I use the tool "luceneupgrader
> > > > <https://github.com/hakanai/luceneupgrader>", I get:
> > > > java -jar luceneupgrader-0.5.2-SNAPSHOT.jar info tests_small_index-7.x-migrator
> > > > Lucene index version: 7
> > > >
> > > > What am I doing wrong or misunderstanding?
> > > >
> > > > Thanks!
> > > >
> > > > On Wed, 2 Nov 2022 at 21:13, Pablo Vázquez Blázquez (<pabl...@gmail.com>)
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > > Luckily we were already using lucenemigrator
> > > > >
> > > > > What do you mean by "lucenemigrator"? Is it a public tool?
> > > > >
> > > > > I am trying to create a tool to read docs from a lucene5 index and
> > > > > generate lucene9 documents from them (with docValues). That might work,
> > > > > right? I am shading both lucene5 and lucene9 to avoid package conflicts.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > On Tue, 1 Nov 2022 at 0:35, Trejkaz (<trej...@trypticon.org>) wrote:
> > > > >
> > > > >> Well...
> > > > >>
> > > > >> There's a way, but I wouldn't necessarily recommend it.
> > > > >>
> > > > >> You can write custom migration code against some version of Lucene
> > > > >> which supports doc values, to create doc values fields. It's going to
> > > > >> involve writing a FilterCodecReader which wraps your real index and
> > > > >> then pretends to also have doc values, which you'll build in a custom
> > > > >> class which works similarly to UninvertingReader. Then you pass those
> > > > >> CodecReaders to IndexWriter.addIndexes to create a new index which
> > > > >> really has those doc values.
> > > > >>
> > > > >> We did that ourselves when we had the same issue. The only painful
> > > > >> thing about it is having to keep around older versions of lucene to do
> > > > >> that migration. Forever. Luckily we were already using lucenemigrator,
> > > > >> which has the older versions baked into it with package prefixes. So
> > > > >> that library will get fatter and fatter over time but at least our own
> > > > >> code only gets fatter at the rate migrations are added.
> > > > >>
> > > > >> The same approach works for any other kind of ad-hoc migration you
> > > > >> might want to perform. e.g., you might want to create points. Or
> > > > >> remove an index for a field. Or add an index for a field.
> > > > >>
> > > > >> TX
> > > > >>
> > > > >>
> > > > >> On Tue, 1 Nov 2022 at 02:57, Pablo Vázquez Blázquez <pabl...@gmail.com>
> > > > >> wrote:
> > > > >> >
> > > > >> > Hi all,
> > > > >> >
> > > > >> > Thank you all for your responses.
> > > > >> >
> > > > >> > So, when updating to a newer (major) Lucene version that modifies its
> > > > >> > codecs, there is no way to ensure everything keeps working properly
> > > > >> > other than re-indexing, right?
> > > > >> >
> > > > >> > Apart from not having some of the original sources that were indexed
> > > > >> > (which I will try to solve by using the *IndexUpgrader* tool), I have
> > > > >> > another problem: I was using
> > > > >> > org.apache.lucene.uninverting.UninvertingReader to perform queries
> > > > >> > against the index, mainly through the grouping API. But it has been
> > > > >> > removed (since Lucene 7.0). So, again, do I have any alternative to
> > > > >> > re-indexing in order to use docValues?
> > > > >> >
> > > > >> > To give you more context, I am a developer of a tool that multiple
> > > > >> > customers use to index their data (currently with Lucene 5.5.5). We
> > > > >> > are planning to upgrade to Lucene 9 (because of some vulnerabilities
> > > > >> > affecting Lucene 5.5.5), and I think asking them to reindex will not
> > > > >> > go down well :(
> > > > >> >
> > > > >> > Regards,
> > > > >> >
> > > > >> > On Sat, 29 Oct 2022 at 23:31, Matt Davis (<kryptonics...@gmail.com>)
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Inside of Zulia search engine, the object being indexed is always a
> > > > >> > > JSON/BSON object and we store the BSON as a stored byte field in the
> > > > >> > > index.  This allows easy internal reindexing when the searchable fields
> > > > >> > > change but also allows us to update to the latest lucene version.
> > > > >> > > Combined with using lucene-backward-codecs an older index than the
> > > > >> > > current major version can be opened and reindexed.  If you have stored
> > > > >> > > all the fields (or a json/bson) in the index, it would be easy to
> > > > >> > > reindex in the new format.  If you have not, maybe opening with
> > > > >> > > lucene-backward-codecs will be enough for your use case.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Matt
> > > > >> > >
> > > > >> > > On Sat, Oct 29, 2022 at 2:30 PM Baris Kazar <baris.ka...@oracle.com>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > It is always great practice to retain non-indexed
> > > > >> > > > data since when Lucene changes version,
> > > > >> > > > even minor version, I always reindex.
> > > > >> > > >
> > > > >> > > > Best regards
> > > > >> > > > ________________________________
> > > > >> > > > From: Gus Heck <gus.h...@gmail.com>
> > > > >> > > > Sent: Saturday, October 29, 2022 2:17 PM
> > > > >> > > > To: java-user@lucene.apache.org <java-user@lucene.apache.org>
> > > > >> > > > Subject: Re: Best strategy migrate indexes
> > > > >> > > >
> > > > >> > > > Hi Pablo,
> > > > >> > > >
> > > > >> > > > The deafening silence is probably nobody wanting to give you the bad
> > > > >> > > > news. You are on a mission that may not be feasible, and even if you
> > > > >> > > > can get it to "work", the end result won't likely be equivalent to
> > > > >> > > > indexing the original data with Lucene 9.x. The indexing process is
> > > > >> > > > fundamentally lossy and information originally used to produce
> > > > >> > > > non-stored fields will have been thrown out. A simple example is
> > > > >> > > > things like stopwords or anything analyzed with subclasses of
> > > > >> > > > FilteringTokenFilter. If the stop word list changed, or the details
> > > > >> > > > of one of these filters changed (bugfix?), you will end up with a
> > > > >> > > > different result than indexing with 9.x. This is just one example,
> > > > >> > > > another would be stemming where the index likely only contains the
> > > > >> > > > stem, not the whole word. Other folks who are more interested in the
> > > > >> > > > details of our codecs than I am can probably provide further examples
> > > > >> > > > on a more fundamental level. Lucene is not a database, and the source
> > > > >> > > > documents should always be retained in a form that can be reindexed.
> > > > >> > > > If you have inherited a system where source material has not been
> > > > >> > > > retained, you have a difficult project and may have some potentially
> > > > >> > > > painful expectation setting to perform.
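
The stopword point above can be illustrated without Lucene. The sketch below
is a toy analyzer in plain Java, not a Lucene Analyzer or
FilteringTokenFilter: it lowercases, splits on non-word characters and drops
stop words. The class name, method name and stop lists are made up for the
example. It shows that the terms produced depend on the stop list in force
at index time, and that the dropped words cannot be recovered from the terms
afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy stand-in for an analysis chain with stopword filtering.
class LossyAnalysisDemo {
    // Returns the terms that would end up in the index for this text.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String source = "The index of the book";
        Set<String> oldStops = new HashSet<>(Arrays.asList("the", "of"));
        Set<String> newStops = new HashSet<>(Arrays.asList("the"));
        // Terms written under the old stop list: "of" is gone for good.
        System.out.println(analyze(source, oldStops));
        // Terms a fresh reindex from the source would produce today.
        System.out.println(analyze(source, newStops));
    }
}
```

An index built under the old list only holds the first set of terms, so a
migration that reads the index alone cannot reproduce the second.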
> > > > >> > > >
> > > > >> > > > Best,
> > > > >> > > > Gus
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Fri, Oct 28, 2022 at 8:01 AM Pablo Vázquez Blázquez <pabl...@gmail.com>
> > > > >> > > > wrote:
> > > > >> > > >
> > > > >> > > > > Hi all,
> > > > >> > > > >
> > > > >> > > > > I have some indices indexed with lucene 5.5.0. I have updated my
> > > > >> > > > > dependencies and code to Lucene 7 (but my final goal is to use
> > > > >> > > > > Lucene 9), and when trying to work with them I get the exception:
> > > > >> > > > > org.apache.lucene.index.IndexFormatTooOldException: Format version is
> > > > >> > > > > not supported (resource
> > > > >> > > > > BufferedChecksumIndexInput(MMapIndexInput(path=".......\tests\segments_b"))):
> > > > >> > > > > this index is too old (version: 5.5.0). This version of Lucene only
> > > > >> > > > > supports indexes created with release 6.0 and later.
> > > > >> > > > >
> > > > >> > > > > I want to migrate from Lucene 5.x to Lucene 9.x. What is the best
> > > > >> > > > > strategy? Is there any tool to migrate the indices? Is it mandatory
> > > > >> > > > > to reindex? In that case, how can I deal with this when I do not have
> > > > >> > > > > the sources of the documents that generated my current indices (I
> > > > >> > > > > mean, I just have the indices themselves)?
> > > > >> > > > >
> > > > >> > > > > Thanks,
> > > > >> > > > >
> > > > >> > > > > --
> > > > >> > > > > Pablo Vázquez
> > > > >> > > > > (pabl...@gmail.com)
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > > http://www.needhamsoftware.com (work)
> > > > >> > > > http://www.the111shift.com (play)
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > Pablo Vázquez
> > > > >> > (pabl...@gmail.com)
> > > > >>
> > > > >>
> ---------------------------------------------------------------------
> > > > >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> > > > >>
> > > > >>
> > > > >
> > > > > --
> > > > > Pablo Vázquez
> > > > > (pabl...@gmail.com)
> > > > >
> > > >
> > > >
> > > > --
> > > > Pablo Vázquez
> > > > (pabl...@gmail.com)
> > >
> >
> > --
> > Pablo Vázquez
> > (pabl...@gmail.com)
>

-- 
Pablo Vázquez
(pabl...@gmail.com)
