Re: Best strategy migrate indexes

Gus Heck Mon, 31 Oct 2022 20:05:14 -0700

You really should reindex for a 4 version jump. The index upgrader tool
explicitly prohibits what you are proposing to do with it. See
https://issues.apache.org/jira/browse/LUCENE-9127 and
https://solr.apache.org/guide/8_1/indexupgrader-tool.html (It seems that
the javadoc for IndexUpgrader should probably be enhanced to clarify this).
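
To make the constraint concrete, here is a rough sketch in plain Java (no
Lucene on the classpath) of the rule behind the IndexFormatTooOldException
quoted further down: a given major version reads indexes created by the
previous major version, and nothing older. The class and method names are
made up for illustration and are not Lucene API. As far as I understand it,
recent releases also record the major version an index was originally
created with, which is why running IndexUpgrader once per major version
does not carry a 5.x index all the way to 9.x.

```java
import java.util.List;

public class IndexCompatSketch {

    // Hypothetical helper, not Lucene API: the oldest "created with" major
    // version that a given Lucene major version is assumed to still open.
    public static int minSupportedCreatedMajor(int runtimeMajor) {
        return runtimeMajor - 1;
    }

    // True if an index created with createdMajor can be opened by runtimeMajor
    // under the "current or previous major only" rule modelled here.
    public static boolean canOpen(int createdMajor, int runtimeMajor) {
        return createdMajor >= minSupportedCreatedMajor(runtimeMajor)
                && createdMajor <= runtimeMajor;
    }

    public static void main(String[] args) {
        int createdMajor = 5; // the index in this thread was created with 5.x
        for (int runtimeMajor : List.of(5, 6, 7, 8, 9)) {
            System.out.println("Lucene " + runtimeMajor + " opens an index created with "
                    + createdMajor + ".x: " + canOpen(createdMajor, runtimeMajor));
        }
    }
}
```

For an index created with 5.x this reports true for Lucene 5 and 6 and
false for 7, 8 and 9, which matches the message you got from Lucene 7.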


A good read, with lots of discussion on this, can be found a few comments
deep in this issue: https://issues.apache.org/jira/browse/LUCENE-8264

Another useful thing to listen to is Erick Erickson's Activate 2019
presentation https://www.youtube.com/watch?v=eaQBH_H3d3g - near the end he
tells you how to break the rules, but the caveats are important.

Getting folks to re-index is a matter of presentation much of the time
(huge corpuses with ridiculous cost, and cases where they didn't keep their
own documents excluded of course). You need to sell the benefits to justify
the cost. If you add a feature that goes along with it, you make it look
like part of the price for the feature... people might give you something
if they get something in return, but something for nothing goes down
sideways every time.

-Gus

On Mon, Oct 31, 2022 at 11:57 AM Pablo Vázquez Blázquez <pabl...@gmail.com>
wrote:

> Hi all,
>
> Thank you all for your responses.
>
> So, when updating to a newer (major) Lucene version that modifies its
> codecs, there is no way to ensure everything keeps working properly other
> than re-indexing, right?
>
> Apart from not having some original sources that were indexed (which I will
> try to solve by using the *IndexUpgrader* tool), I have another problem: I
> was using the org.apache.lucene.uninverting.UninvertingReader to perform
> queries against the index, mainly using the grouping API. But it has since
> been removed (as of Lucene 7.0). So, again, do I have any other alternative,
> apart from re-indexing to use docValues?
>
> To give you more context, I am a developer of a tool that multiple
> customers can use to index their data (currently, with Lucene 5.5.5). We
> are planning to upgrade to Lucene 9 (because of some vulnerabilities
> affecting Lucene 5.5.5) and I think asking them to reindex will not go down
> well :(
>
> Regards,
>
> On Sat, Oct 29, 2022 at 23:31, Matt Davis (<kryptonics...@gmail.com>)
> wrote:
>
> > Inside of Zulia search engine, the object being indexed is always a
> > JSON/BSON object and we store the BSON as a stored byte field in the
> > index.  This allows easy internal reindexing when the searchable fields
> > change but also allows us to update to the latest lucene version.
> > Combined with lucene-backward-codecs, an index older than the current
> > major version can be opened and reindexed.  If you have stored all the
> > fields (or a json/bson) in the index, it would be easy to reindex in the
> > new format.  If you have not, maybe opening with lucene-backward-codecs
> > will be enough for your use case.
> >
> > Thanks,
> > Matt
> >
> > On Sat, Oct 29, 2022 at 2:30 PM Baris Kazar <baris.ka...@oracle.com>
> > wrote:
> >
> > > It is always good practice to retain the original (non-indexed)
> > > data. Whenever Lucene changes version,
> > > even a minor version, I always reindex.
> > >
> > > Best regards
> > > ________________________________
> > > From: Gus Heck <gus.h...@gmail.com>
> > > Sent: Saturday, October 29, 2022 2:17 PM
> > > To: java-user@lucene.apache.org <java-user@lucene.apache.org>
> > > Subject: Re: Best strategy migrate indexes
> > >
> > > Hi Pablo,
> > >
> > > The deafening silence is probably nobody wanting to give you the bad
> > > news. You are on a mission that may not be feasible, and even if you
> > > can get it to "work", the end result won't likely be equivalent to
> > > indexing the original data with Lucene 9.x. The indexing process is
> > > fundamentally lossy and information originally used to produce
> > > non-stored fields will have been thrown out. A simple example is
> > > things like stopwords or anything analyzed with subclasses of
> > > FilteringTokenFilter. If the stop word list changed, or the details of
> > > one of these filters changed (bugfix?), you will end up with a
> > > different result than indexing with 9.x. This is just one example,
> > > another would be stemming where the index likely only contains the
> > > stem, not the whole word. Other folks who are more interested in the
> > > details of our codecs than I am can probably provide further examples
> > > on a more fundamental level. Lucene is not a database, and the source
> > > documents should always be retained in a form that can be reindexed.
> > > If you have inherited a system where source material has not been
> > > retained, you have a difficult project and may have some potentially
> > > painful expectation setting to perform.
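
A toy illustration of that lossiness, in plain Java rather than with real
Lucene analyzers: the stop word list and the crude suffix-stripping
"stemmer" below are invented for the example, but they show how two
different source texts can leave identical terms behind, so the original
text cannot be reconstructed from a non-stored field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class LossyIndexingSketch {

    // Hypothetical stop word list; real analyzers ship their own lists,
    // and those lists can change between releases.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "of", "is");

    // Deliberately crude stemmer (strips a trailing "ing" or "s");
    // it only stands in for a real stemming filter.
    public static String stem(String token) {
        if (token.length() > 5 && token.endsWith("ing")) {
            return token.substring(0, token.length() - 3);
        }
        if (token.length() > 3 && token.endsWith("s")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Lowercase, split on non-letters, drop stop words, stem each token:
    // roughly the kind of terms that end up in a non-stored field.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (raw.isEmpty() || STOP_WORDS.contains(raw)) {
                continue;
            }
            terms.add(stem(raw));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The indexing of documents"));
        System.out.println(analyze("Index a document"));
    }
}
```

Both calls print [index, document]; nothing in those terms tells you which
sentence was indexed, or what a different stop word list or stemmer would
have kept.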
> > >
> > > Best,
> > > Gus
> > >
> > >
> > >
> > > On Fri, Oct 28, 2022 at 8:01 AM Pablo Vázquez Blázquez
> > > <pabl...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I have some indices indexed with lucene 5.5.0. I have updated my
> > > > dependencies and code to Lucene 7 (but my final goal is to use
> > > > Lucene 9) and when trying to work with them I am having the exception:
> > > > org.apache.lucene.index.IndexFormatTooOldException: Format version is
> > > > not supported (resource
> > > > BufferedChecksumIndexInput(MMapIndexInput(path=".......\tests\segments_b"))):
> > > > this index is too old (version: 5.5.0). This version of Lucene only
> > > > supports indexes created with release 6.0 and later.
> > > >
> > > > I want to migrate from Lucene 5.x to Lucene 9.x. Which is the best
> > > > strategy? Is there any tool to migrate the indices? Is it mandatory
> > > > to reindex? In this case, how can I deal with this when I do not have
> > > > the sources of documents that generated my current indices (I mean, I
> > > > just have the indices themselves)?
> > > >
> > > > Thanks,
> > > >
> > > > --
> > > > Pablo Vázquez
> > > > (pabl...@gmail.com)
> > > >
> > >
> > >
> > > --
> > > http://www.needhamsoftware.com (work)
> > > http://www.the111shift.com (play)
> > >
> >
>
>
> --
> Pablo Vázquez
> (pabl...@gmail.com)
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
