The error you got:
BufferedChecksumIndexInput(MMapIndexInput(path="tests_small_index-7.x-migrator\segments_1"))): 9 (needs to be between 6 and 7)
indicates that the index you are reading was written by Lucene 9, so things
are not set up the way you described (writing using Lucene 7).

> Thanks TX for your response.
>
> > I would check that the Luke version matches the Lucene version - if
> > the two match, it shouldn't be possible to get issues like this.
> > That is, the precise versions of Lucene each is using.
>
> Yes, I am using https://github.com/DmitryKey/luke/releases/tag/luke-7.1.0
>
> It works ok with my newly generated indexes, but it does not with the
> "migrated" ones.
>
> On Mon, 7 Nov 2022 at 12:18, Trejkaz (<trej...@trypticon.org>) wrote:
>
> > The process itself sounds like it should work (it's basically a
> > reindex so it should be safer than trying to migrate directly.)
> >
> > I would check that the Luke version matches the Lucene version - if
> > the two match, it shouldn't be possible to get issues like this.
> > That is, the precise versions of Lucene each is using.
> >
> > TX
> >
> >
> > On Mon, 7 Nov 2022 at 22:09, Pablo Vázquez Blázquez <pabl...@gmail.com>
> > wrote:
> > >
> > > Hi!
> > >
> > > > I am trying to create a tool to read docs from a lucene5 index and
> > > > generate lucene9 documents from them (with docValues). That might work,
> > > > right? I am shading both lucene5 and lucene9 to avoid package conflicts.
> > >
> > > I am doing the following steps:
> > >
> > > - create IndexReader with lucene5 package over a lucene5 index
> > > - create IndexWriter with lucene7 package
> > > - iterate over reader.numDocs() to process each Document (lucene5)
> > > - convert each Document (lucene5) to lucene7 Document
> > >   - for each IndexableField (lucene5) from Document (lucene5) convert
> > >     it to create an IndexableField (lucene7)
> > >   - create a SortedDocValuesField (lucene7) and add it to the
> > >     Document (lucene7)
> > >   - add the field to the Document (lucene7)
> > > - add each converted Document to the writer
> > > - close IndexReader and IndexWriter
> > >
> > > When I open the resulting migrated lucene7 index with Luke I get an error:
> > > org.apache.lucene.index.IndexFormatTooNewException: Format version is not
> > > supported (resource
> > > BufferedChecksumIndexInput(MMapIndexInput(path="tests_small_index-7.x-migrator\segments_1"))):
> > > 9 (needs to be between 6 and 7)
> > >
> > > When I use the tool "luceneupgrader
> > > <https://github.com/hakanai/luceneupgrader>", I get:
> > > java -jar luceneupgrader-0.5.2-SNAPSHOT.jar info tests_small_index-7.x-migrator
> > > Lucene index version: 7
> > >
> > > What am I doing wrong or misunderstanding?
> > >
> > > Thanks!
> > >
> > > On Wed, 2 Nov 2022 at 21:13, Pablo Vázquez Blázquez (<pabl...@gmail.com>)
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > > Luckily we were already using lucenemigrator
> > > >
> > > > What do you mean by "lucenemigrator"? Is it a public tool?
> > > >
> > > > I am trying to create a tool to read docs from a lucene5 index and
> > > > generate lucene9 documents from them (with docValues). That might work,
> > > > right? I am shading both lucene5 and lucene9 to avoid package conflicts.
> > > >
> > > > Thanks!
> > > >
> > > > On Tue, 1 Nov 2022 at 0:35, Trejkaz (<trej...@trypticon.org>) wrote:
> > > >
> > > >> Well...
> > > >>
> > > >> There's a way, but I wouldn't necessarily recommend it.
> > > >>
> > > >> You can write custom migration code against some version of Lucene
> > > >> which supports doc values, to create doc values fields. It's going to
> > > >> involve writing a FilterCodecReader which wraps your real index and
> > > >> then pretends to also have doc values, which you'll build in a custom
> > > >> class which works similarly to UninvertingReader. Then you pass those
> > > >> CodecReaders to IndexWriter.addIndexes to create a new index which
> > > >> really has those doc values.
> > > >>
> > > >> We did that ourselves when we had the same issue. The only painful
> > > >> thing about it is having to keep around older versions of lucene to do
> > > >> that migration. Forever. Luckily we were already using lucenemigrator,
> > > >> which has the older versions baked into it with package prefixes. So
> > > >> that library will get fatter and fatter over time but at least our own
> > > >> code only gets fatter at the rate migrations are added.
> > > >>
> > > >> The same approach works for any other kind of ad-hoc migration you
> > > >> might want to perform. e.g., you might want to create points. Or
> > > >> remove an index for a field. Or add an index for a field.
> > > >>
> > > >> TX
> > > >>
> > > >>
> > > >> On Tue, 1 Nov 2022 at 02:57, Pablo Vázquez Blázquez <pabl...@gmail.com>
> > > >> wrote:
> > > >> >
> > > >> > Hi all,
> > > >> >
> > > >> > Thank you all for your responses.
> > > >> >
> > > >> > So, when updating to a newer (major) Lucene version that modifies its
> > > >> > codecs, there is no way to ensure everything keeps working properly,
> > > >> > unless re-indexing, right?
> > > >> >
> > > >> > Apart from not having some original sources that were indexed (which I
> > > >> > will try to solve by using the *IndexUpgrader* tool), I have another
> > > >> > problem: I was using the
> > > >> > org.apache.lucene.uninverting.UninvertingReader to perform queries
> > > >> > against the index, mainly using the grouping api. But it has since
> > > >> > been removed (as of Lucene 7.0). So, again, do I have any other
> > > >> > alternative, apart from re-indexing to use docValues?
> > > >> >
> > > >> > To give you more context, I am a developer of a tool that multiple
> > > >> > customers can use to index their data (currently, with Lucene 5.5.5).
> > > >> > We are planning to upgrade to Lucene 9 (because of some
> > > >> > vulnerabilities affecting Lucene 5.5.5) and I think asking them to
> > > >> > reindex will not go down well :(
> > > >> >
> > > >> > Regards,
> > > >> >
> > > >> > On Sat, 29 Oct 2022 at 23:31, Matt Davis (<kryptonics...@gmail.com>)
> > > >> > wrote:
> > > >> >
> > > >> > > Inside of Zulia search engine, the object being indexed is always a
> > > >> > > JSON/BSON object and we store the BSON as a stored byte field in the
> > > >> > > index. This allows easy internal reindexing when the searchable
> > > >> > > fields change but also allows us to update to the latest lucene
> > > >> > > version. Combined with using lucene-backward-codecs an older index
> > > >> > > than the current major version can be opened and reindexed. If you
> > > >> > > have stored all the fields (or a json/bson) in the index, it would be
> > > >> > > easy to reindex in the new format. If you have not, maybe opening
> > > >> > > with lucene-backward-codecs will be enough for your use case.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Matt
> > > >> > >
> > > >> > > On Sat, Oct 29, 2022 at 2:30 PM Baris Kazar <baris.ka...@oracle.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > It is always great practice to retain non-indexed
> > > >> > > > data since when Lucene changes version,
> > > >> > > > even minor version, I always reindex.
> > > >> > > >
> > > >> > > > Best regards
> > > >> > > > ________________________________
> > > >> > > > From: Gus Heck <gus.h...@gmail.com>
> > > >> > > > Sent: Saturday, October 29, 2022 2:17 PM
> > > >> > > > To: java-user@lucene.apache.org <java-user@lucene.apache.org>
> > > >> > > > Subject: Re: Best strategy migrate indexes
> > > >> > > >
> > > >> > > > Hi Pablo,
> > > >> > > >
> > > >> > > > The deafening silence is probably nobody wanting to give you the bad
> > > >> > > > news. You are on a mission that may not be feasible, and even if you
> > > >> > > > can get it to "work", the end result won't likely be equivalent to
> > > >> > > > indexing the original data with Lucene 9.x. The indexing process is
> > > >> > > > fundamentally lossy and information originally used to produce
> > > >> > > > non-stored fields will have been thrown out. A simple example is
> > > >> > > > things like stopwords or anything analyzed with subclasses of
> > > >> > > > FilteringTokenFilter. If the stop word list changed, or the details
> > > >> > > > of one of these filters changed (bugfix?), you will end up with a
> > > >> > > > different result than indexing with 9.x. This is just one example,
> > > >> > > > another would be stemming where the index likely only contains the
> > > >> > > > stem, not the whole word.
> > > >> > > > Other folks who are more interested in the details of our codecs
> > > >> > > > than I am can probably provide further examples on a more
> > > >> > > > fundamental level. Lucene is not a database, and the source
> > > >> > > > documents should always be retained in a form that can be
> > > >> > > > reindexed. If you have inherited a system where source material has
> > > >> > > > not been retained, you have a difficult project and may have some
> > > >> > > > potentially painful expectation setting to perform.
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Gus
> > > >> > > >
> > > >> > > >
> > > >> > > > On Fri, Oct 28, 2022 at 8:01 AM Pablo Vázquez Blázquez
> > > >> > > > <pabl...@gmail.com> wrote:
> > > >> > > >
> > > >> > > > > Hi all,
> > > >> > > > >
> > > >> > > > > I have some indices indexed with lucene 5.5.0. I have updated my
> > > >> > > > > dependencies and code to Lucene 7 (but my final goal is to use
> > > >> > > > > Lucene 9) and when trying to work with them I am having the
> > > >> > > > > exception:
> > > >> > > > > org.apache.lucene.index.IndexFormatTooOldException: Format
> > > >> > > > > version is not supported (resource
> > > >> > > > > BufferedChecksumIndexInput(MMapIndexInput(path=".......\tests\segments_b"))):
> > > >> > > > > this index is too old (version: 5.5.0). This version of Lucene
> > > >> > > > > only supports indexes created with release 6.0 and later.
> > > >> > > > >
> > > >> > > > > I want to migrate from Lucene 5.x to Lucene 9.x. Which is the
> > > >> > > > > best strategy? Is there any tool to migrate the indices? Is it
> > > >> > > > > mandatory to reindex?
> > > >> > > > > In this case, how can I deal with this when I do not have the
> > > >> > > > > sources of documents that generated my current indices (I mean,
> > > >> > > > > I just have the indices themselves)?
> > > >> > > > >
> > > >> > > > > Thanks,
> > > >> > > > >
> > > >> > > > > --
> > > >> > > > > Pablo Vázquez
> > > >> > > > > (pabl...@gmail.com)
> > > >> > > > >
> > > >> > > >
> > > >> > > > --
> > > >> > > > http://www.needhamsoftware.com (work)
> > > >> > > > http://www.the111shift.com (play)
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> > --
> > > >> > Pablo Vázquez
> > > >> > (pabl...@gmail.com)
> > > >>
> > > >> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> > > >>
> > > >
> > > > --
> > > > Pablo Vázquez
> > > > (pabl...@gmail.com)
> > > >
> > >
> > > --
> > > Pablo Vázquez
> > > (pabl...@gmail.com)
> >
>
> --
> Pablo Vázquez
> (pabl...@gmail.com)
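
A minimal sketch related to the IndexFormatTooNewException discussed in this thread: it reads the number that the message reports ("9 (needs to be between 6 and 7)") straight from a segments_N file, without needing any Lucene jar on the classpath. It assumes the usual codec header layout as I understand it from CodecUtil (a big-endian magic int, a codec name prefixed by a one-byte length, then a big-endian version int); the class name SegmentsHeader and its method are hypothetical, made up for illustration. As far as I know, this number is the format version of the segments file and not necessarily the Lucene release number, so it is worth checking it against the exact Lucene jar the writer actually ran with.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: reads the codec header at the start of a segments_N file. */
final class SegmentsHeader {
    // Magic number that opens a Lucene codec header (assumed: CodecUtil.CODEC_MAGIC).
    static final int CODEC_MAGIC = 0x3fd76c17;

    /** Returns the format version stored in the header of a segments_N file. */
    static int formatVersion(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // ByteBuffer reads big-endian by default
        if (buf.remaining() < 5 || buf.getInt() != CODEC_MAGIC) {
            throw new IllegalArgumentException("missing Lucene codec magic");
        }
        // Assumption: codec names are shorter than 128 bytes, so the
        // variable-length int holding the name length fits in one byte.
        int nameLength = buf.get();
        if (nameLength < 0 || buf.remaining() < nameLength + 4) {
            throw new IllegalArgumentException("truncated or malformed header");
        }
        byte[] name = new byte[nameLength];
        buf.get(name);
        String codec = new String(name, StandardCharsets.UTF_8);
        if (!codec.equals("segments")) {
            throw new IllegalArgumentException("unexpected codec name: " + codec);
        }
        // This is the value that the reader compares against its supported range.
        return buf.getInt();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a segments_N file inside the index directory.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("segments format version: " + formatVersion(data));
    }
}
```

With Java 11 or later this can be run as a single source file, for example `java SegmentsHeader.java tests_small_index-7.x-migrator/segments_1`. Running it on a freshly written index and on the "migrated" one would show whether both were really produced by the same Lucene jar.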