> Consider the following sequence of events... an index with 2 segments (seg1 and seg2) originally created in Lucene 8.x ==> Upgrade to 9.x ==> index a few documents and commit ==> seg3 gets created with version 9.x, but merge doesn't kick in ==> documents in seg1 and seg2 get deleted, followed by a commit ==> You are left with seg3 in 9.x but indexCreatedVersionMajor as 8.x ==> Upgrade to Lucene 10.x fails.
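To make the sequence above concrete, here is a toy Java model of it. This is a sketch under stated assumptions. `ScenarioDemo`, `Segment` and `IndexModel` are hypothetical types, not Lucene classes. Versions are reduced to major numbers, and the doc counts are arbitrary. The rule that a major-N release only opens indexes whose indexCreatedVersionMajor is at least N - 1 is also an assumption, chosen to match the X --> X+1 --> X+2 failure described in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the sequence described above. None of these types
// are Lucene classes; versions are reduced to their major numbers.
class ScenarioDemo {

    // A segment: the major version that wrote it, the smallest major version among
    // the segments it was merged from (minVersion), and its live-doc count.
    record Segment(String name, int versionMajor, int minVersionMajor, int liveDocs) {}

    static final class IndexModel {
        // Stamped once when the index is first created; never changed afterwards.
        final int indexCreatedVersionMajor;
        final List<Segment> segments = new ArrayList<>();

        IndexModel(int indexCreatedVersionMajor) {
            this.indexCreatedVersionMajor = indexCreatedVersionMajor;
        }

        // Flushing new documents writes a "pure" segment: minVersion == version.
        void flush(String name, int currentMajor, int docs) {
            segments.add(new Segment(name, currentMajor, currentMajor, docs));
        }

        // Deleting every document in a segment and committing: the fully deleted
        // segment is dropped, but indexCreatedVersionMajor is left untouched.
        void deleteAllDocsAndCommit(String name) {
            segments.removeIf(s -> s.name().equals(name));
        }

        // Assumption: a release with major version N only opens indexes whose
        // indexCreatedVersionMajor is at least N - 1; otherwise opening fails
        // with IndexFormatTooOldException (modeled here as returning false).
        boolean canOpenWith(int readerMajor) {
            return indexCreatedVersionMajor >= readerMajor - 1;
        }
    }

    static IndexModel runScenario() {
        IndexModel index = new IndexModel(8);              // created with Lucene 8.x
        index.segments.add(new Segment("seg1", 8, 8, 100));
        index.segments.add(new Segment("seg2", 8, 8, 100));
        index.flush("seg3", 9, 10);                        // after the 9.x upgrade; no merge
        index.deleteAllDocsAndCommit("seg1");
        index.deleteAllDocsAndCommit("seg2");
        return index;
    }

    public static void main(String[] args) {
        IndexModel index = runScenario();
        System.out.println("segments: " + index.segments);
        System.out.println("indexCreatedVersionMajor: " + index.indexCreatedVersionMajor);
        System.out.println("opens with 10.x: " + index.canOpenWith(10));
    }
}
```

In this model, running `ScenarioDemo.main` should leave a single 9.x segment next to a created major of 8. That is the state the assumed 10.x check rejects, even though no 8.x data survives.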
Thanks for the explanation. I am wondering whether this is something you commonly encounter; it seems like a bit of an edge case. Regarding scenario 1, deleting the entire index and recreating it is generally faster and less resource-intensive than deleting all the documents. Most systems built on top of Lucene, such as Solr, OpenSearch, and Elasticsearch, expose a delete API for a collection/index, and users just delete and recreate the index. That is probably one of the reasons this hasn't come up much before. I will let other community members chime in on this.

On Sat, Apr 19, 2025 at 7:43 PM Rahul Goswami <rahul196...@gmail.com> wrote:

> For complete clarity..."minVersion" for a SegmentInfo is the min of the
> minVersions of all segments involved in the merge which resulted in this
> segment. If it is a "pure" segment, then minVersion=version.
>
> On Sat, Apr 19, 2025 at 10:35 PM Rahul Goswami <rahul196...@gmail.com>
> wrote:
>
>> Ankit,
>> "I guess the SegmentInfo "minVersion" is the min across all segments
>> during the merge process?"
>> > That is correct
>>
>> I am wondering if there is any way to end up in the 2nd scenario without
>> having deleted all the documents first?
>> > Consider the following sequence of events...
>> an index with 2 segments (seg1 and seg2) originally created in Lucene
>> 8.x ==> Upgrade to 9.x ==> index a few documents and commit ==> seg3 gets
>> created with version 9.x, but merge doesn't kick in ==> documents in seg1
>> and seg2 get deleted, followed by a commit ==> You are left with seg3 in 9.x
>> but indexCreatedVersionMajor as 8.x ==> Upgrade to Lucene 10.x fails.
>>
>> -Rahul
>>
>> On Sat, Apr 19, 2025 at 1:01 PM Ankit Jain <jain.ank...@gmail.com> wrote:
>>
>>> Hi Rahul,
>>>
>>> Thanks for starting this interesting discussion.
>>> I was initially thinking that this API potentially allows upgrading
>>> "indexCreatedVersionMajor" via the merge process after rewriting all the
>>> segments, but I guess the SegmentInfo "minVersion" is the min across all
>>> segments during the merge process?
>>>
>>> So, I am wondering if there is any way to end up in the 2nd scenario
>>> without having deleted all the documents first?
>>>
>>> Thanks
>>> Ankit
>>>
>>> On Sat, Apr 19, 2025 at 9:17 AM Rahul Goswami <rahul196...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>> Today, even after all documents in an index are deleted via an API call,
>>>> reindexing still doesn't change the "indexCreatedVersionMajor" property
>>>> value in SegmentInfos. Hence, even after complete reindexing, an upgrade
>>>> path X --> X+1 --> X+2 is still not possible, as we end up with an
>>>> IndexFormatTooOldException.
>>>>
>>>> Requesting an API (on IndexWriter?) which can reset this property (upon
>>>> a new commit) to the current Lucene version if:
>>>> 1) no live docs are present
>>>> OR
>>>> 2) all SegmentInfo entries in the index have a "minVersion" AND "version"
>>>> stamp of the latest version, but SegmentInfos has an older
>>>> "indexCreatedVersionMajor".
>>>>
>>>> This will help users a LOT, since they can then interact with the index
>>>> purely via API without needing manual deletion, and it also opens up a
>>>> legitimate upgrade path when an index doesn't HAVE to be repopulated
>>>> from the source.
>>>>
>>>> If there is agreement, I am happy to pick this up and submit a PR.
>>>>
>>>> Thanks,
>>>> Rahul Goswami
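The two reset conditions proposed in the original email can be sketched in Java, together with the minVersion-on-merge rule Rahul clarified. This is a minimal, hypothetical sketch and not an existing IndexWriter API. `CreatedVersionReset`, `SegInfo`, `merge` and `mayReset` are illustrative names, and versions are reduced to major numbers.

```java
import java.util.List;

// Hypothetical sketch of the reset conditions proposed in the original email.
// This is not an existing IndexWriter method; versions are major numbers only.
final class CreatedVersionReset {

    record SegInfo(int versionMajor, int minVersionMajor, int liveDocs) {}

    // A merged segment is written by the current version, and its minVersion is
    // the minimum of the merged segments' minVersions.
    static SegInfo merge(List<SegInfo> inputs, int currentMajor) {
        int minVersion = currentMajor;
        int liveDocs = 0;
        for (SegInfo s : inputs) {
            minVersion = Math.min(minVersion, s.minVersionMajor());
            liveDocs += s.liveDocs();
        }
        return new SegInfo(currentMajor, minVersion, liveDocs);
    }

    // True when the proposal would let the next commit stamp currentMajor as
    // indexCreatedVersionMajor: (1) no live docs remain, OR (2) every segment
    // has both version and minVersion equal to the current major.
    static boolean mayReset(List<SegInfo> segments, int createdMajor, int currentMajor) {
        if (createdMajor >= currentMajor) {
            return false; // already current; nothing to reset
        }
        boolean noLiveDocs = segments.stream().allMatch(s -> s.liveDocs() == 0);
        boolean allPureCurrent = segments.stream().allMatch(
            s -> s.versionMajor() == currentMajor && s.minVersionMajor() == currentMajor);
        return noLiveDocs || allPureCurrent;
    }

    public static void main(String[] args) {
        SegInfo merged = merge(List.of(new SegInfo(8, 8, 5), new SegInfo(9, 9, 5)), 9);
        System.out.println("merged: " + merged);
        System.out.println("reset after mixed merge: " + mayReset(List.of(merged), 8, 9));
        System.out.println("reset with only pure 9.x segments: "
            + mayReset(List.of(new SegInfo(9, 9, 10)), 8, 9));
    }
}
```

In this model, a merge that includes an 8.x segment keeps minVersion at 8. So, as Ankit suspected, rewriting segments through merges alone does not satisfy condition 2; only segments written entirely by the current major do.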