rahulgoswami commented on code in PR #14607:
URL: https://github.com/apache/lucene/pull/14607#discussion_r2509109896


##########
lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java:
##########
@@ -402,11 +389,38 @@ private static void parseSegmentInfos(
     }
 
     long totalDocs = 0;
+
     for (int seg = 0; seg < numSegments; seg++) {
       String segName = input.readString();
       byte[] segmentID = new byte[StringHelper.ID_LENGTH];
       input.readBytes(segmentID, 0, segmentID.length);
-      Codec codec = readCodec(input);
+      Codec codec = null;
+      try {
+        codec = readCodec(input);

Review Comment:
   Please disregard my previous comment since my understanding of the 
VERSION_74 and VERSION_86 variables was incorrect. They are just versions 
assigned to the "segments" codec, and the variable names just represent the 
Lucene version in which the "segments" file version was assigned. I now fully 
understand what's happening and why the assertion fails when you changed to 
   
   `format = CodecUtil.checkHeaderNoMagic(input, "segments", VERSION_86, 
VERSION_CURRENT);`
   
   Moving to VERSION_86 means that CodecUtil.checkHeaderNoMagic() now throws 
IndexFormatTooOldException for indexes before Lucene 8.6.0. The test fails the 
assertion for version 8.0.0 on the line you mentioned since it now finds 
indexStatus.clean=false. That is because the below call in  
checker.checkIndex() fails:
   
   ` infos = SegmentInfos.readCommit(dir, fileName, 0);`
   
   And the reason it passes for >=8.6.0 is because the last param (min 
supported version) is 0, which means it proceeds to do  an individual segment 
check down the line, which passes because 8.x codecs are shipped and available 
in class path! 
   **Essentially today if someone just removes the indexCreatedVersionMajor 
check in SegmentInfos, they should be able to read indexes created in 8.x even 
with Lucene 11 (!!)** 
   
   By similar reasoning with VERSION_74, the test was only failing for indexes 
with version 7.4 to < 8.0.0 previously because the codec is not present. Maybe 
the PR in its current form is a reasonable solution (since 
IllegalArgumentException would only appear for indexes belonging to these 
specific versions, which are truly old anyway)? 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to