guojialiang92 commented on code in PR #13017:
URL: https://github.com/apache/lucene/pull/13017#discussion_r1505452744


##########
lucene/core/src/java/org/apache/lucene/index/ReadersAndUpdates.java:
##########
@@ -782,13 +784,17 @@ synchronized MergePolicy.MergeReader getReaderForMerge(IOContext context) throws
     }
 
     SegmentReader reader = getReader(context);
-    if (pendingDeletes.needsRefresh(reader)) {
+    if (pendingDeletes.needsRefresh(reader)
+        || reader.getSegmentInfo().getDelGen() != pendingDeletes.info.getDelGen()) {

Review Comment:
   I found that when both `IndexWriter#updateDocument` and 
`IndexWriter#softUpdateDocument` are used, the new refCount control logic may 
try to delete a `.liv` file that no longer exists.
   
   I have fixed this problem and added a test case. Please review, thank you!
   @stefanvodita 
   
   ### Test
   In order to reproduce and solve the problem, I introduced 
`testForceMergeWithPendingHardAndSoftDeleteFile`.
   
   1. `IndexWriter#addDocument` doc1 = {"id": "1", "version": "1"}
   2. `IndexWriter#commit`, which will produce segment0 = {_0.cfe, _0.cfs, 
_0.si}
   3. `IndexWriter#addDocument` doc2 = {"id": "2", "version": "1"}
   4. `IndexWriter#addDocument` doc3 = {"id": "3", "version": "1"}
   5. `IndexWriter#addDocument` doc4 = {"id": "4", "version": "1"}
   6. `IndexWriter#addDocument` doc5 = {"id": "5", "version": "1"}
   7. `IndexWriter#commit`, which will produce segment1 = {_1.cfe, _1.cfs, 
_1.si}
   8. `IndexWriter#updateDocument` updates doc2 to {"id": "2", "version": "2"}
   9. `IndexWriter#commit`, which will produce segment2 = {_2.cfe, _2.cfs, 
_2.si}, and update segment1 to {_1.cfe, _1.cfs, _1.si, _1_1.liv}
   10. `IndexWriter#updateDocument` updates doc3 to {"id": "3", "version": "2"}
   11. `IndexWriter#softUpdateDocument` updates doc4 to {"id": "4", "version": 
"2"}
   12. `DirectoryReader reader = IndexWriter#getReader(true, false)`, which 
will produce segment3 = {_3.cfe, _3.cfs, _3.si}, and update segment1 to 
{_1.cfe, _1.cfs, _1.si, _1_1.liv, _1_1.fnm, _1_1_Lucene80_0.dvd, 
_1_1_Lucene80_0.dvm}
   13. `reader.close()`
   14. `IndexWriter#commit`, which will update segment1 to {_1.cfe, _1.cfs, 
_1.si, _1_2.liv, _1_1.fnm, _1_1_Lucene80_0.dvd, _1_1_Lucene80_0.dvm}
   15. `IndexWriter#forceMerge(1)`
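
The file names in the steps above follow a visible pattern: the segment name, an underscore, the delete generation, then `.liv`. The helper below is a minimal sketch of that pattern only. `LivFileNameSketch` and `livFileName` are hypothetical names, not Lucene's API, and the base-36 encoding of the generation is an assumption that makes no difference for the small generations used here.

```java
// Hypothetical helper, not part of Lucene: models the naming pattern seen in
// the file lists above (segment name + "_" + delGen + ".liv").
final class LivFileNameSketch {

  /** Returns the live-docs file name for a segment at the given delGen, or null if none exists yet. */
  static String livFileName(String segmentName, long delGen) {
    if (delGen <= 0) {
      return null; // assumption: no .liv file before the first delete generation is written
    }
    // Assumption: generations are encoded in base 36; for 1 and 2 this equals decimal.
    return segmentName + "_" + Long.toString(delGen, Character.MAX_RADIX) + ".liv";
  }

  public static void main(String[] args) {
    System.out.println(livFileName("_1", 1)); // file written by the commit in step 9
    System.out.println(livFileName("_1", 2)); // file written by the commit in step 14
  }
}
```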
   
   ### Analysis
   **In step 12**
   Because the `writeAllDeletes` parameter of `IndexWriter#getReader` is 
`false`, `IndexWriter#writeReaderPool`, 
`ReaderPool#writeAllDocValuesUpdates`, and 
`ReadersAndUpdates#writeFieldUpdates` will be executed in sequence.
   
   In method `ReadersAndUpdates#writeFieldUpdates`, since 
`ReadersAndUpdates#pendingDVUpdates` is not empty, 
`ReadersAndUpdates#swapNewReaderWithLatestLiveDocs` and 
`ReadersAndUpdates#createNewReaderWithLatestLiveDocs` will be executed in 
sequence.
   
   `ReadersAndUpdates#createNewReaderWithLatestLiveDocs` will generate a new 
`SegmentReader` based on `pendingDeletes`. The new `SegmentReader` will hold 
the latest `liveDocs` and `numDocs`, but the value of `SegmentReader.si.delGen` 
will still be 1.
   
   **In step 14**
   _1_2.liv will be created, _1_1.liv will be deleted, and 
`pendingDeletes.info.delGen` will be updated to 2. 
   
   **In step 15**
   `ReadersAndUpdates#getReaderForMerge` decides whether the `SegmentReader` 
needs to be refreshed based on `pendingDeletes`. If `delGen` is not compared, 
the `SegmentReader` will not be refreshed, so the merge will still increment 
and decrement the refCount of `_1_1.liv`, which was already deleted in step 14. 
When that refCount drops back to 0, Lucene will try to delete `_1_1.liv` again 
and `NoSuchFileException` will be thrown.
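
The failure described for step 15 can be replayed with a toy reference counter. This is a simplified model under stated assumptions, not Lucene's file deleter: `StaleLivRefCountSketch` and all of its methods are hypothetical, and it tracks only the two `.liv` files from steps 9 and 14.

```java
import java.nio.file.NoSuchFileException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical, not Lucene code): files are deleted when their refCount hits 0.
final class StaleLivRefCountSketch {
  private final Set<String> onDisk = new HashSet<>();
  private final Map<String, Integer> refCounts = new HashMap<>();

  /** Writes a file and takes the reference held by the commit that produced it. */
  void createAndRef(String file) {
    onDisk.add(file);
    incRef(file);
  }

  void incRef(String file) {
    refCounts.merge(file, 1, Integer::sum);
  }

  /** Drops one reference; deletes the file when no references remain. */
  void decRef(String file) throws NoSuchFileException {
    int remaining = refCounts.merge(file, -1, Integer::sum);
    if (remaining <= 0) {
      refCounts.remove(file);
      if (!onDisk.remove(file)) {
        throw new NoSuchFileException(file); // the file was already deleted earlier
      }
    }
  }

  /** Replays steps 9, 14 and 15; returns true if the merge hits a missing file. */
  static boolean mergeFails(long readerDelGen) {
    StaleLivRefCountSketch files = new StaleLivRefCountSketch();
    try {
      files.createAndRef("_1_1.liv"); // step 9: first commit with deletes
      files.createAndRef("_1_2.liv"); // step 14: new delete generation is written
      files.decRef("_1_1.liv");       // step 14: old generation is released and deleted
      String liv = "_1_" + readerDelGen + ".liv"; // file the merge reader still points at
      files.incRef(liv);              // step 15: merge takes a reference
      files.decRef(liv);              // step 15: merge releases it
      return false;
    } catch (NoSuchFileException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("stale reader (delGen 1) fails: " + mergeFails(1));
    System.out.println("refreshed reader (delGen 2) fails: " + mergeFails(2));
  }
}
```

In this model the stale reader re-creates a reference to a file that is no longer on disk, so releasing it triggers a second delete. A reader refreshed to `delGen` 2 only touches `_1_2.liv`, whose commit reference keeps it alive.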
   
   ### Solution
   To solve this problem, `ReadersAndUpdates#getReaderForMerge` also needs to 
compare `delGen`.
   The `SegmentReader` needs to be refreshed when 
`pendingDeletes.needsRefresh(reader) || reader.getSegmentInfo().getDelGen() != 
pendingDeletes.info.getDelGen()` returns `true`.
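
As a minimal illustration of the condition above, the sketch below reduces it to plain values. `MergeRefreshSketch` and `needsRefreshForMerge` are hypothetical names. The boolean stands in for the result of `pendingDeletes.needsRefresh(reader)`, and the two longs stand in for the reader's and the pending deletes' `delGen`.

```java
// Hypothetical stand-in for the refresh decision in getReaderForMerge; not Lucene code.
final class MergeRefreshSketch {

  /**
   * @param pendingNeedsRefresh stand-in for pendingDeletes.needsRefresh(reader)
   * @param readerDelGen stand-in for reader.getSegmentInfo().getDelGen()
   * @param pendingDelGen stand-in for pendingDeletes.info.getDelGen()
   */
  static boolean needsRefreshForMerge(
      boolean pendingNeedsRefresh, long readerDelGen, long pendingDelGen) {
    // Refresh if deletes changed, or if the reader was opened on an older delete generation.
    return pendingNeedsRefresh || readerDelGen != pendingDelGen;
  }

  public static void main(String[] args) {
    // Scenario from step 15: needsRefresh is false, reader is on delGen 1, pending is on delGen 2.
    System.out.println(needsRefreshForMerge(false, 1, 2));
  }
}
```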



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

