guojialiang92 commented on code in PR #13017: URL: https://github.com/apache/lucene/pull/13017#discussion_r1505452744
##########
lucene/core/src/java/org/apache/lucene/index/ReadersAndUpdates.java:
##########
@@ -782,13 +784,17 @@ synchronized MergePolicy.MergeReader getReaderForMerge(IOContext context) throws
     }

     SegmentReader reader = getReader(context);
-    if (pendingDeletes.needsRefresh(reader)) {
+    if (pendingDeletes.needsRefresh(reader)
+        || reader.getSegmentInfo().getDelGen() != pendingDeletes.info.getDelGen()) {

Review Comment:
I found that when both `IndexWriter#updateDocument` and `IndexWriter#softUpdateDocument` are used, the new refCount control logic may try to delete a `.liv` file that no longer exists. I have fixed this problem and added a test case. Please review, thank you! @stefanvodita

### Test
To reproduce and fix the problem, I introduced `testForceMergeWithPendingHardAndSoftDeleteFile`.
1. `IndexWriter#addDocument` doc1 = {"id": "1", "version": "1"}
2. `IndexWriter#commit`, which will produce segment0 = {_0.cfe, _0.cfs, _0.si}
3. `IndexWriter#addDocument` doc2 = {"id": "2", "version": "1"}
4. `IndexWriter#addDocument` doc3 = {"id": "3", "version": "1"}
5. `IndexWriter#addDocument` doc4 = {"id": "4", "version": "1"}
6. `IndexWriter#addDocument` doc5 = {"id": "5", "version": "1"}
7. `IndexWriter#commit`, which will produce segment1 = {_1.cfe, _1.cfs, _1.si}
8. `IndexWriter#updateDocument` updates doc2 to {"id": "2", "version": "2"}
9. `IndexWriter#commit`, which will produce segment2 = {_2.cfe, _2.cfs, _2.si}, and update segment1 to {_1.cfe, _1.cfs, _1.si, _1_1.liv}
10. `IndexWriter#updateDocument` updates doc3 to {"id": "3", "version": "2"}
11. `IndexWriter#softUpdateDocument` updates doc4 to {"id": "4", "version": "2"}
12. `DirectoryReader reader = IndexWriter#getReader(true, false)`, which will produce segment3 = {_3.cfe, _3.cfs, _3.si}, and update segment1 to {_1.cfe, _1.cfs, _1.si, _1_1.liv, _1_1.fnm, _1_1_Lucene80_0.dvd, _1_1_Lucene80_0.dvm}
13. `reader.close()`
14. `IndexWriter#commit`, which will update segment1 to {_1.cfe, _1.cfs, _1.si, _1_2.liv, _1_1.fnm, _1_1_Lucene80_0.dvd, _1_1_Lucene80_0.dvm}
15. `IndexWriter#forceMerge(1)`

### Analysis
**In step 12**
Because the `writeAllDeletes` parameter of `IndexWriter#getReader` is `false`, `IndexWriter#writeReaderPool`, `ReaderPool#writeAllDocValuesUpdates`, and `ReadersAndUpdates#writeFieldUpdates` are executed in sequence.
In `ReadersAndUpdates#writeFieldUpdates`, since `ReadersAndUpdates#pendingDVUpdates` is not empty, `ReadersAndUpdates#swapNewReaderWithLatestLiveDocs` and `ReadersAndUpdates#createNewReaderWithLatestLiveDocs` are executed in sequence.
`ReadersAndUpdates#createNewReaderWithLatestLiveDocs` generates a new `SegmentReader` based on `pendingDeletes`. The new `SegmentReader` holds the latest `liveDocs` and `numDocs`, but the value of `SegmentReader.si.delGen` is still 1.

**In step 14**
_1_2.liv is created, _1_1.liv is deleted, and `pendingDeletes.info.delGen` is updated to 2.

**In step 15**
`ReadersAndUpdates#getReaderForMerge` decides whether the `SegmentReader` needs to be refreshed based on `pendingDeletes`. If `delGen` is not compared, the `SegmentReader` is not refreshed, so the refCount of _1_1.liv is still incremented and decremented during the merge. When that refCount reaches 0, an attempt is made to delete _1_1.liv, which was already removed in step 14, and `NoSuchFileException` is thrown.

### Solution
To solve this problem, `ReadersAndUpdates#getReaderForMerge` also needs to compare `delGen`. The `SegmentReader` is refreshed when `pendingDeletes.needsRefresh(reader) || reader.getSegmentInfo().getDelGen() != pendingDeletes.info.getDelGen()` evaluates to `true`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
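
The stale-`delGen` scenario in steps 12, 14 and 15 can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: `DelGenRefreshSketch` and all of its fields and methods are hypothetical and are not Lucene classes or APIs. The "directory" is a set of file names, reference counting is a plain map, and the stand-in for `pendingDeletes.needsRefresh(reader)` is hard-coded to `false` because the reader already holds the latest live docs after step 12.

```java
import java.nio.file.NoSuchFileException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: nothing here is Lucene code. All names are hypothetical.
class DelGenRefreshSketch {
  final Set<String> files = new HashSet<>();              // files present in the directory
  final Map<String, Integer> refCounts = new HashMap<>(); // per-file reference counts

  long infoDelGen;   // stands in for pendingDeletes.info.getDelGen()
  long readerDelGen; // stands in for reader.getSegmentInfo().getDelGen()

  static String livFile(long delGen) {
    return "_1_" + delGen + ".liv";
  }

  void incRef(String file) {
    refCounts.merge(file, 1, Integer::sum);
  }

  // Dropping the last reference deletes the file; a missing file is an error.
  void decRef(String file) throws NoSuchFileException {
    int count = refCounts.merge(file, -1, Integer::sum);
    if (count == 0) {
      refCounts.remove(file);
      if (!files.remove(file)) {
        throw new NoSuchFileException(file);
      }
    }
  }

  // Replays steps 12, 14 and 15 and returns a short description of the outcome.
  static String simulate(boolean compareDelGen) {
    DelGenRefreshSketch s = new DelGenRefreshSketch();
    s.files.add(livFile(1)); // state after step 9
    s.infoDelGen = 1;

    // Step 12: the new reader has the latest live docs, but its delGen is still 1.
    s.readerDelGen = s.infoDelGen;

    // Step 14: commit writes _1_2.liv, deletes _1_1.liv, and bumps delGen to 2.
    s.files.add(livFile(2));
    s.files.remove(livFile(1));
    s.infoDelGen = 2;

    // Step 15: decide whether the merge reader must be refreshed.
    boolean liveDocsChanged = false; // stand-in for pendingDeletes.needsRefresh(reader)
    if (liveDocsChanged || (compareDelGen && s.readerDelGen != s.infoDelGen)) {
      s.readerDelGen = s.infoDelGen; // refreshed reader now points at the current .liv
    }

    // The merge takes and then releases a reference on the reader's .liv file.
    String liv = livFile(s.readerDelGen);
    try {
      s.incRef(liv);
      s.decRef(liv);
      return "released " + liv;
    } catch (NoSuchFileException e) {
      return "NoSuchFileException: " + e.getFile();
    }
  }

  public static void main(String[] args) {
    System.out.println("with delGen check:    " + simulate(true));
    System.out.println("without delGen check: " + simulate(false));
  }
}
```

In this model the only difference between the two runs is whether the reader's `delGen` is compared against the segment's current `delGen`; the real fix lives in `ReadersAndUpdates#getReaderForMerge` as shown in the diff.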
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org