Hi All,

I am working on a patch that uses the MergePolicy and MergeScheduler to run the merges triggered by addIndexes(CodecReader...) concurrently (LUCENE-10216 <https://issues.apache.org/jira/browse/LUCENE-10216>, WIP PR <https://github.com/apache/lucene/pull/633>). I have some general questions about the API's current implementation.

At the start of the API, we trigger a flush(triggerMerge: false, applyAllDeletes: true) <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L3132>. I was wondering why we need this. My understanding is that the readers brought in by the addIndexes() API are unrelated to any pending updates or deletes.

I tried removing this call, and testExistingDeletes <https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/index/TestAddIndexes.java#L1022-L1052> failed. This leads me to believe that we flush and apply all deletes so that a pending delete-by-term does not affect incoming readers that coincidentally contain docs with the same term. Is this correct?

Also, since we may still receive such a delete before the API completes, and that delete would then be applied to the incoming docs as well, this is likely a best-effort guarantee, right?

On a related note, the regular merge for existing segments writes all pending doc-values (DV) updates before merging, but we skip this in the addIndexes API. Should we be doing this in both places?

Thanks,
Vigya
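
P.S. To make the scenario I am asking about concrete, here is a small self-contained toy model of my hypothesis. It is only a sketch of my understanding, not Lucene code: ToyWriter, flushAndApplyDeletes, commitAndCount and PendingDeleteDemo are names I made up for illustration, and the model assumes that a buffered delete-by-term is resolved against whichever segments exist at the moment it is applied. It also ignores the ordering between adds and deletes inside the RAM buffer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo driver; compares the two orderings discussed above.
final class PendingDeleteDemo {
    static int run(boolean flushFirst) {
        ToyWriter writer = new ToyWriter();
        writer.addDocument("myid");
        writer.addDocument("other");
        writer.deleteByTerm("myid");                    // buffered, not yet applied
        writer.addIndexes(flushFirst, "myid", "myid");  // incoming docs share the term
        return writer.commitAndCount();
    }

    public static void main(String[] args) {
        System.out.println("flush first: " + run(true));   // prints 3
        System.out.println("no flush:    " + run(false));  // prints 1
    }
}

// Toy model only: none of these methods mirror Lucene's real IndexWriter internals.
final class ToyWriter {
    // Each segment is just the list of "id" terms of its live docs.
    private final List<List<String>> segments = new ArrayList<>();
    private final List<String> ramBuffer = new ArrayList<>();
    private final Set<String> pendingDeleteTerms = new HashSet<>();

    void addDocument(String id) {
        ramBuffer.add(id);
    }

    void deleteByTerm(String id) {
        pendingDeleteTerms.add(id);
    }

    // Writes buffered docs as a new segment, then resolves every pending
    // delete against all segments that exist right now.
    void flushAndApplyDeletes() {
        if (!ramBuffer.isEmpty()) {
            segments.add(new ArrayList<>(ramBuffer));
            ramBuffer.clear();
        }
        for (List<String> segment : segments) {
            segment.removeIf(pendingDeleteTerms::contains);
        }
        pendingDeleteTerms.clear();
    }

    // Imports external docs as one new segment, optionally flushing first.
    void addIndexes(boolean flushFirst, String... incomingIds) {
        if (flushFirst) {
            flushAndApplyDeletes();
        }
        segments.add(new ArrayList<>(List.of(incomingIds)));
    }

    // Applies whatever is still pending and returns the live doc count.
    int commitAndCount() {
        flushAndApplyDeletes();
        int count = 0;
        for (List<String> segment : segments) {
            count += segment.size();
        }
        return count;
    }
}
```

With the flush up front, the pending delete only removes the pre-existing "myid" doc, so both incoming docs survive. Without it, the same delete is resolved after the import and removes the incoming docs too. In this model a delete issued after the flush but before the commit would still hit the incoming docs, which is the best-effort behaviour I am asking about.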
