Copilot commented on code in PR #13506:
URL: https://github.com/apache/lucene/pull/13506#discussion_r3270753953
##########
lucene/core/src/java/org/apache/lucene/index/FreqProxTermsWriter.java:
##########
@@ -59,23 +59,29 @@ private void applyDeletes(SegmentWriteState state, Fields
fields) throws IOExcep
FrozenBufferedUpdates.TermDocsIterator iterator =
new FrozenBufferedUpdates.TermDocsIterator(fields, true);
+ // Since we have already applied delete-by-docID, liveDocs maybe not
null.
+ if (state.liveDocs == null) {
+ state.liveDocs = new FixedBitSet(state.segmentInfo.maxDoc());
+ state.liveDocs.set(0, state.segmentInfo.maxDoc());
+ }
Review Comment:
The new code eagerly allocates and fully initializes `state.liveDocs`
whenever `deleteTerms` is non-empty, even if none of the delete terms match any
postings (e.g., docIdUpto==0 or term not present). This can be significantly
more expensive than the previous per-doc null check for segments with many docs
but no effective deletions. Consider lazily allocating liveDocs only after
confirming at least one doc will be cleared (eg, by peeking the first postings
doc for a term and only then initializing liveDocs once per term/once overall).
##########
lucene/core/src/java/org/apache/lucene/index/FreqProxTermsWriter.java:
##########
@@ -59,23 +59,29 @@ private void applyDeletes(SegmentWriteState state, Fields
fields) throws IOExcep
FrozenBufferedUpdates.TermDocsIterator iterator =
new FrozenBufferedUpdates.TermDocsIterator(fields, true);
+ // Since we have already applied delete-by-docID, liveDocs maybe not
null.
+ if (state.liveDocs == null) {
+ state.liveDocs = new FixedBitSet(state.segmentInfo.maxDoc());
+ state.liveDocs.set(0, state.segmentInfo.maxDoc());
+ }
+
segDeletes.forEachOrdered(
(term, docId) -> {
DocIdSetIterator postings = iterator.nextTerm(term.field(),
term.bytes());
if (postings != null) {
assert docId < PostingsEnum.NO_MORE_DOCS;
int doc;
while ((doc = postings.nextDoc()) < docId) {
- if (state.liveDocs == null) {
- state.liveDocs = new FixedBitSet(state.segmentInfo.maxDoc());
- state.liveDocs.set(0, state.segmentInfo.maxDoc());
- }
if (state.liveDocs.getAndClear(doc)) {
state.delCountOnFlush++;
}
}
}
});
+ // Set to null, if there is no delete.
Review Comment:
The comment “Set to null, if there is no delete.” is
misleading/grammatically incorrect (it’s checking `delCountOnFlush`, i.e., no
hard deletes applied during flush). Consider rewording to “Set to null if no
hard deletes were applied during flush.”
##########
lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java:
##########
@@ -208,6 +209,56 @@ static void addDoc(IndexWriter writer) throws IOException {
writer.addDocument(doc);
}
+ public void testLiveDocs() throws IOException {
+ Directory dir = newDirectory();
+ IndexWriter writer =
+ new IndexWriter(
+ dir,
+ new IndexWriterConfig(new MockAnalyzer(random()))
+ .setMergePolicy(NoMergePolicy.INSTANCE));
+
+ Document doc;
+ doc = new Document();
+ doc.add(new KeywordField("_id", "1", Field.Store.NO));
+ doc.add(new KeywordField("version", "1", Field.Store.NO));
+ writer.addDocument(doc);
+
+ doc = new Document();
+ doc.add(new KeywordField("_id", "1", Field.Store.NO));
+ doc.add(new KeywordField("version", "2", Field.Store.NO));
+ writer.updateDocument(new Term("_id", "1"), doc);
+
+ doc = new Document();
+ doc.add(new KeywordField("foo", "1", Field.Store.NO));
+ doc.add(new KeywordField("version", "1", Field.Store.NO));
+ writer.addDocument(doc);
+
+ doc = new Document();
+ doc.add(new KeywordField("foo", "1", Field.Store.NO));
+ doc.add(new KeywordField("version", "2", Field.Store.NO));
+ writer.updateDocument(new Term("foo", "1"), doc);
+ writer.flush(); // delCountOnFlush > 0, liveDocs != null.
+
+ doc = new Document();
+ doc.add(new KeywordField("_id", "2", Field.Store.NO));
+ doc.add(new KeywordField("version", "1", Field.Store.NO));
+ writer.updateDocument(new Term("_id", "3"), doc);
+
+ doc = new Document();
+ doc.add(new KeywordField("foo", "2", Field.Store.NO));
+ doc.add(new KeywordField("version", "1", Field.Store.NO));
+ writer.updateDocument(new Term("foo", "3"), doc);
+ writer.flush(); // delCountOnFlush is 0, liveDocs is null.
+
+ DirectoryReader reader = writer.getReader(true, false);
+
+ assertEquals(2, ((StandardDirectoryReader)
reader).getSegmentInfos().info(0).getDelCount());
+ assertEquals(0, ((StandardDirectoryReader)
reader).getSegmentInfos().info(1).getDelCount());
Review Comment:
This test assumes there are exactly 2 segments and directly indexes
`info(0)` / `info(1)`, which will throw an `IndexOutOfBoundsException` if the
flush behavior changes (making failures harder to diagnose). Consider asserting
the segment count first (or iterating over `getSegmentInfos()` and asserting
the expected delCounts) to produce clearer failures.
##########
lucene/core/src/java/org/apache/lucene/index/FreqProxTermsWriter.java:
##########
@@ -59,23 +59,29 @@ private void applyDeletes(SegmentWriteState state, Fields
fields) throws IOExcep
FrozenBufferedUpdates.TermDocsIterator iterator =
new FrozenBufferedUpdates.TermDocsIterator(fields, true);
+ // Since we have already applied delete-by-docID, liveDocs maybe not
null.
Review Comment:
Minor grammar: “liveDocs maybe not null” should be “liveDocs may not be
null” (or similar).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]