[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marvin Humphrey updated LUCENE-1476:
------------------------------------

    Attachment: quasi_iterator_deletions_r2.diff

Jason,

> Incorporated Marvin's patch into SegmentTermDocs and BitVector.

I believe there's an inefficiency in my original patch. Code like this occurs in three places:

{code}
+      if (doc >= nextDeletion) {
+        if (doc > nextDeletion)
+          nextDeletion = deletedDocs.nextSetBit(doc);
+        if (doc == nextDeletion)
+          continue;
       }
{code}

While technically correct, this does extra work. When nextSetBit() can't find any more set bits, it returns -1 (just like java.util.BitSet.nextSetBit() does). Once nextDeletion is -1, the test doc >= nextDeletion is true for every doc, so the deletion check runs, and nextSetBit() is called again, on each remaining iteration.

The solution is to test for -1, as in the new version of the patch (also tested but not benchmarked):

{code}
+      if (doc >= nextDeletion) {
+        if (doc > nextDeletion) {
+          nextDeletion = deletedDocs.nextSetBit(doc);
+          if (nextDeletion == -1) {
+            nextDeletion = Integer.MAX_VALUE;
+          }
+        }
+        if (doc == nextDeletion) {
+          continue;
+        }
       }
{code}

Theoretically, we could also change the behavior of nextSetBit() so that it returns Integer.MAX_VALUE when it can't find any more set bits. However, that's a little misleading (since it's a positive number and could thus represent a true set bit), and it would also break BitVector.nextSetBit()'s intended mimicry of java.util.BitSet.nextSetBit().

> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch, LUCENE-1476.patch,
>                      quasi_iterator_deletions.diff, quasi_iterator_deletions_r2.diff
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet. Expose deleted docs DocIdSet from
> IndexReader.
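
For illustration only, here is a standalone sketch of the -1 sentinel fix discussed in this comment. It is not the patch itself: it uses java.util.BitSet in place of BitVector, and the class name DeletionSkipSketch, the liveDocs() method, the lookups counter, and the initial nextDeletion value of -1 are all assumptions made so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class DeletionSkipSketch {
    // Hypothetical counter: number of nextSetBit() calls made by the last liveDocs() run.
    static int lookups = 0;

    /**
     * Returns the doc ids in [0, maxDoc) that are not set in deletedDocs.
     * With useSentinel == false the loop mirrors the first snippet;
     * with useSentinel == true it maps -1 to Integer.MAX_VALUE as in the second.
     */
    static List<Integer> liveDocs(BitSet deletedDocs, int maxDoc, boolean useSentinel) {
        lookups = 0;
        List<Integer> live = new ArrayList<>();
        int nextDeletion = -1; // assumption: start at -1 so the first doc triggers a lookup
        for (int doc = 0; doc < maxDoc; doc++) {
            if (doc >= nextDeletion) {
                if (doc > nextDeletion) {
                    nextDeletion = deletedDocs.nextSetBit(doc);
                    lookups++;
                    if (useSentinel && nextDeletion == -1) {
                        // No deletions remain: park the cursor past any possible doc id.
                        nextDeletion = Integer.MAX_VALUE;
                    }
                }
                if (doc == nextDeletion) {
                    continue; // doc is deleted, skip it
                }
            }
            live.add(doc);
        }
        return live;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(2);
        deleted.set(5);
        System.out.println(liveDocs(deleted, 10, false) + " lookups=" + lookups);
        System.out.println(liveDocs(deleted, 10, true) + " lookups=" + lookups);
    }
}
```

Both variants return the same live docs. The difference is only in how often nextSetBit() is consulted after the last deletion has been passed: without the sentinel it is called for every remaining doc, with it the outer comparison fails and the block is skipped.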