[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marvin Humphrey updated LUCENE-1476:
------------------------------------
Attachment: quasi_iterator_deletions_r2.diff
Jason,
> Incorporated Marvin's patch into SegmentTermDocs and BitVector.
I believe there's an inefficiency in my original patch. Code like this occurs
in three places:
{code}
+  if (doc >= nextDeletion) {
+    if (doc > nextDeletion)
+      nextDeletion = deletedDocs.nextSetBit(doc);
+    if (doc == nextDeletion)
+      continue;
+  }
{code}
While technically correct, this does extra work. When nextSetBit() can't
find any more set bits, it returns -1 (just as java.util.BitSet.nextSetBit()
does). nextDeletion then stays at -1, so both "doc >= nextDeletion" and
"doc > nextDeletion" are true for every remaining doc, and nextSetBit() is
called again on each iteration.
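To make the cost concrete, here is a simplified, self-contained sketch, not
the actual SegmentTermDocs code. Assumptions: it walks doc ids 0..maxDoc-1
instead of reading a postings list, it uses java.util.BitSet as a stand-in
for BitVector (both return -1 from nextSetBit() when no set bits remain),
and it initializes nextDeletion with a lookup from doc 0. The counting
wrapper nextSetBit() and the class name are hypothetical.

```java
import java.util.BitSet;

public class OriginalSkipLoop {
    // Number of times the lookup below has been invoked.
    static int nextSetBitCalls = 0;

    // Hypothetical wrapper around BitSet.nextSetBit() that counts lookups.
    static int nextSetBit(BitSet bits, int from) {
        nextSetBitCalls++;
        return bits.nextSetBit(from);
    }

    // Counts live docs in [0, maxDoc) using the original skip logic.
    static int countLiveDocs(BitSet deletedDocs, int maxDoc) {
        int live = 0;
        // Assumption: start from the first deletion; -1 if nothing is deleted.
        int nextDeletion = nextSetBit(deletedDocs, 0);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (doc >= nextDeletion) {
                // Once nextDeletion is -1, this branch runs for every doc.
                if (doc > nextDeletion)
                    nextDeletion = nextSetBit(deletedDocs, doc);
                if (doc == nextDeletion)
                    continue; // doc is deleted
            }
            live++;
        }
        return live;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(3); // only doc 3 is deleted
        int live = countLiveDocs(deleted, 1000);
        System.out.println("live=" + live + " nextSetBit calls=" + nextSetBitCalls);
    }
}
```

With only doc 3 deleted out of 1000, it prints "live=999 nextSetBit
calls=997": after the last deletion, every remaining doc triggers another
lookup that returns -1.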
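The solution is to test for -1 and substitute Integer.MAX_VALUE, so that
"doc >= nextDeletion" is false for every remaining doc. That's what the new
version of the patch does (also tested, but not benchmarked):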
{code}
+  if (doc >= nextDeletion) {
+    if (doc > nextDeletion) {
+      nextDeletion = deletedDocs.nextSetBit(doc);
+      if (nextDeletion == -1) {
+        nextDeletion = Integer.MAX_VALUE;
+      }
+    }
+    if (doc == nextDeletion) {
+      continue;
+    }
+  }
{code}
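Here is a simplified, self-contained harness for the -1 check, not the
actual SegmentTermDocs code. Assumptions: it walks doc ids 0..maxDoc-1
instead of reading a postings list, it uses java.util.BitSet as a stand-in
for BitVector (same -1 contract for nextSetBit()), and it initializes
nextDeletion with a lookup from doc 0, mapping -1 to the sentinel there as
well. The counting wrapper and class name are hypothetical.

```java
import java.util.BitSet;

public class SentinelSkipLoop {
    // Number of times the lookup below has been invoked.
    static int nextSetBitCalls = 0;

    // Hypothetical wrapper around BitSet.nextSetBit() that counts lookups.
    static int nextSetBit(BitSet bits, int from) {
        nextSetBitCalls++;
        return bits.nextSetBit(from);
    }

    // Maps "no more set bits" (-1) to a sentinel larger than any doc id.
    static int orSentinel(int bit) {
        return bit == -1 ? Integer.MAX_VALUE : bit;
    }

    // Counts live docs in [0, maxDoc) using the patched skip logic.
    static int countLiveDocs(BitSet deletedDocs, int maxDoc) {
        int live = 0;
        // Assumption: start from the first deletion, or the sentinel if none.
        int nextDeletion = orSentinel(nextSetBit(deletedDocs, 0));
        for (int doc = 0; doc < maxDoc; doc++) {
            // After the last deletion this test is false, so no more lookups.
            if (doc >= nextDeletion) {
                if (doc > nextDeletion) {
                    nextDeletion = orSentinel(nextSetBit(deletedDocs, doc));
                }
                if (doc == nextDeletion) {
                    continue; // doc is deleted
                }
            }
            live++;
        }
        return live;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(3); // only doc 3 is deleted
        int live = countLiveDocs(deleted, 1000);
        System.out.println("live=" + live + " nextSetBit calls=" + nextSetBitCalls);
    }
}
```

For 1000 docs with only doc 3 deleted, this prints "live=999 nextSetBit
calls=2": one lookup to find doc 3 and one that discovers there are no
further deletions.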
Theoretically, we could instead change nextSetBit() to return
Integer.MAX_VALUE when it can't find any more set bits. However, that's a
little misleading (it's a positive number and could thus look like a real
set bit), and it would also break BitVector.nextSetBit()'s intended mimicry
of the java.util.BitSet.nextSetBit() API.
> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> -----------------------------------------------------------------------
>
> Key: LUCENE-1476
> URL: https://issues.apache.org/jira/browse/LUCENE-1476
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 2.4
> Reporter: Jason Rutherglen
> Priority: Trivial
> Attachments: LUCENE-1476.patch, LUCENE-1476.patch,
> quasi_iterator_deletions.diff, quasi_iterator_deletions_r2.diff
>
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet. Expose deleted docs DocIdSet from
> IndexReader.
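As a rough illustration of the issue's goal (a bit vector of deleted docs
exposed as an iterable set of doc ids), here is a self-contained sketch.
Everything in it is hypothetical: SimpleDocIdSet, SimpleDocIdSetIterator and
SimpleBitVector are stand-ins, not Lucene's actual DocIdSet,
DocIdSetIterator or BitVector, whose signatures differ. The byte-per-8-bits
layout and the -1 end-of-set convention are assumptions chosen to match the
nextSetBit() discussion above.

```java
import java.util.ArrayList;
import java.util.List;

public class DeletedDocsSketch {
    // Hypothetical iterator: returns the next doc id, or -1 when exhausted.
    interface SimpleDocIdSetIterator {
        int nextDoc();
    }

    // Hypothetical set of doc ids that can hand out an iterator.
    interface SimpleDocIdSet {
        SimpleDocIdSetIterator iterator();
    }

    // Byte-backed bit vector (8 bits per byte) exposing its set bits as doc ids.
    static final class SimpleBitVector implements SimpleDocIdSet {
        private final byte[] bits;
        private final int size;

        SimpleBitVector(int size) {
            this.size = size;
            this.bits = new byte[(size >> 3) + 1];
        }

        void set(int bit) {
            bits[bit >> 3] |= (byte) (1 << (bit & 7));
        }

        boolean get(int bit) {
            return (bits[bit >> 3] & (1 << (bit & 7))) != 0;
        }

        // Like java.util.BitSet.nextSetBit(): first set bit >= from, or -1.
        int nextSetBit(int from) {
            for (int i = from; i < size; i++) {
                if (get(i)) {
                    return i;
                }
            }
            return -1;
        }

        @Override
        public SimpleDocIdSetIterator iterator() {
            return new SimpleDocIdSetIterator() {
                private int next = 0; // position to resume scanning from

                @Override
                public int nextDoc() {
                    int doc = nextSetBit(next);
                    // Once exhausted, park at size so later calls also return -1.
                    next = (doc == -1) ? size : doc + 1;
                    return doc;
                }
            };
        }
    }

    public static void main(String[] args) {
        SimpleBitVector deleted = new SimpleBitVector(10);
        deleted.set(2);
        deleted.set(7);
        List<Integer> docs = new ArrayList<>();
        SimpleDocIdSetIterator it = deleted.iterator();
        for (int doc = it.nextDoc(); doc != -1; doc = it.nextDoc()) {
            docs.add(doc);
        }
        System.out.println(docs);
    }
}
```

Running main prints [2, 7], the deleted doc ids in increasing order.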
--
This message is automatically generated by JIRA.