[ 
https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marvin Humphrey updated LUCENE-1476:
------------------------------------

    Attachment: quasi_iterator_deletions_r2.diff

Jason,

> Incorporated Marvin's patch into SegmentTermDocs and BitVector.

I believe there's an inefficiency in my original patch.  Code like this occurs 
in three places:

{code}
+      if (doc >= nextDeletion) {
+        if (doc > nextDeletion) 
+          nextDeletion = deletedDocs.nextSetBit(doc);
+        if (doc == nextDeletion)
+          continue;
       }
{code}

While technically correct, this does extra work.  When nextSetBit() can't
find any more set bits, it returns -1 (just like java.util.BitSet.nextSetBit()
does).  From then on nextDeletion stays at -1, so both comparisons succeed for
every remaining doc and nextSetBit() is called again on each iteration.

The solution is to test for -1, as in the new version of the patch (also
tested but not benchmarked):

{code}
+      if (doc >= nextDeletion) {
+        if (doc > nextDeletion) {
+          nextDeletion = deletedDocs.nextSetBit(doc);
+          if (nextDeletion == -1) {
+            nextDeletion = Integer.MAX_VALUE;
+          }
+        }
+        if (doc == nextDeletion) {
+          continue;
+        }
       }
{code}
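
To make the difference concrete, here is a rough standalone sketch of the two
variants.  It is not code from the patch: java.util.BitSet stands in for
BitVector, a plain for loop stands in for the SegmentTermDocs loops, and the
names DeletionSkipSketch, liveDocs, lookups and useSentinel are made up for
illustration.  Starting nextDeletion at -1 is also an assumption.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only; not part of the attached patch.
final class DeletionSkipSketch {
    /** Number of nextSetBit() calls made by the most recent liveDocs() run. */
    static int lookups;

    /**
     * Returns the doc ids in [0, maxDoc) that are not marked deleted.
     * With useSentinel == false this mirrors the first snippet; with
     * useSentinel == true it mirrors the revised one.
     */
    static List<Integer> liveDocs(BitSet deletedDocs, int maxDoc, boolean useSentinel) {
        lookups = 0;
        List<Integer> live = new ArrayList<>();
        int nextDeletion = -1; // assumed starting value: forces a lookup on the first doc
        for (int doc = 0; doc < maxDoc; doc++) {
            if (doc >= nextDeletion) {
                if (doc > nextDeletion) {
                    nextDeletion = deletedDocs.nextSetBit(doc);
                    lookups++;
                    if (useSentinel && nextDeletion == -1) {
                        // No deletions remain: park the cursor past every possible doc id.
                        nextDeletion = Integer.MAX_VALUE;
                    }
                }
                if (doc == nextDeletion) {
                    continue; // doc is deleted, skip it
                }
            }
            live.add(doc);
        }
        return live;
    }
}
```

With docs 2 and 5 deleted out of 10, both variants return the same live docs,
but the first makes 6 calls to nextSetBit() and the second makes 3, because
docs 7 through 9 no longer trigger a lookup.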

Theoretically, we could also change the behavior of nextSetBit() so that it
returns Integer.MAX_VALUE when it can't find any more set bits.  However,
that's a little misleading (since it's a positive number and could thus
represent a true set bit), and it would also break BitVector.nextSetBit()'s
intended mimicry of java.util.BitSet.nextSetBit().
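
As a small illustration of why the -1 contract is worth keeping, this is the
usual iteration idiom for java.util.BitSet, which stops as soon as
nextSetBit() goes negative.  The class and method names (SetBitIteration,
setBits) are made up for the example, and it uses java.util.BitSet rather
than BitVector.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only; uses java.util.BitSet, not BitVector.
final class SetBitIteration {
    /** Collects the indices of all set bits in ascending order. */
    static List<Integer> setBits(BitSet bits) {
        List<Integer> out = new ArrayList<>();
        // The loop ends when nextSetBit() returns -1, i.e. no set bit remains.
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            out.add(i);
        }
        return out;
    }
}
```

A caller written this way against a nextSetBit() that returned
Integer.MAX_VALUE would never see the negative value it is waiting for, so
the sentinel is better kept local to the calling code.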

> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch, LUCENE-1476.patch, 
> quasi_iterator_deletions.diff, quasi_iterator_deletions_r2.diff
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet.  Expose deleted docs DocIdSet from 
> IndexReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
