[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

Michael McCandless (JIRA) Thu, 29 Jan 2009 16:30:25 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668675#action_12668675 ]


Michael McCandless commented on LUCENE-1476:
--------------------------------------------


{quote}
The main problem we're trying to solve is potential allocation of a
large del docs BV byte array for the copy on write of a cloned
reader.
{quote}

Right, as long as normal search performance does not get worse.
Actually, I was hoping "deletes as iterator" and "deletes higher up as
filter" might give us some gains in search performance.

{quote}
An option we haven't looked at is a MultiByteArray where
multiple byte arrays make up a virtual byte array checked by BV.get.
On deleteDocument, only the byte array chunks that are changed are
replaced in the new version, while the previously copied chunks are
kept. The overhead of the BV.get can be minimal, though in our tests
with an int array version the performance can either be equal to or
double based on factors we are not aware of. 
{quote}

I think that'd be a good approach (it amortizes the copy-on-write
cost), though the straightforward impl needs a double dereference
per lookup, so I think it'll hurt normal search perf too.
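
To make the trade-off concrete, here is a minimal sketch of the chunked
copy-on-write idea from the quote. It is not code from any attached patch:
the class name ChunkedBits, the method withBitSet, and the 1024-byte chunk
size are all assumptions made for illustration. Setting a bit copies only
the chunk that contains it, and get() shows the two array dereferences
per lookup mentioned above.

```java
/** Hypothetical sketch: a bit set stored as fixed-size byte chunks, copied per chunk on write. */
final class ChunkedBits {
    private static final int CHUNK_SHIFT = 10;                // assumed: 1024 bytes per chunk
    private static final int CHUNK_BYTES = 1 << CHUNK_SHIFT;
    private static final int CHUNK_MASK = CHUNK_BYTES - 1;

    private final byte[][] chunks;  // individual chunks may be shared between versions
    private final int size;         // number of bits

    ChunkedBits(int size) {
        this.size = size;
        int numBytes = (size + 7) >>> 3;
        int numChunks = (numBytes + CHUNK_BYTES - 1) >>> CHUNK_SHIFT;
        this.chunks = new byte[numChunks][CHUNK_BYTES];
    }

    private ChunkedBits(byte[][] chunks, int size) {
        this.chunks = chunks;
        this.size = size;
    }

    /** Two dereferences per lookup: first the chunk table, then the chunk itself. */
    boolean get(int bit) {
        int byteIndex = bit >>> 3;
        byte b = chunks[byteIndex >>> CHUNK_SHIFT][byteIndex & CHUNK_MASK];
        return (b & (1 << (bit & 7))) != 0;
    }

    /** Returns a new version with the bit set; this version is left unchanged. */
    ChunkedBits withBitSet(int bit) {
        int byteIndex = bit >>> 3;
        int chunk = byteIndex >>> CHUNK_SHIFT;
        byte[][] newChunks = chunks.clone();        // shallow copy: chunk references only
        newChunks[chunk] = chunks[chunk].clone();   // deep copy of the one touched chunk
        newChunks[chunk][byteIndex & CHUNK_MASK] |= (byte) (1 << (bit & 7));
        return new ChunkedBits(newChunks, size);
    }

    /** Test helper: true if both versions still point at the same chunk array. */
    boolean sharesChunkWith(ChunkedBits other, int chunk) {
        return chunks[chunk] == other.chunks[chunk];
    }
}
```

With 100000 bits this allocates 13 chunks; a delete then copies the
13-entry chunk table plus one 1 KB chunk instead of the full 12.5 KB
array. The chunk size is a tuning knob: smaller chunks make each copy
cheaper but enlarge the chunk table.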

And I don't think we should give up on iterator access just yet...
perhaps we should try list-of-sorted-ints?
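
As a rough illustration of what list-of-sorted-ints could look like, the
sketch below keeps deleted doc ids in an ascending int[] and checks them
with a cursor that only moves forward. This is one reading of the idea,
not an existing Lucene API; SortedIntDeletes and isDeleted are
hypothetical names, and the sketch assumes callers ask about doc ids in
non-decreasing order, as a scorer walking a segment would.

```java
/** Hypothetical sketch: deleted docs as an ascending int[] consumed by a forward-only cursor. */
final class SortedIntDeletes {
    private final int[] deletedDocs;  // assumed sorted ascending, no duplicates
    private int pos = 0;              // index of the next deleted doc not yet passed

    SortedIntDeletes(int[] sortedDeletedDocs) {
        this.deletedDocs = sortedDeletedDocs;
    }

    /**
     * Returns true if doc is deleted. Must be called with non-decreasing
     * doc ids, because the cursor never moves backwards.
     */
    boolean isDeleted(int doc) {
        // Skip deleted ids that are smaller than the doc being checked.
        while (pos < deletedDocs.length && deletedDocs[pos] < doc) {
            pos++;
        }
        return pos < deletedDocs.length && deletedDocs[pos] == doc;
    }
}
```

Across one pass over a segment the cursor advances at most once per
deleted doc, so the total cost is linear in docs visited plus deletes,
with no random access into a bit array. Whether that actually beats
BV.get is the kind of thing the attached benchmarks would have to show.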



> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch, 
> LUCENE-1476.patch, LUCENE-1476.patch, quasi_iterator_deletions.diff, 
> quasi_iterator_deletions_r2.diff, quasi_iterator_deletions_r3.diff, 
> searchdeletes.alg, sortBench2.py, sortCollate2.py, TestDeletesDocIdSet.java
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet.  Expose deleted docs DocIdSet from 
> IndexReader.
