[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

Jason Rutherglen (JIRA) Thu, 29 Jan 2009 14:10:21 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668630#action_12668630
 ]


Jason Rutherglen commented on LUCENE-1476:
------------------------------------------

bq: shut down all extraneous processes

It's a desktop machine, though, so it's going to have some stuff
running in the background, most of which I'm not aware of, being a Mac
newbie.

bq: Actually I meant a simple sorted list of ints, but even for that
I'm worried about the skipTo cost (if we use a normal binary search)

Skipping is slower because it unnecessarily checks bits that are not
useful to the query. A higher-level deletions Filter, implemented
perhaps in IndexSearcher, requires that deleted docs pass through
the SegmentTermDocs doc[] cache, which could add unnecessary overhead
from the vint decoding.
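
For reference, the sorted-list-of-ints idea quoted above, with a plain
binary-search skipTo, might look roughly like the sketch below. The
class name SortedIntDeletes and the -1 "exhausted" return value are
made up for illustration; this is not code from any of the attached
patches.

```java
import java.util.Arrays;

/** Hypothetical sketch: deleted doc ids kept as a sorted int[] (ascending, no duplicates). */
class SortedIntDeletes {
    private final int[] docs;
    private int pos = 0; // index of the current candidate; never moves backwards

    SortedIntDeletes(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    /**
     * Advances to the first deleted doc id >= target and returns it,
     * or -1 if there is none. Each call is a binary search over the
     * remaining tail of the array, which is the skipTo cost in question.
     */
    int skipTo(int target) {
        int idx = Arrays.binarySearch(docs, pos, docs.length, target);
        if (idx < 0) {
            idx = -idx - 1; // not found: convert to the insertion point
        }
        pos = idx;
        return pos < docs.length ? docs[pos] : -1;
    }
}
```

Every skipTo costs O(log n) over the remaining ids, whereas a bit
vector lookup is a single array access, so the trade-off depends on
how sparse the deletions are.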

The main problem we're trying to solve is the potential allocation of a
large del docs BV byte array for the copy-on-write of a cloned
reader. An option we haven't looked at is a MultiByteArray, where
multiple byte arrays make up a virtual byte array checked by BV.get.
On deleteDocument, only the byte array chunks that are changed are
replaced in the new version, while the previously copied chunks are
kept. The overhead of BV.get can be minimal, though in our tests
with an int array version the performance varied from equal to
double, depending on factors we have not identified.
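
A minimal sketch of the MultiByteArray idea, assuming fixed-size
chunks and a per-chunk ownership flag. The names ChunkedBitVector,
CHUNK_BYTES, cowClone and sharesChunkWith are hypothetical, and the
chunk size of 1024 bytes is arbitrary; nothing here is taken from the
existing BitVector.

```java
import java.util.Arrays;

/** Hypothetical sketch: a bit vector stored as byte[] chunks, cloned copy-on-write per chunk. */
class ChunkedBitVector {
    private static final int CHUNK_BYTES = 1024; // assumed chunk size

    private final byte[][] chunks;
    private final boolean[] owned; // owned[i]: chunk i is private and may be mutated in place

    ChunkedBitVector(int numBits) {
        int numBytes = (numBits + 7) >>> 3;
        int numChunks = (numBytes + CHUNK_BYTES - 1) / CHUNK_BYTES;
        chunks = new byte[numChunks][CHUNK_BYTES];
        owned = new boolean[numChunks];
        Arrays.fill(owned, true);
    }

    private ChunkedBitVector(byte[][] sharedChunks) {
        chunks = sharedChunks.clone();      // copies only the chunk references
        owned = new boolean[chunks.length]; // all false: every chunk is shared
    }

    /** Cheap clone: no chunk data is copied until one side writes to it. */
    ChunkedBitVector cowClone() {
        Arrays.fill(owned, false); // the original must now copy before writing too
        return new ChunkedBitVector(chunks);
    }

    boolean get(int bit) {
        int byteIndex = bit >>> 3;
        byte b = chunks[byteIndex / CHUNK_BYTES][byteIndex % CHUNK_BYTES];
        return (b & (1 << (bit & 7))) != 0;
    }

    /** Marks a doc as deleted, copying only the chunk that holds its bit. */
    void set(int bit) {
        int byteIndex = bit >>> 3;
        int c = byteIndex / CHUNK_BYTES;
        if (!owned[c]) {
            chunks[c] = chunks[c].clone();
            owned[c] = true;
        }
        chunks[c][byteIndex % CHUNK_BYTES] |= (byte) (1 << (bit & 7));
    }

    /** True if chunk c is the same underlying array in both vectors. */
    boolean sharesChunkWith(ChunkedBitVector other, int c) {
        return chunks[c] == other.chunks[c];
    }
}
```

With this layout a delete on a cloned reader allocates one chunk
rather than the whole array, at the price of an extra division and
array indirection in get.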

> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch, 
> LUCENE-1476.patch, LUCENE-1476.patch, quasi_iterator_deletions.diff, 
> quasi_iterator_deletions_r2.diff, searchdeletes.alg, sortBench2.py, 
> sortCollate2.py, TestDeletesDocIdSet.java
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet.  Expose deleted docs DocIdSet from 
> IndexReader.
