[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668583#action_12668583 ]

Michael McCandless commented on LUCENE-1476:
--------------------------------------------


Actually I made one mistake running your standalone test -- I had
allowed the "createIndex" to run more than once, and so I think I had
tested 30K docs with 1875 deletes (6.25%).

I just removed the index and recreated it, so I have 15K docs and 1875
deletes (12.5%).  On the Mac Pro I now see the patch at 4.0% slower
(4672 ms to 4859 ms), and on a Debian Linux box (kernel 2.6.22.1, Java
1.5.0_08-b03) I see it 0.8% slower (7298 ms to 7357 ms).
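The percentages above follow directly from the reported counts and timings. In the worked arithmetic below, t_base (run without the patch) and t_patch (run with it) are labels introduced only for this calculation, reading "4672 ms to 4859 ms" as before-to-after.

```latex
\begin{aligned}
\text{delete ratio} &= \tfrac{1875}{30000} = 6.25\% \text{ (first run)}, \qquad
  \tfrac{1875}{15000} = 12.5\% \text{ (rebuilt index)} \\
\text{slowdown} &= \frac{t_{\text{patch}} - t_{\text{base}}}{t_{\text{base}}}: \qquad
  \frac{4859 - 4672}{4672} \approx 4.0\% \text{ (Mac Pro)}, \qquad
  \frac{7357 - 7298}{7298} \approx 0.8\% \text{ (Linux)}
\end{aligned}
```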

bq. The Mac can be somewhat unreliable for performance results 

I've actually found it to be quite reliable.  What I love most about
it is that, as long as you shut down all extraneous processes, it gives
very repeatable results.  I haven't found the same to be true (or only
to a lesser degree) of the various Linux distributions and Windows.

bq. OpenBitSet didn't seem to make much of a difference

This is very hard to believe -- the nextSetBit impl in BitVector (in
the patch) is extremely inefficient.  OpenBitSet's impl ought to be
much faster.
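To make the nextSetBit point concrete, here is a minimal sketch contrasting a bit-at-a-time scan with a word-at-a-time scan over bits packed into a long[]. The class and method names (NextSetBitDemo, nextSetBitNaive, nextSetBitWords) are hypothetical; this is neither the patch's BitVector code nor OpenBitSet's source, only an illustration of why skipping whole zero words (one comparison per 64 docs) and locating the lowest set bit with Long.numberOfTrailingZeros should beat testing every bit when deletes are sparse.

```java
/**
 * Hypothetical illustration (not Lucene's BitVector or OpenBitSet code):
 * two ways to find the next set bit in a bit set packed into long[] words.
 */
public class NextSetBitDemo {

    // Bit i lives in words[i >> 6] at position (i & 63).
    static boolean get(long[] words, int i) {
        return (words[i >> 6] & (1L << (i & 63))) != 0;
    }

    // Naive scan: one test per bit, so a long run of zeros costs
    // one iteration per doc.
    static int nextSetBitNaive(long[] words, int from, int numBits) {
        for (int i = from; i < numBits; i++) {
            if (get(words, i)) {
                return i;
            }
        }
        return -1;
    }

    // Word-at-a-time scan (assumes from >= 0): masks off bits below 'from'
    // in the first word, skips all-zero words with one comparison each, and
    // finds the lowest set bit of a non-zero word with numberOfTrailingZeros.
    static int nextSetBitWords(long[] words, int from, int numBits) {
        if (from >= numBits) {
            return -1;
        }
        int w = from >> 6;
        long word = words[w] & (-1L << (from & 63));
        while (true) {
            if (word != 0) {
                int bit = (w << 6) + Long.numberOfTrailingZeros(word);
                return bit < numBits ? bit : -1;
            }
            if (++w >= words.length) {
                return -1;
            }
            word = words[w];
        }
    }

    public static void main(String[] args) {
        int numBits = 200;
        long[] words = new long[(numBits + 63) >> 6];
        for (int d : new int[] {3, 70, 71, 190}) {
            words[d >> 6] |= 1L << (d & 63);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = nextSetBitWords(words, 0, numBits); i != -1;
                i = nextSetBitWords(words, i + 1, numBits)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(i);
        }
        System.out.println(sb); // 3 70 71 190
    }
}
```

Both methods return the same answers; the difference is only in how many loop iterations a long gap between deleted docs costs.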

{quote}
The other option is something like P4Delta which stores the doc
ids in a compressed form solely for iterating.
{quote}

I think that will be too costly here (but is a good fit for
postings).
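For reference, the property that makes a compressed form like P4Delta a good fit for postings but questionable here is that it can only be decoded sequentially. The sketch below uses plain delta (gap) encoding with variable-byte ints, laid out the way Lucene writes a VInt. This is a simpler scheme swapped in for illustration, not P4Delta/PFOR itself (which packs gaps into fixed-width frames plus exceptions); the class name DeltaVIntDocIds is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: doc ids stored as delta-encoded variable-byte ints. */
public class DeltaVIntDocIds {

    // Encode strictly increasing, non-negative doc ids as gaps; each gap is
    // written with 7 payload bits per byte, high bit set when more bytes follow.
    static byte[] encode(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int doc : sortedDocs) {
            int gap = doc - prev;
            prev = doc;
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
        }
        return out.toByteArray();
    }

    // Decoding is strictly sequential: reaching the n-th doc means decoding
    // every gap before it, so there is no cheap random access or skipTo.
    static List<Integer> decode(byte[] bytes) {
        List<Integer> docs = new ArrayList<Integer>();
        int pos = 0;
        int doc = 0;
        while (pos < bytes.length) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            doc += gap;
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        int[] deleted = {3, 70, 71, 190, 14999};
        byte[] enc = encode(deleted);
        System.out.println(enc.length + " bytes: " + decode(enc)); // 6 bytes: [3, 70, 71, 190, 14999]
    }
}
```

The encoding is compact (five ids in six bytes here), but answering "is doc N deleted?" requires a scan from the start, which is the trade-off in question.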

bq. Is this what you mean by sparse representation?

Actually I meant a simple sorted list of ints, but even for that I'm
worried about the skipTo cost (if we use a normal binary search).  I'm
not sure it can be made fast enough (i.e., faster than the random access
we have today).
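A minimal sketch of the sorted-int-list idea, assuming deleted doc ids are held in a strictly increasing int[]. SortedIntDeletes, next, skipTo and NO_MORE_DOCS are hypothetical names chosen to echo the iterator style under discussion, not Lucene's actual DocIdSetIterator API. It shows where the cost concern comes from: both the random-access check and skipTo are O(log n) binary searches, whereas a bit vector answers "is doc N deleted?" with a single array load and mask.

```java
import java.util.Arrays;

/** Hypothetical sketch: deleted docs kept as a sorted int[] instead of bits. */
public class SortedIntDeletes {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // strictly increasing deleted doc ids
    private int pos = -1;     // current iterator position

    SortedIntDeletes(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    // Random-access check: O(log n) binary search over the whole array.
    boolean isDeleted(int doc) {
        return Arrays.binarySearch(docs, doc) >= 0;
    }

    // Advance to the next deleted doc, or NO_MORE_DOCS when exhausted.
    int next() {
        if (pos < docs.length) {
            pos++;
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    // Advance past the current position to the first deleted doc >= target,
    // binary searching only the remaining part of the array.
    int skipTo(int target) {
        int from = pos + 1;
        if (from >= docs.length) {
            pos = docs.length;
            return NO_MORE_DOCS;
        }
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        if (idx < 0) {
            idx = -idx - 1; // insertion point: first element greater than target
        }
        pos = idx;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    public static void main(String[] args) {
        SortedIntDeletes d = new SortedIntDeletes(new int[] {3, 70, 71, 190});
        System.out.println(d.isDeleted(70) + " " + d.isDeleted(72)); // true false
        System.out.println(d.next());                      // 3
        System.out.println(d.skipTo(71));                  // 71
        System.out.println(d.skipTo(100));                 // 190
        System.out.println(d.skipTo(200) == NO_MORE_DOCS); // true
    }
}
```

An exponential (galloping) search starting at the current position is a common way to make short skips cheaper, but each call still does more work than the bit vector's constant-time lookup.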


> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch, 
> LUCENE-1476.patch, LUCENE-1476.patch, quasi_iterator_deletions.diff, 
> quasi_iterator_deletions_r2.diff, searchdeletes.alg, sortBench2.py, 
> sortCollate2.py, TestDeletesDocIdSet.java
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet.  Expose deleted docs DocIdSet from 
> IndexReader.

-- 
This message is automatically generated by JIRA.

