[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668583#action_12668583 ]
Michael McCandless commented on LUCENE-1476:
--------------------------------------------

Actually I made one mistake running your standalone test -- I had allowed the "createIndex" to run more than once, and so I think I had tested 30K docs with 1875 deletes (6.25%).

I just removed the index and recreated it, so I have 15K docs and 1875 deletes (12.5%). On the Mac Pro I now see the patch at 4.0% slower (4672 ms to 4859 ms), and on a Debian Linux box (kernel 2.6.22.1, java 1.5.0_08-b03) I see it 0.8% slower (7298 ms to 7357 ms).

bq. The Mac can be somewhat unreliable for performance results

I've actually found it to be quite reliable. What I love most about it is, as long as you shut down all extraneous processes, it gives very repeatable results. I haven't found the same to be true (or, less so) of various Linux and Windows systems.

bq. OpenBitSet didn't seem to make much of a difference

This is very hard to believe -- the nextSetBit impl in BitVector (in the patch) is extremely inefficient. OpenBitSet's impl ought to be much faster.

{quote}
The other option is something like P4Delta which stores the doc ids in a compressed form solely for iterating.
{quote}

I think that will be too costly here (but is a good fit for postings).

bq. Is this what you mean by sparse representation?

Actually I meant a simple sorted list of ints, but even for that I'm worried about the skipTo cost (if we use a normal binary search). I'm not sure it can be made fast enough (i.e. faster than the random access we have today).
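
To make the trade-offs above concrete, here is a minimal sketch. It is not code from the patch or from Lucene; the class and method names are hypothetical and only illustrate the three access patterns being compared: a word-at-a-time nextSetBit over a long[] bit set (roughly the approach a faster implementation would be expected to take), a skipTo over a sorted int[] of deleted doc ids using a normal binary search, and the O(1) random-access check used today.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not code from the LUCENE-1476 patch. */
public class DeletedDocsSketch {

    /**
     * Returns the index of the first set bit at or after fromIndex
     * (assumed non-negative), or -1 if there is none. Scans one 64-bit
     * word at a time instead of testing each bit individually.
     */
    public static int nextSetBit(long[] words, int fromIndex) {
        int w = fromIndex >>> 6;
        if (w >= words.length) {
            return -1;
        }
        // Mask off the bits below fromIndex in the first word.
        long word = words[w] & (-1L << (fromIndex & 63));
        while (true) {
            if (word != 0) {
                return (w << 6) + Long.numberOfTrailingZeros(word);
            }
            if (++w == words.length) {
                return -1;
            }
            word = words[w];
        }
    }

    /**
     * Returns the position in sortedDocs of the first doc id >= target,
     * or sortedDocs.length if every id is smaller. Assumes sortedDocs is
     * sorted and has no duplicates. Costs O(log n) per call.
     */
    public static int skipTo(int[] sortedDocs, int target) {
        int pos = Arrays.binarySearch(sortedDocs, target);
        return pos >= 0 ? pos : -(pos + 1);
    }

    /** Random-access check against the bit set; costs O(1) per call. */
    public static boolean isDeleted(long[] words, int doc) {
        return (words[doc >>> 6] & (1L << (doc & 63))) != 0;
    }
}
```

The sketch only shows the shape of each operation. Whether the sorted-list skipTo can compete with the random-access check is exactly the open question, and it would need to be measured.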
> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch,
>                      LUCENE-1476.patch, LUCENE-1476.patch, quasi_iterator_deletions.diff,
>                      quasi_iterator_deletions_r2.diff, searchdeletes.alg, sortBench2.py,
>                      sortCollate2.py, TestDeletesDocIdSet.java
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet. Expose deleted docs DocIdSet from
> IndexReader.