[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668630#action_12668630 ]
Jason Rutherglen commented on LUCENE-1476:
------------------------------------------

bq: shut down all extraneous processes

It's a desktop machine, though, so it's going to have some stuff running in the background, most of which I'm not aware of, being a Mac newbie.

bq: Actually I meant a simple sorted list of ints, but even for that I'm worried about the skipTo cost (if we use a normal binary search)

Skipping is slower because it unnecessarily checks bits that are not useful to the query. A higher-level deletions Filter, implemented perhaps in IndexSearcher, requires that deleted docs pass through the SegmentTermDocs doc[] cache, which could add unnecessary overhead from the vint decoding.

The main problem we're trying to solve is the potential allocation of a large del docs BV byte array for the copy-on-write of a cloned reader. An option we haven't looked at is a MultiByteArray, where multiple byte arrays make up a virtual byte array checked by BV.get. On deleteDocument, only the byte array chunks that are changed are replaced in the new version, while the previously copied chunks are kept. The overhead of BV.get can be minimal, though in our tests with an int array version the performance can be either equal or double, depending on factors we are not aware of.
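
To make the MultiByteArray idea concrete, here is a minimal sketch of a chunked, copy-on-write bit set. It is not Lucene code and not what any attached patch does: the class name ChunkedBits, the 4096-byte chunk size, and the methods withBitSet and sharesChunkWith are all hypothetical names chosen for illustration. As a simplification, every write returns a new version; a real cloned reader would more likely copy a chunk once on first write and then mutate it in place.

```java
/**
 * Hypothetical sketch of a chunked, copy-on-write bit set.
 * Not a Lucene class; names and chunk size are assumptions.
 */
final class ChunkedBits {
    // Assumed chunk size: 4096 bytes, i.e. 32768 bits per chunk.
    private static final int CHUNK_SHIFT = 15;
    private static final int CHUNK_BITS = 1 << CHUNK_SHIFT;
    private static final int CHUNK_MASK = CHUNK_BITS - 1;

    private final byte[][] chunks; // the "virtual byte array"
    private final int size;        // number of addressable bits

    ChunkedBits(int size) {
        if (size < 0) throw new IllegalArgumentException("size < 0");
        this.size = size;
        int numChunks = (int) (((long) size + CHUNK_BITS - 1) >>> CHUNK_SHIFT);
        chunks = new byte[numChunks][];
        for (int i = 0; i < numChunks; i++) {
            chunks[i] = new byte[CHUNK_BITS >>> 3];
        }
    }

    private ChunkedBits(byte[][] chunks, int size) {
        this.chunks = chunks;
        this.size = size;
    }

    /** One extra array lookup compared with a flat byte[]. */
    boolean get(int bit) {
        if (bit < 0 || bit >= size) throw new IndexOutOfBoundsException("bit " + bit);
        int offset = bit & CHUNK_MASK;
        return (chunks[bit >>> CHUNK_SHIFT][offset >>> 3] & (1 << (offset & 7))) != 0;
    }

    /** Returns a new version with the bit set; this version is left unchanged. */
    ChunkedBits withBitSet(int bit) {
        if (get(bit)) return this;                 // nothing to change
        int c = bit >>> CHUNK_SHIFT;
        byte[][] newChunks = chunks.clone();       // copies chunk references only
        byte[] copy = chunks[c].clone();           // copies the one affected chunk
        int offset = bit & CHUNK_MASK;
        copy[offset >>> 3] |= (byte) (1 << (offset & 7));
        newChunks[c] = copy;
        return new ChunkedBits(newChunks, size);
    }

    /** True if both versions point at the same underlying chunk array. */
    boolean sharesChunkWith(ChunkedBits other, int chunkIndex) {
        return chunks[chunkIndex] == other.chunks[chunkIndex];
    }

    public static void main(String[] args) {
        ChunkedBits before = new ChunkedBits(100000);
        ChunkedBits after = before.withBitSet(70000); // "delete" doc 70000
        System.out.println(before.get(70000));
        System.out.println(after.get(70000));
        System.out.println(before.sharesChunkWith(after, 0));
        System.out.println(before.sharesChunkWith(after, 2));
    }
}
```

With these assumed sizes, marking one document deleted copies one 4 KB chunk plus the small array of chunk references, rather than the whole deleted-docs byte array. The price is the extra indirection in get, which is the overhead in question above.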
> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch, LUCENE-1476.patch, LUCENE-1476.patch,
> LUCENE-1476.patch, LUCENE-1476.patch, quasi_iterator_deletions.diff,
> quasi_iterator_deletions_r2.diff, searchdeletes.alg, sortBench2.py,
> sortCollate2.py, TestDeletesDocIdSet.java
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet. Expose deleted docs DocIdSet from
> IndexReader.