[ 
https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662100#action_12662100
 ] 

Marvin Humphrey commented on LUCENE-1476:
-----------------------------------------

Mike McCandless:

> I'm also curious what cost you see of doing the merge sort for every
> search; I think it could be uncomfortably high since it's so
> hard-for-cpu-to-predict-branch-intensive. 

Probably true.  You're going to get accelerating degradation as the number of
deletions increases.  In a large index, you could end up merging 20, 30
streams.  Based on how the priority queue in ORScorer tends to take up space
in profiling data, that might not be good.

It'd be manageable if you can keep your index in reasonably good shape, but
you'll be suckin' pondwater if it gets flabby.
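
For anyone trying to picture where the cost comes from, here is a rough sketch of a per-search merge over sorted deletion streams using a priority queue. The class and method names (MergedDeletions, Cursor, merge) are made up for illustration and are not Lucene or Lucy/KS code; each stream is assumed to be a sorted int array of deleted doc numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration; these names are not from Lucene or Lucy/KS.
final class MergedDeletions {
    /** Cursor over one sorted array of deleted doc numbers. */
    private static final class Cursor {
        final int[] docs;
        int pos;
        Cursor(int[] docs) { this.docs = docs; }
        int current() { return docs[pos]; }
    }

    /** Merges sorted deletion streams into one sorted, de-duplicated list. */
    static List<Integer> merge(List<int[]> streams) {
        // The queue is ordered by each cursor's current doc number.
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.current(), b.current()));
        for (int[] docs : streams) {
            if (docs.length > 0) queue.add(new Cursor(docs));
        }
        List<Integer> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();
            int doc = top.current();
            // Skip a doc that another stream already reported as deleted.
            if (merged.isEmpty() || merged.get(merged.size() - 1) != doc) {
                merged.add(doc);
            }
            // Re-queue the cursor if it has more deletions left.
            if (++top.pos < top.docs.length) queue.add(top);
        }
        return merged;
    }
}
```

Every deletion costs a poll and usually a re-insert, each O(log k) in the number of streams, with comparisons whose outcomes are hard to predict.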

> We could take the first search that doesn't use skipTo and save the result
> of the merge sort, essentially doing an in-RAM-only "merge" of those
> deletes, and let subsequent searches use that single merged stream. 

That was what I had in mind when proposing the pseudo-iterator model.

{code}
class TombStoneDelEnum extends DelEnum {
  int nextDeletion(int docNum) {
    while (currentMax < docNum) { nextInternal(); }
    return bits.nextSetBit(docNum);
  }
  // ...
}
{code}
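
A slightly fuller, self-contained version of the same idea might look like the sketch below. It keeps the names from the pseudocode (nextDeletion, nextInternal, currentMax, bits), but the class name CachingDelEnum, the use of java.util.BitSet, and a PrimitiveIterator.OfInt standing in for the merged tombstone stream are all assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

// Hypothetical sketch: field and method names mirror the pseudocode above,
// but the surrounding types are assumptions, not Lucene or Lucy/KS APIs.
final class CachingDelEnum {
    private final PrimitiveIterator.OfInt source; // sorted deleted doc numbers
    private final BitSet bits = new BitSet();     // deletions consumed so far
    private int currentMax = -1;                  // highest doc number consumed

    CachingDelEnum(PrimitiveIterator.OfInt source) {
        this.source = source;
    }

    /** Pulls one more deletion from the source, or marks it exhausted. */
    private void nextInternal() {
        if (source.hasNext()) {
            currentMax = source.nextInt();
            bits.set(currentMax);
        } else {
            currentMax = Integer.MAX_VALUE; // nothing left to consume
        }
    }

    /** Returns the first deleted doc >= docNum, or -1 if there is none. */
    int nextDeletion(int docNum) {
        while (currentMax < docNum) { nextInternal(); }
        return bits.nextSetBit(docNum);
    }
}
```

The stream is only consumed as far as a caller has asked, and anything already consumed is answered from the bit set, so a later search that starts over from doc 0 does not redo the merge.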

> (This is not MMAP friendly, though).

Yeah.  Ironically, that use of tombstones is more compatible with the Lucene
model. :-)

I'd be reluctant to have Lucy/KS realize those large BitVectors in per-object 
process RAM.  That'd spoil the "cheap wrapper around system i/o cache" 
IndexReader plan.

I can't see an answer yet.  But the one thing I do know is that Lucy/KS needs
a pluggable deletions mechanism to make experimentation easier -- so that's
what I'm working on today.

> BitVector implement DocIdSet
> ----------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> BitVector can implement DocIdSet.  This is for making 
> SegmentReader.deletedDocs pluggable.

