[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

Marvin Humphrey (JIRA) Thu, 08 Jan 2009 07:19:32 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661995#action_12661995 ]


Marvin Humphrey commented on LUCENE-1476:
-----------------------------------------

Mike McCandless:

> For a TermQuery (one term) the cost of the two approaches should be
> the same.

It'll be close, but I don't think that's quite true.  TermScorer prefetches
document numbers in batches from the TermDocs object.  At present, only
non-deleted doc numbers get cached.  If we move the deletions filtering up,
then we'd increase traffic through that cache.  However, filling it would be
slightly cheaper, because we wouldn't be performing the deletions check.
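
To make the trade-off concrete, here is a minimal sketch of batch prefetching. It assumes a plain int[] of postings and a java.util.BitSet standing in for the deletions BitVector; the class and method names (PrefetchSketch, fill, countCached) are hypothetical, and this is not the actual TermScorer or TermDocs code. With checkDeletions set, each posting pays a deletions lookup but fewer docs land in the buffer; without it, every posting flows through the buffer unchecked.

```java
import java.util.BitSet;

// Hypothetical sketch of batch prefetching; not Lucene's TermScorer/TermDocs code.
final class PrefetchSketch {
  private final int[] postings; // doc numbers for one term, ascending
  private final BitSet deleted; // stand-in for SegmentReader.deletedDocs
  private int postingsPos = 0;

  PrefetchSketch(int[] postings, BitSet deleted) {
    this.postings = postings;
    this.deleted = deleted;
  }

  // Copies up to buf.length doc numbers into buf and returns how many were copied.
  // When checkDeletions is true, each posting pays a deletions lookup and deleted
  // docs are dropped; otherwise every posting goes into the buffer unchecked.
  int fill(int[] buf, boolean checkDeletions) {
    int n = 0;
    while (n < buf.length && postingsPos < postings.length) {
      int doc = postings[postingsPos++];
      if (checkDeletions && deleted.get(doc)) {
        continue;
      }
      buf[n++] = doc;
    }
    return n;
  }

  // Drains all postings through a buffer of bufSize slots and counts the cached doc numbers.
  static int countCached(int[] postings, BitSet deleted, boolean checkDeletions, int bufSize) {
    PrefetchSketch p = new PrefetchSketch(postings, deleted);
    int[] buf = new int[bufSize];
    int total = 0;
    int n;
    while ((n = p.fill(buf, checkDeletions)) > 0) {
      total += n;
    }
    return total;
  }
}
```

For postings 0..9 with docs 1, 3 and 5 deleted and a 4-slot buffer, the filtered fill caches 7 doc numbers and the unfiltered fill caches all 10: cheaper fills, more traffic.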

In theory.  I'm not sure there's a way to streamline away that deletions check
in TermDocs and maintain backwards compatibility.  And while this is a fun
brainstorm, I'm still far from convinced that having TermDocs.next() and
Scorer.next() return deleted docs by default is a good idea.

> For AND (and other) queries I'm not sure. In theory, having to
> process more docIDs is more costly, eg a PhraseQuery or SpanXXXQuery
> may see much higher net cost.

If you were applying deletions filtering after Scorer.next(), then it seems
likely that costs would go up because of extra hit processing.  However, if
you use Scorer.skipTo() to jump past deletions, as in the loop I provided
above, then PhraseScorer etc. shouldn't incur any more costs themselves.
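
One possible shape for such a loop is sketched below. It is not necessarily the exact loop posted earlier in the thread. It assumes a hypothetical DocIterator interface modeled on the 2.4-era next()/skipTo(int)/doc() contract, an array-backed stand-in for a Scorer, and a BitSet standing in for the deletions. When the scorer lands on a deleted doc, nextClearBit() finds the next live doc number and skipTo() jumps there, so a run of consecutive deletions costs a single skipTo() and deleted docs are never collected.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class SkipDeletions {
  // Hypothetical minimal iterator modeled on the 2.4-era next()/skipTo()/doc() contract.
  interface DocIterator {
    boolean next();
    boolean skipTo(int target);
    int doc();
  }

  // Array-backed stand-in for a Scorer over an ascending list of matching docs.
  static final class ArrayDocIterator implements DocIterator {
    private final int[] docs;
    private int pos = -1;

    ArrayDocIterator(int[] docs) { this.docs = docs; }

    public boolean next() { return ++pos < docs.length; }

    public int doc() { return docs[pos]; }

    // Moves beyond the current doc to the first doc >= target (linear scan here;
    // real postings would use skip data).
    public boolean skipTo(int target) {
      do {
        if (!next()) return false;
      } while (docs[pos] < target);
      return true;
    }
  }

  // Collects matching docs, using skipTo() to jump past deleted ones.
  static List<Integer> collectLive(DocIterator scorer, BitSet deleted) {
    List<Integer> hits = new ArrayList<Integer>();
    boolean more = scorer.next();
    while (more) {
      int doc = scorer.doc();
      if (deleted.get(doc)) {
        // Jump straight to the first non-deleted doc number after this one.
        more = scorer.skipTo(deleted.nextClearBit(doc));
      } else {
        hits.add(doc);
        more = scorer.next();
      }
    }
    return hits;
  }
}
```

For matches {1, 2, 3, 5, 8, 9} with docs 2, 3 and 8 deleted, collectLive returns [1, 5, 9].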

> a costly per-docID search
> with a very restrictive filter could be far more efficient if you
> applied the Filter earlier in the chain.

Under the skipTo() loop, I think the filter effectively *does* get applied
earlier in the chain.  Does that make sense?
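
A minimal leapfrog-intersection sketch illustrates the idea. When a restrictive filter drives the iteration, the scorer is only asked to skipTo() doc numbers the filter proposes, so it resolves candidates at or beyond each accepted doc rather than walking every doc it matches. It uses a hypothetical DocIterator interface and array-backed iterators, not Lucene's real Filter or Scorer classes.

```java
import java.util.ArrayList;
import java.util.List;

final class FilterFirst {
  // Hypothetical minimal iterator modeled on the 2.4-era next()/skipTo()/doc() contract.
  interface DocIterator {
    boolean next();
    boolean skipTo(int target);
    int doc();
  }

  // Array-backed stand-in for a filter's DocIdSet iterator or a Scorer.
  static final class ArrayDocIterator implements DocIterator {
    private final int[] docs; // ascending doc numbers
    private int pos = -1;

    ArrayDocIterator(int[] docs) { this.docs = docs; }

    public boolean next() { return ++pos < docs.length; }

    public int doc() { return docs[pos]; }

    // Moves beyond the current doc to the first doc >= target (linear scan here;
    // real postings would use skip data).
    public boolean skipTo(int target) {
      do {
        if (!next()) return false;
      } while (docs[pos] < target);
      return true;
    }
  }

  // Leapfrog intersection: whichever iterator is behind skips to the other's doc.
  static List<Integer> intersect(DocIterator filter, DocIterator scorer) {
    List<Integer> hits = new ArrayList<Integer>();
    if (!filter.next() || !scorer.skipTo(filter.doc())) {
      return hits;
    }
    while (true) {
      int f = filter.doc();
      int s = scorer.doc();
      if (f == s) {
        hits.add(s);
        if (!filter.next()) break;
      } else if (f < s) {
        if (!filter.skipTo(s)) break;
      } else {
        if (!scorer.skipTo(f)) break;
      }
    }
    return hits;
  }
}
```

With filter docs {3, 7, 20} and scorer matches {1, 3, 4, 7, 8, 15, 20, 21}, intersect returns [3, 7, 20].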

I think the potential performance downside comes down to prefetching in
TermScorer, unless there are other classes that do similar prefetching.




> BitVector implement DocIdSet
> ----------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> BitVector can implement DocIdSet.  This is for making 
> SegmentReader.deletedDocs pluggable.


