[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661995#action_12661995 ]
Marvin Humphrey commented on LUCENE-1476:
-----------------------------------------

Mike McCandless:

> For a TermQuery (one term) the cost of the two approaches should be
> the same.

It'll be close, but I don't think that's quite true. TermScorer pre-fetches
document numbers in batches from the TermDocs object. At present, only
non-deleted doc nums get cached. If we move the deletions filtering up, then
we'd increase traffic through that cache. However, filling it would be
slightly cheaper, because we wouldn't be performing the deletions check.

In theory. I'm not sure there's a way to streamline away that deletions check
in TermDocs and maintain backwards compatibility. And while this is a fun
brainstorm, I'm still far from convinced that having TermDocs.next() and
Scorer.next() return deleted docs by default is a good idea.

> For AND (and other) queries I'm not sure. In theory, having to
> process more docIDs is more costly, eg a PhraseQuery or SpanXXXQuery
> may see much higher net cost.

If you were applying deletions filtering after Scorer.next(), then it seems
likely that costs would go up because of extra hit processing. However, if
you use Scorer.skipTo() to jump past deletions, as in the loop I provided
above, then PhraseScorer etc. shouldn't incur any more costs themselves.

> a costly per-docID search
> with a very restrictive filter could be far more efficient if you
> applied the Filter earlier in the chain.

Under the skipTo() loop, I think the filter effectively *does* get applied
earlier in the chain. Does that make sense?

I think the potential performance downside comes down to prefetching in
TermScorer, unless there are other classes that do similar prefetching.
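The skipTo() loop referred to above is not reproduced in this message, so here is a minimal sketch of what such a loop might look like. It is not the loop from the earlier comment and not Lucene code: `DocIterator`, `ArrayScorer`, and `collectLive` are hypothetical names, the iterator only imitates the 2.4-style next()/doc()/skipTo() contract, and a plain `java.util.BitSet` stands in for the deleted-docs bit vector.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class SkipDeletionsSketch {

    /** Hypothetical stand-in for a 2.4-style scorer contract; not the Lucene class. */
    interface DocIterator {
        boolean next();             // advance to the next match; false when exhausted
        boolean skipTo(int target); // advance to the first later match >= target
        int doc();                  // current doc number
    }

    /** Toy scorer over a sorted array of matching doc numbers, deleted docs included. */
    static final class ArrayScorer implements DocIterator {
        private final int[] docs;
        private int pos = -1;

        ArrayScorer(int[] docs) { this.docs = docs; }

        public boolean next() { return ++pos < docs.length; }

        public boolean skipTo(int target) {
            // Always advances at least once, then keeps going until doc >= target.
            do {
                if (!next()) return false;
            } while (docs[pos] < target);
            return true;
        }

        public int doc() { return docs[pos]; }
    }

    /** Collects non-deleted hits, using skipTo() to jump over runs of deleted docs. */
    static List<Integer> collectLive(DocIterator scorer, BitSet deleted) {
        List<Integer> hits = new ArrayList<>();
        boolean more = scorer.next();
        while (more) {
            int doc = scorer.doc();
            if (deleted.get(doc)) {
                // nextClearBit(doc) is > doc here because bit `doc` is set,
                // so skipTo() moves forward past the whole deleted run.
                more = scorer.skipTo(deleted.nextClearBit(doc));
            } else {
                hits.add(doc);
                more = scorer.next();
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(3, 6);  // docs 3, 4, 5 are deleted
        deleted.set(12);    // doc 12 is deleted
        DocIterator scorer = new ArrayScorer(new int[] {1, 3, 4, 5, 9, 12});
        System.out.println(collectLive(scorer, deleted)); // prints [1, 9]
    }
}
```

In this sketch the deletions check lives in the outer loop, and the scorer is asked to skip rather than to score each deleted doc. Whether a real scorer saves work on the skipped range depends on how its skipTo() is implemented; the toy `ArrayScorer` here simply walks forward.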
> BitVector implement DocIdSet
> ----------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> BitVector can implement DocIdSet. This is for making
> SegmentReader.deletedDocs pluggable.