gsmiller commented on PR #11738:
URL: https://github.com/apache/lucene/pull/11738#issuecomment-1240856191
@jpountz:
> It might not be a big win in practice, but it should be enough to compare the docFreq with the docCount (rather than maxDoc) and use this postings whose docFreq is equal to docCount as an iterator of matches.
I like that idea. I wonder whether it makes sense to check for both conditions. If a term contains all docs in the segment, it should be more efficient to use `DocIdSet#all` (rather than iterating the actual postings), right? But if a term doesn't contain all docs in the segment but _does_ contain all docs that have a value for the field (i.e., the field isn't completely dense), we could add an additional optimization here that uses that single term's postings as the iterator of matches. Is that what you had in mind?
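To spell out the ordering I have in mind, here's a toy model of just the decision (plain Java, no Lucene; `MatchStrategy`, `Strategy` and `choose` are made-up names, and the three counts stand in for `LeafReader#maxDoc`, `Terms#getDocCount` and `TermsEnum#docFreq`). The `maxDoc` check comes first because `docFreq == maxDoc` already implies `docFreq == docCount`, and the all-docs set is the cheaper of the two.

```java
/** Hypothetical model of the per-term check; not Lucene API. */
public class MatchStrategy {

  /** Possible ways to produce the matches once a term with a notable docFreq is seen. */
  enum Strategy {
    ALL_DOCS,      // every doc in the segment matches: no postings needed
    TERM_POSTINGS, // the term's postings are exactly the docs that have the field
    DEFAULT        // fall back to the existing rewrite logic
  }

  /**
   * @param maxDoc   docs in the segment (stand-in for LeafReader#maxDoc)
   * @param docCount docs with at least one term in the field (stand-in for Terms#getDocCount)
   * @param docFreq  docs containing the term (stand-in for TermsEnum#docFreq)
   */
  static Strategy choose(int maxDoc, int docCount, int docFreq) {
    // docFreq == maxDoc implies docFreq == docCount, so test the cheaper case first.
    if (docFreq == maxDoc) {
      return Strategy.ALL_DOCS;
    } else if (docFreq == docCount) {
      return Strategy.TERM_POSTINGS;
    }
    return Strategy.DEFAULT;
  }

  public static void main(String[] args) {
    System.out.println(choose(100, 100, 100)); // dense field, term in every doc
    System.out.println(choose(100, 40, 40));   // sparse field, term in every doc with the field
    System.out.println(choose(100, 40, 25));   // term in only some docs with the field
  }
}
```

Running `main` prints `ALL_DOCS`, `TERM_POSTINGS` and `DEFAULT` on separate lines.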
Here's what I'm thinking:
```java
int docFreq = termsEnum.docFreq();
if (reader.maxDoc() == docFreq) {
  // The term matches every doc in the segment: skip the postings entirely.
  return new WeightOrDocIdSet(DocIdSet.all(docFreq));
} else if (terms.getDocCount() == docFreq) {
  // The term matches every doc that has a value for this field (the field is
  // not dense), so its postings are exactly the set of matches.
  TermStates termStates = new TermStates(searcher.getTopReaderContext());
  termStates.register(
      termsEnum.termState(), context.ord, docFreq, termsEnum.totalTermFreq());
  Query q =
      new ConstantScoreQuery(new TermQuery(new Term(query.field, term), termStates));
  Weight weight = searcher.rewrite(q).createWeight(searcher, scoreMode, score());
  return new WeightOrDocIdSet(weight);
}
```
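A toy illustration of why the `docCount` branch is safe (plain `java.util` sets standing in for postings; `PostingsUnion` and `union` are made-up names, not Lucene API): every matching term's postings is a subset of the docs that have the field, so once one term's docFreq equals docCount, the union over all matching terms can't add anything beyond that term's postings.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration: postings modeled as sorted sets of doc IDs. */
public class PostingsUnion {

  /** Union of the postings of every term the multi-term query matches. */
  static Set<Integer> union(List<Set<Integer>> postings) {
    Set<Integer> result = new TreeSet<>();
    for (Set<Integer> p : postings) {
      result.addAll(p);
    }
    return result;
  }

  public static void main(String[] args) {
    // Segment has docs 0..9 (maxDoc = 10), but only these docs have the field.
    Set<Integer> docsWithField = Set.of(1, 3, 4, 7, 8); // docCount = 5
    // Postings of the matched terms; each is a subset of docsWithField.
    Set<Integer> termA = Set.of(1, 4);
    Set<Integer> termB = Set.of(1, 3, 4, 7, 8); // docFreq == docCount
    Set<Integer> termC = Set.of(7);

    Set<Integer> matches = union(List.of(termA, termB, termC));
    System.out.println(matches);                                // [1, 3, 4, 7, 8]
    // termB already covers every doc with the field, so it equals the union.
    System.out.println(matches.equals(termB));                  // true
    System.out.println(termB.size() == docsWithField.size());   // true
  }
}
```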
--
This is an automated message from the Apache Git Service.