[GitHub] [lucene] gsmiller commented on pull request #11738: Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment.

GitBox Thu, 08 Sep 2022 08:13:29 -0700


gsmiller commented on PR #11738:
URL: https://github.com/apache/lucene/pull/11738#issuecomment-1240856191


   @jpountz:
   
   > It might not be a big win in practice, but it should be enough to compare the docFreq with the docCount (rather than maxDoc) and use this postings whose docFreq is equal to docCount as an iterator of matches.
   
   I like that idea. I wonder if checking for both conditions makes sense? If a
   term matches all docs in the segment (docFreq == maxDoc), it should be more
   efficient to use `DocIdSet#all`, right? (rather than iterating the actual
   postings). But if a term doesn't match all docs in the segment but _does_
   match all docs that have the field (docFreq == docCount, i.e., the field
   isn't completely dense), we could add an additional optimization here to use
   that single term's postings. Is that what you had in mind?
   
   Here's what I'm thinking:
   ```
             int docFreq = termsEnum.docFreq();
             if (reader.maxDoc() == docFreq) {
               // The term matches every doc in the segment, so skip the postings entirely.
               return new WeightOrDocIdSet(DocIdSet.all(docFreq));
             } else if (terms.getDocCount() == docFreq) {
               // The term matches every doc that has this field, so its postings
               // alone are the complete match set for the multi-term query.
               TermStates termStates = new TermStates(searcher.getTopReaderContext());
               termStates.register(termsEnum.termState(), context.ord, docFreq, termsEnum.totalTermFreq());
               Query q = new ConstantScoreQuery(new TermQuery(new Term(query.field, term), termStates));
               Weight weight = searcher.rewrite(q).createWeight(searcher, scoreMode, score());
               return new WeightOrDocIdSet(weight);
             }
   ```
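
To make the three cases easier to reason about, here is a minimal, Lucene-free sketch of just the decision logic. The class `TermMatchStrategy`, its `Strategy` enum, and the `choose` method are hypothetical names used only for illustration; the three integer inputs stand in for `termsEnum.docFreq()`, `terms.getDocCount()`, and `reader.maxDoc()`. It is not the code in the PR, only a model of the branch order proposed above.

```java
// Hypothetical model of the per-term decision; no Lucene types are used.
final class TermMatchStrategy {

  enum Strategy {
    MATCH_ALL_DOCS,        // term is in every doc of the segment
    SINGLE_TERM_POSTINGS,  // term is in every doc that has the field
    GENERAL_REWRITE        // neither shortcut applies
  }

  // docFreq:  number of docs containing the term
  // docCount: number of docs with at least one term in the field
  // maxDoc:   number of docs in the segment
  static Strategy choose(int docFreq, int docCount, int maxDoc) {
    if (docFreq > docCount || docCount > maxDoc) {
      throw new IllegalArgumentException("expected docFreq <= docCount <= maxDoc");
    }
    // Check the stronger condition first: a fully dense term needs no postings at all.
    if (docFreq == maxDoc) {
      return Strategy.MATCH_ALL_DOCS;
    }
    // The field is sparse, but this one term covers every doc that has the field.
    if (docFreq == docCount) {
      return Strategy.SINGLE_TERM_POSTINGS;
    }
    return Strategy.GENERAL_REWRITE;
  }

  public static void main(String[] args) {
    System.out.println(choose(100, 100, 100)); // MATCH_ALL_DOCS
    System.out.println(choose(40, 40, 100));   // SINGLE_TERM_POSTINGS
    System.out.println(choose(10, 40, 100));   // GENERAL_REWRITE
  }
}
```

The order of the checks matters: `docFreq == maxDoc` implies `docFreq == docCount`, so testing the `maxDoc` case first keeps the cheaper match-all path from being shadowed by the single-term path.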


