jpountz commented on PR #13036: URL: https://github.com/apache/lucene/pull/13036#issuecomment-1914327764
This is a great speedup on `CountOrHighMed`! Too bad it's not faster all the time, though I'm not too surprised, since conjunctions have more overhead than disjunctions when all clauses have a high cost.

As a first step, maybe we could add a simple heuristic that only enables this optimization when it's almost guaranteed to yield a speedup? I'm not sure what makes the most sense. Perhaps a threshold on the minimum count across both clauses, with the optimization enabled only below that threshold. You'll probably need to play with various disjunctions to find a threshold that works.

One inefficiency that your PR introduces is that it requires more lookups in terms dictionaries. We could avoid this by caching the `TermState` for each term query, which you could do in your `rewriteTwoClauseDisjunctionWithTermsForCount()` utility method: if a `TermQuery` returns null from `TermQuery#getTermStates()`, you could rewrite it into a `TermQuery` that has a non-null `TermStates` object.

And maybe as a follow-up we could revisit the old idea of using bitsets to evaluate dense conjunctions, just like `BooleanScorer` does for disjunctions.
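A minimal sketch of the suggested heuristic, under stated assumptions. It assumes the rewrite relies on inclusion-exclusion, count(a OR b) = count(a) + count(b) - count(a AND b), where each term's count comes cheaply from its doc freq (only valid when the segment has no deletions) and only the conjunction has to iterate postings. Postings are modeled as sorted, duplicate-free `int[]` doc ID arrays rather than Lucene iterators, and `TwoTermCountSketch` and `minCountThreshold` are hypothetical names. No particular threshold value is implied; finding one is exactly what benchmarking various disjunctions would have to determine.

```java
import java.util.Arrays;

/** Hypothetical model: each clause's postings are a sorted, duplicate-free doc ID array. */
final class TwoTermCountSketch {

  // Hypothetical cutoff: only rewrite when the rarer clause matches fewer docs than this.
  private final int minCountThreshold;

  TwoTermCountSketch(int minCountThreshold) {
    this.minCountThreshold = minCountThreshold;
  }

  /** Counts docs matching a OR b, picking a strategy based on the rarer clause. */
  int countDisjunction(int[] a, int[] b) {
    // Assuming no deletions, a term's count is just its postings length (its doc freq).
    int minCount = Math.min(a.length, b.length);
    if (minCount < minCountThreshold) {
      // Inclusion-exclusion: count(a OR b) = count(a) + count(b) - count(a AND b).
      return a.length + b.length - countConjunction(a, b);
    }
    return countByMerging(a, b);
  }

  /** Iterates the rarer list and binary-searches the other (a simple stand-in for advance()). */
  static int countConjunction(int[] a, int[] b) {
    int[] lead = a.length <= b.length ? a : b;
    int[] other = lead == a ? b : a;
    int count = 0;
    int from = 0;
    for (int doc : lead) {
      int idx = Arrays.binarySearch(other, from, other.length, doc);
      if (idx >= 0) {
        count++;
        from = idx + 1;
      } else {
        from = -idx - 1; // insertion point: first position holding a doc > this one
      }
      if (from == other.length) {
        break; // the other clause is exhausted, no further matches possible
      }
    }
    return count;
  }

  /** Baseline: walk both lists in doc ID order, counting each matching doc once. */
  static int countByMerging(int[] a, int[] b) {
    int i = 0, j = 0, count = 0;
    while (i < a.length || j < b.length) {
      if (j == b.length || (i < a.length && a[i] < b[j])) {
        i++;
      } else if (i == a.length || b[j] < a[i]) {
        j++;
      } else { // same doc in both lists
        i++;
        j++;
      }
      count++;
    }
    return count;
  }
}
```

Both branches return the same count; the threshold only picks the strategy expected to be cheaper. Gating on the rarer clause reflects the intuition that a conjunction's cost is driven by its sparsest clause, so the rewrite should pay off most when one clause is rare, while a plain disjunction count has to visit every doc of both clauses.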
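The bitset idea for dense conjunctions is only mentioned here, not specified, so the following is a hypothetical illustration of the general technique rather than `BooleanScorer`'s actual code: split the doc ID space into fixed-size windows, materialize each clause's docs in the current window as a bitset, AND the bitsets together, and popcount the result. The 2048-doc window, the class name `WindowedBitsetConjunction`, and the `int[]` postings model are all assumptions.

```java
import java.util.Arrays;

/** Hypothetical sketch: count a dense conjunction window by window using bitsets. */
final class WindowedBitsetConjunction {

  // Hypothetical window size (a multiple of 64 so the window maps onto whole long words).
  static final int WINDOW_SIZE = 1 << 11;
  private static final int WORDS = WINDOW_SIZE / Long.SIZE;

  /** Counts docs present in every clause; each clause is a sorted, duplicate-free, non-negative doc ID array. */
  static int count(int[]... clauses) {
    if (clauses.length == 0) {
      return 0; // nothing to count in this model
    }
    int maxDoc = 0;
    for (int[] clause : clauses) {
      if (clause.length == 0) {
        return 0; // an empty clause empties the whole conjunction
      }
      maxDoc = Math.max(maxDoc, clause[clause.length - 1]);
    }
    int[] upto = new int[clauses.length]; // per-clause read position into its postings
    long[] acc = new long[WORDS];
    long[] scratch = new long[WORDS];
    int total = 0;
    // long bounds avoid int overflow for windows near Integer.MAX_VALUE
    for (long windowMin = 0; windowMin <= maxDoc; windowMin += WINDOW_SIZE) {
      long windowMax = windowMin + WINDOW_SIZE; // exclusive upper bound
      Arrays.fill(acc, -1L); // all ones, so ANDing each clause in keeps only common docs
      for (int c = 0; c < clauses.length; c++) {
        Arrays.fill(scratch, 0L);
        int[] docs = clauses[c];
        // Load this clause's docs that fall inside the current window into the scratch bitset.
        while (upto[c] < docs.length && docs[upto[c]] < windowMax) {
          int slot = (int) (docs[upto[c]++] - windowMin);
          scratch[slot >>> 6] |= 1L << slot; // Java shifts by slot & 63
        }
        for (int w = 0; w < WORDS; w++) {
          acc[w] &= scratch[w];
        }
      }
      for (long word : acc) {
        total += Long.bitCount(word);
      }
    }
    return total;
  }
}
```

The intuition for dense clauses is that one 64-bit AND handles 64 candidate docs at once, whereas a leapfrog conjunction pays an `advance()`-style call per candidate. For sparse clauses, the per-window cost of clearing and scanning the bitsets would dominate instead, which is why this would only suit dense conjunctions.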