GovindBalaji-S-Glean opened a new issue, #14986:
URL: https://github.com/apache/lucene/issues/14986

   ### Description
   
   1. An IndexSearcher uses a single `UsageTrackingQueryCachingPolicy` instance that is shared across all segments.
   2. This caching policy uses a ring buffer of length 256 to keep track of recently used queries.
   3. A `TermInSetQuery` with `rewriteMethod = MultiTermQuery.CONSTANT_SCORE_BLENDED_REWRITE` yields a `RewritingWeight`.
   4. Getting a scorer from this `RewritingWeight` for a segment can involve rewriting to a `BooleanQuery` of multiple `TermQuery` clauses containing only the terms present in that segment (see `org.apache.lucene.search.AbstractMultiTermQueryConstantScoreWrapper.RewritingWeight#scorerSupplier`).
   5. Thus a single `TermInSetQuery` ends up thrashing the ring buffer, because it is recorded as multiple distinct `BooleanQuery`s, one per segment.
   6. This leads to a very poor cache hit rate for indexes with a large number of segments.
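
To make the thrashing concrete, here is a toy model of one shared usage history. It is not Lucene code: `SharedHistorySketch`, the string keys, and `MIN_FREQUENCY = 4` are assumptions made for illustration, and it pessimistically assumes every segment produces a distinct rewritten query (segments holding the same subset of terms would share a key).

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

/** Toy model of a single usage history shared by all segments; not Lucene code. */
final class SharedHistorySketch {
    static final int HISTORY_SIZE = 256; // ring buffer length mentioned in the issue
    static final int MIN_FREQUENCY = 4;  // assumed caching threshold, illustrative only

    private final ArrayDeque<String> window = new ArrayDeque<>();
    private final Map<String, Integer> counts = new HashMap<>();

    /** Records one use and returns how often the key now appears in the window. */
    int onUse(String key) {
        if (window.size() == HISTORY_SIZE) {
            // Drop the oldest entry, as a ring buffer would overwrite it.
            counts.merge(window.removeFirst(), -1, Integer::sum);
        }
        window.addLast(key);
        return counts.merge(key, 1, Integer::sum);
    }

    /** Runs one logical query repeatedly; returns the peak frequency of any per-segment key. */
    static int peakFrequency(int segments, int searches) {
        SharedHistorySketch history = new SharedHistorySketch();
        int peak = 0;
        for (int s = 0; s < searches; s++) {
            for (int seg = 0; seg < segments; seg++) {
                // Each segment is assumed to see a different rewritten query, hence a different key.
                peak = Math.max(peak, history.onUse("rewritten-query@segment-" + seg));
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        for (int segments : new int[] {10, 100, 300}) {
            int peak = peakFrequency(segments, 50);
            System.out.println(segments + " segments: peak frequency " + peak
                + (peak >= MIN_FREQUENCY ? " (cacheable)" : " (never cached)"));
        }
    }
}
```

In this model, once the number of segments exceeds the history size divided by the threshold, no per-segment key can ever reach the threshold, however often the original query is repeated.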
   
   We were able to verify this behavior with a custom caching policy that logs 
the `onUse()` and `shouldCache()` calls and then delegates to 
`UsageTrackingQueryCachingPolicy`.
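
A minimal sketch of such a logging wrapper is shown here. To keep it compilable without Lucene on the classpath, it uses a hypothetical stand-in interface, `CachingPolicy<Q>`, that mirrors the two methods named above. With Lucene, the wrapper would instead implement `org.apache.lucene.search.QueryCachingPolicy` (whose `shouldCache` may throw `IOException`), delegate to a `UsageTrackingQueryCachingPolicy`, and be installed with `IndexSearcher#setQueryCachingPolicy`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for Lucene's QueryCachingPolicy, so the sketch compiles on its own. */
interface CachingPolicy<Q> {
    void onUse(Q query);
    boolean shouldCache(Q query);
}

/** Logs every call, then forwards it unchanged to the wrapped policy. */
final class LoggingCachingPolicy<Q> implements CachingPolicy<Q> {
    private final CachingPolicy<Q> delegate;

    // Synchronized because searches may call the policy from several threads.
    final List<String> log = Collections.synchronizedList(new ArrayList<>());

    LoggingCachingPolicy(CachingPolicy<Q> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void onUse(Q query) {
        log.add("onUse " + query);
        delegate.onUse(query);
    }

    @Override
    public boolean shouldCache(Q query) {
        boolean result = delegate.shouldCache(query);
        log.add("shouldCache " + query + " -> " + result);
        return result;
    }
}
```

Grouping the logged lines by query should show whether one logical query arrives at the policy as many different rewritten queries.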
   
   Is there a good reason not to do this ring buffer tracking at a per-segment 
level? That would fix this issue.
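
For comparison, per-segment tracking can be sketched as one bounded history per segment. This is a toy model, not a proposed patch: `PerSegmentHistorySketch` and its keys are hypothetical, and since `QueryCachingPolicy#onUse` and `#shouldCache` receive only the query, a real change would need some way to pass the segment to the policy.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

/** Toy model of usage tracking with a separate bounded history per segment; not Lucene code. */
final class PerSegmentHistorySketch {
    static final int HISTORY_SIZE = 256; // same length as the shared ring buffer in the issue

    private final Map<Object, ArrayDeque<String>> histories = new HashMap<>();

    /** Records one use in the segment's own history and returns the key's frequency there. */
    int onUse(Object segmentKey, String queryKey) {
        ArrayDeque<String> window =
            histories.computeIfAbsent(segmentKey, k -> new ArrayDeque<>());
        if (window.size() == HISTORY_SIZE) {
            window.removeFirst(); // drop the oldest entry of this segment only
        }
        window.addLast(queryKey);
        int frequency = 0;
        for (String seen : window) {
            if (seen.equals(queryKey)) {
                frequency++;
            }
        }
        return frequency;
    }

    /** Runs one logical query repeatedly; returns the peak frequency of any per-segment key. */
    static int peakFrequency(int segments, int searches) {
        PerSegmentHistorySketch tracker = new PerSegmentHistorySketch();
        int peak = 0;
        for (int s = 0; s < searches; s++) {
            for (int seg = 0; seg < segments; seg++) {
                peak = Math.max(peak, tracker.onUse(seg, "rewritten-query@segment-" + seg));
            }
        }
        return peak;
    }
}
```

In this model a repeated query is seen once per search in each segment's history, so its frequency grows with the number of searches regardless of the segment count. The cost is memory proportional to the number of segments, plus the need to discard a history when its segment is closed.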
   
   ### Version and environment details
   
   Lucene 9.12.1

