[ 
https://issues.apache.org/jira/browse/LUCENE-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448726#comment-17448726
 ] 

Adrien Grand commented on LUCENE-10235:
---------------------------------------

I agree that it is a bit unintuitive, but I feel that this is due more to the 
nature of this cache, which is atypical in that it waits for queries to be used 
frequently before putting them into the cache, than to the miss count being 
incorrectly reported, since every miss does map to a "get" on the cache.

For this cache it's probably more interesting to compare the hit count with the 
number of times we put something into the cache 
(LRUQueryCache#onDocIdSetCache). Am I getting it right that you are thinking of 
`ignored` as being `missCount - docIdSetCacheCount`, i.e. the number of times 
that we didn't find an entry in the cache and yet decided not to create a 
cache entry?
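
To make that arithmetic concrete, here is a minimal sketch. `CacheStatsSketch` and its field names are hypothetical and not part of Lucene; the assumption is that the three counters would be read from the cache's existing statistics (I believe LRUQueryCache exposes them through getHitCount(), getMissCount() and getCacheCount(), but treat that mapping as an assumption).

```java
/**
 * Hypothetical snapshot of query-cache counters (not a Lucene class).
 * The values are assumed to be copied from the cache's own statistics.
 */
final class CacheStatsSketch {
    final long hitCount;            // lookups that found a cache entry
    final long missCount;           // lookups that found nothing
    final long docIdSetCacheCount;  // times an entry was put into the cache

    CacheStatsSketch(long hitCount, long missCount, long docIdSetCacheCount) {
        this.hitCount = hitCount;
        this.missCount = missCount;
        this.docIdSetCacheCount = docIdSetCacheCount;
    }

    /** Misses that did not lead to a new cache entry. */
    long ignoredCount() {
        return missCount - docIdSetCacheCount;
    }

    /** Hits per entry ever cached; 0 if nothing was cached yet. */
    double hitsPerCachedEntry() {
        return docIdSetCacheCount == 0 ? 0.0 : (double) hitCount / docIdSetCacheCount;
    }
}
```

With 100 misses and 10 cached entries, `ignoredCount()` is 90, which is the number the current miss count does not let you tell apart from "real" misses.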


> LRUQueryCache should not count never-cacheable queries as a miss
> ----------------------------------------------------------------
>
>                 Key: LUCENE-10235
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10235
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Yannick Welsch
>            Priority: Minor
>
> Hit and miss counts of a cache are typically used to check how effective a 
> caching layer is. While looking at a system that exhibited a very high miss 
> to hit ratio, I took a closer look at Lucene's LRUQueryCache and noticed that 
> it counts as a miss the handling of queries that it would never even consider 
> caching in the first place (e.g. TermQuery and others mentioned in 
> UsageTrackingQueryCachingPolicy.shouldNeverCache).
> The reason these are counted as a miss is that LRUQueryCache (scorerSupplier 
> and bulkScorer methods) first does a lookup on the cache, incrementing hit or 
> miss counters, and upon miss, only then checks QueryCachingPolicy.shouldCache 
> to decide whether that query should be put into the cache.
> This issue is made more complex by the fact that 
> QueryCachingPolicy.shouldCache is a stateful method, and cacheability of a 
> query can change over time (e.g. after appearing N times).
> I'm opening this issue to discuss whether others also feel that the current 
> way of accounting misses is unintuitive / confusing. I would also like to put 
> forward a proposal to:
>  * generalize the boolean QueryCachingPolicy.shouldCache method to return an 
> enum instead (one of YES, NOT_RIGHT_NOW, NEVER), and only account queries 
> that are (eventually) cacheable and not in the cache as a miss,
>  * optionally introduce another metric for queries that are never cacheable, 
> e.g. "ignored", and
>  * optionally refine miss count into a count for items that are cacheable 
> right away, and those that will eventually be cacheable.
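
The accounting proposed in the issue description above could look roughly like the following toy model. Everything here is hypothetical: the `Cacheability` enum only mirrors the YES / NOT_RIGHT_NOW / NEVER values from the proposal, queries are plain strings, the "term:" prefix stands in for never-cacheable queries such as TermQuery, and the frequency threshold is an invented stand-in for a usage-tracking policy. It is not Lucene code and makes no claim about how the change would actually be implemented.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical tri-state result replacing the boolean shouldCache. */
enum Cacheability { YES, NOT_RIGHT_NOW, NEVER }

/** Toy cache that counts hits, misses and ignored (never-cacheable) lookups. */
final class TriStateCacheSketch {
    private final int minFrequency;                       // uses needed before caching
    private final Map<String, Integer> seen = new HashMap<>();
    private final Set<String> cached = new HashSet<>();
    long hits, misses, ignored;

    TriStateCacheSketch(int minFrequency) {
        this.minFrequency = minFrequency;
    }

    /** Stateful policy: the answer for a query can change as it is seen more often. */
    Cacheability shouldCache(String query) {
        if (query.startsWith("term:")) {
            return Cacheability.NEVER;                    // assumed never-cacheable kind
        }
        int count = seen.merge(query, 1, Integer::sum);
        return count >= minFrequency ? Cacheability.YES : Cacheability.NOT_RIGHT_NOW;
    }

    /** Look up first, then consult the policy only when the lookup fails. */
    void execute(String query) {
        if (cached.contains(query)) {
            hits++;
            return;
        }
        switch (shouldCache(query)) {
            case NEVER:
                ignored++;                                // not counted as a miss
                break;
            case NOT_RIGHT_NOW:
                misses++;                                 // eventually cacheable
                break;
            case YES:
                misses++;
                cached.add(query);                        // cache it for next time
                break;
        }
    }
}
```

In this model, running "term:a" twice and "range:x" three times with a threshold of 2 yields 2 ignored lookups, 2 misses and 1 hit, whereas the current accounting would report 4 misses.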



