[
https://issues.apache.org/jira/browse/LUCENE-10235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448726#comment-17448726
]
Adrien Grand commented on LUCENE-10235:
---------------------------------------
I agree that it is a bit unintuitive, but I feel like this is due more to the
nature of this cache, which is quite atypical since it waits for queries to be
used frequently before putting them into the cache, than to the miss count
being incorrectly reported. After all, every counted miss maps to a "get"
into the cache.
For this cache it's probably more interesting to compare the hit count with the
number of times we put something into the cache
(LRUQueryCache#onDocIdSetCache). Am I getting it right that you are thinking of
`ignored` as being `missCount - docIdSetCacheCount`, i.e. the number of times
that we didn't find an entry in the cache and yet decided not to create a
cache entry?
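
To make the arithmetic concrete, here is a minimal sketch of that derived
metric. The class QueryCacheStats and its method names are hypothetical and
not part of Lucene; in practice the three inputs would presumably be read from
LRUQueryCache#getHitCount, LRUQueryCache#getMissCount and
LRUQueryCache#getCacheCount.

```java
/** Hypothetical snapshot of query cache counters; not a Lucene class. */
final class QueryCacheStats {
    final long hitCount;   // lookups that found a cached entry
    final long missCount;  // lookups that found nothing
    final long cacheCount; // number of times an entry was put into the cache

    QueryCacheStats(long hitCount, long missCount, long cacheCount) {
        if (hitCount < 0 || missCount < 0 || cacheCount < 0) {
            throw new IllegalArgumentException("counters must be non-negative");
        }
        this.hitCount = hitCount;
        this.missCount = missCount;
        this.cacheCount = cacheCount;
    }

    /** Misses that did not result in a new cache entry. */
    long ignoredCount() {
        return missCount - cacheCount;
    }

    /** Average number of hits per entry put into the cache (0 if none). */
    double hitsPerCachedEntry() {
        return cacheCount == 0 ? 0.0 : (double) hitCount / cacheCount;
    }
}
```

With 100 misses and 10 cache insertions, ignoredCount() is 90, which says far
more about the caching policy than about the effectiveness of the cache.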
> LRUQueryCache should not count never-cacheable queries as a miss
> ----------------------------------------------------------------
>
> Key: LUCENE-10235
> URL: https://issues.apache.org/jira/browse/LUCENE-10235
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Yannick Welsch
> Priority: Minor
>
> Hit and miss counts of a cache are typically used to check how effective a
> caching layer is. While looking at a system that exhibited a very high
> miss-to-hit ratio, I took a closer look at Lucene's LRUQueryCache and noticed
> that it counts as a miss the handling of queries that it would never even
> consider caching in the first place (e.g. TermQuery and others mentioned
> in UsageTrackingQueryCachingPolicy.shouldNeverCache).
> The reason these are counted as a miss is that LRUQueryCache (scorerSupplier
> and bulkScorer methods) first does a lookup on the cache, incrementing hit or
> miss counters, and upon miss, only then checks QueryCachingPolicy.shouldCache
> to decide whether that query should be put into the cache.
> This issue is made more complex by the fact that
> QueryCachingPolicy.shouldCache is a stateful method, and cacheability of a
> query can change over time (e.g. after appearing N times).
> I'm opening this issue to discuss whether others also feel that the current
> way of accounting misses is unintuitive / confusing. I would also like to put
> forward a proposal to:
> * generalize the boolean QueryCachingPolicy.shouldCache method to return an
> enum instead (one of YES, NOT_RIGHT_NOW, NEVER), and only account queries
> that are (eventually) cacheable and not in the cache as a miss,
> * optionally introduce another metric for queries that are never cacheable,
> e.g. "ignored", and
> * optionally refine miss count into a count for items that are cacheable
> right away, and those that will eventually be cacheable.
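
The proposed accounting could be sketched roughly as follows. This is a toy
model under stated assumptions, not Lucene code: CacheDecision,
AccountingSketch, the "term:" prefix rule and the minFrequency threshold are
all hypothetical stand-ins for the enum-returning shouldCache and the stateful
policy described in the issue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical three-valued replacement for the boolean shouldCache result. */
enum CacheDecision { YES, NOT_RIGHT_NOW, NEVER }

/** Toy cache keyed by query string; it only illustrates the counters. */
final class AccountingSketch {
    private final Set<String> cached = new HashSet<>();
    private final Map<String, Integer> seen = new HashMap<>();
    private final int minFrequency;
    long hits, misses, ignored;

    AccountingSketch(int minFrequency) {
        this.minFrequency = minFrequency;
    }

    /** Stateful toy policy: "term:" queries are never cached, others after minFrequency uses. */
    CacheDecision decide(String query) {
        if (query.startsWith("term:")) {
            return CacheDecision.NEVER;
        }
        int count = seen.merge(query, 1, Integer::sum);
        return count >= minFrequency ? CacheDecision.YES : CacheDecision.NOT_RIGHT_NOW;
    }

    /** Looks up the query and updates hit, miss and ignored counters. */
    void search(String query) {
        if (cached.contains(query)) {
            hits++;
            return;
        }
        switch (decide(query)) {
            case NEVER:
                ignored++; // never cacheable: not counted as a miss
                break;
            case NOT_RIGHT_NOW:
                misses++; // eventually cacheable, but not yet
                break;
            case YES:
                misses++; // cacheable now: count the miss, then cache
                cached.add(query);
                break;
        }
    }
}
```

With minFrequency set to 2, three lookups of "term:a" followed by three
lookups of "range:x" give ignored = 3, misses = 2 and hits = 1. Under the
accounting described in the issue, all five non-hit lookups would have been
counted as misses.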