Adrien Grand commented on LUCENE-9038:

These sound like good ideas to me.

> Evaluate Caffeine for LruQueryCache
> -----------------------------------
>                 Key: LUCENE-9038
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9038
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ben Manes
>            Priority: Major
>         Attachments: CaffeineQueryCache.java
> [LRUQueryCache|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java]
>  appears to play a central role in Lucene's performance. There are many 
> issues discussing its performance, such as LUCENE-7235, LUCENE-7237, 
> LUCENE-8027, LUCENE-8213, and LUCENE-9002. It appears that the cache can be 
> as much of a liability as a benefit, leading to various workarounds and 
> added complexity.
> When reviewing the discussions and code, the following issues are concerning:
> # The cache is guarded by a single lock for all reads and writes.
> # All computations are performed outside of any locking to avoid 
> penalizing other callers. This doesn't handle cache stampedes, meaning 
> that multiple threads may miss on the same key, compute the value, and try to store it. 
> That redundant work becomes expensive under load and can be mitigated with ~ 
> per-key locks.
> # The cache queries the entry to see if it's even worth caching. At first 
> glance one assumes that is so that inexpensive entries don't bang on the lock 
> or thrash the LRU. However, this is also used to indicate data dependencies 
> for uncachable items (per JIRA), which perhaps shouldn't be invoking the 
> cache.
> # The cache lookup is skipped if the global lock is held and the value is 
> computed, but not stored. This means a busy lock reduces performance across 
> all usages and the cache's effectiveness degrades. This is not counted in the 
> miss rate, giving a false impression.
> # An attempt was made to perform computations asynchronously, due to their 
> heavy cost on tail latencies. That work was reverted due to test failures and 
> is being worked on.
> # An [in-progress change|https://github.com/apache/lucene-solr/pull/940] 
> tries to avoid LRU thrashing due to large, infrequently used items being 
> cached.
> # The cache is tightly intertwined with business logic, making it hard to 
> tease apart core algorithms and data structures from the usage scenarios.
> It seems that more and more items skip being cached because of concurrency 
> and hit rate performance, causing special case fixes based on knowledge of 
> the external code flows. Since the developers are experts on search, not 
> caching, it seems justified to evaluate if an off-the-shelf library would be 
> more helpful in terms of developer time, code complexity, and performance. 
> Solr has already introduced [Caffeine|https://github.com/ben-manes/caffeine] 
> in SOLR-8241 and SOLR-13817.
> The proposal is to replace the internals of {{LRUQueryCache}} so that external 
> usages are not affected in terms of the API. However, like in {{SolrCache}}, 
> a difference is that Caffeine only bounds by either the number of entries or 
> an accumulated size (e.g. bytes), but not both constraints. This likely is an 
> acceptable divergence in how the configuration is honored.
> cc [~ab], [~dsmiley]
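
For illustration, the cache stampede described in item 2 of the quoted description can be avoided by making the computation atomic per key. The following is a minimal sketch using only the JDK; the class name `PerKeyLoadingCache` and its methods are hypothetical and are not taken from Lucene or Caffeine. It relies on `ConcurrentHashMap.computeIfAbsent`, which applies the mapping function at most once per absent key while concurrent callers for that key wait.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical illustration only; not LRUQueryCache and not Caffeine. */
final class PerKeyLoadingCache<K, V> {
  private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
  private final Function<K, V> loader;
  // Counts how many times the (potentially expensive) loader actually ran.
  private final AtomicInteger loads = new AtomicInteger();

  PerKeyLoadingCache(Function<K, V> loader) {
    this.loader = loader;
  }

  V get(K key) {
    // computeIfAbsent is atomic per key: if several threads miss on the same
    // key at once, one of them runs the loader and the others wait for its
    // result instead of recomputing it.
    return map.computeIfAbsent(key, k -> {
      loads.incrementAndGet();
      return loader.apply(k);
    });
  }

  int loadCount() {
    return loads.get();
  }
}
```

With this shape, many threads calling `get` for the same key trigger a single load. The trade-off is that the loader runs while the map holds an internal lock for that key's bin, so it must not modify the same map, and unrelated keys that hash to the same bin may briefly wait. The sketch has no eviction and is unbounded.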

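Item 4 of the quoted description (lookups skipped while the global lock is busy, and not counted as misses) can be modelled in a few lines. This is a simplified sketch under stated assumptions, not LRUQueryCache's actual code: the class `TryLockCache`, the `skipped` counter, and the `runExclusive` helper (standing in for any operation that holds the lock for a while) are all hypothetical names introduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

/** Hypothetical model of a cache guarded by one try-lock; not Lucene code. */
final class TryLockCache<K, V> {
  private final ReentrantLock lock = new ReentrantLock();
  private final Map<K, V> map = new HashMap<>();
  final AtomicLong hits = new AtomicLong();
  final AtomicLong misses = new AtomicLong();
  // Lookups that bypassed the cache because the lock was busy.
  final AtomicLong skipped = new AtomicLong();

  V get(K key, Function<K, V> compute) {
    if (!lock.tryLock()) {
      // Lock is busy: compute without consulting or populating the cache.
      skipped.incrementAndGet();
      return compute.apply(key);
    }
    try {
      V cached = map.get(key);
      if (cached != null) {
        hits.incrementAndGet();
        return cached;
      }
      misses.incrementAndGet();
    } finally {
      lock.unlock();
    }
    V value = compute.apply(key); // computed outside the lock
    lock.lock();
    try {
      map.putIfAbsent(key, value);
    } finally {
      lock.unlock();
    }
    return value;
  }

  /** Holds the global lock while the task runs (e.g. a long maintenance step). */
  void runExclusive(Runnable task) {
    lock.lock();
    try {
      task.run();
    } finally {
      lock.unlock();
    }
  }

  /** Hit rate as usually reported: skipped lookups are invisible. */
  double reportedHitRate() {
    long total = hits.get() + misses.get();
    return total == 0 ? 0.0 : (double) hits.get() / total;
  }

  /** Hit rate when skipped lookups are treated as the misses they are. */
  double effectiveHitRate() {
    long total = hits.get() + misses.get() + skipped.get();
    return total == 0 ? 0.0 : (double) hits.get() / total;
  }
}
```

In this model, any lookup that arrives while another thread holds the lock increments only `skipped`, so `reportedHitRate()` stays flattering while `effectiveHitRate()` drops, which is the false impression the description warns about.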