[jira] [Commented] (LUCENE-9038) Evaluate Caffeine for LruQueryCache

Ben Manes (Jira) Thu, 07 Nov 2019 15:51:58 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969663#comment-16969663
 ]


Ben Manes commented on LUCENE-9038:
-----------------------------------

On the train ride to work, I started stubbing out an implementation to better 
understand what one could look like. For now I'm just untangling things in my 
head, given my lack of familiarity with the code, and I'm not expecting 
anything to be adopted.

> We want lucene-core to be dependency-free, so we couldn't add caffeine as a 
> dependency of lucene-core. 

I am certainly fine with that and will only worry about it if I can offer 
something promising. In addition to the options you mentioned, we could 
[shade|https://maven.apache.org/plugins/maven-shade-plugin] / 
[shadow|https://github.com/johnrengelman/shadow] the dependency into an 
internal package name.

> One thing that is not obvious immediately and makes implementing a query 
> cache for Lucene a bit tricky is that it needs to be able to efficiently 
> evict all cache entries for a given segment.

Thank you. I was trying to understand the {{LeafCache}} and was still under the 
impression that it was unnecessary complexity. Can you explain why caching of 
segments is needed? This certainly makes it a lot harder, since each segment's 
cache grows as queries are cached at the segment level.

Is this so that when updates occur all of the related cached queries are 
invalidated, to avoid stale responses? If so, could a version / generation 
field be used to maintain a single-level cache? In that model the generation 
id is part of the key, so a simple increment means none of the prior content 
is fetched anymore. This is common in remote caches (e.g. memcached) and, if 
doable here, we could maintain an index to proactively remove those stale 
entries. 
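
To make the generation idea concrete, here is a rough sketch using only the 
JDK. It is not a proposal for Lucene's API: {{GenerationKeyedCache}}, its 
{{Key}} class, and the {{segment}} / {{query}} parameters are all hypothetical 
names, and the full scan in {{invalidate}} stands in for the index mentioned 
above.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical single-level cache; all names are illustrative, not Lucene's. */
final class GenerationKeyedCache<S, Q, V> {

  /** Composite key: a lookup only matches entries stored under the current generation. */
  private static final class Key {
    final Object segment;
    final long generation;
    final Object query;

    Key(Object segment, long generation, Object query) {
      this.segment = segment;
      this.generation = generation;
      this.query = query;
    }

    @Override public boolean equals(Object o) {
      if (!(o instanceof Key)) return false;
      Key other = (Key) o;
      return generation == other.generation
          && segment.equals(other.segment)
          && query.equals(other.query);
    }

    @Override public int hashCode() {
      return Objects.hash(segment, generation, query);
    }
  }

  private final Map<S, AtomicLong> generations = new ConcurrentHashMap<>();
  private final Map<Key, V> entries = new ConcurrentHashMap<>();

  private AtomicLong counterFor(S segment) {
    return generations.computeIfAbsent(segment, s -> new AtomicLong());
  }

  /** Returns the value cached under the segment's current generation, or null. */
  V get(S segment, Q query) {
    return entries.get(new Key(segment, counterFor(segment).get(), query));
  }

  void put(S segment, Q query, V value) {
    entries.put(new Key(segment, counterFor(segment).get(), query), value);
  }

  /** Bumps the segment's generation, then proactively removes its stale entries. */
  void invalidate(S segment) {
    long current = counterFor(segment).incrementAndGet();
    // Full scan for brevity; a secondary index per segment would avoid it.
    entries.keySet().removeIf(k -> k.segment.equals(segment) && k.generation < current);
  }

  int size() {
    return entries.size();
  }
}
```

After {{invalidate(segment)}}, lookups for that segment miss immediately 
because the key no longer matches, even before the sweep finishes. A {{put}} 
racing with {{invalidate}} could still leave behind an unreachable entry, so a 
real implementation would rely on the bounded cache's eviction to reclaim it.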

> Evaluate Caffeine for LruQueryCache
> -----------------------------------
>
>                 Key: LUCENE-9038
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9038
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ben Manes
>            Priority: Major
>
> [LRUQueryCache|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java]
>  appears to play a central role in Lucene's performance. There are many 
> issues discussing its performance, such as LUCENE-7235, LUCENE-7237, 
> LUCENE-8027, LUCENE-8213, and LUCENE-9002. It appears that the cache's 
> overhead can be just as much of a benefit as a liability, causing various 
> workarounds and complexity.
> When reviewing the discussions and code, the following issues are concerning:
> # The cache is guarded by a single lock for all reads and writes.
> # All computations are performed outside of any locking to avoid 
> penalizing other callers. This doesn't handle cache stampedes, meaning 
> that multiple threads may cache miss, compute the value, and try to store it. 
> That redundant work becomes expensive under load and can be mitigated with 
> (roughly) per-key locks.
> # The cache queries the entry to see if it's even worth caching. At first 
> glance one assumes that is so that inexpensive entries don't bang on the lock 
> or thrash the LRU. However, this is also used to indicate data dependencies 
> for uncachable items (per JIRA), which perhaps shouldn't be invoking the 
> cache.
> # The cache lookup is skipped if the global lock is held, in which case the 
> value is computed but not stored. This means a busy lock reduces performance 
> across all usages and the cache's effectiveness degrades. This is not counted 
> in the miss rate, giving a false impression.
> # An attempt was made to perform computations asynchronously, due to their 
> heavy cost on tail latencies. That work was reverted due to test failures and 
> is being worked on.
> # An [in-progress change|https://github.com/apache/lucene-solr/pull/940] 
> tries to avoid LRU thrashing due to large, infrequently used items being 
> cached.
> # The cache is tightly intertwined with business logic, making it hard to 
> tease apart core algorithms and data structures from the usage scenarios.
> It seems that more and more items skip being cached because of concurrency 
> and hit rate performance, causing special case fixes based on knowledge of 
> the external code flows. Since the developers are experts on search, not 
> caching, it seems justified to evaluate if an off-the-shelf library would be 
> more helpful in terms of developer time, code complexity, and performance. 
> Solr has already introduced [Caffeine|https://github.com/ben-manes/caffeine] 
> in SOLR-8241 and SOLR-13817.
> The proposal is to replace the internals of {{LRUQueryCache}} so that 
> external usages are not affected in terms of the API. However, as in 
> {{SolrCache}}, a difference is that Caffeine bounds by either the number of 
> entries or an accumulated size (e.g. bytes), but not by both constraints. 
> This is likely an acceptable divergence in how the configuration is honored.
> cc [~ab], [~dsmiley]
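
As a rough illustration of the per-key locking mentioned in item 2 of the 
description above, the sketch below leans on the JDK's 
{{ConcurrentHashMap.computeIfAbsent}}, which applies the mapping function at 
most once per absent key. {{LoadOnceCache}} and {{loadCount}} are hypothetical 
names used only for this example; this is not how {{LRUQueryCache}} works 
today.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache that computes each missing key once, even under contention. */
final class LoadOnceCache<K, V> {
  private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
  private final AtomicInteger loads = new AtomicInteger();

  /** Returns the cached value, running the loader only if the key is absent. */
  V get(K key, Function<? super K, ? extends V> loader) {
    return entries.computeIfAbsent(key, k -> {
      loads.incrementAndGet(); // counts how often the expensive work really ran
      return loader.apply(k);
    });
  }

  int loadCount() {
    return loads.get();
  }
}
```

Concurrent callers for the same key wait for the first computation instead of 
repeating it. The trade-off is that the loader runs while the map holds an 
internal lock for that key's bin, so it must not modify the same map and 
should not be arbitrarily slow.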



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
