[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517418#comment-15517418
 ] 

Shawn Heisey commented on SOLR-8241:
------------------------------------

It's been so long since I wrote LFUCache that I'm having a hard time 
understanding the code.

I seem to remember intending warming to preserve the access counter on each 
entry when it was added to the new cache ... but I can't find any evidence 
that I actually implemented this.  I think I might have implemented it in the 
faster replacement that never got finished.

I can't see a way with Caffeine to preserve the hit counter and other 
relevance information when warming a new cache.  I think this means that all 
warmed entries will be no more relevant than anything new that ends up in the 
cache, so they will likely be the first evicted from a freshly warmed cache 
that happens to fill up, even if those particular entries were accessed 
millions of times in previous cache instances.

If there's a way to do the following, we'd be OK: "copy the top N keys from 
the old cache, preserving their access info, then replace each value with XXX"

Even without this capability, Caffeine would probably be overall more efficient 
than LRU, assuming that there's a reasonable span of time between commits that 
open new searchers.
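To illustrate the kind of warming described above, here is a minimal, 
self-contained sketch (not Solr's actual LFUCache or Caffeine's API; all class 
and method names are hypothetical) of copying the top-N entries into a fresh 
cache while carrying each entry's hit counter along, so previously hot entries 
are not the first to be evicted:

```java
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of frequency-preserving warming. The names here
// (CountingCache, warm, regenerate) are illustrative, not Solr's API.
class CountingCache<K, V> {
    static final class Entry<V> {
        final V value;
        final LongAdder hits = new LongAdder();
        Entry(V value) { this.value = value; }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();

    V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) return null;
        e.hits.increment();  // track per-entry popularity
        return e.value;
    }

    void put(K key, V value) { map.put(key, new Entry<>(value)); }

    long hits(K key) {
        Entry<V> e = map.get(key);
        return e == null ? 0 : e.hits.sum();
    }

    // Copy the top-n entries (by hit count) from `old`, preserving their
    // counters, and regenerate each value against the new searcher --
    // the "replace the value with XXX" step from the comment above.
    void warm(CountingCache<K, V> old, int n,
              java.util.function.Function<K, V> regenerate) {
        old.map.entrySet().stream()
            .sorted(Comparator.comparingLong(
                (Map.Entry<K, Entry<V>> e) -> e.getValue().hits.sum())
                .reversed())
            .limit(n)
            .forEach(e -> {
                Entry<V> fresh = new Entry<>(regenerate.apply(e.getKey()));
                fresh.hits.add(e.getValue().hits.sum());  // keep relevance info
                map.put(e.getKey(), fresh);
            });
    }
}
```

This preserves relative popularity across searcher generations; whether an 
equivalent hook can be exposed on top of Caffeine's internal frequency sketch 
is exactly the open question.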

> Evaluate W-TinyLfu cache
> ------------------------
>
>                 Key: SOLR-8241
>                 URL: https://issues.apache.org/jira/browse/SOLR-8241
>             Project: Solr
>          Issue Type: Wish
>          Components: search
>            Reporter: Ben Manes
>            Priority: Minor
>         Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
>
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
>
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
>
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
