Ben Manes commented on ACCUMULO-4177:

I agree the numbers are too close to judge and fall within the margin of 
error. The Lru cache is quite good by not suffering lock contention, delegating 
the penalties to a background thread, and being segmented to capture basic 
frequencies. The YCSB Zipf benchmarks are ideal for it, as the policy can offer 
a perfect hit rate and concurrency. Caffeine can perform similarly, with a small 
additional overhead due to spreading out the maintenance work for more 
flexibility and to avoid O(n) operations.

So we can't argue improved concurrency or an improved hit rate (which reduces 
latencies) for the Zipf workloads. Instead we can claim to be on par and that 
there is little to no degradation. The gain should come in an improved hit rate 
for real-world workloads, which can be quite different than synthetic 
distributions. This might require evaluating on a live cluster, unfortunately. 
It might be interesting to capture real cluster traces and feed them through 
YCSB if we wanted a more robust, repeatable comparison.
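
As a rough illustration of what a repeatable, trace-driven comparison could 
look like, here is a minimal sketch that replays a recorded sequence of keys 
against a cache and reports the hit rate. The names TraceReplay, LruCache, and 
hitRate are hypothetical, and the LinkedHashMap-based LRU is only a stand-in: 
it is neither Accumulo's LruBlockCache nor Caffeine, and a real comparison 
would plug the actual cache implementations in behind the same loop.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Replays a key trace against a simple LRU cache and reports the hit rate. */
final class TraceReplay {

  /** Stand-in LRU using LinkedHashMap access order (an assumption, not LruBlockCache). */
  static final class LruCache<K> extends LinkedHashMap<K, Boolean> {
    private final int capacity;

    LruCache(int capacity) {
      super(16, 0.75f, true); // true = iterate in access order, eldest first
      this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, Boolean> eldest) {
      // Evict the least recently used entry once the capacity is exceeded.
      return size() > capacity;
    }
  }

  /** Returns hits divided by total accesses, or 0.0 for an empty trace. */
  static <K> double hitRate(List<K> trace, int capacity) {
    LruCache<K> cache = new LruCache<>(capacity);
    long hits = 0;
    for (K key : trace) {
      if (cache.get(key) != null) {
        hits++; // get() also marks the entry as most recently used
      } else {
        cache.put(key, Boolean.TRUE); // miss: load the block into the cache
      }
    }
    return trace.isEmpty() ? 0.0 : (double) hits / trace.size();
  }

  public static void main(String[] args) {
    // A tiny made-up trace; a real run would read keys captured from a cluster.
    List<String> trace = List.of("a", "b", "a", "c", "b", "a");
    System.out.println(hitRate(trace, 2));
  }
}
```

Running the same trace through each candidate policy at the same capacity 
would give hit rates that can be compared directly and reproduced later.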

Thanks for the help on this. You can add me with no org (since this is a hobby 
project), in the PST time zone.

> TinyLFU-based BlockCache
> ------------------------
>                 Key: ACCUMULO-4177
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4177
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Ben Manes
>            Assignee: Ben Manes
>             Fix For: 2.0.0
>         Attachments: ACCUMULO-4177.patch
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> [LruBlockCache|https://github.com/apache/accumulo/blob/master/core/src/main/java/org/apache/accumulo/core/file/blockfile/cache/LruBlockCache.java]
>  appears to be based on HBase's. I currently have a patch being reviewed in 
> [HBASE-15560|https://issues.apache.org/jira/browse/HBASE-15560] that replaces 
> the pseudo Segmented LRU with the TinyLFU eviction policy. That should allow 
> the cache to make [better 
> predictions|https://github.com/ben-manes/caffeine/wiki/Efficiency] based on 
> frequency and recency, such as improved scan resistance. The implementation 
> uses [Caffeine|https://github.com/ben-manes/caffeine], the successor to 
> Guava's cache, to provide concurrency and keep the patch small.
> Full details are in the JIRA ticket. I think it should be easy to port if 
> there is interest.

This message was sent by Atlassian JIRA
