[ 
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637069#comment-15637069
 ] 

stack commented on HBASE-15560:
-------------------------------

Here is the key to reading the diagrams from YCSB workload c (zipfian random 
reads).

There are three diagrams. Each covers the same time range. Each has 6 humps: 
three without the patch, then three with the tinylfu patch. One diagram is 
Gets, one is block cache misses, and the third is block cache hits (I had to 
separate the latter two because the hits were so much in excess of the misses).

The first three humps are from loadings done against the tip of branch-1. They 
are two runs with a lot of cache misses (the data did not fit the cache -- 
total heap was 8G) and one run where hits are mostly out of cache (heap was 
31G).

The last three humps are from loadings done against the tip of branch-1 with 
the patch backported. The three humps here are one run with lots of cache 
misses (8G heap), a run with even more cache misses (4G), and then a case 
where almost everything fits in the heap (31G).

Sorry the two runs are not exactly symmetric -- config error on my part. Can 
fix that next time through.

What we can see is that tinylfu seems to do better when near all fits in 
cache. We can do more throughput. The rate even starts to rise toward the end 
of the test run, and overall it is running at a higher rate. My guess is that 
tinylfu is just using more of the cache than LruBlockCache, and perhaps its 
smarts are showing when the rate starts to rise toward the tail of the test 
run.

For the cases where we are missing cache, it does worse. This I cannot explain.

There is little I/O when we miss cache (we seem to be getting blocks from the 
fscache). All blocks are local. This is a single RegionServer standing on top 
of an HDFS cluster of 8 nodes.

Pointers appreciated.



> TinyLFU-based BlockCache
> ------------------------
>
>                 Key: HBASE-15560
>                 URL: https://issues.apache.org/jira/browse/HBASE-15560
>             Project: HBase
>          Issue Type: Improvement
>          Components: BlockCache
>    Affects Versions: 2.0.0
>            Reporter: Ben Manes
>            Assignee: Ben Manes
>         Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, 
> bc.hit.count, bc.miss.count, gets, tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture the frequency 
> and recency of the working set. It achieves concurrency by using an O(n) 
> background thread to prioritize the entries and evict. Accessing an entry is 
> O(1): a hash table lookup, recording its logical access time, and setting a 
> frequency flag. A write is performed in O(1) time by updating the hash table 
> and triggering an async eviction thread. This provides ideal concurrency and 
> minimizes latencies by penalizing the eviction thread instead of the caller. 
> However, the policy does not age the frequencies and may not be resilient to 
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the 
> frequency in a counting sketch, ages it periodically by halving the 
> counters, and orders entries by SLRU. An entry is admitted or discarded by 
> comparing the frequency of the new arrival (candidate) to that of the SLRU's 
> victim, keeping the one with the higher frequency. This allows the 
> operations to be performed in O(1) time and, through the use of a compact 
> sketch, a much larger history is retained beyond the current working set. In 
> a variety of real-world traces the policy had [near optimal hit 
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar 
> to a write-ahead log. A read is recorded into a striped ring buffer and a 
> write into a queue. The operations are applied in batches under a try-lock 
> by an asynchronous thread, thereby tracking the usage pattern without 
> incurring high latencies 
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit 
> rate) the two caches have near identical throughput and latencies, with 
> LruBlockCache narrowly winning. At medium and small cache sizes, TinyLFU had 
> a 1-4% hit rate improvement and therefore lower latencies. The lackluster 
> result is because a synthetic Zipfian distribution is used, on which SLRU 
> performs optimally. In a more varied, real-world workload we'd expect to see 
> improvements from being able to make smarter predictions.
> The provided patch implements BlockCache using the 
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see 
> HighScalability 
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for 
> evaluating this patch ([github 
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).
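The admission scheme the description above sketches -- estimate the 
candidate's frequency in a counting sketch, compare it to the eviction 
victim's, keep the more frequent entry, and periodically halve all counters to 
age the history -- can be illustrated with a small Java toy. This is not 
Caffeine's or the patch's actual implementation; the class and method names 
here are made up for illustration.

```java
// Toy TinyLFU frequency sketch: a tiny count-min-style sketch with
// periodic halving to age frequencies. Illustrative only; Caffeine's
// real sketch is far more compact and carefully tuned.
class FrequencySketch {
    // Four rows, each indexed by a different byte of the item's hash.
    private final int[][] table = new int[4][256];
    private int additions = 0;
    private final int sampleSize = 1024; // aging period

    private int index(int row, int hash) {
        return (hash >>> (row * 8)) & 0xFF;
    }

    void increment(int hash) {
        for (int row = 0; row < 4; row++) {
            int i = index(row, hash);
            if (table[row][i] < 15) table[row][i]++; // saturate at 15
        }
        if (++additions == sampleSize) reset();
    }

    // Estimated frequency = minimum counter across the rows.
    int frequency(int hash) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < 4; row++) {
            min = Math.min(min, table[row][index(row, hash)]);
        }
        return min;
    }

    // Age the history by halving every counter.
    private void reset() {
        additions /= 2;
        for (int[] row : table) {
            for (int i = 0; i < row.length; i++) row[i] >>>= 1;
        }
    }

    // TinyLFU admission: admit the candidate only if it is estimated
    // to be more frequent than the victim; on a tie the victim stays.
    boolean admit(int candidateHash, int victimHash) {
        return frequency(candidateHash) > frequency(victimHash);
    }
}

public class TinyLfuDemo {
    public static void main(String[] args) {
        FrequencySketch sketch = new FrequencySketch();
        int hot = "hot-block".hashCode();
        int cold = "cold-block".hashCode();
        for (int i = 0; i < 10; i++) sketch.increment(hot);
        sketch.increment(cold);
        // A frequently seen block wins admission over a one-hit wonder.
        System.out.println(sketch.admit(hot, cold));
        System.out.println(sketch.admit(cold, hot));
    }
}
```

The point of the sketch is the history: the one-hit wonder is rejected even 
though it arrived more recently, which is exactly the scan-resistance an 
unaged LRU lacks.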



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
