[
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217054#comment-15217054
]
Ben Manes commented on HBASE-15560:
-----------------------------------
Hi Michael,
I agree that offer a choice is an unnecessary confusion. It made it easier to
test via a flag and, if retained, could be used only transitionally to give
users a release cycle to adjust.
For the 100% case LruBlockCache's work is a hash table read and increment of a
global counter (fetch-and-add instruction). Caffeine's is the hash table read,
select a ring buffer, CAS append into it, and schedule an async drain if full
on ForkJoinPool#commonPool() (which might cause a few more context switches).
The differences were in a margin of error in LruBlockCache's favor, but a tiny
penalty seems understandable.
> TinyLFU-based BlockCache
> ------------------------
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
> Issue Type: Improvement
> Components: BlockCache
> Reporter: Ben Manes
> Attachments: tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and
> recency of the working set. It achieves concurrency by using an O( n )
> background thread to prioritize the entries and evict. Accessing an entry is
> O(1) by a hash table lookup, recording its logical access time, and setting a
> frequency flag. A write is performed in O(1) time by updating the hash table
> and triggering an async eviction thread. This provides ideal concurrency and
> minimizes the latencies by penalizing the thread instead of the caller.
> However the policy does not age the frequencies and may not be resilient to
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the
> frequency in a counting sketch, ages periodically by halving the counters,
> and orders entries by SLRU. An entry is discarded by comparing the frequency
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with
> the highest frequency. This allows the operations to be performed in O(1)
> time and, though the use of a compact sketch, a much larger history is
> retained beyond the current working set. In a variety of real world traces
> the policy had [near optimal hit
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to
> a write-ahead log. A read is recorded into a striped ring buffer and writes
> to a queue. The operations are applied in batches under a try-lock by an
> asynchronous thread, thereby track the usage pattern without incurring high
> latencies
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit
> rates) the two caches have near identical throughput and latencies with
> LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a
> 1-4% hit rate improvement and therefore lower latencies. The lack luster
> result is because a synthetic Zipfian distribution is used, which SLRU
> performs optimally. In a more varied, real-world workload we'd expect to see
> improvements by being able to make smarter predictions.
> The provided patch implements BlockCache using the
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see
> HighScalability
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for
> evaluating this patch ([github
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)