[
https://issues.apache.org/jira/browse/HBASE-15560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15638231#comment-15638231
]
Ben Manes commented on HBASE-15560:
-----------------------------------
I think my first step is to get an access trace (log the key's hash) so that I
can run it through my simulator. Then I can verify that the policy behaves well
when weights are ignored.
The next step would be to re-run it but assigning weights. Do these vary in
your test? I think YCSB had static so they were the same per entry. If not,
then the argument would be a potential bug when trying to be weight-aware when
promoting between the window/probation/protected regions. Right now the
simulator is not weight-aware, so I'd hack a quick test based on the trace data.
If all of those are good, then we need to go line-by-line to see where the two
caches differ. It could be a miscalculation of an entry's weight or other
subtle bugs. We'll know more if we can isolate it down to a small, repeatable
test case to debug with.
> TinyLFU-based BlockCache
> ------------------------
>
> Key: HBASE-15560
> URL: https://issues.apache.org/jira/browse/HBASE-15560
> Project: HBase
> Issue Type: Improvement
> Components: BlockCache
> Affects Versions: 2.0.0
> Reporter: Ben Manes
> Assignee: Ben Manes
> Attachments: HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch,
> HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch, HBASE-15560.patch,
> bc.hit.count, bc.miss.count, branch-1.tinylfu.txt, gets, tinylfu.patch
>
>
> LruBlockCache uses the Segmented LRU (SLRU) policy to capture frequency and
> recency of the working set. It achieves concurrency by using an O( n )
> background thread to prioritize the entries and evict. Accessing an entry is
> O(1) by a hash table lookup, recording its logical access time, and setting a
> frequency flag. A write is performed in O(1) time by updating the hash table
> and triggering an async eviction thread. This provides ideal concurrency and
> minimizes the latencies by penalizing the thread instead of the caller.
> However the policy does not age the frequencies and may not be resilient to
> various workload patterns.
> W-TinyLFU ([research paper|http://arxiv.org/pdf/1512.00727.pdf]) records the
> frequency in a counting sketch, ages periodically by halving the counters,
> and orders entries by SLRU. An entry is discarded by comparing the frequency
> of the new arrival (candidate) to the SLRU's victim, and keeping the one with
> the highest frequency. This allows the operations to be performed in O(1)
> time and, though the use of a compact sketch, a much larger history is
> retained beyond the current working set. In a variety of real world traces
> the policy had [near optimal hit
> rates|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> Concurrency is achieved by buffering and replaying the operations, similar to
> a write-ahead log. A read is recorded into a striped ring buffer and writes
> to a queue. The operations are applied in batches under a try-lock by an
> asynchronous thread, thereby track the usage pattern without incurring high
> latencies
> ([benchmarks|https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class]).
> In YCSB benchmarks the results were inconclusive. For a large cache (99% hit
> rates) the two caches have near identical throughput and latencies with
> LruBlockCache narrowly winning. At medium and small caches, TinyLFU had a
> 1-4% hit rate improvement and therefore lower latencies. The lack luster
> result is because a synthetic Zipfian distribution is used, which SLRU
> performs optimally. In a more varied, real-world workload we'd expect to see
> improvements by being able to make smarter predictions.
> The provided patch implements BlockCache using the
> [Caffeine|https://github.com/ben-manes/caffeine] caching library (see
> HighScalability
> [article|http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html]).
> Edward Bortnikov and Eshcar Hillel have graciously provided guidance for
> evaluating this patch ([github
> branch|https://github.com/ben-manes/hbase/tree/tinylfu]).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)