Ben Manes created HIVE-14983:
Summary: Evaluate TinyLFU cache
Issue Type: Improvement
Reporter: Ben Manes
The ORC low-level cache is either FIFO or LRFU, with the latter being the
default. Least-Recently-Freq-Used is an O(lg n) policy that tries to recency
with frequency for the working set. It uses a heap for frequency ordering and a
linked list for recency ordering.
[TinyLFU|https://arxiv.org/pdf/1512.00727.pdf] is an O(1) policy that uses a
sketch to probabilistically estimate an entry's frequency. Instead of focusing
on eviction, the policy filters out low-value entries from entering the cache.
It retains a larger history than the working set by retaining the frequency in
a compact counter array (e.g. [4-bit CountMin
Simulations show that a small LRU window + TinyLFU + SLRU main cache has the
If the project is ready to adopt Java 8 then it could use
[Caffeine|https://github.com/ben-manes/caffeine], the successor to Guava's
cache. It provides improved
[concurrency|https://github.com/ben-manes/caffeine/wiki/Benchmarks] in addition
to the higher hit rate.
But I think extracting the project's CountMin Sketch for a custom
implementation would be a win. The result would be simpler code and an improved
This message was sent by Atlassian JIRA