[ 
https://issues.apache.org/jira/browse/HIVE-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Manes updated HIVE-14983:
-----------------------------
    Description: 
The ORC low-level cache is either FIFO or LRFU, with the latter being the 
default. Least-Recently-Freq-Used is an O(lg n) policy that tries to balance 
recency with frequency for the working set. It uses a heap for frequency 
ordering and a linked list for recency ordering.

[TinyLFU|https://arxiv.org/pdf/1512.00727.pdf] is an O(1) policy that uses a 
sketch to probabilistically estimate an entry's frequency. Instead of focusing 
on eviction, the policy filters out low-value entries from entering the cache. 
It retains a larger history than the working set by retaining the frequency in 
a compact counter array (e.g. [4-bit CountMin 
Sketch|https://github.com/ben-manes/caffeine/blob/master/caffeine/src/main/java/com/github/benmanes/caffeine/cache/FrequencySketch.java]).
 Simulations show that a small LRU window + TinyLFU + SLRU main cache has the 
[best performance|https://github.com/ben-manes/caffeine/wiki/Efficiency].

If the project is ready to adopt Java 8 then it could use 
[Caffeine|https://github.com/ben-manes/caffeine], the successor to Guava's 
cache. It provides improved 
[concurrency|https://github.com/ben-manes/caffeine/wiki/Benchmarks] in addition 
to the higher hit rate.

But I think extracting the project's CountMin Sketch for a custom 
implementation would be a win. The result would be simpler code and an improved 
hit rates.

  was:
The ORC low-level cache is either FIFO or LRFU, with the latter being the 
default. Least-Recently-Freq-Used is an O(lg n) policy that tries to recency 
with frequency for the working set. It uses a heap for frequency ordering and a 
linked list for recency ordering.

[TinyLFU|https://arxiv.org/pdf/1512.00727.pdf] is an O(1) policy that uses a 
sketch to probabilistically estimate an entry's frequency. Instead of focusing 
on eviction, the policy filters out low-value entries from entering the cache. 
It retains a larger history than the working set by retaining the frequency in 
a compact counter array (e.g. [4-bit CountMin 
Sketch|https://github.com/ben-manes/caffeine/blob/master/caffeine/src/main/java/com/github/benmanes/caffeine/cache/FrequencySketch.java]).
 Simulations show that a small LRU window + TinyLFU + SLRU main cache has the 
[best performance|https://github.com/ben-manes/caffeine/wiki/Efficiency].

If the project is ready to adopt Java 8 then it could use 
[Caffeine|https://github.com/ben-manes/caffeine], the successor to Guava's 
cache. It provides improved 
[concurrency|https://github.com/ben-manes/caffeine/wiki/Benchmarks] in addition 
to the higher hit rate.

But I think extracting the project's CountMin Sketch for a custom 
implementation would be a win. The result would be simpler code and an improved 
hit rates.


> Evaluate TinyLFU cache
> ----------------------
>
>                 Key: HIVE-14983
>                 URL: https://issues.apache.org/jira/browse/HIVE-14983
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap
>            Reporter: Ben Manes
>
> The ORC low-level cache is either FIFO or LRFU, with the latter being the 
> default. Least-Recently-Freq-Used is an O(lg n) policy that tries to balance 
> recency with frequency for the working set. It uses a heap for frequency 
> ordering and a linked list for recency ordering.
> [TinyLFU|https://arxiv.org/pdf/1512.00727.pdf] is an O(1) policy that uses a 
> sketch to probabilistically estimate an entry's frequency. Instead of 
> focusing on eviction, the policy filters out low-value entries from entering 
> the cache. It retains a larger history than the working set by retaining the 
> frequency in a compact counter array (e.g. [4-bit CountMin 
> Sketch|https://github.com/ben-manes/caffeine/blob/master/caffeine/src/main/java/com/github/benmanes/caffeine/cache/FrequencySketch.java]).
>  Simulations show that a small LRU window + TinyLFU + SLRU main cache has the 
> [best performance|https://github.com/ben-manes/caffeine/wiki/Efficiency].
> If the project is ready to adopt Java 8 then it could use 
> [Caffeine|https://github.com/ben-manes/caffeine], the successor to Guava's 
> cache. It provides improved 
> [concurrency|https://github.com/ben-manes/caffeine/wiki/Benchmarks] in 
> addition to the higher hit rate.
> But I think extracting the project's CountMin Sketch for a custom 
> implementation would be a win. The result would be simpler code and an 
> improved hit rates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to