[
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943573#comment-15943573
]
Ben Manes commented on HBASE-17339:
-----------------------------------
I think it's really difficult to tell, but I'd guess that there might be a
small gain.
Those 30M misses sound compulsory, meaning that they would occur regardless of
the cache size. Therefore we'd expect an unbounded cache to have an 87% hit
rate at 400M accesses, or 90% at 300M. If you're observing 80%, then at best
there is a 10% boost. If Bélády's optimal is lower, then there is even less of
a difference to gain. It could be that SLRU captures frequency well enough
that both policies are equivalent.
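As a back-of-the-envelope check (a sketch only; the 30M misses, 300M accesses
and 80% observed hit rate are the approximate figures from this thread):
{code:java}
// Back-of-the-envelope bound: if all misses are compulsory, no policy can do
// better than 1 - misses/accesses. Figures are the approximate ones above.
public class HitRateBound {
  public static void main(String[] args) {
    double compulsoryMisses = 30_000_000;
    double accesses = 300_000_000;
    double bestPossible = 1.0 - compulsoryMisses / accesses; // ~0.90
    double observed = 0.80;
    System.out.printf("best possible hit rate: %.0f%%%n", bestPossible * 100);
    System.out.printf("maximum boost over observed: %.0f%%%n",
        (bestPossible - observed) * 100);
  }
}
{code}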
The [MultiQueue
paper|https://www.usenix.org/legacy/event/usenix01/full_papers/zhou/zhou.pdf]
argues that second-level cache access patterns are frequency-skewed. The
LruBlockCache only records whether a block had multiple accesses, not the
counts, and tries to evict fairly across the buckets. Since TinyLFU captures a
longer tail (the frequency of items outside of the cache), there is a chance
that it can make a better prediction. But we wouldn't know without an access
trace to simulate with.
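To illustrate what capturing the longer tail buys, here is a minimal sketch of
a TinyLFU-style admission check (a single-row, saturating count-min-style
sketch; a simplification for illustration only, not Caffeine's or HBase's
actual code, and all names are made up):
{code:java}
/**
 * Simplified TinyLFU-style admission filter: a single-row, saturating
 * count-min-style sketch that also counts blocks which are not resident in
 * the cache, so eviction can compare a candidate's frequency against the
 * victim's. Illustrative only.
 */
class FrequencySketch {
  private final int[] counts; // one row for brevity; real sketches use several
  private final int mask;

  FrequencySketch(int expectedSize) {
    int capacity = Integer.highestOneBit(Math.max(expectedSize, 2)) * 2;
    counts = new int[capacity];
    mask = capacity - 1;
  }

  /** Record one access; counters saturate like 4-bit counters. */
  void increment(Object key) {
    int i = index(key);
    if (counts[i] < 15) {
      counts[i]++;
    }
  }

  int frequency(Object key) {
    return counts[index(key)];
  }

  /** Admit the arriving candidate only if it was seen more often than the victim. */
  boolean admit(Object candidate, Object victim) {
    return frequency(candidate) > frequency(victim);
  }

  private int index(Object key) {
    int h = key.hashCode() * 0x9E3779B9; // cheap bit mixing
    return (h ^ (h >>> 16)) & mask;
  }
}
{code}
The point is that the sketch keeps approximate counts even for blocks that are
not resident, so the candidate's frequency can be compared against the
victim's, whereas single-/multi-access buckets only remember whether a
resident block was touched more than once.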
I suspect that the high hit rate means there isn't much cache pollution to
lower the hit rate, so a good-enough victim is being chosen. At the tail, most
of the entries have a relatively similar frequency, too. It would be fun to
find out, but you probably won't think it was worth the effort.
> Scan-Memory-First Optimization for Get Operations
> -------------------------------------------------
>
> Key: HBASE-17339
> URL: https://issues.apache.org/jira/browse/HBASE-17339
> Project: HBase
> Issue Type: Improvement
> Reporter: Eshcar Hillel
> Assignee: Eshcar Hillel
> Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch,
> HBASE-17339-V03.patch, HBASE-17339-V03.patch, HBASE-17339-V04.patch,
> HBASE-17339-V05.patch, HBASE-17339-V06.patch, read-latency-mixed-workload.jpg
>
>
> The current implementation of a get operation (to retrieve values for a
> specific key) scans through all relevant stores of the region; for each store,
> both memory components (memstore segments) and disk components (HFiles) are
> scanned in parallel.
> We suggest applying an optimization that speculatively scans memory-only
> components first, and only if the result is incomplete scans both memory and
> disk.
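A rough sketch of the flow the description above proposes (the method names
are hypothetical placeholders, not actual HBase APIs):
{code:java}
import java.util.List;

/**
 * Rough sketch of the memory-first get described above. scanMemStoreOnly,
 * scanMemStoreAndFiles and isComplete are hypothetical placeholders.
 */
abstract class MemoryFirstGet<CELL> {

  List<CELL> get(byte[] row) {
    // Speculative first pass over the in-memory segments only.
    List<CELL> result = scanMemStoreOnly(row);
    if (isComplete(result)) {
      return result; // answered without touching the HFiles
    }
    // Incomplete result: fall back to the regular path that scans memory
    // and disk components together.
    return scanMemStoreAndFiles(row);
  }

  abstract List<CELL> scanMemStoreOnly(byte[] row);

  abstract List<CELL> scanMemStoreAndFiles(byte[] row);

  abstract boolean isComplete(List<CELL> result);
}
{code}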
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)