[
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943573#comment-15943573
]
Ben Manes commented on HBASE-17339:
-----------------------------------
I think it's really difficult to tell, but I'd guess that there might be a
small gain.
Those 30M misses sound compulsory, meaning that they would occur regardless of
the cache size. Therefore we'd expect an unbounded cache to have an 87% hit
rate at 400M accesses, or 90% at 300M. If you're observing 80%, then at best
there is a 10% boost. If Bélády's optimal is lower, then there is even less of
a difference to gain. It could be that SLRU captures frequency well enough
that both policies are equivalent.
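As a back-of-the-envelope check (a sketch only; the 30M misses, 300M accesses
and 80% observed hit rate are the approximate figures from this thread):
{code:java}
// Back-of-the-envelope bound: if all misses are compulsory, no policy can do
// better than 1 - misses/accesses. Figures are the approximate ones above.
public class HitRateBound {
  public static void main(String[] args) {
    double compulsoryMisses = 30_000_000;
    double accesses = 300_000_000;
    double bestPossible = 1.0 - compulsoryMisses / accesses; // ~0.90
    double observed = 0.80;
    System.out.printf("best possible hit rate: %.0f%%%n", bestPossible * 100);
    System.out.printf("maximum boost over observed: %.0f%%%n",
        (bestPossible - observed) * 100);
  }
}
{code}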
The [MultiQueue
paper|https://www.usenix.org/legacy/event/usenix01/full_papers/zhou/zhou.pdf]
argues that second-level cache access patterns are frequency-skewed. The
LruBlockCache only records whether a block had multiple accesses, not the
counts, and tries to evict fairly across the buckets. Since TinyLFU captures a
longer tail (the frequency of items outside of the cache), there is a chance
that it can make a better prediction. But we wouldn't know without an access
trace to simulate with.
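To illustrate what capturing the longer tail buys, here is a minimal sketch of
a TinyLFU-style admission check (a single-row, saturating count-min-style
sketch; a simplification for illustration only, not Caffeine's or HBase's
actual code, and all names are made up):
{code:java}
/**
 * Simplified TinyLFU-style admission filter: a single-row, saturating
 * count-min-style sketch that also counts blocks which are not resident in
 * the cache, so eviction can compare a candidate's frequency against the
 * victim's. Illustrative only.
 */
class FrequencySketch {
  private final int[] counts; // one row for brevity; real sketches use several
  private final int mask;

  FrequencySketch(int expectedSize) {
    int capacity = Integer.highestOneBit(Math.max(expectedSize, 2)) * 2;
    counts = new int[capacity];
    mask = capacity - 1;
  }

  /** Record one access; counters saturate like 4-bit counters. */
  void increment(Object key) {
    int i = index(key);
    if (counts[i] < 15) {
      counts[i]++;
    }
  }

  int frequency(Object key) {
    return counts[index(key)];
  }

  /** Admit the arriving candidate only if it was seen more often than the victim. */
  boolean admit(Object candidate, Object victim) {
    return frequency(candidate) > frequency(victim);
  }

  private int index(Object key) {
    int h = key.hashCode() * 0x9E3779B9; // cheap bit mixing
    return (h ^ (h >>> 16)) & mask;
  }
}
{code}
The point is that the sketch keeps approximate counts even for blocks that are
not resident, so the candidate's frequency can be compared against the
victim's, whereas single-/multi-access buckets only remember whether a
resident block was touched more than once.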
I suspect that the high hit rate means there isn't much cache pollution to
lower the hit rate, so a good-enough victim is being chosen. At the tail, most
of the entries have a relatively similar frequency, too. It would be fun to
find out, but you probably won't think it was worth the effort.
> Scan-Memory-First Optimization for Get Operations
> -------------------------------------------------
>
> Key: HBASE-17339
> URL: https://issues.apache.org/jira/browse/HBASE-17339
> Project: HBase
> Issue Type: Improvement
> Reporter: Eshcar Hillel
> Assignee: Eshcar Hillel
> Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch,
> HBASE-17339-V03.patch, HBASE-17339-V03.patch, HBASE-17339-V04.patch,
> HBASE-17339-V05.patch, HBASE-17339-V06.patch, read-latency-mixed-workload.jpg
>
>
> The current implementation of a get operation (to retrieve values for a
> specific key) scans through all relevant stores of the region; for each store,
> both memory components (memstore segments) and disk components (HFiles) are
> scanned in parallel.
> We suggest applying an optimization that speculatively scans memory-only
> components first, and only if the result is incomplete scans both memory and
> disk.
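A rough sketch of the flow the description above proposes (the method names
are hypothetical placeholders, not actual HBase APIs):
{code:java}
import java.util.List;

/**
 * Rough sketch of the memory-first get described above. scanMemStoreOnly,
 * scanMemStoreAndFiles and isComplete are hypothetical placeholders.
 */
abstract class MemoryFirstGet<CELL> {

  List<CELL> get(byte[] row) {
    // Speculative first pass over the in-memory segments only.
    List<CELL> result = scanMemStoreOnly(row);
    if (isComplete(result)) {
      return result; // answered without touching the HFiles
    }
    // Incomplete result: fall back to the regular path that scans memory
    // and disk components together.
    return scanMemStoreAndFiles(row);
  }

  abstract List<CELL> scanMemStoreOnly(byte[] row);

  abstract List<CELL> scanMemStoreAndFiles(byte[] row);

  abstract boolean isComplete(List<CELL> result);
}
{code}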
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)