[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036219#comment-16036219
 ] 

Edward Bortnikov commented on HBASE-17339:
------------------------------------------

Thanks [~eshcar]. It probably makes sense to describe the experiment we used to
understand the current implementation, so that the community has the full
picture.

We looked at a workload with temporal (rather than spatial) locality, namely
writes closely followed by reads. This pattern is quite frequent in pub-sub
scenarios. Instead of seeing a performance benefit from reading from MemStore
first, we saw a nearly 100% cache hit rate, and could not explain it for a
while. The lazy evaluation procedure described by [~eshcar] sheds light on it.

Obviously, explicitly prioritizing reads from MemStore, rather than simply
deferring the data fetch from disk, could avoid some accesses to Bloom filters
that are made just to figure out whether the key has earlier versions on disk.
The main practical impact is when the BF itself is not in memory, and accessing
it triggers I/O. Is that a realistic scenario? We assume that normally, BFs are
permanently cached for all HFiles managed by the RS.
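
To make the trade-off concrete, here is a rough toy model in Java. It is not
HBase code: ToyHFile, ToyStore and all of their methods are made-up names, an
exact key set stands in for the Bloom filter, and the model assumes a single
version per key with the MemStore always holding the newest one. It only counts
how many Bloom filter probes each read strategy issues.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only: none of these types are HBase classes. */
final class MemoryFirstGetSketch {

    /** Stand-in for an HFile; an exact key set plays the role of the Bloom filter. */
    static final class ToyHFile {
        private final Map<String, String> cells = new HashMap<>();
        private final Set<String> bloom = new HashSet<>();
        int bloomProbes = 0;

        void put(String key, String value) {
            cells.put(key, value);
            bloom.add(key);
        }

        /** Probes the filter first; returns null when the key is ruled out. */
        String get(String key) {
            bloomProbes++;
            return bloom.contains(key) ? cells.get(key) : null;
        }
    }

    /** Stand-in for a store: one in-memory map plus on-disk files, newest first. */
    static final class ToyStore {
        final Map<String, String> memstore = new HashMap<>();
        final List<ToyHFile> hfiles = new ArrayList<>();

        /** Baseline: consult every component, then prefer the in-memory value. */
        String getScanningAll(String key) {
            String fromMemory = memstore.get(key);
            String fromDisk = null;
            for (ToyHFile f : hfiles) {
                String v = f.get(key); // every file pays a filter probe
                if (fromDisk == null) {
                    fromDisk = v;
                }
            }
            return fromMemory != null ? fromMemory : fromDisk;
        }

        /** Memory first: touch the files only when memory has no answer. */
        String getMemoryFirst(String key) {
            String fromMemory = memstore.get(key);
            if (fromMemory != null) {
                return fromMemory;
            }
            for (ToyHFile f : hfiles) {
                String v = f.get(key);
                if (v != null) {
                    return v;
                }
            }
            return null;
        }

        int totalBloomProbes() {
            int n = 0;
            for (ToyHFile f : hfiles) {
                n += f.bloomProbes;
            }
            return n;
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        ToyHFile newer = new ToyHFile();
        ToyHFile older = new ToyHFile();
        older.put("k1", "old");
        store.hfiles.add(newer);
        store.hfiles.add(older);
        store.memstore.put("k1", "new"); // write closely followed by a read

        store.getMemoryFirst("k1");
        System.out.println("probes after memory-first get: " + store.totalBloomProbes());
        store.getScanningAll("k1");
        System.out.println("probes after scan-all get: " + store.totalBloomProbes());
    }
}
```

In this model the memory-first read issues no filter probes when the key is in
the MemStore, while the baseline issues one per file. Whether that difference
matters in practice depends on the question above, i.e. whether a probe can
ever trigger I/O.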

Dear community - please speak up. Thanks. 

> Scan-Memory-First Optimization for Get Operations
> -------------------------------------------------
>
>                 Key: HBASE-17339
>                 URL: https://issues.apache.org/jira/browse/HBASE-17339
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>         Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch, 
> HBASE-17339-V03.patch, HBASE-17339-V03.patch, HBASE-17339-V04.patch, 
> HBASE-17339-V05.patch, HBASE-17339-V06.patch, read-latency-mixed-workload.jpg
>
>
> The current implementation of a get operation (to retrieve values for a 
> specific key) scans through all relevant stores of the region; for each store 
> both memory components (memstore segments) and disk components (hfiles) are 
> scanned in parallel.
> We suggest applying an optimization that speculatively scans memory-only 
> components first and, only if the result is incomplete, scans both memory and 
> disk.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
