[
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036219#comment-16036219
]
Edward Bortnikov commented on HBASE-17339:
------------------------------------------
Thanks [~eshcar]. Maybe it makes sense to describe the experiment we used to
figure out the current implementation, to provide the community with the full
picture (smile).
We looked at a workload with temporal (rather than spatial) locality, namely
writes closely followed by reads. This pattern is quite frequent in pub-sub
scenarios. Instead of seeing a performance benefit in reading from MemStore
first, we saw nearly 100% cache hit rate, and could not explain it for a while.
The lazy evaluation procedure described by [~eshcar] sheds light on this.
Explicitly prioritizing reads from MemStore, rather than simply deferring the
data fetch from disk, could still help: it would avoid some Bloom filter
accesses that are made only to find out whether the key has earlier versions on
disk. The main practical impact is when the BF itself is not in memory, and
accessing it triggers I/O. Is that a realistic scenario? We assume that
normally, BFs are permanently cached for all HFiles managed by the RS.
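To make the trade-off concrete, here is a minimal Java sketch of the
memory-first idea. It is a toy model, not the HBASE-17339 patch:
MemoryFirstGet, ToyFile, and mightContain are hypothetical names, the "Bloom
filter" is an exact lookup with a probe counter, and the sketch assumes the
MemStore always holds the newest version of a key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not HBase code: one in-memory map (the "memstore")
// plus a list of "files", each guarded by a stand-in Bloom filter.
class MemoryFirstGet {

    /** Stand-in for an HFile: a key-to-value map plus a Bloom probe counter. */
    static final class ToyFile {
        final Map<String, String> cells = new HashMap<>();
        int bloomProbes = 0;

        /** Simulated Bloom filter check; exact here, so no false positives. */
        boolean mightContain(String key) {
            bloomProbes++;
            return cells.containsKey(key);
        }
    }

    final Map<String, String> memstore = new HashMap<>();
    final List<ToyFile> files = new ArrayList<>(); // oldest first, newest last

    /**
     * Memory-first get: answer from the memstore when the key is there, and
     * only otherwise probe the files. Assumption: a key found in the memstore
     * is always newer than any version of it on disk.
     */
    String get(String key) {
        String inMemory = memstore.get(key);
        if (inMemory != null) {
            return inMemory; // no Bloom filter is touched on this path
        }
        for (int i = files.size() - 1; i >= 0; i--) {
            ToyFile file = files.get(i);
            if (file.mightContain(key)) {
                return file.cells.get(key);
            }
        }
        return null;
    }

    /** Total number of simulated Bloom filter probes across all files. */
    int totalBloomProbes() {
        int total = 0;
        for (ToyFile file : files) {
            total += file.bloomProbes;
        }
        return total;
    }

    public static void main(String[] args) {
        MemoryFirstGet store = new MemoryFirstGet();
        ToyFile file = new ToyFile();
        file.cells.put("k1", "old");
        file.cells.put("k2", "disk");
        store.files.add(file);
        store.memstore.put("k1", "new"); // recent write, still in memory

        System.out.println(store.get("k1") + " probes=" + store.totalBloomProbes());
        System.out.println(store.get("k2") + " probes=" + store.totalBloomProbes());
    }
}
```

In this model a read that follows a recent write never touches a Bloom filter,
which is the only saving over lazily deferred disk reads. That saving matters
mainly when the filter itself would have to be fetched from disk.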
Dear community - please speak up. Thanks.
> Scan-Memory-First Optimization for Get Operations
> -------------------------------------------------
>
> Key: HBASE-17339
> URL: https://issues.apache.org/jira/browse/HBASE-17339
> Project: HBase
> Issue Type: Improvement
> Reporter: Eshcar Hillel
> Assignee: Eshcar Hillel
> Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch,
> HBASE-17339-V03.patch, HBASE-17339-V03.patch, HBASE-17339-V04.patch,
> HBASE-17339-V05.patch, HBASE-17339-V06.patch, read-latency-mixed-workload.jpg
>
>
> The current implementation of a get operation (to retrieve values for a
> specific key) scans through all relevant stores of the region; for each store
> both memory components (memstore segments) and disk components (hfiles) are
> scanned in parallel.
> We suggest applying an optimization that speculatively scans memory-only
> components first and only if the result is incomplete scans both memory and
> disk.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)