[
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033707#comment-16033707
]
Eshcar Hillel commented on HBASE-17339:
---------------------------------------
After some time away from this Jira, and after some additional experiments and
digging into the code, here is our current understanding:
HBase already implements an optimization that makes the suggestion in this Jira
less critical. I will try to explain it in a nutshell.
As mentioned, a get operation is divided into two main steps:
(1) creating and filtering all HFile scanners and memory scanners,
(2) applying the next operation, which retrieves the result of the get.
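As a rough illustration of these two steps, here is a toy sketch in Java. The
types and helper names below are simplified stand-ins invented for this
comment, not HBase's actual StoreScanner/KeyValueScanner classes:
{code:java}
import java.util.List;
import java.util.PriorityQueue;

// Toy model of the two-step get path (illustrative stand-ins, not HBase code).
public class TwoStepGetSketch {

    // A scanner is positioned on its smallest ("top") key-value.
    interface Scanner {
        long peekKey();                   // top key, without advancing
        String nextValue();               // value at the top key
        boolean mayContain(long rowKey);  // e.g. bloom-filter / time-range check
    }

    static String get(List<Scanner> memoryAndFileScanners, long rowKey) {
        // Step (1): create and filter the scanners, then combine them in a heap
        // ordered by each scanner's top key.
        PriorityQueue<Scanner> heap =
            new PriorityQueue<>((a, b) -> Long.compare(a.peekKey(), b.peekKey()));
        for (Scanner s : memoryAndFileScanners) {
            if (s.mayContain(rowKey)) {
                heap.add(s);
            }
        }
        // Step (2): the "next" operation reads from the scanner at the top of
        // the heap, i.e. the one holding the newest matching cell.
        Scanner top = heap.peek();
        return (top != null && top.peekKey() == rowKey) ? top.nextValue() : null;
    }
}
{code}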
HBase defers the seek operation of the scanners as much as possible. In step
(1) all scanners are combined in a key-value heap which is sorted by the top
key of each scanner. However, if there is more than one scanner, the HFile
scanners do not perform a real seek. Instead, each sets its current cell to a
fake cell that simulates a seek to the key. In cases where the key can be found
both in memory and on disk, the memory segments have higher timestamps, so they
reside at the top of the heap. Finally, in step (2) the store scanner gets the
result from the scanner heap, starting with the scanners at the top. Only at
this point, if an HFile scanner is polled from the heap and no real seek was
done yet, does HBase seek the key in the file. This seek might find the blocks
in the cache, or it may have to retrieve them from disk.
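A minimal sketch of the lazy-seek idea, again with toy types instead of HBase's
real scanner classes (in the real code the fake cell is constructed so it sorts
correctly against the memstore cells; here a plain key stands in for it):
{code:java}
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy lazy-seek scanner: the expensive positioning on the file is deferred
// until the key-value heap actually polls this scanner (illustrative only).
public class LazyFileScannerSketch {

    private final NavigableMap<Long, String> fileContents = new TreeMap<>(); // stands in for one HFile
    private Long pendingSeekKey; // non-null means only a "fake" seek was done so far
    private Long topKey;         // key this scanner advertises to the heap

    void requestSeek(long key) {
        // No block-cache lookup or disk read here: remember the key and pretend
        // we are already positioned on it, so heap ordering still works.
        pendingSeekKey = key;
        topKey = key;
    }

    Long peekKey() {
        return topKey;
    }

    String next() {
        if (pendingSeekKey != null) {
            // The heap really asked for our data: pay for the real seek now.
            // This is where blocks are found in the cache or read from disk.
            topKey = fileContents.ceilingKey(pendingSeekKey);
            pendingSeekKey = null;
        }
        return topKey == null ? null : fileContents.get(topKey);
    }
}
{code}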
In addition, in step (1), filtering the HFile scanners requires reading HFile
metadata and bloom filters -- in most cases these blocks can be found in the cache.
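Just to illustrate the kind of filtering meant here, with a Set standing in for
the per-file bloom filter and names invented for this sketch:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: files whose bloom filter cannot contain the row are
// dropped before any seek; reading those bloom/metadata blocks is what
// usually, but not always, hits the block cache. A Set stands in for a
// real bloom filter here.
public class ScannerFilteringSketch {
    static List<String> selectFiles(Map<String, Set<String>> bloomByFile, String rowKey) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Set<String>> file : bloomByFile.entrySet()) {
            if (file.getValue().contains(rowKey)) { // "mightContain" for a real bloom filter
                selected.add(file.getKey());
            }
        }
        return selected;
    }
}
{code}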
The optimization implemented in this Jira takes a different approach: as a
first step it looks only in the memory segments. When the data is found in
memory this indeed reduces latency, since it avoids reading HFile metadata and
bloom filters and managing a bigger scanner heap; but when the data is only on
disk it incurs the overhead of scanning the data twice (memory only, and then a
full scan).
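The trade-off can be sketched as follows; the helper names are invented for
this comment and do not necessarily match how the attached patch wires this
into the read path:
{code:java}
import java.util.Optional;

// Sketch of the speculative memory-first get (names invented for this comment,
// not taken from the attached patch).
public class MemoryFirstGetSketch {

    interface Store {
        // includeFiles=false scans memstore segments only; an empty result
        // means "not found or possibly incomplete".
        Optional<String> scan(boolean includeFiles, long rowKey);
    }

    static Optional<String> get(Store store, long rowKey) {
        // First pass: memory only. Cheap when the row is memory-resident, since
        // no HFile metadata or bloom filters are read and the scanner heap is small.
        Optional<String> memOnly = store.scan(false, rowKey);
        if (memOnly.isPresent()) {
            return memOnly;
        }
        // Fallback: full scan over memory and files. This is where the
        // optimization pays twice, because the memory segments are scanned again.
        return store.scan(true, rowKey);
    }
}
{code}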
The question is, given this understanding, is there a point in having the new
optimization, or are we satisfied with the current one?
Is there a known scenario where not all bloom filters and metadata blocks are
found in the cache?
> Scan-Memory-First Optimization for Get Operations
> -------------------------------------------------
>
> Key: HBASE-17339
> URL: https://issues.apache.org/jira/browse/HBASE-17339
> Project: HBase
> Issue Type: Improvement
> Reporter: Eshcar Hillel
> Assignee: Eshcar Hillel
> Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch,
> HBASE-17339-V03.patch, HBASE-17339-V03.patch, HBASE-17339-V04.patch,
> HBASE-17339-V05.patch, HBASE-17339-V06.patch, read-latency-mixed-workload.jpg
>
>
> The current implementation of a get operation (to retrieve values for a
> specific key) scans through all relevant stores of the region; for each store
> both memory components (memstore segments) and disk components (HFiles) are
> scanned in parallel.
> We suggest applying an optimization that speculatively scans memory-only
> components first and, only if the result is incomplete, scans both memory and
> disk.