[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780373#comment-15780373 ]
Eshcar Hillel commented on HBASE-17339:
---------------------------------------
Thanks [~yangzhe1991] for your suggestion.
I agree that a server-level configuration is not appropriate. I used it only
because it made the optimization easier to benchmark. Your suggestion to verify
that memory timestamps (TSs) are larger than flushed TSs is also reasonable.
However, given the current implementation, I think this should be a table-level
property rather than a CF (column family) property.
This is how the get operation is currently implemented at the region level:
1. in all relevant CFs, open all relevant scanners (both memory-segment scanners
and HFile scanners); this includes initiating each scanner and seeking to the
key;
2. get the result as defined by the scan object.
Already in the seek step of phase 1, the operation accesses HFile blocks, which
may have side effects on the block cache.
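To illustrate the possible block-cache side effect, here is a toy Java model. It is a simplified LRU cache built on `LinkedHashMap`, not HBase's actual block cache, and the names (`SeekSideEffectSketch`, `LruBlockCacheModel`, `replay`, the block ids) are hypothetical. It assumes that every HFile seek in phase 1 loads one block through the cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the block-cache side effect of phase-1 seeks (not HBase's block cache). */
public class SeekSideEffectSketch {

    /** Minimal LRU cache of block ids, built on LinkedHashMap's access order. */
    static final class LruBlockCacheModel extends LinkedHashMap<String, Boolean> {
        private final int capacity;
        int misses = 0;

        LruBlockCacheModel(int capacity) {
            super(16, 0.75f, true); // accessOrder = true: iteration order is least recently used first
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
            return size() > capacity; // evict the least recently used block when over capacity
        }

        /** Reads a block through the cache; a miss stands for a disk read plus an insertion. */
        void readBlock(String blockId) {
            if (get(blockId) == null) {
                misses++;
                put(blockId, Boolean.TRUE);
            }
        }
    }

    /**
     * Replays a small access pattern on a two-block cache. seekHFile models whether a get
     * whose answer is in the memstore still seeks an HFile while opening its scanners.
     */
    static int replay(boolean seekHFile) {
        LruBlockCacheModel cache = new LruBlockCacheModel(2);
        cache.readBlock("hot-A");
        cache.readBlock("hot-B"); // the cache now holds the hot working set
        if (seekHFile) {
            cache.readBlock("hfile-block-for-r1"); // phase-1 seek of the current get path
        }
        cache.readBlock("hot-A");
        cache.readBlock("hot-B");
        return cache.misses;
    }

    public static void main(String[] args) {
        System.out.println("misses without HFile seek: " + replay(false));
        System.out.println("misses with HFile seek:    " + replay(true));
    }
}
```

With a two-block cache, `replay(false)` costs 2 misses while `replay(true)` costs 5: the single seek for a row whose value was already in memory evicts the hot blocks, which then have to be reloaded.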
We aim to change this to:
{code}
if the optimization is applicable
  1. open all relevant *memory* scanners
  2. get results
  ONLY if the result is not complete:
    3. open all scanners
    4. get results
else
  1. open all scanners
  2. get results
{code}
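A minimal Java sketch of the proposed control flow follows. All types are hypothetical stand-ins rather than HBase classes (`Store.getScanners(boolean memoryOnly)` in particular is an assumed interface), and the completeness test used here (every CF returned a value from memory) is a simplification that is only safe when memory versions are known to be newer than flushed ones and a single version is requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the proposed memory-first get; none of these types are HBase classes. */
public class MemoryFirstGetSketch {

    /** One version of a value. */
    static final class Cell {
        final long ts;
        final String value;
        Cell(long ts, String value) { this.ts = ts; this.value = value; }
    }

    /** A memstore segment (onDisk = false) or an HFile (onDisk = true). */
    static final class Component {
        final boolean onDisk;
        final Map<String, Cell> cells = new HashMap<>();
        Component(boolean onDisk) { this.onDisk = onDisk; }
    }

    /** Store (CF) level: returns scanners on request and knows nothing about the phases. */
    static final class Store {
        final List<Component> components;
        Store(Component... components) { this.components = Arrays.asList(components); }

        List<Component> getScanners(boolean memoryOnly) {
            List<Component> out = new ArrayList<>();
            for (Component c : components) {
                if (!memoryOnly || !c.onDisk) {
                    out.add(c);
                }
            }
            return out;
        }
    }

    /** Number of HFile seeks performed, i.e. potential block-cache traffic. */
    static int hfileSeeks = 0;

    /** Opens the requested scanners in every CF and keeps the newest version per CF. */
    static Map<String, String> scan(Map<String, Store> region, String row, boolean memoryOnly) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Store> cf : region.entrySet()) {
            Cell newest = null;
            for (Component c : cf.getValue().getScanners(memoryOnly)) {
                if (c.onDisk) {
                    hfileSeeks++; // seeking an HFile scanner touches its blocks
                }
                Cell cell = c.cells.get(row);
                if (cell != null && (newest == null || cell.ts > newest.ts)) {
                    newest = cell;
                }
            }
            if (newest != null) {
                result.put(cf.getKey(), newest.value);
            }
        }
        return result;
    }

    /** Region level: the single point that decides which steps run. */
    static Map<String, String> get(Map<String, Store> region, String row, boolean optimizationApplicable) {
        if (optimizationApplicable) {
            Map<String, String> memResult = scan(region, row, true); // steps 1-2: memory scanners only
            // ASSUMED completeness test: every CF produced a value from memory.
            if (memResult.size() == region.size()) {
                return memResult;
            }
        }
        return scan(region, row, false); // steps 3-4, or the else branch: all scanners
    }

    static Map<String, Store> sampleRegion() {
        Component memstore = new Component(false);
        memstore.cells.put("r1", new Cell(20L, "new"));
        Component hfile = new Component(true);
        hfile.cells.put("r1", new Cell(10L, "old"));
        hfile.cells.put("r2", new Cell(5L, "disk-only"));
        Map<String, Store> region = new LinkedHashMap<>();
        region.put("cf1", new Store(memstore, hfile));
        return region;
    }

    public static void main(String[] args) {
        Map<String, Store> region = sampleRegion();
        System.out.println(get(region, "r1", true) + " hfileSeeks=" + hfileSeeks); // answered from memory
        System.out.println(get(region, "r2", true) + " hfileSeeks=" + hfileSeeks); // falls back to all scanners
    }
}
```

Note that `Store` never learns which phase is running; only the region-level `get` chooses between the memory-only scan and the full scan.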
This way, the get operation can avoid unnecessary HFile accesses. It also gives
us a single point at which to decide which steps to execute.
This optimization is a best-effort heuristic. Even when all TSs are generated
by the server, the operation may need to run a full scan after the memory-only
scan if the results might be incomplete.
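One possible way to decide whether a memory-only result is complete is sketched below under assumptions of our own; it is not the check in the attached patch. For each requested column, memory must hold at least as many versions as the get asks for, each with a TS larger than the largest TS already flushed to HFiles. Deletes are ignored, and a get without an explicit column list is always treated as possibly incomplete, because a column may exist only on disk. `MemoryResultCompleteness` and `isComplete` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical completeness check for the memory-only phase of a get (not HBase code). */
public class MemoryResultCompleteness {

    /**
     * @param requestedColumns columns named in the get; empty means the whole family
     * @param maxVersions      number of versions the get asks for per column
     * @param memoryTs         per column, the TSs of the versions found in memory segments
     * @param maxFlushedTs     largest TS of any cell already flushed to HFiles in this store
     * @return true only if no HFile can contribute to the result
     */
    static boolean isComplete(List<String> requestedColumns, int maxVersions,
                              Map<String, List<Long>> memoryTs, long maxFlushedTs) {
        if (requestedColumns.isEmpty()) {
            // Whole-family get: some column may exist only on disk, so memory cannot prove completeness.
            return false;
        }
        for (String column : requestedColumns) {
            int newerThanDisk = 0;
            for (long ts : memoryTs.getOrDefault(column, Collections.<Long>emptyList())) {
                if (ts > maxFlushedTs) {
                    newerThanDisk++;
                }
            }
            if (newerThanDisk < maxVersions) {
                // One of the requested versions may still live in an HFile.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> memoryTs = new HashMap<>();
        memoryTs.put("q1", Arrays.asList(30L, 25L));
        memoryTs.put("q2", Arrays.asList(12L));
        long maxFlushedTs = 20L;

        System.out.println(isComplete(Arrays.asList("q1"), 1, memoryTs, maxFlushedTs)); // true
        System.out.println(isComplete(Arrays.asList("q1"), 3, memoryTs, maxFlushedTs)); // false: only 2 versions in memory
        System.out.println(isComplete(Arrays.asList("q2"), 1, memoryTs, maxFlushedTs)); // false: 12 is not newer than 20
        System.out.println(isComplete(Collections.<String>emptyList(), 1, memoryTs, maxFlushedTs)); // false: whole family
    }
}
```

This shows why server-generated TSs are not enough on their own: even when every memory TS is newer than every flushed TS, a whole-family get, or a request for more versions than memory holds, still requires the full scan.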
The store level (CF level) only provides scanners on request; it is not aware
of which step of the optimization is running.
Therefore, it is reasonable to make this a table-level property.
> Scan-Memory-First Optimization for Get Operation
> ------------------------------------------------
>
> Key: HBASE-17339
> URL: https://issues.apache.org/jira/browse/HBASE-17339
> Project: HBase
> Issue Type: Improvement
> Reporter: Eshcar Hillel
> Attachments: HBASE-17339-V01.patch
>
>
> The current implementation of a get operation (to retrieve values for a
> specific key) scans through all relevant stores of the region; for each store,
> both memory components (memstore segments) and disk components (HFiles) are
> scanned in parallel.
> We suggest applying an optimization that speculatively scans memory-only
> components first and, only if the result is incomplete, scans both memory and
> disk.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)