[ 
https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780373#comment-15780373
 ] 

Eshcar Hillel commented on HBASE-17339:
---------------------------------------

Thanks [~yangzhe1991] for your suggestion.
I agree that a server-level configuration is not appropriate; I used it only 
because it made the optimization easier to benchmark. Your suggestion to verify 
that memory timestamps (TSs) are larger than flushed TSs is also reasonable.
However, given the current implementation, I think this should be a table-level 
property rather than a CF property.
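
To make the applicability condition concrete, here is a minimal sketch of the 
check discussed above. The class, method, and parameter names are hypothetical 
and are not taken from the patch; how the region would actually track the two 
timestamp bounds is an assumption left open here.

```java
/** Hypothetical helper; not part of HBase or of the attached patch. */
final class MemoryFirstApplicability {

    /**
     * Returns true only if the table opted in (assumed table-level flag) and
     * every timestamp in memory is strictly larger than every flushed timestamp.
     */
    static boolean isApplicable(boolean tableOptIn,
                                long minMemoryTimestamp,
                                long maxFlushedTimestamp) {
        return tableOptIn && minMemoryTimestamp > maxFlushedTimestamp;
    }
}
```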

This is how the get operation is currently implemented at the region level:
1. in all relevant CFs, open all relevant scanners (both memory-segment 
scanners and HFile scanners); this includes initializing each scanner and 
seeking to the key;
2. get the result as defined by the scan object.

Already in the seek step of phase 1, the operation accesses HFile blocks, which 
may have side effects on the block cache. 

We aim to change this into:
{code}
if the optimization is applicable 
 1. open all relevant *memory* scanners
 2. get results
 ONLY if result is not complete
  3. open all scanners
  4. get results
else
 1. open all scanners
 2. get results
{code}
This way the get operation can avoid unnecessary HFile access, and there is a 
single point where we decide which steps to execute. 
This optimization is a best-effort heuristic. Even when all TSs are generated 
by the server, the operation may still need to run a full scan after the 
memory-only scan if the result may be incomplete.
The store level (CF level) only provides scanners as requested; it is not aware 
of which step of the optimization is running. 
Therefore it is reasonable to make this a table-level property.
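
The control flow described above can be sketched in Java roughly as follows. 
This is an illustration under simplifying assumptions, not the patch itself: 
the Scanner type and the scan/get methods are hypothetical stand-ins, the 
scanners are assumed to be ordered newest first, and a result counts as 
"complete" when every requested column was found (timestamps, versions, and 
delete markers are ignored).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a memory-first get; none of these are HBase classes. */
final class MemoryFirstGet {

    /** Stand-in for a memory-segment scanner or an HFile scanner. */
    static final class Scanner {
        final boolean inMemory;
        final Map<String, String> cells; // column -> value for the requested row
        int seeks = 0;                   // times this scanner was opened and sought

        Scanner(boolean inMemory, Map<String, String> cells) {
            this.inMemory = inMemory;
            this.cells = cells;
        }
    }

    /** Opens the selected scanners and collects the requested columns. */
    static Map<String, String> scan(List<Scanner> scanners,
                                    List<String> columns,
                                    boolean memoryOnly) {
        Map<String, String> result = new HashMap<>();
        for (Scanner s : scanners) {
            if (memoryOnly && !s.inMemory) {
                continue; // HFile scanner is never opened, so no block access
            }
            s.seeks++;
            for (String column : columns) {
                if (s.cells.containsKey(column)) {
                    // Earlier (assumed newer) scanners take precedence.
                    result.putIfAbsent(column, s.cells.get(column));
                }
            }
        }
        return result;
    }

    /** Single decision point: memory-only scan first, full scan as fallback. */
    static Map<String, String> get(List<Scanner> scanners,
                                   List<String> columns,
                                   boolean optimizationApplicable) {
        if (optimizationApplicable) {
            Map<String, String> fromMemory = scan(scanners, columns, true);
            if (fromMemory.keySet().containsAll(columns)) {
                return fromMemory; // complete: disk was never touched
            }
        }
        return scan(scanners, columns, false);
    }
}
```

Note that in this sketch the scan method, like the store level, only serves 
the scanners it is asked for; the decision about which step to run lives 
entirely in get.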
 

> Scan-Memory-First Optimization for Get Operation
> ------------------------------------------------
>
>                 Key: HBASE-17339
>                 URL: https://issues.apache.org/jira/browse/HBASE-17339
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Eshcar Hillel
>         Attachments: HBASE-17339-V01.patch
>
>
> The current implementation of a get operation (to retrieve values for a 
> specific key) scans through all relevant stores of the region; for each store 
> both memory components (memstore segments) and disk components (HFiles) are 
> scanned in parallel.
> We suggest an optimization that speculatively scans the memory-only 
> components first and scans both memory and disk only if the result is 
> incomplete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
