[ https://issues.apache.org/jira/browse/HBASE-17339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839458#comment-15839458 ]

Eshcar Hillel commented on HBASE-17339:
---------------------------------------

The attached patch is incomplete and has not been properly tested yet, so it may
have some bugs (but it compiles :) ).
I'm posting it to get feedback on the core logic.
The main property needed for this optimization is monotonicity. A store 
preserves *monotonicity* if all timestamps in its memstore are strictly greater 
than all timestamps in its store files.
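To make the definition concrete, here is a minimal sketch of how a single store might track monotonicity. It is not the patch's code: the class MonotonicStoreTracker and its methods are hypothetical, and it assumes, for simplicity, that a flush moves the entire memstore to a new store file with no concurrent writes (a real flush works on a snapshot while writes continue). A write whose timestamp is not strictly greater than the max flushed timestamp breaks the property, and returning null follows the convention used in step 1 of the algorithm below.

```java
// Hypothetical per-store tracker (not the patch's code). Simplification: a flush
// moves the entire memstore to a new store file, with no concurrent writes.
final class MonotonicStoreTracker {
    // Largest timestamp written to any store file; MIN_VALUE means no store files yet.
    private long maxFlushedTimestamp = Long.MIN_VALUE;
    // Largest timestamp currently held in the memstore; MIN_VALUE means it is empty.
    private long maxMemstoreTimestamp = Long.MIN_VALUE;
    // True while every memstore timestamp is strictly greater than maxFlushedTimestamp.
    private boolean monotonic = true;

    // Records a write of a cell with the given timestamp into the memstore.
    synchronized void onWrite(long timestamp) {
        if (timestamp <= maxFlushedTimestamp) {
            monotonic = false; // a version not newer than flushed data now sits in memory
        }
        maxMemstoreTimestamp = Math.max(maxMemstoreTimestamp, timestamp);
    }

    // Records that the whole memstore was flushed to a new store file.
    synchronized void onFullFlush() {
        maxFlushedTimestamp = Math.max(maxFlushedTimestamp, maxMemstoreTimestamp);
        maxMemstoreTimestamp = Long.MIN_VALUE;
        monotonic = true; // an empty memstore is trivially monotonic
    }

    // Returns the max flushed timestamp, or null if the store is not monotonic.
    synchronized Long maxFlushedTimestampIfMonotonic() {
        return monotonic ? maxFlushedTimestamp : null;
    }
}
```

For example, after writing timestamp 10 and flushing, a write at 11 keeps the store monotonic, while a later write at 5 breaks it until the next full flush.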

The algorithm is as follows:
{code}
0. decide whether to apply the optimization: (1) the flag is on and
   (2) the get operation is over a specific set of columns
if decided to apply the optimization then
  1. open all relevant *memory* scanners;
     while opening the scanners, collect the max flushed timestamps of all stores (first collect);
     a null timestamp indicates that the store does not maintain monotonicity
  2. if all stores are monotonic then
       2.1 get results
       2.2 validate monotonicity: check that the max flushed timestamps have not changed in any store
           (the double collect ensures that the results are taken from a consistent view)
if decided not to apply the optimization
   *OR* the stores are not monotonic
   *OR* the optimization was applied but the results do not satisfy the get operation
        (not enough versions per column)
then
  3. open all scanners
  4. get results
{code}
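The pseudocode above can be sketched in Java roughly as follows. This is a simplified model under stated assumptions, not HBase's API: SketchStore, FixedStore and MemoryFirstGet are hypothetical names, a cell is reduced to its timestamp, and deletes, TTLs and filters are ignored. Falling back to the full scan when the second collect differs from the first is my reading of step 2.2; the pseudocode does not spell out that case.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified view of a store (NOT the HBase Store API).
interface SketchStore {
    // Max flushed timestamp, or null if the store does not maintain monotonicity.
    Long maxFlushedTimestampIfMonotonic();
    // Timestamps of the versions of `column` held in memory (memstore segments).
    List<Long> memoryVersions(String column);
    // Timestamps of the versions of `column` held on disk (store files).
    List<Long> fileVersions(String column);
}

// Fixed-content store, used only to exercise the sketch.
final class FixedStore implements SketchStore {
    private final Long maxFlushed;
    private final Map<String, List<Long>> memory;
    private final Map<String, List<Long>> files;

    FixedStore(Long maxFlushed, Map<String, List<Long>> memory, Map<String, List<Long>> files) {
        this.maxFlushed = maxFlushed;
        this.memory = memory;
        this.files = files;
    }

    public Long maxFlushedTimestampIfMonotonic() { return maxFlushed; }
    public List<Long> memoryVersions(String column) { return memory.getOrDefault(column, List.of()); }
    public List<Long> fileVersions(String column) { return files.getOrDefault(column, List.of()); }
}

final class MemoryFirstGet {
    // Collects the max flushed timestamps of all stores; null if any store is not monotonic.
    static List<Long> collectMaxFlushed(List<SketchStore> stores) {
        List<Long> snapshot = new ArrayList<>();
        for (SketchStore store : stores) {
            Long ts = store.maxFlushedTimestampIfMonotonic();
            if (ts == null) {
                return null;
            }
            snapshot.add(ts);
        }
        return snapshot;
    }

    // Returns the newest `maxVersions` timestamps per column, from memory only or memory and disk.
    static Map<String, List<Long>> read(List<SketchStore> stores, Set<String> columns,
                                        int maxVersions, boolean memoryOnly) {
        Map<String, List<Long>> result = new HashMap<>();
        for (String column : columns) {
            List<Long> versions = new ArrayList<>();
            for (SketchStore store : stores) {
                versions.addAll(store.memoryVersions(column));
                if (!memoryOnly) {
                    versions.addAll(store.fileVersions(column));
                }
            }
            versions.sort(Comparator.reverseOrder()); // newest first
            result.put(column, new ArrayList<>(versions.subList(0, Math.min(maxVersions, versions.size()))));
        }
        return result;
    }

    static Map<String, List<Long>> get(List<SketchStore> stores, Set<String> columns,
                                       int maxVersions, boolean optimizationEnabled) {
        // Step 0: apply only if the flag is on and the get names specific columns.
        if (optimizationEnabled && !columns.isEmpty()) {
            List<Long> first = collectMaxFlushed(stores);              // step 1: first collect
            if (first != null) {                                        // step 2: all stores monotonic
                Map<String, List<Long>> result = read(stores, columns, maxVersions, true); // 2.1
                List<Long> second = collectMaxFlushed(stores);          // 2.2: second collect
                boolean complete = result.values().stream()
                        .allMatch(versions -> versions.size() >= maxVersions);
                if (first.equals(second) && complete) {
                    return result;                                      // answered from memory only
                }
            }
        }
        // Steps 3-4: open all scanners (memory and disk) and get the results.
        return read(stores, columns, maxVersions, false);
    }
}
```

For instance, with a monotonic store holding versions 12 and 11 in memory and 10 on disk, a get for two versions is answered from memory alone, while a get for three versions falls back to the full scan. Under monotonicity the memory-only answer is safe because every memory version is newer than every disk version.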

Missing parts (TODOs):
- properly initialize maxFlushedTimestamp (in AbstractMemStore) when recovering;
this requires traversing all existing store files
- make memoryScanOptimization a table property instead of a global property; set
it to true by default
- (Optional) add a flag to the Get operation that indicates whether the user wants
to apply the optimization (per operation!); set it to true by default
- (Optional) check whether we can change the implementation of getScanners in
XXXMemstore to return multiple scanners, so that each of them can later be
filtered out individually instead of keeping all or eliminating all of them.
Currently, both the default and the compacting implementations return a
singleton list with one MemStoreScanner that comprises one to a few segment
scanners.
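The first three TODOs could look roughly like the following helpers. These are illustrative assumptions only: MemoryScanOptimizationTodos and its methods are hypothetical, recovery assumes each store file can report the largest timestamp it contains, and the rule that the optimization runs only when both the table property and the per-Get flag allow it is my guess at the intended semantics (an unset value defaults to true, as proposed above).

```java
import java.util.List;

// Hypothetical helpers for the TODOs above; names and semantics are illustrative only.
final class MemoryScanOptimizationTodos {
    // Recovery: recompute maxFlushedTimestamp by traversing all existing store files,
    // assuming each file can report the largest timestamp it contains.
    static long initMaxFlushedTimestamp(List<Long> storeFileMaxTimestamps) {
        long max = Long.MIN_VALUE; // no store files: every memstore timestamp is "newer"
        for (long ts : storeFileMaxTimestamps) {
            max = Math.max(max, ts);
        }
        return max;
    }

    // Flag resolution: a null argument means "not set" and defaults to true.
    // Assumption: the optimization runs only if both the table and the Get allow it.
    static boolean shouldApply(Boolean tableProperty, Boolean perGetFlag) {
        boolean tableEnabled = tableProperty == null || tableProperty;
        boolean getEnabled = perGetFlag == null || perGetFlag;
        return tableEnabled && getEnabled;
    }
}
```

Letting a per-Get flag only narrow the table setting keeps the table property as the administrator's switch, while still allowing individual operations to opt out.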


> Scan-Memory-First Optimization for Get Operations
> -------------------------------------------------
>
>                 Key: HBASE-17339
>                 URL: https://issues.apache.org/jira/browse/HBASE-17339
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Eshcar Hillel
>         Attachments: HBASE-17339-V01.patch, HBASE-17339-V02.patch
>
>
> The current implementation of a get operation (which retrieves the values for a
> specific key) scans all relevant stores of the region; for each store, both
> the memory components (memstore segments) and the disk components (HFiles) are
> scanned in parallel.
> We suggest applying an optimization that speculatively scans only the memory
> components first and, only if the result is incomplete, scans both memory and
> disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
