[ 
https://issues.apache.org/jira/browse/HBASE-9488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764485#comment-13764485
 ] 

Dave Latham commented on HBASE-9488:
------------------------------------

Brainstorming on other approaches that would do more automatically instead of
adding to the client API:
It looks like this patch goes for 2 optimizations for small scans: using pread
and reducing RPCs.

For preads we could go with HBASE-7266 and turn them on for all scans.  Another
possibility would be to use a pread for the first batch of results, and then
flip to seek+read only if the scanner remains open for more batches.
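
As a rough illustration of that second possibility, here is a minimal Java
sketch of the per-scanner policy: pread for the first batch, seek+read for any
later batch.  The class and method names (ReadModePolicySketch,
modeForNextBatch) are hypothetical and are not HBase APIs; the sketch only
models the decision, not the actual file reads.

```java
// Hypothetical sketch, not HBase code: models only the choice of read mode.
class ReadModePolicySketch {
    enum ReadMode { PREAD, SEEK_READ }

    // Number of batches this scanner has already been asked for.
    private int batchesServed = 0;

    /** Returns the read mode to use for the next batch of results. */
    ReadMode modeForNextBatch() {
        // First batch: assume the scan may be small, so use a positional read.
        // Any later batch: the scanner stayed open, so switch to seek+read.
        ReadMode mode = (batchesServed == 0) ? ReadMode.PREAD : ReadMode.SEEK_READ;
        batchesServed++;
        return mode;
    }

    public static void main(String[] args) {
        ReadModePolicySketch scanner = new ReadModePolicySketch();
        for (int batch = 0; batch < 3; batch++) {
            System.out.println("batch " + batch + ": " + scanner.modeForNextBatch());
        }
    }
}
```

A scan that finishes within its first batch never leaves pread mode, which is
the case the patch is targeting.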

For reducing RPCs, as [~nkeywal] points out, we should return the first set of
results for all scans (Is 0.96 already doing this?  0.94 doesn't).  Then, if the
stopRow is hit or a filter indicates completion, we should also be able to close
the scanner eagerly (again, I think 0.96 does this).  That would mean that
scans that are small because of their range (startRow to stopRow) or their
filters would complete in a single RPC.  However, some small scans may be short
due to other conditions.  For example, the META prefetch scan uses a fixed
limit of rows.  Perhaps adding a row limit to a scan would be a clearer API?
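
To make the RPC arithmetic concrete, here is a small Java sketch that counts
round trips under a simplified model.  Everything in it is an assumption for
illustration: the class ScanRpcCountSketch and its methods are hypothetical,
the table is an in-memory TreeMap, and the server is assumed to know the scan
is complete in the same call that returns the last row.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model, not HBase code: counts RPCs for a scan under two protocols.
class ScanRpcCountSketch {
    /** Rows a scan returns: rows in [startRow, stopRow), capped by rowLimit. */
    static int rowsReturned(NavigableMap<String, String> table,
                            String startRow, String stopRow, int rowLimit) {
        int inRange = table.subMap(startRow, true, stopRow, false).size();
        return Math.min(inRange, rowLimit);
    }

    /** Batches needed to ship the rows; even an empty scan needs one reply. */
    static int batches(int rows, int caching) {
        return Math.max(1, (rows + caching - 1) / caching);
    }

    /** Separate open, next (one per batch) and close calls. */
    static int classicRpcs(int rows, int caching) {
        return 1 + batches(rows, caching) + 1;
    }

    /** Open carries the first batch; the scanner is closed eagerly with the last batch. */
    static int eagerRpcs(int rows, int caching) {
        return batches(rows, caching);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            table.put(String.format("row%04d", i), "v");
        }
        int caching = 100;
        // Small because of its range.
        int byRange = rowsReturned(table, "row0100", "row0110", Integer.MAX_VALUE);
        // Small because of a row limit, in the style of the META prefetch scan.
        int byLimit = rowsReturned(table, "row0100", "row0900", 10);
        System.out.println("range: classic=" + classicRpcs(byRange, caching)
                + " eager=" + eagerRpcs(byRange, caching));
        System.out.println("limit: classic=" + classicRpcs(byLimit, caching)
                + " eager=" + eagerRpcs(byLimit, caching));
    }
}
```

Under this model, a scan bounded by a row limit finishes in one round trip
just like a scan bounded by stopRow, provided the limit is known to the server.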

With these together it seems that short scans would automatically result in 
just a pread and a single RPC without having to add to the API.

All that said, I don't want to rain on working code.  Could we conceivably 
start with this patch then remove the method later if we get the above?  What 
do other people think about the ideas?
                
> Improve performance for small scan
> ----------------------------------
>
>                 Key: HBASE-9488
>                 URL: https://issues.apache.org/jira/browse/HBASE-9488
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, Performance, Scanners
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.98.0, 0.94.13
>
>         Attachments: hbase-9488-94-v3.patch, HBASE-9488-trunk.patch, 
> HBASE-9488-trunkV2.patch, HBASE-9488-trunkV3.patch, HBASE-9488-trunkV4.patch, 
> HBASE-9488-trunkV4.patch, mergeRpcCallForScan.patch, test results.jpg
>
>
> review board:
> https://reviews.apache.org/r/14059/
> *Performance Improvement*
> Tests show about a 1.5x to 3x improvement for small scans where limit<=50 at
> a 100% cache hit ratio.
> See the picture attachment for more performance test results.
> *Usage:*
> Scan scan = new Scan(startRow,stopRow);
> scan.setSmall(true);
> ResultScanner scanner = table.getScanner(scan);
> Set the new 'small' attribute to true on the Scan object; everything else
> is unchanged.
> Currently, one scan operation makes at least 3 RPC calls:
> openScanner();
> next();
> closeScanner();
> I think we could reduce this to a single RPC call for small scans to get
> better performance.
> Also, using pread is better than seek+read for small scans (see HBASE-7266
> for more on this point).
> The patch implements such a small scan; the performance test was run as
> follows:
> a. Environment:
> 0.94 with the patch applied;
> one regionserver; 
> one client with 50 concurrent threads;
> KV size:50/100;
> 100% LRU cache hit ratio;
> Random start row of scan
> b.Results:
> See the picture attachment
>  

