[
https://issues.apache.org/jira/browse/HBASE-9488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764485#comment-13764485
]
Dave Latham commented on HBASE-9488:
------------------------------------
Brainstorming other approaches that would do more automatically instead of
adding to the client API:
It looks like this patch makes two optimizations for small scans: using pread
and reducing RPCs.
For preads we could go with HBASE-7266 and turn them on for all scans. Another
possibility would be to use a pread for the first batch of results, and then
flip to seek+read only if the scanner remains open for more batches.
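To make the second possibility concrete, here is a minimal sketch of the per-batch decision. The names ReadModeSketch, ReadMode and modeForBatch are hypothetical and exist only for illustration; this is not HBase code and does not touch any real store-file reader.

```java
// Hypothetical sketch, not HBase code: pick the read mode for each batch of a scan.
final class ReadModeSketch {
    enum ReadMode { PREAD, SEEK_READ }

    /**
     * batchIndex is 0-based: batch 0 is the first set of results sent to the client.
     * The first batch uses a positional read; later batches switch to seek+read,
     * on the assumption that a scanner still open after one batch is a long scan.
     */
    static ReadMode modeForBatch(int batchIndex) {
        if (batchIndex < 0) {
            throw new IllegalArgumentException("batchIndex must be >= 0");
        }
        return batchIndex == 0 ? ReadMode.PREAD : ReadMode.SEEK_READ;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println("batch " + i + " -> " + modeForBatch(i));
        }
    }
}
```

A scan that finishes within its first batch would never leave pread under this policy, which is the behavior the small-scan flag asks for explicitly.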
For reducing RPCs, as [~nkeywal] points out, we should return the first set of
results for all scans (Is 0.96 already doing this? 0.94 doesn't). Then if the
stopRow is hit or a filter indicates completion, we should also be able to close
the scanner eagerly (again, I think 0.96 does this). That would mean that
scans that are small because of their range (startRow to stopRow) or their
filters would complete in a single RPC. However, some scans are small for other
reasons. For example, the META prefetch scan uses a fixed limit of rows.
Perhaps adding a row limit to a scan would be a clearer API?
With these together it seems that short scans would automatically result in
just a pread and a single RPC without having to add to the API.
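The RPC arithmetic above can be checked with a toy model. This is only a sketch under simplifying assumptions: rows are plain ints in a sorted array, the class ScanRpcSketch and its scan method are hypothetical, and the "classic" client is assumed to learn that the scan is finished without an extra empty next() call. It does not use or imitate the real HBase client classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not HBase code: count the RPCs needed for one scan.
final class ScanRpcSketch {
    static final class Outcome {
        final List<Integer> rows = new ArrayList<>();
        int rpcs;
    }

    /**
     * eager == false: separate open, next (one per batch) and close RPCs.
     * eager == true: the open call carries the first batch, and the server
     * closes the scanner itself once it hits stopRow, the row limit, or the
     * end of the data, so no close RPC is needed.
     */
    static Outcome scan(int[] sortedRows, int startRow, int stopRow,
                        int rowLimit, int batchSize, boolean eager) {
        if (rowLimit <= 0 || batchSize <= 0) {
            throw new IllegalArgumentException("rowLimit and batchSize must be positive");
        }
        Outcome out = new Outcome();
        int pos = 0;
        while (pos < sortedRows.length && sortedRows[pos] < startRow) {
            pos++; // seek to the start row
        }
        if (!eager) {
            out.rpcs++; // separate openScanner() call
        }
        boolean done = false;
        while (!done) {
            out.rpcs++; // one RPC per batch of results
            int inBatch = 0;
            while (inBatch < batchSize) {
                if (pos >= sortedRows.length || sortedRows[pos] >= stopRow) {
                    done = true; // stop row or end of data reached
                    break;
                }
                out.rows.add(sortedRows[pos++]);
                inBatch++;
                if (out.rows.size() == rowLimit) {
                    done = true; // row limit reached
                    break;
                }
            }
        }
        if (!eager) {
            out.rpcs++; // separate closeScanner() call
        }
        return out;
    }

    public static void main(String[] args) {
        int[] table = new int[100];
        for (int i = 0; i < table.length; i++) {
            table[i] = i;
        }
        Outcome classic = scan(table, 10, 15, Integer.MAX_VALUE, 100, false);
        Outcome merged = scan(table, 10, 15, Integer.MAX_VALUE, 100, true);
        System.out.println("classic RPCs: " + classic.rpcs + ", eager RPCs: " + merged.rpcs);
    }
}
```

In this model a five-row range scan costs 3 RPCs with separate open/next/close calls and 1 RPC with the eager variant. A scan bounded only by a row limit (say 10 rows with a far-away stopRow) also finishes in 1 RPC, which is the case that an explicit row limit on the scan would cover.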
All that said, I don't want to rain on working code. Could we conceivably
start with this patch then remove the method later if we get the above? What
do other people think about the ideas?
> Improve performance for small scan
> ----------------------------------
>
> Key: HBASE-9488
> URL: https://issues.apache.org/jira/browse/HBASE-9488
> Project: HBase
> Issue Type: Improvement
> Components: Client, Performance, Scanners
> Reporter: chunhui shen
> Assignee: chunhui shen
> Fix For: 0.98.0, 0.94.13
>
> Attachments: hbase-9488-94-v3.patch, HBASE-9488-trunk.patch,
> HBASE-9488-trunkV2.patch, HBASE-9488-trunkV3.patch, HBASE-9488-trunkV4.patch,
> HBASE-9488-trunkV4.patch, mergeRpcCallForScan.patch, test results.jpg
>
>
> review board:
> https://reviews.apache.org/r/14059/
> *Performance Improvement*
> Tests show about a 1.5x to 3x improvement for small scans (limit <= 50) with
> a 100% cache hit ratio.
> See the attached picture for more performance test results.
> *Usage:*
> Scan scan = new Scan(startRow,stopRow);
> scan.setSmall(true);
> ResultScanner scanner = table.getScanner(scan);
> Set the new 'small' attribute to true on the Scan object; everything else
> stays the same.
> Currently, one scan operation makes at least 3 RPC calls:
> openScanner();
> next();
> closeScanner();
> I think we could reduce this to a single RPC call for small scans to get
> better performance.
> Also, pread is better than seek+read for small scans (see HBASE-7266 for more
> on this point).
> The patch implements such a small scan. The performance test was run as
> follows:
> a. Environment:
> patched on 0.94 version;
> one regionserver;
> one client with 50 concurrent threads;
> KV size: 50/100;
> 100% LRU cache hit ratio;
> random start row for each scan
> b. Results:
> See the picture attachment
>