[
https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555584#comment-13555584
]
Hudson commented on HBASE-6066:
-------------------------------
Integrated in HBase-TRUNK #3759 (See
[https://builds.apache.org/job/HBase-TRUNK/3759/])
HBASE-6066 some low hanging read path improvement ideas (Revision 1434415)
Result = FAILURE
stack :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> some low hanging read path improvement ideas
> ---------------------------------------------
>
> Key: HBASE-6066
> URL: https://issues.apache.org/jira/browse/HBASE-6066
> Project: HBase
> Issue Type: Sub-task
> Components: Performance
> Reporter: Kannan Muthukkaruppan
> Assignee: Devaraj Das
> Priority: Critical
> Labels: noob
> Fix For: 0.96.0
>
> Attachments:
> 0001-jira-HBASE-6066-89-fb-Some-read-performance-improvem.patch,
> 6066-rebased-1.patch, 6066-rebased-1.patch, metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with
> small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over and/or building of
> temporary lists.
> 1) One such is the following code in HRegionServer.next():
> {code}
> boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
> if (!values.isEmpty()) {
> for (KeyValue kv : values) { // wasteful in most cases
> currentScanResultSize += kv.heapSize();
> }
> results.add(new Result(values));
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array is "results" in
> RegionScanner.next().
> {code}
> results.clear();
> boolean returnResult = nextInternal(limit, metric);
> outResults.addAll(results);
> {code}
> "results" then gets copied over to outResults via an addAll(). Not sure why
> we cannot collect the results directly in outResults.
> 3) Another, very similar example of a wasteful array is "results" in
> StoreScanner.next(), which eventually also copies its results into
> "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
> if (metric != null) {
> HRegion.incrNumericMetric(this.metricNamePrefix + metric,
> copyKv.getLength());
> }
> results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the
> size of those KVs in a local variable and then in a finally clause increment
> the metric one shot, rather than updating AtomicLongs for each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same
> object. Both are synchronized methods. Synchronized methods calling nested
> synchronized methods on the same object are probably adding some small
> overhead. The inner next() calls isFilterDone(), which is also a
> synchronized method. We should factor the code to avoid these nested
> synchronized methods.
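Items 1, 2 and 4 above can be sketched in plain Java. This is only an illustration under stated assumptions, not the committed HBase change: ScanPathSketch, SIZE_METRIC, collect() and resultSize() are hypothetical names, and byte[] stands in for KeyValue so that the example compiles without HBase on the classpath.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; none of these names exist in HBase.
final class ScanPathSketch {

    // Stand-in for the shared "size metric" counter (assumption).
    static final AtomicLong SIZE_METRIC = new AtomicLong();

    // Items 2 and 4: append straight into the caller's list instead of a
    // temporary one, and publish the byte count once per call rather than
    // touching the AtomicLong for every cell.
    static void collect(List<byte[]> source, List<byte[]> outResults) {
        long batchBytes = 0;
        try {
            for (byte[] kv : source) {
                batchBytes += kv.length;   // local accumulation, no atomics
                outResults.add(kv);        // no intermediate list, no addAll()
            }
        } finally {
            if (batchBytes > 0) {
                SIZE_METRIC.addAndGet(batchBytes); // single atomic update
            }
        }
    }

    // Item 1: when the limit is Long.MAX_VALUE the running size can never
    // exceed it, so the per-cell iteration is skipped and 0 is returned.
    static long resultSize(List<byte[]> values, long maxScannerResultSize) {
        if (maxScannerResultSize == Long.MAX_VALUE) {
            return 0L;
        }
        long size = 0;
        for (byte[] kv : values) {
            size += kv.length;
        }
        return size;
    }
}
```

The finally clause keeps the metric accurate even if the loop throws part-way through, which matches the "increment the metric one shot" suggestion in item 4. Whether the real scanners can write directly into outResults depends on how partial results are handled on error, which this sketch does not model.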