[jira] [Commented] (HBASE-6066) some low hanging read path improvement ideas

Hudson (JIRA) Thu, 24 Jan 2013 08:05:37 -0800

    [ https://issues.apache.org/jira/browse/HBASE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561721#comment-13561721 ]


Hudson commented on HBASE-6066:
-------------------------------

Integrated in HBase-0.94-security #96 (See [https://builds.apache.org/job/HBase-0.94-security/96/])
    HBASE-7599 Port HBASE-6066 (low hanging read path improvements) to 0.94 (Devaraj Das) (Revision 1437237)

     Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java

                
> some low hanging read path improvement ideas 
> ---------------------------------------------
>
>                 Key: HBASE-6066
>                 URL: https://issues.apache.org/jira/browse/HBASE-6066
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Devaraj Das
>            Priority: Critical
>              Labels: noob
>             Fix For: 0.96.0
>
>         Attachments: 
> 0001-jira-HBASE-6066-89-fb-Some-read-performance-improvem.patch, 
> 6066-rebased-1.patch, 6066-rebased-1.patch, metric-stringbuilder-fix.patch
>
>
> I was running some single threaded scan performance tests for a table with 
> small sized rows that is fully cached. Some observations...
> We seem to be doing several wasteful iterations over temporary lists, 
> and in some cases building those lists unnecessarily.
> 1) One such is the following code in HRegionServer.next():
> {code}
>    boolean moreRows = s.next(values, HRegion.METRIC_NEXTSIZE);
>    if (!values.isEmpty()) {
>      for (KeyValue kv : values) {   // wasteful in most cases
>        currentScanResultSize += kv.heapSize();
>      }
>      results.add(new Result(values));
>    }
> {code}
> By default the "maxScannerResultSize" is Long.MAX_VALUE. In those cases,
> we can avoid the unnecessary iteration to compute currentScanResultSize.
> 2) An example of a wasteful temporary array is "results" in
> RegionScanner.next().
> {code}
>       results.clear();
>       boolean returnResult = nextInternal(limit, metric);
>       outResults.addAll(results);
> {code}
> results then gets copied over to outResults via an addAll(). Not sure why we 
> cannot collect the results directly in outResults.
> 3) Another, almost identical example of a wasteful array is "results" in 
> StoreScanner.next(), which eventually also copies its results into 
> "outResults".
> 4) Reduce overhead of "size metric" maintained in StoreScanner.next().
> {code}
>   if (metric != null) {
>      HRegion.incrNumericMetric(this.metricNamePrefix + metric,
>                                copyKv.getLength());
>   }
>   results.add(copyKv);
> {code}
> A single call to next() might fetch a lot of KVs. We can first add up the 
> sizes of those KVs in a local variable and then, in a finally clause, 
> increment the metric in one shot, rather than updating AtomicLongs for 
> each KV.
> 5) RegionScanner.next() calls a helper RegionScanner.next() on the same 
> object. Both are synchronized methods. Synchronized methods calling nested 
> synchronized methods on the same object are probably adding some small 
> overhead. The inner next() calls isFilterDone(), which is also a 
> synchronized method. We should refactor the code to avoid these nested 
> synchronized methods.
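
Ideas 1, 2 and 4 from the description above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the code in the attached patches: `Cell`, `sizeMetric`, `collect` and `resultSize` are hypothetical stand-ins (a plain length field replaces both KeyValue.heapSize() and KeyValue.getLength(), and a bare AtomicLong replaces the region metrics).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class ReadPathSketch {
    // Hypothetical shared counter standing in for the per-region size metric.
    static final AtomicLong sizeMetric = new AtomicLong();

    // Hypothetical stand-in for KeyValue; only its size matters here.
    static final class Cell {
        final int length;
        Cell(int length) { this.length = length; }
    }

    // Ideas 2 and 4: add cells straight into the caller's list, sum their
    // sizes in a local variable, and touch the AtomicLong once per call.
    static void collect(List<Cell> source, List<Cell> outResults) {
        long batchBytes = 0;
        try {
            for (Cell c : source) {
                batchBytes += c.length;
                outResults.add(c);
            }
        } finally {
            if (batchBytes > 0) {
                sizeMetric.addAndGet(batchBytes);
            }
        }
    }

    // Idea 1: skip the sizing loop entirely when no result-size limit is set.
    static long resultSize(List<Cell> values, long maxResultSize, long currentSize) {
        if (maxResultSize == Long.MAX_VALUE) {
            return currentSize;
        }
        for (Cell c : values) {
            currentSize += c.length;
        }
        return currentSize;
    }

    public static void main(String[] args) {
        List<Cell> source = new ArrayList<>();
        source.add(new Cell(10));
        source.add(new Cell(20));
        source.add(new Cell(30));

        List<Cell> out = new ArrayList<>();
        collect(source, out);
        System.out.println("cells collected: " + out.size());
        System.out.println("metric total: " + sizeMetric.get());
        System.out.println("size, no limit: " + resultSize(out, Long.MAX_VALUE, 0));
        System.out.println("size, with limit: " + resultSize(out, 1024, 0));
    }
}
```

The finally clause keeps the metric accurate even if the loop exits early with an exception, at the cost of one atomic update per call instead of one per cell.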

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
