[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

Hadoop QA (JIRA) Tue, 29 Sep 2015 07:00:54 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935203#comment-14935203
 ]


Hadoop QA commented on HBASE-14221:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12764222/HBASE-14221_6.patch
  against master branch at commit 2ea70c7e6c70c4bd689b79718999a948001f3b21.
  ATTACHMENT ID: 12764222

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
                        Please justify why no new tests are needed for this 
patch.
                        Also please list what manual steps were performed to 
verify this patch.

    {color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

                {color:red}-1 checkstyle{color}.  The applied patch generated 
1789 checkstyle errors (more than the master's current 1787 errors).

    {color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       
org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS

     {color:red}-1 core zombie tests{color}.  There are 6 zombie test(s):       
at 
org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileScanning(TestHRegion.java:3890)
        at 
org.apache.hadoop.hbase.wal.TestWALSplit.testOpenZeroLengthReportedFileButWithDataGetsSplit(TestWALSplit.java:480)
        at 
org.apache.hadoop.hbase.client.TestReplicasClient.testScanWithReplicas(TestReplicasClient.java:599)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15803//testReport/
Release Findbugs (version 2.0.3)        warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15803//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15803//artifact/patchprocess/checkstyle-aggregate.html

                Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15803//console

This message is automatically generated.

> Reduce the number of time row comparison is done in a Scan
> ----------------------------------------------------------
>
>                 Key: HBASE-14221
>                 URL: https://issues.apache.org/jira/browse/HBASE-14221
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Scanners
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14221.patch, HBASE-14221_1.patch, 
> HBASE-14221_1.patch, HBASE-14221_6.patch, withmatchingRowspatch.png, 
> withoutmatchingRowspatch.png
>
>
> When we tried to do some profiling with the PE tool found this.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
>        int ret = this.rowComparator.compareRows(curCell, cell);
>     if (!this.isReversed) {
>       if (ret <= -1) {
>         return MatchCode.DONE;
>       } else if (ret >= 1) {
>         // could optimize this, if necessary?
>         // Could also be called SEEK_TO_CURRENT_ROW, but this
>         // should be rare/never happens.
>         return MatchCode.SEEK_NEXT_ROW;
>       }
>     } else {
>       if (ret <= -1) {
>         return MatchCode.SEEK_NEXT_ROW;
>       } else if (ret >= 1) {
>         return MatchCode.DONE;
>       }
>     }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
>     if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || 
> matcher.curCell == null ||
>         isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>       this.countPerRow = 0;
>       matcher.setToNewRow(peeked);
>     }
> {code}
> Particularly to see if we are in a new row.
> 3) In HRegion
> {code}
>           scannerContext.setKeepProgress(true);
>           heap.next(results, scannerContext);
>           scannerContext.setKeepProgress(tmpKeepProgress);
>           nextKv = heap.peek();
> moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to careful for a MultiCF case.  Was 
> trying to solve this for the MultiCF case but is having lot of cases to 
> solve. But atleast for a single CF case I think these comparison can be 
> reduced.
> So for a single CF case in the SQM we are able to find if we have crossed a 
> row using the code pasted above in SQM. That comparison is definitely needed.
> Now in case of a single CF the HRegion is going to have only one element in 
> the heap and so the 3rd comparison can surely be avoided if the 
> StoreScanner.next() was over due to MatchCode.DONE caused by SQM.
> Coming to the 2nd compareRows that we do in StoreScanner. next() - even that 
> can be avoided if we know that the previous next() call was over due to a new 
> row. Doing all this I found that the compareRows in the profiler which was 
> 19% got reduced to 13%. Initially we can solve for single CF case which can 
> be extended to MultiCF cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

Reply via email to