[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

Hudson (JIRA) Fri, 08 Jan 2016 05:49:20 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089213#comment-15089213
 ]


Hudson commented on HBASE-14221:
--------------------------------

FAILURE: Integrated in HBase-1.2 #495 (See 
[https://builds.apache.org/job/HBase-1.2/495/])
HBASE-14221 - Reduce the number of time row comparison is done in a Scan 
(ramkrishna: rev f9c69893e0a56e9eb60a65703d0b481d12c305d0)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java


> Reduce the number of time row comparison is done in a Scan
> ----------------------------------------------------------
>
>                 Key: HBASE-14221
>                 URL: https://issues.apache.org/jira/browse/HBASE-14221
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Scanners
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: 14221-0.98-takeALook.txt, HBASE-14221-branch-1.patch, 
> HBASE-14221.patch, HBASE-14221_1.patch, HBASE-14221_1.patch, 
> HBASE-14221_6.patch, HBASE-14221_9.patch, withmatchingRowspatch.png, 
> withoutmatchingRowspatch.png
>
>
> When we tried to do some profiling with the PE tool found this.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
>        int ret = this.rowComparator.compareRows(curCell, cell);
>     if (!this.isReversed) {
>       if (ret <= -1) {
>         return MatchCode.DONE;
>       } else if (ret >= 1) {
>         // could optimize this, if necessary?
>         // Could also be called SEEK_TO_CURRENT_ROW, but this
>         // should be rare/never happens.
>         return MatchCode.SEEK_NEXT_ROW;
>       }
>     } else {
>       if (ret <= -1) {
>         return MatchCode.SEEK_NEXT_ROW;
>       } else if (ret >= 1) {
>         return MatchCode.DONE;
>       }
>     }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
>     if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || 
> matcher.curCell == null ||
>         isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>       this.countPerRow = 0;
>       matcher.setToNewRow(peeked);
>     }
> {code}
> Particularly to see if we are in a new row.
> 3) In HRegion
> {code}
>           scannerContext.setKeepProgress(true);
>           heap.next(results, scannerContext);
>           scannerContext.setKeepProgress(tmpKeepProgress);
>           nextKv = heap.peek();
> moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to careful for a MultiCF case.  Was 
> trying to solve this for the MultiCF case but is having lot of cases to 
> solve. But atleast for a single CF case I think these comparison can be 
> reduced.
> So for a single CF case in the SQM we are able to find if we have crossed a 
> row using the code pasted above in SQM. That comparison is definitely needed.
> Now in case of a single CF the HRegion is going to have only one element in 
> the heap and so the 3rd comparison can surely be avoided if the 
> StoreScanner.next() was over due to MatchCode.DONE caused by SQM.
> Coming to the 2nd compareRows that we do in StoreScanner. next() - even that 
> can be avoided if we know that the previous next() call was over due to a new 
> row. Doing all this I found that the compareRows in the profiler which was 
> 19% got reduced to 13%. Initially we can solve for single CF case which can 
> be extended to MultiCF cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

Reply via email to