[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

ramkrishna.s.vasudevan (JIRA) Mon, 21 Sep 2015 22:48:20 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901980#comment-14901980
 ]


ramkrishna.s.vasudevan commented on HBASE-14221:
------------------------------------------------

As am not getting notifications on the JIRA updates I  missed this review. 
Sorry about that.
bq.How does this extend to the MultiCF case?
Ideally the multiCF case also the same should apply. As we tend to move from CF 
to CF till we complete the current row, this same logic should be applicable. 
But I need to work on some more complex cases to see if all fits into the same 
umbrella.
bq.So, about 10% difference for this added complexity?
Do you think this is going to be more complex for the gain we get? I thought 
avoiding as many compares as possible would be beneficial. 
bq.Why need for two flags? Why not isSingleColumnFamily test not enough? When 
would we have a single store heap scanner but then a joined heap would have 
more than one?
I created one more specifically for the joined scanner case. Because by code 
the normal scanner and joined scanner are treated as special cases.
bq.Why not just have the flag be in the scanner context?
Yes. We can try to have the flag in the scanner context too I think. Let me try 
 it out.  
But before I update the patch do you think we can get this in Stack?  The gain 
is quite siginificant in almost all the tests that I did.

> Reduce the number of time row comparison is done in a Scan
> ----------------------------------------------------------
>
>                 Key: HBASE-14221
>                 URL: https://issues.apache.org/jira/browse/HBASE-14221
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Scanners
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14221.patch, HBASE-14221_1.patch, 
> HBASE-14221_1.patch, withmatchingRowspatch.png, withoutmatchingRowspatch.png
>
>
> When we tried to do some profiling with the PE tool found this.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
>        int ret = this.rowComparator.compareRows(curCell, cell);
>     if (!this.isReversed) {
>       if (ret <= -1) {
>         return MatchCode.DONE;
>       } else if (ret >= 1) {
>         // could optimize this, if necessary?
>         // Could also be called SEEK_TO_CURRENT_ROW, but this
>         // should be rare/never happens.
>         return MatchCode.SEEK_NEXT_ROW;
>       }
>     } else {
>       if (ret <= -1) {
>         return MatchCode.SEEK_NEXT_ROW;
>       } else if (ret >= 1) {
>         return MatchCode.DONE;
>       }
>     }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
>     if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || 
> matcher.curCell == null ||
>         isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>       this.countPerRow = 0;
>       matcher.setToNewRow(peeked);
>     }
> {code}
> Particularly to see if we are in a new row.
> 3) In HRegion
> {code}
>           scannerContext.setKeepProgress(true);
>           heap.next(results, scannerContext);
>           scannerContext.setKeepProgress(tmpKeepProgress);
>           nextKv = heap.peek();
> moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to careful for a MultiCF case.  Was 
> trying to solve this for the MultiCF case but is having lot of cases to 
> solve. But atleast for a single CF case I think these comparison can be 
> reduced.
> So for a single CF case in the SQM we are able to find if we have crossed a 
> row using the code pasted above in SQM. That comparison is definitely needed.
> Now in case of a single CF the HRegion is going to have only one element in 
> the heap and so the 3rd comparison can surely be avoided if the 
> StoreScanner.next() was over due to MatchCode.DONE caused by SQM.
> Coming to the 2nd compareRows that we do in StoreScanner. next() - even that 
> can be avoided if we know that the previous next() call was over due to a new 
> row. Doing all this I found that the compareRows in the profiler which was 
> 19% got reduced to 13%. Initially we can solve for single CF case which can 
> be extended to MultiCF cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

Reply via email to