[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

ramkrishna.s.vasudevan (JIRA) Tue, 17 Nov 2015 02:50:53 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15008465#comment-15008465
 ]


ramkrishna.s.vasudevan commented on HBASE-14221:
------------------------------------------------

In order to proceed with HBASE-14221, I did some analysis in this JIRa with the 
patches provided and the benchmarks given. What Matt says is true, reseeks with 
LoserTree is faster whereas next() with LoserTree is slower. This is surely 
because of the number of operations and comparisons that we perform. 
In case of next() with KVHeap, we always have do a peek() to the next KVScanner 
and with that we compare the current KV if we really to add the scanner back to 
the KVHeap.  So here there is no change to the heap. and so no comparison. 
Where as in case of LoserTree we need to fetch the next key and always keep 
adjusting the tree. 
In case of reseek() the case with KVHeap is that we do lot of changes to the 
heap, first we simply add the current scanner to the heap, then do a poll and 
again based on the need we again add the current scanner to the heap. But in 
case of LoserTree these many comparisons are avoided and it is similar to the 
next() call with LoserTree.
Am not very sure which one would be better here because both next() and 
reseek() is going to be important when we have filters along with scans. 
Coming to this patch- this patch is more to do with the comparisons in the 
StoreScanner level and Hregion level and it is mainly in comparing the rows. So 
this patch can be seen as a different area than where the KVHeap would come in. 
So it may be worth pursuing? Am still doing some more analysis on the LoserTree 
and its impl and see how we can benefit from them. 

> Reduce the number of time row comparison is done in a Scan
> ----------------------------------------------------------
>
>                 Key: HBASE-14221
>                 URL: https://issues.apache.org/jira/browse/HBASE-14221
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Scanners
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: 14221-0.98-takeALook.txt, HBASE-14221.patch, 
> HBASE-14221_1.patch, HBASE-14221_1.patch, HBASE-14221_6.patch, 
> withmatchingRowspatch.png, withoutmatchingRowspatch.png
>
>
> When we tried to do some profiling with the PE tool found this.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
>        int ret = this.rowComparator.compareRows(curCell, cell);
>     if (!this.isReversed) {
>       if (ret <= -1) {
>         return MatchCode.DONE;
>       } else if (ret >= 1) {
>         // could optimize this, if necessary?
>         // Could also be called SEEK_TO_CURRENT_ROW, but this
>         // should be rare/never happens.
>         return MatchCode.SEEK_NEXT_ROW;
>       }
>     } else {
>       if (ret <= -1) {
>         return MatchCode.SEEK_NEXT_ROW;
>       } else if (ret >= 1) {
>         return MatchCode.DONE;
>       }
>     }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
>     if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || 
> matcher.curCell == null ||
>         isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>       this.countPerRow = 0;
>       matcher.setToNewRow(peeked);
>     }
> {code}
> Particularly to see if we are in a new row.
> 3) In HRegion
> {code}
>           scannerContext.setKeepProgress(true);
>           heap.next(results, scannerContext);
>           scannerContext.setKeepProgress(tmpKeepProgress);
>           nextKv = heap.peek();
> moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to careful for a MultiCF case.  Was 
> trying to solve this for the MultiCF case but is having lot of cases to 
> solve. But atleast for a single CF case I think these comparison can be 
> reduced.
> So for a single CF case in the SQM we are able to find if we have crossed a 
> row using the code pasted above in SQM. That comparison is definitely needed.
> Now in case of a single CF the HRegion is going to have only one element in 
> the heap and so the 3rd comparison can surely be avoided if the 
> StoreScanner.next() was over due to MatchCode.DONE caused by SQM.
> Coming to the 2nd compareRows that we do in StoreScanner. next() - even that 
> can be avoided if we know that the previous next() call was over due to a new 
> row. Doing all this I found that the compareRows in the profiler which was 
> 19% got reduced to 13%. Initially we can solve for single CF case which can 
> be extended to MultiCF cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

Reply via email to