[
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15097639#comment-15097639
]
Nick Dimiduk commented on HBASE-14221:
--------------------------------------
Looks like this was committed. Please update the Fix Versions and resolve the
issue. Thanks [~ram_krish].
> Reduce the number of time row comparison is done in a Scan
> ----------------------------------------------------------
>
> Key: HBASE-14221
> URL: https://issues.apache.org/jira/browse/HBASE-14221
> Project: HBase
> Issue Type: Sub-task
> Components: Scanners
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: 14221-0.98-takeALook.txt, HBASE-14221-branch-1.patch,
> HBASE-14221.patch, HBASE-14221_1.patch, HBASE-14221_1.patch,
> HBASE-14221_6.patch, HBASE-14221_9.patch, withmatchingRowspatch.png,
> withoutmatchingRowspatch.png
>
>
> While profiling with the PE tool, we found the following.
> Currently we do row comparisons in three places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
> int ret = this.rowComparator.compareRows(curCell, cell);
> if (!this.isReversed) {
>   if (ret <= -1) {
>     return MatchCode.DONE;
>   } else if (ret >= 1) {
>     // could optimize this, if necessary?
>     // Could also be called SEEK_TO_CURRENT_ROW, but this
>     // should be rare/never happens.
>     return MatchCode.SEEK_NEXT_ROW;
>   }
> } else {
>   if (ret <= -1) {
>     return MatchCode.SEEK_NEXT_ROW;
>   } else if (ret >= 1) {
>     return MatchCode.DONE;
>   }
> }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
> if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) ||
>     matcher.curCell == null ||
>     isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>   this.countPerRow = 0;
>   matcher.setToNewRow(peeked);
> }
> {code}
> This check is there mainly to see whether we are in a new row.
> 3) In HRegion
> {code}
> scannerContext.setKeepProgress(true);
> heap.next(results, scannerContext);
> scannerContext.setKeepProgress(tmpKeepProgress);
> nextKv = heap.peek();
> moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to be careful in the MultiCF case.
> I was trying to solve this for the MultiCF case, but there are a lot of
> cases to handle. At least for the single CF case, though, I think these
> comparisons can be reduced.
> In the single CF case, the SQM can tell whether we have crossed a row using
> the SQM code pasted above, so that comparison is definitely needed.
> With a single CF, HRegion has only one element in the heap, so the 3rd
> comparison can be avoided whenever StoreScanner.next() ended because the
> SQM returned MatchCode.DONE.
> The 2nd compareRows, done in StoreScanner.next(), can also be avoided if we
> know that the previous next() call ended because of a new row. With these
> changes, the share of compareRows in the profiler dropped from 19% to 13%.
> Initially we can solve this for the single CF case, and later extend it to
> MultiCF cases.
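
The single CF idea described above can be sketched roughly as follows. This is a toy model and not the committed patch: `Cell`, `StoreScannerSketch`, `endedOnRowBoundary`, `scan` and `sample` are hypothetical names made up for illustration, and a simple per-call cell limit stands in for the BETWEEN_CELLS limit scope. The matcher-style comparison is always performed; the other two are skipped only when the previous next() call is known to have stopped on a row boundary.

```java
import java.util.ArrayList;
import java.util.List;

class RowCompareSketch {
    /** Hypothetical stand-in for an HBase Cell; only the row key matters here. */
    record Cell(String row, String qualifier) {}

    /** Rows returned by a scan plus the number of row comparisons it needed. */
    record ScanResult(List<List<Cell>> rows, int comparisons) {}

    static int comparisons;

    /** Counted row comparison, standing in for compareRows / matchingRow. */
    static boolean sameRow(Cell a, Cell b) {
        comparisons++;
        return a.row().equals(b.row());
    }

    /** Toy single-store scanner; limit (>= 1) caps the cells returned per next() call. */
    static final class StoreScannerSketch {
        final List<Cell> cells;
        final int limit;
        final boolean optimized;
        int pos = 0;
        Cell curRow = null;
        // Assumed flag: true when the last next() stopped because the
        // matcher-style check saw a cell from a different row.
        boolean endedOnRowBoundary = false;

        StoreScannerSketch(List<Cell> cells, int limit, boolean optimized) {
            this.cells = cells;
            this.limit = limit;
            this.optimized = optimized;
        }

        Cell peek() {
            return pos < cells.size() ? cells.get(pos) : null;
        }

        void next(List<Cell> out) {
            Cell peeked = peek();
            if (peeked == null) {
                return;
            }
            // Comparison 2: skipped when the previous call is known to have ended on a new row.
            boolean knownNewRow = optimized && endedOnRowBoundary;
            if (curRow == null || knownNewRow || !sameRow(peeked, curRow)) {
                curRow = peeked;
            }
            endedOnRowBoundary = false;
            int added = 0;
            while (pos < cells.size() && added < limit) {
                Cell c = cells.get(pos);
                // Comparison 1 (matcher): always needed to detect that the row changed.
                if (!sameRow(c, curRow)) {
                    endedOnRowBoundary = true;
                    break;
                }
                out.add(c);
                pos++;
                added++;
            }
        }
    }

    /** Toy region-level loop over a single store; returns one list of cells per row. */
    static ScanResult scan(List<Cell> cells, int limit, boolean optimized) {
        comparisons = 0;
        StoreScannerSketch store = new StoreScannerSketch(cells, limit, optimized);
        List<List<Cell>> rows = new ArrayList<>();
        while (store.peek() != null) {
            Cell currentRowCell = store.peek();
            List<Cell> results = new ArrayList<>();
            boolean moreCellsInRow;
            do {
                store.next(results);
                Cell nextKv = store.peek();
                // Comparison 3: skipped when the store already reported a row boundary.
                if (nextKv == null || (optimized && store.endedOnRowBoundary)) {
                    moreCellsInRow = false;
                } else {
                    moreCellsInRow = sameRow(nextKv, currentRowCell);
                }
            } while (moreCellsInRow);
            rows.add(results);
        }
        return new ScanResult(rows, comparisons);
    }

    /** Small made-up data set: three rows with 3, 1 and 2 cells. */
    static List<Cell> sample() {
        return List.of(
            new Cell("r1", "a"), new Cell("r1", "b"), new Cell("r1", "c"),
            new Cell("r2", "a"),
            new Cell("r3", "a"), new Cell("r3", "b"));
    }

    public static void main(String[] args) {
        ScanResult naive = scan(sample(), 2, false);
        ScanResult fast = scan(sample(), 2, true);
        System.out.println("naive comparisons:     " + naive.comparisons());
        System.out.println("optimized comparisons: " + fast.comparisons());
        System.out.println("same rows returned:    " + naive.rows().equals(fast.rows()));
    }
}
```

Both modes should return the same rows; only the comparison count differs. When a call stops because the limit was hit rather than on a row boundary, the sketch falls back to the explicit comparisons, which mirrors the caution about limits in the description. The MultiCF case, where the region heap holds several store scanners, is deliberately left out.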