[
https://issues.apache.org/jira/browse/HBASE-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915531#comment-13915531
]
Lars Hofhansl commented on HBASE-10625:
---------------------------------------
Final numbers for reference. (I doubled the number of rows and reset everything
after each run.)
20m rows, 5 columns, 8-byte keys, 10-byte values, no encoding:
||columns||None||C0||C1||C4||C1,C3||C2,C3||C2,C3,C4||
|w/o patch|14.88|14.74|23.57|17.70|34.48|26.06|21.76|
|w/ patch|14.24|14.02|22.09|16.95|32.39|24.86|21.63|
20m rows, 5 columns, 8-byte keys, 10-byte values, FAST_DIFF:
||columns||None||C0||C1||C4||C1,C3||C2,C3||C2,C3,C4||
|w/o patch|22.07|20.59|30.65|23.53|43.25|33.62|28.90|
|w/ patch|22.59|20.22|29.84|22.91|42.43|33.31|28.87|
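To make the two tables easier to compare, here is a small sketch that computes the relative change per column selection. It assumes lower numbers are better (the comment does not state the unit); the class and method names are made up for illustration.

```java
final class PatchDelta {
    // Relative change in percent; positive means the patched number is lower.
    static double percentChange(double before, double after) {
        return (before - after) / before * 100.0;
    }

    // Prints one line per column selection for a single table.
    static void report(String label, String[] columns, double[] without, double[] with) {
        System.out.println(label);
        for (int i = 0; i < columns.length; i++) {
            System.out.printf("  %-9s %6.2f%%%n", columns[i], percentChange(without[i], with[i]));
        }
    }

    public static void main(String[] args) {
        String[] columns = {"None", "C0", "C1", "C4", "C1,C3", "C2,C3", "C2,C3,C4"};

        // Values copied from the tables above.
        double[] noEncWithout = {14.88, 14.74, 23.57, 17.70, 34.48, 26.06, 21.76};
        double[] noEncWith    = {14.24, 14.02, 22.09, 16.95, 32.39, 24.86, 21.63};
        double[] fastDiffWithout = {22.07, 20.59, 30.65, 23.53, 43.25, 33.62, 28.90};
        double[] fastDiffWith    = {22.59, 20.22, 29.84, 22.91, 42.43, 33.31, 28.87};

        report("no encoding", columns, noEncWithout, noEncWith);
        report("FAST_DIFF", columns, fastDiffWithout, fastDiffWith);
    }
}
```

A negative value in the output marks a column selection where the patched run was slower rather than faster.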
> Remove unnecessary key compare from AbstractScannerV2.reseekTo
> --------------------------------------------------------------
>
> Key: HBASE-10625
> URL: https://issues.apache.org/jira/browse/HBASE-10625
> Project: HBase
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Attachments: 10625-0.94.txt, 10625-trunk.txt
>
>
> In reseekTo we find this
> {code}
> ...
> compared = compareKey(reader.getComparator(), key, offset, length);
> if (compared < 1) {
> // If the required key is less than or equal to current key, then
> // don't do anything.
> return compared;
> } else {
> ...
> return loadBlockAndSeekToKey(this.block, this.nextIndexedKey,
> false, key, offset, length, false);
> ...
> {code}
> loadBlockAndSeekToKey already does the right thing when we pass a key that
> sorts before the current key. It's less efficient than this early check, but
> in the vast majority of (all?) cases we pass forward keys (as required by the
> reseek contract). We're optimizing the wrong thing.
> Scanning with the ExplicitColumnTracker is 20-30% faster.
> (I tested with rows of 5 short KVs, selecting the 2nd and/or 4th column.)
> I propose simply removing that check.
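The argument can be illustrated with a toy model. This is not HBase code: the scanner below is a hypothetical stand-in that walks a sorted int array, and all names are invented. It only shows that when a forward-only seek already handles a key at or before the current position, a separate early compare adds one comparison to every forward reseek without changing where the scanner lands. Unlike the real loadBlockAndSeekToKey, the toy seek costs the same as the early check for backward keys.

```java
final class ReseekSketch {
    /** Toy scanner over one sorted block of int keys; purely illustrative. */
    static final class ToyScanner {
        private final int[] keys; // sorted ascending
        private int pos = 0;      // current position in the block
        long compares = 0;        // number of key comparisons performed

        ToyScanner(int[] keys) {
            this.keys = keys;
        }

        private int compare(int wanted, int current) {
            compares++;
            return Integer.compare(wanted, current);
        }

        // Forward-only seek: advance until keys[pos] >= wanted. Never rewinds,
        // so a key before the current position leaves the scanner where it is.
        private int seekForward(int wanted) {
            while (pos < keys.length && compare(wanted, keys[pos]) > 0) {
                pos++;
            }
            return pos;
        }

        // Variant with the early check: compare against the current key first.
        int reseekWithEarlyCheck(int wanted) {
            if (pos < keys.length && compare(wanted, keys[pos]) < 1) {
                return pos; // wanted <= current key: nothing to do
            }
            return seekForward(wanted);
        }

        // Variant without the early check: go straight to the seek.
        int reseekWithoutEarlyCheck(int wanted) {
            return seekForward(wanted);
        }
    }

    public static void main(String[] args) {
        int[] keys = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        ToyScanner withCheck = new ToyScanner(keys);
        ToyScanner withoutCheck = new ToyScanner(keys);

        // Forward keys only, as the reseek contract requires.
        for (int wanted : new int[] {2, 4, 6}) {
            int a = withCheck.reseekWithEarlyCheck(wanted);
            int b = withoutCheck.reseekWithoutEarlyCheck(wanted);
            System.out.println("wanted=" + wanted + " with=" + a + " without=" + b);
        }
        System.out.println("compares with check:    " + withCheck.compares);
        System.out.println("compares without check: " + withoutCheck.compares);
    }
}
```

In this toy both variants end at the same position for every key, and the early-check variant pays exactly one extra comparison per forward reseek.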