[
https://issues.apache.org/jira/browse/HBASE-29103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929754#comment-17929754
]
Nick Dimiduk commented on HBASE-29103:
--------------------------------------
bq. Given that, these tests are 25% faster for average read latency which is
amazing, I just find it hard to explain especially given that I haven't
upgraded hardware in that time. I think the biggest difference may be the
underlying JDK that I'm running with: 21 vs. 11.
It would be interesting to understand such a significant improvement. I don't
know how easy it is to replicate your test in an automated way, but git bisect
is very powerful for this kind of task.
> Avoid excessive allocations during reverse scanning when seeking to next row
> ----------------------------------------------------------------------------
>
> Key: HBASE-29103
> URL: https://issues.apache.org/jira/browse/HBASE-29103
> Project: HBase
> Issue Type: Improvement
> Components: Performance
> Affects Versions: 3.0.0-beta-1, 2.6.1
> Reporter: Becker Ewing
> Assignee: Becker Ewing
> Priority: Major
> Labels: pull-request-available
> Attachments: high-block-cache-key-to-string-alloc-profile.html
>
>
> Currently, when we're reverse scanning in a storefile, the general path is to:
> # Seek to before the current row to find the prior row
> # Seek to the beginning of the prior row
> (this can get a bit more complex depending on how fast a single "seek"
> operation is, see HBASE-28043 for additional details).
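The two steps above can be sketched with a toy model. This is only an illustration under stated assumptions: a sorted in-memory list stands in for the storefile, and every name here (ReverseSeekSketch, seekBeforeRow, seekToFirstOnRow, seekToPreviousRow) is hypothetical, not an HBase API.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: a sorted in-memory list stands in for a store file.
// None of these names are HBase APIs.
class ReverseSeekSketch {

  // Cells are {row, qualifier} pairs, sorted by row and then by qualifier.
  static final List<String[]> CELLS = Arrays.asList(
      new String[] {"row1", "a"},
      new String[] {"row1", "b"},
      new String[] {"row2", "a"},
      new String[] {"row2", "b"},
      new String[] {"row2", "c"},
      new String[] {"row3", "a"});

  // Step 1: index of the last cell whose row sorts strictly before `row`, or -1.
  static int seekBeforeRow(String row) {
    int found = -1;
    for (int i = 0; i < CELLS.size(); i++) {
      if (CELLS.get(i)[0].compareTo(row) < 0) {
        found = i;
      } else {
        break; // the list is sorted, so no later cell can qualify
      }
    }
    return found;
  }

  // Step 2: index of the first cell in `row`, or -1 if the row is absent.
  static int seekToFirstOnRow(String row) {
    for (int i = 0; i < CELLS.size(); i++) {
      if (CELLS.get(i)[0].equals(row)) {
        return i;
      }
    }
    return -1;
  }

  // Both steps combined: index of the first cell of the row preceding
  // `currentRow`, or -1 if there is no earlier row.
  static int seekToPreviousRow(String currentRow) {
    int before = seekBeforeRow(currentRow);
    if (before < 0) {
      return -1;
    }
    // Only the row of the cell found in step 1 is needed for step 2.
    String priorRow = CELLS.get(before)[0];
    return seekToFirstOnRow(priorRow);
  }

  public static void main(String[] args) {
    int idx = seekToPreviousRow("row3");
    System.out.println(CELLS.get(idx)[0] + "/" + CELLS.get(idx)[1]);
  }
}
```

Note that step 2 uses nothing from the cell found in step 1 except its row, which is the observation the rest of this description builds on.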
>
> At step 1, we call HFileScanner#getCell and then we subsequently always call
> PrivateCellUtil.createFirstOnRow() on this Cell instance
> ([Code|https://github.com/apache/hbase/blob/b89c8259c5726395c9ae3a14919bd192252ca517/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L611-L614]).
> PrivateCellUtil.createFirstOnRow() creates a [copy of only the row portion
> of this
> Cell|https://github.com/apache/hbase/blob/b89c8259c5726395c9ae3a14919bd192252ca517/hbase-common/src/main/java/org/apache/hadoop/hbase/PrivateCellUtil.java#L2768-L2775].
>
>
> I propose that, since we're only using the key portion of the cell returned by
> HFileScanner#getCell, we should instead call
> [HFileScanner#getKey|https://github.com/apache/hbase/blob/b89c8259c5726395c9ae3a14919bd192252ca517/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java#L91-L96]
> in this scenario so we avoid deep-copying extra components of the Cell such
> as the value, tags, etc. This should be a safe change as this Cell instance
> never escapes StoreFileScanner and we only call HFileScanner#getCell when the
> scanner is already seeked.
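To make the allocation argument concrete, here is a minimal sketch that counts bytes copied on each path. It is a toy model, not the HBase implementation: Position, copyCell, copyKey and firstOnRow are hypothetical stand-ins for the scanner position, HFileScanner#getCell, HFileScanner#getKey and PrivateCellUtil.createFirstOnRow() respectively, and the 1 KiB value size is an arbitrary assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy model only: these classes are hypothetical and are not HBase types.
class KeyOnlyCopySketch {

  // Stand-in for the scanner's current position inside an encoded block.
  static final class Position {
    final byte[] row;
    final byte[] qualifier;
    final byte[] value;

    Position(String row, String qualifier, byte[] value) {
      this.row = row.getBytes(StandardCharsets.UTF_8);
      this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
      this.value = value;
    }
  }

  // Running total of bytes copied, used to compare the two paths.
  static long bytesCopied = 0;

  static byte[] copy(byte[] src) {
    bytesCopied += src.length;
    return Arrays.copyOf(src, src.length);
  }

  // Analogue of materializing the whole cell: key parts and value are copied.
  static byte[][] copyCell(Position p) {
    return new byte[][] {copy(p.row), copy(p.qualifier), copy(p.value)};
  }

  // Analogue of materializing only the key: the value is never touched.
  static byte[][] copyKey(Position p) {
    return new byte[][] {copy(p.row), copy(p.qualifier)};
  }

  // Analogue of building a "first on row" marker: only the row part is copied.
  static byte[] firstOnRow(byte[][] cellOrKey) {
    return copy(cellOrKey[0]);
  }

  static long costViaCell(Position p) {
    bytesCopied = 0;
    firstOnRow(copyCell(p));
    return bytesCopied;
  }

  static long costViaKey(Position p) {
    bytesCopied = 0;
    firstOnRow(copyKey(p));
    return bytesCopied;
  }

  public static void main(String[] args) {
    Position p = new Position("row2", "q", new byte[1024]);
    System.out.println("via full cell: " + costViaCell(p) + " bytes copied");
    System.out.println("via key only:  " + costViaKey(p) + " bytes copied");
  }
}
```

Both paths produce the same row bytes for the follow-up seek. In this toy the only difference is that the key-only path never copies the value, so the saving grows with value (and, in a real cell, tag) size.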
>
> Attached is the same allocation profile that was taken to guide the
> optimizations in HBASE-29099. It shows that about 3% of allocations occur in
> [BufferedEncodedSeeker.getCell in the body of
> seekBeforeAndSaveKeyToPreviousRow|https://github.com/apache/hbase/blob/b89c8259c5726395c9ae3a14919bd192252ca517/hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java#L284-L348].
> The region server in question here was pinned at 100% CPU utilization for a
> while and was running a reverse-scan heavy workload.