[ https://issues.apache.org/jira/browse/HBASE-29103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Becker Ewing updated HBASE-29103:
---------------------------------
    Affects Version/s: 2.5.11

> Avoid excessive allocations during reverse scanning when seeking to next row
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-29103
>                 URL: https://issues.apache.org/jira/browse/HBASE-29103
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 3.0.0-beta-1, 2.6.1, 2.5.11
>            Reporter: Becker Ewing
>            Assignee: Becker Ewing
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: high-block-cache-key-to-string-alloc-profile.html
>
>
> Currently, when we're reverse scanning in a storefile, the general path is to:
> # Seek to before the current row to find the prior row
> # Seek to the beginning of the prior row
> (This can get a bit more complex depending on how fast a single "seek"
> operation is; see HBASE-28043 for additional details.)
>
> At step 1, we call HFileScanner#getCell and then always call
> PrivateCellUtil.createFirstOnRow() on the returned Cell instance
> ([Code|https://github.com/apache/hbase/blob/b89c8259c5726395c9ae3a14919bd192252ca517/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L611-L614]).
> PrivateCellUtil.createFirstOnRow() creates a [copy of only the row portion
> of this
> Cell|https://github.com/apache/hbase/blob/b89c8259c5726395c9ae3a14919bd192252ca517/hbase-common/src/main/java/org/apache/hadoop/hbase/PrivateCellUtil.java#L2768-L2775].
>
>
> Since we only use the key portion of the Cell returned by
> HFileScanner#getCell, I propose that we instead call
> [HFileScanner#getKey|https://github.com/apache/hbase/blob/b89c8259c5726395c9ae3a14919bd192252ca517/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java#L91-L96]
> in this scenario, so that we avoid deep-copying the other components of the
> Cell, such as the value and tags. This should be a safe change because this
> Cell instance never escapes StoreFileScanner and we only call
> HFileScanner#getCell when the scanner is already seeked.
>
> Attached is the same allocation profile that was taken to guide the
> optimizations in HBASE-29099. It shows that about 3% of allocations are
> spent in
> [BufferedEncodedSeeker.getCell in the body of
> seekBeforeAndSaveKeyToPreviousRow|https://github.com/apache/hbase/blob/b89c8259c5726395c9ae3a14919bd192252ca517/hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java#L284-L348].
> The region server in question was pinned at 100% CPU utilization for a
> while and was running a reverse-scan-heavy workload.
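To make the allocation argument concrete, here is a minimal toy model of the two code paths. It is only a sketch and does not use any HBase classes: Entry, ToyScanner, firstOnRow and the two seekKeyVia... helpers are hypothetical stand-ins, and the toy "key" is just the row, whereas a real HBase key also carries family, qualifier, timestamp and type. It assumes, as described in the issue, that the getCell-style call deep-copies the value and tags while the getKey-style call copies only the key.

```java
public class ReverseSeekAllocationSketch {

    /** Hypothetical stand-in for one encoded entry: a row key plus value and tags. */
    static final class Entry {
        final byte[] row;
        final byte[] value;
        final byte[] tags;

        Entry(byte[] row, byte[] value, byte[] tags) {
            this.row = row;
            this.value = value;
            this.tags = tags;
        }
    }

    /** Hypothetical scanner positioned on one entry; it counts the bytes it copies. */
    static final class ToyScanner {
        private final Entry current;
        long bytesCopied = 0;

        ToyScanner(Entry current) {
            this.current = current;
        }

        /** Analogue of a getCell-style call: deep-copies row, value and tags. */
        Entry getCell() {
            bytesCopied += current.row.length + current.value.length + current.tags.length;
            return new Entry(current.row.clone(), current.value.clone(), current.tags.clone());
        }

        /** Analogue of a getKey-style call: copies only the key (here, the row). */
        byte[] getKey() {
            bytesCopied += current.row.length;
            return current.row.clone();
        }
    }

    /** Analogue of a createFirstOnRow-style helper: it only needs the row bytes. */
    static byte[] firstOnRow(byte[] row) {
        return row.clone();
    }

    /** Current path in the toy model: materialize the whole cell, then keep only the row. */
    static byte[] seekKeyViaGetCell(ToyScanner scanner) {
        return firstOnRow(scanner.getCell().row);
    }

    /** Proposed path in the toy model: fetch only the key, then build the seek key. */
    static byte[] seekKeyViaGetKey(ToyScanner scanner) {
        return firstOnRow(scanner.getKey());
    }

    public static void main(String[] args) {
        Entry entry = new Entry("row-0042".getBytes(), new byte[1024], new byte[16]);

        ToyScanner viaCell = new ToyScanner(entry);
        ToyScanner viaKey = new ToyScanner(entry);
        seekKeyViaGetCell(viaCell);
        seekKeyViaGetKey(viaKey);

        System.out.println("getCell path copied bytes: " + viaCell.bytesCopied);
        System.out.println("getKey path copied bytes:  " + viaKey.bytesCopied);
    }
}
```

Both helpers return the same seek key. The only difference in the toy model is how many bytes the scanner copies to produce it, and that gap grows with the size of the value and tags.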