Becker Ewing created HBASE-29103:
------------------------------------

             Summary: Avoid excessive allocations during reverse scanning when 
seeking to next row
                 Key: HBASE-29103
                 URL: https://issues.apache.org/jira/browse/HBASE-29103
             Project: HBase
          Issue Type: Improvement
          Components: Performance
    Affects Versions: 2.6.1, 3.0.0-beta-1
            Reporter: Becker Ewing
            Assignee: Becker Ewing
         Attachments: high-block-cache-key-to-string-alloc-profile.html

Currently, when we're reverse scanning in a storefile, the general path is to:
 # Seek to before the current row to find the prior row
 # Seek to the beginning of the prior row

(This can get a bit more complex depending on how fast a single "seek" 
operation is; see HBASE-28043 for additional details.)
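
As a rough illustration of the two steps above, here is a minimal Java sketch 
over a sorted in-memory list of keys. It is not HBase code: ReverseSeekSketch, 
Key, sampleKeys, seekBefore, seekToRowStart and seekToPreviousRow are all 
hypothetical names, and the linear scans stand in for real index-backed seeks.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only (assumption): keys are sorted by row, then by qualifier.
final class ReverseSeekSketch {

    // Hypothetical stand-in for a cell key.
    static final class Key {
        final String row;
        final String qualifier;

        Key(String row, String qualifier) {
            this.row = row;
            this.qualifier = qualifier;
        }
    }

    // Small sorted sample: row "a" has 2 keys, "b" has 3, "c" has 1.
    static List<Key> sampleKeys() {
        return Arrays.asList(
            new Key("a", "q1"), new Key("a", "q2"),
            new Key("b", "q1"), new Key("b", "q2"), new Key("b", "q3"),
            new Key("c", "q1"));
    }

    // Step 1: index of the last key whose row sorts strictly before
    // currentRow, or -1 if there is no such key.
    static int seekBefore(List<Key> sorted, String currentRow) {
        int idx = -1;
        for (int i = 0; i < sorted.size(); i++) {
            if (sorted.get(i).row.compareTo(currentRow) < 0) {
                idx = i;
            } else {
                break;
            }
        }
        return idx;
    }

    // Step 2: index of the first key whose row is >= the given row,
    // or -1 if there is none.
    static int seekToRowStart(List<Key> sorted, String row) {
        for (int i = 0; i < sorted.size(); i++) {
            if (sorted.get(i).row.compareTo(row) >= 0) {
                return i;
            }
        }
        return -1;
    }

    // Combines both steps: find the prior row, then jump to its first key.
    static int seekToPreviousRow(List<Key> sorted, String currentRow) {
        int before = seekBefore(sorted, currentRow);
        if (before < 0) {
            return -1; // no prior row exists
        }
        // Only the row of the key found in step 1 is needed here.
        return seekToRowStart(sorted, sorted.get(before).row);
    }

    public static void main(String[] args) {
        List<Key> keys = sampleKeys();
        System.out.println(seekToPreviousRow(keys, "c")); // prints 2
        System.out.println(seekToPreviousRow(keys, "a")); // prints -1
    }
}
```

Note that step 2 consumes only the row of the key located in step 1, which is 
the observation the rest of this issue builds on.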

 

At step 1, we call HFileScanner#getCell and then always call 
PrivateCellUtil.createFirstOnRow() on the returned Cell instance 
([Code|https://github.com/apache/hbase/blob/b89c8259c5726395c9ae3a14919bd192252ca517/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L611-L614]).
 PrivateCellUtil.createFirstOnRow() creates a [copy of only the row portion of 
this 
Cell|https://github.com/apache/hbase/blob/b89c8259c5726395c9ae3a14919bd192252ca517/hbase-common/src/main/java/org/apache/hadoop/hbase/PrivateCellUtil.java#L2768-L2775].
 

 

Since we only use the key portion of the cell returned by 
HFileScanner#getCell, I propose that we instead call 
[HFileScanner#getKey|https://github.com/apache/hbase/blob/b89c8259c5726395c9ae3a14919bd192252ca517/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileScanner.java#L91-L96]
 in this scenario, so that we avoid deep-copying extra components of the Cell 
such as the value and tags. This should be a safe change, as this Cell 
instance never escapes StoreFileScanner and we only call HFileScanner#getCell 
when the scanner is already seeked.
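
To make the intended saving concrete, here is a small self-contained Java 
sketch. It is a toy model, not the HBase implementation: KeyOnlySketch, 
ToyCell, copyCell, copyKey, copiedBytes and firstOnRow are hypothetical names, 
and it assumes (as described above) that materializing a full cell copies the 
value and tags while a key-only copy does not.

```java
// Toy model only (assumption): a cell is a row, a value and tags.
final class KeyOnlySketch {

    static final class ToyCell {
        final byte[] row;
        final byte[] value;
        final byte[] tags;

        ToyCell(byte[] row, byte[] value, byte[] tags) {
            this.row = row;
            this.value = value;
            this.tags = tags;
        }
    }

    // Analogue of materializing the whole cell: every component is copied.
    static ToyCell copyCell(ToyCell src) {
        return new ToyCell(src.row.clone(), src.value.clone(), src.tags.clone());
    }

    // Analogue of a key-only copy: value and tags are left out.
    static ToyCell copyKey(ToyCell src) {
        return new ToyCell(src.row.clone(), new byte[0], new byte[0]);
    }

    // Payload bytes held by a copy (ignores object headers).
    static int copiedBytes(ToyCell c) {
        return c.row.length + c.value.length + c.tags.length;
    }

    // Analogue of building a "first on row" key: only the row is read.
    static byte[] firstOnRow(ToyCell c) {
        return c.row.clone();
    }

    public static void main(String[] args) {
        ToyCell stored = new ToyCell(new byte[] {1, 2, 3}, new byte[1024], new byte[16]);
        System.out.println(copiedBytes(copyCell(stored))); // prints 1043
        System.out.println(copiedBytes(copyKey(stored)));  // prints 3
    }
}
```

In this model both copies yield the same firstOnRow result, so the value and 
tag bytes copied by the full-cell path are pure overhead for this call site.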

 

Attached is the same allocation profile that guided the optimizations in 
HBASE-29099. It shows that about 3% of allocations occur in 
[BufferedEncodedSeeker.getCell in the body of 
seekBeforeAndSaveKeyToPreviousRow|https://github.com/apache/hbase/blob/b89c8259c5726395c9ae3a14919bd192252ca517/hbase-common/src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java#L284-L348].
 The region server in question was pinned at 100% CPU utilization for a while 
and was running a reverse-scan-heavy workload.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
