Bryan Beaudreault created HBASE-28130:
-----------------------------------------

             Summary: Add metrics around automatic switching from PREAD to STREAM
                 Key: HBASE-28130
                 URL: https://issues.apache.org/jira/browse/HBASE-28130
             Project: HBase
          Issue Type: Improvement
            Reporter: Bryan Beaudreault


Unless specifically configured, scans use ReadType.DEFAULT, in which case they 
start as PREAD and are converted to STREAM once they have read 
hbase.storescanner.pread.max.bytes.
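A minimal Java sketch of that behaviour, for illustration only. 
ReadTypeSwitchSketch, onBytesRead and currentReadType are hypothetical names, 
not HBase's StoreScanner code, and the exact comparison HBase makes against 
hbase.storescanner.pread.max.bytes may differ. It shows that only DEFAULT 
scans switch, and that they switch at most once.

```java
/**
 * Hypothetical model of ReadType.DEFAULT: start as PREAD, switch to STREAM
 * once the scan has read at least preadMaxBytes. Not HBase's StoreScanner.
 */
public class ReadTypeSwitchSketch {
    public enum ReadType { DEFAULT, PREAD, STREAM }

    private final long preadMaxBytes;   // stands in for hbase.storescanner.pread.max.bytes
    private final boolean switchable;   // only DEFAULT scans may switch
    private ReadType current;
    private long bytesRead;

    public ReadTypeSwitchSketch(ReadType requested, long preadMaxBytes) {
        this.preadMaxBytes = preadMaxBytes;
        this.switchable = requested == ReadType.DEFAULT;
        // DEFAULT begins as PREAD; an explicit PREAD or STREAM is kept as-is.
        this.current = switchable ? ReadType.PREAD : requested;
    }

    /** Records bytes read; returns true only on the call that triggers the switch. */
    public boolean onBytesRead(long n) {
        bytesRead += n;
        if (switchable && current == ReadType.PREAD && bytesRead >= preadMaxBytes) {
            // In HBase this is roughly where a new DFSInputStream would be opened,
            // which needs block locations from the namenode.
            current = ReadType.STREAM;
            return true;
        }
        return false;
    }

    public ReadType currentReadType() {
        return current;
    }

    public static void main(String[] args) {
        // Threshold of 4 blocks, assuming 64 KB blocks.
        ReadTypeSwitchSketch scan = new ReadTypeSwitchSketch(ReadType.DEFAULT, 4 * 64 * 1024);
        for (int block = 1; block <= 6; block++) {
            boolean switched = scan.onBytesRead(64 * 1024);
            System.out.println("block " + block + ": " + scan.currentReadType()
                + (switched ? " (switched)" : ""));
        }
    }
}
```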

We have sometimes found trySwitchToStream taking a significant amount of time 
in profiles. This is because switching to STREAM involves opening a new 
DFSInputStream, which requires querying the namenode for block locations.

Recently I ran a performance test across various scan sizes using our default 
max bytes threshold (4 blocks). I found that switching to STREAM too early 
hurts much more than staying in PREAD too long. When the scan size is around 
the switchover threshold (4-8 blocks), overall throughput drops by over 50%. 
When the scan size is much larger than the threshold (20+ blocks), staying in 
PREAD too long reduces throughput by less than 10%.
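To put those block counts in bytes, a small calculation. It assumes a 64 KB 
block size (the HBase default column-family block size; the test's actual 
block size is not stated here), and PreadThresholdMath and its methods are 
hypothetical names. The estimate ignores exactly where within a block the 
switch check fires. One plausible reading it suggests: scans near the 
threshold pay for the namenode round trip but then read only a few blocks via 
STREAM.

```java
/** Hypothetical helper converting the issue's block counts into bytes. */
public class PreadThresholdMath {
    // Assumption: 64 KB blocks, the HBase default column-family block size.
    static final long BLOCK_SIZE_BYTES = 64L * 1024;
    // Default threshold described in the issue: 4 blocks.
    static final int THRESHOLD_BLOCKS = 4;

    public static long thresholdBytes() {
        return THRESHOLD_BLOCKS * BLOCK_SIZE_BYTES;
    }

    /** Approximate bytes a DEFAULT scan of the given size reads after switching to STREAM. */
    public static long approxBytesAfterSwitch(int scanBlocks) {
        long total = scanBlocks * BLOCK_SIZE_BYTES;
        return Math.max(0L, total - thresholdBytes());
    }

    public static void main(String[] args) {
        System.out.println("threshold = " + thresholdBytes() + " bytes");
        for (int blocks : new int[] {4, 8, 20}) {
            System.out.println(blocks + " blocks -> ~" + approxBytesAfterSwitch(blocks)
                + " bytes read after the switch");
        }
    }
}
```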

These results come from our environment, where we use only SSDs and have 100% 
locality, so seeks are not relevant, disk IO is very fast, and blocking on 
network requests to the namenode can have an outsized impact. Others may have 
different constraints, but metrics can help operators make informed 
decisions.

I'd like a metric for switchesToStreamCount, and a histogram of 
bytesReadAfterStreamSwitch.
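A rough sketch of what those two metrics could track, using only JDK classes. 
HBase's real metrics go through its own metrics framework, so 
StreamSwitchMetricsSketch, its methods, and the power-of-two bucketing are 
all hypothetical choices made for illustration.

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical holder for the two proposed metrics. A real implementation
 * would register these with HBase's metrics system instead.
 */
public class StreamSwitchMetricsSketch {
    // Counter: how many scanners switched from PREAD to STREAM.
    private final LongAdder switchesToStreamCount = new LongAdder();
    // Histogram: bucket 0 holds 0 bytes; bucket i (i >= 1) holds [2^(i-1), 2^i).
    // 64 buckets cover every non-negative long.
    private final AtomicLongArray bytesReadAfterStreamSwitch = new AtomicLongArray(64);

    /** Call when a scanner performs the PREAD -> STREAM switch. */
    public void onSwitchToStream() {
        switchesToStreamCount.increment();
    }

    /** Call when a scanner closes; only scanners that switched are recorded. */
    public void onScannerClose(boolean switchedToStream, long bytesReadAfterSwitch) {
        if (switchedToStream) {
            bytesReadAfterStreamSwitch.incrementAndGet(bucketFor(bytesReadAfterSwitch));
        }
    }

    /** Number of significant bits, e.g. 1 -> 1, 2..3 -> 2, 512..1023 -> 10. */
    public static int bucketFor(long bytes) {
        if (bytes <= 0) {
            return 0;
        }
        return 64 - Long.numberOfLeadingZeros(bytes);
    }

    public long switchCount() {
        return switchesToStreamCount.sum();
    }

    public long bucketCount(int bucket) {
        return bytesReadAfterStreamSwitch.get(bucket);
    }
}
```

Power-of-two buckets keep the histogram fixed-size and lock-free; a 
percentile-based histogram would serve equally well. If most switched scans 
land in low buckets, the threshold is likely switching scans that finish soon 
afterwards.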



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
