Tak-Lon (Stephen) Wu created HBASE-27013:
--------------------------------------------

             Summary: Introduce read all bytes when using pread for Prefetch
                 Key: HBASE-27013
                 URL: https://issues.apache.org/jira/browse/HBASE-27013
             Project: HBase
          Issue Type: Improvement
          Components: HFile, Performance
    Affects Versions: 2.5.0, 2.6.0, 3.0.0-alpha-3, 2.4.13
            Reporter: Tak-Lon (Stephen) Wu


h2. Problem statement

When prefetching HFiles from blob storage such as S3 through a storage 
implementation like S3A, we found a logical issue in the HBase pread path that 
causes reads of a remote HFile to abort the input stream multiple times. Each 
abort-and-reopen slows down reads, throws away many already-fetched bytes, and 
wastes time re-establishing the connection, especially when SSL is enabled.
h2. Root cause

The root cause of the above issue is that 
[BlockIOUtils#preadWithExtra|https://github.com/apache/hbase/blob/9c8c9e7fbf8005ea89fa9b13d6d063b9f0240443/hbase-common/src/main/java/org/apache/hadoop/hbase/io/util/BlockIOUtils.java#L214-L257]
 reads from the input stream without guaranteeing that it returns both the 
data block and the next block header (the latter being optional data that is 
cached for the following read).
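
For context, below is a condensed paraphrase of the linked method (a sketch, 
not the verbatim HBase source). Note that the read loop only insists on the 
necessary data-block bytes; the extra header bytes are filled in only if the 
underlying stream happens to return them:
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

// Condensed paraphrase of BlockIOUtils#preadWithExtra; names mirror the real
// method, but this is an illustrative sketch, not the exact HBase code.
static boolean preadWithExtraSketch(FSDataInputStream dis, long position,
    int necessaryLen, int extraLen) throws IOException {
  int remain = necessaryLen + extraLen;
  byte[] buf = new byte[remain];
  int bytesRead = 0;
  // Loops only until the *necessary* data-block bytes are in. If the
  // positioned read returns short (common on object stores), the extraLen
  // bytes for the next block header are silently skipped.
  while (bytesRead < necessaryLen) {
    int ret = dis.read(position + bytesRead, buf, bytesRead, remain - bytesRead);
    if (ret < 0) {
      throw new IOException("Premature EOF at offset " + (position + bytesRead));
    }
    bytesRead += ret;
  }
  // true only when the next block header happened to be read as well
  return bytesRead != necessaryLen;
}
{code}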

When the input stream returns a short read that covers the necessary data 
block plus only a few bytes of the next block header, 
[BlockIOUtils#preadWithExtra|https://github.com/apache/hbase/blob/9c8c9e7fbf8005ea89fa9b13d6d063b9f0240443/hbase-common/src/main/java/org/apache/hadoop/hbase/io/util/BlockIOUtils.java#L214-L257]
 returns to the caller anyway. As a result, when we try to read the next 
block, 
[HFileBlock#|https://github.com/apache/hbase/blob/9c8c9e7fbf8005ea89fa9b13d6d063b9f0240443/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java#L1648-L1664]
 in HBase must re-read that header from the input stream. But because the 
reusable input stream has already moved its position pointer ahead of the 
offset where the header starts, with the [S3A 
implementation|https://github.com/apache/hadoop/blob/29401c820377d02a992eecde51083cf87f8e57af/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L339-L361]
 this backwards seek closes the input stream, aborts all remaining bytes, and 
reopens a new input stream at the earlier offset.
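
For illustration, here is a heavily simplified paraphrase of the linked S3A 
seek logic (all names and the readahead value are placeholders, not the actual 
Hadoop code):
{code:java}
import java.io.IOException;
import java.io.InputStream;

// Heavily simplified paraphrase of S3AInputStream#seekInStream (illustrative
// only; see the linked Hadoop source for the real logic).
class SeekSketch {
  private InputStream wrappedStream;          // the open HTTP stream to S3
  private long pos;                           // current stream position
  private long forwardSeekLimit = 64 * 1024;  // readahead bound (placeholder)

  void seekInStream(long targetPos) throws IOException {
    long diff = targetPos - pos;
    if (diff > 0 && diff <= forwardSeekLimit) {
      // Cheap forward seek: read and discard the in-between bytes.
      wrappedStream.skip(diff);
      pos = targetPos;
      return;
    }
    // Backwards seek (our case): the HTTP stream cannot rewind, so the
    // current stream is closed/aborted, discarding all remaining bytes.
    wrappedStream.close();
    wrappedStream = null; // forces a fresh S3 connection at pos on next read
    pos = targetPos;
  }
}
{code}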
h2. How do we fix it?

S3A is doing the right job here: HBase tells it to move the offset from 
position A back to A - N, so there is not much we can do about how S3A handles 
the input stream. (On HDFS, by contrast, this backwards seek is fast.)

Instead, we should fix this at the HBase level and always try to read the data 
block plus the next block header when using blob storage, so that we avoid 
expensively draining the bytes in a stream and reopening the socket to the 
remote storage.
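
A minimal sketch of the proposed behavior, assuming we extend the pread loop 
to keep reading until the extra header bytes are also buffered (the method 
name is illustrative, and the final patch may gate this behind a 
blob-storage-only configuration):
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

// Sketch of the proposed fix (illustrative names, not the committed patch):
// keep reading until the data block AND the next block header are both
// buffered, so HBase never has to seek backwards to re-read the header.
static boolean preadWithExtraAllBytes(FSDataInputStream dis, long position,
    int necessaryLen, int extraLen) throws IOException {
  int remain = necessaryLen + extraLen;
  byte[] buf = new byte[remain];
  int bytesRead = 0;
  while (remain > 0) {
    int ret = dis.read(position + bytesRead, buf, bytesRead, remain);
    if (ret < 0) {
      // EOF is acceptable once the data block itself is complete: at the end
      // of the file there is simply no next block header to read.
      if (bytesRead >= necessaryLen) {
        break;
      }
      throw new IOException("Premature EOF at offset " + (position + bytesRead));
    }
    bytesRead += ret;
    remain -= ret;
  }
  return bytesRead > necessaryLen;
}
{code}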
h2. Drawbacks and discussion
 * A known drawback: when we are at the last data block, we will still read an 
extra header-sized length into the byte buffer even though what follows is not 
a block header. That length is always 33 bytes (see the field breakdown after 
this list), and it should not be an issue for data correctness because the 
trailer tells us where the last data block ends; we simply waste a 33-byte 
read whose data is never used.
 * I don't know whether we could use HFileStreamReader instead, but that would 
change the Prefetch logic significantly, so this minimal change seems like the 
best option.
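
For reference, the 33 bytes come from the checksum-enabled on-disk HFileBlock 
header layout (the constant HConstants.HFILEBLOCK_HEADER_SIZE):
{code:java}
// Field widths in bytes of the checksum-enabled HFileBlock header:
int blockMagic = 8;                     // e.g. "DATABLK*"
int onDiskSizeWithoutHeader = 4;
int uncompressedSizeWithoutHeader = 4;
int prevBlockOffset = 8;
int checksumType = 1;
int bytesPerChecksum = 4;
int onDiskDataSizeWithHeader = 4;
int headerSize = blockMagic + onDiskSizeWithoutHeader
    + uncompressedSizeWithoutHeader + prevBlockOffset + checksumType
    + bytesPerChecksum + onDiskDataSizeWithHeader; // = 33
{code}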



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
