[ 
https://issues.apache.org/jira/browse/HDDS-15424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tak-Lon (Stephen) Wu updated HDDS-15424:
----------------------------------------
    Attachment: cf-d7f3673bffb24e0eb692b37c76a06925.log

> Non-Stream read failed HBase read block checksum intermittently
> ---------------------------------------------------------------
>
>                 Key: HDDS-15424
>                 URL: https://issues.apache.org/jira/browse/HDDS-15424
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Tak-Lon (Stephen) Wu
>            Priority: Minor
>         Attachments: cf-d7f3673bffb24e0eb692b37c76a06925.log
>
>
> While testing HBase on Ozone with the YCSB read-only workload C (no SCR), 
> we found an intermittent checksum error when reading blocks through Ozone's 
> ChunkInputStream. 
> [HBase's data block 
> checksum|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java#L1531-L1558]
>  (hbase.regionserver.checksum.verify) is enabled by default. Its goal is to 
> avoid a separate filesystem checksum round trip by keeping HBase's own 
> checksums inside the HFile data blocks.
> The same workload does not fail on HDFS (no SCR, no bucketcache).
> {code}
> 2026-05-21 14:31:41,627 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock: 
> [RpcServer.default.FPBQ.Fifo.handler=14,queue=5,port=22101]: Reading 
> d7f3673bffb24e0eb692b37c76a06925 at offset=66368883, pread=true, 
> verifyChecksum=true, cachedHeader=null, onDiskSizeWithHeader=65613
> 2026-05-21 14:31:41,655 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock: 
> [RpcServer.default.FPBQ.Fifo.handler=81,queue=0,port=22101]: Read 
> [blockType=DATA, fileOffset=238227405, headerSize=33, 
> onDiskSizeWithoutHeader=65584, uncompressedSizeWithoutHeader=65564, 
> prevBlockOffset=238161816, isUseHBaseChecksum=true, checksumType=CRC32C, 
> bytesPerChecksum=16384, onDiskDataSizeWithHeader=65597, 
> getOnDiskSizeWithHeader=65617, totalChecksumBytes=20, isUnpacked=true, 
> buf=[SingleByteBuff[pos=0, lim=65597, cap= 65650]], 
> dataBeginsWith=\x00\x00\x00+\x00\x00\x00d\x00\x17user157290070305533595, 
> fileContext=[usesHBaseChecksum=true, checksumType=CRC32C, 
> bytesPerChecksum=16384, blocksize=65536, encoding=NONE, 
> indexBlockEncoding=NONE, includesMvcc=true, includesTags=false, 
> compressAlgo=NONE, compressTags=false, decompressionContext=null, 
> cryptoContext=[cipher=NONE keyHash=NONE], 
> name=d7f3673bffb24e0eb692b37c76a06925, 
> cellComparator=org.apache.hadoop.hbase.InnerStoreCellComparator@4d3ead0a], 
> nextBlockOnDiskSize=65633] in 101 ms
> 2026-05-21 14:31:41,772 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock: 
> [RpcServer.default.FPBQ.Fifo.handler=14,queue=5,port=22101]: Read 
> [blockType=DATA, fileOffset=66368883, headerSize=33, 
> onDiskSizeWithoutHeader=65580, uncompressedSizeWithoutHeader=65560, 
> prevBlockOffset=66303290, isUseHBaseChecksum=true, checksumType=CRC32C, 
> bytesPerChecksum=16384, onDiskDataSizeWithHeader=65593, 
> getOnDiskSizeWithHeader=65613, totalChecksumBytes=20, isUnpacked=true, 
> buf=[SingleByteBuff[pos=0, lim=65593, cap= 65646]], 
> dataBeginsWith=\x00\x00\x00+\x00\x00\x00d\x00\x17user156719743603804260, 
> fileContext=[usesHBaseChecksum=true, checksumType=CRC32C, 
> bytesPerChecksum=16384, blocksize=65536, encoding=NONE, 
> indexBlockEncoding=NONE, includesMvcc=true, includesTags=false, 
> compressAlgo=NONE, compressTags=false, decompressionContext=null, 
> cryptoContext=[cipher=NONE keyHash=NONE], 
> name=d7f3673bffb24e0eb692b37c76a06925, 
> cellComparator=org.apache.hadoop.hbase.InnerStoreCellComparator@4d3ead0a], 
> nextBlockOnDiskSize=65738] in 145 ms
> 2026-05-21 14:31:45,042 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock: 
> [RpcServer.default.FPBQ.Fifo.handler=18,queue=0,port=22101]: Reading 
> d7f3673bffb24e0eb692b37c76a06925 at offset=280830996, pread=true, 
> verifyChecksum=true, cachedHeader=null, onDiskSizeWithHeader=65609
> 2026-05-21 14:32:12,355 WARN  org.apache.hadoop.hbase.io.hfile.HFile: 
> [RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HBase 
> checksumType verification failed for file 
> ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
>  at offset 237833541 filesize 392584403 checksumType 49. Retrying read with 
> HDFS checksums turned on...
> 2026-05-21 14:32:12,355 WARN  org.apache.hadoop.hbase.io.hfile.HFile: 
> [RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HBase checksum 
> verification failed for file 
> ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
>  at offset 237833541 filesize 392584403. Retrying read with HDFS checksums 
> turned on...
> 2026-05-21 14:32:12,404 WARN  org.apache.hadoop.hbase.io.hfile.HFile: 
> [RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HDFS checksum 
> verification succeeded for file 
> ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
>  at offset 237833541 filesize 392584403
> {code}
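
The block sizes in the log above are consistent with one 4-byte CRC32C per 16384-byte chunk: 65593 bytes of header plus data need 5 chunks, so 20 checksum bytes, for 65613 bytes on disk. The following is a minimal sketch of that arithmetic and of per-chunk verification. It assumes a simplified layout (data followed by big-endian CRC values); the class and method names are hypothetical and this is not HBase's actual implementation.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32C;

class BlockChecksumSketch {
    // Assumption: one 32-bit CRC32C value is stored per chunk.
    static final int CHECKSUM_SIZE = 4;

    // Checksum bytes needed to cover dataSize bytes (rounds up to whole chunks).
    static long totalChecksumBytes(long dataSize, int bytesPerChecksum) {
        long chunks = (dataSize + bytesPerChecksum - 1) / bytesPerChecksum;
        return chunks * CHECKSUM_SIZE;
    }

    // Hypothetical layout: the data, then one CRC32C per chunk appended after it.
    static byte[] appendChecksums(byte[] data, int bytesPerChecksum) {
        int total = (int) totalChecksumBytes(data.length, bytesPerChecksum);
        ByteBuffer out = ByteBuffer.allocate(data.length + total);
        out.put(data);
        for (int off = 0; off < data.length; off += bytesPerChecksum) {
            CRC32C crc = new CRC32C();
            crc.update(data, off, Math.min(bytesPerChecksum, data.length - off));
            out.putInt((int) crc.getValue());
        }
        return out.array();
    }

    // Recomputes each chunk's CRC32C and compares it with the stored value.
    static boolean verify(byte[] block, int dataSize, int bytesPerChecksum) {
        ByteBuffer sums = ByteBuffer.wrap(block, dataSize, block.length - dataSize);
        for (int off = 0; off < dataSize; off += bytesPerChecksum) {
            CRC32C crc = new CRC32C();
            crc.update(block, off, Math.min(bytesPerChecksum, dataSize - off));
            if ((int) crc.getValue() != sums.getInt()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Values taken from the log: onDiskDataSizeWithHeader and bytesPerChecksum.
        int dataSize = 65593;
        int bytesPerChecksum = 16384;
        System.out.println("checksum bytes: " + totalChecksumBytes(dataSize, bytesPerChecksum));

        byte[] block = appendChecksums(new byte[dataSize], bytesPerChecksum);
        System.out.println("intact block verifies: " + verify(block, dataSize, bytesPerChecksum));

        block[100] ^= 1; // flip one bit inside the first chunk
        System.out.println("damaged block verifies: " + verify(block, dataSize, bytesPerChecksum));
    }
}
```

Because each chunk is checked independently, a single wrong byte anywhere in the returned buffer is enough to fail the whole block, which is why a read path that occasionally returns a few stale or misplaced bytes would surface as an intermittent failure of this kind.
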
> The question is why a read from Ozone can judge the data block to be 
> corrupt, yet complete successfully once it falls back to the Ozone 
> (filesystem) checksum. 
> The extra filesystem-level checksum causes a minor delay. The failure is 
> not consistent and only happens once in a while. Please see the captured 
> reads for cf d7f3673bffb24e0eb692b37c76a06925 in 
> cf-d7f3673bffb24e0eb692b37c76a06925.log; in most cases the read 
> succeeded.
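
The WARN lines in the log describe a two-step pattern: read without filesystem checksums and verify the application-level checksum, then, on a mismatch, reread with filesystem checksums turned on. A minimal sketch of that control flow follows. All names here (ChecksumFallbackSketch, readBlock, fastRead, fsVerifiedRead) are hypothetical, and the sketch only mirrors the sequence visible in the log, not HBase's or Ozone's actual code.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

class ChecksumFallbackSketch {
    // Holds the bytes that were read and which path produced them.
    static final class ReadResult {
        final byte[] block;
        final boolean usedFsChecksum;

        ReadResult(byte[] block, boolean usedFsChecksum) {
            this.block = block;
            this.usedFsChecksum = usedFsChecksum;
        }
    }

    // fastRead:       read with filesystem checksums off (assumed cheaper path).
    // fsVerifiedRead: reread with filesystem checksums on.
    // appChecksumOk:  application-level check over the returned bytes.
    static ReadResult readBlock(Supplier<byte[]> fastRead,
                                Supplier<byte[]> fsVerifiedRead,
                                Predicate<byte[]> appChecksumOk) {
        byte[] block = fastRead.get();
        if (appChecksumOk.test(block)) {
            return new ReadResult(block, false);
        }
        // Mismatch: issue a second, independent read verified by the filesystem.
        return new ReadResult(fsVerifiedRead.get(), true);
    }

    public static void main(String[] args) {
        byte[] good = {1, 2, 3};
        byte[] bad = {1, 2, 9};
        // Stand-in for a real checksum: the block is "valid" if it ends in 3.
        Predicate<byte[]> check = b -> b[b.length - 1] == 3;

        ReadResult r = readBlock(() -> bad, () -> good, check);
        System.out.println("fell back to filesystem checksum: " + r.usedFsChecksum);
    }
}
```

Note that the fallback is a second read, not a second check of the same bytes. If the first read returned wrong bytes only transiently, the reread can succeed even though the stored data was never corrupt, which would match a failure that appears only occasionally.
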



--
This message was sent by Atlassian Jira
(v8.20.10#820010)