[ https://issues.apache.org/jira/browse/HDDS-15424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18084997#comment-18084997 ]
Tsz-wo Sze commented on HDDS-15424:
-----------------------------------
[~taklwu], thanks for filing this and HDDS-15423!
Since the client checksum fails with both Stream Read and non-Stream Read, this
may be an existing bug rather than one specific to Stream Read.
> Non-Stream read failed HBase read block checksum intermittently
> ---------------------------------------------------------------
>
> Key: HDDS-15424
> URL: https://issues.apache.org/jira/browse/HDDS-15424
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Tak-Lon (Stephen) Wu
> Priority: Minor
> Attachments: cf-d7f3673bffb24e0eb692b37c76a06925.log
>
>
> When testing HBase on Ozone by running the YCSB read-only workload C (no SCR),
> we found an unexpected checksum error when reading a block through Ozone's
> ChunkInputStream.
> [HBase's data block
> checksum|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java#L1531-L1558]
> (hbase.regionserver.checksum.verify) is enabled by default; its goal is to
> avoid the extra filesystem checksum round trip by keeping HBase's own
> checksums inside the HFile data blocks.
> The same workload does not fail on HDFS (no SCR, no BucketCache).
> {code}
> 2026-05-21 14:31:41,627 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock:
> [RpcServer.default.FPBQ.Fifo.handler=14,queue=5,port=22101]: Reading
> d7f3673bffb24e0eb692b37c76a06925 at offset=66368883, pread=true,
> verifyChecksum=true, cachedHeader=null, onDiskSizeWithHeader=65613
> 2026-05-21 14:31:41,655 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock:
> [RpcServer.default.FPBQ.Fifo.handler=81,queue=0,port=22101]: Read
> [blockType=DATA, fileOffset=238227405, headerSize=33,
> onDiskSizeWithoutHeader=65584, uncompressedSizeWithoutHeader=65564,
> prevBlockOffset=238161816, isUseHBaseChecksum=true, checksumType=CRC32C,
> bytesPerChecksum=16384, onDiskDataSizeWithHeader=65597,
> getOnDiskSizeWithHeader=65617, totalChecksumBytes=20, isUnpacked=true,
> buf=[SingleByteBuff[pos=0, lim=65597, cap= 65650]],
> dataBeginsWith=\x00\x00\x00+\x00\x00\x00d\x00\x17user157290070305533595,
> fileContext=[usesHBaseChecksum=true, checksumType=CRC32C,
> bytesPerChecksum=16384, blocksize=65536, encoding=NONE,
> indexBlockEncoding=NONE, includesMvcc=true, includesTags=false,
> compressAlgo=NONE, compressTags=false, decompressionContext=null,
> cryptoContext=[cipher=NONE keyHash=NONE],
> name=d7f3673bffb24e0eb692b37c76a06925,
> cellComparator=org.apache.hadoop.hbase.InnerStoreCellComparator@4d3ead0a],
> nextBlockOnDiskSize=65633] in 101 ms
> 2026-05-21 14:31:41,772 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock:
> [RpcServer.default.FPBQ.Fifo.handler=14,queue=5,port=22101]: Read
> [blockType=DATA, fileOffset=66368883, headerSize=33,
> onDiskSizeWithoutHeader=65580, uncompressedSizeWithoutHeader=65560,
> prevBlockOffset=66303290, isUseHBaseChecksum=true, checksumType=CRC32C,
> bytesPerChecksum=16384, onDiskDataSizeWithHeader=65593,
> getOnDiskSizeWithHeader=65613, totalChecksumBytes=20, isUnpacked=true,
> buf=[SingleByteBuff[pos=0, lim=65593, cap= 65646]],
> dataBeginsWith=\x00\x00\x00+\x00\x00\x00d\x00\x17user156719743603804260,
> fileContext=[usesHBaseChecksum=true, checksumType=CRC32C,
> bytesPerChecksum=16384, blocksize=65536, encoding=NONE,
> indexBlockEncoding=NONE, includesMvcc=true, includesTags=false,
> compressAlgo=NONE, compressTags=false, decompressionContext=null,
> cryptoContext=[cipher=NONE keyHash=NONE],
> name=d7f3673bffb24e0eb692b37c76a06925,
> cellComparator=org.apache.hadoop.hbase.InnerStoreCellComparator@4d3ead0a],
> nextBlockOnDiskSize=65738] in 145 ms
> 2026-05-21 14:31:45,042 TRACE org.apache.hadoop.hbase.io.hfile.HFileBlock:
> [RpcServer.default.FPBQ.Fifo.handler=18,queue=0,port=22101]: Reading
> d7f3673bffb24e0eb692b37c76a06925 at offset=280830996, pread=true,
> verifyChecksum=true, cachedHeader=null, onDiskSizeWithHeader=65609
> 2026-05-21 14:32:12,355 WARN org.apache.hadoop.hbase.io.hfile.HFile:
> [RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HBase
> checksumType verification failed for file
> ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
> at offset 237833541 filesize 392584403 checksumType 49. Retrying read with
> HDFS checksums turned on...
> 2026-05-21 14:32:12,355 WARN org.apache.hadoop.hbase.io.hfile.HFile:
> [RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HBase checksum
> verification failed for file
> ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
> at offset 237833541 filesize 392584403. Retrying read with HDFS checksums
> turned on...
> 2026-05-21 14:32:12,404 WARN org.apache.hadoop.hbase.io.hfile.HFile:
> [RpcServer.default.FPBQ.Fifo.handler=69,queue=6,port=22101]: HDFS checksum
> verification succeeded for file
> ofs://ozone/hbase2/hbaseroot/hbase/data/default/200m3/8a52ad8ac8ed280ae3b42270eda35307/cf/d7f3673bffb24e0eb692b37c76a06925
> at offset 237833541 filesize 392584403
> {code}
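The sizes in the log above are consistent with HBase storing one 4-byte CRC32C per bytesPerChecksum-sized chunk of the block header plus data: with onDiskDataSizeWithHeader=65597 and bytesPerChecksum=16384 there are 5 chunks, so totalChecksumBytes=20 and getOnDiskSizeWithHeader=65617. The following is a minimal sketch under that assumption, using java.util.zip.CRC32C (Java 9+). It is not HBase's code; the class and method names are hypothetical, and the big-endian byte order of the stored values is an assumption. It only illustrates how a per-chunk check localizes a mismatch to a single chunk.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.CRC32C;

// Hypothetical sketch, not HBase's implementation: one 4-byte CRC32C per
// bytesPerChecksum-sized chunk of (header + data); byte order is assumed big-endian.
class ChunkedCrcSketch {
    static final int CHECKSUM_SIZE = 4; // bytes per stored CRC32C value

    // ceil(dataSize / bytesPerChecksum) chunks, 4 checksum bytes each.
    static int checksumBytes(int dataSize, int bytesPerChecksum) {
        int chunks = (dataSize + bytesPerChecksum - 1) / bytesPerChecksum;
        return chunks * CHECKSUM_SIZE;
    }

    // Computes one CRC32C per chunk; the last chunk may be shorter.
    static byte[] computeChecksums(byte[] data, int bytesPerChecksum) {
        ByteBuffer out = ByteBuffer.allocate(checksumBytes(data.length, bytesPerChecksum));
        CRC32C crc = new CRC32C();
        for (int off = 0; off < data.length; off += bytesPerChecksum) {
            int len = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            out.putInt((int) crc.getValue()); // ByteBuffer defaults to big-endian
        }
        return out.array();
    }

    // Returns the index of the first chunk whose checksum differs, or -1 if all match.
    static int firstBadChunk(byte[] data, byte[] stored, int bytesPerChecksum) {
        byte[] actual = computeChecksums(data, bytesPerChecksum);
        if (actual.length != stored.length) {
            throw new IllegalArgumentException("checksum length mismatch");
        }
        for (int i = 0; i < actual.length; i += CHECKSUM_SIZE) {
            if (!Arrays.equals(actual, i, i + CHECKSUM_SIZE, stored, i, i + CHECKSUM_SIZE)) {
                return i / CHECKSUM_SIZE;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int bytesPerChecksum = 16384;
        int onDiskDataSizeWithHeader = 65597; // values from the first "Read" line in the log
        int total = checksumBytes(onDiskDataSizeWithHeader, bytesPerChecksum);
        System.out.println("totalChecksumBytes=" + total); // 20
        System.out.println("onDiskSizeWithHeader=" + (onDiskDataSizeWithHeader + total)); // 65617

        byte[] block = new byte[onDiskDataSizeWithHeader];
        new Random(42).nextBytes(block); // stand-in for real block bytes
        byte[] stored = computeChecksums(block, bytesPerChecksum);
        System.out.println(firstBadChunk(block, stored, bytesPerChecksum)); // -1
        block[40000] ^= 1; // flip one bit inside chunk 2 (bytes 32768..49151)
        System.out.println(firstBadChunk(block, stored, bytesPerChecksum)); // 2
    }
}
```

The same arithmetic matches the second "Read" line: 65593 bytes also span 5 chunks, giving 20 checksum bytes and getOnDiskSizeWithHeader=65613.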
> The question is why reading from Ozone can make HBase consider the data block
> corrupt, while the same read completes successfully once it falls back to the
> Ozone (filesystem) checksum.
> The extra filesystem checksum causes a minor delay; the failure is
> intermittent and only happens once in a while. See the captured reads for
> cf d7f3673bffb24e0eb692b37c76a06925 in cf-d7f3673bffb24e0eb692b37c76a06925.log;
> most of them read fine.
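The WARN lines in the log show a fallback: the HBase-level check fails, the block is re-read with filesystem checksums turned on, and the retry verifies. The hypothetical sketch below only models that control flow, to show why an intermittent mismatch costs one extra read rather than a failed request. BlockReader, readBlock, and the simulated reader are illustrative names, not HBase or Ozone APIs, and the one-time bit flip is an assumption used to trigger the path, not a diagnosis of the root cause.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.zip.CRC32C;

// Hypothetical model of the fallback visible in the log; not HBase or Ozone code.
class FallbackReadSketch {
    // Illustrative reader; verifyFsChecksum mirrors "HDFS checksums turned on".
    interface BlockReader {
        byte[] read(long offset, int len, boolean verifyFsChecksum);
    }

    static long crc(byte[] b) {
        CRC32C c = new CRC32C();
        c.update(b, 0, b.length);
        return c.getValue();
    }

    // First read relies on the block-level checksum only; on mismatch, re-read
    // once with filesystem checksums enabled and count the fallback.
    static byte[] readBlock(BlockReader reader, long offset, int len,
                            long expectedCrc, AtomicInteger fallbacks) {
        byte[] first = reader.read(offset, len, false);
        if (crc(first) == expectedCrc) {
            return first;
        }
        fallbacks.incrementAndGet(); // "Retrying read with HDFS checksums turned on..."
        byte[] second = reader.read(offset, len, true);
        if (crc(second) != expectedCrc) {
            throw new IllegalStateException("checksum mismatch after retry at offset " + offset);
        }
        return second;
    }

    public static void main(String[] args) {
        byte[] onDisk = new byte[65597];
        new Random(7).nextBytes(onDisk);
        long expected = crc(onDisk);
        AtomicInteger reads = new AtomicInteger();
        // Simulated transient fault (an assumption): only the first read returns a flipped bit.
        BlockReader flaky = (offset, len, verify) -> {
            byte[] copy = Arrays.copyOfRange(onDisk, (int) offset, (int) offset + len);
            if (reads.getAndIncrement() == 0) {
                copy[100] ^= 1;
            }
            return copy;
        };
        AtomicInteger fallbacks = new AtomicInteger();
        byte[] data = readBlock(flaky, 0, onDisk.length, expected, fallbacks);
        System.out.println("fallbacks=" + fallbacks.get() + " reads=" + reads.get()
                + " ok=" + Arrays.equals(data, onDisk)); // fallbacks=1 reads=2 ok=true
    }
}
```

In this model each intermittent mismatch adds exactly one extra read, which matches the "minor delay" described above; whether the first read really returns different bytes is what the issue still needs to establish.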
--
This message was sent by Atlassian Jira
(v8.20.10#820010)