[
https://issues.apache.org/jira/browse/HBASE-26780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762007#comment-17762007
]
Nick Dimiduk commented on HBASE-26780:
--------------------------------------
Heya [~yyZhang]. We've been seeing a similar symptom on several of our
clusters. Our current analysis is that this is caused by HDFS data block
corruption and holes in HBase's ability to recognise and request repair of the
block. The file in question later gets compacted away, so I assume that a
different code path makes correct use of HDFS checksums to access it, but it's
difficult to "catch in the act". Can you share a full stack trace from HBase?
I'll see if I can get permission to share the same. Thanks.

[~cribbee] This is interesting -- we have not correlated this with a newly
started data node. Indeed, our clumsy solution is to take the data node offline
and force HDFS to re-replicate the block. We are not using erasure coding.

[~xytss123] Are you certain that the underlying block is not corrupted? How do
you check it before HDFS can automatically repair it? When you read the file
again, how do you force your HDFS client to read that block from the same
datanode as when the mismatching data was encountered?

FYI, we're running with a build of 2.5.2 + some back-ported patches.
> HFileBlock.verifyOnDiskSizeMatchesHeader throw IOException: Passed in
> onDiskSizeWithHeader= A != B
> --------------------------------------------------------------------------------------------------
>
> Key: HBASE-26780
> URL: https://issues.apache.org/jira/browse/HBASE-26780
> Project: HBase
> Issue Type: Bug
> Components: BlockCache
> Affects Versions: 2.2.2
> Reporter: yuzhang
> Priority: Major
> Attachments: IOException.png
>
>
> When I scan a region, HBase throws IOException: Passed in
> onDiskSizeWithHeader= A != B
> The HFile mentioned in the error message can be accessed normally.
> It recovers after a region move command. I guess that onDiskSizeWithHeader of
> HFileBlock has been changed, and the RS gets the correct BlockHeader info
> after the region is reopened.
>