[ 
https://issues.apache.org/jira/browse/HBASE-26780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762007#comment-17762007
 ] 

Nick Dimiduk commented on HBASE-26780:
--------------------------------------

Heya [~yyZhang]. We've been seeing similar symptoms on several of our
clusters. Our current analysis is that this is caused by HDFS data block
corruption combined with gaps in HBase's ability to recognise the corrupt
block and request its repair. The file in question later gets compacted away,
so I assume that a different code path makes correct use of HDFS checksums to
access it, but it's difficult to "catch in the act". Can you share a full stack
trace from HBase? I'll see if I can get permission to share ours as well.
Thanks.
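The checksum angle above rests on chunk-level verification: the writer stores one checksum per fixed-size chunk, and a reader that recomputes them can tell which chunk is bad and retry elsewhere instead of trusting the bytes. The sketch below shows only that detection step. It assumes CRC32C and 16 KB chunks (assumed here to match HBase's defaults for its own block checksums), the class and method names are hypothetical, and it does not model the real HDFS or HBase read path, replica selection, or repair reporting.

```java
import java.util.zip.CRC32C;

// Illustrative sketch only: algorithm and chunk size are assumptions,
// not read from any cluster configuration.
public class ChunkedChecksum {
    // Computes one CRC32C per bytesPerChecksum-sized chunk (the last chunk may be shorter).
    static int[] computeChecksums(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        int[] sums = new int[chunks];
        for (int i = 0; i < chunks; i++) {
            int start = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - start);
            CRC32C crc = new CRC32C();
            crc.update(data, start, len);
            sums[i] = (int) crc.getValue(); // CRC32C value fits in 32 bits
        }
        return sums;
    }

    // Returns the index of the first chunk whose checksum does not match, or -1 if all match.
    static int firstBadChunk(byte[] data, int bytesPerChecksum, int[] expected) {
        int[] actual = computeChecksums(data, bytesPerChecksum);
        if (actual.length != expected.length) {
            throw new IllegalArgumentException("chunk count mismatch");
        }
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int bytesPerChecksum = 16 * 1024; // assumed chunk size
        byte[] block = new byte[40_000];
        for (int i = 0; i < block.length; i++) {
            block[i] = (byte) i;
        }
        // Checksums as they would have been stored alongside the data at write time.
        int[] stored = computeChecksums(block, bytesPerChecksum);

        block[20_000] ^= (byte) 0xFF; // simulate corruption inside chunk 1
        System.out.println("first bad chunk: " + firstBadChunk(block, bytesPerChecksum, stored));
    }
}
```

Per-chunk checksums localize the damage to one chunk, which is what lets a client blame a specific replica rather than failing the whole read. A corrupted header field would only be caught this way if the checksum is verified before the header fields are trusted.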

[~cribbee] This is interesting -- we have not correlated this with a newly
started data node. Indeed, our clumsy solution is to take the data node offline
and force HDFS to re-replicate the block. We are not using erasure coding.

[~xytss123] Are you certain that the underlying block is not corrupted? How do
you check it before HDFS can automatically repair it? When you read the file
again, how do you force your HDFS client to read that block from the same
datanode as when the mismatching data was encountered?

FYI, we're running with a build of 2.5.2 + some back-ported patches.

> HFileBlock.verifyOnDiskSizeMatchesHeader throw IOException: Passed in 
> onDiskSizeWithHeader= A != B
> --------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-26780
>                 URL: https://issues.apache.org/jira/browse/HBASE-26780
>             Project: HBase
>          Issue Type: Bug
>          Components: BlockCache
>    Affects Versions: 2.2.2
>            Reporter: yuzhang
>            Priority: Major
>         Attachments: IOException.png
>
>
> When I scan a region, HBase throws IOException: Passed in
> onDiskSizeWithHeader= A != B
> The HFile mentioned in the error message can be accessed normally.
> It recovers after running the move command on the region. I guess that the
> onDiskSizeWithHeader of the HFileBlock has been changed, and the RS gets the
> correct BlockHeader info after the region is reopened.
>  
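For context on the message in the title: the reader compares the on-disk size it expected for a block ("A", typically taken from the block index) with the size recorded in the block's own header ("B"), and throws when they differ. A minimal sketch of that comparison follows. It assumes the 33-byte HFile block header used when HBase checksums are enabled, with onDiskSizeWithoutHeader stored as a big-endian int right after the 8-byte magic. The method name mirrors the one in the issue title, but this is a simplified hypothetical stand-in, not HBase's actual HFileBlock code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; not HBase's HFileBlock implementation.
public class OnDiskSizeCheck {
    // Assumed header layout with HBase checksums enabled:
    // 8-byte magic, 4-byte onDiskSizeWithoutHeader, 4-byte uncompressedSizeWithoutHeader,
    // 8-byte prevBlockOffset, 1-byte checksum type, 4-byte bytesPerChecksum,
    // 4-byte onDiskDataSizeWithHeader = 33 bytes in total.
    static final int MAGIC_LENGTH = 8;
    static final int HEADER_SIZE = 33;

    // "B" in the error: the size recorded in the block's own header, plus the header itself.
    static int onDiskSizeWithHeaderFromHeader(ByteBuffer header) {
        return header.getInt(MAGIC_LENGTH) + HEADER_SIZE;
    }

    // Compares "A" (the size the caller expected) with "B" and fails on any mismatch.
    static void verifyOnDiskSizeMatchesHeader(int passedIn, ByteBuffer header, long offset)
            throws IOException {
        int fromHeader = onDiskSizeWithHeaderFromHeader(header);
        if (passedIn != fromHeader) {
            throw new IOException("Passed in onDiskSizeWithHeader=" + passedIn
                    + " != " + fromHeader + ", offset=" + offset);
        }
    }

    public static void main(String[] args) {
        ByteBuffer header = ByteBuffer.allocate(HEADER_SIZE); // big-endian by default
        header.put("DATABLK*".getBytes(StandardCharsets.US_ASCII)); // data block magic
        header.putInt(MAGIC_LENGTH, 65_536);                  // onDiskSizeWithoutHeader
        int expected = 65_536 + HEADER_SIZE;                  // e.g. what the block index says

        try {
            verifyOnDiskSizeMatchesHeader(expected, header, 0L); // sizes agree: no exception
            header.putInt(MAGIC_LENGTH, 12_345);                  // simulate a corrupted size field
            verifyOnDiskSizeMatchesHeader(expected, header, 0L);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, corrupting only the header bytes changes B while A stays fixed, and re-reading an intact copy of the block restores B. That would be one way to get the symptoms reported here, consistent with the corruption hypothesis, though the sketch does not prove it.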



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
