[
https://issues.apache.org/jira/browse/HDFS-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177789#comment-14177789
]
Tsz Wo Nicholas Sze commented on HDFS-7269:
-------------------------------------------
Per HDFS-1371, the client should not report a checksum failure when all the
nodes are bad. Do the files have only one replica in your case?
> NN and DN don't check whether corrupted blocks reported by clients are
> actually corrupted
> -----------------------------------------------------------------------------------------
>
> Key: HDFS-7269
> URL: https://issues.apache.org/jira/browse/HDFS-7269
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Ming Ma
>
> We had a case where the client machine had a memory issue and therefore
> failed checksum validation of a given block on all of its replicas. The
> client ended up reporting the block as corrupt on all DNs to the NN via
> reportBadBlocks. However, the block isn't corrupted on any of the DNs; you
> can still use DFSClient to read it. To get rid of the NN's warning message
> about the corrupt block, we have to either do an NN failover or repair the
> file by a) copying the file somewhere, b) removing the file, and c) copying
> the file back.
> It would be useful if the NN and DN could validate the client's report. In
> fact, there is a comment in NameNodeRpcServer about this.
> {noformat}
> /**
> * The client has detected an error on the specified located blocks
> * and is reporting them to the server. For now, the namenode will
> * mark the block as corrupt. In the future we might
> * check the blocks are actually corrupt.
> */
> {noformat}
> To allow the system to recover quickly from an invalid client report, we can
> support automatic recovery or a manual admin command.
> 1. The NN can send a new DatanodeCommand such as ValidateBlockCommand. The DN
> will report the validation result via IBR with a new
> ReceivedDeletedBlockInfo.BlockStatus.VALIDATED_BLOCK.
> 2. A new admin command to move corrupted blocks out of BM's
> CorruptReplicasMap and UnderReplicatedBlocks.
> Appreciate any input.
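
The validation flow proposed in item 1 of the description could be sketched
roughly as below. This is only an illustration under stated assumptions: the
class CorruptReportValidationSketch, its Replica type, and all method names
are hypothetical and do not exist in Hadoop. It also simplifies HDFS
checksumming to a single CRC32 per replica, and it models CorruptReplicasMap
as a plain map from block ID to datanode IDs.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.zip.CRC32;

/** Hypothetical sketch only; none of these names are real Hadoop APIs. */
class CorruptReportValidationSketch {

    /** Stand-in for a replica on a DN: its bytes plus the checksum stored at write time. */
    static final class Replica {
        final byte[] data;
        final long storedChecksum;

        Replica(byte[] data, long storedChecksum) {
            this.data = data;
            this.storedChecksum = storedChecksum;
        }
    }

    /** Simplification: one CRC32 over the whole replica. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** DN side: recompute the checksum locally instead of trusting the client's report. */
    static boolean replicaIsIntact(Replica replica) {
        return crc32(replica.data) == replica.storedChecksum;
    }

    /** NN side: toy stand-in for the corrupt replicas map (block ID -> datanode IDs). */
    private final Map<Long, Set<String>> corruptReplicas = new HashMap<>();

    /** Records a client report, as the NN does today without validating it. */
    void reportBadBlock(long blockId, String datanodeId) {
        corruptReplicas.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanodeId);
    }

    /** Applies a DN validation result: an intact replica is cleared, a bad one stays marked. */
    void onValidationResult(long blockId, String datanodeId, boolean intact) {
        if (!intact) {
            return;
        }
        Set<String> nodes = corruptReplicas.get(blockId);
        if (nodes == null) {
            return;
        }
        nodes.remove(datanodeId);
        if (nodes.isEmpty()) {
            corruptReplicas.remove(blockId);
        }
    }

    boolean isMarkedCorrupt(long blockId, String datanodeId) {
        Set<String> nodes = corruptReplicas.get(blockId);
        return nodes != null && nodes.contains(datanodeId);
    }

    public static void main(String[] args) {
        byte[] contents = "block contents".getBytes(StandardCharsets.UTF_8);
        Replica healthy = new Replica(contents, crc32(contents));

        CorruptReportValidationSketch nn = new CorruptReportValidationSketch();
        nn.reportBadBlock(1L, "dn1"); // bogus report from a client with bad memory
        nn.onValidationResult(1L, "dn1", replicaIsIntact(healthy));
        System.out.println("dn1 still marked corrupt: " + nn.isMarkedCorrupt(1L, "dn1"));
    }
}
```

In this sketch a replica whose recomputed checksum matches is removed from
the map, while a replica that really is damaged stays marked, so a bogus
client report can be cleared without an NN failover or a file rewrite.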
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)