[ https://issues.apache.org/jira/browse/HDFS-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177905#comment-14177905 ]

Ming Ma commented on HDFS-7269:
-------------------------------

Nicholas, in our case, the client reported only one replica per reportBadBlocks
call. But because there were multiple DFSInputStream read calls for the given
block, and each read call could mark one replica as bad, all of the replicas
ended up marked as bad.
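
To make the failure mode concrete, here is a toy Java simulation (not HDFS code). Every name in it (BadBlockReportSimulation, readOnce, clientChecksumMatches, and the local reportBadBlocks stand-in) is hypothetical. It assumes, for simplicity, that each new read call goes to the first replica not yet reported as corrupt, and that the client's bad memory makes every checksum check fail. Each call reports a single replica, yet after one read per replica the whole block looks corrupt.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BadBlockReportSimulation {
    // Toy stand-in for the NN's record of replicas reported as corrupt for one block.
    static final Set<String> corruptReplicas = new LinkedHashSet<>();

    // Toy stand-in for a reportBadBlocks call that carries a single replica.
    static void reportBadBlocks(String datanode) {
        corruptReplicas.add(datanode);
    }

    // Models the faulty client: its bad memory makes every checksum check fail,
    // even though the data stored on the DN is fine.
    static boolean clientChecksumMatches(String datanode) {
        return false;
    }

    // One read call: try the first replica not yet reported as corrupt and, if the
    // client-side checksum fails, report only that replica.
    static void readOnce(List<String> replicas) {
        for (String dn : replicas) {
            if (corruptReplicas.contains(dn)) {
                continue;
            }
            if (!clientChecksumMatches(dn)) {
                reportBadBlocks(dn);
            }
            return;
        }
    }

    public static void main(String[] args) {
        List<String> replicas = List.of("dn1", "dn2", "dn3");
        for (int read = 1; read <= replicas.size(); read++) {
            readOnce(replicas);
            System.out.println("after read " + read + ": " + corruptReplicas);
        }
    }
}
```

Running main prints the reported set growing from [dn1] to [dn1, dn2, dn3], one replica per read.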

> NN and DN don't check whether corrupted blocks reported by clients are actually corrupted
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-7269
>                 URL: https://issues.apache.org/jira/browse/HDFS-7269
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>
> We had a case where the client machine had a memory issue and thus failed
> checksum validation of a given block for all of its replicas. As a result, the
> client reported the block as corrupt on all DNs via reportBadBlocks. However,
> the block wasn't corrupted on any of the DNs; you could still use DFSClient to
> read it. But to get rid of the NN's corrupt-block warning, we had to either do
> an NN failover or repair the file by a) copying the file somewhere, b) removing
> the file, and c) copying the file back.
> It would be useful if the NN and DN could validate the client's report. In
> fact, there is a comment in NameNodeRpcServer about this.
> {noformat}
>   /**
>    * The client has detected an error on the specified located blocks 
>    * and is reporting them to the server.  For now, the namenode will 
>    * mark the block as corrupt.  In the future we might 
>    * check the blocks are actually corrupt. 
>    */
> {noformat}
> To allow the system to recover quickly from an invalid client report, we can
> support either automatic recovery or a manual admin command:
> 1. The NN can send a new DatanodeCommand such as ValidateBlockCommand. The DN
> then reports the validation result via an incremental block report (IBR) with
> a new ReceivedDeletedBlockInfo.BlockStatus.VALIDATED_BLOCK.
> 2. A new admin command can move corrupted blocks out of the BlockManager's
> CorruptReplicasMap and UnderReplicatedBlocks.
> Any input is appreciated.
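
A rough Java sketch of how the two proposals could fit together, under stated assumptions: ValidateBlockCommand and VALIDATED_BLOCK come from the proposal, but their fields here are guesses, and CONFIRMED_CORRUPT, clearCorruptReplica, and the toy map are hypothetical. The map is not the real CorruptReplicasMap API, and UnderReplicatedBlocks is left out. The NN still marks a client-reported replica as corrupt, but it also queues a re-check. If the DN's on-disk verification passes, the replica is cleared. The same clearing method could back the manual admin command in item 2.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

public class ClientReportValidationSketch {
    // VALIDATED_BLOCK follows the proposal; CONFIRMED_CORRUPT is a hypothetical counterpart.
    enum ValidationStatus { VALIDATED_BLOCK, CONFIRMED_CORRUPT }

    // Toy version of the proposed ValidateBlockCommand (fields are assumptions).
    static final class ValidateBlockCommand {
        final long blockId;
        final String datanode;

        ValidateBlockCommand(long blockId, String datanode) {
            this.blockId = blockId;
            this.datanode = datanode;
        }
    }

    // Toy corrupt-replica bookkeeping (blockId -> DNs); not the real CorruptReplicasMap API.
    static final Map<Long, Set<String>> corruptReplicas = new HashMap<>();
    // Commands the NN would hand out to DNs.
    static final Queue<ValidateBlockCommand> pendingCommands = new ArrayDeque<>();

    // NN receives reportBadBlocks: mark the replica as before, but also queue a re-check.
    static void onClientReport(long blockId, String datanode) {
        corruptReplicas.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
        pendingCommands.add(new ValidateBlockCommand(blockId, datanode));
    }

    // DN re-verifies the replica on disk; the flag stands in for a real checksum scan.
    static ValidationStatus validateOnDatanode(ValidateBlockCommand cmd, boolean diskChecksumOk) {
        return diskChecksumOk ? ValidationStatus.VALIDATED_BLOCK : ValidationStatus.CONFIRMED_CORRUPT;
    }

    // NN processes the DN's answer (delivered via an IBR in the proposal).
    static void onValidationResult(ValidateBlockCommand cmd, ValidationStatus status) {
        if (status == ValidationStatus.VALIDATED_BLOCK) {
            clearCorruptReplica(cmd.blockId, cmd.datanode);
        }
    }

    // Shared with the manual path: an admin command could call this directly.
    static void clearCorruptReplica(long blockId, String datanode) {
        Set<String> dns = corruptReplicas.get(blockId);
        if (dns == null) {
            return;
        }
        dns.remove(datanode);
        if (dns.isEmpty()) {
            corruptReplicas.remove(blockId);
        }
    }

    public static void main(String[] args) {
        long blockId = 1L; // hypothetical block ID
        for (String dn : new String[] {"dn1", "dn2", "dn3"}) {
            onClientReport(blockId, dn);
        }
        System.out.println("corrupt before validation: " + corruptReplicas.containsKey(blockId));
        while (!pendingCommands.isEmpty()) {
            ValidateBlockCommand cmd = pendingCommands.poll();
            // As in the incident above, every replica is actually healthy on disk.
            onValidationResult(cmd, validateOnDatanode(cmd, true));
        }
        System.out.println("corrupt after validation: " + corruptReplicas.containsKey(blockId));
    }
}
```

A replica that really is bad on disk comes back as CONFIRMED_CORRUPT and stays marked, so this design only undoes reports the DN cannot reproduce.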



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
