[ 
https://issues.apache.org/jira/browse/HDFS-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16895550#comment-16895550
 ] 

Wei-Chiu Chuang commented on HDFS-13709:
----------------------------------------

[~zhangchen] what is the version of Hadoop you're using?

When a client reads a block, it verifies the data against the checksums. If 
verification fails, the client reports the corrupt replica to the NameNode, 
and the NameNode schedules a replacement block.
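
For illustration, a minimal, self-contained sketch of that per-chunk verification 
(not the real DFSInputStream code). It assumes CRC32C over 512-byte chunks, which 
matches the HDFS defaults (dfs.checksum.type=CRC32C, dfs.bytes-per-checksum=512):
{code:java}
import java.util.zip.CRC32C;

/**
 * Sketch of client-side per-chunk checksum verification. This is an
 * illustration only; HDFS uses its own DataChecksum class internally.
 */
public class ChunkChecksumSketch {
  static final int BYTES_PER_CHECKSUM = 512;

  /** Returns the index of the first corrupt chunk, or -1 if all chunks verify. */
  static int findCorruptChunk(byte[] data, int[] expectedCrcs) {
    CRC32C crc = new CRC32C();
    for (int chunk = 0; chunk * BYTES_PER_CHECKSUM < data.length; chunk++) {
      int off = chunk * BYTES_PER_CHECKSUM;
      int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
      crc.reset();
      crc.update(data, off, len);
      if ((int) crc.getValue() != expectedCrcs[chunk]) {
        // The caller would report the block to the NameNode here,
        // which is what triggers the replacement replica.
        return chunk;
      }
    }
    return -1;
  }
}
{code}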

If a block is not being accessed by any client, though, that check never runs, so this can happen.

 

I thought we already verified the checksum during block transfer, but I was wrong. 
Here's the code in {{DataNode#transferBlock}}:
{code:java}
if (replicaNotExist || replicaStateNotFinalized) {
  String errStr = "Can't send invalid block " + block;
  LOG.info(errStr);
  bpos.trySendErrorReport(DatanodeProtocol.INVALID_BLOCK, errStr);
  return;
}
if (blockFileNotExist) {
  // Report back to NN bad block caused by non-existent block file.
  reportBadBlock(bpos, block, "Can't replicate block " + block
      + " because the block file doesn't exist, or is not accessible");
  return;
}
if (lengthTooShort) {
  // Check if NN recorded length matches on-disk length.
  // Shorter on-disk len indicates corruption so report NN the corrupt block.
  reportBadBlock(bpos, block, "Can't replicate block " + block
      + " because on-disk length " + data.getLength(block)
      + " is shorter than NameNode recorded length " + block.getNumBytes());
  return;
}
{code}
We only report bad blocks when the block file is missing or the on-disk length 
is shorter than the NameNode-recorded length. We don't verify the checksum. Not 
sure why; is there a computational-overhead concern?
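
For comparison, here is a rough sketch of what verifying the replica against its 
stored checksums during transfer could look like. This is not the actual 
DataTransfer code, and it assumes a simplified flat checksum file (the real block 
.meta file has a header), but it shows where the extra CPU cost per transferred 
byte would come from:
{code:java}
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.CRC32C;

/**
 * Sketch only: re-verify every chunk of a replica against its stored
 * CRC32C checksums before shipping it to the target DataNode.
 */
public class VerifyBeforeTransferSketch {
  static final int CHUNK = 512;

  static void verifyReplica(String blockFile, String checksumFile) throws IOException {
    try (FileInputStream data = new FileInputStream(blockFile);
         DataInputStream sums = new DataInputStream(new FileInputStream(checksumFile))) {
      byte[] buf = new byte[CHUNK];
      CRC32C crc = new CRC32C();
      long offset = 0;
      int n;
      while ((n = data.readNBytes(buf, 0, CHUNK)) > 0) {
        crc.reset();
        crc.update(buf, 0, n);
        int expected = sums.readInt();
        if ((int) crc.getValue() != expected) {
          // Here the DataNode could call reportBadBlock(...) instead of
          // silently shipping a corrupt replica to the target DN.
          throw new IOException("Checksum mismatch at offset " + offset);
        }
        offset += n;
      }
    }
  }
}
{code}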

> Report bad block to NN when transfer block encounter EIO exception
> ------------------------------------------------------------------
>
>                 Key: HDFS-13709
>                 URL: https://issues.apache.org/jira/browse/HDFS-13709
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Chen Zhang
>            Assignee: Chen Zhang
>            Priority: Major
>         Attachments: HDFS-13709.patch
>
>
> In our online cluster, the BlockPoolSliceScanner is turned off, and sometimes 
> a bad disk track may cause data loss.
> For example, suppose there are 3 replicas on 3 machines A/B/C. If a bad track 
> corrupts A's replica data, and one day B and C crash at the same time, the NN 
> will try to replicate the data from A but fail. The block is now corrupt but 
> nobody knows it, because the NN thinks there is at least 1 healthy replica and 
> keeps trying to replicate it.
> When reading a replica that has data on a bad track, the OS returns an EIO 
> error. If the DN reported the bad block as soon as it got an EIO, we could 
> detect this case ASAP and try to avoid data loss.
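
To illustrate the behavior the issue is asking for, here is a hedged, 
self-contained sketch of turning an I/O error during the transfer read into an 
immediate bad-block report. The BadBlockReporter interface is hypothetical; in 
the DataNode this would go through the existing reportBadBlock() path shown above.
{code:java}
import java.io.FileInputStream;
import java.io.IOException;

/**
 * Sketch: a bad disk track surfaces in Java as an IOException (EIO) while
 * reading the replica; report the block instead of only failing the transfer.
 */
public class ReportOnEioSketch {
  interface BadBlockReporter {
    void reportBadBlock(String blockId, String reason);
  }

  static void transferBlock(String blockFile, String blockId, BadBlockReporter reporter)
      throws IOException {
    byte[] buf = new byte[64 * 1024];
    try (FileInputStream in = new FileInputStream(blockFile)) {
      while (in.read(buf) > 0) {
        // ... write the bytes to the target DataNode ...
      }
    } catch (IOException e) {
      // Tell the NameNode right away so it stops treating this replica
      // as healthy, rather than retrying the replication forever.
      reporter.reportBadBlock(blockId, "EIO while transferring: " + e.getMessage());
      throw e;
    }
  }
}
{code}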


