[ https://issues.apache.org/jira/browse/HDFS-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909262#comment-16909262 ]
Stephen O'Donnell commented on HDFS-13709:
------------------------------------------

I think this change looks good now. The exception handling code is much tidier when passing the throwable to the constructor. I can reuse the new method handleBadBlock() in HDFS-14706 once this one is committed.

> Report bad block to NN when transfer block encounter EIO exception
> ------------------------------------------------------------------
>
>                 Key: HDFS-13709
>                 URL: https://issues.apache.org/jira/browse/HDFS-13709
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Chen Zhang
>            Assignee: Chen Zhang
>            Priority: Major
>         Attachments: HDFS-13709.002.patch, HDFS-13709.003.patch,
> HDFS-13709.004.patch, HDFS-13709.patch
>
>
> In our online cluster, the BlockPoolSliceScanner is turned off, and a bad
> track on a disk can sometimes cause data loss.
> For example, suppose there are 3 replicas on 3 machines A/B/C. If a bad track
> occurs on A's replica data, and one day B and C crash at the same time, the NN
> will try to replicate the data from A but fail. The block is now corrupt, but
> no one knows, because the NN thinks there is at least 1 healthy replica and
> keeps trying to replicate it.
> When reading a replica that has data on a bad track, the OS returns an EIO
> error. If the DN reports the bad block as soon as it gets an EIO, we can find
> this case ASAP and try to avoid data loss.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
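The flow discussed above (wrap the EIO in a dedicated exception whose constructor takes the original throwable, then let a handleBadBlock() step decide whether to report the replica) can be sketched in plain Java. This is a minimal, self-contained sketch under stated assumptions, not the committed HDFS-13709 patch: the exception class, the ReplicaReader and BadBlockReporter interfaces, and the transferBlock/sendBlock helpers are hypothetical stand-ins, and the handleBadBlock() signature shown is assumed. It also assumes EIO can be recognized by the Linux "Input/output error" message, which is a heuristic rather than a JDK API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class EioReportSketch {

    /** Hypothetical exception marking a replica whose disk read failed with EIO. */
    static class DiskFileCorruptSketchException extends IOException {
        DiskFileCorruptSketchException(String msg, Throwable cause) {
            // Passing the throwable keeps the original EIO stack trace attached.
            super(msg, cause);
        }
    }

    /** Hypothetical stand-in for the DataNode-to-NameNode bad block report. */
    interface BadBlockReporter {
        void reportBadBlock(String blockId);
    }

    /** Hypothetical stand-in for reading replica data from local disk. */
    interface ReplicaReader {
        byte[] readChunk(String blockId) throws IOException;
    }

    // Assumption: on Linux, strerror(EIO) is "Input/output error" and the JDK
    // usually includes it in the IOException message. Heuristic only.
    static final String EIO_MESSAGE = "Input/output error";

    static boolean isEio(IOException e) {
        String msg = e.getMessage();
        return msg != null && msg.contains(EIO_MESSAGE);
    }

    /** Reads the replica for transfer, converting an EIO into a dedicated exception. */
    static byte[] sendBlock(String blockId, ReplicaReader reader) throws IOException {
        try {
            return reader.readChunk(blockId);
        } catch (IOException e) {
            if (isEio(e)) {
                throw new DiskFileCorruptSketchException(
                        "EIO while reading replica " + blockId, e);
            }
            throw e; // other failures (e.g. network) say nothing about the replica
        }
    }

    /** Sketch of handleBadBlock(): report only failures caused by a disk read error. */
    static boolean handleBadBlock(String blockId, IOException e, BadBlockReporter reporter) {
        if (e instanceof DiskFileCorruptSketchException) {
            reporter.reportBadBlock(blockId);
            return true;
        }
        return false;
    }

    /** Sketch of a transfer attempt; returns true if the read succeeded. */
    static boolean transferBlock(String blockId, ReplicaReader reader, BadBlockReporter reporter) {
        try {
            sendBlock(blockId, reader);
            return true;
        } catch (IOException e) {
            handleBadBlock(blockId, e, reporter);
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> reported = new ArrayList<>();
        BadBlockReporter reporter = reported::add;
        ReplicaReader badTrack = id -> { throw new IOException(EIO_MESSAGE); };
        ReplicaReader healthy = id -> new byte[] {1, 2, 3};

        System.out.println(transferBlock("blk_1", badTrack, reporter)); // false
        System.out.println(transferBlock("blk_2", healthy, reporter));  // true
        System.out.println(reported);                                  // [blk_1]
    }
}
```

Keeping the EIO check in the read path and the reporting decision in one handler means other callers (such as the work mentioned for HDFS-14706) could reuse the handler without duplicating the exception handling; non-EIO failures are deliberately not reported, since they do not indicate a damaged replica.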