[ https://issues.apache.org/jira/browse/HDFS-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732693#action_12732693 ]
Boris Shkolnik commented on HDFS-457:
-------------------------------------

Good point. I will report the failure to the NN through the errorReport interface.
Currently, when the NN receives a DatanodeProtocol.DiskError message, it removes the datanode.
I will introduce DatanodeProtocol.FatalDiskError, which should remove the datanode. DiskError will then only cause the NN to log a WARN message about the failed volume.

> better handling of volume failure in Data Node storage
> ------------------------------------------------------
>
>                 Key: HDFS-457
>                 URL: https://issues.apache.org/jira/browse/HDFS-457
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>            Reporter: Boris Shkolnik
>            Assignee: Boris Shkolnik
>
> The current implementation shuts the DataNode down completely when one of the
> configured volumes of the storage fails.
> This is rather wasteful behavior because it decreases utilization (good
> storage becomes unavailable) and imposes extra load on the system
> (replication of the blocks from the good volumes). These problems will become
> even more prominent when we move to mixed (heterogeneous) clusters with many
> more volumes per Data Node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
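
For illustration, the NN-side handling proposed in the comment could look roughly like the sketch below. It is a toy model and not the Hadoop code: the class VolumeFailureSketch, the ErrorCode enum, the NameNode stand-in, and the string datanode ids are all hypothetical names made up for this example. Only the behavior comes from the comment: DiskError logs a WARN and keeps the datanode, FatalDiskError removes it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the proposed behavior; none of these types are Hadoop classes.
class VolumeFailureSketch {

    // Hypothetical stand-ins for the two error kinds named in the comment.
    enum ErrorCode { DISK_ERROR, FATAL_DISK_ERROR }

    // Minimal NameNode stand-in: a set of live datanodes plus collected WARN lines.
    static class NameNode {
        final Set<String> liveDatanodes = new HashSet<>();
        final List<String> warnings = new ArrayList<>();

        void register(String datanodeId) {
            liveDatanodes.add(datanodeId);
        }

        // Hypothetical signature, loosely modeled on the idea of an errorReport call.
        void errorReport(String datanodeId, ErrorCode code, String msg) {
            switch (code) {
                case DISK_ERROR:
                    // Proposed: a failed volume only produces a WARN; the datanode stays.
                    warnings.add("WARN: volume failed on " + datanodeId + ": " + msg);
                    break;
                case FATAL_DISK_ERROR:
                    // Proposed: only the fatal report removes the datanode.
                    liveDatanodes.remove(datanodeId);
                    break;
            }
        }
    }

    public static void main(String[] args) {
        NameNode nn = new NameNode();
        nn.register("dn1");

        nn.errorReport("dn1", ErrorCode.DISK_ERROR, "/data/1 is unreadable");
        System.out.println("after DISK_ERROR, dn1 live: " + nn.liveDatanodes.contains("dn1"));

        nn.errorReport("dn1", ErrorCode.FATAL_DISK_ERROR, "datanode cannot continue");
        System.out.println("after FATAL_DISK_ERROR, dn1 live: " + nn.liveDatanodes.contains("dn1"));
    }
}
```

The point of splitting the two codes is that a datanode that loses one volume keeps serving blocks from its good volumes, so the NN does not have to re-replicate them.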