[
https://issues.apache.org/jira/browse/HDFS-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883423#action_12883423
]
Jeff Zhang commented on HDFS-457:
---------------------------------
I noticed that only the BlockSender calls checkDisk, so only a read operation
will let the namenode know that a disk has failed. Has anyone tried this patch?
I suspect there are still some problems with it.
In the test case TestDataNodeVolumeFailure, I made a small change: I replaced
triggerFailure with a write operation. In this case the datanode does not know
that one volume has failed, so only one replica is written successfully, and
the write then waits a long time for the other replica to be copied from
another datanode.
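
To make the concern concrete, here is a toy model of the behavior described
above. It is not HDFS code: ToyDataNode, Volume, and every method name in it
are hypothetical, and it only assumes that the disk check is reachable from
the read path and not from the write path.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; none of these classes exist in HDFS. */
public class VolumeCheckSketch {

    /** Hypothetical stand-in for one configured storage volume. */
    static class Volume {
        final String name;
        boolean failed;

        Volume(String name) {
            this.name = name;
        }
    }

    /** Hypothetical datanode that checks its disks on the read path only. */
    static class ToyDataNode {
        final List<Volume> volumes = new ArrayList<>();
        // Failures that would be reported to the namenode.
        final List<String> reportedFailures = new ArrayList<>();

        /** Scan all volumes and record any failed one not yet reported. */
        void checkDisk() {
            for (Volume v : volumes) {
                if (v.failed && !reportedFailures.contains(v.name)) {
                    reportedFailures.add(v.name);
                }
            }
        }

        /** Read path: an I/O error triggers the disk check. */
        boolean read(Volume v) {
            if (v.failed) {
                checkDisk();
                return false;
            }
            return true;
        }

        /** Write path: an I/O error is NOT followed by a disk check. */
        boolean write(Volume v) {
            return !v.failed;
        }
    }

    public static void main(String[] args) {
        ToyDataNode dn = new ToyDataNode();
        Volume vol = new Volume("vol1");
        dn.volumes.add(vol);
        vol.failed = true;

        boolean written = dn.write(vol);
        System.out.println("write ok: " + written
                + ", reported: " + dn.reportedFailures);

        boolean readOk = dn.read(vol);
        System.out.println("read ok: " + readOk
                + ", reported: " + dn.reportedFailures);
    }
}
```

In this model the failed write leaves reportedFailures empty, and the failure
is recorded only once a read touches the bad volume. That matches what I saw
in the modified test.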
> better handling of volume failure in Data Node storage
> ------------------------------------------------------
>
> Key: HDFS-457
> URL: https://issues.apache.org/jira/browse/HDFS-457
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: data-node
> Reporter: Boris Shkolnik
> Assignee: Boris Shkolnik
> Fix For: 0.21.0
>
> Attachments: HDFS-457-1.patch, HDFS-457-2.patch, HDFS-457-2.patch,
> HDFS-457-2.patch, HDFS-457-3.patch, HDFS-457.patch, HDFS-457_20-append.patch,
> jira.HDFS-457.branch-0.20-internal.patch, TestFsck.zip
>
>
> Current implementation shuts DataNode down completely when one of the
> configured volumes of the storage fails.
> This is rather wasteful behavior because it decreases utilization (good
> storage becomes unavailable) and imposes extra load on the system
> (replication of the blocks from the good volumes). These problems will become
> even more prominent when we move to mixed (heterogeneous) clusters with many
> more volumes per Data Node.