[
https://issues.apache.org/jira/browse/HDFS-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025349#comment-13025349
]
Eli Collins commented on HDFS-1848:
-----------------------------------
I agree the datanode should only check the validity of all the directories
where it is configured to store data.
Point #1 is limited to allowing an administrator to specify that not all of
these configured directories should necessarily be treated equally wrt the
policy for tolerating failures. Ie the idea is *not* to use dfs.data.dir for
general datanode health monitoring. There are already plenty of tools that
monitor disk health; HDFS should just do the right thing when it experiences a
failure.
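For illustration only, a minimal sketch of how the datanode could read an
admin-specified list of critical volumes separately from dfs.data.dir. The
property name dfs.datanode.critical.volumes and the helper class below are
hypothetical, not existing HDFS config or code:
{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;

public class CriticalVolumesSketch {
  // Hypothetical property: comma-separated list of critical volume paths
  // (eg the mount hosting the OS, logs, pid and tmp dirs).
  static final String CRITICAL_VOLUMES_KEY = "dfs.datanode.critical.volumes";

  /** Parse the hypothetical critical-volumes property into File objects. */
  static List<File> getCriticalVolumes(Configuration conf) {
    List<File> volumes = new ArrayList<File>();
    String value = conf.get(CRITICAL_VOLUMES_KEY, "");
    for (String path : value.split(",")) {
      if (!path.trim().isEmpty()) {
        volumes.add(new File(path.trim()));
      }
    }
    return volumes;
  }
}
{code}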
Point #2 is that - in general - if the datanode experiences some failures (eg
those caused by a failed root disk) it should fail-stop.
Another way to put this is that the datanode should be *proactive* about
checking for failures in its data volumes and *reactive* about other disk
failures (eg of the root disk).
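As a rough sketch of that split in behavior, assuming an admin-supplied list of
critical volumes and the existing tolerated-failures threshold from HDFS-1161.
The class and method names below are illustrative, not the actual DataNode code:
{code:java}
import java.io.File;
import java.util.List;

public class VolumeCheckSketch {
  private final List<File> dataVolumes;          // from dfs.data.dir
  private final List<File> criticalVolumes;      // admin-specified (hypothetical)
  private final int toleratedDataVolumeFailures; // threshold from HDFS-1161

  VolumeCheckSketch(List<File> dataVolumes, List<File> criticalVolumes,
                    int toleratedDataVolumeFailures) {
    this.dataVolumes = dataVolumes;
    this.criticalVolumes = criticalVolumes;
    this.toleratedDataVolumeFailures = toleratedDataVolumeFailures;
  }

  /** Proactive check the datanode could run periodically. */
  void checkVolumes() {
    // A single failed critical volume is fatal: fail-stop right away.
    for (File vol : criticalVolumes) {
      if (!isUsable(vol)) {
        throw new RuntimeException("Critical volume failed: " + vol);
      }
    }
    // Data volume failures are only fatal once they exceed the threshold.
    int failed = 0;
    for (File vol : dataVolumes) {
      if (!isUsable(vol)) {
        failed++;
      }
    }
    if (failed > toleratedDataVolumeFailures) {
      throw new RuntimeException(failed
          + " data volumes failed, exceeding the tolerated threshold");
    }
  }

  private static boolean isUsable(File vol) {
    return vol.exists() && vol.canRead() && vol.canWrite();
  }
}
{code}
The intent, per the points above, is that the critical-volume check
short-circuits: it is not subject to the data-volume threshold and would not
trigger decommissioning.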
> Datanodes should shutdown when a critical volume fails
> ------------------------------------------------------
>
> Key: HDFS-1848
> URL: https://issues.apache.org/jira/browse/HDFS-1848
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: data-node
> Reporter: Eli Collins
> Fix For: 0.23.0
>
>
> A DN should shutdown when a critical volume (eg the volume that hosts the OS,
> logs, pid, tmp dir etc.) fails. The admin should be able to specify which
> volumes are critical, eg they might specify the volume that lives on the boot
> disk. A failure in one of these volumes would not be subject to the threshold
> (HDFS-1161) or result in host decommissioning (HDFS-1847) as the
> decommissioning process would likely fail.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira