[
https://issues.apache.org/jira/browse/HDFS-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026271#comment-13026271
]
Bharath Mundlapudi commented on HDFS-1848:
------------------------------------------
I think Koji's point is: should we have something like the health checker in the
Datanode, similar to MapReduce? If so, the Datanode would periodically launch this
health check to assess its health with respect to disks, NICs, etc. This is the
comment I made earlier. It will help admins. It is not sufficient to just
have diagnostic software on every machine; we need a mechanism to communicate
this information back to the Datanode, right? This is required to fail fast and
then fail-stop safely. That way, the Datanode can look after the disks it cares
about, as it does today, and this external entity will report other
diagnostic information back to the Datanode. Agree?
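To make the proposal concrete, here is a minimal sketch (in Python, purely illustrative; names like `DataNodeMonitor` and `check_volumes` are invented for this example and are not Hadoop APIs) of a periodic health check in the spirit of MapReduce's node health-check script: run checks against the local volumes, and feed the result back so the datanode can fail fast:

```python
import os
import shutil

def check_volumes(paths, min_free_bytes=0):
    """Return the subset of paths that look unhealthy: missing,
    unwritable, or below a free-space floor."""
    failed = []
    for p in paths:
        if not os.path.isdir(p) or not os.access(p, os.W_OK):
            failed.append(p)
            continue
        if shutil.disk_usage(p).free < min_free_bytes:
            failed.append(p)
    return failed

class DataNodeMonitor:
    """Illustrative periodic health checker. A real implementation
    would run on a timer and also probe NICs and other hardware;
    here we only model the disk check and the fail-fast flag."""
    def __init__(self, volumes):
        self.volumes = volumes
        self.healthy = True

    def run_check(self):
        failed = check_volumes(self.volumes)
        if failed:
            # Fail-fast: mark unhealthy so the node can stop
            # accepting work before a fail-stop shutdown.
            self.healthy = False
        return failed
```

The key point the sketch tries to capture is the feedback path: the check result is not just logged for admins, it changes the datanode's own state.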
> Datanodes should shutdown when a critical volume fails
> ------------------------------------------------------
>
> Key: HDFS-1848
> URL: https://issues.apache.org/jira/browse/HDFS-1848
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: data-node
> Reporter: Eli Collins
> Fix For: 0.23.0
>
>
> A DN should shut down when a critical volume (e.g. the volume that hosts the OS,
> logs, pid, tmp dir, etc.) fails. The admin should be able to specify which
> volumes are critical; e.g. they might specify the volume that lives on the boot
> disk. A failure in one of these volumes would not be subject to the threshold
> (HDFS-1161) or result in host decommissioning (HDFS-1847), as the
> decommissioning process would likely fail.
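The decision rule the issue describes can be sketched as follows (illustrative Python; the function and parameter names are invented for this example and are not HDFS's actual API):

```python
def handle_volume_failure(failed_volume, critical_volumes,
                          failed_so_far, tolerated):
    """Decide the datanode's reaction to a single volume failure.

    A failure on an admin-designated critical volume (e.g. the boot
    disk hosting the OS, logs, pid, and tmp dirs) forces an immediate
    fail-stop shutdown. Other volumes are only fatal once the
    tolerated-failure threshold (the HDFS-1161 mechanism) is exceeded.
    """
    if failed_volume in critical_volumes:
        return "shutdown"   # critical volume: fail-stop right away
    if failed_so_far + 1 > tolerated:
        return "shutdown"   # non-critical, but threshold exceeded
    return "continue"       # tolerate the failure and keep serving
```

Note that critical volumes bypass the threshold entirely, matching the issue's point that decommissioning (HDFS-1847) would likely fail once such a volume is gone.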