[
https://issues.apache.org/jira/browse/MAPREDUCE-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinod Kumar Vavilapalli updated MAPREDUCE-3474:
-----------------------------------------------
Issue Type: Bug (was: Sub-task)
Parent: (was: MAPREDUCE-3121)
> NM disk failure detection only covers local dirs
> -------------------------------------------------
>
> Key: MAPREDUCE-3474
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3474
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: nodemanager
> Reporter: Eli Collins
>
> This is the MR counterpart to HDFS-1848. Like HDFS volume failure detection,
> NM disk failure detection checks only a subset of the disks and a subset of
> the directories. E.g., the TT and the NM do not check the root disk for errors
> unless a local dir resides on it. Even if a local dir resides on the root
> disk, the disk checking code only checks the local dirs, so a failure seen
> only when accessing a part of the disk not hosting the local dirs will not be
> noticed. The disk that hosts the logs, pid, tmp dirs, etc. is critical, so it
> needs to be checked as well, and the NM should shut down if a critical disk is
> not available (to prevent MR issues similar to HDFS-1848 and HDFS-2095).
> Typically people currently work around this limitation (aside from
> ignoring it) by using RAID-1 for the root disk or a health script that checks
> the root disk health.
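
To illustrate the kind of check the description asks for, here is a minimal sketch in Java that probes a list of critical directories (logs, pid, tmp and so on) rather than only the local dirs. The class name CriticalDirChecker and its methods are hypothetical and are not part of Hadoop or the NM; the directories passed in are assumptions supplied by the caller. A directory counts as failed if it is missing, lacks read/write/execute permission, or a small probe file cannot be written and read back.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not an actual Hadoop class.
final class CriticalDirChecker {

    // True if dir exists, has rwx permission, and a probe file can be
    // created, written, read back, and deleted inside it.
    static boolean isHealthy(Path dir) {
        if (!Files.isDirectory(dir) || !Files.isReadable(dir)
                || !Files.isWritable(dir) || !Files.isExecutable(dir)) {
            return false;
        }
        Path probe = null;
        try {
            probe = Files.createTempFile(dir, "disk-check", ".tmp");
            byte[] payload = {42};
            Files.write(probe, payload);
            return Arrays.equals(payload, Files.readAllBytes(probe));
        } catch (IOException e) {
            // Any I/O error while probing is treated as a disk failure.
            return false;
        } finally {
            if (probe != null) {
                try {
                    Files.deleteIfExists(probe);
                } catch (IOException ignored) {
                    // Best-effort cleanup of the probe file.
                }
            }
        }
    }

    // Returns the subset of criticalDirs that failed the health check.
    static List<Path> findFailedDirs(List<Path> criticalDirs) {
        List<Path> failed = new ArrayList<>();
        for (Path dir : criticalDirs) {
            if (!isHealthy(dir)) {
                failed.add(dir);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Assumption: the JVM temp dir plus any paths given on the command
        // line stand in for the daemon's log, pid, and tmp directories.
        List<Path> critical = new ArrayList<>();
        critical.add(Paths.get(System.getProperty("java.io.tmpdir")));
        for (String arg : args) {
            critical.add(Paths.get(arg));
        }
        List<Path> failed = findFailedDirs(critical);
        if (!failed.isEmpty()) {
            System.err.println("Critical directories failed: " + failed);
            System.exit(1); // mirrors "shut down if a critical disk is not available"
        }
        System.out.println("All critical directories healthy");
    }
}
```

A daemon could run such a check periodically and exit when findFailedDirs returns a non-empty list. Note that a probe of this kind only exercises the directories it is given, so it still cannot detect failures confined to other parts of the same disk.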