[ https://issues.apache.org/jira/browse/MAPREDUCE-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3474:
-----------------------------------------------

    Issue Type: Bug  (was: Sub-task)
        Parent:     (was: MAPREDUCE-3121)
    
> NM disk failure detection only covers local dirs 
> -------------------------------------------------
>
>                 Key: MAPREDUCE-3474
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3474
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Eli Collins
>
> This is the MR counterpart to HDFS-1848. Like HDFS volume failure detection,
> NM disk failure detection checks only a subset of the disks and a subset of
> the directories. E.g., the TT and the NM do not check the root disk for
> errors unless a local dir resides on it. Even if a local dir resides on the
> root disk, the disk checking code only checks the local dirs, so a failure
> seen only when accessing a part of the disk that does not host the local dirs
> will not be noticed. The disk that hosts the logs, pid, tmp dirs, etc. is
> critical, so it needs to be checked as well, and the NM should shut down if
> a critical disk is not available (to prevent MR issues similar to HDFS-1848
> and HDFS-2095). Currently, people typically work around this limitation
> (aside from ignoring it) by using RAID-1 for the root disk or a health script
> that checks the root disk health.

--
This message is automatically generated by JIRA.