[ https://issues.apache.org/jira/browse/MAPREDUCE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150681#comment-13150681 ]
Hitesh Shah commented on MAPREDUCE-3121:
----------------------------------------

bq. Anyway, I will add a diskHandler.areDisksHealthy() check in the next version of the patch.

My question was more about whether the task should start even if only one healthy disk is available, i.e. when the node is below the specified % failure threshold but still has some healthy disks.

There are 2 parts to the health of a node: the state of the disks when the node has already been scheduled to set up/launch a container, and the health status the NM reports to the RM. The latter simply checks the threshold. Now, when launching the container after the resources are localized (in ContainersLaunch), the patch checks the threshold, whereas the resource localization service, as mentioned above, seems to need only one healthy disk to be available.

I am fine with either approach as long as both code paths use the same one. If no one has a preference, we could follow the approach taken in 0.20.xxx.

> NodeManager should handle disk-failures
> ---------------------------------------
>
>                 Key: MAPREDUCE-3121
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3121
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3121.patch, 3121.v1.1.patch, 3121.v1.patch
>
>
> This is akin to MAPREDUCE-2413 but for YARN's NodeManager. We want to
> minimize the impact of transient/permanent disk failures on containers. With
> larger number of disks per node, the ability to continue to run containers on
> other disks is crucial.
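The comment contrasts two disk-health checks: a percentage-based failure threshold (applied at container launch and in the NM's health report) and an "at least one healthy disk" test (applied during resource localization). The following is a minimal Java sketch, assuming disk state can be summarized as healthy/total counts; the class and method names are hypothetical and are not the NodeManager's actual API. It shows how the two policies can disagree on the same node, and how one shared predicate would keep both code paths consistent.

```java
// Hypothetical sketch: these names are illustrative, not the real NodeManager API.
final class DiskHealthSketch {

    /** Threshold policy: healthy while the fraction of failed disks stays within the limit. */
    static boolean withinFailureThreshold(int healthyDisks, int totalDisks, double maxFailedFraction) {
        if (totalDisks <= 0) {
            return false; // no configured disks: treat the node as unhealthy
        }
        int failedDisks = totalDisks - healthyDisks;
        return (double) failedDisks / totalDisks <= maxFailedFraction;
    }

    /** "Any healthy disk" policy: usable as long as at least one disk still works. */
    static boolean hasAnyHealthyDisk(int healthyDisks) {
        return healthyDisks > 0;
    }

    /**
     * A single shared check that both localization and container launch could call,
     * so a container is never localized and then refused at launch (or vice versa).
     * The second condition guards the case maxFailedFraction == 1.0 with zero healthy disks.
     */
    static boolean canStartContainer(int healthyDisks, int totalDisks, double maxFailedFraction) {
        return withinFailureThreshold(healthyDisks, totalDisks, maxFailedFraction)
                && hasAnyHealthyDisk(healthyDisks);
    }

    public static void main(String[] args) {
        // 8 disks, 6 failed, 50% failure limit: the two policies disagree.
        System.out.println(withinFailureThreshold(2, 8, 0.5)); // false
        System.out.println(hasAnyHealthyDisk(2));              // true
    }
}
```

With 2 of 8 disks healthy and a 50% limit, the threshold check rejects the node while the "any healthy disk" check accepts it, which is exactly the mismatch between localization and launch that the comment points out. Routing both paths through one predicate such as the hypothetical `canStartContainer` makes whichever policy is chosen apply uniformly.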