[ https://issues.apache.org/jira/browse/MAPREDUCE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150221#comment-13150221 ]
Ravi Gummadi commented on MAPREDUCE-3121:
-----------------------------------------

>> Should there be a check for whether there are any good dirs left in
>> ResourceLocalizationService before starting of localizing the resources?

If there are no good local dirs available, then the preceding line of code
{code}nmPrivateCTokensPath = diskHandler.getLocalPathForWrite(){/code}
will throw an IOException. So I think checking again is unnecessary, unless there is a race condition (a disk failure is identified just before the call to startLocalizer(), which is very unlikely). Even with the current patch, if there are no good local dirs, startLocalizer() will fail and throw an exception anyway.

Anyway, I will add a diskHandler.areDisksHealthy() check in the next version of the patch.

> NodeManager should handle disk-failures
> ---------------------------------------
>
>                 Key: MAPREDUCE-3121
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3121
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3121.patch, 3121.v1.1.patch, 3121.v1.patch
>
>
> This is akin to MAPREDUCE-2413 but for YARN's NodeManager. We want to
> minimize the impact of transient/permanent disk failures on containers. With
> larger number of disks per node, the ability to continue to run containers on
> other disks is crucial.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
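
The control flow discussed in the comment above can be sketched roughly as follows. This is only an illustration, not code from the patch: SketchDiskHandler, SketchLocalizer, markFailed() and prepareAndStart() are hypothetical names, and the one-argument getLocalPathForWrite() and the meaning of areDisksHealthy() (true while at least one good local dir remains) are assumptions made for the example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the patch's disk handler; not the real NodeManager class.
class SketchDiskHandler {
    private final List<String> goodDirs = new ArrayList<>();

    SketchDiskHandler(List<String> initialGoodDirs) {
        goodDirs.addAll(initialGoodDirs);
    }

    // Simulates a health check marking a local dir as bad.
    void markFailed(String dir) {
        goodDirs.remove(dir);
    }

    // Assumed semantics: healthy while at least one good local dir remains.
    boolean areDisksHealthy() {
        return !goodDirs.isEmpty();
    }

    // Returns a path under the first good dir, or throws if none is left.
    String getLocalPathForWrite(String relativePath) throws IOException {
        if (goodDirs.isEmpty()) {
            throw new IOException("No good local dirs available for " + relativePath);
        }
        return goodDirs.get(0) + "/" + relativePath;
    }
}

// Hypothetical caller mirroring the steps that lead up to startLocalizer().
class SketchLocalizer {
    static String prepareAndStart(SketchDiskHandler diskHandler, String tokenFile)
            throws IOException {
        // Already fails with IOException when no good local dir is left.
        String nmPrivateCTokensPath = diskHandler.getLocalPathForWrite(tokenFile);

        // Extra check: only matters if a disk failure is detected between the
        // call above and the point where the localizer would be started.
        if (!diskHandler.areDisksHealthy()) {
            throw new IOException("Disks became unhealthy before startLocalizer()");
        }
        return nmPrivateCTokensPath; // the real code would start the localizer here
    }
}
```

In this single-threaded sketch the second check can never fire, because getLocalPathForWrite() has already thrown by then. That matches the argument in the comment: the extra check only helps in the narrow race where a disk is marked bad between the two calls.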