[ https://issues.apache.org/jira/browse/MAPREDUCE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158047#comment-13158047 ]
Eli Collins commented on MAPREDUCE-3121:
----------------------------------------

@Mahadev, shouldn't MR always be able to survive a task failure if there are sufficient resources? The client shouldn't have to distinguish between types of task failures; the AM should just re-execute the task on another node. Filed MAPREDUCE-3473.

> NodeManager should handle disk-failures
> ---------------------------------------
>
>                 Key: MAPREDUCE-3121
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3121
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: 3121.patch, 3121.v1.1.patch, 3121.v1.patch,
>                      3121.v2.patch, 3121.v3.patch
>
>
> This is akin to MAPREDUCE-2413 but for YARN's NodeManager. We want to
> minimize the impact of transient/permanent disk failures on containers. With
> a larger number of disks per node, the ability to continue to run containers on
> other disks is crucial.