[ https://issues.apache.org/jira/browse/MAPREDUCE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ravi Gummadi updated MAPREDUCE-3121:
------------------------------------

    Attachment: 3121.v1.patch

Attaching a new patch that incorporates most of the review comments. I will discuss the remaining minor comments with Vinod soon and upload another patch.

Added a new configuration property, yarn.nodemanager.disk-health-checker.min-healthy-disks, for the "minimum fraction of the number of disks that must be healthy for a node to be considered healthy in terms of disks". Its default value is 0.05, i.e. by default a node is considered unhealthy if fewer than 5% of its disks are healthy.

> NodeManager should handle disk-failures
> ---------------------------------------
>
>                 Key: MAPREDUCE-3121
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3121
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, nodemanager
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Ravi Gummadi
>             Fix For: 0.23.1
>
>         Attachments: 3121.patch, 3121.v1.patch
>
>
> This is akin to MAPREDUCE-2413 but for YARN's NodeManager. We want to
> minimize the impact of transient/permanent disk failures on containers. With
> larger number of disks per node, the ability to continue to run containers on
> other disks is crucial.
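To illustrate the threshold described for yarn.nodemanager.disk-health-checker.min-healthy-disks, here is a minimal, hypothetical Java sketch of a fraction-based check. It is not the code in 3121.v1.patch: the class DiskHealthSketch, the method isNodeHealthy, and the choice to treat a node with zero configured disks as unhealthy are all assumptions made for illustration. Only the 0.05 default comes from the comment above.

```java
/**
 * Hypothetical sketch of a "minimum fraction of healthy disks" check.
 * Class and method names are illustrative, not NodeManager APIs.
 */
public final class DiskHealthSketch {

    // Default from the JIRA comment: at least 5% of disks must be healthy.
    static final double DEFAULT_MIN_HEALTHY_DISKS = 0.05;

    private DiskHealthSketch() {}

    /**
     * Returns true if healthyDisks / totalDisks is at least minFraction.
     * Assumption: a node with no configured disks is reported as unhealthy.
     */
    static boolean isNodeHealthy(int healthyDisks, int totalDisks, double minFraction) {
        if (minFraction < 0.0 || minFraction > 1.0) {
            throw new IllegalArgumentException("minFraction must be in [0, 1]: " + minFraction);
        }
        if (totalDisks <= 0) {
            return false;
        }
        double healthyFraction = (double) healthyDisks / totalDisks;
        return healthyFraction >= minFraction;
    }

    public static void main(String[] args) {
        int total = 24;
        // Print the verdict for a few healthy-disk counts on a 24-disk node.
        for (int healthy : new int[] {0, 1, 2, 24}) {
            System.out.printf("%d/%d healthy -> node healthy: %b%n",
                    healthy, total, isNodeHealthy(healthy, total, DEFAULT_MIN_HEALTHY_DISKS));
        }
    }
}
```

With 24 disks and the 0.05 default, one healthy disk (about 4%) falls below the threshold, while two healthy disks (about 8%) meet it, so the node keeps running containers on the remaining disks.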