[
https://issues.apache.org/jira/browse/MAPREDUCE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Devaraj K moved YARN-1396 to MAPREDUCE-5617:
--------------------------------------------
Component/s: (was: resourcemanager)
Affects Version/s: 2.2.0 (was: 2.2.0)
Key: MAPREDUCE-5617 (was: YARN-1396)
Project: Hadoop Map/Reduce (was: Hadoop YARN)
> map task is not re-launched when the task is failed while reducers are
> running with full cluster capacity - which will lead to job hang
> ---------------------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5617
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5617
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.2.0
> Environment: SuSe Linux
> Reporter: Sunil G
> Priority: Critical
>
> In a cluster with 16 GB capacity, a job was started with 100 maps and 10
> reducers.
> After the reducers had started executing, one NM went down, which caused
> 2 maps to fail. At that point the remaining 8 GB was fully used by 6
> reducers and the AM, so there was no room to launch the failed maps. [The NM
> never came back up, so the cluster size stayed at 8 GB.]
> Even if we kill one of the reducers, the failed maps still cannot be
> launched, because the priority of a failed map is lower than that of a
> reducer, so the RM allocates the freed capacity only to the remaining
> reducers.
> This causes the job to hang on the reducer side.
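
The scenario in the report can be approximated with a small toy model. This is only a sketch of the behaviour as the reporter describes it, not Hadoop's allocator code: the `Request` class, the `allocate` function, the `rank` values, and the container sizes (1 GB per task, 2 GB for the AM) are all assumptions made for illustration; the report gives only the 8 GB total shared by 6 reducers and the AM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    kind: str     # "failed_map" or "reduce"
    rank: int     # hypothetical ordering: smaller rank is served first
    size_gb: int  # assumed container size


def allocate(free_gb, pending):
    """Grant pending requests in rank order while they fit.

    Returns (granted, still_waiting).
    """
    granted, waiting = [], []
    for req in sorted(pending, key=lambda r: r.rank):
        if req.size_gb <= free_gb:
            free_gb -= req.size_gb
            granted.append(req)
        else:
            waiting.append(req)
    return granted, waiting


# Assumed sizes; only the 8 GB total comes from the report.
CLUSTER_GB = 8  # capacity left after the NM went down
AM_GB = 2
TASK_GB = 1
RUNNING_REDUCERS = 6

free_before_kill = CLUSTER_GB - AM_GB - RUNNING_REDUCERS * TASK_GB

# 2 failed maps and the 4 reducers that have not started yet (10 - 6).
# Reducers are ranked ahead of failed maps, as the report describes.
pending = [Request("failed_map", 2, TASK_GB)] * 2 + [Request("reduce", 1, TASK_GB)] * 4

granted_before_kill, _ = allocate(free_before_kill, pending)

# Killing one reducer frees a single task-sized container.
free_after_kill = free_before_kill + TASK_GB
granted_after_kill, waiting_after_kill = allocate(free_after_kill, pending)

print("free before kill:", free_before_kill)
print("granted after kill:", [r.kind for r in granted_after_kill])
print("failed maps still waiting:",
      sum(r.kind == "failed_map" for r in waiting_after_kill))
```

Under these assumptions the freed container always goes to a pending reducer, so the failed maps are never rescheduled and the reducers wait indefinitely for map output.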
--
This message was sent by Atlassian JIRA
(v6.1#6144)