Bob created MAPREDUCE-6513:
------------------------------
Summary: MR job got hanged forever when one NM unstable for some
time
Key: MAPREDUCE-6513
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, resourcemanager
Affects Versions: 3.0.0
Reporter: Bob
While a job with a large number of tasks was in progress, one node became
unstable due to an OS issue. After the node became unstable, the status of the
map tasks on that node changed to KILLED.
The maps that were running on the unstable node have been rescheduled; all of
them are in the SCHEDULED state, waiting for the RM to assign containers. Ask
requests for maps were seen until the node recovered (all of those failed), but
there have been no ask requests since then. Meanwhile, the AM keeps preempting
the reducers (it keeps recycling them).
As a result, the reducers are waiting for the mappers to complete, and the
mappers never get containers.
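
To make the observed stall easier to reason about, here is a minimal toy model of the symptom. It is not Hadoop code and does not claim to show the root cause; the function `simulate`, the flag `asks_sent`, and all of the counts are hypothetical and exist only for illustration. It assumes a cluster fully occupied by reducers, one reducer preempted per heartbeat while maps are waiting, and maps that finish within a single heartbeat.

```python
def simulate(asks_sent: bool, heartbeats: int = 10) -> dict:
    """Toy model of the reported stall (hypothetical, not Hadoop code)."""
    scheduled_maps = 3      # rescheduled maps waiting for containers
    running_reducers = 2    # reducers currently holding all containers
    pending_reducers = 0    # preempted reducers waiting to restart
    completed_maps = 0
    preemptions = 0

    for _ in range(heartbeats):
        free = 0  # assumption: no spare capacity at the start of a heartbeat

        # While maps are waiting, one running reducer is preempted to make room.
        if scheduled_maps > 0 and running_reducers > 0:
            running_reducers -= 1
            pending_reducers += 1
            preemptions += 1
            free += 1

        # Maps only receive containers if ask requests are actually sent.
        # Assumption: a granted map finishes within the heartbeat and
        # returns its container, so `free` is left unchanged.
        if asks_sent:
            granted = min(free, scheduled_maps)
            scheduled_maps -= granted
            completed_maps += granted

        # Any free container goes back to a preempted reducer.
        restarted = min(free, pending_reducers)
        pending_reducers -= restarted
        running_reducers += restarted

    return {
        "scheduled_maps": scheduled_maps,
        "completed_maps": completed_maps,
        "preemptions": preemptions,
    }


if __name__ == "__main__":
    print("asks not sent:", simulate(asks_sent=False))
    print("asks sent:    ", simulate(asks_sent=True))
```

In this sketch, when no map asks are sent, the reducers are preempted on every heartbeat and the maps never leave the scheduled state, which matches the recycling behaviour described above. When asks are sent, the maps complete and preemption stops.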
My Question Is:
============
Why were map requests not sent by the AM after the node recovered?