[jira] [Created] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

Bob (JIRA) Thu, 15 Oct 2015 08:33:54 -0700

Bob created MAPREDUCE-6513:
------------------------------

             Summary: MR job got hanged forever when one NM unstable for some 
time
                 Key: MAPREDUCE-6513
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: applicationmaster, resourcemanager
    Affects Versions: 3.0.0
            Reporter: Bob



While a job with a large number of tasks was in progress, one node became 
unstable due to an OS issue. After the node became unstable, the status of the 
maps running on that node changed to KILLED. 

The maps that were running on the unstable node have been rescheduled; all of 
them are now in the SCHEDULED state, waiting for the RM to assign containers. 
Ask requests for these maps were seen until the node became healthy again (all 
of them failed), but there have been no ask requests since then. Meanwhile, the 
AM keeps preempting the reducers, over and over.

As a result, the reducers are waiting for the mappers to complete, and the 
mappers never get containers.

My Question Is:
============
Why did the AM not send map requests once the node had recovered?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
