[ https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965097#comment-14965097 ]

Varun Saxena commented on MAPREDUCE-6513:
-----------------------------------------

Thanks [~cchen1257] and [~devaraj.k] for sharing your thoughts on this.

The obvious solution we considered when we first hit this issue was to mark
the map task as failed so that its priority becomes 5, which would make the
Scheduler assign resources to it before the reducers. But after a long
internal discussion, we decided against it. The main reason: should we mark a
mapper as failed when it is perfectly fine and has already been marked
succeeded? It would also count towards task attempt failures. Whether to kill
it or fail it is frankly a debatable topic, and there was a long discussion
about it in the JIRA where this code was added (refer to MAPREDUCE-3921).
cc [~bikassaha], [~vinodkv] so that they can also share their thoughts on this.
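
For reference, a minimal sketch of the priority ordering involved. The
numeric values correspond to the PRIORITY_FAST_FAIL_MAP (5), PRIORITY_REDUCE
(10) and PRIORITY_MAP (20) constants in the 2.x RMContainerAllocator, where a
lower value is served first by the Scheduler; the rest of the snippet is
illustrative, not the actual AM code.

{code:java}
import org.apache.hadoop.yarn.api.records.Priority;

public class MrAmPrioritySketch {
  // Priorities the MR AM attaches to its container requests.
  // A lower numeric value is served first by the YARN Scheduler.
  static final Priority FAST_FAIL_MAP = Priority.newInstance(5);
  static final Priority REDUCE = Priority.newInstance(10);
  static final Priority MAP = Priority.newInstance(20);

  public static void main(String[] args) {
    // A re-attempt of a FAILED map is requested at priority 5 and beats
    // the reducers (10). A re-attempt of a KILLED map goes out at the
    // normal map priority 20, behind reducers that were already asked for.
    System.out.println("fast-fail map = " + FAST_FAIL_MAP.getPriority());
    System.out.println("reduce        = " + REDUCE.getPriority());
    System.out.println("map           = " + MAP.getPriority());
  }
}
{code}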

Moreover, once the map task has been killed, it is as good as an original
task attempt in the scheduled stage (a new task attempt is scheduled at the
normal map priority). So if resources could be assigned to the original
attempt, they should be assignable to this new attempt as well (if headroom
is available). This made me think that there must be some other problem as
well. Kindly note that mapreduce.job.reduce.slowstart.completedmaps was 0.05
here.
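
To make the slowstart point concrete, here is a hedged sketch of the check
(the method and variable names are mine, not the AM's; the real logic lives
in RMContainerAllocator#scheduleReduces):

{code:java}
public class SlowstartSketch {
  // With mapreduce.job.reduce.slowstart.completedmaps = 0.05, reducers may
  // be requested once 5% of the maps have completed -- long before the
  // maps rescheduled off the bad node get containers.
  static boolean reducesCanStart(int completedMaps, int totalMaps,
                                 float slowstart) {
    return completedMaps >= (int) Math.ceil(slowstart * totalMaps);
  }

  public static void main(String[] args) {
    System.out.println(reducesCanStart(50, 1000, 0.05f));  // true
    System.out.println(reducesCanStart(49, 1000, 0.05f));  // false
  }
}
{code}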

Assuming the headroom coming from the RM was correct, we dug into the logs
and found a couple of issues. As pointed out above, there was a loop of
reducers being preempted and ramped up, again and again.
Firstly, we noticed that the AM was always ramping reducers up and never
ramping them down. So we thought we could add a configuration that decides
when the maps are starved, and not ramp up reducers while maps are starving.
This would give maps a better chance of being assigned in the above
scenario.
Secondly, when we ramped down all the scheduled reduces, we were not
updating the ask, so the RM kept allocating resources for reducers (which
the AM later rejected) even though it could have assigned those resources to
mappers straight away. Both changes are sketched below.
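
A rough sketch of the two changes follows. All names here, including the
starvation threshold, are hypothetical, not the committed config keys or
methods; the real code paths are in RMContainerAllocator.

{code:java}
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class ReduceRampSketch {
  // Hypothetical knob: how long (ms) maps may wait unassigned before we
  // consider them starved and stop ramping up reducers.
  static final long MAP_STARVATION_THRESHOLD_MS = 30_000L;

  long lastMapAllocatedMs;             // last time a map got a container
  int pendingMaps;                     // maps still waiting for containers
  final List<Object> scheduledReduces = new LinkedList<>();
  int reduceAsk;                       // reduce count in the ask sent to RM

  boolean mapsAreStarved(long nowMs) {
    return pendingMaps > 0
        && nowMs - lastMapAllocatedMs > MAP_STARVATION_THRESHOLD_MS;
  }

  void maybeRampUpReduces(long nowMs, int rampUp) {
    // Fix 1: never ramp reducers up while maps are starving, so the
    // rescheduled maps get first shot at the headroom.
    if (mapsAreStarved(nowMs)) {
      return;
    }
    reduceAsk += rampUp;
  }

  void rampDownAllScheduledReduces() {
    // Fix 2: when dropping scheduled reduces, also shrink the ask sent to
    // the RM; otherwise the RM keeps allocating reduce containers that the
    // AM just rejects instead of giving those resources to mappers.
    for (Iterator<Object> it = scheduledReduces.iterator(); it.hasNext();) {
      it.next();
      it.remove();
      reduceAsk--;                     // this decrement was the missing piece
    }
  }
}
{code}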

> MR job got hanged forever when one NM unstable for some time
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-6513
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Bob
>            Assignee: Varun Saxena
>            Priority: Critical
>
> While a job with many tasks was in progress, one node became unstable due
> to an OS issue. After the node became unstable, the status of the maps on
> this node changed to KILLED state.
> The maps which were running on the unstable node are rescheduled; they are
> all in scheduled state, waiting for the RM to assign containers. Ask
> requests for these maps are seen until the node goes bad (all of those
> failed), but there are no ask requests after that. Yet the AM keeps on
> preempting the reducers (it is cycling).
> Finally the reducers are waiting for the mappers to complete, and the
> mappers never got containers.
> My Question Is:
> ============
> Why were map ask requests not sent by the AM once the node recovered?



