[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536027#comment-14536027
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5617:
----------------------------------------------------

[~sunilg],

bq. If we kill one of the reducers, the map still cannot be launched, because the 
priority of the failed map is lower than that of the reducer. So only the remaining 
reducer gets allocated from the RM side. This causes a hang on the reducer 
side. 

This interpretation isn't correct. I understand that it is not 
straightforward, but the lower-numbered priority of a failed map actually means it 
has higher priority w.r.t. scheduling than a reducer.
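
To illustrate the numbering convention, here is a minimal sketch of a scheduler that serves the lowest priority number first. The constant names, the numeric values, and the task names are assumptions chosen for illustration and are not quoted from the Hadoop source; only the ordering rule (smaller number wins) reflects the point above.

```python
from dataclasses import dataclass

# Hypothetical priority values: a smaller number is served first.
PRIORITY_FAILED_MAP = 5   # re-attempt of a failed map
PRIORITY_REDUCE = 10      # reducer
PRIORITY_MAP = 20         # first attempt of a map


@dataclass(frozen=True)
class ContainerRequest:
    task: str
    priority: int


def allocation_order(requests):
    """Return requests in the order a lowest-number-first scheduler serves them."""
    return sorted(requests, key=lambda r: r.priority)


pending = [
    ContainerRequest("reduce_7", PRIORITY_REDUCE),
    ContainerRequest("map_42_retry", PRIORITY_FAILED_MAP),
    ContainerRequest("map_99", PRIORITY_MAP),
]

order = [r.task for r in allocation_order(pending)]
print(order)
```

Under these assumed values, the failed-map request sorts ahead of the reducer request, so a freed container would go to the map re-attempt first.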

You may have been seeing something else. If you think this is a stale issue, 
please close this as invalid. Thanks.

> map task is not re-launched when the task is failed while reducers are 
> running with full cluster capacity - which will lead to job hang
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5617
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5617
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>         Environment: SuSe Linux
>            Reporter: Sunil G
>            Priority: Critical
>
> In a cluster with 16 GB capacity, a job was started with 100 maps and 10 
> reducers. 
> When the reducers started executing, one NM went down, which caused 
> 2 maps to fail. At that time, the remaining 8 GB was used by 6 
> reducers and the AM, so there was no room to launch the failed maps. [The NM never 
> came back up, and the cluster size became 8 GB.]
> If we kill one of the reducers, the map still cannot be launched, because the 
> priority of the failed map is lower than that of the reducer. So only the remaining 
> reducer gets allocated from the RM side.
> This causes a hang on the reducer side. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)