[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112428#comment-14112428
 ] 

Maysam Yabandeh commented on MAPREDUCE-6043:
--------------------------------------------

Thanks [~jlowe] for the comments. Further investigation suggests that the cause 
of the lost messages, as you correctly mentioned, was a bug introduced by one 
of our recent internal patches (the container finishes successfully but its 
status is not successfully transmitted to MRAppMaster through the RM). 
Nevertheless, I was under the impression that there was a double standard in 
MRAppMaster, sometimes relying on the RM to successfully transmit statuses and 
sometimes not. After your comment, however, I double-checked and noticed that 
the logic for the exceptional case described in case 1 exists only in our 
internal repository and has not made it to trunk yet. So I now understand that 
the container statuses reported by the NM are guaranteed to be received by 
MRAppMaster through the RM, and MRAppMaster's logic need not be resilient 
against such cases.


> Reducer-preemption does not kick in
> -----------------------------------
>
>                 Key: MAPREDUCE-6043
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6043
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Maysam Yabandeh
>
> We have seen various cases in which reducer preemption does not kick in and 
> the scheduled mappers wait behind running reducers forever. Each time there 
> seems to be a different scenario. So far we have tracked down two such cases; 
> the common element between them is that the variables in RMContainerAllocator 
> go out of sync, since they are only updated when a completed container is 
> reported by the RM. However, there are many corner cases in which such a 
> report is not received from the RM and yet the MapReduce app moves forward. 
> One possible fix would be to update these variables after exceptional cases 
> as well.
> The logic for triggering preemption is in 
> RMContainerAllocator::preemptReducesIfNeeded.
> Preemption is triggered if the following holds:
> {code}
> headroom +  am * |m| + pr * |r| < mapResourceRequest
> {code} 
> where am is the number of assigned mappers, |m| is the mapper size, pr is the 
> number of reducers being preempted, and |r| is the reducer size. Any of these 
> variables going out of sync will cause preemption not to kick in. In the 
> following comment, we explain two such cases.
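For illustration, the trigger condition quoted above can be sketched as a standalone predicate. This is a simplified sketch, not the actual RMContainerAllocator code; the method and parameter names here are hypothetical stand-ins for the variables in the inequality:

```java
// Hypothetical sketch of the preemption trigger described in the issue:
// preempt when headroom + am * |m| + pr * |r| < mapResourceRequest.
// Names are illustrative, not the real RMContainerAllocator fields.
public class PreemptionCheck {
    /**
     * Returns true when reducer preemption should kick in: even counting
     * the headroom, the memory of already-assigned mappers, and the
     * reducers already being preempted, there is not enough room for a
     * pending map request.
     */
    static boolean shouldPreempt(long headroomMb,
                                 int assignedMaps, long mapSizeMb,
                                 int preemptingReduces, long reduceSizeMb,
                                 long mapResourceRequestMb) {
        long available = headroomMb
                + (long) assignedMaps * mapSizeMb
                + (long) preemptingReduces * reduceSizeMb;
        return available < mapResourceRequestMb;
    }
}
```

If any of the counters feeding this check (assigned maps, preempting reduces) silently goes stale, `available` is overestimated and the predicate never fires, which matches the stuck-forever behavior described above.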



--
This message was sent by Atlassian JIRA
(v6.2#6252)
