[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977938#comment-13977938
 ] 

Ming Ma commented on MAPREDUCE-5652:
------------------------------------

Nice work. Jason, I would like to clarify how the following scenarios are 
handled. Perhaps they are covered at the YARN layer as part of 
https://issues.apache.org/jira/browse/YARN-1336.

1. NM crash scenario. There is a corner case: if the NM crashes after the RM 
notifies it that a specific application has completed, but right before 
AuxServices get the chance to process the event, the app entry won't be 
removed from the recovery store after the NM is restarted, because 
APPLICATION_STOP won't be delivered to the NM again for that application.

2. NM graceful shutdown. It seems ContainerManagerImpl's serviceStop will 
generate a ContainerManagerEventType.FINISH_APPS event. That means AuxServices 
could clean up each application and remove its entry from the recovery store 
as part of NM shutdown.
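
To make the two scenarios concrete, here is a minimal toy model in Java. It is 
only a sketch: RecoveryStoreSketch and its methods are hypothetical names, not 
Hadoop APIs, and the in-memory set merely stands in for the persistent 
recovery store. It assumes, as described above, that the stop event is not 
redelivered after a crash and that FINISH_APPS reaches the aux service as an 
application stop.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only. None of these names are real Hadoop/YARN APIs.
class RecoveryStoreSketch {
    // Stand-in for the persistent recovery store; its contents are treated
    // as surviving a simulated NM restart.
    private final Set<String> persistedApps = new HashSet<>();

    // Called when the aux service first learns about an application.
    void recordAppStart(String appId) {
        persistedApps.add(appId);
    }

    // What the aux service is assumed to do when it processes a stop event.
    void handleApplicationStop(String appId) {
        persistedApps.remove(appId);
    }

    // Entries the aux service would find when it recovers after a restart.
    Set<String> entriesAfterRestart() {
        return new HashSet<>(persistedApps);
    }

    // Scenario 1: the NM crashes before the stop event is processed, and the
    // event is assumed not to be redelivered after the restart.
    static Set<String> crashBeforeStopProcessed() {
        RecoveryStoreSketch store = new RecoveryStoreSketch();
        store.recordAppStart("app_1");
        // Crash happens here: handleApplicationStop("app_1") never runs.
        return store.entriesAfterRestart();
    }

    // Scenario 2: graceful shutdown is assumed to deliver a stop event for
    // every known application before the NM exits.
    static Set<String> gracefulShutdown() {
        RecoveryStoreSketch store = new RecoveryStoreSketch();
        store.recordAppStart("app_1");
        store.handleApplicationStop("app_1");
        return store.entriesAfterRestart();
    }

    public static void main(String[] args) {
        System.out.println("after crash: " + crashBeforeStopProcessed());
        System.out.println("after graceful shutdown: " + gracefulShutdown());
    }
}
```

If this model matches the real behavior, scenario 1 leaves a stale entry for 
app_1 in the store, while scenario 2 removes the entry during shutdown, which 
would matter if the application is still running when the NM comes back.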

> NM Recovery. ShuffleHandler should handle NM restarts
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-5652
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Jason Lowe
>              Labels: shuffle
>         Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, 
> MAPREDUCE-5652-v4.patch, MAPREDUCE-5652-v5.patch, MAPREDUCE-5652-v6.patch, 
> MAPREDUCE-5652.patch
>
>
> ShuffleHandler should work across NM restarts and not require re-running 
> map-tasks. On NM restart, the map outputs are cleaned up requiring 
> re-execution of map tasks and should be avoided.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
