[
https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977938#comment-13977938
]
Ming Ma commented on MAPREDUCE-5652:
------------------------------------
Nice work, Jason. I would like to clarify how the following scenarios are
handled. Perhaps they are covered at the YARN layer as part of
https://issues.apache.org/jira/browse/YARN-1336.
1. NM crash scenario. There is a corner case: after RM notifies NM of the
completion of a specific application, but before AuxServices get the chance to
process the event, NM crashes. After NM is restarted, the app entry won't be
removed from the recovery store, because APPLICATION_STOP won't be delivered
to NM for that application after the restart.
2. NM graceful shutdown. It seems ContainerManagerImpl's serviceStop will
generate a ContainerManagerEventType.FINISH_APPS event. That means AuxServices
could clean up the applications and remove their entries from the recovery
store as part of NM shutdown.
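To make the two scenarios concrete, here is a toy model of the event flow. It
is only a sketch under loud assumptions: RecoveryStoreSketch,
NodeManagerSketch and all of their methods are hypothetical and are not the
real NM, AuxServices or ShuffleHandler APIs. The "recovery store" is an
in-memory map standing in for persistent state that survives a restart, and
the queue of pending stops stands in for dispatcher events that are lost when
the NM crashes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the NM recovery store: it outlives NM instances.
class RecoveryStoreSketch {
    private final Map<String, String> entries = new HashMap<>();

    void put(String appId, String state) { entries.put(appId, state); }
    void remove(String appId) { entries.remove(appId); }
    Set<String> appIds() { return new HashSet<>(entries.keySet()); }
}

// Hypothetical, much-simplified NM: stop events are queued in memory and only
// reach the aux service when the dispatcher drains the queue.
class NodeManagerSketch {
    private final RecoveryStoreSketch store;
    private final Set<String> runningApps = new HashSet<>();
    private final Deque<String> pendingStops = new ArrayDeque<>(); // lost on crash

    NodeManagerSketch(RecoveryStoreSketch store) {
        this.store = store;
        // On (re)start, app IDs found in the store are treated as recovered.
        runningApps.addAll(store.appIds());
    }

    void startApp(String appId) {
        runningApps.add(appId);
        store.put(appId, "shuffle-state"); // aux service persists per-app state
    }

    // RM reports the app as finished; the stop event is only queued here.
    void onRmAppFinished(String appId) { pendingStops.add(appId); }

    // Deliver APPLICATION_STOP-style events; the aux service cleans up its entry.
    void drainEvents() {
        while (!pendingStops.isEmpty()) {
            String appId = pendingStops.poll();
            runningApps.remove(appId);
            store.remove(appId);
        }
    }

    // Scenario 1: crash. Queued events vanish; the store is left untouched.
    void crash() { pendingStops.clear(); }

    // Scenario 2: graceful stop. FINISH_APPS-style cleanup for every running app.
    void gracefulStop() {
        pendingStops.addAll(runningApps);
        drainEvents();
    }
}

class NmRecoveryScenarios {
    public static void main(String[] args) {
        // Scenario 1: crash between the RM notification and aux-service processing.
        RecoveryStoreSketch store1 = new RecoveryStoreSketch();
        NodeManagerSketch nm1 = new NodeManagerSketch(store1);
        nm1.startApp("app_1");
        nm1.onRmAppFinished("app_1");
        nm1.crash();
        NodeManagerSketch restarted = new NodeManagerSketch(store1);
        restarted.drainEvents(); // nothing queued: RM does not resend the stop
        System.out.println("after crash+restart: " + store1.appIds());
        // prints: after crash+restart: [app_1]

        // Scenario 2: graceful shutdown cleans up every running app.
        RecoveryStoreSketch store2 = new RecoveryStoreSketch();
        NodeManagerSketch nm2 = new NodeManagerSketch(store2);
        nm2.startApp("app_2");
        nm2.gracefulStop();
        System.out.println("after graceful stop: " + store2.appIds());
        // prints: after graceful stop: []
    }
}
```

In this model, scenario 1 leaves app_1 behind as a stale entry that nothing
will ever remove, while scenario 2 empties the store during shutdown, so there
is nothing left to recover when the NM comes back.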
> NM Recovery. ShuffleHandler should handle NM restarts
> -----------------------------------------------------
>
> Key: MAPREDUCE-5652
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.2.0
> Reporter: Karthik Kambatla
> Assignee: Jason Lowe
> Labels: shuffle
> Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch,
> MAPREDUCE-5652-v4.patch, MAPREDUCE-5652-v5.patch, MAPREDUCE-5652-v6.patch,
> MAPREDUCE-5652.patch
>
>
> ShuffleHandler should work across NM restarts and not require re-running
> map tasks. On NM restart, the map outputs are cleaned up, which requires
> re-execution of map tasks; this should be avoided.