[ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866655#comment-13866655 ]

Jason Lowe commented on MAPREDUCE-5652:
---------------------------------------

I've largely implemented this as part of the prototype for YARN-1336. I
actually have two versions: one that uses FileSystem to store the shuffle
tokens and job-to-user mappings, and another that uses leveldb. (The prototype
currently has a leveldb back-end store to simplify some of the race conditions
during store and recovery.) It shouldn't be too much effort to extricate just
the ShuffleHandler changes, although there aren't any unit tests for them yet.
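For illustration only, here is a minimal sketch of the recovery idea, assuming
a plain local state file instead of the prototype's FileSystem or leveldb
back-ends. The class `ShuffleStateStore`, its methods, and the tab-separated
file format are all hypothetical: each job's user and shuffle token are
written when the job is added, dropped when it is removed, and reloaded after
a restart so map outputs can still be served without re-running map tasks.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the YARN-1336 prototype: persists the
// job-to-user mapping and shuffle token for each job so a restarted
// ShuffleHandler can recover them.
final class ShuffleStateStore {
    /** Per-job state the shuffle service needs to keep serving map output. */
    static final class JobState {
        private final String user;
        private final byte[] shuffleToken;

        JobState(String user, byte[] shuffleToken) {
            this.user = user;
            this.shuffleToken = shuffleToken.clone();
        }

        String user() { return user; }
        byte[] shuffleToken() { return shuffleToken.clone(); }
    }

    private final Path stateFile;
    private final Map<String, JobState> jobs = new HashMap<>();

    ShuffleStateStore(Path stateFile) {
        this.stateFile = stateFile;
    }

    /** Record a job when it is added to the shuffle service. */
    synchronized void storeJob(String jobId, String user, byte[] token) throws IOException {
        jobs.put(jobId, new JobState(user, token));
        persist();
    }

    /** Forget a job once its application has finished. */
    synchronized void removeJob(String jobId) throws IOException {
        if (jobs.remove(jobId) != null) {
            persist();
        }
    }

    /** On startup, reload every job that was stored before the restart. */
    synchronized Map<String, JobState> recover() throws IOException {
        jobs.clear();
        if (Files.exists(stateFile)) {
            for (String line : Files.readAllLines(stateFile, StandardCharsets.UTF_8)) {
                String[] fields = line.split("\t");
                if (fields.length == 3) { // skip malformed lines
                    jobs.put(fields[0],
                        new JobState(fields[1], Base64.getDecoder().decode(fields[2])));
                }
            }
        }
        return new HashMap<>(jobs);
    }

    // Write to a temp file, then rename it over the old state file, so a
    // crash mid-write does not leave a half-written state file behind.
    private void persist() throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, JobState> e : jobs.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue().user() + "\t"
                + Base64.getEncoder().encodeToString(e.getValue().shuffleToken()));
        }
        Path tmp = stateFile.resolveSibling(stateFile.getFileName() + ".tmp");
        Files.write(tmp, lines, StandardCharsets.UTF_8);
        Files.move(tmp, stateFile,
            StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

A store like this is only half the story; as noted below, the NodeManager
also has to leave the local directories alone across the restart.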

As Alejandro pointed out, it also needs some help from the NodeManager to keep
it from cleaning up the local directories and removing the shuffle output after
restarting. That's also been done as part of the prototype and is relatively
straightforward, but we're still missing a mechanism for distinguishing the
restart case from the shutdown/decommission case, along with some other cleanup.

> ShuffleHandler should handle NM restarts
> ----------------------------------------
>
>                 Key: MAPREDUCE-5652
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>              Labels: shuffle
>
> ShuffleHandler should work across NM restarts and not require re-running
> map tasks. Currently, on NM restart the map outputs are cleaned up, which
> forces re-execution of the map tasks; this should be avoided.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
