[jira] [Updated] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts

Jason Lowe (JIRA) Thu, 01 May 2014 07:13:37 -0700

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jason Lowe updated MAPREDUCE-5652:
----------------------------------

    Attachment: MAPREDUCE-5652-v9-and-YARN-1987.patch

Filed YARN-1987 to cover the DBIterator wrapper and updated the patch to use 
that new wrapper class.  Note that the patch includes YARN-1987 so Jenkins can 
comment.
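For readers unfamiliar with the pattern, here is a minimal sketch of an iterator wrapper that funnels raw runtime exceptions from an underlying iterator into a single exception type that callers can handle, assuming the wrapper's job is to normalize exception handling during iteration. `StoreException` and `TranslatingIterator` are hypothetical names; this is not the YARN-1987 class or its API.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a store-specific unchecked exception.
class StoreException extends RuntimeException {
    StoreException(String msg, Throwable cause) { super(msg, cause); }
}

// Wraps an arbitrary iterator so that any unexpected RuntimeException thrown
// by the underlying iterator is re-thrown as a StoreException.
class TranslatingIterator<T> implements Iterator<T> {
    private final Iterator<T> inner;

    TranslatingIterator(Iterator<T> inner) { this.inner = inner; }

    @Override
    public boolean hasNext() {
        try {
            return inner.hasNext();
        } catch (StoreException e) {
            throw e;  // already translated
        } catch (RuntimeException e) {
            throw new StoreException("iteration failed", e);
        }
    }

    @Override
    public T next() {
        try {
            return inner.next();
        } catch (StoreException | NoSuchElementException e) {
            throw e;  // already meaningful to callers
        } catch (RuntimeException e) {
            throw new StoreException("iteration failed", e);
        }
    }
}
```

Callers then need only one catch clause for store failures instead of guessing which runtime exceptions the underlying library may throw.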

bq. If ShuffleHandler gets DBException during recoverState as part of 
serviceStart, should ShuffleHandler ignore the exception and continue like the 
store doesn't exist?

Failure to recover should be rare: it means the DB is corrupted or 
inaccessible, or there's a schema incompatibility between versions because an 
upgrade occurred during the NM downtime.  It should be investigated and 
corrected; otherwise the errors will likely be glossed over, and we will 
continue to fail to shuffle across NM restarts from that point forward despite 
the user having requested recovery.

We could add a config to request a "best effort" mode that continues despite 
the inability to recover, but would that be an NM-wide config, a config just 
for the shuffle handler, or something else?  If we want a config to control 
this, I propose we address it in a followup JIRA.
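A minimal sketch of the two behaviors discussed above, assuming a hypothetical `StateStore` that returns persisted per-job entries: by default a recovery failure aborts service start, and an opt-in best-effort flag logs a warning and continues with empty state. `StateStore`, `RecoveringShuffleService`, and the `bestEffort` flag are illustrative only; they are not ShuffleHandler's actual API or an existing Hadoop configuration property.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface for the persisted shuffle state (job id -> value).
interface StateStore {
    Map<String, String> loadJobs() throws IOException;
}

class RecoveringShuffleService {
    private final Map<String, String> jobs = new HashMap<>();
    // Hypothetical setting; not an existing Hadoop config.
    private final boolean bestEffort;

    RecoveringShuffleService(boolean bestEffort) { this.bestEffort = bestEffort; }

    // Called during service start: fail fast unless best-effort was requested.
    void recoverState(StateStore store) throws IOException {
        try {
            jobs.putAll(store.loadJobs());
        } catch (IOException | RuntimeException e) {
            if (!bestEffort) {
                throw new IOException("Unable to recover shuffle state", e);
            }
            // Best effort: start with empty state, so map outputs from before
            // the restart can no longer be served.
            jobs.clear();
            System.err.println("WARN: shuffle state not recovered: " + e);
        }
    }

    int recoveredJobs() { return jobs.size(); }
}
```

Keeping strict mode as the default matches the reasoning above: a silent fallback would hide a corrupted or incompatible store.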

> NM Recovery. ShuffleHandler should handle NM restarts
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-5652
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Jason Lowe
>              Labels: shuffle
>         Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, 
> MAPREDUCE-5652-v4.patch, MAPREDUCE-5652-v5.patch, MAPREDUCE-5652-v6.patch, 
> MAPREDUCE-5652-v7.patch, MAPREDUCE-5652-v8.patch, 
> MAPREDUCE-5652-v9-and-YARN-1987.patch, MAPREDUCE-5652.patch
>
>
> ShuffleHandler should work across NM restarts and not require re-running 
> map tasks. On NM restart, the map outputs are cleaned up, requiring 
> re-execution of the map tasks; this should be avoided.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
