[jira] [Updated] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts

Jason Lowe (JIRA) Thu, 24 Apr 2014 09:05:34 -0700

     [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jason Lowe updated MAPREDUCE-5652:
----------------------------------

    Attachment: MAPREDUCE-5652-v7.patch

bq. 1. Does LevelDB's delete method throw an exception? JNI has some exception 
handling and the caller needs to retrieve the exceptions, etc.

Nice catch!  I didn't notice there were _two_ DBExceptions flying around in 
leveldb code.  org.fusesource.leveldbjni.internal.NativeDB.DBException comes 
from the JNI layer and derives from IOException, and it was the one I was 
familiar with.  However, the wrapper code around the JNI layer catches that 
exception and rethrows it as org.iq80.leveldb.DBException, which is a 
RuntimeException.  That means we need to wrap every call that can throw the 
runtime form and either handle the exception directly or rethrow it as an 
IOException if it's not appropriate to let the RuntimeException leak out of 
the method.

Updated the patch to deal with the runtime DBException when necessary.  I'll 
also have to make similar changes in the NMLevelDBStateStore for the other NM 
restart patches.
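
For illustration, here is a minimal sketch of that wrapping pattern.  It is 
not the code from the patch: DBException below is a local stand-in for 
org.iq80.leveldb.DBException so the example compiles without the leveldb 
jars, and KeyValueDb, ShuffleStateStore and removeJob are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Stand-in for org.iq80.leveldb.DBException, which is a RuntimeException.
// Defined locally so this sketch has no dependency on the leveldb jars.
class DBException extends RuntimeException {
    DBException(String message) {
        super(message);
    }
}

// Hypothetical, minimal view of a key-value store; the real DB interface
// has many more methods.
interface KeyValueDb {
    void delete(byte[] key) throws DBException;
}

// Hypothetical state store that keeps the runtime exception from leaking.
class ShuffleStateStore {
    private final KeyValueDb db;

    ShuffleStateStore(KeyValueDb db) {
        this.db = db;
    }

    // Callers only see the checked IOException; the unchecked DBException
    // is caught here and preserved as the cause.
    void removeJob(String jobId) throws IOException {
        try {
            db.delete(jobId.getBytes(StandardCharsets.UTF_8));
        } catch (DBException e) {
            throw new IOException(
                "Unable to remove " + jobId + " from state store", e);
        }
    }
}
```

Keeping the original exception as the cause means the underlying database 
error is still visible in the stack trace after the rethrow.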

bq. 2. It seems like recover/restore are common in NM/RM restart. Any abstract 
interface defined for that?

They both support recovery but the forms in which they do it are very different 
(e.g.: types of state persisted are significantly different, backing store 
types have no overlap, etc.)  There could be a generic Recoverable interface 
that supports a recover() method, but I'm not sure what value that adds.  Did 
you have a particular interface in mind or ideas on how it would be used?
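
To make the question concrete, a generic interface along those lines might 
look like the sketch below.  Every name in it is hypothetical (nothing called 
Recoverable exists in the patch), and the two implementers are toy stand-ins 
rather than the real NM or RM classes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface, named here only for illustration.
interface Recoverable {
    void recover() throws IOException;
}

// Toy stand-in for a component that reloads per-job shuffle state.
class ToyShuffleState implements Recoverable {
    final List<String> jobs = new ArrayList<>();

    @Override
    public void recover() {
        jobs.add("job_1"); // pretend this was read from a local store
    }
}

// Toy stand-in for a component that reloads application state.
class ToyAppState implements Recoverable {
    int appsLoaded = 0;

    @Override
    public void recover() {
        appsLoaded = 3; // pretend this was read from a different store
    }
}
```

The only thing the two implementers share is the no-argument recover() call; 
what they load and where they load it from stay specific to each class.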

> NM Recovery. ShuffleHandler should handle NM restarts
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-5652
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Jason Lowe
>              Labels: shuffle
>         Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, 
> MAPREDUCE-5652-v4.patch, MAPREDUCE-5652-v5.patch, MAPREDUCE-5652-v6.patch, 
> MAPREDUCE-5652-v7.patch, MAPREDUCE-5652.patch
>
>
> ShuffleHandler should work across NM restarts and not require re-running 
> map tasks. Currently, on NM restart the map outputs are cleaned up, which 
> requires re-executing the map tasks; this should be avoided.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
