[jira] [Updated] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts

Jason Lowe (JIRA) Fri, 25 Apr 2014 07:09:49 -0700

     [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jason Lowe updated MAPREDUCE-5652:
----------------------------------

    Attachment: MAPREDUCE-5652-v8.patch

Sigh.  I discovered that leveldb's DBIterator isn't consistent with the DB 
interface: it throws raw RuntimeException rather than the derived DBException.  
That means whenever we interact with the database via the iterator, an error 
that should be caught as a DBException can leak as a raw RuntimeException 
instead.

I considered a few approaches to work around this:

# Catch RuntimeException rather than DBException for the code blocks that 
interact with the iterator.
# Catch RuntimeException and, if the cause is NativeDB.DBException, throw an 
IOException; otherwise rethrow the original exception.
# Wrap DBIterator in a private wrapper class which catches RuntimeException for 
each method invoked and rethrows it as DBException.
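
For illustration, the second option might look roughly like the following. This is a minimal sketch and not code from the patch: NativeDBException is a local stand-in for NativeDB.DBException, CauseInspection.nextOrIOException is a hypothetical helper name, and a plain java.util.Iterator stands in for DBIterator.

```java
import java.io.IOException;
import java.util.Iterator;

// Local stand-in for NativeDB.DBException so the sketch compiles on its own.
// (Assumption: the real class lives in the leveldb JNI binding.)
class NativeDBException extends Exception {
    NativeDBException(String message) {
        super(message);
    }
}

// Hypothetical helper showing the "inspect the cause" approach.
class CauseInspection {
    // Advances the iterator. A RuntimeException whose cause is the native
    // exception is converted to IOException; anything else is rethrown as-is.
    static <T> T nextOrIOException(Iterator<T> iter) throws IOException {
        try {
            return iter.next();
        } catch (RuntimeException e) {
            if (e.getCause() instanceof NativeDBException) {
                throw new IOException(e.getMessage(), e);
            }
            throw e; // unrelated failure (an NPE, say): do not mask it
        }
    }
}
```

Note that the instanceof check hard-codes one specific cause type, so a failure wrapped around any other cause passes through unconverted.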

I dismissed the first approach since it's too ham-fisted.  We're likely to 
catch NPEs and other unrelated RuntimeExceptions and handle them as if they 
were leveldb errors.  The second approach has the drawback that it knows a bit too 
much about the DBIterator implementation in that it's digging into the 
RuntimeException looking for a specific cause.  If the cause were to switch to 
the iq80 DBException or some other type then we'd leak it instead of converting 
it.  Therefore I went with the third approach. It's still catching raw 
RuntimeException like the first approach, but it has the advantage that the 
try..catch block is localized to just the iterator method being invoked.  Also, 
if leveldb's iterator is ever fixed to throw DBException, we can simply remove 
the wrapper rather than change all the try..catch code blocks that work with 
the iterator.
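
As a rough illustration of the wrapper idea, here is a minimal sketch under stated assumptions rather than the code in the attached patch. The DBException class below is a local stand-in for the leveldb one, SafeIterator is a hypothetical name, and it wraps a plain java.util.Iterator instead of DBIterator so the example is self-contained.

```java
import java.util.Iterator;

// Local stand-in for leveldb's DBException (a RuntimeException subclass)
// so the sketch compiles without the leveldb jars.
class DBException extends RuntimeException {
    DBException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical wrapper: every delegated call converts a raw RuntimeException
// into DBException, so callers only need to catch DBException.
class SafeIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;

    SafeIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        try {
            return delegate.hasNext();
        } catch (DBException e) {
            throw e; // already the right type, pass it through untouched
        } catch (RuntimeException e) {
            throw new DBException(e.getMessage(), e);
        }
    }

    @Override
    public T next() {
        try {
            return delegate.next();
        } catch (DBException e) {
            throw e;
        } catch (RuntimeException e) {
            throw new DBException(e.getMessage(), e);
        }
    }
}
```

The broad catch is still there, but it covers only the single delegated call, so an NPE in the surrounding recovery code is not mistaken for a database error.  The original exception is kept as the cause for debugging.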



> NM Recovery. ShuffleHandler should handle NM restarts
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-5652
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Jason Lowe
>              Labels: shuffle
>         Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, 
> MAPREDUCE-5652-v4.patch, MAPREDUCE-5652-v5.patch, MAPREDUCE-5652-v6.patch, 
> MAPREDUCE-5652-v7.patch, MAPREDUCE-5652-v8.patch, MAPREDUCE-5652.patch
>
>
> ShuffleHandler should work across NM restarts and not require re-running 
> map-tasks. On NM restart, the map outputs are cleaned up requiring 
> re-execution of map tasks and should be avoided.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
