[ 
https://issues.apache.org/jira/browse/MAPREDUCE-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747242#action_12747242
 ] 

Arun C Murthy commented on MAPREDUCE-430:
-----------------------------------------

I've been doing some thinking about the 'right' approach for handling 
exceptions and errors in the map/reduce tasks, and bounced some of these 
ideas off Chris too:

# Every code path in the tasks should propagate the exception/error upwards 
after doing any necessary clean-up in its own components and sub-components.
# We should distinguish between user errors (OOM, IOException etc.) and 
systemic errors (FSError, ChecksumError etc.) and define just two methods on 
the TaskUmbilicalProtocol: userError and systemError. In future these should be 
used to _blacklist_ nodes only on 'systemError', not on 'userError'.
# Child.java:main should be the only place we call the methods on 
TaskUmbilicalProtocol to inform the parent TaskTracker about errors. It should 
unwrap the caught exception and 
# All threads (shuffle copier threads, merger threads, sort/spill threads etc.) 
should catch Throwable and save the exception for the 'main' thread to examine. 
The 'main' thread should examine these at all appropriate places and abort 
correctly.
# We should _never_ *rethrow* exceptions from the 'main' threads - rather we 
should 'wrap' them in appropriate exceptions and throw them with the right 
*initCause*.  This is so that we don't lose the original stack traces.
# We should strive to use the same 'exception' types for the 'wrapper 
exceptions' whenever the exception is part of the signature e.g. IOException 
for map/reduce in the old api and IOException and InterruptedException for 
map/reduce in the new api (it is highly unfortunate that the RPC layer wraps 
InterruptedException in an IOException today! :( ). This is very important 
since the application writer might be relying on the 'right' exception for his 
specific error-handling needs. Thus we should wrap 
IOException/InterruptedException in an IOException and other Exceptions/Errors 
in a RuntimeException.
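
A minimal Java sketch of the thread-handling and wrapping points in the list, 
under stated assumptions: the class ErrorPropagationSketch and its methods 
(startWorker, checkWorkers, throwWrapped) are hypothetical names made up for 
illustration and are not part of the Hadoop code base or of any attached 
patch. Worker threads catch Throwable and save it; the 'main' thread checks 
the saved failure and throws a wrapper whose cause is the original, so the 
original stack trace is kept.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch; none of these names exist in Hadoop. */
final class ErrorPropagationSketch {

    /** First failure seen by any worker thread, saved for the 'main' thread. */
    private final AtomicReference<Throwable> workerFailure = new AtomicReference<>();

    /** Starts a worker that catches Throwable and records it instead of dying silently. */
    Thread startWorker(String name, Runnable body) {
        Thread t = new Thread(() -> {
            try {
                body.run();
            } catch (Throwable th) {
                // Keep only the first failure; later ones are dropped in this sketch.
                workerFailure.compareAndSet(null, th);
            }
        }, name);
        t.start();
        return t;
    }

    /** Called by the 'main' thread at its checkpoints; throws if a worker has failed. */
    void checkWorkers() throws IOException {
        Throwable th = workerFailure.get();
        if (th != null) {
            throwWrapped("worker thread failed", th);
        }
    }

    /**
     * Wraps rather than rethrows: IOException/InterruptedException go into an
     * IOException, everything else into a RuntimeException. The original is
     * attached as the cause, so its stack trace is not lost.
     */
    static void throwWrapped(String message, Throwable cause) throws IOException {
        if (cause instanceof IOException || cause instanceof InterruptedException) {
            IOException wrapper = new IOException(message);
            wrapper.initCause(cause);
            throw wrapper;
        }
        throw new RuntimeException(message, cause);
    }

    public static void main(String[] args) throws Exception {
        ErrorPropagationSketch task = new ErrorPropagationSketch();
        Thread copier = task.startWorker("copier", () -> {
            throw new IllegalStateException("simulated copier failure");
        });
        copier.join();
        try {
            task.checkWorkers();
        } catch (RuntimeException e) {
            System.out.println(e.getMessage() + ", caused by: " + e.getCause());
        }
    }
}
```

In this sketch only the first worker failure is kept; a real implementation 
might want to collect all of them, and would decide separately whether the 
unwrapped cause counts as a user error or a systemic one.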

Thoughts?

> Task stuck in cleanup with OutOfMemoryErrors
> --------------------------------------------
>
>                 Key: MAPREDUCE-430
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-430
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amar Kamat
>             Fix For: 0.20.1
>
>         Attachments: MAPREDUCE-430-v1.11.patch, 
> MAPREDUCE-430-v1.12-branch-0.20.patch, MAPREDUCE-430-v1.12.patch, 
> MAPREDUCE-430-v1.6-branch-0.20.patch, MAPREDUCE-430-v1.6.patch, 
> MAPREDUCE-430-v1.7.patch, MAPREDUCE-430-v1.8.patch
>
>
> Observed a task with OutOfMemory error, stuck in cleanup.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.