[ https://issues.apache.org/jira/browse/MAPREDUCE-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747242#action_12747242 ]
Arun C Murthy commented on MAPREDUCE-430:
-----------------------------------------

I've been doing some thinking about the 'right' approach for handling exceptions and errors in the map/reduce tasks, and bounced some of these ideas off Chris too:

# Every code path in the tasks should propagate the exception/error upwards after doing any necessary clean-up in its own components and sub-components.
# We should distinguish between user errors (OOM, IOException etc.) and systemic errors (FSError, ChecksumError etc.) and define just two methods on the TaskUmbilicalProtocol: userError and systemError. In future these should be used to _blacklist_ nodes only on 'systemError', not on 'userError'.
# Child.java:main should be the only place we call the methods on TaskUmbilicalProtocol to inform the parent TaskTracker about errors. It should unwrap the caught exception and
# All threads (shuffle copier threads, merger threads, sort/spill threads etc.) should catch Throwable and save the exception for the 'main' thread to examine. The 'main' thread should examine these at all appropriate places and abort correctly.
# We should _never_ *rethrow* exceptions from the 'main' threads; rather, we should 'wrap' them in appropriate exceptions and throw them with the right *initCause*. This is so that we don't lose the original stack traces.
# We should strive to use the same 'exception' types for the 'wrapper exceptions' whenever the exception is part of the signature, e.g. IOException for map/reduce in the old api, and IOException and InterruptedException for map/reduce in the new api (it is highly unfortunate that the RPC layer wraps InterruptedException in an IOException today! :( ). This is very important since the application writer might be relying on the 'right' exception for his specific error-handling needs. Thus we should wrap IOException/InterruptedException in an IOException and other Exceptions/Errors in a RuntimeException.

Thoughts?
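To make the thread-handling and wrapping points concrete, here is a rough sketch of how they might fit together. It is not taken from any of the attached patches: TaskErrorSketch, Work, FailureHolder, startHelper and throwWrapped are all hypothetical names, and only the java.lang/java.io calls are real APIs.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: none of these names exist in Hadoop.
class TaskErrorSketch {

    /** Work run on a helper thread (copier, merger, spill ...); may throw anything. */
    interface Work {
        void run() throws Exception;
    }

    /** Remembers the first failure seen by any helper thread. */
    static final class FailureHolder {
        private final AtomicReference<Throwable> first = new AtomicReference<>();

        void record(Throwable t) {
            first.compareAndSet(null, t); // keep only the first failure
        }

        /** The 'main' thread calls this at its check points and aborts if a helper failed. */
        void check(String context) throws IOException {
            Throwable t = first.get();
            if (t != null) {
                throwWrapped(context, t);
            }
        }
    }

    /** Starts a helper thread that catches Throwable and saves it for the 'main' thread. */
    static Thread startHelper(String name, FailureHolder failures, Work work) {
        Thread thread = new Thread(() -> {
            try {
                work.run();
            } catch (Throwable t) {
                failures.record(t);
            }
        }, name);
        thread.start();
        return thread;
    }

    /**
     * Wraps rather than rethrows: IOException/InterruptedException go into an
     * IOException, everything else into a RuntimeException. The original is
     * attached via initCause so its stack trace is not lost.
     */
    static void throwWrapped(String message, Throwable original) throws IOException {
        if (original instanceof IOException || original instanceof InterruptedException) {
            IOException wrapped = new IOException(message);
            wrapped.initCause(original);
            throw wrapped;
        }
        RuntimeException wrapped = new RuntimeException(message);
        wrapped.initCause(original);
        throw wrapped;
    }
}
```

In this sketch the 'main' thread would call failures.check(...) at each point where it is safe to abort, and Child.java:main would be the one place that catches the wrapped exception and reports it over the umbilical.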
> Task stuck in cleanup with OutOfMemoryErrors
> --------------------------------------------
>
>                 Key: MAPREDUCE-430
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-430
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amar Kamat
>             Fix For: 0.20.1
>
>         Attachments: MAPREDUCE-430-v1.11.patch, MAPREDUCE-430-v1.12-branch-0.20.patch, MAPREDUCE-430-v1.12.patch, MAPREDUCE-430-v1.6-branch-0.20.patch, MAPREDUCE-430-v1.6.patch, MAPREDUCE-430-v1.7.patch, MAPREDUCE-430-v1.8.patch
>
>
> Observed a task with OutOfMemory error, stuck in cleanup.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.