[
https://issues.apache.org/jira/browse/MAPREDUCE-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747242#action_12747242
]
Arun C Murthy commented on MAPREDUCE-430:
-----------------------------------------
I've been thinking about the 'right' approach for handling
exceptions and errors in the map/reduce tasks, and bounced some of these ideas
off Chris too:
# Every code path in the tasks should propagate the exception/error upwards
after doing any necessary clean-up in its own components and sub-components.
# We should distinguish between user errors (OOM, IOException etc.) and
systemic errors (FSError, ChecksumError etc.) and define just two methods on
the TaskUmbilicalProtocol: userError and systemError. In the future, nodes
should be _blacklisted_ only on 'systemError', not on 'userError'.
# Child.java:main should be the only place we call the methods on
TaskUmbilicalProtocol to inform the parent TaskTracker about errors. It should
unwrap the caught exception and
# All threads (shuffle copier threads, merger threads, sort/spill threads etc.)
should catch Throwable and save the exception for the 'main' thread to examine.
The 'main' thread should examine these at all appropriate points and abort
cleanly.
# We should _never_ *rethrow* exceptions from the 'main' thread; rather, we
should 'wrap' them in appropriate exceptions and throw them with the right
*initCause*, so that we don't lose the original stack traces.
# We should strive to use the same exception types for the 'wrapper
exceptions' whenever the exception is part of the method signature, e.g.
IOException for map/reduce in the old API, and IOException and
InterruptedException for map/reduce in the new API (it is highly unfortunate
that the RPC layer wraps InterruptedException in an IOException today! :( ).
This is very important since the application writer might be relying on the
'right' exception for their specific error-handling needs. Thus we should wrap
IOException/InterruptedException in an IOException and other Exceptions/Errors
in a RuntimeException.
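A minimal sketch of the thread-error and wrapping rules above, in plain Java with no Hadoop dependencies. All names here (TaskErrorSketch, WorkerErrors, CheckedWork, startWorker, rethrowWrapped) are hypothetical and for illustration only; the sketch does not model TaskUmbilicalProtocol or the userError/systemError split. Worker threads catch Throwable and record the first failure; the 'main' thread polls for it and throws a new wrapper exception with initCause set, using IOException for IOException/InterruptedException and RuntimeException for everything else.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not Hadoop code: every class and method name is illustrative.
public class TaskErrorSketch {

    /** A unit of work that may throw checked exceptions (hypothetical). */
    @FunctionalInterface
    public interface CheckedWork {
        void run() throws Exception;
    }

    /** Saves the first Throwable raised by any worker thread for the 'main' thread. */
    public static final class WorkerErrors {
        private final AtomicReference<Throwable> first = new AtomicReference<>();

        /** Keeps only the first failure; later ones are ignored. */
        public void record(Throwable t) {
            first.compareAndSet(null, t);
        }

        /** Called by the 'main' thread at appropriate points; aborts if a worker failed. */
        public void checkForFailure() throws IOException {
            Throwable t = first.get();
            if (t != null) {
                rethrowWrapped("worker thread failed", t);
            }
        }
    }

    /** Starts a worker that catches Throwable, including Errors such as OutOfMemoryError. */
    public static Thread startWorker(WorkerErrors errors, CheckedWork work) {
        Thread t = new Thread(() -> {
            try {
                work.run();
            } catch (Throwable th) {
                errors.record(th); // save it for the 'main' thread instead of dying silently
            }
        });
        t.start();
        return t;
    }

    /**
     * Never rethrows the original: throws a new wrapper whose initCause() is the
     * original, so its stack trace survives. IOException/InterruptedException
     * become an IOException; anything else becomes a RuntimeException.
     */
    public static void rethrowWrapped(String msg, Throwable cause) throws IOException {
        if (cause instanceof IOException || cause instanceof InterruptedException) {
            IOException ioe = new IOException(msg);
            ioe.initCause(cause);
            throw ioe;
        }
        RuntimeException re = new RuntimeException(msg);
        re.initCause(cause);
        throw re;
    }

    public static void main(String[] args) throws Exception {
        WorkerErrors errors = new WorkerErrors();
        // Simulate a shuffle copier thread failing with an IOException.
        Thread copier = startWorker(errors, () -> {
            throw new IOException("simulated fetch failure");
        });
        copier.join();
        try {
            errors.checkForFailure();
        } catch (IOException e) {
            // prints "worker thread failed <- java.io.IOException: simulated fetch failure"
            System.out.println(e.getMessage() + " <- " + e.getCause());
        }
    }
}
```

Keeping only the first recorded failure is a design choice of this sketch: later failures are often knock-on effects of the first, and the first one is usually the most useful stack trace to report.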
Thoughts?
> Task stuck in cleanup with OutOfMemoryErrors
> --------------------------------------------
>
> Key: MAPREDUCE-430
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-430
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amar Kamat
> Fix For: 0.20.1
>
> Attachments: MAPREDUCE-430-v1.11.patch,
> MAPREDUCE-430-v1.12-branch-0.20.patch, MAPREDUCE-430-v1.12.patch,
> MAPREDUCE-430-v1.6-branch-0.20.patch, MAPREDUCE-430-v1.6.patch,
> MAPREDUCE-430-v1.7.patch, MAPREDUCE-430-v1.8.patch
>
>
> Observed a task with an OutOfMemoryError stuck in cleanup.