[
https://issues.apache.org/jira/browse/MAPREDUCE-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733230#action_12733230
]
Arun C Murthy commented on MAPREDUCE-430:
-----------------------------------------
Apologies for seeing this late, but:
Did the task that threw the OutOfMemoryError eventually time out, and was it
declared 'FAILED'?
----
{code}
-      } catch (FSError e) {
-        LOG.fatal("FSError", e);
+      } catch (Error e) {
+        String error = "Error";
+        if (e instanceof FSError) {
+          error = "FSError";
+        }
{code}
This is a bad idiom; we can just catch FSError and Error separately, no?
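A minimal sketch of the separate-catch idiom suggested above. The FSError class here is a stand-in for org.apache.hadoop.fs.FSError so the example is self-contained; the classify helper is hypothetical and only illustrates the control flow:

```java
public class CatchIdiom {
    // Stand-in for org.apache.hadoop.fs.FSError (an Error subclass).
    static class FSError extends Error {
        FSError(String msg) { super(msg); }
    }

    // Separate catch blocks: the more specific FSError handler comes
    // first, so no instanceof check inside a generic Error handler
    // is needed.
    static String classify(Runnable task) {
        try {
            task.run();
            return "OK";
        } catch (FSError e) {
            return "FSError";
        } catch (Error e) {
            return "Error";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(() -> { throw new FSError("bad disk"); }));
        System.out.println(classify(() -> { throw new OutOfMemoryError(); }));
    }
}
```

Because FSError extends Error, the catch order matters: Java requires the subclass handler before the superclass handler, which also makes the intent explicit.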
----
I also don't understand why we removed TaskUmbilicalProtocol.fsError and
TaskUmbilicalProtocol.shuffleError in favour of a single
TaskUmbilicalProtocol.taskError - having specific information about the reason
for task failure is important. In the future we should use that specific
information to drive decisions such as the number of retries for the task,
tracker blacklisting, etc. For example, errors such as OOM or a process-tree
running out of memory (high-RAM jobs) shouldn't count against the TaskTracker.
Hence, having more information about the specific error is *good*.
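A hedged sketch of why distinct error signals matter to the scheduler. The error categories mirror the removed fsError/shuffleError callbacks plus OOM; the countsAgainstTracker policy is hypothetical, not the actual JobTracker logic:

```java
public class ErrorPolicy {
    // Distinct failure categories, as the separate umbilical
    // callbacks would preserve them.
    enum TaskError { FS_ERROR, SHUFFLE_ERROR, OOM, OTHER }

    // Hypothetical blacklisting policy: an FSError points at a bad
    // local disk (a tracker problem), while an OOM usually reflects
    // the job's memory configuration, so it should not count toward
    // blacklisting the TaskTracker.
    static boolean countsAgainstTracker(TaskError e) {
        switch (e) {
            case FS_ERROR:      return true;
            case SHUFFLE_ERROR: return true;
            case OOM:           return false;
            default:            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(countsAgainstTracker(TaskError.OOM));
        System.out.println(countsAgainstTracker(TaskError.FS_ERROR));
    }
}
```

With a single taskError call, all of these would collapse into one category and the policy above could not be expressed.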
> Task stuck in cleanup with OutOfMemoryErrors
> --------------------------------------------
>
> Key: MAPREDUCE-430
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-430
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Amareshwari Sriramadasu
> Assignee: Amar Kamat
> Fix For: 0.20.1
>
> Attachments: MAPREDUCE-430-v1.6-branch-0.20.patch,
> MAPREDUCE-430-v1.6.patch
>
>
> Observed a task with an OutOfMemory error, stuck in cleanup.