[ https://issues.apache.org/jira/browse/MAPREDUCE-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733230#action_12733230 ]

Arun C Murthy commented on MAPREDUCE-430:
-----------------------------------------

Apologies for seeing this late, but:

Did the task that threw the OutOfMemoryError eventually time out, and was it 
declared 'FAILED'?

----

{code}
-    } catch (FSError e) {
-      LOG.fatal("FSError", e);
+    } catch (Error e) {
+      String error = "Error";
+      if (e instanceof FSError) {
+        error = "FSError";
+      }
{code}

This is a bad idiom, we can just catch FSError and Error separately, no?
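
For reference, a minimal sketch of the separate-catch idiom being suggested. 
The runTask() and reportError() helpers are illustrative placeholders, not 
code from the patch; since FSError extends Error, the more specific catch 
must come first:

{code}
try {
  runTask();  // hypothetical: the work that may throw
} catch (FSError e) {
  // Filesystem errors keep their specific type; no instanceof check
  // is needed to recover what kind of failure occurred.
  LOG.fatal("FSError", e);
  reportError(e);  // hypothetical error-reporting hook
} catch (Error e) {
  // All other Errors (e.g. OutOfMemoryError) are handled separately.
  LOG.fatal("Error", e);
  reportError(e);
}
{code}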

----

I also don't understand why we removed TaskUmbilicalProtocol.fsError and 
TaskUmbilicalProtocol.shuffleError in favour of a single 
TaskUmbilicalProtocol.taskError - having specific information about the reason 
for a task failure is important. In future we should be using that specific 
information to drive decisions such as the number of retries for the task, 
tracker blacklisting, etc. For example, errors such as OOM or the process-tree 
running out of memory (high-RAM jobs) shouldn't count against the TaskTracker. 
Hence, having more information about the specific error is *good*.
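
To make the argument concrete, a rough sketch of the distinction: separate 
umbilical methods preserve the failure category at the RPC boundary, while a 
single taskError collapses it into a message string. The signatures below are 
illustrative, not necessarily the exact 0.20 interface:

{code}
import java.io.IOException;
import org.apache.hadoop.ipc.VersionedProtocol;
import org.apache.hadoop.mapred.TaskAttemptID;

public interface TaskUmbilicalProtocol extends VersionedProtocol {
  // Category-specific reports: the TaskTracker/JobTracker can tell a
  // local-filesystem failure apart from a shuffle failure and react
  // differently (e.g. blacklist the tracker for FS errors, but not
  // count an OOM in a high-RAM job against it).
  void fsError(TaskAttemptID taskId, String message) throws IOException;
  void shuffleError(TaskAttemptID taskId, String message) throws IOException;

  // A single catch-all loses the category unless it is re-parsed
  // out of the message string:
  // void taskError(TaskAttemptID taskId, String message) throws IOException;
}
{code}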

> Task stuck in cleanup with OutOfMemoryErrors
> --------------------------------------------
>
>                 Key: MAPREDUCE-430
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-430
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amar Kamat
>             Fix For: 0.20.1
>
>         Attachments: MAPREDUCE-430-v1.6-branch-0.20.patch, 
> MAPREDUCE-430-v1.6.patch
>
>
> Observed a task with an OutOfMemoryError, stuck in cleanup.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.