[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074398#comment-14074398
 ] 

Jason Lowe commented on MAPREDUCE-6002:
---------------------------------------

bq. In this extreme case, the exception is going to be missed by the listener, 
and the task attempt is moved to PREEMPTED instead of FAILED.

This will only be true if the task was being preempted *as* it failed, 
correct?  The AM will see the container completion event from the RM, and 
since the attempt didn't explicitly report a completion status, the AM will 
key off the container status code to determine the attempt's fate.  If the 
attempt really happened to fail independently just as it was being preempted, 
then that's a race we can live with either way, IMHO.  What we don't want is 
for the attempt to fail _because_ of a preemption or task-kill, so I think it 
will be safe to squelch errors that occur during shutdown.
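
To make the "squelch errors during shutdown" idea concrete, here is a minimal 
sketch in plain Java. It is not the attached patch. ShutdownAwareReporter and 
ErrorSink are hypothetical names, with ErrorSink standing in for the 
umbilical's fatal-error call, and the shutdown flag is assumed to be set by a 
JVM shutdown hook.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical wrapper that drops error reports once JVM shutdown has begun. */
class ShutdownAwareReporter {

    /** Hypothetical stand-in for the umbilical's fatal-error call. */
    interface ErrorSink {
        void fatalError(String attemptId, String message);
    }

    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
    private final ErrorSink sink;

    ShutdownAwareReporter(ErrorSink sink) {
        this.sink = sink;
    }

    /** Registers a JVM shutdown hook that flips the flag (assumed wiring). */
    void installShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::markShuttingDown));
    }

    /** Marks the process as shutting down; later reports are squelched. */
    void markShuttingDown() {
        shuttingDown.set(true);
    }

    /**
     * Forwards the error unless shutdown is in progress.
     * Returns true if the error was forwarded, false if it was squelched.
     */
    boolean report(String attemptId, Throwable error) {
        if (shuttingDown.get()) {
            // Likely a side effect of the kill (e.g. "Filesystem closed"), so
            // leave the attempt's fate to the container exit status.
            return false;
        }
        sink.fatalError(attemptId, String.valueOf(error));
        return true;
    }
}
```

Shutdown hooks run in no guaranteed order, so a failure can still slip 
through before the flag is set. The sketch narrows the race rather than 
closing it.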

I think the biggest issue will be if an error in the task attempt causes the 
entire JVM to start shutting down before the error is reported via the 
umbilical (e.g., the user code calls System.exit on an error).  The good news 
is that the task attempt will still end up in the FAILED state, but any useful, 
context-specific error messages from the attempt will not be reported via the 
umbilical.  The AM will only know that the task attempt exited without saying 
why.  I suspect this situation is rare and, in many cases, correctable in the 
user's code, and the attempt logs should be able to sort things out if it 
does occur.
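
As a rough illustration of the fallback described above, the sketch below 
shows an AM-side decision that honors an explicit report from the attempt and 
otherwise keys off the container exit status. AttemptFateResolver and Fate 
are hypothetical names, and the exit status constant is assumed to mirror 
YARN's ContainerExitStatus.PREEMPTED. The real AM state machine is more 
involved.

```java
/** Hypothetical sketch of how an AM might decide a finished attempt's fate. */
class AttemptFateResolver {

    enum Fate { FAILED, PREEMPTED }

    // Assumption: mirrors ContainerExitStatus.PREEMPTED in YARN.
    static final int EXIT_PREEMPTED = -102;

    /**
     * @param reportedByAttempt fate the attempt reported over the umbilical,
     *                          or null if it exited without reporting
     * @param containerExitStatus exit status from the RM's completion event
     */
    static Fate resolve(Fate reportedByAttempt, int containerExitStatus) {
        if (reportedByAttempt != null) {
            // An explicit report wins, which is why a spurious failure
            // reported during shutdown masks the preemption.
            return reportedByAttempt;
        }
        if (containerExitStatus == EXIT_PREEMPTED) {
            return Fate.PREEMPTED;
        }
        // Any other silent exit (e.g. user code calling System.exit) is
        // treated as a failure with no attempt-supplied diagnostics.
        return Fate.FAILED;
    }
}
```

Under these assumptions, a squelched shutdown error leaves the report null, 
so a preempted container resolves to PREEMPTED, while a silent System.exit 
still resolves to FAILED.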

> MR task should prevent report error to AM when process is shutting down
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6002
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6002
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 2.5.0
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: MR-6002.patch
>
>
> With MAPREDUCE-5900, a preempted MR task should not be treated as failed. 
> But it is still possible for an MR task to fail and report the failure to 
> the AM after preemption takes effect but before the AM has received the 
> completed container from the RM. This causes the task attempt to be marked 
> failed instead of preempted.
> For example, FileSystem has a shutdown hook that closes all FileSystem 
> instances. If a FileSystem is in use at that moment (e.g., reading split 
> details from HDFS), the MR task will fail and report the fatal error to the 
> MR AM. An exception will be raised:
> {code}
> 2014-07-22 01:46:19,613 FATAL [IPC Server handler 10 on 56903] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1405985051088_0018_m_000025_0 - exited : java.io.IOException: Filesystem closed
>       at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:707)
>       at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:776)
>       at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:837)
>       at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:645)
>       at java.io.DataInputStream.readByte(DataInputStream.java:265)
>       at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
>       at org.apache.hadoop.io.WritableUtils.readVIntInRange(WritableUtils.java:348)
>       at org.apache.hadoop.io.Text.readString(Text.java:464)
>       at org.apache.hadoop.io.Text.readString(Text.java:457)
>       at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:357)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:731)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> {code}
> We should prevent this. Other exceptions can also occur during shutdown, 
> and none of them should be reported to the AM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
