[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075183#comment-14075183
 ] 

Wangda Tan commented on MAPREDUCE-6002:
---------------------------------------

Jason, thanks for your comments,
bq. I suspect this is a rare situation when it occurs, probably correctable in 
the user's code in many of those cases, and the attempt logs should be able to 
sort things out if it does occur.
I agree. In a normal failure, no matter what kind of exception is thrown, YarnChild 
should be able to catch it and report it to the AM. In the rare case where some 
error causes the JVM to start shutting down before the task reports to the AM, the 
report is unlikely to reach the AM even without this change.

To Zhijie,
bq. Isn't it possible that PREEMPTED from RM still comes before AM knows the 
task attempt FAILED?
I think what Jason mentioned is a different case: no preemption happens, 
a failure occurs on the TA side, and the JVM shuts down before the TA can 
report the error to the AM.
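
For illustration, the guard proposed in this issue could look roughly like the 
sketch below. It is plain Java with no Hadoop dependency, and the class 
ShutdownAwareReporter and its methods are hypothetical names, not the contents 
of MR-6002.patch. A shutdown hook flips a flag, and any failure seen after that 
point is dropped instead of being reported to the AM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: suppress failure reports once JVM shutdown has begun. */
public class ShutdownAwareReporter {
    // Set to true by the shutdown hook (or directly, in tests).
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
    // Stand-in for the RPC to the AM: records what would have been reported.
    private final List<String> reported = new ArrayList<>();

    /** Registers a JVM shutdown hook that marks the process as shutting down. */
    public void installShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::markShuttingDown));
    }

    /** Marks the process as shutting down; exposed so the guard can be tested. */
    public void markShuttingDown() {
        shuttingDown.set(true);
    }

    /**
     * Reports a failure unless shutdown is in progress.
     * Returns true if the failure was reported, false if it was suppressed.
     */
    public boolean reportFailure(Throwable t) {
        if (shuttingDown.get()) {
            // Errors raised during shutdown (e.g. "Filesystem closed") are
            // likely side effects of the shutdown itself, so skip the report.
            return false;
        }
        synchronized (reported) {
            reported.add(String.valueOf(t));
        }
        return true;
    }

    /** Returns a copy of the failures that were actually reported. */
    public List<String> reportedFailures() {
        synchronized (reported) {
            return new ArrayList<>(reported);
        }
    }
}
```

In Hadoop itself the flag would more naturally come from 
ShutdownHookManager.get().isShutdownInProgress() rather than a hand-rolled 
hook; the sketch only shows the shape of the check.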

Thanks,
Wangda

> MR task should prevent report error to AM when process is shutting down
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6002
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6002
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 2.5.0
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: MR-6002.patch
>
>
> With MAPREDUCE-5900, a preempted MR task should not be treated as failed. 
> But it is still possible for an MR task to fail and report to the AM after 
> preemption takes effect but before the AM has received the completed container 
> from the RM. This causes the task attempt to be marked failed instead of preempted.
> For example, FileSystem has a shutdown hook that closes all FileSystem 
> instances. If a FileSystem is in use at that moment (e.g. reading split 
> details from HDFS), the MR task fails and reports the fatal error to the MR AM. An 
> exception is raised:
> {code}
> 2014-07-22 01:46:19,613 FATAL [IPC Server handler 10 on 56903] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1405985051088_0018_m_000025_0 - exited : java.io.IOException: Filesystem closed
>       at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:707)
>       at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:776)
>       at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:837)
>       at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:645)
>       at java.io.DataInputStream.readByte(DataInputStream.java:265)
>       at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
>       at org.apache.hadoop.io.WritableUtils.readVIntInRange(WritableUtils.java:348)
>       at org.apache.hadoop.io.Text.readString(Text.java:464)
>       at org.apache.hadoop.io.Text.readString(Text.java:457)
>       at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:357)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:731)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> {code}
> We should prevent this. Because other exceptions can also occur during 
> shutdown, we shouldn't report any such exceptions to the AM.



