[
https://issues.apache.org/jira/browse/MAPREDUCE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074592#comment-14074592
]
Zhijie Shen commented on MAPREDUCE-6002:
----------------------------------------
Thanks for your feedback, [~jlowe]!
bq. This will only be true if the task was trying to be preempted as it failed,
correct? The AM will see the container completion event from the RM, and since
the attempt didn't explicitly report a completion status it will key off the
container status code to determine the attempt's fate. If the attempt really
happened to fail independently just as it was being preempted then that's a
race we can live with either way, IMHO. The thing we don't want is to have the
attempt fail because of a preemption or task-kill, so I think it will be safe
to squelch errors that are occurring during shutdown.
Exactly. That is the point I wanted to make, and the patch solves the problem
in this way.
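To illustrate the idea of squelching errors during shutdown, here is a minimal
sketch. It is not the code in the attached patch: ShutdownAwareReporter and its
methods are hypothetical names, and the list stands in for the RPC to the AM.
In real task code the flag could presumably be replaced by a check such as
Hadoop's ShutdownHookManager.get().isShutdownInProgress().

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not a class from Hadoop or from the attached patch.
class ShutdownAwareReporter {
    // Set once the process starts shutting down (e.g. from a shutdown hook).
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
    // Stand-in for the RPC to the AM: records what would have been sent.
    private final List<String> reported = new ArrayList<>();

    void markShuttingDown() {
        shuttingDown.set(true);
    }

    // Returns true if the error was reported, false if it was squelched.
    boolean reportFatalError(String attemptId, Throwable error) {
        if (shuttingDown.get()) {
            // Errors seen during shutdown are most likely side effects of the
            // kill or preemption, so they are not forwarded to the AM.
            return false;
        }
        reported.add(attemptId + " - exited : " + error);
        return true;
    }

    List<String> reportedErrors() {
        return reported;
    }

    public static void main(String[] args) {
        ShutdownAwareReporter reporter = new ShutdownAwareReporter();
        // In a real process the flag would be flipped by a shutdown hook.
        Runtime.getRuntime().addShutdownHook(new Thread(reporter::markShuttingDown));

        // Simulate the race: shutdown has begun, then the read fails.
        reporter.markShuttingDown();
        boolean sent = reporter.reportFatalError(
            "attempt_1405985051088_0018_m_000025_0",
            new IOException("Filesystem closed"));
        System.out.println("reported=" + sent);
    }
}
```

With a guard like this, the "Filesystem closed" failure is dropped on the task
side, and the AM is left to decide the attempt's fate from the container
status it gets from the RM.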
bq. The good news is that the task attempt will still end up in the FAILED
state but any useful,
Isn't it possible that PREEMPTED from the RM still arrives before the AM knows
the task attempt FAILED? Say the preemption logic has already run on the RM and
the completed container status has already been sent to the AM, but the NM
hasn't notified the RM and PingChecker hasn't detected it. Anyway, it is still
safe, because it doesn't break the agreement that we don't want the attempt to
fail because of a preemption or task-kill.
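The AM-side decision discussed above can be sketched roughly as follows. This
is an assumption-laden illustration, not the actual AM code: the enum values
and the resolve method are hypothetical and do not correspond to YARN's
container exit codes or the MR task attempt state machine.

```java
// Hypothetical types for illustration; these are not the YARN or MR classes.
final class AttemptFateResolver {
    enum ExitReason { PREEMPTED, KILLED_BY_REQUEST, OTHER }
    enum AttemptFate { KILLED, FAILED }

    private AttemptFateResolver() {}

    // attemptReportedFailure: whether the task itself told the AM it failed
    // before the container completion event arrived.
    static AttemptFate resolve(boolean attemptReportedFailure, ExitReason reason) {
        if (attemptReportedFailure) {
            // An explicit report from the task wins over the container status.
            return AttemptFate.FAILED;
        }
        switch (reason) {
            case PREEMPTED:
            case KILLED_BY_REQUEST:
                // Not the task's fault, so it should not count as a failure.
                return AttemptFate.KILLED;
            default:
                return AttemptFate.FAILED;
        }
    }
}
```

In this model, a preempted attempt only ends up FAILED if the task explicitly
reported a failure first, which is why suppressing error reports during
shutdown keeps preemption from being counted as a task failure.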
> MR task should prevent report error to AM when process is shutting down
> -----------------------------------------------------------------------
>
> Key: MAPREDUCE-6002
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6002
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: task
> Affects Versions: 2.5.0
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: MR-6002.patch
>
>
> With MAPREDUCE-5900, a preempted MR task should not be treated as failed.
> But it is still possible for an MR task to fail and report to the AM after
> preemption takes effect but before the AM has received the completed
> container from the RM. This causes the task attempt to be marked failed
> instead of preempted.
> For example, FileSystem has a shutdown hook that closes all FileSystem
> instances. If a FileSystem is in use at the same time (e.g. reading split
> details from HDFS), the MR task will fail and report the fatal error to the
> MR AM. An exception will be raised:
> {code}
> 2014-07-22 01:46:19,613 FATAL [IPC Server handler 10 on 56903] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1405985051088_0018_m_000025_0 - exited : java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:707)
> at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:776)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:837)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:645)
> at java.io.DataInputStream.readByte(DataInputStream.java:265)
> at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
> at org.apache.hadoop.io.WritableUtils.readVIntInRange(WritableUtils.java:348)
> at org.apache.hadoop.io.Text.readString(Text.java:464)
> at org.apache.hadoop.io.Text.readString(Text.java:457)
> at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:357)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:731)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> {code}
> We should prevent this. Other exceptions can also occur while the process is
> shutting down, and none of them should be reported to the AM.