Wangda Tan created MAPREDUCE-6002:
-------------------------------------
Summary: MR task should prevent report error to AM when process is
shutting down
Key: MAPREDUCE-6002
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6002
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: task
Affects Versions: 2.5.0
Reporter: Wangda Tan
Assignee: Wangda Tan
With MAPREDUCE-5900, a preempted MR task should not be treated as failed.
But it is still possible for an MR task to fail and report the failure to the
AM after preemption takes effect but before the AM has received the completed
container from the RM. This causes the task attempt to be marked as failed
instead of preempted.
For example, FileSystem registers a shutdown hook that closes all FileSystem
instances. If a FileSystem is in use at that moment (for example, reading
split details from HDFS), the MR task fails and reports a fatal error to the
MR AM. The following exception is raised:
{code}
2014-07-22 01:46:19,613 FATAL [IPC Server handler 10 on 56903] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1405985051088_0018_m_000025_0 - exited : java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:707)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:776)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:837)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:645)
    at java.io.DataInputStream.readByte(DataInputStream.java:265)
    at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
    at org.apache.hadoop.io.WritableUtils.readVIntInRange(WritableUtils.java:348)
    at org.apache.hadoop.io.Text.readString(Text.java:464)
    at org.apache.hadoop.io.Text.readString(Text.java:457)
    at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:357)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:731)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
{code}
We should prevent this. Other exceptions can also occur while the process is
shutting down, and none of them should be reported to the AM.
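A minimal, self-contained sketch of the idea follows. It is not the actual
patch. The class ShutdownAwareReporter, its methods, and the in-memory list
standing in for the task-to-AM error report are all hypothetical names
introduced for illustration. The sketch assumes the shutting-down flag is set
before any other shutdown hook (such as the FileSystem one) starts closing
resources. Plain JVM shutdown hooks run in no guaranteed order, so a real fix
would need a flag with that ordering guarantee. Hadoop's
ShutdownHookManager.isShutdownInProgress() might serve that role, but that is
an assumption and not something stated in this issue.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration: suppress error reports once shutdown has begun.
public class ShutdownAwareReporter {
    // Becomes true once the process starts shutting down (assumed to be set
    // before other hooks close shared resources).
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

    // Stand-in for the fatal errors a task would send to the AM.
    private final List<String> reportedToAm = new CopyOnWriteArrayList<>();

    // Called when shutdown begins, e.g. from a shutdown hook.
    public void markShuttingDown() {
        shuttingDown.set(true);
    }

    // Reports the error unless the process is shutting down.
    // Returns true if the error was reported, false if it was suppressed.
    public boolean reportFatalError(Throwable error) {
        if (shuttingDown.get()) {
            // Likely a side effect of shutdown (e.g. "Filesystem closed"),
            // so do not mark the attempt as failed.
            return false;
        }
        reportedToAm.add(error.toString());
        return true;
    }

    // Snapshot of everything reported so far.
    public List<String> reported() {
        return new ArrayList<>(reportedToAm);
    }

    public static void main(String[] args) {
        ShutdownAwareReporter reporter = new ShutdownAwareReporter();
        // Register the flag with the JVM; ordering relative to other hooks
        // is not guaranteed, which is the caveat noted above.
        Runtime.getRuntime().addShutdownHook(new Thread(reporter::markShuttingDown));

        // A failure during normal operation is reported.
        reporter.reportFatalError(new IOException("genuine task failure"));

        // Simulate preemption: shutdown begins, then the read fails.
        reporter.markShuttingDown();
        reporter.reportFatalError(new IOException("Filesystem closed"));

        System.out.println(reporter.reported());
    }
}
```

In this sketch only the failure that happens before shutdown reaches the
report list, so the AM is free to classify the attempt from the completed
container status it receives from the RM.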
--
This message was sent by Atlassian JIRA
(v6.2#6252)