Jason Lowe created MAPREDUCE-5307:
-------------------------------------
Summary: Failure to launch a task leads to missing failed task
Key: MAPREDUCE-5307
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5307
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mr-am, mrv2
Affects Versions: 2.0.4-alpha, 0.23.7
Reporter: Jason Lowe
When the AM tries to connect to a node to launch a task attempt but the
launcher encounters an error (e.g., RPC timeout or no connectivity to the
node), the error counts as a failed task attempt. However, the task attempt
does not appear in the job history, which leads to confusing situations.
For example, one job that had particular difficulty connecting to nodes had a
task fail after four attempts. Attempts 0, 1, and 3 timed out trying to launch
the task, and attempt 2 failed for other reasons. The job diagnostics were "Task
task_1368637921738_2222198_m_000396 failed 1 times" and only one task attempt
was listed for task_1368637921738_2222198_m_000396:
attempt_1368637921738_2222198_m_000396_2.
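
The mismatch described above can be illustrated with a small, self-contained
model. This is only a sketch of the reported symptom, not the actual MR AM
code: the class MissingAttemptSketch, the TaskRecord type, the Outcome enum,
and the recordLaunchFailures switch are all hypothetical names invented for
illustration. It assumes that every failed attempt is counted against the
task, while only attempts that actually launched end up in the history.

```java
import java.util.ArrayList;
import java.util.List;

public class MissingAttemptSketch {
    /** Hypothetical outcome of one task attempt. */
    enum Outcome { LAUNCH_FAILED, RUN_FAILED }

    /** Hypothetical per-task bookkeeping; a stand-in, not a real Hadoop class. */
    static class TaskRecord {
        final String taskId;
        int failedAttempts = 0;                                  // attempts counted as failed
        final List<String> historyAttempts = new ArrayList<>();  // attempts visible in history

        TaskRecord(String taskId) {
            this.taskId = taskId;
        }

        void attemptFailed(int attemptNum, Outcome outcome, boolean recordLaunchFailures) {
            // Every failure counts against the task, including launch failures.
            failedAttempts++;
            // Assumption: a launch failure reaches the history only if the flag is set.
            if (outcome == Outcome.RUN_FAILED || recordLaunchFailures) {
                String attemptId = taskId.replaceFirst("^task_", "attempt_") + "_" + attemptNum;
                historyAttempts.add(attemptId);
            }
        }
    }

    /** Replays the four attempts from the report: 0, 1, 3 fail to launch; 2 fails while running. */
    static TaskRecord replay(boolean recordLaunchFailures) {
        TaskRecord task = new TaskRecord("task_1368637921738_2222198_m_000396");
        task.attemptFailed(0, Outcome.LAUNCH_FAILED, recordLaunchFailures);
        task.attemptFailed(1, Outcome.LAUNCH_FAILED, recordLaunchFailures);
        task.attemptFailed(2, Outcome.RUN_FAILED, recordLaunchFailures);
        task.attemptFailed(3, Outcome.LAUNCH_FAILED, recordLaunchFailures);
        return task;
    }

    public static void main(String[] args) {
        TaskRecord observed = replay(false);  // models the reported behavior
        System.out.println("counted failures: " + observed.failedAttempts);
        System.out.println("history attempts: " + observed.historyAttempts);

        TaskRecord desired = replay(true);    // models history that includes launch failures
        System.out.println("history attempts if recorded: " + desired.historyAttempts.size());
    }
}
```

With recordLaunchFailures set to false, the model counts four failed attempts
but lists only attempt_1368637921738_2222198_m_000396_2 in the history, which
mirrors the confusing state described in this report. With the flag set to
true, all four attempts are listed.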