[
https://issues.apache.org/jira/browse/HADOOP-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526123
]
Arun C Murthy commented on HADOOP-1862:
---------------------------------------
Hmm... one straw to clutch:
{noformat}
$ cat 1862-event.log | grep task_200709041519_0023_m_001149
OBSOLETE task_200709041519_0023_m_001149_0
http://a.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001149_0
FAILED task_200709041519_0023_m_001149_0 null
SUCCEEDED task_200709041519_0023_m_001149_1
http://b.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001149_1
SUCCEEDED task_200709041519_0023_m_001149_2
http://c.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001149_2
$ cat 1862-event.log | grep task_200709041519_0023_m_001816
OBSOLETE task_200709041519_0023_m_001816_0
http://x.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001816_0
FAILED task_200709041519_0023_m_001816_0 null
SUCCEEDED task_200709041519_0023_m_001816_1
http://y.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001816_1
SUCCEEDED task_200709041519_0023_m_001816_2
http://z.a.com:50060/tasklog?plaintext=true&taskid=task_200709041519_0023_m_001816_2
{noformat}
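The pattern in the grep output above (two SUCCEEDED attempts for the same map) can also be looked for mechanically across the whole event log. The following is only a rough sketch: the class name {{DuplicateSuccessFinder}} is hypothetical, and it assumes each record starts with a status word followed by the attempt id, with the TIP id being the attempt id minus its trailing {{_N}} suffix, as the lines above suggest.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: scans event-log lines and reports
// every TIP that has more than one attempt marked SUCCEEDED.
class DuplicateSuccessFinder {

    // Returns TIP id -> SUCCEEDED attempt ids, keeping only TIPs with 2+ successes.
    static Map<String, List<String>> find(List<String> logLines) {
        Map<String, List<String>> byTip = new LinkedHashMap<>();
        for (String line : logLines) {
            String[] fields = line.trim().split("\\s+");
            // Assumed record layout: "<STATUS> <attemptId> ..."; URL-only and
            // blank lines do not start with SUCCEEDED and are skipped.
            if (fields.length < 2 || !fields[0].equals("SUCCEEDED")) {
                continue;
            }
            String attemptId = fields[1];
            int cut = attemptId.lastIndexOf('_');
            if (cut < 0) {
                continue;
            }
            // Assumption: dropping the trailing "_<attempt>" yields the TIP id.
            String tipId = attemptId.substring(0, cut);
            byTip.computeIfAbsent(tipId, k -> new ArrayList<>()).add(attemptId);
        }
        // Drop TIPs that succeeded only once; those are the healthy case.
        byTip.values().removeIf(attempts -> attempts.size() < 2);
        return byTip;
    }
}
```

Fed the two grep outputs above, this would flag both {{task_200709041519_0023_m_001149}} and {{task_200709041519_0023_m_001816}}, each with attempts _1 and _2.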
Essentially, {{JobInProgress.updateTaskStatuses(TaskInProgress, TaskStatus,
JobTrackerMetrics)}} adds a {{TaskCompletionEvent.Status.SUCCEEDED}} event
whether or not the TIP ({{TaskInProgress}}) is already complete, so each
reducer sees 2 {{TaskCompletionEvent.Status.SUCCEEDED}} events for the same
map, as above. Clearly the fetch from one of them will fail, since either _1
or _2 will be {{KILLED}}; not a happy situation.
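To make the suspected problem concrete, here is a minimal, self-contained sketch of the kind of guard that seems to be missing. It is not the actual {{JobInProgress}} code and not a proposed patch: the class {{CompletionEventSketch}}, its nested types, and the choice to record the late attempt as {{KILLED}} are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of completion-event bookkeeping.
// None of these names come from the Hadoop code base.
class CompletionEventSketch {

    enum Status { SUCCEEDED, FAILED, KILLED }

    static final class Event {
        final String attemptId;
        final Status status;

        Event(String attemptId, Status status) {
            this.attemptId = attemptId;
            this.status = status;
        }
    }

    private final Set<String> completedTips = new HashSet<>();
    private final List<Event> events = new ArrayList<>();

    // Assumption: the TIP id is the attempt id without its trailing "_<attempt>".
    static String tipOf(String attemptId) {
        return attemptId.substring(0, attemptId.lastIndexOf('_'));
    }

    // Called when an attempt reports success.
    void attemptSucceeded(String attemptId) {
        // Set.add returns false if this TIP was already recorded as complete.
        boolean firstSuccess = completedTips.add(tipOf(attemptId));
        // Only the first successful attempt is advertised as SUCCEEDED; a later
        // one is recorded as KILLED (an assumption) so no reducer fetches from it.
        Status status = firstSuccess ? Status.SUCCEEDED : Status.KILLED;
        events.add(new Event(attemptId, status));
    }

    List<Event> events() {
        return events;
    }
}
```

With a check like this, reporting success for {{task_200709041519_0023_m_001149_1}} and then for {{task_200709041519_0023_m_001149_2}} would leave a single SUCCEEDED event for that TIP rather than two.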
Like I said, I'll try to dig deeper; maybe this will help someone beat me to
it. *smile*
> reduces are getting stuck trying to find map outputs
> ----------------------------------------------------
>
> Key: HADOOP-1862
> URL: https://issues.apache.org/jira/browse/HADOOP-1862
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.14.1
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Priority: Blocker
> Fix For: 0.15.0
>
>
> Some of the reduces have been stuck for hours looking for 137 map outputs.
> When I look at the job events, all 2600 of the maps have succeeded. There have
> been lots of lost task trackers and shuffle failures. The maps have each been
> run between 1 and 6 times. I do see some of the events in the task event log
> are marked OBSOLETE.