[ https://issues.apache.org/jira/browse/HADOOP-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681346#action_12681346 ]
sam rash commented on HADOOP-5407:
----------------------------------
We have also seen this error. This is what we saw in the log of the
TaskTracker that was trying to launch the task:
2009-03-12 14:05:53,099 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200902281847_7071/attempt_200902281847_7071_r_000003_0/output/file.out
in any of the configured local directories
2009-03-12 01:46:30,781 INFO org.apache.hadoop.mapred.TaskTracker:
LaunchTaskAction (registerTask): attempt_200902281847_7071_r_000003_0
2009-03-12 01:46:30,781 INFO org.apache.hadoop.mapred.TaskTracker: Trying to
launch : attempt_200902281847_7071_r_000003_0
2009-03-12 01:46:35,802 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200902281847_7071/attempt_200902281847_7071_r_000003_0/output/file.out
in any of the configured local directories
2009-03-12 01:46:40,805 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200902281847_7071/attempt_200902281847_7071_r_000003_0/output/file.out
in any of the configured local directories
2009-03-12 01:46:45,807 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200902281847_7071/attempt_200902281847_7071_r_000003_0/output/file.out
in any of the configured local directories
... (the last log entry keeps repeating) ...
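
To see how often this happens for each attempt, a small sketch like the one below could count these entries in a TaskTracker log. It is only an illustration: the class name MissingOutputCounter and the regular expression are assumptions derived from the log entries above, not part of Hadoop, and the pattern expects each entry to be wrapped only at whitespace.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Hadoop): counts "Could not find .../output/file.out"
// entries per task attempt in the text of a TaskTracker log.
public class MissingOutputCounter {

    // Assumed pattern, modeled on the entries above. \s+ lets the relative path
    // start on the next line; group 1 captures the attempt ID.
    private static final Pattern MISSING = Pattern.compile(
        "DiskErrorException: Could not find\\s+\\S*?/"
        + "(attempt_\\d+_\\d+_[mr]_\\d+_\\d+)/output/file\\.out");

    // Returns attempt ID -> number of matching entries, in order of first appearance.
    public static Map<String, Integer> countByAttempt(String logText) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = MISSING.matcher(logText);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: MissingOutputCounter <tasktracker-log-file>");
            return;
        }
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        countByAttempt(text).forEach((attempt, n) -> System.out.println(attempt + "\t" + n));
    }
}
```

An attempt whose count keeps growing between runs against the same log would match the hang we are seeing.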
> Sometimes, Reduce tasks hang, State is unassigned
> -------------------------------------------------
>
> Key: HADOOP-5407
> URL: https://issues.apache.org/jira/browse/HADOOP-5407
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.19.0
> Reporter: ZhuGuanyin
>
> Hi, all
> When our cluster has been running for a long time, some reduce tasks on some
> tasktrackers hang. Their state is UNASSIGNED. After that, all reduce tasks on
> these tasktrackers hang.
> If we kill a hung reduce task, the attempt is rescheduled to the same
> tasktracker and hangs again. If we fail it, it goes to another tasktracker
> and runs successfully.
> A tasktracker that has a hung reduce task still receives new reduce tasks,
> but they hang forever as well.
> After we reboot the tasktracker machine, reduce tasks no longer hang.