[ 
https://issues.apache.org/jira/browse/HADOOP-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681346#action_12681346
 ] 

sam rash commented on HADOOP-5407:
----------------------------------

We have also seen this error.  This is what we saw in the log of the 
TaskTracker that was trying to launch the task:

2009-03-12 14:05:53,099 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200902281847_7071/attempt_200902281847_7071_r_000003_0/output/file.out in any of the configured local direct
2009-03-12 01:46:30,781 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200902281847_7071_r_000003_0
2009-03-12 01:46:30,781 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200902281847_7071_r_000003_0
2009-03-12 01:46:35,802 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200902281847_7071/attempt_200902281847_7071_r_000003_0/output/file.out in any of the configured local directories
2009-03-12 01:46:40,805 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200902281847_7071/attempt_200902281847_7071_r_000003_0/output/file.out in any of the configured local directories
2009-03-12 01:46:45,807 INFO org.apache.hadoop.mapred.TaskTracker: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200902281847_7071/attempt_200902281847_7071_r_000003_0/output/file.out in any of the configured local directories
.... (REPEAT LAST LOG ENTRY)....
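
To see how often a given attempt hits this error, something like the following could be run over a TaskTracker log. This is only a rough sketch: it assumes each log entry sits on a single line in the format shown above, and the function name count_missing_output and the regular expression are my own, not anything from Hadoop.

```python
import re
from collections import Counter

# Pulls the attempt ID out of the "Could not find .../output/file.out" message.
# The pattern is an assumption based on the log lines quoted in this comment.
MISSING = re.compile(
    r"DiskErrorException: Could not find "
    r"\S*?(attempt_\d+_\d+_[mr]_\d+_\d+)/output/file\.out"
)


def count_missing_output(lines):
    """Count DiskErrorException entries per task attempt ID."""
    counts = Counter()
    for line in lines:
        match = MISSING.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing it an open log file, for example count_missing_output(open(path)) with path pointing at the TaskTracker log, gives a per-attempt count; an attempt whose count keeps growing every few seconds matches the pattern above.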



> Sometimes, Reduce tasks hang, State is unassigned
> -------------------------------------------------
>
>                 Key: HADOOP-5407
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5407
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: ZhuGuanyin
>
> Hi, all
> When our cluster runs for a long time, some reduce tasks on some 
> tasktrackers hang in the UNASSIGNED state.  After that, every reduce task 
> on those tasktrackers hangs.
> If we kill a hung reduce task, the attempt is rescheduled to the same 
> tasktracker and hangs again. If we fail it instead, it goes to another 
> tasktracker and runs successfully. 
> A tasktracker with a hung reduce task still receives new reduce tasks, but 
> those tasks also hang forever.
> After we reboot the tasktracker machine, reduce tasks no longer hang.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.