broken symlinks in jobcache when local tasks are done but job is in progress
----------------------------------------------------------------------------

                 Key: HADOOP-3713
                 URL: https://issues.apache.org/jira/browse/HADOOP-3713
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.17.0
            Reporter: Rajiv Chittajallu


When all running tasks on a tasktracker are done, not all links for
/<mapred.local.dir>/taskTracker/jobcache/<job>/work are deleted. As a
result, new tasks from the same job that are scheduled on this node fail
with:

2008-07-07 17:44:49,756 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_200807071715_0022_r_000295_0
2008-07-07 17:44:49,773 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing task_200807071715_0022_r_000295_0:
java.io.IOException: Mkdirs failed to create /tmp3/taskTracker/jobcache/job_200807071715_0022/work
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:680)
        at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1274)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:915)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1310)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2251)

$ ls -lt /tmp3/taskTracker/jobcache/job_200807071715_0022/work
lrwxrwxrwx 1 user users 135 Jul  7 17:44 /tmp3/taskTracker/jobcache/job_200807071715_0022/work -> /tmp0/taskTracker/jobcache/job_200807071715_0022/work
$ ls -lt /tmp0/mapred-local/taskTracker/jobcache/job_200807071715_0022/work
ls: /tmp0/taskTracker/jobcache/job_200807071715_0022/work: No such file or directory
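
The listing above shows a symlink whose target is gone. As a rough
illustration of why directory creation fails in that state, here is a
minimal Java sketch. DanglingLinkCheck, isDangling and removeIfDangling
are hypothetical names, not part of Hadoop, and the sketch assumes a
POSIX filesystem where creating symlinks is permitted. It reproduces the
symptom only; it is not the tasktracker's cleanup code or a proposed fix.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of Hadoop.
class DanglingLinkCheck {

    // True when p is a symlink whose target no longer exists.
    static boolean isDangling(Path p) {
        // isSymbolicLink inspects the link itself; exists follows the link.
        return Files.isSymbolicLink(p) && !Files.exists(p);
    }

    // Deletes p only if it is a dangling symlink; reports whether it was removed.
    static boolean removeIfDangling(Path p) throws IOException {
        if (!isDangling(p)) {
            return false;
        }
        Files.delete(p); // removes the link itself, not a target
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the two local dirs in the report (names are made up).
        Path base = Files.createTempDirectory("jobcache-demo");
        Path target = base.resolve("tmp0-work");
        Path link = base.resolve("tmp3-work");
        Files.createDirectory(target);
        Files.createSymbolicLink(link, target);
        Files.delete(target); // simulate cleanup that removes only the target

        File f = link.toFile();
        System.out.println("dangling: " + isDangling(link));
        // mkdir cannot succeed while the name is taken by the stale link.
        System.out.println("mkdir ok: " + (f.mkdir() || f.isDirectory()));
        removeIfDangling(link);
        System.out.println("mkdir ok after unlink: " + (f.mkdir() || f.isDirectory()));
    }
}
```

The check is deliberately narrow: a link that still resolves is left
alone, so a work directory that is in use by a running task would not be
touched by this sketch.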

Earlier tasks scheduled on this tasktracker completed successfully:

2008-07-07 17:44:44,926 INFO org.apache.hadoop.mapred.TaskRunner: task_200807071715_0022_r_000004_0 done; removing files.
2008-07-07 17:44:44,931 INFO org.apache.hadoop.mapred.TaskRunner: task_200807071715_0022_r_000176_0 done; removing files.
2008-07-07 17:44:44,958 INFO org.apache.hadoop.mapred.TaskRunner: task_200807071715_0022_r_000210_0 done; removing files.
2008-07-07 17:44:49,486 INFO org.apache.hadoop.mapred.TaskRunner: task_200807071715_0022_r_000153_0 done; removing files.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
