JT restart recovery:  Exclude jobs which failed during SUBMIT_JOB (due to acl) 
-------------------------------------------------------------------------------

                 Key: HADOOP-5400
                 URL: https://issues.apache.org/jira/browse/HADOOP-5400
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
         Environment: Hadoop 0.20 + 0.20.0 + HADOOP-5225 + HADOOP-5332
            Reporter: Rajiv Chittajallu
            Priority: Blocker


mapred.jobtracker.restart.recover is set to true in mapred-site.xml.
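For reference, this setting is an ordinary property in mapred-site.xml. The following is a minimal, self-contained Java sketch (not Hadoop's own Configuration class) that shows the <configuration>/<property> layout and reads the flag back with the JDK's DOM parser. RestartRecoverConfigSketch and getBoolean are hypothetical names, and treating a missing property as false is an assumption of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RestartRecoverConfigSketch {
    static final String KEY = "mapred.jobtracker.restart.recover";

    // Illustrative mapred-site.xml content in Hadoop's <configuration>/<property> layout.
    static final String MAPRED_SITE =
        "<?xml version=\"1.0\"?>\n"
        + "<configuration>\n"
        + "  <property>\n"
        + "    <name>mapred.jobtracker.restart.recover</name>\n"
        + "    <value>true</value>\n"
        + "  </property>\n"
        + "</configuration>\n";

    // Returns the boolean value of `key`, or `defaultValue` if no <property> names it.
    // Hypothetical helper: it ignores Hadoop features such as `final` and ${...} expansion.
    static boolean getBoolean(String xml, String key, boolean defaultValue) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String name = prop.getElementsByTagName("name").item(0).getTextContent().trim();
                if (name.equals(key)) {
                    String value = prop.getElementsByTagName("value").item(0).getTextContent().trim();
                    return Boolean.parseBoolean(value);
                }
            }
            return defaultValue;
        } catch (Exception e) {
            // Wrap parser/IO failures so callers need not handle checked exceptions.
            throw new IllegalArgumentException("cannot parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(KEY + " = " + getBoolean(MAPRED_SITE, KEY, false));
    }
}
```

Hadoop's real Configuration class also handles things this sketch ignores, such as properties marked final and ${...} variable expansion.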

This is a job that failed during job submission because of an invalid ACL:

2009-03-04 18:31:25,970 INFO org.apache.hadoop.ipc.Server: IPC Server handler 14 on 50300, call submitJob(job_200903041223_0259) from 192.168.10.1:41306: error: org.apache.hadoop.security.AccessControlException: User rajive cannot perform operation SUBMIT_JOB on queue default

When the JobTracker was restarted some time later, the failed job was recovered/restarted:

2009-03-04 19:13:30,544 INFO org.apache.hadoop.mapred.JobTracker: Found an incomplete job directory job_200903041852_0040. Deleting it!!
2009-03-04 19:13:30,613 INFO org.apache.hadoop.mapred.FairScheduler: Successfully configured FairScheduler
2009-03-04 19:13:30,614 INFO org.apache.hadoop.mapred.JobTracker: Trying to recover job job_200903041223_0259


2009-03-04 18:53:17,147 INFO org.apache.hadoop.mapred.JobTracker: JobTracker failed to recover job job_200903041223_0259. Ignoring it.
java.io.FileNotFoundException: File file:/grid/0/hadoop/var/log/history/axonitegold-jt1.gold.ygrid.yahoo.com_1236192735577_job_200903041223_0259_rajive_word+count does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:360)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:336)
        at org.apache.hadoop.mapred.JobHistory.parseHistoryFromFS(JobHistory.java:245)
        at org.apache.hadoop.mapred.JobTracker$RecoveryManager.recover(JobTracker.java:1144)
        at org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:1603)
        at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3326)
2009-03-04 18:53:17,147 INFO org.apache.hadoop.mapred.JobTracker: Restart count for job job_200903041223_0259 is 0
2009-03-04 18:53:18,626 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_200903041223_0259 = 4664646202464
2009-03-04 18:53:18,626 INFO org.apache.hadoop.mapred.JobInProgress: Split info for job:job_200903041223_0259 with 34640 splits:

Jobs that failed during job submission should not be considered for recovery.
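The requested behaviour can be sketched as a filter applied to the jobs found at restart, before any history is parsed. This is a minimal Java sketch under stated assumptions, not the JobTracker's RecoveryManager code: FoundJob, submitSucceeded and jobsToRecover are hypothetical names, the submitSucceeded flag stands in for whatever marker would record that SUBMIT_JOB passed the ACL check (how the JobTracker would persist that is left open), and job_200903041223_0258 is a hypothetical accepted job placed next to the rejected job from the log.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoveryFilterSketch {
    // Hypothetical view of a job found on restart. `submitSucceeded` is an assumed
    // marker meaning "submitJob completed, including the SUBMIT_JOB ACL check".
    static final class FoundJob {
        final String jobId;
        final boolean submitSucceeded;

        FoundJob(String jobId, boolean submitSucceeded) {
            this.jobId = jobId;
            this.submitSucceeded = submitSucceeded;
        }
    }

    // Keep only jobs whose submission completed; jobs rejected during SUBMIT_JOB are skipped.
    static List<String> jobsToRecover(List<FoundJob> found) {
        List<String> recover = new ArrayList<>();
        for (FoundJob job : found) {
            if (job.submitSucceeded) {
                recover.add(job.jobId);
            }
        }
        return recover;
    }

    public static void main(String[] args) {
        List<FoundJob> found = List.of(
            new FoundJob("job_200903041223_0258", true),   // hypothetical accepted job
            new FoundJob("job_200903041223_0259", false)); // rejected: SUBMIT_JOB ACL failure
        System.out.println(jobsToRecover(found)); // prints [job_200903041223_0258]
    }
}
```

Filtering up front would also avoid the path visible in the log, where recovery logged "failed to recover job job_200903041223_0259. Ignoring it." and then still went on to compute 34640 splits for that job.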

