[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400918#comment-13400918
 ] 

Arun C Murthy commented on MAPREDUCE-3837:
------------------------------------------

Mayank, as we briefly discussed, you'll need to fix the re-submit to read the 
job tokens from HDFS and pass them along (i.e. as a Credentials object) to the 
submitJob API. Sorry, I've been traveling a lot and missed commenting here, my 
bad.

Other nits:

# You've removed the call to JobClient.isJobDirValid, which is dangerous. Since 
the job directory contents have changed in hadoop-1 post security, please add a 
private isJobDirValid method to the JT and use it. This method should check for 
the jobInfo file on HDFS (JobTracker.JOB_INFO_FILE) and the jobTokens file 
(TokenCache.JOB_TOKEN_HDFS_FILE).
# Also, since we only care about jobIds now for JT recovery, it's better to add 
a Set<JobId> jobIdsToRecover rather than rely on Set<JobInfo> jobsToRecover. 
This way we can avoid all the unnecessary translations between 
o.a.h.mapred.JobId and o.a.h.mapreduce.JobId.
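
To make the two nits concrete, here is a rough sketch of what I have in mind. It is not the actual patch: java.nio.file stands in for the HDFS FileSystem API so the snippet runs on its own, the class name JobRecoverySketch is made up, the two file-name strings are placeholders for JobTracker.JOB_INFO_FILE and TokenCache.JOB_TOKEN_HDFS_FILE, and job ids are plain strings rather than JobId objects.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: the local filesystem stands in for HDFS, and this class
// is hypothetical, not part of the JobTracker.
class JobRecoverySketch {
    // Placeholder values (assumptions); the real names come from
    // JobTracker.JOB_INFO_FILE and TokenCache.JOB_TOKEN_HDFS_FILE.
    static final String JOB_INFO_FILE = "job-info";
    static final String JOB_TOKEN_FILE = "jobToken";

    // A job directory is considered valid only if both the job-info file
    // and the job-tokens file are present.
    static boolean isJobDirValid(Path jobDir) {
        return Files.isDirectory(jobDir)
            && Files.isRegularFile(jobDir.resolve(JOB_INFO_FILE))
            && Files.isRegularFile(jobDir.resolve(JOB_TOKEN_FILE));
    }

    // Collect only the ids (here: directory names) of valid job
    // directories under the system directory.
    static Set<String> jobIdsToRecover(Path systemDir) throws IOException {
        Set<String> ids = new TreeSet<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(systemDir)) {
            for (Path dir : dirs) {
                if (isJobDirValid(dir)) {
                    ids.add(dir.getFileName().toString());
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        Path systemDir = Files.createTempDirectory("jt-system");

        // Complete job directory: both files present.
        Path good = Files.createDirectory(systemDir.resolve("job_0001"));
        Files.createFile(good.resolve(JOB_INFO_FILE));
        Files.createFile(good.resolve(JOB_TOKEN_FILE));

        // Incomplete job directory: the token file is missing.
        Path bad = Files.createDirectory(systemDir.resolve("job_0002"));
        Files.createFile(bad.resolve(JOB_INFO_FILE));

        System.out.println(jobIdsToRecover(systemDir));
    }
}
```

In the demo only job_0001 ends up in the set, since job_0002 has no token file. Keeping just the ids in the set means recovery never has to carry a JobInfo around or convert between the two JobId types.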
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after 
> that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, 
> PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, 
> PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and its property 
> mapreduce.jobtracker.restart.recover is true, it should recover those jobs.
> However, the current behavior is as follows:
> the jobtracker tries to restore the jobs but cannot. After that, the 
> jobtracker closes its handle to HDFS and nobody else can submit jobs. 
> Thanks,
> Mayank

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
