[ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221253#comment-13221253 ]

Mayank Bansal commented on MAPREDUCE-3837:
------------------------------------------

Hi Alejandro

Thanks for your help testing this patch. I am really sorry about the confusion;
I missed one function in the patch. I have attached the new patch, tested it,
and it is working fine in my local environment. I am not sure how I missed that
before.

Please let me know if you find any more issues with it.

Arun,

I believe the earlier issues were with recovering jobs from the point where they
crashed. What I am doing here is a very simplistic approach: I read the job
token file and resubmit the jobs after a crash and recovery. I am not trying to
resume from the point where the last run left off.

In this scenario each recovered job is a new run, and it works well. The
downside is that the whole job reruns; the upside is that users don't need to
resubmit their jobs.

Please let me know your thoughts.

Thanks,
Mayank 
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after 
> that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, 
> PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, 
> PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running, and the job tracker's
> property mapreduce.jobtracker.restart.recover is true, then it should recover
> those jobs.
> However, the current behavior is as follows: the jobtracker tries to restore
> the jobs but cannot. After that, the jobtracker closes its handle to HDFS and
> nobody else can submit jobs.
> Thanks,
> Mayank

