[ https://issues.apache.org/jira/browse/OOZIE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-1483:
---------------------------------

    Description: 
To support the JobTracker (JT) recovering jobs on restart, we need to configure 
the launcher job to be restarted by the JT, but not any of the launched jobs 
({{mapreduce.job.restart.recover}}).  This way, the launcher job will simply 
start over when the JT recovers it; if we allow the JT to recover the actual 
jobs, then they will interfere.  We'll also need this for the same ability in 
YARN.
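
As a rough illustration of the split described above, here is a minimal sketch 
that models job configurations as plain dictionaries.  Only the property name 
{{mapreduce.job.restart.recover}} comes from this issue; the helper names and 
the "true"/"false" string values are assumptions made for illustration, and 
this is not Oozie or Hadoop code.

```python
# Property named in this issue; its exact value format is assumed to be a
# "true"/"false" string here.
RECOVER_KEY = "mapreduce.job.restart.recover"


def launcher_conf(base):
    """Hypothetical helper: copy of base where the launcher job is recoverable."""
    conf = dict(base)
    conf[RECOVER_KEY] = "true"
    return conf


def launched_job_conf(base):
    """Hypothetical helper: copy of base where the launched job is NOT recoverable."""
    conf = dict(base)
    conf[RECOVER_KEY] = "false"
    return conf


if __name__ == "__main__":
    base = {"mapred.job.name": "demo"}
    print(launcher_conf(base))
    print(launched_job_conf(base))
```

The idea is that only the launcher is recovered by the JT; when it starts 
over, it resubmits the actual job, so the actual job itself never needs to be 
recovered.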

This should be fairly trivial except for the MapReduce action because of the 
optimization where the launcher finishes instead of waiting for the actual job 
and Oozie does an "id swap".  Trying to add support for JT to recover the MR 
action doesn't seem feasible as we'd run into a lot of trickiness and some race 
conditions due to the id swap.  

Instead, I think we should remove the MR optimization because it will allow us 
to support recoverability for the MR action as well.  This also has the 
benefit of simplifying the code because we'd be getting rid of all of the id 
swap stuff and also making the MR action consistent with the other actions.  
The only downside is that the MR action will take an extra Map slot just like 
the other actions.  

  was:
To support the JobTracker (JT) recovering jobs on restart, we need to configure 
the launcher job to be restarted by the JT, but not any of the launched jobs 
({{mapred.job.restart.recover}}).  This way, the launcher job will simply start 
over when the JT recovers it; if we allow the JT to recover the actual jobs, 
then they will interfere.  We'll also need this for the same ability in YARN.

This should be fairly trivial except for the MapReduce action because of the 
optimization where the launcher finishes instead of waiting for the actual job 
and Oozie does an "id swap".  Trying to add support for JT to recover the MR 
action doesn't seem feasible as we'd run into a lot of trickiness and some race 
conditions due to the id swap.  

Instead, I think we should remove the MR optimization because it will allow us 
to support recoverability for the MR action as well.  This also has the 
benefit of simplifying the code because we'd be getting rid of all of the id 
swap stuff and also making the MR action consistent with the other actions.  
The only downside is that the MR action will take an extra Map slot just like 
the other actions.  

    
> Support for Job Recoverability
> ------------------------------
>
>                 Key: OOZIE-1483
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1483
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>
> To support the JobTracker (JT) recovering jobs on restart, we need to 
> configure the launcher job to be restarted by the JT, but not any of the 
> launched jobs ({{mapreduce.job.restart.recover}}).  This way, the launcher 
> job will simply start over when the JT recovers it; if we allow the JT to 
> recover the actual jobs, then they will interfere.  We'll also need this for 
> the same ability in YARN.
> This should be fairly trivial except for the MapReduce action because of the 
> optimization where the launcher finishes instead of waiting for the actual 
> job and Oozie does an "id swap".  Trying to add support for JT to recover the 
> MR action doesn't seem feasible as we'd run into a lot of trickiness and some 
> race conditions due to the id swap.  
> Instead, I think we should remove the MR optimization because it will allow 
> us to support recoverability for the MR action as well.  This also has 
> the benefit of simplifying the code because we'd be getting rid of all of the 
> id swap stuff and also making the MR action consistent with the other 
> actions.  The only downside is that the MR action will take an extra Map slot 
> just like the other actions.  

