[jira] [Updated] (OOZIE-1205) If the JobTracker is restarted during a Fork, Oozie doesn't fail all of the currently running actions

Robert Kanter (JIRA) Mon, 11 Feb 2013 16:15:14 -0800

     [ 
https://issues.apache.org/jira/browse/OOZIE-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Robert Kanter updated OOZIE-1205:
---------------------------------

    Attachment: OOZIE-1205.patch

The patch adds a new {{FailXCommand}} class.  Existing places where an action 
and job were both set to {{FAILED}} now set only the action to {{FAILED}} and 
then queue a {{FailXCommand}} to "properly" fail the WF.  

I added some unit tests in {{TestFailXCommand}} and also verified that the cases 
in the Description and the above comment now behave correctly.  
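
To illustrate the idea, here is a minimal standalone sketch, not the code in the 
patch.  All names ({{DeferredFailSketch}}, {{failAction}}, {{failWorkflow}}) are 
hypothetical, the command queue is modelled as a plain {{Queue<Runnable>}} that 
is drained after both checks, and the assumption that failing the WF also fails 
any action still RUNNING is mine.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of "fail the action now, fail the workflow later".
// None of these types are Oozie classes.
public class DeferredFailSketch {
    enum Status { RUNNING, FAILED }

    static class Action {
        final String name;
        Status status = Status.RUNNING;
        Action(String name) { this.name = name; }
    }

    static class Workflow {
        Status status = Status.RUNNING;
        final List<Action> actions = new ArrayList<>();
    }

    // Fails only the action and queues a deferred command for the workflow.
    // Returns false if the precondition (workflow RUNNING) does not hold.
    static boolean failAction(Workflow wf, Action action, Queue<Runnable> queue) {
        if (wf.status != Status.RUNNING) {
            return false;
        }
        action.status = Status.FAILED;
        queue.add(() -> failWorkflow(wf)); // workflow status is untouched here
        return true;
    }

    // Assumption: failing the workflow also fails any action still RUNNING.
    // Running it more than once is harmless.
    static void failWorkflow(Workflow wf) {
        for (Action a : wf.actions) {
            if (a.status == Status.RUNNING) {
                a.status = Status.FAILED;
            }
        }
        wf.status = Status.FAILED;
    }

    public static void main(String[] args) {
        Workflow wf = new Workflow();
        Action a = new Action("forkPathA");
        Action b = new Action("forkPathB");
        wf.actions.add(a);
        wf.actions.add(b);

        Queue<Runnable> queue = new ArrayDeque<>();
        // Both checks pass the precondition because the workflow is still RUNNING.
        failAction(wf, a, queue);
        failAction(wf, b, queue);

        // Drain the queued commands; only now does the workflow become FAILED.
        Runnable command;
        while ((command = queue.poll()) != null) {
            command.run();
        }
        System.out.println(a.name + "=" + a.status + ", " + b.name + "=" + b.status
                + ", workflow=" + wf.status);
    }
}
```

Because the workflow stays RUNNING until the queued command runs, the second 
action in the fork is no longer rejected by the precondition check.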
                
> If the JobTracker is restarted during a Fork, Oozie doesn't fail all of the 
> currently running actions
> -----------------------------------------------------------------------------------------------------
>
>                 Key: OOZIE-1205
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1205
>             Project: Oozie
>          Issue Type: Bug
>          Components: action
>    Affects Versions: trunk
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>             Fix For: trunk
>
>         Attachments: OOZIE-1205.patch
>
>
> If you have a workflow with a fork and restart the JobTracker while it's 
> executing the paths in the fork, those two jobs will be lost (as expected).  
> Once the timeout occurs on the {{ActionCheckXCommand}}, it will check both 
> actions sequentially.  While checking the first action, it sets the status to 
> FAILED and also sets the workflow's status to FAILED.  It then moves on to 
> the other action that was running concurrently, but it cannot pass the 
> precondition check because the workflow was already FAILED (the check 
> requires that the Workflow is RUNNING).  It will keep retrying every time 
> the timeout hits (the default is 10 min) and print a WARN message in the log.   
> That action will also stay in the RUNNING state forever even though the 
> underlying job isn't running and the WF is FAILED.  
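
The sequence described above can be reproduced with a small standalone model.  
This is only a sketch: {{ForkCheckSketch}} and {{checkAction}} are hypothetical 
names, not Oozie classes, and the lost Hadoop job is simply assumed for every 
action that gets checked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the behaviour in the description; not Oozie code.
public class ForkCheckSketch {
    enum Status { RUNNING, FAILED }

    static class Action {
        final String name;
        Status status = Status.RUNNING;
        Action(String name) { this.name = name; }
    }

    static class Workflow {
        Status status = Status.RUNNING;
        final List<Action> actions = new ArrayList<>();
    }

    // Models one action check. The precondition requires a RUNNING workflow.
    // Returns true if the check ran, false if the precondition rejected it.
    static boolean checkAction(Workflow wf, Action action) {
        if (wf.status != Status.RUNNING) {
            return false; // precondition fails, the action is left untouched
        }
        // The underlying job was lost, so the action is failed...
        action.status = Status.FAILED;
        // ...and the workflow is failed in the same step, which is the problem.
        wf.status = Status.FAILED;
        return true;
    }

    public static void main(String[] args) {
        Workflow wf = new Workflow();
        Action a = new Action("forkPathA");
        Action b = new Action("forkPathB");
        wf.actions.add(a);
        wf.actions.add(b);

        // Both actions are checked sequentially, as in the description.
        for (Action action : wf.actions) {
            boolean checked = checkAction(wf, action);
            System.out.println(action.name + ": checked=" + checked
                    + ", status=" + action.status);
        }
        System.out.println("workflow=" + wf.status);
    }
}
```

In this model the second action is rejected by the precondition and keeps its 
RUNNING status, no matter how many times the check is repeated.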

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
