[ 
https://issues.apache.org/jira/browse/OOZIE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated OOZIE-2142:
---------------------------------
    Attachment: OOZIE-2142.patch

Turns out the problem is a little more generic than just what I described 
above.  We're not handling the case where {{ActionCheckXCommand}} gets an 
{{ERROR}} during the check and that error isn't eligible to be retried.  In 
that case, we don't do anything, so the action stays in its current status 
(RUNNING) forever.

The patch makes a simple change: if the error from the check isn't going to 
be retried, the action fails instead of being left in RUNNING.
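
To illustrate the idea (this is not the code in the patch), here's a minimal 
Java sketch of the decision.  The class {{CheckErrorSketch}}, the {{Status}} 
enum, the {{Action}} type and {{handleCheckError}} are all hypothetical 
stand-ins I made up for this example, not Oozie's actual classes, and the 
retry branch is simplified to a single status change.

```java
public class CheckErrorSketch {

    // Hypothetical statuses for illustration; not Oozie's real status enum.
    enum Status { RUNNING, RETRY_PENDING, FAILED }

    // Hypothetical stand-in for a workflow action.
    static final class Action {
        Status status = Status.RUNNING;
    }

    /**
     * Handles an ERROR raised while checking a running action.
     * Assumption: the caller has already decided whether the error is
     * eligible for retry and passes that decision in as a flag.
     */
    static Status handleCheckError(Action action, boolean willRetry) {
        if (willRetry) {
            // Retryable error: mark the action so the check is attempted again.
            action.status = Status.RETRY_PENDING;
        } else {
            // Non-retryable error: the missing branch. Without it the status
            // is never touched and the action stays RUNNING forever.
            action.status = Status.FAILED;
        }
        return action.status;
    }

    public static void main(String[] args) {
        Action action = new Action();
        System.out.println(handleCheckError(action, false)); // prints FAILED
    }
}
```

The only point of the sketch is the {{else}} branch: every path through the 
error handling has to move the action out of RUNNING.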

> Changing the JT whitelist causes running Workflows to stay RUNNING forever
> --------------------------------------------------------------------------
>
>                 Key: OOZIE-2142
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2142
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: OOZIE-2142.patch
>
>
> If you change the JT whitelist while a workflow is running (and restart 
> Oozie), that workflow will stay RUNNING forever.  The correct behavior should 
> be the same as if the JT is unavailable: Oozie retries a few times and 
> SUSPENDs the workflow.  Then the user should either put it back into the 
> whitelist and resume, or simply kill it.
> There might be multiple ways to reproduce, but here's one:
> # Submit a workflow that has enough actions to run for a while
> # Suspend the workflow
> # Change the JT whitelist
> # Restart Oozie
> # Resume the workflow
> You'll get errors about the whitelist, but the workflow won't ever 
> transition out of RUNNING.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)