Robert Kanter created OOZIE-2397:
------------------------------------
Summary: LAST_ONLY and NONE don't properly handle READY actions
Key: OOZIE-2397
URL: https://issues.apache.org/jira/browse/OOZIE-2397
Project: Oozie
Issue Type: Bug
Components: core
Affects Versions: 4.2.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Critical
Fix For: trunk
When using LAST_ONLY or NONE, actions are supposed to be able to transition
from READY to SKIPPED if the right criteria are met, but they don't. This is in
contrast to the timeout feature, which only applies to WAITING actions.
Here's a more detailed technical description of the problem:
We handle LAST_ONLY in
[CoordMaterializeTransitionXCommand|http://github.mtv.cloudera.com/CDH/oozie/blob/cdh5-4.1.0_5.5.0/core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java#L242]
and
[CoordActionInputCheckXCommand|http://github.mtv.cloudera.com/CDH/oozie/blob/cdh5-4.1.0_5.5.0/core/src/main/java/org/apache/oozie/command/coord/CoordActionInputCheckXCommand.java#L156].
The former deals with materializing the actions, including setting "old"
actions to SKIPPED as they are materialized. The latter checks the input
datasets for actions and determines whether a WAITING action is ready to
transition to READY (deps are met) and all that entails, including changing
the status to READY and queuing a CoordActionReadyXCommand. If the deps are
not met and the dataset is not there yet, it requeues itself after some
delay. So, these only handle the materialization and WAITING states. However,
LAST_ONLY is supposed to also do READY --> SKIPPED if its condition is met
(unlike TIMEDOUT, which can only come from WAITING; *this additional difference
should probably be called out in the docs*).
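To make the intended difference concrete, here is a minimal sketch of which states may lead to SKIPPED versus TIMEDOUT, as described above. The ActionStatus enum and TransitionRules class are hypothetical names for illustration only; they are not Oozie's actual classes, and only the two terminal statuses discussed in this issue are modeled.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the statuses discussed in this issue (not Oozie's real enum).
enum ActionStatus { WAITING, READY, SKIPPED, TIMEDOUT }

final class TransitionRules {
    private TransitionRules() {}

    // States from which an action may end up in the given terminal status.
    // Only SKIPPED and TIMEDOUT are modeled; everything else returns an empty set.
    static Set<ActionStatus> allowedSources(ActionStatus terminal) {
        switch (terminal) {
            case TIMEDOUT:
                // Timeout can only come from WAITING.
                return EnumSet.of(ActionStatus.WAITING);
            case SKIPPED:
                // LAST_ONLY/NONE should skip from WAITING and also from READY.
                return EnumSet.of(ActionStatus.WAITING, ActionStatus.READY);
            default:
                return EnumSet.noneOf(ActionStatus.class);
        }
    }

    static boolean mayReach(ActionStatus from, ActionStatus terminal) {
        return allowedSources(terminal).contains(from);
    }
}
```

With this model, mayReach(READY, SKIPPED) is true while mayReach(READY, TIMEDOUT) is false, which is the difference the docs should call out.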
[CoordActionReadyXCommand|http://github.mtv.cloudera.com/CDH/oozie/blob/cdh5-4.1.0_5.5.0/core/src/main/java/org/apache/oozie/command/coord/CoordActionReadyXCommand.java#L103]
needs to be updated to handle LAST_ONLY. It currently treats LAST_ONLY the
same as LIFO (via CoordJobGetReadyActionsJPAExecutor), where the order is the
only difference from FIFO. After retrieving all READY actions, it should check
if any meet their LAST_ONLY condition, and if so, queue a
CoordActionSkipXCommand for them (maybe make a bulk version?) instead of a
CoordActionStartXCommand.
We have the same issue with NONE, which has similar behavior.
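A rough sketch of the proposed READY handling for LAST_ONLY follows. It is not the actual patch: ReadyAction, ReadyPlan, and LastOnlyPlanner are hypothetical stand-ins, and the skip condition used here (the next action's nominal time has already passed) is an assumption made for illustration, since the exact condition is not spelled out above. The two resulting lists correspond to the actions that would get a CoordActionSkipXCommand and those that would get a CoordActionStartXCommand.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a READY coordinator action (not Oozie's bean class).
final class ReadyAction {
    final String id;
    final Instant nextNominalTime; // nominal time of the following action; null if unknown

    ReadyAction(String id, Instant nextNominalTime) {
        this.id = id;
        this.nextNominalTime = nextNominalTime;
    }
}

// Result of the check: which action ids to skip and which to start.
final class ReadyPlan {
    final List<String> toSkip = new ArrayList<>();
    final List<String> toStart = new ArrayList<>();
}

final class LastOnlyPlanner {
    private LastOnlyPlanner() {}

    // ASSUMPTION: an action meets its LAST_ONLY condition once the next
    // action's nominal time has passed. Substitute the real condition here.
    static boolean meetsLastOnlyCondition(ReadyAction action, Instant now) {
        return action.nextNominalTime != null && now.isAfter(action.nextNominalTime);
    }

    // After retrieving all READY actions, split them into skip and start sets
    // instead of starting every one of them.
    static ReadyPlan plan(List<ReadyAction> readyActions, Instant now) {
        ReadyPlan plan = new ReadyPlan();
        for (ReadyAction action : readyActions) {
            if (meetsLastOnlyCondition(action, now)) {
                plan.toSkip.add(action.id);   // would queue a skip command
            } else {
                plan.toStart.add(action.id);  // would queue a start command
            }
        }
        return plan;
    }
}
```

NONE could reuse the same structure with a different predicate. Returning the skip set as a list also leaves room for the bulk skip command mentioned above.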