[ https://issues.apache.org/jira/browse/OOZIE-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981787#comment-13981787 ]
Bowen Zhang commented on OOZIE-1319:
------------------------------------

Let me give a more detailed description of the two scenarios:
* Same coordinator job. If the Oozie server is back up at 11am, CoordMaterializationService will run immediately and create 6 actions from 9am to 9:50am, of which the first 5 will be "SKIPPED". At the same time, RecoveryService is very likely to kick off the 9:50am action and turn it into "SUCCEEDED" before your CoordMaterializationService runs again to materialize the actions from 10am to 10:50am, at which point your "GET_UNSCHEDULED_ACTIONS" won't be able to retrieve the 9:50am action. As a result, the actions at 9:50am and 10:50am are both "SUCCEEDED". This is not the correct behavior.
* Same coordinator job with a data dependency. The Oozie server has been up from the beginning. If the data dependency is fulfilled at 10:35am, then by definition the action at 10:30am should run and all the previous ones should be "SKIPPED". But with your code, every action will run to "SUCCEEDED", since the next CoordMaterializationService run, which materializes new actions, is not until 10:55am. At 10:30am, there are already 12 actions in "WAITING" state in the database, since at 8:55am and 9:55am actions ARE NOT marked "SKIPPED" during the materialization stage.
* {quote}
It only runs "GET_UNSCHEDULED_ACTIONS" if the coordinator job's execution strategy is LAST_ONLY; there's an if statement in handleLastOnly()
{quote}
Not true. What if, at the same time, another coordinator job with "LIFO" execution order is running? Will you mark all its actions "SKIPPED"?
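
To make the expected behavior in the second scenario concrete, here is a minimal Java sketch of the LAST_ONLY rule as described above: among a job's "WAITING" actions, the latest one whose nominal time has passed should run and every earlier one should be "SKIPPED", and the rule must apply only to jobs whose execution strategy is LAST_ONLY. All names here (LastOnlySketch, actionsToSkip, everyTenMinutes) are hypothetical and are not taken from the Oozie code base or from the attached patches.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the Oozie code base.
class LastOnlySketch {

    enum Execution { FIFO, LIFO, LAST_ONLY }

    // Returns the nominal times of the WAITING actions that should be SKIPPED
    // at time `now`. Only a LAST_ONLY job skips anything: every action older
    // than the latest action whose nominal time has already passed.
    static List<LocalTime> actionsToSkip(Execution strategy,
                                         List<LocalTime> waiting,
                                         LocalTime now) {
        List<LocalTime> skipped = new ArrayList<>();
        if (strategy != Execution.LAST_ONLY) {
            return skipped; // e.g. a LIFO job running concurrently is untouched
        }
        // Find the latest action whose nominal time is not in the future.
        LocalTime latest = null;
        for (LocalTime t : waiting) {
            if (!t.isAfter(now) && (latest == null || t.isAfter(latest))) {
                latest = t;
            }
        }
        // Everything strictly before that action is skipped.
        for (LocalTime t : waiting) {
            if (latest != null && t.isBefore(latest)) {
                skipped.add(t);
            }
        }
        return skipped;
    }

    // Builds `count` nominal times, ten minutes apart, starting at `start`.
    static List<LocalTime> everyTenMinutes(LocalTime start, int count) {
        List<LocalTime> times = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            times.add(start.plusMinutes(10L * i));
        }
        return times;
    }

    public static void main(String[] args) {
        // The 12 WAITING actions from 9:00 to 10:50; dependency met at 10:35.
        List<LocalTime> waiting = everyTenMinutes(LocalTime.of(9, 0), 12);
        LocalTime now = LocalTime.of(10, 35);
        System.out.println("LAST_ONLY skips: "
                + actionsToSkip(Execution.LAST_ONLY, waiting, now));
        System.out.println("LIFO skips: "
                + actionsToSkip(Execution.LIFO, waiting, now));
    }
}
```

In this sketch the 10:30am action is the one left to run, and the actions at 10:40am and 10:50am stay "WAITING" because their nominal times have not arrived yet. The check is evaluated against the current time rather than at materialization time, which is the point of the second scenario.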

> "LAST_ONLY" in execution control for coordinator job still runs all the actions
> -------------------------------------------------------------------------------
>
>                 Key: OOZIE-1319
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1319
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Bowen Zhang
>            Assignee: Robert Kanter
>         Attachments: OOZIE-1319.patch, OOZIE-1319.patch, OOZIE-1319.patch, oozie-1319.patch
>
>
> In execute() of CoordJobGetReadyActionsJPAExecutor.java, once we retrieve the
> top item from a "LIFO" query result, we do not discard or delete the
> remaining items from the result list. As a result, the next time execute() is
> invoked, we will be retrieving the next item in line. Consequently, LAST_ONLY
> strategy will also execute all ready actions for a given coordinator job,
> making it no different than LIFO.

--
This message was sent by Atlassian JIRA
(v6.2#6252)