[ https://issues.apache.org/jira/browse/OOZIE-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474675#comment-13474675 ]
Ryota Egashira commented on OOZIE-614:
--------------------------------------
I agree with what Max mentioned for LAST_ONLY.
Let me propose possible approaches, covering two aspects, to support LAST_ONLY as
well as LIFO:
1. creation (materialization) of coord actions
Currently, coord actions are always created in chronological order, older to
newer (FIFO), and, as described earlier, in batches.
But to support LAST_ONLY execution, especially for old actions prior to the
current time (in catch-up mode), creation must happen in reverse order (newer to
older). Then, when one of the old actions becomes ready for execution, we can
safely delete the actions older than it. (With chronological creation, we would
have to keep old actions on hold until the newer ones had been checked.)
Reverse-order creation is also required for LIFO, where newer actions must be
executed first.
At the same time, we need to keep creating new actions as time progresses and
their nominal times fall into the materialization window. This, by nature,
progresses from older to newer.
This means that Oozie needs to create coord actions in both directions, and
needs two watermarks to track the timestamps up to which coord actions have
been created in each direction.
Currently, the coord job table has a column NEXT_MATD_TIME, which tracks the
nominal time up to which coord actions have been created, assuming
chronological order. My proposal is to add another column to track the
watermark in the reverse direction. The new column is only used for LIFO and
LAST_ONLY; for FIFO (the default), we can still use NEXT_MATD_TIME alone and
leave the new column NULL or similar.
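A minimal sketch of the two-watermark idea, assuming a simplified in-memory model rather than Oozie's actual CoordinatorJobBean or materialization commands. `nextMatdTime` plays the role of NEXT_MATD_TIME; `prevMatdTime` is a hypothetical stand-in for the proposed reverse-watermark column and stays null for FIFO. Times are plain longs, and the split point (the latest nominal time at or before job submission) is an assumption about where the two directions meet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of bidirectional materialization; names are illustrative,
// not Oozie's real API. Times are plain longs (e.g. epoch minutes).
class BidirectionalMaterializer {
    final long startTime;
    final long frequency;
    long nextMatdTime;   // forward watermark, analogous to NEXT_MATD_TIME
    Long prevMatdTime;   // hypothetical reverse watermark; null for FIFO
    final List<Long> created = new ArrayList<>(); // nominal times, in creation order

    BidirectionalMaterializer(long startTime, long frequency, boolean reverseOrder, long now) {
        this.startTime = startTime;
        this.frequency = frequency;
        if (reverseOrder) {
            // Assumed split point: latest nominal time at or before "now".
            long split = now <= startTime ? startTime
                    : startTime + ((now - startTime) / frequency) * frequency;
            this.nextMatdTime = split;  // forward creation starts here
            this.prevMatdTime = split;  // backward creation starts just below here
        } else {
            this.nextMatdTime = startTime;
            this.prevMatdTime = null;   // FIFO: reverse watermark unused
        }
    }

    /** Old -> new: create every action whose nominal time falls before now + window. */
    int materializeForward(long now, long window) {
        int n = 0;
        while (nextMatdTime < now + window) {
            created.add(nextMatdTime);
            nextMatdTime += frequency;
            n++;
        }
        return n;
    }

    /** New -> old: create up to batchSize catch-up actions below the reverse watermark. */
    int materializeBackward(int batchSize) {
        if (prevMatdTime == null) {
            return 0; // FIFO jobs never materialize backwards
        }
        int n = 0;
        while (n < batchSize && prevMatdTime - frequency >= startTime) {
            prevMatdTime -= frequency;
            created.add(prevMatdTime);
            n++;
        }
        return n;
    }
}
```

For a LIFO or LAST_ONLY job with start 0, frequency 10 and now = 55, a forward pass with a window of 10 creates 50 and 60, while successive backward batches of 3 create 40, 30, 20 and then 10, 0, newest first. A FIFO job built with the same parameters creates 0 through 60 in the forward pass and never moves backwards.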
2. execution of coord actions (when "ready" state)
For LAST_ONLY: once an action becomes ready for execution, we can discard the
created coord actions older than it and keep the ones newer than it.
For LIFO: even in the current implementation (CoordActionReadyXCommand), Oozie
queries "ready" coord actions in descending order of nominal time. Thus,
combined with reverse-order creation, we can fulfill the LIFO requirement.
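The two execution rules could look roughly like the following hypothetical sketch. `DISCARDED` is not an existing Oozie coordinator action status (it echoes the new state suggested in the quoted issue), and leaving already SUBMITTED actions untouched when discarding is an assumption, not something the proposal specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical statuses; DISCARDED is not an existing Oozie coordinator action status.
enum SketchStatus { WAITING, READY, SUBMITTED, DISCARDED }

// Minimal stand-in for a coord action: only the fields the policies need.
class SketchAction {
    final long nominalTime;
    SketchStatus status;

    SketchAction(long nominalTime, SketchStatus status) {
        this.nominalTime = nominalTime;
        this.status = status;
    }
}

class ExecutionPolicySketch {
    /** LIFO: READY actions, newest nominal time first. */
    static List<SketchAction> lifoOrder(List<SketchAction> actions) {
        List<SketchAction> ready = new ArrayList<>();
        for (SketchAction a : actions) {
            if (a.status == SketchStatus.READY) {
                ready.add(a);
            }
        }
        ready.sort((x, y) -> Long.compare(y.nominalTime, x.nominalTime));
        return ready;
    }

    /**
     * LAST_ONLY: once readyAction is READY, discard every older action that has
     * not been submitted yet; newer actions are kept. Returns the number discarded.
     */
    static int discardOlderThan(List<SketchAction> actions, SketchAction readyAction) {
        int discarded = 0;
        for (SketchAction a : actions) {
            boolean older = a.nominalTime < readyAction.nominalTime;
            boolean pending = a.status == SketchStatus.WAITING || a.status == SketchStatus.READY;
            if (older && pending) {
                a.status = SketchStatus.DISCARDED;
                discarded++;
            }
        }
        return discarded;
    }
}
```

With actions at 10 (WAITING), 20 (READY), 30 (READY) and 40 (WAITING), LIFO yields 30 then 20; applying LAST_ONLY for the READY action at 30 discards 10 and 20 and keeps 40 waiting.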
Any feedback is highly appreciated.
> Oozie does not behave as expected when using coordinator execution order
> LAST_ONLY
> ----------------------------------------------------------------------------------
>
> Key: OOZIE-614
> URL: https://issues.apache.org/jira/browse/OOZIE-614
> Project: Oozie
> Issue Type: Bug
> Reporter: Craig Peters
> Assignee: Ryota Egashira
>
> After executing the last coordinator action on a queue for a job, the prior
> coordinator actions will still be executed if a later coordinator action is
> not created. The behavior expected based upon the documentation for
> Coordinator is that the older coordinator actions are discarded. See:
> http://yahoo.github.com/oozie/releases/3.1.0/CoordinatorFunctionalSpec.html#a6.3._Synchronous_Coordinator_Application_Definition
> When using the LAST_ONLY execution order, the coordinator job needs to somehow
> discard the older coordinator actions, as the documentation describes. This
> may result in the desired behavior for many users in steady-state operation.
> With a simple approach of invalidating or killing prior coordinator actions,
> however, there are likely to be cases where this functionality won't behave as
> expected, because at any given time the latest coordinator action in the READY
> state on the queue will not necessarily be the latest potential one. This is
> because the newer actions haven't been created yet, due to the batch creation
> of coordinator actions oldest first. Some discussion of the best way to
> address this challenge is warranted, I believe.
> Another consideration is that users will not be able to easily distinguish
> coordinator actions that were discarded (or skipped) due to the execution
> order from those that were killed for any other reason.
> Perhaps it makes sense to introduce a new state? It could be DISCARDED, to
> match the current documentation, or perhaps SKIPPED.
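The batch-ordering problem described in the issue can be shown with a small hypothetical simulation. It assumes every created action is immediately READY and that `now` is not before the job start; the batch size and times are illustrative, not Oozie defaults.

```java
import java.util.ArrayList;
import java.util.List;

class BatchOrderDemo {
    /**
     * Nominal times created by the first catch-up batch, oldest-first (current
     * behavior) or newest-first (proposed). Assumes now >= start.
     */
    static List<Long> firstBatch(long start, long frequency, long now,
                                 int batchSize, boolean newestFirst) {
        long latest = start + ((now - start) / frequency) * frequency; // latest nominal time <= now
        List<Long> batch = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            long t = newestFirst ? latest - i * frequency : start + i * frequency;
            if (t < start || t > latest) {
                break; // ran out of catch-up nominal times
            }
            batch.add(t);
        }
        return batch;
    }

    /** Action LAST_ONLY would keep if every created action were READY: the newest one. */
    static long lastOnlyPick(List<Long> created) {
        return created.stream().mapToLong(Long::longValue).max().getAsLong();
    }
}
```

With start 0, frequency 10, now 95 and a batch size of 3, the oldest-first batch is 0, 10, 20, so LAST_ONLY would keep 20 even though 90 is the latest potential action; the newest-first batch is 90, 80, 70, so it keeps 90.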
--
This message is automatically generated by JIRA.