[
https://issues.apache.org/jira/browse/OOZIE-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027351#comment-14027351
]
Purshotam Shah commented on OOZIE-1778:
---------------------------------------
{quote}
Isn't the issue because of asynchronous commands?
{quote}
Currently yes.
{quote}
For example, bundle commands(operations) update bundle and queue coord
commands. Any failures in the corresponding coord commands can only be handled
by recovery services(recovery/status transit service).
{quote}
I don't think StatusTransitService updates the pending flag on bundle actions.
It resets the bundle's pending flag based on the pending status of its bundle
actions. The recovery service behaves the same way: it only picks up jobs whose
pending flag is set.
Remember issue https://issues.apache.org/jira/browse/OOZIE-1622 (
Multiple CoordSubmit for same bundle).
This can happen if you issue a change command on a bundle, which sets pending
on each bundle action and issues a change command on each coord job. If any
coord change command fails while the bundle action is in prep, you will see
multiple coord jobs for the same bundle action. The same may be true of the
suspend command.
{quote}
Failure handling of bundle command will not work in this case as coord commands
are just queued and the actual execution happens later in another thread.
{quote}
Not for the bundle command, but for the coord command, since it can update the
bundle action on rollback.
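To make the idea concrete, here is a minimal sketch of a child command that clears the parent's pending count when it fails before executing. Everything in it is a simplified stand-in and an assumption for illustration: BundleActionStub, RollbackCommand, CoordChangeStub and the handleFailure(Throwable) signature are hypothetical names, not Oozie's actual classes or API. It also catches Exception rather than Throwable to keep the rethrow simple.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for a bundle action; only the pending counter is modeled.
class BundleActionStub {
    private int pending;
    void incrementPending() { pending++; }
    void decrementPending() { if (pending > 0) pending--; }
    boolean isPending() { return pending > 0; }
}

// Simplified command template (not Oozie's real XCommand): any failure in the
// pre-check, lock, load, verify or execute step is routed to handleFailure().
abstract class RollbackCommand<T> implements Callable<T> {
    protected void eagerVerifyPrecondition() throws Exception {}
    protected void acquireLock() throws Exception {}
    protected void loadState() throws Exception {}
    protected void verifyPrecondition() throws Exception {}
    protected abstract T execute() throws Exception;

    // Rollback hook; the default does nothing.
    protected void handleFailure(Throwable cause) {}

    @Override
    public final T call() throws Exception {
        try {
            eagerVerifyPrecondition();
            acquireLock();
            loadState();
            verifyPrecondition();
            return execute();
        } catch (Exception e) {
            handleFailure(e); // let the command undo what its parent set up
            throw e;          // still report the failure to the caller
        }
    }
}

// Hypothetical coord-level child command queued by a bundle command.
class CoordChangeStub extends RollbackCommand<String> {
    private final BundleActionStub parent;
    private final boolean failPrecheck;

    CoordChangeStub(BundleActionStub parent, boolean failPrecheck) {
        this.parent = parent;
        this.failPrecheck = failPrecheck;
    }

    @Override
    protected void eagerVerifyPrecondition() throws Exception {
        if (failPrecheck) {
            throw new IllegalStateException("precondition failed");
        }
    }

    @Override
    protected String execute() {
        parent.decrementPending(); // assumed success path: child reports back
        return "changed";
    }

    @Override
    protected void handleFailure(Throwable cause) {
        parent.decrementPending(); // rollback: do not leave the parent pending
    }
}
```

With this shape, the bundle command only has to set pending and queue the child. The child itself is responsible for resetting the parent on both the success path and the failure path, even though it runs later in another thread.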
> Rollback option for XCommand
> ----------------------------
>
> Key: OOZIE-1778
> URL: https://issues.apache.org/jira/browse/OOZIE-1778
> Project: Oozie
> Issue Type: Bug
> Reporter: Purshotam Shah
>
> Currently, if we issue a command at the bundle level, it sets pending on the
> bundle action and issues a child command.
> If the child command succeeds, all is good. But if the child command fails at
> the pre-check or while acquiring the lock, there is no way to update the parent.
> In this scenario, the bundle action will remain pending and will cause
> unexpected behavior.
> We should do something like
> {code:java}
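> // Proposed shape of XCommand.call(); ret holds the command's result.
> public T call() throws CommandException {
>     T ret = null;
>     try {
>         eagerVerifyPrecondition();
>         acquireLockCron.start();
>         acquireLock();
>         acquireLockCron.stop();
>         loadState();
>         verifyPrecondition();
>         ret = execute();
>     }
>     catch (Throwable e) {
>         // Give the command a chance to roll back, e.g. reset the parent's pending flag.
>         handleFailure();
>     }
>     return ret;
> }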
> {code}