[ https://issues.apache.org/jira/browse/OOZIE-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067116#comment-15067116 ]

Purshotam Shah commented on OOZIE-2394:
---------------------------------------

{quote}
Removing that will impact performance very badly and we can't do that. 
{quote}
None of the workflow start/end commands use eagerLoadState and eagerVerifyPrecondition, so I don't think there will be any performance issue. In fact, wherever I saw eagerLoadState and eagerVerifyPrecondition, either the logic is incorrect or the state is loaded/verified more than once.

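For example, BundleJobChangeXCommand does its prevalidation in eagerVerifyPrecondition, which is incorrect.
eagerVerifyPrecondition runs before the lock is acquired, and verifyPrecondition (which runs under the lock) is empty, so nothing is re-checked. As a result, it is possible to pause a bundle that has already SUCCEEDED.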

{code:title=BundleJobChangeXCommand.java}
    @Override
    protected void verifyPrecondition() throws CommandException, PreconditionException {
        // Empty: nothing is re-checked once the job lock is held.
    }

    @Override
    protected void eagerLoadState() throws CommandException {
        try {
            this.bundleJob = BundleJobQueryExecutor.getInstance().get(BundleJobQuery.GET_BUNDLE_JOB_STATUS, jobId);
            LogUtils.setLogInfo(bundleJob);
        }
        catch (JPAExecutorException ex) {
            throw new CommandException(ex);
        }
    }

    // Runs before the job lock is acquired, so the status checks below can race
    // with a concurrent status change (e.g. the bundle moving to SUCCEEDED).
    @Override
    protected void eagerVerifyPrecondition() throws CommandException, PreconditionException {
        validateChangeValue(changeValue);

        if (bundleJob == null) {
            LOG.info("BundleChangeCommand not succeeded - " + "job " + jobId + " does not exist");
            throw new PreconditionException(ErrorCode.E1314, jobId);
        }
        if (isChangePauseTime) {
            if (bundleJob.getStatus() == Job.Status.SUCCEEDED || bundleJob.getStatus() == Job.Status.FAILED
                    || bundleJob.getStatus() == Job.Status.KILLED || bundleJob.getStatus() == Job.Status.DONEWITHERROR) {
                LOG.info("BundleChangeCommand not succeeded for changing pausetime- " + "job " + jobId + " finished, status is "
                        + bundleJob.getStatusStr());
                throw new PreconditionException(ErrorCode.E1312, jobId, bundleJob.getStatus().toString());
            }
        }
        else if (isChangeEndTime) {
            if (bundleJob.getStatus() == Job.Status.KILLED) {
                LOG.info("BundleChangeCommand not succeeded for changing endtime- " + "job " + jobId + " finished, status is "
                        + bundleJob.getStatusStr());
                throw new PreconditionException(ErrorCode.E1312, jobId, bundleJob.getStatus().toString());
            }
        }
    }
{code}
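A minimal sketch of the race, assuming a simplified, hypothetical command model (not Oozie's XCommand API): `Bundle`, `finish`, `changePauseTimeEager` and `changePauseTimeLocked` are illustrative names, a JVM-local `ReentrantLock` stands in for the job lock, and every status transition is assumed to take that same lock. The `beforeLock` hook deterministically injects the status change that another thread could make between an unlocked check and lock acquisition.

```java
import java.util.concurrent.locks.ReentrantLock;

public class PauseUnderLockSketch {
    enum Status { RUNNING, SUCCEEDED, FAILED, KILLED, DONEWITHERROR }

    // Hypothetical stand-in for a bundle job plus its job lock.
    static final class Bundle {
        final ReentrantLock lock = new ReentrantLock();
        volatile Status status = Status.RUNNING;
        volatile boolean pauseTimeSet = false;
    }

    static boolean isTerminal(Status s) {
        return s == Status.SUCCEEDED || s == Status.FAILED
                || s == Status.KILLED || s == Status.DONEWITHERROR;
    }

    // Status transitions take the job lock (assumption of this sketch).
    static void finish(Bundle bundle, Status terminal) {
        bundle.lock.lock();
        try {
            bundle.status = terminal;
        } finally {
            bundle.lock.unlock();
        }
    }

    // Buggy shape: precondition checked before the lock, nothing re-checked after it.
    static boolean changePauseTimeEager(Bundle bundle, Runnable beforeLock) {
        if (isTerminal(bundle.status)) {
            return false;                // "eager" check, no lock held
        }
        beforeLock.run();                // the bundle may finish right here
        bundle.lock.lock();
        try {
            bundle.pauseTimeSet = true;  // empty verifyPrecondition(): no re-check
            return true;
        } finally {
            bundle.lock.unlock();
        }
    }

    // Fixed shape: precondition checked only once the lock is held.
    static boolean changePauseTimeLocked(Bundle bundle, Runnable beforeLock) {
        beforeLock.run();
        bundle.lock.lock();
        try {
            if (isTerminal(bundle.status)) {
                return false;            // analogous to PreconditionException(E1312)
            }
            bundle.pauseTimeSet = true;
            return true;
        } finally {
            bundle.lock.unlock();
        }
    }

    public static void main(String[] args) {
        Bundle a = new Bundle();
        System.out.println(changePauseTimeEager(a, () -> finish(a, Status.SUCCEEDED)));  // true: paused a SUCCEEDED bundle
        Bundle b = new Bundle();
        System.out.println(changePauseTimeLocked(b, () -> finish(b, Status.SUCCEEDED))); // false: rejected under the lock
    }
}
```

Moving the check into the locked phase is what closes the window; the eager version can only be made safe by repeating the same check under the lock, which is the "multiple verify/load" pattern mentioned above.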


bq. Even if locks are re-entrant, acquiring them has minor cost associated with it and so it was skipping that as well which is good.

No, I don't think so. This check is done on the client side; I believe the Curator framework doesn't even have to call ZK to check it.

Doing this ourselves, when it is already handled by the Java/Curator framework, might introduce more bugs.
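To illustrate why re-entry is cheap, here is a minimal sketch of a lock that tracks per-thread hold counts locally, roughly in the spirit of Curator's `InterProcessMutex` (which keeps per-thread lock data and only talks to ZooKeeper on the first acquisition). `ClientSideReentrantLock`, `remoteAcquire` and `remoteRelease` are hypothetical names; the remote methods are stubs that only count calls and do not provide cross-process exclusion.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ClientSideReentrantLock {
    // Hold count per owning thread; each thread only reads/writes its own entry.
    private final Map<Thread, Integer> holdCounts = new ConcurrentHashMap<>();
    // Number of simulated ZK round trips, exposed for the demo.
    final AtomicInteger remoteCalls = new AtomicInteger();

    public void acquire() {
        Thread current = Thread.currentThread();
        Integer held = holdCounts.get(current);
        if (held != null) {
            // Re-entry by the owning thread: local bookkeeping only, no remote call.
            holdCounts.put(current, held + 1);
            return;
        }
        remoteAcquire();
        holdCounts.put(current, 1);
    }

    public void release() {
        Thread current = Thread.currentThread();
        Integer held = holdCounts.get(current);
        if (held == null) {
            throw new IllegalMonitorStateException("lock not held by " + current.getName());
        }
        if (held > 1) {
            holdCounts.put(current, held - 1);  // still held by an outer caller
            return;
        }
        holdCounts.remove(current);
        remoteRelease();                        // last release goes remote
    }

    // Hypothetical stubs: a real implementation would create/delete a ZK node and block.
    private void remoteAcquire() { remoteCalls.incrementAndGet(); }
    private void remoteRelease() { remoteCalls.incrementAndGet(); }

    public static void main(String[] args) {
        ClientSideReentrantLock lock = new ClientSideReentrantLock();
        lock.acquire();   // outer command: remote call
        lock.acquire();   // nested command on the same thread: local only
        lock.release();
        lock.release();   // final release: remote call
        System.out.println(lock.remoteCalls.get()); // 2
    }
}
```

Only the outermost acquire and release touch the (simulated) remote store, so a nested acquisition by the same thread costs a map lookup.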

The only issue I see is https://issues.apache.org/jira/browse/OOZIE-1922. I will fix that as well.


> Oozie can execute command without holding lock
> ----------------------------------------------
>
>                 Key: OOZIE-2394
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2394
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Purshotam Shah
>            Assignee: Purshotam Shah
>            Priority: Critical
>         Attachments: OOZIE-2394-V1.patch
>
>
> To speed up job submission (not the forked actions), we create workflow actions synchronously: SignalXCommand calls ActionStartXCommand with isSynchronous = true. This bypasses lock acquisition, which is OK because SignalXCommand already holds the job lock.
> If there is a transient error, the same command is requeued, still with the isSynchronous flag set to true.
> When the requeued command wakes up, it starts executing without acquiring the lock.
> If the job submission takes more than 2 minutes, we might have an issue.
> Action recovery is set to 2 minutes (default), so the recovery service will run and submit a new copy of the command. Since the first command didn't acquire any lock, recovery will be able to run the new command.
> We will then have two instances of the same command running in parallel.
> All our commands are reentrant, so we don't need to set the synchronous flag to run multiple commands from the same thread.
> Because of reentrancy, a command running in the same thread should be able to acquire the same lock.
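A minimal sketch of the proposed behaviour, assuming a JVM-local `ReentrantLock` as a stand-in for Oozie's distributed job lock; `signal`, `actionStart`, `duplicateWhileHeld` and `JOB_LOCK` are hypothetical names, not Oozie APIs. With no isSynchronous bypass, the nested call from the signalling thread still succeeds because the lock is reentrant, while a duplicate (such as the recovery copy) on another thread cannot get the lock.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

public class AlwaysLockSketch {
    static final ReentrantLock JOB_LOCK = new ReentrantLock();

    // Action start always tries to take the job lock; there is no bypass flag.
    static boolean actionStart() {
        if (!JOB_LOCK.tryLock()) {
            return false;   // lock busy: in this sketch, the caller would requeue
        }
        try {
            return true;    // ... start the workflow action here ...
        } finally {
            JOB_LOCK.unlock();
        }
    }

    // Signal holds the job lock and calls actionStart synchronously on the same thread.
    static boolean signal() {
        JOB_LOCK.lock();
        try {
            return actionStart();   // reentrant: succeeds without isSynchronous
        } finally {
            JOB_LOCK.unlock();
        }
    }

    // A duplicate command on another thread while the first one still holds the lock.
    static boolean duplicateWhileHeld() {
        JOB_LOCK.lock();
        ExecutorService other = Executors.newSingleThreadExecutor();
        try {
            Callable<Boolean> duplicate = AlwaysLockSketch::actionStart;
            return other.submit(duplicate).get(1, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            other.shutdown();
            JOB_LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(signal());             // true
        System.out.println(duplicateWhileHeld()); // false
    }
}
```

Because the requeued command now goes through the same lock path as any other command, the 2-minute recovery resubmission can no longer run alongside it.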



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
