Re: JobNoLongerInDbException and Incomplete instances.

Nandika Jayawardana Fri, 01 Mar 2013 00:48:06 -0800

Hi Sathwik,

I am running ODE with Tomcat 7.0.29 and MySQL 5.5.29. I used the
configuration settings given under "Configuring ODE in Tomcat with MySql
database" in the ODE WAR deployment guide
(http://ode.apache.org/war-deployment.html).
As you have explained, when the JobProcessorException is thrown due to an
instance lock timeout, the transaction is rolled back and the default of 3
immediate retries takes place. However, the deleted job is not restored to
the job table, so the subsequent retries also result in
JobNoLongerInDbException. In the execTransaction method, when the retry loop
is over, the exception thrown is also JobNoLongerInDbException. Since this
exception is caught in the "catch (JobNoLongerInDbException jde)" block, the
job never reaches the exponential backoff path.
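
To make the sequence concrete, here is a minimal sketch of what I believe is
happening. It is not ODE code: JobTable, its "enlisted" flag and
runWithRetries are hypothetical stand-ins, and the loop shape (one initial
attempt plus immediateRetryCount retries) is only an assumption based on the
four deleteJob lines in the log quoted below.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RetrySketch {

    /** Hypothetical stand-in for the scheduler's job table. */
    public static class JobTable {
        private final Set<String> committed = new HashSet<>();
        private final Set<String> working = new HashSet<>();
        // true: the connection takes part in the transaction, so rollback undoes the delete.
        // false: the delete is effectively auto-committed and rollback cannot undo it.
        private final boolean enlisted;

        public JobTable(boolean enlisted) {
            this.enlisted = enlisted;
        }

        public void insert(String jobId) {
            committed.add(jobId);
            working.add(jobId);
        }

        public boolean delete(String jobId) {
            boolean removed = working.remove(jobId);
            if (!enlisted) {
                committed.remove(jobId); // permanent immediately
            }
            return removed;
        }

        public void rollback() {
            working.clear();
            working.addAll(committed);
        }
    }

    /**
     * Assumed loop shape: one initial attempt plus immediateRetryCount retries.
     * Every attempt that manages to delete the job then hits a simulated
     * instance lock timeout, and the transaction is rolled back.
     */
    public static List<String> runWithRetries(JobTable table, String jobId,
                                              int immediateRetryCount) {
        List<String> outcomes = new ArrayList<>();
        for (int attempt = 0; attempt <= immediateRetryCount; attempt++) {
            if (table.delete(jobId)) {
                outcomes.add("LockTimeout");
            } else {
                outcomes.add("JobNoLongerInDb");
            }
            table.rollback();
        }
        return outcomes;
    }

    public static void main(String[] args) {
        JobTable notEnlisted = new JobTable(false);
        notEnlisted.insert("job-1");
        System.out.println(runWithRetries(notEnlisted, "job-1", 3));

        JobTable enlisted = new JobTable(true);
        enlisted.insert("job-1");
        System.out.println(runWithRetries(enlisted, "job-1", 3));
    }
}
```

With enlisted=false, only the first attempt sees the job; every retry, and
therefore the final exception, is the "job no longer in db" case. With
enlisted=true, the rollback restores the job, so the last failure is the
original lock timeout, which is what would let the job go on to the backoff
path.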


Are there any additional configuration settings I need to apply?

Regards
Nandika


On Fri, Mar 1, 2013 at 12:21 PM, Sathwik B P <[email protected]> wrote:

> Hi,
> This is really strange.
>
> This is the ideal behaviour:
> If a job fails for any reason, it is retried immediately as defined by
> immediateRetryCount (default 3 times, with an interval of
> _immediateTransactionRetryInterval, default 1 sec), and then the scheduler
> puts it on an exponential backoff defined by pow(5,retryCount), where
> retryCount is <= 10.
>
> If the rollback doesn't happen in case of an exception, then none of the
> jobs will ever complete, since they will never go into the exponential
> backoff path.
>
> In my opinion, the transaction manager will maintain the JDBC connection
> object throughout its execution, no matter how many times the connection
> is borrowed during the transaction.
>
> Which database are you using, and what configuration changes have you made
> in ode-axis.properties?
>
> regards,
> sathwik
>
> On Fri, Mar 1, 2013 at 1:31 AM, Nandika Jayawardana <[email protected]> wrote:
>
> > Hi All,
> > I am running an ODE trunk build with Apache Tomcat as described in [1]. I
> > have an asynchronous BPEL process which has a receive, an invoke and a
> > receive. When I run this process for a while, I see that there are a few
> > incomplete instances, although all the expected messages reached ODE. From
> > the debug logs, I figured out that it is happening as follows.
> >
> > If a thread executing a job tries to acquire the process instance lock
> > while another thread is executing on the same instance, and times out, it
> > throws a timeout exception at InstanceLockManager, which is wrapped
> > in a JobProcessorException.
> >
> > In the SimpleScheduler RunJob.call method, when the execution of a job
> > starts, it tries to delete the job from the db. On the initial try, it
> > succeeds since the job is in the db. However, when the
> > JobProcessorException occurs due to a timeout on the instance lock, the
> > transaction gets rolled back. Ideally, the job should be restored when
> > the rollback happens. However, the job does not get restored to the db,
> > as the transaction manager and db resources are not associated. Hence,
> > when the scheduler retries 3 times by default, it fails with the job no
> > longer in db error. This results in a few of the process instances never
> > completing, since the job was abandoned even though the messages reached
> > ODE.
> >
> > The following extracts from the ODE log explain the scenario.
> >
> > grep instanceid
> >
> > 16:36:12,115 ODEServer-78 DEBUG [InstanceLockManager]
> > Thread[ODEServer-78,5,main]: lock(iid=36423, time=1MICROSECONDS)
> > 16:36:12,115 ODEServer-78 DEBUG [InstanceLockManager]
> > Thread[ODEServer-78,5,main]: lock(iid=36423,
> > time=1MICROSECONDS)-->WAITING(held by Thread[ODEServer-9,5,main])
> > 16:36:12,115 ODEServer-78 DEBUG [InstanceLockManager]
> > Thread[ODEServer-78,5,main]: lock(iid=36423,
> > time=1MICROSECONDS)-->TIMEOUT (held by Thread[ODEServer-9,5,main])
> > 16:36:12,115 ODEServer-78 DEBUG [BpelEngineImpl] Instance 36423 is busy,
> > rescheduling job.
> > 16:36:12,239 ODEServer-9 DEBUG [InstanceLockManager]
> > Thread[ODEServer-9,5,main]: unlock(iid=36423)
> > 16:36:15,120 ODEServer-78 DEBUG [SimpleScheduler] job no longer in db
> > forced rollback: Job hqejbhcnphr8357nokgnxp time: 2013-02-28 16:36:11 IST
> > transacted: true persisted: true details: JobDetails( instanceId: 36423
> > mexId: null processId: null type: MATCHER channel: null correlatorId:
> > DebugCallbackPL.debugOpCallback correlationKeySet:
> > @2[CorrelationSet~746ee3bf-4c4c-4da9-bdb0-233a760ce377] retryCount: null
> > inMem: false detailsExt: {})
> >
> > grep jobid
> >
> > 16:36:11,960 ODEServer-9 DEBUG [JdbcDelegate] insertJob
> > hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94 loaded=true
> > 16:36:12,007 ODEServer-1 DEBUG [SimpleScheduler] todo.enqueue job from db:
> > hqejbhcnphr8357nokgnxp for 1362049571960(16:36:11,960)
> > 16:36:12,007 ODEServer-78 DEBUG [JdbcDelegate] deleteJob
> > hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94
> > 16:36:12,032 ODEServer-9 DEBUG [SimpleScheduler] scheduled immediate job:
> > hqejbhcnphr8357nokgnxp
> > 16:36:12,239 ODEServer-9 DEBUG [SimpleScheduler] Job hqejbhcnphr8357nokgnxp
> > is being processed (outstanding job)
> > 16:36:13,116 ODEServer-78 DEBUG [JdbcDelegate] deleteJob
> > hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94
> > org.apache.ode.scheduler.simple.JobNoLongerInDbException: Job no longer in
> > db: hqejbhcnphr8357nokgnxp nodeId=hqejbhcnphr8357nokgj94
> > 16:36:14,118 ODEServer-78 DEBUG [JdbcDelegate] deleteJob
> > hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94
> > org.apache.ode.scheduler.simple.JobNoLongerInDbException: Job no longer in
> > db: hqejbhcnphr8357nokgnxp nodeId=hqejbhcnphr8357nokgj94
> > 16:36:15,119 ODEServer-78 DEBUG [JdbcDelegate] deleteJob
> > hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94
> > 16:36:15,120 ODEServer-78 DEBUG [SimpleScheduler] job no longer in db
> > forced rollback: Job hqejbhcnphr8357nokgnxp time: 2013-02-28 16:36:11 IST
> > transacted: true persisted: true details: JobDetails( instanceId: 36423
> > mexId: null processId: null type: MATCHER channel: null correlatorId:
> > DebugCallbackPL.debugOpCallback correlationKeySet:
> > @2[CorrelationSet~746ee3bf-4c4c-4da9-bdb0-233a760ce377] retryCount: null
> > inMem: false detailsExt: {})
> >
> >
> > Is this the expected behavior, or are there any additional settings I
> > should configure to make the transaction manager restore the job to the
> > db on rollback? Will reinserting the job into the db when the
> > JobProcessorException happens fix this problem?
> >
> > Regards
> > Nandika
> >
> > [1] http://ode.apache.org/war-deployment.html
> >
>
