Re: JobNoLongerInDbException and Incomplete instances.

Sathwik B P Thu, 28 Feb 2013 22:51:52 -0800

Hi,
This is really strange.

This is the ideal behaviour:
If a job fails for any reason it gets retried defined by the parameter
(immediateRetryCount default 3 times with a time interval
_immediateTransactionRetryInterval default 1 sec) and then the scheduler
will put it on a exponential backoff defined by pow(5,retryCount) where
retryCount is <= 10.


If the rollback doesn't happen incase of any exception then none of the
jobs will ever complete since it will never go into the exponential backoff
path.

In my opinion the transaction manager will maintain the jdbc connection
object throughout it's execution, no matter how many times the connection
is borrowed during the transaction.

Which database are you using and what configuration changes have you done
in ode-axis.properties.

regards,
sathwik

On Fri, Mar 1, 2013 at 1:31 AM, Nandika Jayawardana <[email protected]>wrote:

> Hi All,
> I am running ode trunk build with apache tomcat as described in [1] . I
> have an asynchronous bpel process which has a receive, invoke and a
> receive. When I run this process for a while, I see that there are few
> incomplete instances, although all the expected messages reached ode. From
> the debug logs, I figured that it is happening as follows.
>
>   If a thread executing a job tries to acquire the process instance lock,
> while another thread is executing on the same instance and times out, it
> will throw a timeout exception at InstanceLockManager which will be wrapped
> to a  JobProcessorException.
>
> In SimpleScheduler, RunJob.call method, when the execution of a job starts,
> it will try to delete the job from the db. For the initial try, it
> would succeed since the job is in db. However, when
> the JobProcessorException exception happens due to timeout on instance
> lock, the transaction gets rolled back. Ideally, the job should be restored
> back when the rollback happens. However, the job does not get restored to
> db as the transaction manager and db resources are not associated. Hence
> when the scheduler  tries to retry 3 times by default, it will fail with
> job no longer in db error.  This results in few of the process instances
> never completing since the job was abandoned even though the messages
> reached ode.
>
> Following log extracts from the ode log explains the scenario.
>
> grep instanceid
>
> 16:36:12,115 ODEServer-78 DEBUG [InstanceLockManager]
> Thread[ODEServer-78,5,main]: lock(iid=36423, time=1MICROSECONDS)
> 16:36:12,115 ODEServer-78 DEBUG [InstanceLockManager]
> Thread[ODEServer-78,5,main]: lock(iid=36423,
> time=1MICROSECONDS)-->WAITING(held by Thread[ODEServer-9,5,main])
> 16:36:12,115 ODEServer-78 DEBUG [InstanceLockManager]
> Thread[ODEServer-78,5,main]: lock(iid=36423, time=1MICROSECONDS)-->TIMEOUT
> (held by Thread[ODEServer-9,5,main])
> 16:36:12,115 ODEServer-78 DEBUG [BpelEngineImpl] Instance 36423 is busy,
> rescheduling job.
> 16:36:12,239 ODEServer-9 DEBUG [InstanceLockManager]
> Thread[ODEServer-9,5,main]: unlock(iid=36423)
> 16:36:15,120 ODEServer-78 DEBUG [SimpleScheduler] job no longer in db
> forced rollback: Job hqejbhcnphr8357nokgnxp time: 2013-02-28 16:36:11 IST
> transacted: true persisted: true details: JobDetails( instanceId: 36423
> mexId: null processId: null type: MATCHER channel: null correlatorId:
> DebugCallbackPL.debugOpCallback correlationKeySet:
> @2[CorrelationSet~746ee3bf-4c4c-4da9-bdb0-233a760ce377] retryCount: null
> inMem: false detailsExt: {})
>
> grep jobid
>
> 16:36:11,960 ODEServer-9 DEBUG [JdbcDelegate] insertJob
> hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94 loaded=true
> 16:36:12,007 ODEServer-1 DEBUG [SimpleScheduler] todo.enqueue job from db:
> hqejbhcnphr8357nokgnxp for 1362049571960(16:36:11,960)
> 16:36:12,007 ODEServer-78 DEBUG [JdbcDelegate] deleteJob
> hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94
> 16:36:12,032 ODEServer-9 DEBUG [SimpleScheduler] scheduled immediate job:
> hqejbhcnphr8357nokgnxp
> 16:36:12,239 ODEServer-9 DEBUG [SimpleScheduler] Job hqejbhcnphr8357nokgnxp
> is being processed (outstanding job)
> 16:36:13,116 ODEServer-78 DEBUG [JdbcDelegate] deleteJob
> hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94
> org.apache.ode.scheduler.simple.JobNoLongerInDbException: Job no longer in
> db: hqejbhcnphr8357nokgnxp nodeId=hqejbhcnphr8357nokgj94
> 16:36:14,118 ODEServer-78 DEBUG [JdbcDelegate] deleteJob
> hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94
> org.apache.ode.scheduler.simple.JobNoLongerInDbException: Job no longer in
> db: hqejbhcnphr8357nokgnxp nodeId=hqejbhcnphr8357nokgj94
> 16:36:15,119 ODEServer-78 DEBUG [JdbcDelegate] deleteJob
> hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94
> 16:36:15,120 ODEServer-78 DEBUG [SimpleScheduler] job no longer in db
> forced rollback: Job hqejbhcnphr8357nokgnxp time: 2013-02-28 16:36:11 IST
> transacted: true persisted: true details: JobDetails( instanceId: 36423
> mexId: null processId: null type: MATCHER channel: null correlatorId:
> DebugCallbackPL.debugOpCallback correlationKeySet:
> @2[CorrelationSet~746ee3bf-4c4c-4da9-bdb0-233a760ce377] retryCount: null
> inMem: false detailsExt: {})
>
>
> Is this the expected behavior or is there any additional settings i should
> configure to make transaction manager restore job to db at rollback ?
> Will reinserting the job back to db when the  JobProcessorException happens
>  fix this problem ?
>
> Regards
> Nandika
>
> [1] http://ode.apache.org/war-deployment.html
>

Re: JobNoLongerInDbException and Incomplete instances.

Reply via email to