Hi All,
I am running ode trunk build with apache tomcat as described in [1] . I
have an asynchronous bpel process which has a receive, invoke and a
receive. When I run this process for a while, I see that there are few
incomplete instances, although all the expected messages reached ode. From
the debug logs, I figured that it is happening as follows.

  If a thread executing a job tries to acquire the process instance lock,
while another thread is executing on the same instance and times out, it
will throw a timeout exception at InstanceLockManager which will be wrapped
to a  JobProcessorException.

In SimpleScheduler, RunJob.call method, when the execution of a job starts,
it will try to delete the job from the db. For the initial try, it
would succeed since the job is in db. However, when
the JobProcessorException exception happens due to timeout on instance
lock, the transaction gets rolled back. Ideally, the job should be restored
back when the rollback happens. However, the job does not get restored to
db as the transaction manager and db resources are not associated. Hence
when the scheduler  tries to retry 3 times by default, it will fail with
job no longer in db error.  This results in few of the process instances
never completing since the job was abandoned even though the messages
reached ode.

Following log extracts from the ode log explains the scenario.

grep instanceid

16:36:12,115 ODEServer-78 DEBUG [InstanceLockManager]
Thread[ODEServer-78,5,main]: lock(iid=36423, time=1MICROSECONDS)
16:36:12,115 ODEServer-78 DEBUG [InstanceLockManager]
Thread[ODEServer-78,5,main]: lock(iid=36423,
time=1MICROSECONDS)-->WAITING(held by Thread[ODEServer-9,5,main])
16:36:12,115 ODEServer-78 DEBUG [InstanceLockManager]
Thread[ODEServer-78,5,main]: lock(iid=36423, time=1MICROSECONDS)-->TIMEOUT
(held by Thread[ODEServer-9,5,main])
16:36:12,115 ODEServer-78 DEBUG [BpelEngineImpl] Instance 36423 is busy,
rescheduling job.
16:36:12,239 ODEServer-9 DEBUG [InstanceLockManager]
Thread[ODEServer-9,5,main]: unlock(iid=36423)
16:36:15,120 ODEServer-78 DEBUG [SimpleScheduler] job no longer in db
forced rollback: Job hqejbhcnphr8357nokgnxp time: 2013-02-28 16:36:11 IST
transacted: true persisted: true details: JobDetails( instanceId: 36423
mexId: null processId: null type: MATCHER channel: null correlatorId:
DebugCallbackPL.debugOpCallback correlationKeySet:
@2[CorrelationSet~746ee3bf-4c4c-4da9-bdb0-233a760ce377] retryCount: null
inMem: false detailsExt: {})

grep jobid

16:36:11,960 ODEServer-9 DEBUG [JdbcDelegate] insertJob
hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94 loaded=true
16:36:12,007 ODEServer-1 DEBUG [SimpleScheduler] todo.enqueue job from db:
hqejbhcnphr8357nokgnxp for 1362049571960(16:36:11,960)
16:36:12,007 ODEServer-78 DEBUG [JdbcDelegate] deleteJob
hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94
16:36:12,032 ODEServer-9 DEBUG [SimpleScheduler] scheduled immediate job:
hqejbhcnphr8357nokgnxp
16:36:12,239 ODEServer-9 DEBUG [SimpleScheduler] Job hqejbhcnphr8357nokgnxp
is being processed (outstanding job)
16:36:13,116 ODEServer-78 DEBUG [JdbcDelegate] deleteJob
hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94
org.apache.ode.scheduler.simple.JobNoLongerInDbException: Job no longer in
db: hqejbhcnphr8357nokgnxp nodeId=hqejbhcnphr8357nokgj94
16:36:14,118 ODEServer-78 DEBUG [JdbcDelegate] deleteJob
hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94
org.apache.ode.scheduler.simple.JobNoLongerInDbException: Job no longer in
db: hqejbhcnphr8357nokgnxp nodeId=hqejbhcnphr8357nokgj94
16:36:15,119 ODEServer-78 DEBUG [JdbcDelegate] deleteJob
hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94
16:36:15,120 ODEServer-78 DEBUG [SimpleScheduler] job no longer in db
forced rollback: Job hqejbhcnphr8357nokgnxp time: 2013-02-28 16:36:11 IST
transacted: true persisted: true details: JobDetails( instanceId: 36423
mexId: null processId: null type: MATCHER channel: null correlatorId:
DebugCallbackPL.debugOpCallback correlationKeySet:
@2[CorrelationSet~746ee3bf-4c4c-4da9-bdb0-233a760ce377] retryCount: null
inMem: false detailsExt: {})


Is this the expected behavior or is there any additional settings i should
configure to make transaction manager restore job to db at rollback ?
Will reinserting the job back to db when the  JobProcessorException happens
 fix this problem ?

Regards
Nandika

[1] http://ode.apache.org/war-deployment.html

Reply via email to