Hi All, I am running ode trunk build with apache tomcat as described in [1] . I have an asynchronous bpel process which has a receive, invoke and a receive. When I run this process for a while, I see that there are few incomplete instances, although all the expected messages reached ode. From the debug logs, I figured that it is happening as follows.
If a thread executing a job tries to acquire the process instance lock, while another thread is executing on the same instance and times out, it will throw a timeout exception at InstanceLockManager which will be wrapped to a JobProcessorException. In SimpleScheduler, RunJob.call method, when the execution of a job starts, it will try to delete the job from the db. For the initial try, it would succeed since the job is in db. However, when the JobProcessorException exception happens due to timeout on instance lock, the transaction gets rolled back. Ideally, the job should be restored back when the rollback happens. However, the job does not get restored to db as the transaction manager and db resources are not associated. Hence when the scheduler tries to retry 3 times by default, it will fail with job no longer in db error. This results in few of the process instances never completing since the job was abandoned even though the messages reached ode. Following log extracts from the ode log explains the scenario. grep instanceid 16:36:12,115 ODEServer-78 DEBUG [InstanceLockManager] Thread[ODEServer-78,5,main]: lock(iid=36423, time=1MICROSECONDS) 16:36:12,115 ODEServer-78 DEBUG [InstanceLockManager] Thread[ODEServer-78,5,main]: lock(iid=36423, time=1MICROSECONDS)-->WAITING(held by Thread[ODEServer-9,5,main]) 16:36:12,115 ODEServer-78 DEBUG [InstanceLockManager] Thread[ODEServer-78,5,main]: lock(iid=36423, time=1MICROSECONDS)-->TIMEOUT (held by Thread[ODEServer-9,5,main]) 16:36:12,115 ODEServer-78 DEBUG [BpelEngineImpl] Instance 36423 is busy, rescheduling job. 16:36:12,239 ODEServer-9 DEBUG [InstanceLockManager] Thread[ODEServer-9,5,main]: unlock(iid=36423) 16:36:15,120 ODEServer-78 DEBUG [SimpleScheduler] job no longer in db forced rollback: Job hqejbhcnphr8357nokgnxp time: 2013-02-28 16:36:11 IST transacted: true persisted: true details: JobDetails( instanceId: 36423 mexId: null processId: null type: MATCHER channel: null correlatorId: DebugCallbackPL.debugOpCallback correlationKeySet: @2[CorrelationSet~746ee3bf-4c4c-4da9-bdb0-233a760ce377] retryCount: null inMem: false detailsExt: {}) grep jobid 16:36:11,960 ODEServer-9 DEBUG [JdbcDelegate] insertJob hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94 loaded=true 16:36:12,007 ODEServer-1 DEBUG [SimpleScheduler] todo.enqueue job from db: hqejbhcnphr8357nokgnxp for 1362049571960(16:36:11,960) 16:36:12,007 ODEServer-78 DEBUG [JdbcDelegate] deleteJob hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94 16:36:12,032 ODEServer-9 DEBUG [SimpleScheduler] scheduled immediate job: hqejbhcnphr8357nokgnxp 16:36:12,239 ODEServer-9 DEBUG [SimpleScheduler] Job hqejbhcnphr8357nokgnxp is being processed (outstanding job) 16:36:13,116 ODEServer-78 DEBUG [JdbcDelegate] deleteJob hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94 org.apache.ode.scheduler.simple.JobNoLongerInDbException: Job no longer in db: hqejbhcnphr8357nokgnxp nodeId=hqejbhcnphr8357nokgj94 16:36:14,118 ODEServer-78 DEBUG [JdbcDelegate] deleteJob hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94 org.apache.ode.scheduler.simple.JobNoLongerInDbException: Job no longer in db: hqejbhcnphr8357nokgnxp nodeId=hqejbhcnphr8357nokgj94 16:36:15,119 ODEServer-78 DEBUG [JdbcDelegate] deleteJob hqejbhcnphr8357nokgnxp on node hqejbhcnphr8357nokgj94 16:36:15,120 ODEServer-78 DEBUG [SimpleScheduler] job no longer in db forced rollback: Job hqejbhcnphr8357nokgnxp time: 2013-02-28 16:36:11 IST transacted: true persisted: true details: JobDetails( instanceId: 36423 mexId: null processId: null type: MATCHER channel: null correlatorId: DebugCallbackPL.debugOpCallback correlationKeySet: @2[CorrelationSet~746ee3bf-4c4c-4da9-bdb0-233a760ce377] retryCount: null inMem: false detailsExt: {}) Is this the expected behavior or is there any additional settings i should configure to make transaction manager restore job to db at rollback ? Will reinserting the job back to db when the JobProcessorException happens fix this problem ? Regards Nandika [1] http://ode.apache.org/war-deployment.html