Hi Everyone,

I came across an issue in the OFBiz job polling and scheduling process and
would like to get input from the community.

There appears to be a race condition where a *recurring job can lose its
recurrence (tempExprId)* if the server crashes at a specific point during
execution.

When a job moves from SERVICE_QUEUED to SERVICE_RUNNING, a crash occurring
*before the next recurrence is created* leaves the job in the
SERVICE_RUNNING state. On restart, JobManager.reloadCrashedJobs() assumes
the next recurrence already exists and reschedules the job without
tempExprId. Since the recurrence was never actually created, the chain
breaks and the job does not run again after the retry.
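
To make the sequence described above concrete, here is a minimal,
self-contained sketch that models it in plain Java. It is not OFBiz code:
the Job class, the in-memory list standing in for the job table, the
"EVERY_HOUR" expression id, and the SERVICE_CRASHED / SERVICE_PENDING
statuses used for the retry are all assumptions made for illustration.
Only the SERVICE_QUEUED and SERVICE_RUNNING states, tempExprId, and the
name reloadCrashedJobs come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the scenario; not actual OFBiz code.
public class RecurrenceRaceSketch {

    // Stand-in for a job row, reduced to the fields relevant to the race.
    static final class Job {
        final String jobId;
        final String tempExprId; // null means the job carries no recurrence
        String statusId;

        Job(String jobId, String tempExprId, String statusId) {
            this.jobId = jobId;
            this.tempExprId = tempExprId;
            this.statusId = statusId;
        }
    }

    // Models a run that crashes right after QUEUED -> RUNNING,
    // i.e. before the next recurrence has been created.
    static void runAndCrashBeforeRecurrence(Job job) {
        job.statusId = "SERVICE_RUNNING";
        // Simulated crash: no next-recurrence job is ever added to the store.
    }

    // Models the recovery behaviour described in this message: every job
    // still in SERVICE_RUNNING is rescheduled WITHOUT its tempExprId.
    // The status names used for the retry are assumptions.
    static List<Job> reloadCrashedJobs(List<Job> store) {
        List<Job> rescheduled = new ArrayList<>();
        for (Job job : store) {
            if ("SERVICE_RUNNING".equals(job.statusId)) {
                job.statusId = "SERVICE_CRASHED";
                rescheduled.add(new Job(job.jobId + "-retry", null, "SERVICE_PENDING"));
            }
        }
        store.addAll(rescheduled);
        return rescheduled;
    }

    // Counts pending jobs that still carry a recurrence expression.
    static long pendingRecurringJobs(List<Job> store) {
        return store.stream()
                .filter(j -> "SERVICE_PENDING".equals(j.statusId))
                .filter(j -> j.tempExprId != null)
                .count();
    }

    public static void main(String[] args) {
        List<Job> store = new ArrayList<>();
        Job job = new Job("J1", "EVERY_HOUR", "SERVICE_QUEUED");
        store.add(job);

        runAndCrashBeforeRecurrence(job);
        reloadCrashedJobs(store);

        System.out.println("Pending recurring jobs after recovery: "
                + pendingRecurringJobs(store));
    }
}
```

In this model the only pending job left after recovery is the one-off
retry, so nothing remains that would schedule a further run once the retry
completes.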

I have a couple of possible fixes in the Job Manager area and am currently
evaluating which approach to take.

Kind Regards,
Chandan Khandelwal
