Alena Prokharchyk created CLOUDSTACK-2680:
---------------------------------------------
Summary: Async job expunge thread expunges not only inactive jobs,
but also the jobs that are currently being processed
Key: CLOUDSTACK-2680
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-2680
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Components: Management Server
Affects Versions: 4.1.0
Reporter: Alena Prokharchyk
Assignee: Alena Prokharchyk
Fix For: 4.2.0
The Async Job Expunge thread, which expunges jobs that have been in the
async_job table for more than "job.expire.minutes", expunges not only inactive
(waiting) jobs, but also the jobs that are currently being processed. It
affects all CloudStack jobs. It wasn't caught before because the default expire
interval is 1 day, and a job would time out on the backend well before that
(30 mins is the default timeout).
Here is what happens in the snapshot case:
1) Set "concurrent.snapshots.threshold.perhost"=1 and "job.expire.minutes"=15.
2) The first createSnapshot API call was executed at time "X". Async job1 was
created. As there were no other snapshot jobs, the command was sent to the
backend for execution.
3) The second createSnapshot call was executed at time "X + 30 seconds". Async
job2 was created. Job2 sat in the queue, waiting for job1 to finish.
4) Job1 didn't return within 15 mins, so the AsyncJobManager considered it
expired and removed it from the queue (although it was still being processed).
5) The background process that checks the sync status for job2 (runs every 10
seconds) found that nothing was blocking job2 any more, and sent it to the
backend.
The recommended fix would be: expire/expunge only inactive and already
completed jobs. Don't touch the jobs that are currently being processed.
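
A minimal sketch of the recommended selection rule, for illustration only. It
is not the actual AsyncJobManager code: the ExpungeSketch class, the State
enum, the Job fields and all method names are assumptions made up for this
example. It contrasts an age-only check (the behaviour described above) with a
check that also skips jobs that are in progress.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model; none of these names come from the CloudStack code base.
final class ExpungeSketch {

    enum State { WAITING, IN_PROGRESS, COMPLETED }

    static final class Job {
        final long id;
        final State state;
        final Instant created;

        Job(long id, State state, Instant created) {
            this.id = id;
            this.state = state;
            this.created = created;
        }
    }

    // Behaviour described in the report: age is the only criterion.
    static boolean expiredByAgeOnly(Job job, Instant now, Duration expire) {
        return job.created.plus(expire).isBefore(now);
    }

    // Recommended behaviour: never touch a job that is being processed.
    static boolean safeToExpunge(Job job, Instant now, Duration expire) {
        return job.state != State.IN_PROGRESS
                && expiredByAgeOnly(job, now, expire);
    }

    // Returns the ids of the jobs the expunge thread would be allowed to remove.
    static List<Long> selectForExpunge(List<Job> jobs, Instant now, Duration expire) {
        List<Long> ids = new ArrayList<>();
        for (Job job : jobs) {
            if (safeToExpunge(job, now, expire)) {
                ids.add(job.id);
            }
        }
        return ids;
    }
}
```

With the timings from the snapshot case (job1 in progress and created at X,
job2 waiting and created at X + 30 seconds, a 15 minute expire interval, and a
check shortly after X + 15 minutes), the age-only check would select job1,
while the in-progress-aware check selects nothing, so job2 stays blocked until
job1 finishes.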