Min Chen created CLOUDSTACK-7749:
------------------------------------
Summary: AsyncJob GC thread cannot purge queue items that have
been blocking for too long if exception is thrown in expunging some unfinished
or completed old jobs, this will make some future jobs stuck.
Key: CLOUDSTACK-7749
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7749
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Components: Management Server
Affects Versions: 4.3.0
Reporter: Min Chen
Assignee: Min Chen
Priority: Critical
Fix For: 4.5.0
AsyncJobManager has a GC thread to clean up some unfinished job and complete
jobs that are too old. In this same thread, we are also forcefully cancel
blocking queue items if they've been staying there for too long. Currently if
there is an exception thrown in expunging one job, for example, like the one
below:
2014-10-14 17:57:26,347 INFO [o.a.c.f.j.i.AsyncJobManagerImpl]
(AsyncJobMgr-Heartbeat-1:ctx-67ae4177) Expunging unfinished job AsyncJobVO
{id:18443, userId: 60, accountId: 69, instanceType: null, instanc eId: null,
cmd: com.cloud.vm.VmWorkStart, cmdInfo:
rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIACkoABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljb
HVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL
01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAABFAAAAAAAAADwAAAAAAAACdXQAGVZpcnR1YWxNYWNoaW5lTWFuY
WdlckltcGwAAAAAAAAAAHBwcHBwcHBzcgARamF2YS51dGlsLkhhc2hNYXAFB9rBwxZg0QMAAkYACmxvYWRGYWN0b3JJAAl0aHJlc2hvbGR4cD9AAAAAAAAMdwgAAAAQAAAAAXQAClZtUGFzc3dvcmR0ABxyTzBBQlhRQURuTmhkbVZrWDNCaGMzTjNiM0preHA,
cmdVersi on: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result:
null, initMsid: 244536014864905, completeMsid: null, lastUpdated: null,
lastPolled: null, created: Wed Aug 20 18:14:13 EDT 2014}
2014-10-14 17:57:26,350 DEBUG [c.c.u.d.T.Transaction]
(AsyncJobMgr-Heartbeat-1:ctx-67ae4177) Rolling back the transaction: Time = 2
Name = AsyncJobMgr-Heartbeat-1; called by -TransactionLegacy.rollback:8
96-TransactionLegacy.removeUpTo:839-TransactionLegacy.close:663-TransactionContextInterceptor.invoke:35-ReflectiveMethodInvocation.proceed:161-ExposeInvocationInterceptor.invoke:91-ReflectiveMethodInvocat
ion.proceed:172-JdkDynamicAopProxy.invoke:204-$Proxy151.expunge:-1-AsyncJobManagerImpl$8.doInTransactionWithoutResult:802-TransactionCallbackNoReturn.doInTransaction:25-Transaction$2.doInTransaction:49
2014-10-14 17:57:26,368 ERROR [o.a.c.f.j.i.AsyncJobManagerImpl]
(AsyncJobMgr-Heartbeat-1:ctx-67ae4177) Unexpected exception when trying to
execute queue item,
com.cloud.utils.exception.CloudRuntimeException: DB Exception on:
com.mysql.jdbc.JDBC4PreparedStatement@39fdfb52: DELETE FROM async_job WHERE
async_job.id= 18443
at com.cloud.utils.db.GenericDaoBase.expunge(GenericDaoBase.java:1144)
at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at
com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:33)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
at
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at
org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
at com.sun.proxy.$Proxy151.expunge(Unknown Source)
at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$8.doInTransactionWithoutResult(AsyncJobManagerImpl.java:802)
at
com.cloud.utils.db.TransactionCallbackNoReturn.doInTransaction(TransactionCallbackNoReturn.java:25)
at com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:49)
at com.cloud.utils.db.Transaction.execute(Transaction.java:37)
at com.cloud.utils.db.Transaction.execute(Transaction.java:46)
at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl.expungeAsyncJob(AsyncJobManagerImpl.java:799)
at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$7.reallyRun(AsyncJobManagerImpl.java:762)
at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$7.runInContext(AsyncJobManagerImpl.java:738)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:50)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:47)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
Caused by:
com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException:
Cannot delete or update a parent row: a foreign key constraint fails
(`cloud`.`async_job_join_map`, CONSTRAINT `fk_async_job_join_map__join_job_id`
FOREIGN KEY (`join_job_id`) REFERENCES `async_job` (`id`))
all the following purge queue item action will not get chances to run at all.
This will cause potential job stuck issues since we are serializing VM
operations per VM.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)