wido opened a new issue #4657:
URL: https://github.com/apache/cloudstack/issues/4657
##### ISSUE TYPE
* Bug Report
##### COMPONENT NAME
~~~
Management Server
MySQL
~~~
##### CLOUDSTACK VERSION
~~~
4.13.1
~~~
##### CONFIGURATION
This is a standard Management Server with a connection to a MariaDB server
##### OS / ENVIRONMENT
- Ubuntu 16.04.7
- 10.2.34-MariaDB-10.2.34+maria~xenial-log
- CloudStack 4.13.1
Settings in db.properties:
<pre>
db.cloud.url.params=prepStmtCacheSize=517&cachePrepStmts=true&sessionVariables=sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'
db.cloud.maxIdle=30
db.cloud.maxActive=250
db.cloud.reconnectAtTxEnd=true
db.cloud.autoReconnectForPools=true
db.cloud.autoReconnect=tru
</pre>
##### SUMMARY
The Management Server seems to loose it's connection to MySQL while it is
actually still up and running. Causing some components to fail inside the
Management Server.
##### STEPS TO REPRODUCE
Not known yet
##### ACTUAL RESULTS
In the logs suddenly it shows:
~~~
2021-02-06 03:59:04,748 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl]
(AgentManager-Handler-12:null) (logid:) VM state report. host: 30, vm id: 1019,
power state: PowerOn
2021-02-06 03:59:04,751 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl]
(AgentManager-Handler-12:null) (logid:) VM state report. host: 30, vm id: 187,
power state: PowerOn
2021-02-06 03:59:04,754 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl]
(AgentManager-Handler-12:null) (logid:) VM state report. host: 30, vm id: 702,
power state: PowerOn
2021-02-06 03:59:04,780 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl]
(AgentManager-Handler-12:null) (logid:) Done with process of VM state report.
host: 30
2021-02-06 03:59:04,836 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentManager-Handler-14:null) (logid:) Ping from 34(s-1072-VM)
2021-02-06 03:59:10,284 DEBUG [c.c.s.StatsCollector]
(StatsCollector-3:ctx-9527a5cd) (logid:2204af40) AutoScaling Monitor is
running...
2021-02-06 03:59:13,074 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentManager-Handler-3:null) (logid:) Ping from Routing host
3(xx.c01.p01.xx.xx)
2021-02-06 03:59:13,075 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl]
(AgentManager-Handler-3:null) (logid:) Process host VM state report from ping
process. host: 3
2021-02-06 03:59:13,369 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentManager-Handler-5:null) (logid:) SeqA 35-21752: Processing Seq 35-21752:
{ Cmd , MgmtId: -1, via: 35, Ver: v1, Flags: 11,
[{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_proxyVmId":1071,"_loadInfo":"{\n
\"connections\": []\n}","wait":0}}] }
2021-02-06 03:59:14,662 DEBUG [c.c.a.t.Request]
(StatsCollector-1:ctx-dcc6eeba) (logid:098d3bcd) Seq 1-8767382574583524527:
Received: { Ans: , MgmtId: 90520732637524, via: 1(xx.c01.p01.xx), Ver: v1,
Flags: 10, { GetVolumeStatsAnswer } }
2021-02-06 03:59:14,662 DEBUG [c.c.a.m.AgentManagerImpl]
(StatsCollector-1:ctx-dcc6eeba) (logid:098d3bcd) Details from executing class
com.cloud.agent.api.GetVolumeStatsCommand:
2021-02-06 03:59:14,741 DEBUG [c.c.a.t.Request]
(StatsCollector-1:ctx-dcc6eeba) (logid:098d3bcd) Seq 1-8767382574583524528:
Received: { Ans: , MgmtId: 90520732637524, via: 1(xx.c01.p01.xx), Ver: v1,
Flags: 10, { GetVolumeStatsAnswer } }
2021-02-06 03:59:14,741 DEBUG [c.c.a.m.AgentManagerImpl]
(StatsCollector-1:ctx-dcc6eeba) (logid:098d3bcd) Details from executing class
com.cloud.agent.api.GetVolumeStatsCommand:
2021-02-06 03:59:14,782 ERROR [o.a.c.f.j.d.SyncQueueItemDaoImpl]
(AsyncJobMgr-Heartbeat-1:ctx-6c122c53) (logid:f5164750) Unexpected sql
exception,
java.sql.SQLException: Cannot get a connection, pool error Timeout waiting
for idle object
at
org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:142)
at
com.cloud.utils.db.TransactionLegacy.getConnection(TransactionLegacy.java:557)
at
com.cloud.utils.db.TransactionLegacy.prepareStatement(TransactionLegacy.java:468)
at
com.cloud.utils.db.TransactionLegacy.prepareAutoCloseStatement(TransactionLegacy.java:461)
at
org.apache.cloudstack.framework.jobs.dao.SyncQueueItemDaoImpl.getNextQueueItems(SyncQueueItemDaoImpl.java:106)
at sun.reflect.GeneratedMethodAccessor86.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:338)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:197)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at
com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:174)
at
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185)
at
org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
at com.sun.proxy.$Proxy195.getNextQueueItems(Unknown Source)
at
org.apache.cloudstack.framework.jobs.impl.SyncQueueManagerImpl$3.doInTransactionWithoutResult(SyncQueueManagerImpl.java:140)
at
com.cloud.utils.db.TransactionCallbackNoReturn.doInTransaction(TransactionCallbackNoReturn.java:25)
at
com.cloud.utils.db.Transaction$2.doInTransaction(Transaction.java:50)
at com.cloud.utils.db.Transaction.execute(Transaction.java:40)
at com.cloud.utils.db.Transaction.execute(Transaction.java:47)
at
org.apache.cloudstack.framework.jobs.impl.SyncQueueManagerImpl.dequeueFromAny(SyncQueueManagerImpl.java:137)
at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$6.reallyRun(AsyncJobManagerImpl.java:813)
at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$6.runInContext(AsyncJobManagerImpl.java:801)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
at
org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:447)
at
org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:361)
at
org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:134)
... 36 more
~~~
This then keeps repeating and the management server doesn't work properly
until restarted.
Other log entries:
<pre>
2021-02-06 03:59:15,230 DEBUG [c.c.c.ClusterManagerImpl]
(Cluster-Heartbeat-1:ctx-6b154ebc) (logid:4a081a66) Management server heartbeat
takes too long to finish. profiler: Done. Duration: 10003ms,
profilerHeartbeatUpdate: Started but not stopped, profilerPeerScan: Not Started
2021-02-06 03:59:15,230 ERROR [c.c.c.ClusterManagerImpl]
(Cluster-Heartbeat-1:ctx-6b154ebc) (logid:4a081a66) Unexpected exception in
cluster heartbeat
java.lang.RuntimeException: Cannot get a connection, pool error Timeout
waiting for idle object
at
com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(ManagementServerHostDaoImpl.java:148)
at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:338)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:197)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at
com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:174)
at
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185)
at
org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
at com.sun.proxy.$Proxy176.update(Unknown Source)
at
com.cloud.cluster.ClusterManagerImpl$4.runInContext(ClusterManagerImpl.java:555)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: Cannot get a connection, pool error
Timeout waiting for idle object
at
org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:142)
at
com.cloud.utils.db.TransactionLegacy.getConnection(TransactionLegacy.java:557)
at
com.cloud.utils.db.TransactionLegacy.prepareStatement(TransactionLegacy.java:468)
... 27 more
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
at
org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:447)
at
org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:361)
at
org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:134)
... 29 more
</pre>
<pre>
2021-02-06 03:59:15,227 WARN [c.c.c.d.ManagementServerHostDaoImpl]
(Cluster-Heartbeat-1:ctx-6b154ebc) (logid:4a081a66) Unexpected exception,
java.sql.SQLException: Cannot get a connection, pool error Timeout waiting
for idle object
at
org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:142)
at
com.cloud.utils.db.TransactionLegacy.getConnection(TransactionLegacy.java:557)
at
com.cloud.utils.db.TransactionLegacy.prepareStatement(TransactionLegacy.java:468)
at
com.cloud.utils.db.TransactionLegacy.prepareAutoCloseStatement(TransactionLegacy.java:461)
at
com.cloud.cluster.dao.ManagementServerHostDaoImpl.update(ManagementServerHostDaoImpl.java:134)
at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:338)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:197)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at
com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:174)
at
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185)
at
org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
at com.sun.proxy.$Proxy176.update(Unknown Source)
at
com.cloud.cluster.ClusterManagerImpl$4.runInContext(ClusterManagerImpl.java:555)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
</pre>
Increasing *db.cloud.maxActive=250* might help, but I would rather not
increase this until we are sure that's the root-cause.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]