Kiran Koneti created CLOUDSTACK-4288:
----------------------------------------
Summary: Management server is hanging quite often and in
indefinite time intervals.
Key: CLOUDSTACK-4288
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4288
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Components: Install and Setup
Affects Versions: 4.2.0
Reporter: Kiran Koneti
Priority: Blocker
Fix For: 4.2.0
I created an Advanced Zone setup using the latest rhel63 399 build, which was
generated around 12:08 PM IST. The management server hangs quite often for a
few minutes and then recovers on its own.
During the hang, all CS operations are halted and the management server logs
stop as well. Once the server resumes, the hosts go into the Alert state and
come back up later.
This happens quite often. A thread dump taken during the hang shows the
following:
"SecGrp-Worker-1" prio=10 tid=0x00007f86bc1de000 nid=0x28b waiting on condition [0x00007f86b7cfb000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x000000077b193618> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at com.cloud.network.security.LocalSecurityGroupWorkQueue.getWork(LocalSecurityGroupWorkQueue.java:152)
        at com.cloud.network.security.SecurityGroupManagerImpl2.work(SecurityGroupManagerImpl2.java:136)
        at com.cloud.network.security.SecurityGroupManagerImpl2$WorkerThread.run(SecurityGroupManagerImpl2.java:71)

"SecGrp-Worker-0" prio=10 tid=0x00007f86bc1dc000 nid=0x28a waiting on condition [0x00007f86b7dfc000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x000000077b193618> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at com.cloud.network.security.LocalSecurityGroupWorkQueue.getWork(LocalSecurityGroupWorkQueue.java:152)
        at com.cloud.network.security.SecurityGroupManagerImpl2.work(SecurityGroupManagerImpl2.java:136)
        at com.cloud.network.security.SecurityGroupManagerImpl2$WorkerThread.run(SecurityGroupManagerImpl2.java:71)

"HA-2" prio=10 tid=0x00007f86bc1da000 nid=0x289 waiting on condition [0x00007f86b7efd000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x000000077dc218b0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2081)
        at java.util.concurrent.DelayQueue.take(DelayQueue.java:193)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
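
For anyone reading the dump: a thread in WAITING (parking) inside
ConditionObject.await is typically a worker sitting idle on an empty queue, so
these frames alone may not point at the cause of the hang. The sketch below
reproduces that state with a small lock/condition queue. The WorkQueue class,
its method names, and the thread name are hypothetical stand-ins written for
illustration; this is not the CloudStack LocalSecurityGroupWorkQueue code.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class ParkedWorkerDemo {
    // Hypothetical lock/condition work queue, for illustration only.
    static final class WorkQueue {
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition notEmpty = lock.newCondition();
        private final Queue<String> items = new ArrayDeque<>();

        String getWork() throws InterruptedException {
            lock.lock();
            try {
                while (items.isEmpty()) {
                    notEmpty.await(); // an idle worker parks here
                }
                return items.remove();
            } finally {
                lock.unlock();
            }
        }

        void submit(String item) {
            lock.lock();
            try {
                items.add(item);
                notEmpty.signal(); // wake one parked worker
            } finally {
                lock.unlock();
            }
        }
    }

    // Starts a worker on an empty queue and returns the state it settles in.
    static Thread.State stateOfIdleWorker() {
        WorkQueue queue = new WorkQueue();
        Thread worker = new Thread(() -> {
            try {
                queue.getWork();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "Demo-Worker-0");
        worker.setDaemon(true);
        worker.start();
        try {
            // Poll for up to about five seconds until the worker has parked.
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (worker.getState() != Thread.State.WAITING
                    && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            Thread.State observed = worker.getState();
            queue.submit("job"); // give the worker something so it can exit
            worker.join(1000);
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing worker", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Idle worker state: " + stateOfIdleWorker());
    }
}
```

If the worker parks as expected, the reported state should be WAITING, which
matches what the SecGrp-Worker threads show above.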
I am attaching catalina.out as well as the management server logs.
This issue is observed in two different setups: my environment with the rhel63
build, and the rhel62 environment that Manasa is using.
During the hang, top shows the CPU% dropping to very low values.
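
Since the CPU is nearly idle during the hang, it may help to check whether the
JVM reports a lock deadlock and how the threads are distributed across states.
The following is a minimal sketch using the standard java.lang.management
API. The class and method names (HangDiagnostics, countByState,
deadlockedThreadIds) are my own and hypothetical. As written it only inspects
the JVM it runs in, so it would have to run inside the management server
process or be adapted to a remote JMX connection, which is an assumption about
the setup.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class HangDiagnostics {
    // Counts live threads in the current JVM, grouped by thread state.
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // false, false: skip locked monitor and synchronizer details.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the ids of deadlocked threads, or an empty array if none.
    static long[] deadlockedThreadIds() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        System.out.println("Threads by state: " + countByState());
        System.out.println("Deadlocked threads: " + deadlockedThreadIds().length);
    }
}
```

A zero deadlock count together with mostly WAITING or TIMED_WAITING threads
would suggest the process is blocked on something outside Java locks (for
example I/O or a long pause), but that is only a hint and not a diagnosis.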
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira