Daniil Kirilyuk created QPID-8514:
-------------------------------------
Summary: High CPU usage and stucked connections
Key: QPID-8514
URL: https://issues.apache.org/jira/browse/QPID-8514
Project: Qpid
Issue Type: Bug
Components: Broker-J
Affects Versions: qpid-java-broker-8.0.4
Reporter: Daniil Kirilyuk
There were around 600 connections in the CLOSE_WAIT state:
{noformat}
[prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep TIME_WAIT | wc -l
38
[prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep CLOSE_WAIT | wc -l
625
[prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep -v CLOSE_WAIT | grep -v TIME_WAIT | wc -l
7
{noformat}
CPU usage rises to 90% and higher.
Connections could not be closed via the web GUI. The broker log contains the
message below, but the connections were not closed.
{noformat}
2021-03-26 15:12:28,474 INFO [Broker-Config] (q.m.c.model_delete) - [mng:3hpU+miq(admin@/127.0.0.1:62743)] [con:751(TEST@/127.0.0.1:40422/default)] CON-1007 : Connection close initiated by operator
{noformat}
Thread dumps show around 49 threads with the same stack trace, related to the ring queue:
{noformat}
java.lang.Thread.State: RUNNABLE
at org.apache.qpid.server.queue.QueueEntryImpl.addStateChangeListener(QueueEntryImpl.java:674)
at org.apache.qpid.server.queue.QueueEntryImpl.acquireOrSteal(QueueEntryImpl.java:316)
at org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1913)
at org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1863)
at org.apache.qpid.server.queue.RingOverflowPolicyHandler$Handler.checkOverflow(RingOverflowPolicyHandler.java:102)
at org.apache.qpid.server.queue.RingOverflowPolicyHandler$Handler.access$000(RingOverflowPolicyHandler.java:44)
at org.apache.qpid.server.queue.RingOverflowPolicyHandler.checkOverflow(RingOverflowPolicyHandler.java:41)
at org.apache.qpid.server.queue.AbstractQueue.doEnqueue(AbstractQueue.java:1382)
at org.apache.qpid.server.queue.AbstractQueue.enqueue(AbstractQueue.java:1290)
at org.apache.qpid.server.message.RoutingResult$1.postCommit(RoutingResult.java:135)
at org.apache.qpid.server.txn.AsyncAutoCommitTransaction$3.postCommit(AsyncAutoCommitTransaction.java:309)
at org.apache.qpid.server.txn.AsyncCommand.complete(AsyncCommand.java:84)
at org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveComplete(StandardReceivingLinkEndpoint.java:588)
at org.apache.qpid.server.protocol.v1_0.Session_1_0.lambda$receivedComplete$5(Session_1_0.java:1343)
at org.apache.qpid.server.protocol.v1_0.Session_1_0$$Lambda$142/959267941.accept(Unknown Source)
at java.lang.Iterable.forEach(Iterable.java:75)
at org.apache.qpid.server.protocol.v1_0.Session_1_0.receivedComplete(Session_1_0.java:1343)
at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receivedComplete$11(AMQPConnection_1_0Impl.java:1349)
at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$141/1774823837.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receivedComplete(AMQPConnection_1_0Impl.java:1347)
at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1328)
at org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
at org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$119/6380061.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
at org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
at org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
at org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
at org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
at org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
at org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
at org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
at org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
at org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1487500813.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)
{noformat}
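The threads are RUNNABLE inside {{QueueEntryImpl.acquireOrSteal}}, which is consistent with a non-blocking compare-and-swap retry loop that spins under contention instead of parking. The sketch below is purely illustrative (the class, enum, and method names are assumptions for the illustration, not the actual Broker-J code) of how such a loop burns CPU when the entry state keeps changing under it:

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: a CAS retry loop of the general shape suggested by the
// stack trace. If competing threads keep mutating the state, each CAS fails
// and the loop retries without ever blocking, showing up as RUNNABLE threads
// with high CPU usage.
public class CasRetrySketch {
    enum State { AVAILABLE, ACQUIRED, DELETED }

    static final AtomicReference<State> state =
            new AtomicReference<>(State.AVAILABLE);

    // Try to move the entry to ACQUIRED; retry while other threads interfere.
    // Bounded here for the demo; an unbounded variant can spin indefinitely.
    static boolean acquireOrSteal(int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            State s = state.get();
            if (s == State.DELETED) {
                return false;                       // entry gone, give up
            }
            if (state.compareAndSet(s, State.ACQUIRED)) {
                return true;                        // won the race
            }
            // CAS failed: state changed under us; loop and retry (spin)
        }
        return false;
    }

    public static void main(String[] args) {
        // Uncontended case: the first CAS succeeds immediately.
        System.out.println(acquireOrSteal(10));
    }
}
```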
The ring queue also behaves strangely: the queue {{DeadLetterQueue}} has its
maximum message depth set to {{1}}, but it contained 55 messages in the
{{Acquired}} state.
First message in {{DeadLetterQueue}}: {{2021-03-25, 16:50:59.381 (00:00 UTC)}}.
Last message in {{DeadLetterQueue}}: {{2021-03-25, 16:50:59.683 (00:00 UTC)}}.
These times correlate with the drain request that timed out at 16:52:00,410,
because the default drain request timeout is 60 seconds:
{noformat}
2021-03-25 16:52:00,408 INFO [org.apache.qpid.jms.JmsSession] (QpidJMS Connection Executor: ID:986cd915-d072-4711-bd71-50a74aab4be8:1) A JMS MessageConsumer has been closed: JmsConsumerInfo: { ID:986cd915-d072-4711-bd71-50a74aab4be8:1:1:1, destination = broadcast.XX.PRISMA_LIVERiskLimitTrigger }
2021-03-25 16:52:00,410 WARN [org.apache.camel.component.jms.DefaultJmsMessageListenerContainer] (Camel (camel-1) thread #3 - JmsConsumer[broadcast.XX.PRISMA_LIVERiskLimitTrigger]) Setup of JMS message listener invoker failed for destination 'broadcast.XX.PRISMA_LIVERiskLimitTrigger' - trying to recover. Cause: Remote did not respond to a drain request in time
2021-03-25 16:52:01,414 WARN [org.apache.camel.component.jms.DefaultJmsMessageListenerContainer] (Camel (camel-1) thread #6 - JmsConsumer[broadcast.XX.PRISMA_LIVERiskLimitTrigger]) Setup of JMS message listener invoker failed for destination 'broadcast.XX.PRISMA_LIVERiskLimitTrigger'
{noformat}
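For reference, the 60-second default mentioned above comes from the Qpid JMS client side; to my understanding it is tunable via the {{amqp.drainTimeout}} connection URI option (milliseconds; treat the exact option name and default as assumptions to verify against the client documentation), e.g. to double it while investigating:

{noformat}
amqp://localhost:22101?amqp.drainTimeout=120000
{noformat}

Raising the timeout only delays the symptom, of course; it does not address the broker-side spin.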
After the problem was triggered, the broker started to reject client connections:
{noformat}
2021-03-25 17:55:43,716 INFO [Broker-Config] (q.m.c.open) - [con:901(/127.0.0.1:56012)] CON-1001 : Open : Destination : amqp(127.0.0.1:22101) : Protocol Version : 1.0
2021-03-25 17:55:43,745 INFO [IO-/127.0.0.1:56012] (q.m.c.open) - [con:901(ARP@/127.0.0.1:56012/default)] CON-1001 : Open : Destination : amqp(172.16.75.197:22101) : Protocol Version : 1.0 : Client ID : ID:af50d087-0610-406a-9792-ffa5bf6b68b4:5 : Client Version : 0.51.0 : Client Product : QpidJMS
2021-03-25 17:55:43,747 INFO [IO-/127.0.0.1:56012] (q.m.a.denied) - [con:901(ARP@/127.0.0.1:56012/default)] ACL-1002 : Denied : Access Virtualhost NAME=default,VIRTUALHOST_NAME=default
2021-03-25 17:55:43,755 INFO [Broker-Config] (q.m.c.dropped_connection) - [con:901(ARP@/127.0.0.1:56012/default)] CON-1004 : Connection dropped
{noformat}
The issue seems to be quite rare: the same test was run in a loop over several
days without triggering the issue again.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)