[ https://issues.apache.org/jira/browse/QPID-8514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320145#comment-17320145 ]
Daniil Kirilyuk commented on QPID-8514: --------------------------------------- [~orudyy], Thank you very much for the analysis and for the test case. We've changed the implementation of queue.getLeastSignificantOldestEntry() in suggested way, but that brings another problem: messages in acquired state remain in a queue undeleted, leading to queue overflow. Shouldn't the deletion logic be changed as well to make possible message stealing in such cases? Alternatively ring queue behavior could be changed to reject messages when queue size is reached and queue is filled with acquired messages. Which approach is preferable in your opinion? > High CPU usage and stucked connections > -------------------------------------- > > Key: QPID-8514 > URL: https://issues.apache.org/jira/browse/QPID-8514 > Project: Qpid > Issue Type: Bug > Components: Broker-J > Affects Versions: qpid-java-broker-8.0.4 > Reporter: Daniil Kirilyuk > Priority: Minor > Attachments: RingPolicyConcurrencyTest.java, threaddump.zip > > > There were around 600 connections in state CLOSE__WAIT: > {noformat} > [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep TIME_WAIT | wc -l > 38 > [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep CLOSE_WAIT | wc -l > 625 > [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep -v CLOSE_WAIT | grep > -V TIME_WAIT | wc -l > 7 > {noformat} > CPU usage raises up to 90% and higher. > Connections couldn't be closed by webgui. Broker log contains message below, > but connections were not closed. > {noformat} > 2021-03-26 15:12:28,474 INFO [Broker-Config] (q.m.c.model_delete) - > [mng:3hpU+miq(admin@/127.0.0.1:62743)] > [con:751(TEST@/127.0.0.1:40422/default)] CON-1007 : Connection close > initiated by operator > {noformat} > Thread dumps shows around 49 threads with same stacktrace related to ring > queue > {noformat} > java.lang.Thread.State: RUNNABLE > at > org.apache.qpid.server.queue.QueueEntryImpl.addStateChangeListener(QueueEntryImpl.java:674) > at > org.apache.qpid.server.queue.QueueEntryImpl.acquireOrSteal(QueueEntryImpl.java:316) > at > org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1913) > at > org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1863) > at > org.apache.qpid.server.queue.RingOverflowPolicyHandler$Handler.checkOverflow(RingOverflowPolicyHandler.java:102) > at > org.apache.qpid.server.queue.RingOverflowPolicyHandler$Handler.access$000(RingOverflowPolicyHandler.java:44) > at > org.apache.qpid.server.queue.RingOverflowPolicyHandler.checkOverflow(RingOverflowPolicyHandler.java:41) > at > org.apache.qpid.server.queue.AbstractQueue.doEnqueue(AbstractQueue.java:1382) > at > org.apache.qpid.server.queue.AbstractQueue.enqueue(AbstractQueue.java:1290) > at > org.apache.qpid.server.message.RoutingResult$1.postCommit(RoutingResult.java:135) > at > org.apache.qpid.server.txn.AsyncAutoCommitTransaction$3.postCommit(AsyncAutoCommitTransaction.java:309) > at > org.apache.qpid.server.txn.AsyncCommand.complete(AsyncCommand.java:84) > at > org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveComplete(StandardReceivingLinkEndpoint.java:588) > at > org.apache.qpid.server.protocol.v1_0.Session_1_0.lambda$receivedComplete$5(Session_1_0.java:1343) > at > org.apache.qpid.server.protocol.v1_0.Session_1_0$$Lambda$142/959267941.accept(Unknown > Source) > at java.lang.Iterable.forEach(Iterable.java:75) > at > org.apache.qpid.server.protocol.v1_0.Session_1_0.receivedComplete(Session_1_0.java:1343) > at > org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receivedComplete$11(AMQPConnection_1_0Impl.java:1349) > at > org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$141/1774823837.run(Unknown > Source) > at java.security.AccessController.doPrivileged(Native Method) > at > org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receivedComplete(AMQPConnection_1_0Impl.java:1347) > at > org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1328) > at > org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576) > at > org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$119/6380061.run(Unknown > Source) > at java.security.AccessController.doPrivileged(Native Method) > at > org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571) > at > org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135) > at > org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611) > at > org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58) > at > org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496) > at > org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270) > at > org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134) > at > org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575) > at > org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366) > at > org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97) > at > org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at > org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464) > at > org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1487500813.run(Unknown > Source) > at java.lang.Thread.run(Thread.java:748) > {noformat} > Also ring queue behaves strangely: queue "{{DeadLetterQueue"}} has set > maximum message depth to {{1}}, but in there were 55 message with > {{Acquired}} state. > First message in {{DeadLetterQueue}} - {{2021-03-25, 16:50:59.381 (00:00 > UTC)}}. > Last message in {{DeadLetterQueue}} - {{2021-03-25, 16:50:59.683 (00:00 > UTC)}}. > These times correlates to timed out drain request at 16:52:00,410, because > default drain request is set 60 seconds: > {noformat} > ^[[0m^[[0m2021-03-25 16:52:00,408 INFO [org.apache.qpid.jms.JmsSession] > (QpidJMS Connection Executor: ID:986cd915-d072-4711-bd71-50a74aab4be8:1) A > JMS MessageConsumer has been closed: JmsConsumerInfo: { > ID:986cd915-d072-4711-bd71-50a74aab4be8:1:1:1, destination = > broadcast.XX.PRISMA_LIVERiskLimitTrigger } > ^[[0m^[[33m2021-03-25 16:52:00,410 WARN > [org.apache.camel.component.jms.DefaultJmsMessageListenerContainer] (Camel > (camel-1) thread #3 - JmsConsumer[broadcast.XX.PRISMA_LIVERiskLimitTrigger]) > Setup of JMS message listener invoker failed for destination > 'broadcast.XX.PRISMA_LIVERiskLimitTrigger' - trying to recover. Cause: Remote > did not respond to a drain request in time > ^[[0m^[[33m2021-03-25 16:52:01,414 WARN > [org.apache.camel.component.jms.DefaultJmsMessageListenerContainer] (Camel > (camel-1) thread #6 - JmsConsumer[broadcast.XX.PRISMA_LIVERiskLimitTrigger]) > Setup of JMS message listener invoker failed for destination > 'broadcast.XX.PRISMA_LIVERiskLimitTrigger' > {noformat} > After the problem was triggered broker started to reject client connections: > {noformat} > 2021-03-25 17:55:43,716 INFO [Broker-Config] (q.m.c.open) - > [con:901(/127.0.0.1:56012)] CON-1001 : Open : Destination : > amqp(127.0.0.1:22101) : Protocol Version : 1.0 > 2021-03-25 17:55:43,745 INFO [IO-/127.0.0.1:56012] (q.m.c.open) - > [con:901(ARP@/127.0.0.1:56012/default)] CON-1001 : Open : Destination : > amqp(172.16.75.197:22101) : Protocol Version : 1.0 : Client ID : > ID:af50d087-0610-406a-9792-ffa5bf6b68b4:5 : Client Version : 0.51.0 : Client > Product : QpidJMS > 2021-03-25 17:55:43,747 INFO [IO-/127.0.0.1:56012] (q.m.a.denied) - > [con:901(ARP@/127.0.0.1:56012/default)] ACL-1002 : Denied : Access > Virtualhost NAME=default,VIRTUALHOST_NAME=default > 2021-03-25 17:55:43,755 INFO [Broker-Config] (q.m.c.dropped_connection) - > [con:901(ARP@/127.0.0.1:56012/default)] CON-1004 : Connection dropped > {noformat} > Issue seems to be quite rare: same test ran in a loop over several days and > this issue was not triggered. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org