[ https://issues.apache.org/jira/browse/QPID-8514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331563#comment-17331563 ]
Alex Rudyy commented on QPID-8514:
----------------------------------

Hi [~daniel.kirilyuk],

I committed a change [5db3f7c|https://gitbox.apache.org/repos/asf?p=qpid-broker-j.git;h=5db3f7c] for the ring overflow policy. It allows the next available entry after the least significant one (the oldest, in the case of a standard queue) to be deleted when the least significant entry is acquired by a consumer in the acquired non-stealable state. The Broker cannot delete messages in the acquired non-stealable state; as a result, previously, the same message was returned by the queue (as the deletion did not happen), causing the loop in the ring overflow policy. The implemented change should prevent the CPU from spinning in the ring policy loop on running into an acquired non-stealable message. (A simplified sketch of this skip-over behaviour follows the quoted issue text below.)

IMHO, it should be a reasonable compromise, as a message transitions into the acquired non-stealable state when a consumer settles the message. Non-transacted messaging use cases should not be affected by the reported issue, as the message deletion happens immediately after settling; thus, the message is in the acquired non-stealable state for only a very short amount of time. As for transacted messaging use cases, the consumer can commit its transaction with some delay after messages are settled, so a transacted settled message can remain in the acquired non-stealable state until the consuming transaction is committed. I strongly believe that messages consumed using Session.CLIENT_ACKNOWLEDGE are not affected by the issue. Though, if you see the issue manifest with Session.CLIENT_ACKNOWLEDGE, please let me know.

The idea behind the acquired non-stealable state is that a message cannot be deleted on the broker side while it is in that state. It was added to solve some concurrency issues with concurrent message deletion from multiple threads (for example, a consumer thread and a management thread performing a queue clean-up). Potentially, the issue could be solved by not putting the message into the acquired non-stealable state on settling a transacted message; for example, transacted messages could remain in the acquired stealable state until the transaction is committed. Though, that might require a lot more changes to the code. Please feel free to try to implement it if the provided implementation is not acceptable for you.

Kind Regards,
Alex

> High CPU usage and stucked connections
> --------------------------------------
>
>                 Key: QPID-8514
>                 URL: https://issues.apache.org/jira/browse/QPID-8514
>             Project: Qpid
>          Issue Type: Bug
>          Components: Broker-J
>    Affects Versions: qpid-java-broker-8.0.4
>            Reporter: Daniil Kirilyuk
>            Priority: Minor
>         Attachments: QPID-8514 - broker-log.zip, RingPolicyConcurrencyTest.java, threaddump.zip
>
>
> There were around 600 connections in state CLOSE_WAIT:
> {noformat}
> [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep TIME_WAIT | wc -l
> 38
> [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep CLOSE_WAIT | wc -l
> 625
> [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep -v CLOSE_WAIT | grep -v TIME_WAIT | wc -l
> 7
> {noformat}
> CPU usage rises to 90% and higher.
> Connections could not be closed via the web GUI. The broker log contains the message below, but the connections were not closed.
> {noformat}
> 2021-03-26 15:12:28,474 INFO [Broker-Config] (q.m.c.model_delete) - [mng:3hpU+miq(admin@/127.0.0.1:62743)] [con:751(TEST@/127.0.0.1:40422/default)] CON-1007 : Connection close initiated by operator
> {noformat}
> The thread dumps show around 49 threads with the same stack trace, related to the ring queue:
> {noformat}
> java.lang.Thread.State: RUNNABLE
>     at org.apache.qpid.server.queue.QueueEntryImpl.addStateChangeListener(QueueEntryImpl.java:674)
>     at org.apache.qpid.server.queue.QueueEntryImpl.acquireOrSteal(QueueEntryImpl.java:316)
>     at org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1913)
>     at org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1863)
>     at org.apache.qpid.server.queue.RingOverflowPolicyHandler$Handler.checkOverflow(RingOverflowPolicyHandler.java:102)
>     at org.apache.qpid.server.queue.RingOverflowPolicyHandler$Handler.access$000(RingOverflowPolicyHandler.java:44)
>     at org.apache.qpid.server.queue.RingOverflowPolicyHandler.checkOverflow(RingOverflowPolicyHandler.java:41)
>     at org.apache.qpid.server.queue.AbstractQueue.doEnqueue(AbstractQueue.java:1382)
>     at org.apache.qpid.server.queue.AbstractQueue.enqueue(AbstractQueue.java:1290)
>     at org.apache.qpid.server.message.RoutingResult$1.postCommit(RoutingResult.java:135)
>     at org.apache.qpid.server.txn.AsyncAutoCommitTransaction$3.postCommit(AsyncAutoCommitTransaction.java:309)
>     at org.apache.qpid.server.txn.AsyncCommand.complete(AsyncCommand.java:84)
>     at org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveComplete(StandardReceivingLinkEndpoint.java:588)
>     at org.apache.qpid.server.protocol.v1_0.Session_1_0.lambda$receivedComplete$5(Session_1_0.java:1343)
>     at org.apache.qpid.server.protocol.v1_0.Session_1_0$$Lambda$142/959267941.accept(Unknown Source)
>     at java.lang.Iterable.forEach(Iterable.java:75)
>     at org.apache.qpid.server.protocol.v1_0.Session_1_0.receivedComplete(Session_1_0.java:1343)
>     at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receivedComplete$11(AMQPConnection_1_0Impl.java:1349)
>     at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$141/1774823837.run(Unknown Source)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receivedComplete(AMQPConnection_1_0Impl.java:1347)
>     at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1328)
>     at org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
>     at org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$119/6380061.run(Unknown Source)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
>     at org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
>     at org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
>     at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
>     at org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
>     at org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
>     at org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
>     at org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
>     at org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
>     at org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
>     at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
>     at org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1487500813.run(Unknown Source)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> Also, the ring queue behaves strangely: the queue {{DeadLetterQueue}} has its maximum message depth set to {{1}}, but there were 55 messages in the {{Acquired}} state.
> First message in {{DeadLetterQueue}} - {{2021-03-25, 16:50:59.381 (00:00 UTC)}}.
> Last message in {{DeadLetterQueue}} - {{2021-03-25, 16:50:59.683 (00:00 UTC)}}.
> These times correlate with the drain request that timed out at 16:52:00,410, because the default drain timeout is 60 seconds:
> {noformat}
> ^[[0m^[[0m2021-03-25 16:52:00,408 INFO [org.apache.qpid.jms.JmsSession] (QpidJMS Connection Executor: ID:986cd915-d072-4711-bd71-50a74aab4be8:1) A JMS MessageConsumer has been closed: JmsConsumerInfo: { ID:986cd915-d072-4711-bd71-50a74aab4be8:1:1:1, destination = broadcast.XX.PRISMA_LIVERiskLimitTrigger }
> ^[[0m^[[33m2021-03-25 16:52:00,410 WARN [org.apache.camel.component.jms.DefaultJmsMessageListenerContainer] (Camel (camel-1) thread #3 - JmsConsumer[broadcast.XX.PRISMA_LIVERiskLimitTrigger]) Setup of JMS message listener invoker failed for destination 'broadcast.XX.PRISMA_LIVERiskLimitTrigger' - trying to recover. Cause: Remote did not respond to a drain request in time
> ^[[0m^[[33m2021-03-25 16:52:01,414 WARN [org.apache.camel.component.jms.DefaultJmsMessageListenerContainer] (Camel (camel-1) thread #6 - JmsConsumer[broadcast.XX.PRISMA_LIVERiskLimitTrigger]) Setup of JMS message listener invoker failed for destination 'broadcast.XX.PRISMA_LIVERiskLimitTrigger'
> {noformat}
> After the problem was triggered, the broker started to reject client connections:
> {noformat}
> 2021-03-25 17:55:43,716 INFO [Broker-Config] (q.m.c.open) - [con:901(/127.0.0.1:56012)] CON-1001 : Open : Destination : amqp(127.0.0.1:22101) : Protocol Version : 1.0
> 2021-03-25 17:55:43,745 INFO [IO-/127.0.0.1:56012] (q.m.c.open) - [con:901(ARP@/127.0.0.1:56012/default)] CON-1001 : Open : Destination : amqp(172.16.75.197:22101) : Protocol Version : 1.0 : Client ID : ID:af50d087-0610-406a-9792-ffa5bf6b68b4:5 : Client Version : 0.51.0 : Client Product : QpidJMS
> 2021-03-25 17:55:43,747 INFO [IO-/127.0.0.1:56012] (q.m.a.denied) - [con:901(ARP@/127.0.0.1:56012/default)] ACL-1002 : Denied : Access Virtualhost NAME=default,VIRTUALHOST_NAME=default
> 2021-03-25 17:55:43,755 INFO [Broker-Config] (q.m.c.dropped_connection) - [con:901(ARP@/127.0.0.1:56012/default)] CON-1004 : Connection dropped
> {noformat}
> The issue seems to be quite rare: the same test was run in a loop over several days without the issue being triggered.
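
To make the fix easier to picture, here is a minimal, self-contained sketch of the skip-over behaviour described in the comment above. It is not the actual Broker-J implementation from commit 5db3f7c (the real logic lives in RingOverflowPolicyHandler, AbstractQueue and QueueEntryImpl); the class, enum and field names below are invented for illustration only.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Simplified model of a ring-overflow check that skips entries in the acquired
// non-stealable state. This is NOT the real Qpid Broker-J code; all names here
// are invented for illustration only.
public class RingOverflowSketch
{
    enum State { AVAILABLE, ACQUIRED_STEALABLE, ACQUIRED_NONSTEALABLE }

    static final class Entry
    {
        final long id;
        State state = State.AVAILABLE;

        Entry(final long id)
        {
            this.id = id;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final int maximumQueueDepth;
    private long nextId;

    RingOverflowSketch(final int maximumQueueDepth)
    {
        this.maximumQueueDepth = maximumQueueDepth;
    }

    void enqueue()
    {
        entries.addLast(new Entry(nextId++));
        checkOverflow();
    }

    // When over the depth limit, delete the least significant (oldest) deletable
    // entry. Entries in the ACQUIRED_NONSTEALABLE state cannot be deleted, so they
    // are skipped; before the fix the same undeletable head entry was retried
    // forever, which is what produced the CPU-spinning loop seen in the thread dumps.
    private void checkOverflow()
    {
        while (entries.size() > maximumQueueDepth)
        {
            boolean deleted = false;
            final Iterator<Entry> it = entries.iterator();
            while (it.hasNext())
            {
                final Entry candidate = it.next();
                if (candidate.state != State.ACQUIRED_NONSTEALABLE)
                {
                    it.remove();      // delete the next available (deletable) entry
                    deleted = true;
                    break;
                }
                // acquired non-stealable: skip and try the next entry
            }
            if (!deleted)
            {
                break;                // nothing deletable: give up instead of spinning
            }
        }
    }

    public static void main(final String[] args)
    {
        final RingOverflowSketch queue = new RingOverflowSketch(1);
        queue.enqueue();                                                // entry 0 is the oldest (least significant) entry
        queue.entries.peekFirst().state = State.ACQUIRED_NONSTEALABLE;  // e.g. settled inside an uncommitted transaction
        queue.enqueue();                                                // overflow: entry 0 is skipped and the next
                                                                        // deletable entry is removed, no busy loop
        System.out.println("entries left: " + queue.entries.size());    // prints 1
    }
}
{code}

Under the old behaviour, the second {{enqueue()}} above would keep retrying the non-stealable head entry and never make progress; with the skip-over, the overflow check always terminates, and the queue only holds more than {{maximumQueueDepth}} entries while every remaining entry is acquired non-stealable.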