[ https://issues.apache.org/jira/browse/QPID-8514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320583#comment-17320583 ]
Alex Rudyy commented on QPID-8514: ---------------------------------- Hi [~daniel.kirilyuk],

A queue entry acquired by a consumer should be in {{StealableConsumerAcquiredState}} for the majority of the time; it is in {{UnstealableConsumerAcquiredState}} only for the short period while the message is being dispatched to the consumer. After the acquired entry has been sent to the consumer, it should transition back into {{StealableConsumerAcquiredState}}. The transition happens in {{org.apache.qpid.server.consumer.AbstractConsumerTarget#send}} by invoking {{org.apache.qpid.server.message.MessageInstance#makeAcquisitionStealable}}, which changes the entry state from {{UnstealableConsumerAcquiredState}} to {{StealableConsumerAcquiredState}}.

It seems that when the issue manifests, the transition into {{StealableConsumerAcquiredState}} does not happen. I see that this might occur when a conversion exception is thrown; that needs fixing. However, I guess you are not using older protocols, so that path should not be the problem here. Do you have any errors reported in the broker logs just before the issue manifests?

I believe that in order to resolve the reported problem we need to understand what leaves the consumer in {{UnstealableConsumerAcquiredState}} and fix the underlying issue.
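The state transition described above can be sketched roughly as follows. This is a minimal illustrative model, not Broker-J's actual implementation: the enum values and method names merely mirror the {{MessageInstance}} state names discussed in this comment.

```java
import java.util.concurrent.atomic.AtomicReference;

public class AcquisitionStateSketch {

    // Hypothetical states mirroring the names used above; not Broker-J's classes.
    enum State { AVAILABLE, UNSTEALABLE_ACQUIRED, STEALABLE_ACQUIRED }

    static class Entry {
        final AtomicReference<State> state = new AtomicReference<>(State.AVAILABLE);

        // Consumer acquires the entry; it is unstealable while being dispatched.
        boolean acquire() {
            return state.compareAndSet(State.AVAILABLE, State.UNSTEALABLE_ACQUIRED);
        }

        // Invoked from send() after dispatch; if dispatch fails with an exception
        // and this call is skipped, the entry stays unstealable indefinitely.
        boolean makeAcquisitionStealable() {
            return state.compareAndSet(State.UNSTEALABLE_ACQUIRED, State.STEALABLE_ACQUIRED);
        }

        // Ring-policy eviction can only take entries in the stealable state.
        boolean steal() {
            return state.compareAndSet(State.STEALABLE_ACQUIRED, State.AVAILABLE);
        }
    }

    public static void main(String[] args) {
        Entry e = new Entry();
        e.acquire();                   // dispatch starts: unstealable
        e.makeAcquisitionStealable();  // dispatch done: stealable again
        System.out.println(e.steal()); // prints true: eviction can proceed
    }
}
```

The point of the sketch is the failure mode: if {{makeAcquisitionStealable}} is never invoked (e.g. because {{send}} threw), {{steal()}} can never succeed for that entry.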
> High CPU usage and stuck connections
> ------------------------------------
>
>                 Key: QPID-8514
>                 URL: https://issues.apache.org/jira/browse/QPID-8514
>             Project: Qpid
>          Issue Type: Bug
>          Components: Broker-J
>    Affects Versions: qpid-java-broker-8.0.4
>            Reporter: Daniil Kirilyuk
>            Priority: Minor
>         Attachments: RingPolicyConcurrencyTest.java, threaddump.zip
>
> There were around 600 connections in state CLOSE_WAIT:
> {noformat}
> [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep TIME_WAIT | wc -l
> 38
> [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep CLOSE_WAIT | wc -l
> 625
> [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep -v CLOSE_WAIT | grep -v TIME_WAIT | wc -l
> 7
> {noformat}
> CPU usage rises to 90% and higher.
> Connections could not be closed via the web GUI. The broker log contains the message below, but the connections were not closed.
> {noformat}
> 2021-03-26 15:12:28,474 INFO [Broker-Config] (q.m.c.model_delete) - [mng:3hpU+miq(admin@/127.0.0.1:62743)] [con:751(TEST@/127.0.0.1:40422/default)] CON-1007 : Connection close initiated by operator
> {noformat}
> Thread dumps show around 49 threads with the same stack trace related to the ring queue:
> {noformat}
> java.lang.Thread.State: RUNNABLE
> at org.apache.qpid.server.queue.QueueEntryImpl.addStateChangeListener(QueueEntryImpl.java:674)
> at org.apache.qpid.server.queue.QueueEntryImpl.acquireOrSteal(QueueEntryImpl.java:316)
> at org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1913)
> at org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1863)
> at org.apache.qpid.server.queue.RingOverflowPolicyHandler$Handler.checkOverflow(RingOverflowPolicyHandler.java:102)
> at org.apache.qpid.server.queue.RingOverflowPolicyHandler$Handler.access$000(RingOverflowPolicyHandler.java:44)
> at org.apache.qpid.server.queue.RingOverflowPolicyHandler.checkOverflow(RingOverflowPolicyHandler.java:41)
> at org.apache.qpid.server.queue.AbstractQueue.doEnqueue(AbstractQueue.java:1382)
> at org.apache.qpid.server.queue.AbstractQueue.enqueue(AbstractQueue.java:1290)
> at org.apache.qpid.server.message.RoutingResult$1.postCommit(RoutingResult.java:135)
> at org.apache.qpid.server.txn.AsyncAutoCommitTransaction$3.postCommit(AsyncAutoCommitTransaction.java:309)
> at org.apache.qpid.server.txn.AsyncCommand.complete(AsyncCommand.java:84)
> at org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveComplete(StandardReceivingLinkEndpoint.java:588)
> at org.apache.qpid.server.protocol.v1_0.Session_1_0.lambda$receivedComplete$5(Session_1_0.java:1343)
> at org.apache.qpid.server.protocol.v1_0.Session_1_0$$Lambda$142/959267941.accept(Unknown Source)
> at java.lang.Iterable.forEach(Iterable.java:75)
> at org.apache.qpid.server.protocol.v1_0.Session_1_0.receivedComplete(Session_1_0.java:1343)
> at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receivedComplete$11(AMQPConnection_1_0Impl.java:1349)
> at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$141/1774823837.run(Unknown Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receivedComplete(AMQPConnection_1_0Impl.java:1347)
> at org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1328)
> at org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
> at org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$119/6380061.run(Unknown Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
> at org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
> at org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
> at org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
> at org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
> at org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
> at org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
> at org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
> at org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
> at org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
> at org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
> at org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1487500813.run(Unknown Source)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The ring queue also behaves strangely: the queue {{DeadLetterQueue}} has a maximum message depth of {{1}}, but there were 55 messages in the {{Acquired}} state.
> First message in {{DeadLetterQueue}}: {{2021-03-25, 16:50:59.381 (00:00 UTC)}}.
> Last message in {{DeadLetterQueue}}: {{2021-03-25, 16:50:59.683 (00:00 UTC)}}.
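The thread dump above is consistent with the ring overflow policy repeatedly trying to evict an oldest entry that was never made stealable. The loop shape can be sketched as follows, under the assumption that {{acquireOrSteal}} fails for an unstealable entry; all class and method names here are hypothetical simplifications of the frames in the trace, not the broker's actual code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RingOverflowSketch {

    enum State { AVAILABLE, UNSTEALABLE_ACQUIRED, STEALABLE_ACQUIRED }

    static class Entry {
        State state = State.AVAILABLE;

        // Assumption from the trace: succeeds for available or stealable entries;
        // for an unstealable entry it registers a state-change listener and fails.
        boolean acquireOrSteal() {
            return state != State.UNSTEALABLE_ACQUIRED;
        }
    }

    // Returns the number of eviction attempts made. The attempt bound exists only
    // so this sketch terminates; the suspicion is that the real handler keeps
    // retrying, which would explain the observed CPU spin.
    static int checkOverflow(Deque<Entry> queue, int maxDepth, int attemptBound) {
        int attempts = 0;
        while (queue.size() > maxDepth && attempts < attemptBound) {
            attempts++;
            if (queue.peekFirst().acquireOrSteal()) {
                queue.pollFirst(); // oldest entry evicted, depth shrinks
            }
            // else: retry; with a permanently unstealable head this never ends
        }
        return attempts;
    }

    public static void main(String[] args) {
        Deque<Entry> q = new ArrayDeque<>();
        Entry stuck = new Entry();
        stuck.state = State.UNSTEALABLE_ACQUIRED; // never made stealable
        q.add(stuck);
        q.add(new Entry());
        // Depth 2 > max depth 1, but the head can never be evicted: we spin
        // until the artificial bound is hit.
        System.out.println(checkOverflow(q, 1, 1_000));
    }
}
```

This would also fit the reporter's observation of 55 {{Acquired}} messages on a depth-1 ring queue: entries stuck in an unstealable acquired state cannot be evicted, so the queue depth keeps growing past the configured maximum.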
> These times correlate with a timed-out drain request at 16:52:00,410, because the default drain request timeout is 60 seconds:
> {noformat}
> 2021-03-25 16:52:00,408 INFO [org.apache.qpid.jms.JmsSession] (QpidJMS Connection Executor: ID:986cd915-d072-4711-bd71-50a74aab4be8:1) A JMS MessageConsumer has been closed: JmsConsumerInfo: { ID:986cd915-d072-4711-bd71-50a74aab4be8:1:1:1, destination = broadcast.XX.PRISMA_LIVERiskLimitTrigger }
> 2021-03-25 16:52:00,410 WARN [org.apache.camel.component.jms.DefaultJmsMessageListenerContainer] (Camel (camel-1) thread #3 - JmsConsumer[broadcast.XX.PRISMA_LIVERiskLimitTrigger]) Setup of JMS message listener invoker failed for destination 'broadcast.XX.PRISMA_LIVERiskLimitTrigger' - trying to recover. Cause: Remote did not respond to a drain request in time
> 2021-03-25 16:52:01,414 WARN [org.apache.camel.component.jms.DefaultJmsMessageListenerContainer] (Camel (camel-1) thread #6 - JmsConsumer[broadcast.XX.PRISMA_LIVERiskLimitTrigger]) Setup of JMS message listener invoker failed for destination 'broadcast.XX.PRISMA_LIVERiskLimitTrigger'
> {noformat}
> After the problem was triggered, the broker started to reject client connections:
> {noformat}
> 2021-03-25 17:55:43,716 INFO [Broker-Config] (q.m.c.open) - [con:901(/127.0.0.1:56012)] CON-1001 : Open : Destination : amqp(127.0.0.1:22101) : Protocol Version : 1.0
> 2021-03-25 17:55:43,745 INFO [IO-/127.0.0.1:56012] (q.m.c.open) - [con:901(ARP@/127.0.0.1:56012/default)] CON-1001 : Open : Destination : amqp(172.16.75.197:22101) : Protocol Version : 1.0 : Client ID : ID:af50d087-0610-406a-9792-ffa5bf6b68b4:5 : Client Version : 0.51.0 : Client Product : QpidJMS
> 2021-03-25 17:55:43,747 INFO [IO-/127.0.0.1:56012] (q.m.a.denied) - [con:901(ARP@/127.0.0.1:56012/default)] ACL-1002 : Denied : Access Virtualhost NAME=default,VIRTUALHOST_NAME=default
> 2021-03-25 17:55:43,755 INFO [Broker-Config] (q.m.c.dropped_connection) - [con:901(ARP@/127.0.0.1:56012/default)] CON-1004 : Connection dropped
> {noformat}
> The issue seems to be quite rare: the same test was run in a loop over several days and the issue was not triggered.

-- This message was sent by Atlassian Jira (v8.3.4#803005)