[ 
https://issues.apache.org/jira/browse/QPID-8514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331563#comment-17331563
 ] 

Alex Rudyy commented on QPID-8514:
----------------------------------

Hi [~daniel.kirilyuk],
I committed a change 
[5db3f7c|https://gitbox.apache.org/repos/asf?p=qpid-broker-j.git;h=5db3f7c] for 
a ring overflow policy. It allows to delete the next available entry after 
least significant one (oldest in case of standard queue) when the least 
significant entry is acquired by the consumer using acquired non-stealable 
state. The Broker cannot delete the messages in acquired non-stealable state. 
As result, previously, the same message was returned by the queue (as the 
deletion did happen), causing the loop in the ring overflow policy. The 
implemented change should prevent the CPU spinning in the ring policy loop on 
running into acquired non-stealable message.

IMHO, it should be a reasonable compromise, as a message transition into 
acquired non-stealable state happens when a consumer settles the message. The 
non-transacted messaging use cases should not affected by the reported issue, 
as the message deletion happens immediately after settling. Thus, the message 
would be in acquired non-stealable state for a very short amount time. As for  
transacted messaging use cases, the consumer can commit its transaction with 
some delay after messages being settled. Thus, the transacted settled message  
can be in acquired non-stealable state for some time until consuming 
transaction is committed. I strongly believe that messages consumed using 
Session.CLIENT_ACKNOWLEDGE are not effected by the issue. Though, if you see 
the issue manifestation for Session.CLIENT_ACKNOWLEDGE please let me know. 

The idea behind the acquired non-stealable state is that message cannot be 
deleted on the broker side when it is in the acquired state. It was added to 
solve some concurrency issues on concurrent message deletion from multiple 
threads (for example, consumer thread and management thread performing queue 
clean up). 

Potentially, the issue can be solved by not putting the message into acquired 
non-stealable state on settling transacted message. For example, transacted 
messages can remain in the acquired steabale state until transaction is 
committed. Though, that might require a lot more changes to the code. Please 
feel free to try to implement it, if the provided implementation is not 
acceptable for you.

Kind Regards,
Alex




> High CPU usage and stucked connections
> --------------------------------------
>
>                 Key: QPID-8514
>                 URL: https://issues.apache.org/jira/browse/QPID-8514
>             Project: Qpid
>          Issue Type: Bug
>          Components: Broker-J
>    Affects Versions: qpid-java-broker-8.0.4
>            Reporter: Daniil Kirilyuk
>            Priority: Minor
>         Attachments: QPID-8514 - broker-log.zip, 
> RingPolicyConcurrencyTest.java, threaddump.zip
>
>
> There were around 600 connections in state CLOSE__WAIT: 
> {noformat}
> [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep TIME_WAIT | wc -l
> 38
> [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep CLOSE_WAIT | wc -l
> 625
> [prod01@fdmchd log]$ netstat -Ainet | grep 22101 | grep -v CLOSE_WAIT | grep 
> -V TIME_WAIT | wc -l
> 7
> {noformat}
> CPU usage raises up to 90% and higher.
> Connections couldn't be closed by webgui. Broker log contains message below, 
> but connections were not closed.
> {noformat}
> 2021-03-26 15:12:28,474 INFO  [Broker-Config] (q.m.c.model_delete) - 
> [mng:3hpU+miq(admin@/127.0.0.1:62743)] 
> [con:751(TEST@/127.0.0.1:40422/default)] CON-1007 : Connection close 
> initiated by operator
> {noformat}
>  Thread dumps shows around 49 threads with same stacktrace related to ring 
> queue
> {noformat}
> java.lang.Thread.State: RUNNABLE
>       at 
> org.apache.qpid.server.queue.QueueEntryImpl.addStateChangeListener(QueueEntryImpl.java:674)
>       at 
> org.apache.qpid.server.queue.QueueEntryImpl.acquireOrSteal(QueueEntryImpl.java:316)
>       at 
> org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1913)
>       at 
> org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1863)
>       at 
> org.apache.qpid.server.queue.RingOverflowPolicyHandler$Handler.checkOverflow(RingOverflowPolicyHandler.java:102)
>       at 
> org.apache.qpid.server.queue.RingOverflowPolicyHandler$Handler.access$000(RingOverflowPolicyHandler.java:44)
>       at 
> org.apache.qpid.server.queue.RingOverflowPolicyHandler.checkOverflow(RingOverflowPolicyHandler.java:41)
>       at 
> org.apache.qpid.server.queue.AbstractQueue.doEnqueue(AbstractQueue.java:1382)
>       at 
> org.apache.qpid.server.queue.AbstractQueue.enqueue(AbstractQueue.java:1290)
>       at 
> org.apache.qpid.server.message.RoutingResult$1.postCommit(RoutingResult.java:135)
>       at 
> org.apache.qpid.server.txn.AsyncAutoCommitTransaction$3.postCommit(AsyncAutoCommitTransaction.java:309)
>       at 
> org.apache.qpid.server.txn.AsyncCommand.complete(AsyncCommand.java:84)
>       at 
> org.apache.qpid.server.protocol.v1_0.StandardReceivingLinkEndpoint.receiveComplete(StandardReceivingLinkEndpoint.java:588)
>       at 
> org.apache.qpid.server.protocol.v1_0.Session_1_0.lambda$receivedComplete$5(Session_1_0.java:1343)
>       at 
> org.apache.qpid.server.protocol.v1_0.Session_1_0$$Lambda$142/959267941.accept(Unknown
>  Source)
>       at java.lang.Iterable.forEach(Iterable.java:75)
>       at 
> org.apache.qpid.server.protocol.v1_0.Session_1_0.receivedComplete(Session_1_0.java:1343)
>       at 
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.lambda$receivedComplete$11(AMQPConnection_1_0Impl.java:1349)
>       at 
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl$$Lambda$141/1774823837.run(Unknown
>  Source)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at 
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.receivedComplete(AMQPConnection_1_0Impl.java:1347)
>       at 
> org.apache.qpid.server.protocol.v1_0.AMQPConnection_1_0Impl.onReceive(AMQPConnection_1_0Impl.java:1328)
>       at 
> org.apache.qpid.server.transport.AbstractAMQPConnection.lambda$received$2(AbstractAMQPConnection.java:576)
>       at 
> org.apache.qpid.server.transport.AbstractAMQPConnection$$Lambda$119/6380061.run(Unknown
>  Source)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at 
> org.apache.qpid.server.transport.AbstractAMQPConnection.received(AbstractAMQPConnection.java:571)
>       at 
> org.apache.qpid.server.transport.MultiVersionProtocolEngine.received(MultiVersionProtocolEngine.java:135)
>       at 
> org.apache.qpid.server.transport.NonBlockingConnection.processAmqpData(NonBlockingConnection.java:611)
>       at 
> org.apache.qpid.server.transport.NonBlockingConnectionPlainDelegate.processData(NonBlockingConnectionPlainDelegate.java:58)
>       at 
> org.apache.qpid.server.transport.NonBlockingConnection.doRead(NonBlockingConnection.java:496)
>       at 
> org.apache.qpid.server.transport.NonBlockingConnection.doWork(NonBlockingConnection.java:270)
>       at 
> org.apache.qpid.server.transport.NetworkConnectionScheduler.processConnection(NetworkConnectionScheduler.java:134)
>       at 
> org.apache.qpid.server.transport.SelectorThread$ConnectionProcessor.processConnection(SelectorThread.java:575)
>       at 
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.performSelect(SelectorThread.java:366)
>       at 
> org.apache.qpid.server.transport.SelectorThread$SelectionTask.run(SelectorThread.java:97)
>       at 
> org.apache.qpid.server.transport.SelectorThread.run(SelectorThread.java:533)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at 
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
>       at 
> org.apache.qpid.server.bytebuffer.QpidByteBufferFactory$$Lambda$10/1487500813.run(Unknown
>  Source)
>       at java.lang.Thread.run(Thread.java:748)
> {noformat}
> Also ring queue behaves strangely: queue "{{DeadLetterQueue"}} has set 
> maximum message depth to {{1}}, but in there were 55 message with 
> {{Acquired}} state.
>  First message in {{DeadLetterQueue}} - {{2021-03-25, 16:50:59.381 (00:00 
> UTC)}}.
>  Last message in {{DeadLetterQueue}} - {{2021-03-25, 16:50:59.683 (00:00 
> UTC)}}.
> These times correlates to timed out drain request at 16:52:00,410, because 
> default drain request is set 60 seconds:
> {noformat}
> ^[[0m^[[0m2021-03-25 16:52:00,408 INFO  [org.apache.qpid.jms.JmsSession] 
> (QpidJMS Connection Executor: ID:986cd915-d072-4711-bd71-50a74aab4be8:1) A 
> JMS MessageConsumer has been closed: JmsConsumerInfo: { 
> ID:986cd915-d072-4711-bd71-50a74aab4be8:1:1:1, destination = 
> broadcast.XX.PRISMA_LIVERiskLimitTrigger }
> ^[[0m^[[33m2021-03-25 16:52:00,410 WARN  
> [org.apache.camel.component.jms.DefaultJmsMessageListenerContainer] (Camel 
> (camel-1) thread #3 - JmsConsumer[broadcast.XX.PRISMA_LIVERiskLimitTrigger]) 
> Setup of JMS message listener invoker failed for destination 
> 'broadcast.XX.PRISMA_LIVERiskLimitTrigger' - trying to recover. Cause: Remote 
> did not respond to a drain request in time
> ^[[0m^[[33m2021-03-25 16:52:01,414 WARN  
> [org.apache.camel.component.jms.DefaultJmsMessageListenerContainer] (Camel 
> (camel-1) thread #6 - JmsConsumer[broadcast.XX.PRISMA_LIVERiskLimitTrigger]) 
> Setup of JMS message listener invoker failed for destination 
> 'broadcast.XX.PRISMA_LIVERiskLimitTrigger'
> {noformat}
> After the problem was triggered broker started to reject client connections:
> {noformat}
>  2021-03-25 17:55:43,716 INFO  [Broker-Config] (q.m.c.open) - 
> [con:901(/127.0.0.1:56012)] CON-1001 : Open : Destination : 
> amqp(127.0.0.1:22101) : Protocol Version : 1.0
> 2021-03-25 17:55:43,745 INFO  [IO-/127.0.0.1:56012] (q.m.c.open) - 
> [con:901(ARP@/127.0.0.1:56012/default)] CON-1001 : Open : Destination : 
> amqp(172.16.75.197:22101) : Protocol Version : 1.0 : Client ID : 
> ID:af50d087-0610-406a-9792-ffa5bf6b68b4:5 : Client Version : 0.51.0 : Client 
> Product : QpidJMS
> 2021-03-25 17:55:43,747 INFO  [IO-/127.0.0.1:56012] (q.m.a.denied) - 
> [con:901(ARP@/127.0.0.1:56012/default)] ACL-1002 : Denied : Access 
> Virtualhost NAME=default,VIRTUALHOST_NAME=default
> 2021-03-25 17:55:43,755 INFO  [Broker-Config] (q.m.c.dropped_connection) - 
> [con:901(ARP@/127.0.0.1:56012/default)] CON-1004 : Connection dropped
> {noformat}
> Issue seems to be quite rare: same test ran in a loop over several days and 
> this issue was not triggered.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org

Reply via email to