[ 
https://issues.apache.org/jira/browse/AMQ-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher L. Shannon reopened AMQ-9107:
-----------------------------------------

This change doesn't have nearly enough testing around durable subscriptions, 
and it causes a memory leak. ConsumerInfo is a poor choice of key because that 
object changes every time a consumer goes offline/online for a durable 
subscription. I just ran a test where I went online/offline several times in a 
row for a durable subscription, and the new map kept growing because the 
consumer id changed each time.
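
The growth described above can be illustrated with a minimal, self-contained 
sketch. It does not use ActiveMQ classes: the class name, the id format, and 
the "client-1:sub-A" key are all hypothetical stand-ins. The assumption being 
modelled is that a durable subscription survives while its consumer is offline, 
so nothing removes the old entry before the next reconnect adds a new one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the leak; not the broker's actual data structure.
final class DurableKeyLeakSketch {

    /**
     * Simulates `cycles` online/offline cycles of one durable subscription and
     * returns how many entries the index map holds afterwards.
     */
    static int entriesAfterReconnects(int cycles, boolean stableKey) {
        Map<String, String> index = new HashMap<>();
        for (int cycle = 1; cycle <= cycles; cycle++) {
            // Assumption: each reconnect gets a fresh consumer id.
            String consumerId = "ID:client-1:" + cycle + ":1";
            // Stable alternative (hypothetical): client id plus subscription name.
            String key = stableKey ? "client-1:sub-A" : consumerId;
            index.put(key, "subscription sub-A");
            // Going offline: the durable subscription is kept, so no entry
            // is removed here. With a per-session key the old entry is orphaned.
        }
        return index.size();
    }

    public static void main(String[] args) {
        System.out.println("per-session key: " + entriesAfterReconnects(5, false)); // 5
        System.out.println("stable key:      " + entriesAfterReconnects(5, true));  // 1
    }
}
```

With a key that changes per session the map gains one entry per reconnect, 
while a key that identifies the subscription itself stays at a single entry.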

The performance improvement would be nice, but I'm skeptical about using 
ConsumerInfo as a key here unless a lot more testing is done to prove that 
there are no memory leaks and that things actually work for all edge cases, 
including broker restarts.

Also, introducing a second map opens the door to race conditions, because the 
two maps can get out of sync if any code path updates one without the other.
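
As a rough illustration of the concern, here is one way two related maps can be 
kept consistent: every mutation touches both maps under the same lock. This is 
only a sketch under assumed names (PairedIndexSketch, byName, byConsumer are 
hypothetical) and is not the approach taken in the patch under discussion.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical paired index; both maps are only ever changed together.
final class PairedIndexSketch {
    private final Map<String, String> byName = new HashMap<>();     // name -> consumer key
    private final Map<String, String> byConsumer = new HashMap<>(); // consumer key -> name

    /** Registers or re-registers a subscription, dropping any stale entries. */
    synchronized void add(String name, String consumerKey) {
        String oldKey = byName.put(name, consumerKey);
        if (oldKey != null && !oldKey.equals(consumerKey)) {
            byConsumer.remove(oldKey); // the previous key would otherwise leak
        }
        String oldName = byConsumer.put(consumerKey, name);
        if (oldName != null && !oldName.equals(name)) {
            byName.remove(oldName); // the key now belongs to a different name
        }
    }

    /** Removes a subscription from both maps in one step. */
    synchronized void remove(String name) {
        String key = byName.remove(name);
        if (key != null) {
            byConsumer.remove(key);
        }
    }

    synchronized int size() {
        return byName.size();
    }

    /** Cheap invariant check that a test can assert after every operation. */
    synchronized boolean inSync() {
        return byName.size() == byConsumer.size();
    }
}
```

If the two maps were updated in separate steps without a shared lock, a 
concurrent add and remove could leave an entry in one map but not the other, 
which is the kind of drift the comment warns about.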

> Closing many consumers causes CPU to spike to 100%
> --------------------------------------------------
>
>                 Key: AMQ-9107
>                 URL: https://issues.apache.org/jira/browse/AMQ-9107
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.17.1, 5.16.5
>            Reporter: Lucas Tétreault
>            Assignee: Jean-Baptiste Onofré
>            Priority: Major
>             Fix For: 5.18.0, 5.16.6, 5.17.3
>
>         Attachments: example.zip, image-2022-10-07-00-12-39-657.png, 
> image-2022-10-07-00-17-30-657.png
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When there are many consumers (~188k) on a queue, closing them is incredibly 
> expensive and causes the CPU to spike to 100% while the consumers are closed. 
> Tested on an Amazon MQ mq.m5.large instance (2 vCPUs, 8 GB memory).
> I have attached a minimal recreation of the issue where the following 
> happens: 
> 1/ Open 100 connections.
> 2/ Create consumers as fast as we can on all of those connections until we 
> hit at least 188k consumers.
> 3/ Sleep for 5 minutes so we can observe the CPU come back down after opening 
> all those connections.
> 4/ Start closing consumers as fast as we can.
> 5/ After all consumers are closed, sleep for 5 minutes to observe the CPU 
> come back down after closing all the connections.
>  
> In this example it seems 5 minutes wasn't actually sufficient time for the 
> CPU to come back down and the consumer and connection counts seem to hit 0 at 
> the same time: 
> !image-2022-10-07-00-12-39-657.png|width=757,height=353!
>  
> In a previous test with more time sleeping after closing all the consumers we 
> can see the CPU come back down before we close the connections. 
> !image-2022-10-07-00-17-30-657.png|width=764,height=348!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
