[
https://issues.apache.org/jira/browse/AMQ-9107?focusedWorklogId=822456&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-822456
]
ASF GitHub Bot logged work on AMQ-9107:
---------------------------------------
Author: ASF GitHub Bot
Created on: 01/Nov/22 20:20
Start Date: 01/Nov/22 20:20
Worklog Time Spent: 10m
Work Description: cshannon commented on PR #928:
URL: https://github.com/apache/activemq/pull/928#issuecomment-1299090316
@mattrpav, @jbonofre, @lucastetreault - This is ready for review, see what
you think. Part of this commit reverts the previous change and updates with a
new test.
Technically, a ConsumerInfo can be associated with multiple Subscriptions,
and ConsumerInfo objects can change when a durable subscription goes offline
and comes back online. Because of that, I thought it was generally a bad idea
to store a map associating them: you would have to keep it up to date as
durables change (the original PR had a memory leak because of that), and it
doesn't really work for the one-to-many association of composite
destinations, even if those are rarely used.
The new approach here is to just use the existing Regions, which already
handle all of that and track consumer ids and subscriptions in a concurrent
map. All we have to do is look through the region, or regions if the
destination is composite, to grab the subs, and then we are good. We
shouldn't need it, but I added a fail-safe that falls back on an Exception or
if no sub is found using the new method. That shouldn't happen unless
something went very wrong.
I would expect performance to still be much better than before, and similar
to the previous commit, as we are still just looking up the subscriptions in
concurrent maps rather than iterating.
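As a rough illustration of the lookup described above (not the actual code in
the PR), here is a minimal Java sketch. The `Region` and `Subscription`
classes, the `findSubscriptions` and `scanAll` methods, and the use of plain
strings as consumer ids are all hypothetical stand-ins, assumed only for this
example. Each region keeps a concurrent map from consumer id to subscription,
the lookup checks every candidate region, and a slow scan is kept as the
fail-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SubscriptionLookupSketch {

    // Hypothetical stand-in for a broker-side subscription.
    static class Subscription {
        final String consumerId;
        final String destination;

        Subscription(String consumerId, String destination) {
            this.consumerId = consumerId;
            this.destination = destination;
        }
    }

    // Hypothetical stand-in for a region that indexes its subscriptions by consumer id.
    static class Region {
        final Map<String, Subscription> subscriptions = new ConcurrentHashMap<>();

        void add(Subscription sub) {
            subscriptions.put(sub.consumerId, sub);
        }

        void remove(String consumerId) {
            subscriptions.remove(consumerId);
        }
    }

    // Fast path: one map lookup per region (several regions if the destination is composite).
    static List<Subscription> findSubscriptions(String consumerId, List<Region> regions) {
        List<Subscription> found = new ArrayList<>();
        try {
            for (Region region : regions) {
                Subscription sub = region.subscriptions.get(consumerId);
                if (sub != null) {
                    found.add(sub);
                }
            }
        } catch (RuntimeException e) {
            // Fail-safe: discard partial results and use the slow scan below.
            found.clear();
        }
        if (found.isEmpty()) {
            found = scanAll(consumerId, regions);
        }
        return found;
    }

    // Slow path: iterate over every subscription in every region.
    static List<Subscription> scanAll(String consumerId, List<Region> regions) {
        List<Subscription> found = new ArrayList<>();
        for (Region region : regions) {
            for (Subscription sub : region.subscriptions.values()) {
                if (sub.consumerId.equals(consumerId)) {
                    found.add(sub);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Region queues = new Region();
        Region topics = new Region();
        // One consumer with a subscription in each region, as with a composite destination.
        queues.add(new Subscription("c1", "queue://A"));
        topics.add(new Subscription("c1", "topic://B"));
        System.out.println(findSubscriptions("c1", List.of(queues, topics)).size()); // prints 2
    }
}
```

Because the regions own the maps, removing a consumer from a region also
removes it from the lookup, so there is no separate association to keep in
sync when a durable subscription goes offline and comes back.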
Issue Time Tracking
-------------------
Worklog Id: (was: 822456)
Time Spent: 3h 10m (was: 3h)
> Closing many consumers causes CPU to spike to 100%
> --------------------------------------------------
>
> Key: AMQ-9107
> URL: https://issues.apache.org/jira/browse/AMQ-9107
> Project: ActiveMQ
> Issue Type: Bug
> Affects Versions: 5.17.1, 5.16.5
> Reporter: Lucas Tétreault
> Assignee: Jean-Baptiste Onofré
> Priority: Major
> Fix For: 5.18.0, 5.16.6, 5.17.3
>
> Attachments: example.zip, image-2022-10-07-00-12-39-657.png,
> image-2022-10-07-00-17-30-657.png
>
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> When there are many consumers (~188k) on a queue, closing them is incredibly
> expensive and causes the CPU to spike to 100% while the consumers are closed.
> Tested on an Amazon MQ mq.m5.large instance (2 vCPU, 8 GB memory).
> I have attached a minimal recreation of the issue where the following
> happens:
> 1/ Open 100 connections.
> 2/ Create consumers as fast as we can on all of those connections until we
> hit at least 188k consumers.
> 3/ Sleep for 5 minutes so we can observe the CPU come back down after opening
> all those connections.
> 4/ Start closing consumers as fast as we can.
> 5/ After all consumers are closed, sleep for 5 minutes to observe the CPU
> come back down after closing all the connections.
>
> In this example it seems 5 minutes wasn't actually sufficient time for the
> CPU to come back down and the consumer and connection counts seem to hit 0 at
> the same time:
> !image-2022-10-07-00-12-39-657.png|width=757,height=353!
>
> In a previous test with more time sleeping after closing all the consumers we
> can see the CPU come back down before we close the connections.
> !image-2022-10-07-00-17-30-657.png|width=764,height=348!