[ 
https://issues.apache.org/jira/browse/AMQ-9107?focusedWorklogId=815473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-815473
 ]

ASF GitHub Bot logged work on AMQ-9107:
---------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Oct/22 07:11
            Start Date: 11/Oct/22 07:11
    Worklog Time Spent: 10m 
      Work Description: lucastetreault opened a new pull request, #908:
URL: https://github.com/apache/activemq/pull/908

   Running a profiler while executing the sample code attached to 
[AMQ-9107](https://issues.apache.org/jira/browse/AMQ-9107) identified 
ManagedRegionBroker.removeConsumer as the bottleneck. The existing 
implementation loops over all the subscriptions to find the subscription for 
the consumer being closed, so each call is O(n) in the number of 
subscriptions. Closing all n consumers is therefore O(n^2), and when n is 
large enough this creates a serious performance issue. With 188,000 consumers 
we observe the CPU at 100% for ~40 minutes while all the connections are closed: 
   
   <img width="1217" alt="image" 
src="https://user-images.githubusercontent.com/7095337/195011857-a6971abb-b73c-41fd-bd88-9ab376388949.png">
   
   
   With this PR, the same test case shows a CPU spike of one minute or less, 
similar to the time it took to create the consumers: 
   
   <img width="968" alt="image" 
src="https://user-images.githubusercontent.com/7095337/195017869-c17c8b4a-fabc-4c2c-a909-6073955613a1.png">
   
   I ran the full suite of tests and everything is passing.
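
   To illustrate why a per-removal scan adds up to quadratic work, here is a 
minimal, self-contained sketch. It is not the ActiveMQ code and not necessarily 
the approach taken in this PR: `Sub`, `removeByScan`, `removeByLookup` and the 
other names are hypothetical, and the map keyed by consumer id is just one 
common way to avoid the scan. The sketch counts how many entries are inspected 
when all consumers are closed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ConsumerRemovalSketch {

    /** Hypothetical stand-in for a broker subscription; not an ActiveMQ class. */
    static final class Sub {
        final String consumerId;
        Sub(String consumerId) { this.consumerId = consumerId; }
    }

    /** Linear scan: walks the list until the matching subscription is found.
     *  Returns the number of entries inspected. */
    static long removeByScan(List<Sub> subs, String consumerId) {
        long inspected = 0;
        for (Iterator<Sub> it = subs.iterator(); it.hasNext(); ) {
            inspected++;
            if (it.next().consumerId.equals(consumerId)) {
                it.remove();
                break;
            }
        }
        return inspected;
    }

    /** Keyed lookup: a single hash-map operation per removal. */
    static long removeByLookup(Map<String, Sub> subs, String consumerId) {
        subs.remove(consumerId);
        return 1;
    }

    /** Closes n consumers newest-first, the worst case for the scan. */
    static long closeAllByScan(int n) {
        List<Sub> subs = new ArrayList<>();
        for (int i = 0; i < n; i++) subs.add(new Sub("c" + i));
        long total = 0;
        for (int i = n - 1; i >= 0; i--) total += removeByScan(subs, "c" + i);
        return total;
    }

    /** Closes n consumers using the keyed lookup. */
    static long closeAllByLookup(int n) {
        Map<String, Sub> subs = new HashMap<>();
        for (int i = 0; i < n; i++) subs.put("c" + i, new Sub("c" + i));
        long total = 0;
        for (int i = n - 1; i >= 0; i--) total += removeByLookup(subs, "c" + i);
        return total;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("scan inspected:   " + closeAllByScan(n));
        System.out.println("lookup inspected: " + closeAllByLookup(n));
    }
}
```

   In this worst-case ordering the scan inspects n(n+1)/2 entries in total, 
while the keyed lookup performs n operations. At n = 188,000 that is roughly 
1.8 * 10^10 inspections versus 188,000.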
   
   




Issue Time Tracking
-------------------

            Worklog Id:     (was: 815473)
    Remaining Estimate: 0h
            Time Spent: 10m

> Closing many consumers causes CPU to spike to 100%
> --------------------------------------------------
>
>                 Key: AMQ-9107
>                 URL: https://issues.apache.org/jira/browse/AMQ-9107
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.17.1, 5.16.5
>            Reporter: Lucas Tétreault
>            Assignee: Jean-Baptiste Onofré
>            Priority: Major
>         Attachments: example.zip, image-2022-10-07-00-12-39-657.png, 
> image-2022-10-07-00-17-30-657.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When there are many consumers (~188k) on a queue, closing them is incredibly 
> expensive and causes the CPU to spike to 100% while the consumers are closed. 
> Tested on an Amazon MQ mq.m5.large instance (2 vcpu, 8gb memory).
> I have attached a minimal recreation of the issue where the following 
> happens: 
> 1/ Open 100 connections.
> 2/ Create consumers as fast as we can on all of those connections until we 
> hit at least 188k consumers.
> 3/ Sleep for 5 minutes so we can observe the CPU come back down after opening 
> all those connections.
> 4/ Start closing consumers as fast as we can.
> 5/ After all consumers are closed, sleep for 5 minutes to observe the CPU 
> come back down after closing all the connections.
>  
> In this example it seems 5 minutes wasn't actually sufficient time for the 
> CPU to come back down and the consumer and connection counts seem to hit 0 at 
> the same time: 
> !image-2022-10-07-00-12-39-657.png|width=757,height=353!
>  
> In a previous test with more time sleeping after closing all the consumers we 
> can see the CPU come back down before we close the connections. 
> !image-2022-10-07-00-17-30-657.png|width=764,height=348!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
