[
https://issues.apache.org/jira/browse/ARTEMIS-5446?focusedWorklogId=967904&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-967904
]
ASF GitHub Bot logged work on ARTEMIS-5446:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 28/Apr/25 20:52
Start Date: 28/Apr/25 20:52
Worklog Time Spent: 10m
Work Description: jbertram opened a new pull request, #5655:
URL: https://github.com/apache/activemq-artemis/pull/5655
When the broker starts with AMQP protocol support an AckManager instance is
automatically created and added to the broker's list of "external components."
However, this component is not removed when the broker is stopped (e.g. when an
active backup shuts down due to fail-back). Over time these instances can
accumulate and cause memory issues.
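For illustration only, the leak pattern described above can be sketched as follows. This is a minimal model, not the Artemis code: `Broker`, `Component`, `AckManagerLike` and the `removeOnStop` flag are hypothetical names, and the idea that the component keeps a reference to per-activation state (such as the post office) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class ExternalComponentLeakSketch {

    /** Hypothetical stand-in for a broker-managed component. */
    interface Component {
        void stop();
    }

    /** Hypothetical stand-in for the AckManager (assumed to reference per-activation state). */
    static final class AckManagerLike implements Component {
        // Assumption: while this component stays reachable, so does everything it references.
        private final Object perActivationState;

        AckManagerLike(Object perActivationState) {
            this.perActivationState = perActivationState;
        }

        @Override
        public void stop() {
            // Stopping the component does not by itself unregister it from the broker.
        }
    }

    /** Hypothetical broker object that survives repeated start/stop cycles, e.g. a backup during fail-back. */
    static final class Broker {
        private final List<Component> externalComponents = new ArrayList<>();
        private final boolean removeOnStop;
        private Component ackManager;

        Broker(boolean removeOnStop) {
            this.removeOnStop = removeOnStop;
        }

        void start() {
            // A fresh component is created and registered on every start.
            ackManager = new AckManagerLike(new Object());
            externalComponents.add(ackManager);
        }

        void stop() {
            ackManager.stop();
            if (removeOnStop) {
                // Unregistering on stop is what keeps the list from growing.
                externalComponents.remove(ackManager);
            }
            ackManager = null;
        }

        int externalComponentCount() {
            return externalComponents.size();
        }
    }

    /** Runs the given number of start/stop cycles and returns how many components remain registered. */
    static int cycle(boolean removeOnStop, int cycles) {
        Broker broker = new Broker(removeOnStop);
        for (int i = 0; i < cycles; i++) {
            broker.start();
            broker.stop();
        }
        return broker.externalComponentCount();
    }

    public static void main(String[] args) {
        System.out.println("without removal: " + cycle(false, 5)); // prints 5
        System.out.println("with removal:    " + cycle(true, 5));  // prints 0
    }
}
```

In this model each start/stop cycle without removal leaves one more component registered, which is consistent with the growth seen across repeated fail-over and fail-back cycles.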
Issue Time Tracking
-------------------
Worklog Id: (was: 967904)
Remaining Estimate: 0h
Time Spent: 10m
> Memory leak on Artemis backup node with failover & failback
> -----------------------------------------------------------
>
> Key: ARTEMIS-5446
> URL: https://issues.apache.org/jira/browse/ARTEMIS-5446
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.36.0
> Reporter: Jean-Pascal Briquet
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2025-04-24-11-16-45-740.png,
> image-2025-04-24-11-20-25-598.png, image-2025-04-24-11-24-45-343.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> *Description:*
> Backup nodes may encounter OOM errors when the primary becomes unavailable and
> the backup transitions to live. This issue impacts HA, as the backup node may
> become unresponsive and block until the node is restarted by an operator.
> *Analysis:*
> Upon analyzing a heap dump of a backup node, it appears that
> {{PostOfficeImpl}} instances accumulate on the heap each time the
> primary node is restarted. These {{PostOfficeImpl}} objects (and related
> objects like {{QueueImpl}}, {{DivertImpl}}, {{ClusterConnectionImpl}}, ...)
> are not removed by the GC.
> In large configurations (1500 queues or more), this can fill up the heap
> quickly. I tried to trace the source of the problem, but it goes fairly
> deep into Artemis internals.
> Once this state is reached, OOM errors happen randomly, with various stack
> traces:
> {code:java}
> Caused by: java.lang.OutOfMemoryError: Java heap space{code}
> *Reproduction Scenario:*
> * Start a primary/backup pair.
> * Primary node is active and the backup node is synchronized
> * Capture a JVM heap dump (at this stage only a single {{PostOfficeImpl}}
> instance exists on the heap)
> Repeat the following steps multiple times:
> * Stop the primary node
> * Wait for the backup to activate
> * Start the primary node to trigger failback
> * Wait for the primary to activate
> After several cycles, take a final JVM heap dump. You will observe multiple
> {{PostOfficeImpl}} instances lingering in the heap.
> *Example:*
> !image-2025-04-24-11-16-45-740.png|width=503,height=175!
> Another example with a high number of queues in the configuration (see
> retained heap size).
> !image-2025-04-24-11-20-25-598.png|width=749,height=214!
> Each instance of {{PostOfficeImpl}} has a retained size of 970MB.
> !image-2025-04-24-11-24-45-343.png|width=557,height=256!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)