Jean-Pascal Briquet created ARTEMIS-5446:
--------------------------------------------

             Summary: Memory leak on Artemis backup node with HA replication 
policy and Zookeeper quorum
                 Key: ARTEMIS-5446
                 URL: https://issues.apache.org/jira/browse/ARTEMIS-5446
             Project: ActiveMQ Artemis
          Issue Type: Bug
          Components: Broker
    Affects Versions: 2.36.0
            Reporter: Jean-Pascal Briquet
         Attachments: image-2025-04-24-11-16-45-740.png, 
image-2025-04-24-11-20-25-598.png, image-2025-04-24-11-24-45-343.png

*Description:*
Backup nodes may encounter OOM errors when the primary becomes unavailable and 
the backup transitions to live.
This issue impacts HA, as the backup node may become unresponsive and block 
until the node is restarted by an operator.

 

*Analysis:*

A heap dump of a backup node shows that PostOfficeImpl instances accumulate 
on the heap each time the primary node is restarted.
These PostOfficeImpl objects (and related objects such as QueueImpl, DivertImpl, 
ClusterConnectionImpl, ...) are not reclaimed by the GC.

In large configurations (1500 queues or more), this can fill up the heap 
quickly.
I tried to trace back the source of the problem, but it goes fairly deep into 
Artemis internals.

Once this state is reached, OOM errors occur at unpredictable points, with varying stack traces:
{code:java}
Caused by: java.lang.OutOfMemoryError: Java heap space{code}
 

*Reproduction Scenario:*
 * Start a primary/backup pair.
 * Confirm the primary node is live and the backup node is synchronized.
 * Capture a JVM heap dump (at this stage only a single PostOfficeImpl 
instance exists on the heap).

Repeat the following steps multiple times:
 * Stop the primary node
 * Wait for the backup to become live
 * Start the primary node
 * The backup gives the lead back to the primary
 * Wait for the primary to become live

After several cycles, capture a final JVM heap dump. You will observe multiple 
PostOfficeImpl instances lingering on the heap.
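
As a lighter-weight check between cycles, the instance count can also be read from a class histogram instead of a full heap dump. This is a different technique from the heap dump analysis described in this report. The sketch below assumes the histogram was saved with `jmap -histo:live <pid> > histo.txt` and sums the `#instances` column for classes whose simple name matches. The class name `HistoCount` and the method `countInstances` are hypothetical helpers written for this illustration, not part of Artemis.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper: counts instances of a class in saved `jmap -histo` output. */
public class HistoCount {

    /**
     * Sums the #instances column for every histogram row whose class name
     * ends with "." + simpleName (inner classes such as Foo$1 are not matched).
     */
    static long countInstances(List<String> histoLines, String simpleName) {
        long total = 0;
        for (String line : histoLines) {
            String[] cols = line.trim().split("\\s+");
            // Data rows look like: "  12:   3   168  com.example.Foo (module)".
            // Header, separator, and "Total" rows do not start with "<rank>:".
            if (cols.length < 4 || !cols[0].endsWith(":")) {
                continue;
            }
            if (cols[3].endsWith("." + simpleName)) {
                total += Long.parseLong(cols[1]);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HistoCount.java <histo-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("PostOfficeImpl instances: "
                + countInstances(lines, "PostOfficeImpl"));
    }
}
```

Running this against the backup node after each stop/start cycle should show whether the count grows. Going by the scenario above, a healthy backup would be expected to report a single instance.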


*Example:*

!image-2025-04-24-11-16-45-740.png|width=503,height=175!

 

Another example with a high number of Queues in configuration (see retained 
heap size).

!image-2025-04-24-11-20-25-598.png|width=749,height=214!

Each instance of PostOfficeImpl has a retained size of 970 MB.

!image-2025-04-24-11-24-45-343.png|width=557,height=256!

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
