[ 
https://issues.apache.org/jira/browse/ARTEMIS-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18034338#comment-18034338
 ] 

Michael Shemesh commented on ARTEMIS-5735:
------------------------------------------

I suggest applying the following patch to avoid this issue:
{code:java}
Subject: [PATCH] Avoid race condition with Orphaned consumers
---
Index: artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ActiveMQServerImpl.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ActiveMQServerImpl.java b/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ActiveMQServerImpl.java
--- a/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ActiveMQServerImpl.java	(revision b4d3a776499cb3ef9a350107faa998c81b20c3e6)
+++ b/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ActiveMQServerImpl.java	(date 1761897778629)
@@ -1915,8 +1915,11 @@
       }
 
       ServerSessionImpl session = new ServerSessionImpl(name, username, password, validatedUser, minLargeMessageSize, autoCommitSends, autoCommitAcks, preAcknowledge, configuration.isPersistDeliveryCountBeforeDelivery(), xa, connection, storageManager, postOffice, resourceManager, securityStore, managementService, this, configuration.getManagementAddress(), defaultAddress == null ? null : SimpleString.of(defaultAddress), callback, context, pagingManager, prefixes, securityDomain, isLegacyProducer);
-
       sessions.put(name, session);
+      connection.addFailureListener(session);
+      if (connection.isDestroyed()) {
+         session.close(true);
+      }
 
       if (hasBrokerSessionPlugins()) {
          callBrokerSessionPlugins(plugin -> plugin.afterCreateSession(session));
Index: artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ServerSessionImpl.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ServerSessionImpl.java b/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ServerSessionImpl.java
--- a/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ServerSessionImpl.java	(revision b4d3a776499cb3ef9a350107faa998c81b20c3e6)
+++ b/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ServerSessionImpl.java	(date 1761897778640)
@@ -280,7 +280,6 @@
 
       this.defaultAddress = defaultAddress;
 
-      remotingConnection.addFailureListener(this);
       this.context = context;
 
       this.sessionExecutor = server.getExecutorFactory().getExecutor();
 {code}
In short, the failure listener is registered only after the session has been 
added to the sessions map. Immediately afterwards, the patch checks whether 
the connection was destroyed in the meantime and, if so, closes the session.

This adds no new synchronized blocks, so performance should be essentially 
unchanged.
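
To illustrate the ordering the patch relies on, here is a minimal, 
self-contained sketch. It is not the Artemis code: Connection, Session and 
createSession are hypothetical stand-ins, and the sketch assumes that the 
connection sets its destroyed flag before it notifies its failure listeners. 
Under that assumption, a session created around the time the connection fails 
is either notified through the listener or caught by the isDestroyed() check.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified stand-ins for the broker classes (not the Artemis API).
final class OrphanedSessionSketch {

   interface FailureListener {
      void connectionFailed();
   }

   static final class Connection {
      private final List<FailureListener> listeners = new CopyOnWriteArrayList<>();
      private volatile boolean destroyed;

      void addFailureListener(FailureListener listener) {
         listeners.add(listener);
      }

      boolean isDestroyed() {
         return destroyed;
      }

      // Assumption: the flag is set BEFORE the listeners are notified.
      void fail() {
         destroyed = true;
         for (FailureListener listener : listeners) {
            listener.connectionFailed();
         }
      }
   }

   static final class Session implements FailureListener {
      private final String name;
      private final Map<String, Session> sessions;
      private final AtomicBoolean closed = new AtomicBoolean();

      Session(String name, Map<String, Session> sessions) {
         this.name = name;
         this.sessions = sessions;
      }

      // Idempotent: the listener and the isDestroyed() check may both call this.
      void close() {
         if (closed.compareAndSet(false, true)) {
            sessions.remove(name);
         }
      }

      boolean isClosed() {
         return closed.get();
      }

      @Override
      public void connectionFailed() {
         close();
      }
   }

   // Same order as the patch: put in the map, register the listener, re-check.
   static Session createSession(String name, Connection connection, Map<String, Session> sessions) {
      Session session = new Session(name, sessions);
      sessions.put(name, session);
      connection.addFailureListener(session);
      if (connection.isDestroyed()) {
         session.close();
      }
      return session;
   }

   public static void main(String[] args) {
      Map<String, Session> sessions = new ConcurrentHashMap<>();

      // Case 1: the connection fails after the session was fully created.
      Connection first = new Connection();
      Session early = createSession("early", first, sessions);
      first.fail();
      System.out.println("early closed=" + early.isClosed() + ", map empty=" + sessions.isEmpty());

      // Case 2: the connection has already failed when the session is created.
      Connection second = new Connection();
      second.fail();
      Session late = createSession("late", second, sessions);
      System.out.println("late closed=" + late.isClosed() + ", map empty=" + sessions.isEmpty());
   }
}
```

In both cases the session ends up closed and removed from the map. Because 
close() is idempotent in this sketch, it does not matter if the listener and 
the isDestroyed() check both fire for the same session.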

> Queue is stuck with Orphaned consumer and no longer consumes messages
> ---------------------------------------------------------------------
>
>                 Key: ARTEMIS-5735
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-5735
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>    Affects Versions: 2.35.0
>            Reporter: Michael Shemesh
>            Priority: Major
>         Attachments: artemis_consumer_issue.tar, 
> image-2025-10-31-09-08-39-117.png
>
>
> h3. *Problem Description*
> Under heavy load, queues get stuck and their consumers stop receiving 
> messages.
> The cause appears to be a single Orphaned consumer holding messages in 
> transit, so no further messages are processed by the other consumers.
> This is how it looks on our web console:
> !image-2025-10-31-09-08-39-117.png|width=717,height=157!
>  
> This seems to be related to issue 
> https://issues.apache.org/jira/browse/ARTEMIS-4476
> The visibility added there helped in understanding the issue, but it did not 
> fix it.
>  
> ----
> h3. *Technical Information*
> After some investigation, I found that the issue is a race condition 
> between the closing of a connection and the creation of a new session.
> I have created an example application that reproduces one part of the 
> issue. It shows that *ServerSessionImpl.connectionFailed* can miss the 
> connection failure event when the connection is closed before the session 
> has been registered in the connection's 
> {*}AbstractRemotingConnection.failureListeners{*}.
> In such cases an Orphaned session and consumer block the usage of the 
> queue.
> In more severe cases (our production) the session is in fact registered in 
> the {*}failureListeners{*}, but it has not yet been added to the sessions 
> map. The session is closed first and only then added to the sessions map, 
> after which it cannot be removed by manually closing the session from the 
> web console.
> The relevant code part with the race condition is in {*}ActiveMQServerImpl{*}:
> {code:java}
> ServerSessionImpl session = new ServerSessionImpl(...);
> sessions.put(name, session); {code}
> and *AbstractRemotingConnection.callFailureListeners*
>  
> The attached project *artemis_consumer_issue.tar* is what I used to 
> reproduce the simpler case of an Orphaned connection that can still be 
> closed manually.
> The *README.md* file in the project explains how to reproduce the issue.
> This is the general process:
>  # Modify the code to delay delivery of the *failureListeners* events by 
> adding a sleep.
>  # Set a very short TTL in the broker.xml to force the 
> *FailureCheckAndFlushThread* process to close the connections.
>  # Add some messages to a queue while there are no consumers yet.
>  # Start up the consumers (8 in this case).
>  # The 8 consumers take longer to handle the messages than the configured 
> TTL.
>  # Each connection is created and then terminated shortly afterwards.
> This triggers a recreation of the sessions and consumers while the 
> *failureListeners* are still stuck in the sleep.
>  



