Michael Shemesh created ARTEMIS-5735:
----------------------------------------

             Summary: Queue is stuck with Orphaned consumer and no longer 
consumes messages
                 Key: ARTEMIS-5735
                 URL: https://issues.apache.org/jira/browse/ARTEMIS-5735
             Project: ActiveMQ Artemis
          Issue Type: Bug
    Affects Versions: 2.35.0
            Reporter: Michael Shemesh
         Attachments: artemis_consumer_issue.tar, 
image-2025-10-31-09-08-39-117.png

h3. *Problem Description*

Under heavy load, queues get stuck and their consumers stop receiving 
messages.

The issue seems to be that a single orphaned consumer holds messages in 
transit, so no more messages are processed by the other consumers.

This is how it looks on our web console:

!image-2025-10-31-09-08-39-117.png|width=717,height=157!

 

This seems to be related to issue 
https://issues.apache.org/jira/browse/ARTEMIS-4476

The visibility added there helped us understand the issue, but it did not fix 
it.

 
----
h3. *Technical Information*

After some investigation, I found that the issue is a race condition 
between closing a connection and creating a new session.

I have created an example application that reproduces one part of the issue 
and shows that *ServerSessionImpl.connectionFailed* can miss the connection 
failure event: the connection is closed while the session is not yet 
registered in {*}AbstractRemotingConnection.failureListeners{*}.
In such cases an orphaned session and consumer block the use of the queue.
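
To make the ordering problem concrete, here is a minimal single-threaded sketch. The class and method names (SketchConnection, SketchSession, isOrphanedAfter) are hypothetical stand-ins and not the Artemis classes. The sketch only assumes that failing a connection notifies the listeners registered up to that moment.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class MissedFailureEventSketch {

    // Hypothetical stand-in for a remoting connection; not the Artemis class.
    static final class SketchConnection {
        private final List<Runnable> failureListeners = new CopyOnWriteArrayList<>();
        private volatile boolean failed;

        void addFailureListener(Runnable listener) {
            failureListeners.add(listener);
        }

        // Marks the connection as failed and notifies the listeners registered so far.
        void fail() {
            failed = true;
            for (Runnable listener : failureListeners) {
                listener.run();
            }
        }

        boolean isFailed() {
            return failed;
        }
    }

    // Hypothetical stand-in for a server session.
    static final class SketchSession {
        volatile boolean closed;

        // Plays the role of a connectionFailed callback: closes the session.
        void connectionFailed() {
            closed = true;
        }
    }

    // Returns true when the session is left open on a failed connection.
    static boolean isOrphanedAfter(boolean registerBeforeFailure) {
        SketchConnection connection = new SketchConnection();
        SketchSession session = new SketchSession();
        if (registerBeforeFailure) {
            connection.addFailureListener(session::connectionFailed);
            connection.fail();
        } else {
            connection.fail();                                        // connection closes first
            connection.addFailureListener(session::connectionFailed); // registered too late
        }
        return connection.isFailed() && !session.closed;
    }

    public static void main(String[] args) {
        System.out.println("registered before failure, orphaned: " + isOrphanedAfter(true));
        System.out.println("registered after failure, orphaned: " + isOrphanedAfter(false));
    }
}
```

In this model only the late registration leaves the session open on a dead connection, which corresponds to the orphaned session and consumer described above.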

In more severe cases (our production) the session is in fact registered in the 
{*}failureListeners{*}, but it has not yet been added to the sessions map. The 
session is closed first and only then added to the sessions map, after which 
it cannot be removed by manually closing the session from the web console.

The relevant code with the race condition is in {*}ActiveMQServerImpl{*}:
{code:java}
ServerSessionImpl session = new ServerSessionImpl(...);
sessions.put(name, session);
{code}
and in *AbstractRemotingConnection.callFailureListeners*.
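
The more severe ordering can be replayed in a single thread as a rough sketch. The names here (StaleSessionSketch, Session, onConnectionFailed) are hypothetical, and the sketch assumes that the failure handling closes the session and tries to remove it from the map; it is not the actual broker code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class StaleSessionSketch {

    // Hypothetical stand-in for a server session; not ServerSessionImpl.
    static final class Session {
        final String name;
        volatile boolean closed;

        Session(String name) {
            this.name = name;
        }
    }

    final Map<String, Session> sessions = new ConcurrentHashMap<>();

    // Assumed failure handling: close the session and drop it from the map.
    void onConnectionFailed(Session session) {
        session.closed = true;
        sessions.remove(session.name); // no-op when the session was not added yet
    }

    // Replays the problematic ordering step by step.
    static StaleSessionSketch replayBadOrdering() {
        StaleSessionSketch server = new StaleSessionSketch();
        Session session = new Session("s1");        // 1. session created and listening
        server.onConnectionFailed(session);         // 2. connection fails before the put
        server.sessions.put(session.name, session); // 3. already-closed session is added
        return server;
    }

    public static void main(String[] args) {
        StaleSessionSketch server = replayBadOrdering();
        Session stale = server.sessions.get("s1");
        System.out.println("in map: " + (stale != null) + ", closed: " + stale.closed);
    }
}
```

The map ends up holding a session that is already closed, so nothing that keys off the close event will ever remove it.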

 

The attached project *artemis_consumer_issue.tar* is what I used to reproduce 
the simpler case, in which there is just an orphaned connection that can be 
closed manually.

The *README.md* file in the project explains how to reproduce the issue.
This is the general process:
 # I modify the code to delay sending the *failureListeners* events by adding 
a sleep.
 # Set a very short TTL in broker.xml to force the 
*FailureCheckAndFlushThread* to close the connections.
 # Add some messages to a queue while there are no consumers yet.
 # Start the consumers (8 in this case).
 # The 8 consumers take longer to handle the messages than the configured TTL.
 # Each connection is created and then terminated shortly afterwards.
This triggers a re-creation of the sessions and consumers while the 
*failureListeners* are still stuck in the sleep.
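
The effect of the injected delay can be sketched with two threads. This is a simplified model and not the code from the attached project: the name DelayedNotificationSketch is hypothetical, a latch stands in for the sleep so the run is deterministic, and the sketch assumes the notification iterates over a copy of the listener list taken before the delay.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class DelayedNotificationSketch {

    // Returns true when the listener registered during the delay is never notified.
    static boolean lateListenerMissed() {
        List<Runnable> failureListeners = new CopyOnWriteArrayList<>();
        AtomicBoolean sessionNotified = new AtomicBoolean(false);
        CountDownLatch snapshotTaken = new CountDownLatch(1);
        CountDownLatch sessionRegistered = new CountDownLatch(1);

        Thread failureThread = new Thread(() -> {
            // Assumption: the notification works on a copy of the listener list.
            List<Runnable> snapshot = new ArrayList<>(failureListeners);
            snapshotTaken.countDown();
            try {
                sessionRegistered.await(); // stands in for the injected sleep
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            snapshot.forEach(Runnable::run);
        });
        failureThread.start();

        try {
            snapshotTaken.await();
            // A re-created session registers while the notification is delayed.
            failureListeners.add(() -> sessionNotified.set(true));
            sessionRegistered.countDown();
            failureThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while replaying the interleaving", e);
        }
        return !sessionNotified.get();
    }

    public static void main(String[] args) {
        System.out.println("late listener missed: " + lateListenerMissed());
    }
}
```

Using latches instead of real sleeps keeps the interleaving fixed, so the missed notification shows up on every run rather than only under load.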

 



