Michael Shemesh created ARTEMIS-5735:
----------------------------------------
Summary: Queue is stuck with Orphaned consumer and no longer
consumes messages
Key: ARTEMIS-5735
URL: https://issues.apache.org/jira/browse/ARTEMIS-5735
Project: ActiveMQ Artemis
Issue Type: Bug
Affects Versions: 2.35.0
Reporter: Michael Shemesh
Attachments: artemis_consumer_issue.tar,
image-2025-10-31-09-08-39-117.png
h3. *Problem Description*
Under heavy load, queues get stuck and their consumers stop receiving
messages.
The cause appears to be a single orphaned consumer holding messages in
transit, so no further messages are processed by the other consumers.
This is how it looks on our web console:
!image-2025-10-31-09-08-39-117.png|width=717,height=157!
This seems to be related to ARTEMIS-4476
(https://issues.apache.org/jira/browse/ARTEMIS-4476).
The visibility added there helped us understand the issue, but it did not fix
it.
----
h3. *Technical Information*
After some investigation, I found that the issue is a race condition
between closing a connection and creating a new session.
I have created an example application that reproduces one part of the issue.
It shows that *ServerSessionImpl.connectionFailed* can miss the connection
failure event: the connection is closed before the session has been registered
in the connection's {*}AbstractRemotingConnection.failureListeners{*}.
In such cases an orphaned session and consumer block the use of the queue.
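The interleaving described above can be illustrated with a minimal, self-contained sketch. It is not Artemis code: {{Connection}}, {{Session}} and {{snapshotListeners}} are hypothetical stand-ins, and the sketch assumes that failure handling works on the set of listeners registered at the moment it starts. The steps are replayed sequentially in one thread so that the outcome is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Simplified model of the missed failure event; all names are hypothetical.
class MissedFailureEventSketch {

    /** Stand-in for a listener that is told when its connection fails. */
    interface FailureListener {
        void connectionFailed();
    }

    /** Stand-in for a remoting connection holding its failure listeners. */
    static class Connection {
        private final List<FailureListener> failureListeners = new CopyOnWriteArrayList<>();

        void addFailureListener(FailureListener listener) {
            failureListeners.add(listener);
        }

        /** Copies the listeners registered right now; later registrations are not included. */
        List<FailureListener> snapshotListeners() {
            return new ArrayList<>(failureListeners);
        }
    }

    /** Stand-in for a server session that closes itself on connection failure. */
    static class Session implements FailureListener {
        final String name;
        boolean closed;

        Session(String name) {
            this.name = name;
        }

        @Override
        public void connectionFailed() {
            closed = true;
        }
    }

    /** Replays the interleaving and returns the names of sessions that were never closed. */
    static List<String> orphanedSessions() {
        Connection connection = new Connection();
        Session early = new Session("early");
        connection.addFailureListener(early);

        // Step 1: failure handling starts and captures the listeners known so far.
        List<FailureListener> toNotify = connection.snapshotListeners();

        // Step 2: a new session is created and registers itself too late.
        Session late = new Session("late");
        connection.addFailureListener(late);

        // Step 3: only the captured listeners are notified.
        for (FailureListener listener : toNotify) {
            listener.connectionFailed();
        }

        List<String> orphaned = new ArrayList<>();
        for (Session session : List.of(early, late)) {
            if (!session.closed) {
                orphaned.add(session.name);
            }
        }
        return orphaned;
    }

    public static void main(String[] args) {
        // Prints: Orphaned sessions: [late]
        System.out.println("Orphaned sessions: " + orphanedSessions());
    }
}
```

In this model the "late" session never receives the failure event and stays open, which corresponds to the orphaned session and consumer seen on the queue.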
In more severe cases (as seen in our production environment), the session is
in fact registered in {*}failureListeners{*} but has not yet been added to the
sessions map. The session is closed first and only then added to the sessions
map, after which it cannot be removed by manually closing the session from the
web console.
The relevant code part with the race condition is in {*}ActiveMQServerImpl{*}:
{code:java}
ServerSessionImpl session = new ServerSessionImpl(...);
sessions.put(name, session);
{code}
and in *AbstractRemotingConnection.callFailureListeners*.
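The more severe case can be sketched in the same simplified way. The model below is hypothetical and not taken from Artemis: it assumes that closing a session removes it from the sessions map and that closing an already closed session does nothing. Under those assumptions, a failure callback that runs between creating the session and putting it into the map leaves a closed session registered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of a closed session ending up in the sessions map; names are hypothetical.
class ClosedSessionInMapSketch {

    /** Stand-in for a server session that deregisters itself when closed. */
    static class Session {
        final String name;
        private final Map<String, Session> registry;
        private boolean closed;

        Session(String name, Map<String, Session> registry) {
            this.name = name;
            this.registry = registry;
        }

        synchronized boolean isClosed() {
            return closed;
        }

        /** Assumption: close is idempotent, so a second call does not touch the map again. */
        synchronized boolean close() {
            if (closed) {
                return false;
            }
            closed = true;
            registry.remove(name); // no effect if the session is not in the map yet
            return true;
        }
    }

    /** Replays the interleaving and returns the names of closed sessions left in the map. */
    static List<String> closedButStillRegistered() {
        Map<String, Session> sessions = new ConcurrentHashMap<>();

        // Corresponds to: new ServerSessionImpl(...)
        Session session = new Session("session-1", sessions);

        // The connection fails here; the failure callback closes the session.
        session.close();

        // Corresponds to: sessions.put(name, session)
        sessions.put(session.name, session);

        // A later manual close finds the session already closed and removes nothing.
        session.close();

        List<String> stuck = new ArrayList<>();
        for (Session s : sessions.values()) {
            if (s.isClosed()) {
                stuck.add(s.name);
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        // Prints: Closed sessions still in the map: [session-1]
        System.out.println("Closed sessions still in the map: " + closedButStillRegistered());
    }
}
```

Whether the real close path behaves exactly like this is an assumption of the sketch; it only shows why the order of the two statements matters when a failure event arrives in between.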
The attached project *artemis_consumer_issue.tar* is what I used to reproduce
the simpler case, in which there is just an orphaned connection that can be
closed manually.
The *README.md* file in the project explains how to reproduce the issue.
This is the general process:
# Modify the code to delay the delivery of the *failureListeners* events by
adding a sleep.
# Set a very short TTL in broker.xml to force the
*FailureCheckAndFlushThread* process to close the connections.
# Add some messages to a queue while there are no consumers yet.
# Start the consumers (8 in this case).
# The 8 consumers take longer to handle the messages than the configured TTL.
# The connection is created and then terminated shortly afterwards.
This triggers a re-creation of the sessions and consumers while the
*failureListeners* are still stuck in the sleep.