[
https://issues.apache.org/jira/browse/ARTEMIS-5735?focusedWorklogId=994981&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-994981
]
ASF GitHub Bot logged work on ARTEMIS-5735:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 05/Dec/25 16:28
Start Date: 05/Dec/25 16:28
Worklog Time Spent: 10m
Work Description: jbertram commented on PR #6063:
URL:
https://github.com/apache/activemq-artemis/pull/6063#issuecomment-3617601164
@michaeladada, thanks for the contribution. Nice work!
Issue Time Tracking
-------------------
Worklog Id: (was: 994981)
Time Spent: 3h 10m (was: 3h)
> Queue is stuck with orphaned consumer and no longer consumes messages
> ---------------------------------------------------------------------
>
> Key: ARTEMIS-5735
> URL: https://issues.apache.org/jira/browse/ARTEMIS-5735
> Project: Artemis
> Issue Type: Bug
> Affects Versions: 2.35.0
> Reporter: Michael Shemesh
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.45.0
>
> Attachments: artemis_consumer_issue.tar,
> image-2025-10-31-09-08-39-117.png
>
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> h3. *Problem Description*
> Under heavy load, queues get stuck and their consumers stop receiving
> messages.
> The cause appears to be a single orphaned consumer holding messages in
> transit, so no more messages are processed by the other consumers.
> This is how it looks on our web console:
> !image-2025-10-31-09-08-39-117.png|width=717,height=157!
> This seems to be related to issue ARTEMIS-4476.
> The visibility added there helped in understanding the issue; however, it
> did not fix it.
>
> ----
> h3. *Technical Information*
> After some investigation, I found that the issue is a race condition
> between closing a connection and creating a new session.
> I have created an example application that reproduces one part of the
> issue and shows that it's possible for
> {{ServerSessionImpl.connectionFailed}} to miss the connection event: the
> connection is closed while the session is not yet registered in the
> connection's {{AbstractRemotingConnection.failureListeners}}.
> In such cases an orphaned session and consumer will block the usage of the
> queue.
> In more severe cases (our production) the session is in fact registered in
> the {{failureListeners}}, but it has not yet been added to the sessions map.
> The session is closed first and only then added to the sessions map, after
> which it cannot be removed by manually closing the session from the web
> console.
> The relevant code part with the race condition is in {{ActiveMQServerImpl}}:
> {code:java}
> ServerSessionImpl session = new ServerSessionImpl(...);
> sessions.put(name, session); {code}
> and {{AbstractRemotingConnection.callFailureListeners}}
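
The more severe ordering described above (the failure callback runs before the session is put into the sessions map) can be sketched in isolation. This is a minimal model, not Artemis code: `Session`, `Broker`, `register` and `registerChecked` are hypothetical stand-ins, and the re-check in `registerChecked` is only one possible mitigation, not necessarily what the fix for this issue does.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration; these are NOT the real Artemis classes.
class LateRegistrationSketch {

    static final class Session {
        final String name;
        volatile boolean closed;

        Session(String name) {
            this.name = name;
        }
    }

    static final class Broker {
        final Map<String, Session> sessions = new ConcurrentHashMap<>();

        // Failure callback: close the session and drop it from the map.
        void connectionFailed(Session s) {
            s.closed = true;
            sessions.remove(s.name, s); // no-op if the session is not in the map yet
        }

        // Unchecked registration, mirroring the bare sessions.put(name, session).
        void register(Session s) {
            sessions.put(s.name, s);
        }

        // Assumed mitigation: publish first, then re-check the closed flag.
        boolean registerChecked(Session s) {
            sessions.put(s.name, s);
            if (s.closed) {
                sessions.remove(s.name, s);
                return false;
            }
            return true;
        }
    }

    // Replays the bad ordering: failure callback first, map insertion second.
    static boolean unsafeOrderLeavesClosedSession() {
        Broker broker = new Broker();
        Session s = new Session("s1");
        broker.connectionFailed(s);
        broker.register(s);
        Session stuck = broker.sessions.get("s1");
        return stuck != null && stuck.closed;
    }

    // Same ordering, but with the re-check after the put.
    static boolean checkedOrderLeavesMapEmpty() {
        Broker broker = new Broker();
        Session s = new Session("s1");
        broker.connectionFailed(s);
        boolean registered = broker.registerChecked(s);
        return !registered && broker.sessions.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("closed session left in map: " + unsafeOrderLeavesClosedSession());
        System.out.println("re-check leaves map empty:  " + checkedOrderLeavesMapEmpty());
    }
}
```

The sketch replays the ordering on a single thread so the outcome is deterministic; in the broker the two steps run on different threads, which is what makes the window hard to hit outside heavy load.
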
> The attached project [^artemis_consumer_issue.tar] is what I used to
> reproduce the simpler case, in which there is just an orphaned connection
> that can be closed manually.
> The {{README.md}} file in the project explains how to reproduce the issue.
> This is the general process:
> # Modify the code to delay sending the {{failureListeners}} events by
> adding a {{Thread.sleep()}}.
> # Set a very short TTL in {{broker.xml}} to force the
> {{FailureCheckAndFlushThread}} process to close the connections.
> # Add some messages to a queue while there are no consumers yet.
> # Start up the consumers (8 in this case).
> # The 8 consumers take more time to handle the messages than the
> configured TTL.
> # The connection is created and terminated very shortly after.
> This triggers a re-creation of the sessions and consumers while the
> {{failureListeners}} are still stuck in the sleep.
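
The effect of delaying the listener callbacks in the steps above can be modelled without a broker. The following is a minimal sketch under stated assumptions: `Connection` and its methods are hypothetical, latches stand in for the `Thread.sleep()` so the interleaving is deterministic, and the assumption that the callbacks iterate over a snapshot of the listener list is mine, not something the report confirms about `AbstractRemotingConnection`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the reproduction; NOT the real Artemis connection class.
class MissedFailureEventSketch {

    static final class Connection {
        private final List<Runnable> failureListeners = new CopyOnWriteArrayList<>();

        void addFailureListener(Runnable listener) {
            failureListeners.add(listener);
        }

        // Assumed behaviour: take a snapshot, get delayed, then notify the snapshot.
        void fail(CountDownLatch snapshotTaken, CountDownLatch resume) throws InterruptedException {
            List<Runnable> snapshot = new ArrayList<>(failureListeners);
            snapshotTaken.countDown();
            resume.await(); // stands in for the injected Thread.sleep()
            for (Runnable listener : snapshot) {
                listener.run();
            }
        }
    }

    // Returns true when the early listener is notified and the late one is not.
    static boolean lateSessionMissesFailure() {
        Connection connection = new Connection();
        AtomicBoolean earlyNotified = new AtomicBoolean();
        AtomicBoolean lateNotified = new AtomicBoolean();
        connection.addFailureListener(() -> earlyNotified.set(true));

        CountDownLatch snapshotTaken = new CountDownLatch(1);
        CountDownLatch resume = new CountDownLatch(1);

        Thread failer = new Thread(() -> {
            try {
                connection.fail(snapshotTaken, resume);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        failer.start();

        try {
            snapshotTaken.await();
            // A "new session" registers while the failure callbacks are delayed.
            connection.addFailureListener(() -> lateNotified.set(true));
            resume.countDown();
            failer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the sketch", e);
        }
        return earlyNotified.get() && !lateNotified.get();
    }

    public static void main(String[] args) {
        System.out.println("late listener missed the failure: " + lateSessionMissesFailure());
    }
}
```

The listener added during the delay is never called, which corresponds to a session that never sees `connectionFailed` and is left orphaned.
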