[ 
https://issues.apache.org/jira/browse/ARTEMIS-5735?focusedWorklogId=994976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-994976
 ]

ASF GitHub Bot logged work on ARTEMIS-5735:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Dec/25 16:07
            Start Date: 05/Dec/25 16:07
    Worklog Time Spent: 10m 
      Work Description: gemmellr commented on PR #6063:
URL: https://github.com/apache/activemq-artemis/pull/6063#issuecomment-3617519774

   I would squash during 'merge' personally. It's really one overall change, it 
would be nicer if it was all in the same commit, and it's nice not adding 
known-broken stuff to main. Both component commits can still be seen on the PR 
later if looking at the change closely enough to care.




Issue Time Tracking
-------------------

    Worklog Id:     (was: 994976)
    Time Spent: 2h 40m  (was: 2.5h)

> Queue is stuck with orphaned consumer and no longer consumes messages
> ---------------------------------------------------------------------
>
>                 Key: ARTEMIS-5735
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-5735
>             Project: Artemis
>          Issue Type: Bug
>    Affects Versions: 2.35.0
>            Reporter: Michael Shemesh
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: artemis_consumer_issue.tar, 
> image-2025-10-31-09-08-39-117.png
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> h3. *Problem Description*
> Under heavy load queues get stuck and their consumers do not receive any 
> messages.
> The issue seems to be that there is one orphaned consumer with messages in 
> transit, so no more messages are being processed by other consumers.
> This is how it looks on our web console:
> !image-2025-10-31-09-08-39-117.png|width=717,height=157!
> This seems to be related to issue ARTEMIS-4476.
> The added visibility there helped understand the issue, however it did not 
> fix it.
>  
> ----
> h3. *Technical Information*
> After some investigation, I found that the issue is a race condition 
> between the closing of a connection and the creation of a new session.
> I have created an example application that reproduces one part of the 
> issue and shows that it's possible for 
> {{ServerSessionImpl.connectionFailed}} to miss the connection event, because 
> the connection is closed while the session is not yet registered in the 
> connection's {{AbstractRemotingConnection.failureListeners}}.
> In such cases an orphaned session and consumer will block the usage of the 
> queue.
> In more severe cases (our production) the session is in fact registered to 
> the {{failureListeners}}, but it has not yet been added to the sessions map. 
> The session is closed first and only then added to the sessions map, after 
> which it cannot be removed from there by manually closing the session from 
> the web console.
> The relevant code part with the race condition is in {{ActiveMQServerImpl}}:
> {code:java}
> ServerSessionImpl session = new ServerSessionImpl(...);
> sessions.put(name, session); {code}
> and {{AbstractRemotingConnection.callFailureListeners}}
> The attached project [^artemis_consumer_issue.tar] is what I used to 
> reproduce the simpler case of just having an orphaned connection that can be 
> closed manually.
> The {{README.md}} file in the project explains how to reproduce the issue.
> This is the general process:
>  # Modify the code to delay sending the {{failureListeners}} events by 
> adding a {{Thread.sleep()}}.
>  # Add a very short TTL in {{broker.xml}} to force the 
> {{FailureCheckAndFlushThread}} process to close the connections.
>  # Add some messages to a queue while there are no consumers yet.
>  # Start up the consumers (8 in this case).
>  # The 8 consumers take more time to handle the messages than the configured 
> TTL.
>  # The connection will be created and terminated very shortly after.
> This will trigger a recreation of the sessions and consumers while the 
> {{failureListeners}} are still stuck on the sleep.
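
The two interleavings described in the issue can be replayed deterministically in a small standalone sketch. This is only an illustration of the ordering problem, not Artemis code: the `Connection` and `Session` classes, their methods, and the final "re-check after publishing" guard are all hypothetical names and assumptions, and the guard is not necessarily what PR #6063 does.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class OrphanedSessionSketch {

    /** Hypothetical stand-in for a connection that notifies listeners when it fails. */
    static final class Connection {
        private final List<Runnable> failureListeners = new CopyOnWriteArrayList<>();
        private volatile boolean failed;

        void addFailureListener(Runnable listener) {
            failureListeners.add(listener);
        }

        void fail() {
            failed = true;
            // Only listeners registered before this point are notified.
            for (Runnable listener : failureListeners) {
                listener.run();
            }
        }

        boolean isFailed() {
            return failed;
        }
    }

    /** Hypothetical stand-in for a server session tracked in a sessions map. */
    static final class Session {
        final String name;
        final Map<String, Session> sessions;
        volatile boolean closed;

        Session(String name, Map<String, Session> sessions) {
            this.name = name;
            this.sessions = sessions;
        }

        void close() {
            closed = true;
            sessions.remove(name); // no-op if the session has not been added yet
        }
    }

    /** Case 1: the connection fails before the session registers its listener. */
    static boolean lateListenerLeavesOpenSession() {
        Map<String, Session> sessions = new ConcurrentHashMap<>();
        Connection connection = new Connection();
        Session session = new Session("s1", sessions);

        connection.fail();                             // failure event fires first
        connection.addFailureListener(session::close); // registered too late
        sessions.put(session.name, session);

        return !session.closed && sessions.containsKey("s1");
    }

    /** Case 2: the listener is registered, but the failure lands before sessions.put. */
    static boolean lateMapInsertLeavesClosedSession() {
        Map<String, Session> sessions = new ConcurrentHashMap<>();
        Connection connection = new Connection();
        Session session = new Session("s2", sessions);

        connection.addFailureListener(session::close);
        connection.fail();                    // closes the session; remove() finds nothing
        sessions.put(session.name, session);  // an already-closed session is now tracked

        return session.closed && sessions.containsKey("s2");
    }

    /** Assumed guard: re-check the connection state after the session is published. */
    static boolean recheckAfterPublishCleansUp() {
        Map<String, Session> sessions = new ConcurrentHashMap<>();
        Connection connection = new Connection();
        Session session = new Session("s3", sessions);

        connection.fail();
        connection.addFailureListener(session::close);
        sessions.put(session.name, session);
        if (connection.isFailed()) {
            session.close(); // the missed event is handled here instead
        }

        return session.closed && sessions.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("case 1, open orphan left behind: " + lateListenerLeavesOpenSession());
        System.out.println("case 2, closed session left in map: " + lateMapInsertLeavesClosedSession());
        System.out.println("guarded variant cleans up: " + recheckAfterPublishCleansUp());
    }
}
```

In a real broker these steps run on different threads, so the orderings above occur only occasionally. The sketch fixes the order by hand so that each outcome is reproducible.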



