[ 
https://issues.apache.org/jira/browse/ARTEMIS-5735?focusedWorklogId=992841&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-992841
 ]

ASF GitHub Bot logged work on ARTEMIS-5735:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Nov/25 13:38
            Start Date: 21/Nov/25 13:38
    Worklog Time Spent: 10m 
      Work Description: gemmellr commented on PR #6063:
URL: 
https://github.com/apache/activemq-artemis/pull/6063#issuecomment-3563069764

   There looks to be a persistent test failure introduced in one of the extra 
ported OpenWire tests. On 3 full runs 
org.apache.activemq.ConnectionCleanupTest.testChangeClientID() has failed each 
time, complaining about a session being closed whilst creating a consumer.
   
   It does look like a strange test, accessing client internals to 'remove' and 
yet then reuse the same Connection, so I'm not clear whether it's something real 
clients and applications can do and, if so, whether it needs to be kept working.




Issue Time Tracking
-------------------

    Worklog Id:     (was: 992841)
    Time Spent: 50m  (was: 40m)

> Queue is stuck with orphaned consumer and no longer consumes messages
> ---------------------------------------------------------------------
>
>                 Key: ARTEMIS-5735
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-5735
>             Project: Artemis
>          Issue Type: Bug
>    Affects Versions: 2.35.0
>            Reporter: Michael Shemesh
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: artemis_consumer_issue.tar, 
> image-2025-10-31-09-08-39-117.png
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> h3. *Problem Description*
> Under heavy load, queues get stuck and their consumers do not receive any 
> messages.
> The issue seems to be that there is one orphaned consumer with messages in 
> transit, so no more messages are being processed by other consumers.
> This is how it looks on our web console:
> !image-2025-10-31-09-08-39-117.png|width=717,height=157!
> This seems to be related to issue ARTEMIS-4476.
> The added visibility there helped understand the issue, however it did not 
> fix it.
>  
> ----
> h3. *Technical Information*
> After some investigation, I found that the issue is a race condition 
> between the closing of a connection and the creation of a new session.
> I have created an example application that reproduces one part of the 
> issue and shows that it's possible for 
> {{ServerSessionImpl.connectionFailed}} to miss the connection event: the 
> connection is closed while the session is not yet registered in the 
> connection's {{AbstractRemotingConnection.failureListeners}}.
> In such cases an orphaned session and consumer will block the usage of the 
> queue.
> In more severe cases (as seen in our production) the session is in fact 
> registered in the {{failureListeners}}, but it has not yet been added to the 
> sessions map. The session is closed first and only then added to the sessions 
> map, after which it cannot be removed from there by manually closing the 
> session from the web console.
> The relevant code part with the race condition is in {{ActiveMQServerImpl}}:
> {code:java}
> ServerSessionImpl session = new ServerSessionImpl(...);
> sessions.put(name, session);
> {code}
> and in {{AbstractRemotingConnection.callFailureListeners}}.
> The attached project [^artemis_consumer_issue.tar] is what I used to 
> reproduce the simpler case of just having an orphaned connection that can be 
> closed manually.
> The {{README.md}} file in the project explains how to reproduce the issue.
> This is the general process:
>  # Patch the code to delay sending the {{failureListeners}} events by adding 
> a {{Thread.sleep()}}.
>  # Add a very short TTL in {{broker.xml}} to force the 
> {{FailureCheckAndFlushThread}} process to close the connections.
>  # Add some messages to a queue while there are no consumers yet.
>  # Start up the consumers (8 in this case).
>  # The 8 consumers take more time to handle the messages than the configured 
> TTL.
>  # The connection will be created and terminated very shortly after.
> This will trigger a recreation of the sessions and consumers while the 
> {{failureListeners}} are still stuck on the sleep.
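
The two orderings described in the issue above can be sketched in isolation. This is only an illustrative model, not Artemis code: the `Connection` and `Session` classes, the `sessions` map, and the method names are hypothetical stand-ins, and the interleavings are forced sequentially on one thread rather than produced by real concurrency.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class OrphanedSessionSketch {

    /** Hypothetical stand-in for a remoting connection (not the Artemis class). */
    static final class Connection {
        private final List<Runnable> failureListeners = new CopyOnWriteArrayList<>();

        void addFailureListener(Runnable listener) {
            failureListeners.add(listener);
        }

        /** Notifies only the listeners that are registered when this is called. */
        void fail() {
            for (Runnable listener : failureListeners) {
                listener.run();
            }
        }
    }

    /** Hypothetical stand-in for a server session. */
    static final class Session {
        final String name;
        boolean closed;

        Session(String name) {
            this.name = name;
        }

        /** Marks the session closed and drops it from the registry if present. */
        void close(Map<String, Session> sessions) {
            closed = true;
            sessions.remove(name);
        }
    }

    /** Case 1: the connection fails before the session registers its listener. */
    public static boolean openSessionLeftBehind() {
        Map<String, Session> sessions = new ConcurrentHashMap<>();
        Connection connection = new Connection();
        Session session = new Session("s1");
        sessions.put(session.name, session);

        connection.fail(); // no listener registered yet, so nothing is cleaned up
        connection.addFailureListener(() -> session.close(sessions)); // too late

        return sessions.containsKey("s1") && !session.closed;
    }

    /** Case 2: the listener runs before the session is put into the map. */
    public static boolean closedSessionLeftInMap() {
        Map<String, Session> sessions = new ConcurrentHashMap<>();
        Connection connection = new Connection();
        Session session = new Session("s2");
        connection.addFailureListener(() -> session.close(sessions));

        connection.fail(); // closes the session; the map has no entry to remove
        sessions.put(session.name, session); // the closed session is added afterwards

        return sessions.containsKey("s2") && session.closed;
    }

    public static void main(String[] args) {
        System.out.println("case 1, open session left behind: " + openSessionLeftBehind());
        System.out.println("case 2, closed session left in map: " + closedSessionLeftInMap());
    }
}
```

In this model, case 1 leaves an open session that never hears about the failure, and case 2 leaves an already closed session in the map. Whether the real broker behaves the same way depends on code paths not shown in this message.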



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
