[jira] [Commented] (ARTEMIS-4797) Failover connection references are not always cleaned up in NettyAcceptor, leaking memory

Josh Byster (Jira) Thu, 06 Jun 2024 15:22:36 -0700


    [ 
https://issues.apache.org/jira/browse/ARTEMIS-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852959#comment-17852959
 ]


Josh Byster commented on ARTEMIS-4797:
--------------------------------------

[~erwindon] As far as I can tell, they do not appear in the console for me. 
However, they have the same client ID on reconnect (due to failover). It looks 
like there is no client ID in your issue.
If you cherry-pick the PR for this ticket and build Artemis locally, you should 
be able to tell quickly whether it's the same issue, since the fix works for 
me: https://github.com/apache/activemq-artemis/pull/4960 

> Failover connection references are not always cleaned up in NettyAcceptor, 
> leaking memory
> -----------------------------------------------------------------------------------------
>
>                 Key: ARTEMIS-4797
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4797
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: OpenWire
>            Reporter: Josh Byster
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'm still trying to work out exactly which conditions trigger this, since I 
> can reproduce it in a very specific production setup but not in an isolated 
> local environment.
> For context, we have custom slow consumer detection that closes, by 
> connection ID, any connection with slow consumers. These connections use the 
> failover transport with the ActiveMQ Classic 5.16.4 client (OpenWire). This 
> seems to be specific to Netty.
> It appears that the following order of events causes the connection to never 
> be cleaned up, so it is retained indefinitely on the broker. With frequent 
> kicking of connections, the broker eventually runs out of memory (OOM).
> 1. A connection is created, along with an {{ActiveMQServerChannelHandler}}.
> 2. {{ActiveMQServerChannelHandler#createConnection}} is called and the 
> {{active}} flag is set to {{true}}.
> 3. A few minutes go by, then we call 
> {{ActiveMQServerControl#closeConnectionWithID}} with the connection ID.
> 4. {{ActiveMQChannelHandler#exceptionCaught}} gets called. *This is the key 
> point that causes issues*: the connection is cleaned up if and only if this 
> is *not* called. The root cause of the exception is 
> {{AbstractChannel.close(ChannelPromise)}}; however, the comment above it says 
> this is normal for failover.
> 5. The {{active}} flag is set to {{false}}.
> 6. {{ActiveMQChannelHandler#channelInactive}} gets called, but does *not* 
> call {{listener.connectionDestroyed}} since the {{active}} flag is {{false}}.
> 7. The connection is never removed from the {{connections}} map in 
> {{NettyAcceptor}}, causing a leak and eventual OOM of the broker if it 
> happens frequently enough.
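
The sequence in steps 1-7 can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions: AcceptorModel, HandlerModel and entriesLeftAfterClose are hypothetical names invented for illustration, not Artemis or Netty classes, and the model reproduces only the behaviour described in the report (the flag is cleared in exceptionCaught, and channelInactive notifies the listener only while the flag is still set). It does not represent the fix in the PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of the reported sequence; NOT the actual Artemis classes.
final class FailoverLeakSketch {

    // Hypothetical stand-in for the acceptor's connections map and listener callbacks.
    static final class AcceptorModel {
        final Map<String, Object> connections = new ConcurrentHashMap<>();

        void connectionCreated(String id) {
            connections.put(id, new Object());
        }

        void connectionDestroyed(String id) {
            connections.remove(id);
        }
    }

    // Hypothetical stand-in for the channel handler and its active flag.
    static final class HandlerModel {
        private final AcceptorModel listener;
        private final String connectionId;
        private boolean active;

        HandlerModel(AcceptorModel listener, String connectionId) {
            this.listener = listener;
            this.connectionId = connectionId;
        }

        // Step 2: the connection is registered and the flag is set.
        void createConnection() {
            listener.connectionCreated(connectionId);
            active = true;
        }

        // Steps 4-5 as described in the report: the flag is cleared,
        // but the entry in the connections map is left in place.
        void exceptionCaught() {
            active = false;
        }

        // Step 6: the listener is only notified while the flag is still set.
        void channelInactive() {
            if (active) {
                listener.connectionDestroyed(connectionId);
                active = false;
            }
        }
    }

    // Runs one connection lifecycle and returns how many map entries remain.
    static int entriesLeftAfterClose(boolean exceptionBeforeInactive) {
        AcceptorModel acceptor = new AcceptorModel();
        HandlerModel handler = new HandlerModel(acceptor, "conn-1");
        handler.createConnection();
        if (exceptionBeforeInactive) {
            handler.exceptionCaught();
        }
        handler.channelInactive();
        return acceptor.connections.size();
    }

    public static void main(String[] args) {
        // Prints 0: without the exception, channelInactive removes the entry.
        System.out.println("clean close, entries left: " + entriesLeftAfterClose(false));
        // Prints 1: with the exception first, the entry is never removed.
        System.out.println("exception first, entries left: " + entriesLeftAfterClose(true));
    }
}
```

In this model, each close that goes through exceptionCaught first leaves one entry behind, which matches the unbounded growth described in step 7.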



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

