[ 
https://issues.apache.org/jira/browse/ARTEMIS-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Byster updated ARTEMIS-4797:
---------------------------------
    Description: 
I'm still trying to pin down exactly which conditions this occurs under, since 
I'm able to reproduce it in a very specific production setup but not in an 
isolated environment locally.

For context, we have custom slow-consumer detection that closes (by connection 
ID) the connections that have slow consumers. These clients connect via the 
failover transport using ActiveMQ Classic 5.16.4 (OpenWire). The issue seems to 
be specific to Netty.

It appears this specific order of events causes the connection to never be 
cleaned up, so it is retained indefinitely on the broker. With frequent kicking 
of connections, this eventually causes the broker to OOM.

1. Connection is created, and {{ActiveMQServerChannelHandler}} is created as 
well.
2. {{ActiveMQServerChannelHandler#createConnection}} is called, and the 
{{active}} flag is set to {{true}}.
3. A few minutes go by, then we call 
{{ActiveMQServerControl#closeConnectionWithID}} with the connection ID.
4. {{ActiveMQChannelHandler#exceptionCaught}} gets called—*this is the key 
point that causes the issue*. The connection is cleaned up if and only if this 
is *not* called. The root cause of the exception is 
{{AbstractChannel.close(ChannelPromise)}}; however, the comment above it says 
this is normal for failover.
5. The {{active}} flag is set to {{false}}.
6. {{ActiveMQChannelHandler#channelInactive}} gets called, but does *not* call 
{{listener.connectionDestroyed}} since the {{active}} flag is false.
7. The connection is never removed from the {{connections}} map in 
{{NettyAcceptor}}, causing a leak and, if this happens frequently enough, an 
eventual OOM of the broker.
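
The sequence above can be condensed into a minimal sketch (hypothetical class {{LifecycleSketch}}; the handler and method names mirror the real ones, but the bodies are simplified stand-ins, not the actual Artemis code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LifecycleSketch {

    // Stand-in for the connections map held by NettyAcceptor.
    static final Map<String, Object> connections = new ConcurrentHashMap<>();

    // Stand-in for the ActiveMQChannelHandler state machine.
    static class Handler {
        volatile boolean active;

        void createConnection(String id) {       // steps 1-2
            connections.put(id, new Object());
            active = true;
        }

        void exceptionCaught(String id) {        // steps 4-5
            // The handler deactivates itself on the close-triggered
            // exception without destroying the connection.
            active = false;
        }

        void channelInactive(String id) {        // step 6
            // connectionDestroyed is only invoked while active is true.
            if (active) {
                connectionDestroyed(id);
            }
        }

        void connectionDestroyed(String id) {    // never reached in this order
            connections.remove(id);
        }
    }

    public static void main(String[] args) {
        Handler h = new Handler();
        h.createConnection("conn-1");
        h.exceptionCaught("conn-1");  // closeConnectionWithID triggers this first
        h.channelInactive("conn-1");  // active is already false: no cleanup
        // Step 7: the map entry is leaked.
        System.out.println("leaked entries: " + connections.size());
    }
}
```

With {{exceptionCaught}} firing before {{channelInactive}}, the {{active}} guard in {{channelInactive}} skips {{connectionDestroyed}}, so the map entry survives; had the channel gone inactive while still active, cleanup would have happened.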




> Failover connection references are not always cleaned up in NettyAcceptor, 
> leaking memory
> -----------------------------------------------------------------------------------------
>
>                 Key: ARTEMIS-4797
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4797
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: OpenWire
>            Reporter: Josh Byster
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
