[
https://issues.apache.org/jira/browse/ARTEMIS-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Byster updated ARTEMIS-4797:
---------------------------------
Description:
I'm still trying to pin down the exact conditions under which this occurs, since
I can reproduce it in a very specific production setup but not in an isolated
local environment.
For context, we have custom slow-consumer detection that closes the connections
of slow consumers by their connection IDs. These clients connect via the
failover transport using ActiveMQ Classic 5.16.4 (OpenWire). The issue seems to
be specific to Netty.
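For reference, the close itself is nothing exotic. Below is a minimal sketch of
what our detection does, assuming JMX access to the broker with the default
object name; the service URL and connection ID are placeholders:
{code:java}
import javax.management.MBeanServerConnection;
import javax.management.MBeanServerInvocationHandler;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

import org.apache.activemq.artemis.api.core.management.ActiveMQServerControl;
import org.apache.activemq.artemis.api.core.management.ObjectNameBuilder;

public class CloseConnectionById {
   public static void main(String[] args) throws Exception {
      // Placeholder JMX endpoint; adjust host/port for the broker under test.
      JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
      try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
         MBeanServerConnection mbsc = connector.getMBeanServerConnection();
         // Assumes the default broker object name; use
         // ObjectNameBuilder.create(...) if the broker name differs.
         ObjectName name = ObjectNameBuilder.DEFAULT.getActiveMQServerObjectName();
         ActiveMQServerControl control = MBeanServerInvocationHandler
               .newProxyInstance(mbsc, name, ActiveMQServerControl.class, false);
         // Close the slow consumer's connection by its ID (placeholder value).
         boolean closed = control.closeConnectionWithID("some-connection-id");
         System.out.println("closed: " + closed);
      }
   }
}
{code}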
It appears the following order of events causes the connection to never be
cleaned up, so it is retained on the broker indefinitely. With connections
being kicked frequently, this eventually causes the broker to OOM.
1. A connection is created, and an {{ActiveMQServerChannelHandler}} is created
along with it.
2. {{ActiveMQServerChannelHandler#createConnection}} is called, and the
{{active}} flag is set to {{true}}.
3. A few minutes go by, then we call
{{ActiveMQServerControl#closeConnectionWithID}} with the connection ID.
4. {{ActiveMQChannelHandler#exceptionCaught}} gets called; *this is the key
point that causes issues*. The connection is cleaned up if and only if this is
*not* called. The exception originates in
{{AbstractChannel.close(ChannelPromise)}}, though the comment above that call
says this is normal for failover.
5. The {{active}} flag is set to {{false}}.
6. {{ActiveMQChannelHandler#channelInactive}} gets called, but does *not* call
{{listener.connectionDestroyed}} since the {{active}} flag is already {{false}}.
7. The connection is never removed from the {{connections}} map in
{{NettyAcceptor}}, causing a leak and an eventual broker OOM if this happens
frequently enough (a simulation of this ordering follows this list).
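To make the ordering concrete, here is a self-contained simulation of steps 1
through 7. It is a condensed paraphrase, not the actual broker source: the
names mirror {{ActiveMQChannelHandler}} and the {{connections}} map in
{{NettyAcceptor}}, but the bodies model only the interaction described above.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConnectionLeakSimulation {

   // Stand-in for the connections map in NettyAcceptor (step 7).
   static final Map<String, Object> connections = new ConcurrentHashMap<>();

   static class HandlerSketch {
      volatile boolean active;
      final String channelId;

      HandlerSketch(String channelId) {
         this.channelId = channelId;
      }

      // Steps 1-2: the connection is registered and the active flag is set.
      void createConnection() {
         connections.put(channelId, new Object());
         active = true;
      }

      // Steps 4-5: closing the channel surfaces an exception from
      // AbstractChannel.close(ChannelPromise), and the handler reacts by
      // flipping the active flag.
      void exceptionCaught(Throwable cause) {
         active = false;
      }

      // Step 6: by the time Netty fires channelInactive, active is already
      // false, so the destroy path that would unregister the connection is
      // skipped.
      void channelInactive() {
         if (active) {
            connections.remove(channelId); // never reached in this ordering
            active = false;
         }
      }
   }

   public static void main(String[] args) {
      HandlerSketch handler = new HandlerSketch("conn-1");
      handler.createConnection();                      // steps 1-2
      handler.exceptionCaught(new Exception("close")); // steps 3-5
      handler.channelInactive();                       // step 6
      // Step 7: the entry is still present -- this is the leak.
      System.out.println("leaked entries: " + connections.size()); // prints 1
   }
}
{code}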
> Failover connection references are not always cleaned up in NettyAcceptor,
> leaking memory
> -----------------------------------------------------------------------------------------
>
> Key: ARTEMIS-4797
> URL: https://issues.apache.org/jira/browse/ARTEMIS-4797
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: OpenWire
> Reporter: Josh Byster
> Priority: Major
>