Network of Brokers Memory Leak Due to Race Condition
----------------------------------------------------
Key: AMQ-1709
URL: https://issues.apache.org/activemq/browse/AMQ-1709
Project: ActiveMQ
Issue Type: Bug
Components: Broker, Transport
Affects Versions: 5.0.0, 4.1.2
Reporter: Howard Orner
When you have a network of brokers configuration with at least 3 brokers, such as:
<broker brokerName="A" persistent="false" ...
...
<transportConnector name="AListener" uri="tcp://localhost:61610"/>
...
<networkConnector name="BConnector" uri="static:(tcp://localhost:61620)"/>
<networkConnector name="CConnector" uri="static:(tcp://localhost:61630)"/>
with the other brokers having a similar configuration.
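For reference, the equivalent setup done programmatically would look roughly
like this (illustrative only; it assumes broker A on port 61610 bridging to B
on 61620 and C on 61630, matching the XML above, and is not the configuration
style I actually run):

import org.apache.activemq.broker.BrokerService;

public class ThreeBrokerNetwork {
    // Illustrative sketch: broker A listens on 61610 and bridges to B (61620)
    // and C (61630). Brokers B and C would mirror this with their own ports.
    static BrokerService createBrokerA() throws Exception {
        BrokerService broker = new BrokerService();
        broker.setBrokerName("A");
        broker.setPersistent(false);
        broker.addConnector("tcp://localhost:61610");
        broker.addNetworkConnector("static:(tcp://localhost:61620)");
        broker.addNetworkConnector("static:(tcp://localhost:61630)");
        return broker;
    }

    public static void main(String[] args) throws Exception {
        BrokerService a = createBrokerA();
        a.start();
        // createBrokerB() / createBrokerC() would be built the same way.
    }
}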
Then, if you have subscribers trying to connect to all of the brokers, you can
hit a race condition at startup where the transports accept connections from
subscribers before the network connectors are initialized. In
BrokerService.startAllConnectors(), the transports are started first, then the
NetworkConnectors. As part of starting the network connectors, their
constructors take a collection obtained by calling
getBroker().getDurableDestinations(). Normally this list would be empty.
However, if clients connect before this call is made, the list contains an
entry for each topic subscribed to. Then, instead of creating standard
TopicSubscriptions for the network connector, DurableTopicSubscriptions are
created. I'm not sure whether this by itself should be a problem, but it is,
because SimpleDispatchPolicy, while iterating through the
DurableTopicSubscriptions, causes messages to be queued up for prefetch without
clearing all of the references (for each pass through, it looks like three
references are registered and only two are cleared). This becomes a memory
leak. In the logs you see a message saying the PrefetchLimit was reached, and
then you start seeing logs about memory usage increasing until it reaches 100%
and everything stops.
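To make the ordering concrete, here is a rough paraphrase of the startup
sequence as I read it (this is not the actual ActiveMQ source, just a sketch of
the shape of BrokerService.startAllConnectors()):

void startAllConnectorsSketch() throws Exception {
    // Step 1: transport connectors start and begin accepting subscriber
    // connections immediately.
    for (TransportConnector transport : getTransportConnectors()) {
        transport.start();
    }
    // Step 2: network connectors start only now. Each one picks up
    // getBroker().getDurableDestinations() at this point, so any subscriber
    // that connected during step 1 has already added entries, and the bridge
    // builds DurableTopicSubscriptions instead of plain TopicSubscriptions.
    for (NetworkConnector network : getNetworkConnectors()) {
        network.start();
    }
}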
To reproduce this, create a network of brokers configuration of at least 3
brokers -- the more you have, the more likely you are to hit this without a lot
of tries, so I suggest a bunch. Start all brokers. Establish a publisher on
broker A using failover://(tcp://localhost:61610), then establish a bunch of
subscribers on all the brokers using a similar configuration, i.e.,
failover://(tcp://localhost:61610), failover://(tcp://localhost:61620). The
more you have on broker 'A' the better since you are trying to reproduce the
race condition. You want the others up so that the other brokers expect
messages to be passed to them. Once everybody is up and happy, kill broker A
and restart it. If you do that enough times, you will hit the race condition
and the memory leak will start. You can also put a break point in
BrokerService.startAllConnectors() after the transports are started but before
the network connectors are started. That'll give clients time to connect to the
transport threads before you tell the VM to continue.
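If it helps, a minimal client sketch along those lines (the topic name
TEST.TOPIC is just a placeholder, and in practice you want many more
subscribers, especially against broker A):

import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class RaceRepro {
    // Create one topic subscriber against the given broker via a failover URI.
    static MessageConsumer subscribe(String brokerUri, String topicName) throws JMSException {
        ConnectionFactory factory = new ActiveMQConnectionFactory(brokerUri);
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        return session.createConsumer(session.createTopic(topicName));
    }

    public static void main(String[] args) throws JMSException {
        // Subscribers on every broker; the ones pointed at A matter most,
        // since they can reconnect to A before its network connectors start.
        subscribe("failover://(tcp://localhost:61610)", "TEST.TOPIC");
        subscribe("failover://(tcp://localhost:61620)", "TEST.TOPIC");
        subscribe("failover://(tcp://localhost:61630)", "TEST.TOPIC");

        // Publisher on broker A.
        ConnectionFactory factory =
            new ActiveMQConnectionFactory("failover://(tcp://localhost:61610)");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(session.createTopic("TEST.TOPIC"));
        producer.send(session.createTextMessage("ping"));
        // Now kill and restart broker A repeatedly; failover reconnects the
        // clients, and eventually some of them beat the network connectors
        // during A's startup.
    }
}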
I found an easy fix: store the durable destination list in a local variable
before starting the transports and pass that to the network connectors instead
of making separate calls. I'm not sure if there are 'normal' ways for that list
to be anything other than empty. If not, you could just pass an empty set to
the network connectors, but I suspect there are legitimate configurations that
need this list to be requested. If so, this memory leak would likely occur in
those cases, too.
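Roughly, the change I have in mind looks like this (a sketch, not an actual
patch; the setter that hands the snapshot to the network connectors is
assumed, not an existing 4.1.2 API):

// Sketch of the suggested change to BrokerService.startAllConnectors().
// Capture the durable destination list once, before any transport accepts clients.
Set<ActiveMQDestination> durableDestinations = getBroker().getDurableDestinations();

for (TransportConnector transport : getTransportConnectors()) {
    transport.start();
}
for (NetworkConnector network : getNetworkConnectors()) {
    // Hand the pre-start snapshot to the bridge instead of letting it call
    // getDurableDestinations() again after clients may already have connected.
    // (setDurableDestinations is a hypothetical setter used for illustration.)
    network.setDurableDestinations(durableDestinations);
    network.start();
}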
I ran into this in 4.1.2. I haven't tested 5.0 since our attempts to switch to
5.0 were met with failure due to the number of bugs in 5.0 (already reported by
others). Looking at 5.0.0 source, the race condition is still there in
BrokerService.startAllConnectors() so I suspect the memory leak is there as
well.