Network of Brokers Memory Leak Due to Race Condition
----------------------------------------------------

                 Key: AMQ-1709
                 URL: https://issues.apache.org/activemq/browse/AMQ-1709
             Project: ActiveMQ
          Issue Type: Bug
          Components: Broker, Transport
    Affects Versions: 5.0.0, 4.1.2
            Reporter: Howard Orner


When you have a network of brokers configuration with at least 3 brokers, such as:

<broker brokerName="A" persistent="false" ...
...
<transportConnector name="AListener" uri="tcp://localhost:61610"/>
...
<networkConnector name="BConnector" uri="static:(tcp://localhost:61620)"/>
<networkConnector name="CConnector" uri="static:(tcp://localhost:61630)"/>

with the other brokers having a similar configuration.
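
For what it's worth, here is a rough programmatic equivalent of broker A's
configuration using the BrokerService API (a sketch only; the class name is
made up, but the connector URIs mirror the XML above):

import org.apache.activemq.broker.BrokerService;

public class BrokerA {
    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();
        broker.setBrokerName("A");
        broker.setPersistent(false);
        // Transport connector that publishers and subscribers attach to
        broker.addConnector("tcp://localhost:61610");
        // Static network bridges to brokers B and C
        broker.addNetworkConnector("static:(tcp://localhost:61620)");
        broker.addNetworkConnector("static:(tcp://localhost:61630)");
        broker.start();
    }
}
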
If you then have subscribers trying to connect to all of the brokers, you can
hit a race condition at startup where the transports accept connections from
subscribers before the network connectors are initialized.  In
BrokerService.startAllConnectors(), the transport connectors are started
first, then the network connectors.  As part of starting the network
connectors, their constructors take a collection obtained by calling
getBroker().getDurableDestinations().  Normally this list would be empty.
However, if clients connect before this is called, an entry is returned for
each topic subscribed to.  Then, instead of creating standard
TopicSubscriptions for the network connector, DurableTopicSubscriptions are
created.  I'm not sure this by itself should be a problem, but it is, because
SimpleDispatchPolicy, in the process of iterating through the
DurableTopicSubscriptions, causes messages to be queued up for prefetch
without clearing all of the references (on each pass it looks like three
references are registered and only two are cleared).  This becomes a memory
leak.  In the logs you see a message saying the PrefetchLimit was reached,
and then you start seeing logs about memory usage increasing until it gets to
100% and everything stops.

To reproduce this, create a network of brokers configuration of at least 3
brokers -- the more you have, the more likely you are to hit this without a
lot of tries, so I suggest a bunch.  Start all brokers.  Establish a
publisher on broker A using failover://(tcp://localhost:61610), then
establish a bunch of subscribers on all the brokers using a similar
configuration, e.g. failover://(tcp://localhost:61610),
failover://(tcp://localhost:61620).  The more you have on broker 'A' the
better, since you are trying to reproduce the race condition.  You want the
others up so that the other brokers expect messages to be passed to them.
Once everybody is up and happy, kill broker A and restart it.  If you do that
enough times, you will hit the race condition and the memory leak will start.
You can also put a breakpoint in BrokerService.startAllConnectors() after the
transports are started but before the network connectors are started.
That'll give clients time to connect to the transport threads before you tell
the VM to continue.
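
If it helps with reproducing, a minimal subscriber along these lines (just a
sketch; the topic name TEST.TOPIC and the empty listener are placeholders)
looks like:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.MessageListener;
import javax.jms.Session;
import javax.jms.Topic;
import org.apache.activemq.ActiveMQConnectionFactory;

public class LeakReproSubscriber {
    public static void main(String[] args) throws Exception {
        // Point different instances at 61610, 61620 and 61630 so every broker
        // has subscribers; the ones on broker A matter most for the race.
        ConnectionFactory factory =
            new ActiveMQConnectionFactory("failover://(tcp://localhost:61610)");
        Connection connection = factory.createConnection();
        connection.start();
        Session session =
            connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("TEST.TOPIC");
        MessageConsumer consumer = session.createConsumer(topic);
        consumer.setMessageListener(new MessageListener() {
            public void onMessage(Message message) {
                // Nothing to do here; the leak shows up on the broker side.
            }
        });
        // Keep the subscriber alive while broker A is killed and restarted.
        Thread.sleep(Long.MAX_VALUE);
    }
}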

I found an easy fix: store the durable destination list in a local variable
before starting the transports, and pass that to the network connectors
instead of having them make separate calls.  I'm not sure if there are
'normal' ways for that list to be anything other than empty.  If not, you
could just pass an empty set to the network connectors, but I suspect there
are legitimate configurations that may need this list to be requested.  If
so, this memory leak would likely occur in those cases, too.
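
Sketched against BrokerService.startAllConnectors(), the change would look
something like the following (approximate -- in 4.1.x the collection is
handed to the network connector's constructor rather than via a setter, so
treat the setDurableDestinations() call as illustrative):

// Inside org.apache.activemq.broker.BrokerService (sketch, not a tested patch)
protected void startAllConnectors() throws Exception {
    // Snapshot the durable destinations BEFORE any transport starts
    // accepting clients, so late-connecting subscribers cannot add
    // entries to this set.
    Set durableDestinations = getBroker().getDurableDestinations();

    for (Iterator iter = getTransportConnectors().iterator(); iter.hasNext();) {
        TransportConnector connector = (TransportConnector) iter.next();
        connector.start();
    }

    for (Iterator iter = getNetworkConnectors().iterator(); iter.hasNext();) {
        NetworkConnector connector = (NetworkConnector) iter.next();
        // Hand the pre-captured set to the connector instead of letting it
        // call getBroker().getDurableDestinations() again on its own.
        connector.setDurableDestinations(durableDestinations);
        connector.start();
    }
}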

I ran into this in 4.1.2.  I haven't tested 5.0, since our attempts to switch
to 5.0 were met with failure due to the number of bugs in it (already
reported by others).  Looking at the 5.0.0 source, the race condition is
still there in BrokerService.startAllConnectors(), so I suspect the memory
leak is there as well.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
