Failover transport race condition causes intermittent incomplete bridge 
connections
-----------------------------------------------------------------------------------

                 Key: AMQ-3575
                 URL: https://issues.apache.org/jira/browse/AMQ-3575
             Project: ActiveMQ
          Issue Type: Bug
          Components: Transport
    Affects Versions: 5.5.0
         Environment: CentOS 5.5 and Mac OSX10
            Reporter: Aaron Phillips
             Fix For: NEEDS_REVIEWED


There is a race condition in FailoverTransport.java that sometimes prevents 
network bridge connections from starting.  This is a serious issue: it 
prevented us from setting up failover connections between brokers.  I would 
have asked for it to be marked critical were it not for a workaround.  The 
workaround I have found is as follows:

Turn on the ActiveMQ thread pooling option to avoid the failover bridge 
connection race condition.  To do so, change the following property in your 
start script to false, like so.  Somehow this got me around the problem of the 
wrong thread sometimes winning:
-Dorg.apache.activemq.UseDedicatedTaskRunner=false

I've attached a unit test to be dropped in 
activemq-core/src/test/java/org/apache/activemq/transport/failover.  The unit 
test shows that when a delay is introduced in setting of the TransportListener, 
the BrokerInfo command required to complete the bridge connection will never be 
processed.  There are two unit tests in this class and both are designed to 
pass.  The test called "testTcpThreadWinsPreventsCompletionOfBridge" passes by 
asserting that it *did not* receive the BrokerInfo command.  By changing the 
delay value you can control whether the Main thread wins (in which case all is 
well) or the TCP thread wins (in which case the network bridge hangs and fails 
to start).
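
The two outcomes described above can be illustrated with a minimal, 
self-contained sketch.  This is not the attached unit test and it does not use 
any ActiveMQ classes: ToyTransport, deliver() and run() are hypothetical names 
invented for illustration, and the assumption that a command arriving before 
the listener is set gets silently dropped is a simplification of the behaviour 
reported here.  Latches are used instead of a delay so that each interleaving 
is forced deterministically.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class ListenerRaceSketch {

    /** Hypothetical stand-in for a transport; NOT ActiveMQ's FailoverTransport. */
    static final class ToyTransport {
        private volatile Consumer<String> listener;

        void setListener(Consumer<String> l) {
            listener = l;
        }

        /** Called from the "TCP" thread. Assumption: with no listener set, the command is dropped. */
        void deliver(String command) {
            Consumer<String> l = listener;
            if (l != null) {
                l.accept(command);
            }
        }
    }

    /** Forces one interleaving; returns the command the listener saw, or null if it was dropped. */
    static String run(boolean mainThreadWins) throws InterruptedException {
        ToyTransport transport = new ToyTransport();
        AtomicReference<String> received = new AtomicReference<>();
        CountDownLatch listenerSet = new CountDownLatch(1);
        CountDownLatch delivered = new CountDownLatch(1);

        Thread tcp = new Thread(() -> {
            try {
                if (mainThreadWins) {
                    listenerSet.await();   // TCP thread loses: waits until the listener exists
                }
                transport.deliver("BrokerInfo");
                delivered.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "tcp");
        tcp.start();

        if (!mainThreadWins) {
            delivered.await();             // main thread loses: listener is set too late
        }
        transport.setListener(received::set);
        listenerSet.countDown();
        tcp.join();
        return received.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("main thread wins: " + run(true));
        System.out.println("tcp thread wins:  " + run(false));
    }
}
```

In this sketch run(true) corresponds to the Main thread winning (the listener 
sees the command) and run(false) to the TCP thread winning (the command is 
lost, which in the real bridge would leave it waiting for a BrokerInfo that 
never arrives).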

Note: this issue only affects network bridge connections that are set up with 
the failover transport, such as a broker that connects to a Master-Slave pair, e.g. 
failover://(tcp://master:61616,tcp://slave:61616)?randomize=false

        

Reply via email to