[ 
https://issues.apache.org/activemq/browse/AMQ-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=57052#action_57052
 ] 

Rob Davies commented on AMQ-2541:
---------------------------------

Hi Stirling,

thanks for the patch - it doesn't look like it's from trunk though (so I'm finding 
it hard to see what you've changed) - could you get the src from trunk and make the 
patch again? 

Thanks,

Rob

> Extremely slow broker startup when using SimpleDiscoveryAgent with an 
> inactive Network of Brokers.
> --------------------------------------------------------------------------------------------------
>
>                 Key: AMQ-2541
>                 URL: https://issues.apache.org/activemq/browse/AMQ-2541
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker, Connector
>    Affects Versions: 5.3.0
>            Reporter: Stirling Chow
>         Attachments: patch.txt
>
>
> Symptom
> ========
> An AMQ broker that is configured to join a statically-defined (e.g., using 
> uri="static:(tcp://host1:61616,tcp://host2:61616,tcp://..)") network of 
> brokers can have an extremely long startup time (on the order of 5+ minutes) 
> if many of the brokers in the network are not alive.
> The following log entries show the startup of an AMQ broker 
> (http://192.168.170.112:50000) that is configured to join a network with 
> three other brokers:
> http://10.10.60.78:50000
> http://10.9.62.135:50000
> http://10.10.60.75:50000
> None of the three other brokers has been started yet.
> The log shows that it takes just over 4 minutes (15:24:46 to 15:28:59) for 
> BrokerService#start() to return control to the calling thread (AlarmPoint Node-main):
> 2009-12-18 15:24:46,783 [AlarmPoint Node-main] INFO    -  - ActiveMQ 5.3.0 
> JMS Message Broker (localhost) is starting
> ...
> 2009-12-18 15:24:47,158 [AlarmPoint Node-main] INFO    -  - Connector 
> http://192.168.170.112:50000 Started
> 2009-12-18 15:24:47,158 [AlarmPoint Node-main] INFO    -  - Establishing 
> network connection from vm://localhost to http://10.10.60.78:50000
> ...
> 2009-12-18 15:26:11,314 [AlarmPoint Node-main] WARN    -  - Could not start 
> network bridge between: vm://localhost and: http://10.10.60.78:50000 due to: 
> java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:26:11,314 [AlarmPoint Node-main] DEBUG   -  - Start failure 
> exception: java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:26:11,314 [AlarmPoint Node-main] INFO    -  - Establishing 
> network connection from vm://localhost to http://10.9.62.135:50000
> ...
> 2009-12-18 15:27:35,299 [AlarmPoint Node-main] WARN    -  - Could not start 
> network bridge between: vm://localhost and: http://10.9.62.135:50000 due to: 
> java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:27:35,299 [AlarmPoint Node-main] DEBUG   -  - Start failure 
> exception: java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:27:35,299 [AlarmPoint Node-main] INFO    -  - Establishing 
> network connection from vm://localhost to http://10.10.60.75:50000
> ...
> 2009-12-18 15:28:59,314 [AlarmPoint Node-main] WARN    -  - Could not start 
> network bridge between: vm://localhost and: http://10.10.60.75:50000 due to: 
> java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:28:59,314 [AlarmPoint Node-main] DEBUG   -  - Start failure 
> exception: java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:28:59,314 [AlarmPoint Node-main] INFO    -  - Network Connector 
> bridge Started
> 2009-12-18 15:28:59,314 [AlarmPoint Node-main] INFO    -  - ActiveMQ JMS 
> Message Broker (localhost, ID:vic-esx4-ns1-1280-1261178686846-0:0) started
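The gaps between the timestamps above can be checked directly. The small Java program below (the class name is illustrative) parses the times copied from the log excerpt and prints how long each connection attempt blocked, plus the total time from "is starting" to "started".

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

public class StartupTimeline {
    // The log timestamps use a comma before the milliseconds.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Elapsed time between two log timestamps on the same day.
    static Duration between(String from, String to) {
        return Duration.between(LocalTime.parse(from, FMT), LocalTime.parse(to, FMT));
    }

    public static void main(String[] args) {
        // Timestamps copied from the log excerpt above.
        String start = "15:24:46,783";   // broker is starting
        String[] marks = {
            "15:24:47,158",              // connecting to 10.10.60.78
            "15:26:11,314",              // 10.10.60.78 timed out; connecting to 10.9.62.135
            "15:27:35,299",              // 10.9.62.135 timed out; connecting to 10.10.60.75
            "15:28:59,314"               // 10.10.60.75 timed out; broker started
        };
        for (int i = 1; i < marks.length; i++) {
            long ms = between(marks[i - 1], marks[i]).toMillis();
            System.out.println("connect attempt " + i + ": " + ms + " ms");
        }
        long total = between(start, marks[marks.length - 1]).toMillis();
        System.out.println("total startup: " + total + " ms");
    }
}
```

It prints 84156, 83985 and 84015 ms for the three attempts and 252531 ms in total: each inactive URL costs about 84 s, and the three waits account for almost all of the startup time.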
> Cause
> =====
> The broker's network connector is implemented by 
> org.apache.activemq.network.DiscoveryNetworkConnector, which in turn uses 
> org.apache.activemq.transport.discovery.simple.SimpleDiscoveryAgent to 
> determine whether the URLs configured in 
> uri="static:(tcp://host1:61616,tcp://host2:61616,tcp://..)" are active.  
> SimpleDiscoveryAgent#start() has this loop:
>     public void start() throws Exception {
>         running.set(true);
>         for (int i = 0; i < services.length; i++) {
>             listener.onServiceAdd(new SimpleDiscoveryEvent(services[i]));
>         }
>     }
> listener.onServiceAdd(...) is called for each URL and is implemented by 
> DiscoveryNetworkConnector#onServiceAdd(...). The main thread calls 
> BrokerService#start(), which calls DiscoveryNetworkConnector#start(), which 
> calls SimpleDiscoveryAgent#start(), which calls 
> DiscoveryNetworkConnector#onServiceAdd(...) for each URL in turn. Since the 
> URLs being "discovered" are inactive, each onServiceAdd(...) call blocks 
> until the connection attempt times out: roughly 1.5 minutes per URL (about 
> 84 s each in the log above; the exact time depends on the network 
> configuration). This blocks the main thread that is trying to start the 
> broker. With several inactive URLs the waits add up, so startup is delayed 
> by roughly the number of inactive URLs times the connect timeout.
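To illustrate why the delays add up, here is a minimal sketch of the same sequential pattern. It is not ActiveMQ code: blockingOnServiceAdd is a hypothetical stand-in that simply sleeps for a fixed "timeout", the host3 URL is made up, and 200 ms replaces the real ~84 s timeout so the demo runs quickly (List.of needs Java 9+).

```java
import java.util.List;

public class SequentialStartDemo {
    // Hypothetical stand-in for DiscoveryNetworkConnector#onServiceAdd(...):
    // simulates a connect attempt to an inactive broker by sleeping until the
    // "timeout" expires.
    static void blockingOnServiceAdd(String uri, long timeoutMillis) {
        try {
            Thread.sleep(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Same shape as the SimpleDiscoveryAgent#start() loop quoted above: every
    // URL is handled on the caller's thread, one after another. Returns the
    // elapsed wall-clock time in milliseconds.
    static long startSequentially(List<String> services, long timeoutMillis) {
        long begin = System.nanoTime();
        for (String uri : services) {
            blockingOnServiceAdd(uri, timeoutMillis);
        }
        return (System.nanoTime() - begin) / 1_000_000;
    }

    public static void main(String[] args) {
        // host3 is a made-up URL; 200 ms stands in for the ~84 s connect timeout.
        List<String> services = List.of("tcp://host1:61616", "tcp://host2:61616", "tcp://host3:61616");
        long elapsed = startSequentially(services, 200);
        System.out.println("start() returned after ~" + elapsed + " ms");
    }
}
```

With three URLs the simulated start() returns only after at least 600 ms, i.e. about three times the per-URL timeout, which is the same pattern as the three ~84 s waits in the log.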
> Solution
> =======
> If you follow through the DiscoveryNetworkConnector#onServiceAdd(...) method, 
> it eventually calls SimpleDiscoveryAgent#serviceFailed(...) for each inactive 
> URL. In turn, SimpleDiscoveryAgent#serviceFailed(...) launches an 
> asynchronous task that pauses for the configured reconnect delay and then 
> retries the call to DiscoveryNetworkConnector#onServiceAdd(...). Because 
> those retries already run on separate threads, 
> DiscoveryNetworkConnector#onServiceAdd(...) must already be safe to call 
> concurrently. Therefore, SimpleDiscoveryAgent#start()'s loop should be 
> changed to launch an asynchronous task per URL, so that the 
> DiscoveryNetworkConnector#onServiceAdd(...) calls run concurrently rather 
> than sequentially.
> This solution has the benefit of returning control immediately to the caller 
> of SimpleDiscoveryAgent#start(), thus starting the broker faster, and it 
> allows network discovery to find "active" URLs much sooner (with the 
> sequential loop, if the only "active" URL is the last one, its discovery is 
> delayed by the timeouts of all the inactive URLs before it).
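A minimal sketch of the proposed change, assuming a plain ExecutorService runs the tasks. The report does not say which thread pool should be used, and this is not necessarily how ActiveMQ ended up fixing the issue. The class name and the Consumer<String> standing in for listener.onServiceAdd(...) are hypothetical; the running flag mirrors the one in the quoted start() method.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

public class ConcurrentStartSketch {
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final List<String> services;
    // Hypothetical stand-in for listener.onServiceAdd(...), i.e. the role played
    // by DiscoveryNetworkConnector#onServiceAdd(...) in the report.
    private final Consumer<String> onServiceAdd;
    // Assumption: one cached pool per agent; the report does not specify a pool.
    private final ExecutorService executor = Executors.newCachedThreadPool();

    ConcurrentStartSketch(List<String> services, Consumer<String> onServiceAdd) {
        this.services = services;
        this.onServiceAdd = onServiceAdd;
    }

    // Proposed shape of start(): submit one task per URL and return at once,
    // so an unreachable broker no longer blocks the caller or the other URLs.
    public void start() {
        running.set(true);
        for (String uri : services) {
            executor.execute(() -> {
                if (running.get()) {  // skip the attempt if stop() ran first
                    onServiceAdd.accept(uri);
                }
            });
        }
    }

    // Stop accepting work and interrupt any attempts still in progress.
    public void stop() throws InterruptedException {
        running.set(false);
        executor.shutdownNow();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // host3 is a made-up URL for illustration.
        List<String> services = List.of("tcp://host1:61616", "tcp://host2:61616", "tcp://host3:61616");
        ConcurrentStartSketch agent = new ConcurrentStartSketch(services, uri -> {
            try {
                Thread.sleep(200);  // simulate a connect attempt that times out
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            System.out.println("attempt finished: " + uri);
        });
        long begin = System.nanoTime();
        agent.start();
        System.out.println("start() returned after ~" + (System.nanoTime() - begin) / 1_000_000 + " ms");
        Thread.sleep(500);  // give the background attempts time to finish
        agent.stop();
    }
}
```

Here start() returns after a few milliseconds while the three simulated 200 ms attempts finish in parallel in the background. Following the report's reasoning, this is safe because serviceFailed(...) already retries onServiceAdd(...) from other threads, so the listener must already tolerate concurrent calls.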

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
