[
https://issues.apache.org/activemq/browse/AMQ-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=57052#action_57052
]
Rob Davies commented on AMQ-2541:
---------------------------------
Hi Stirling,
thanks for the patch - it doesn't look like its from trunk though (so finding
it hard to see what you've changed) - could you get src from trunk and make the
patch again ?
thanks,.
Rob
> Extremely slow broker startup when using SimpleDiscoveryAgent with an
> inactive Network of Brokers.
> --------------------------------------------------------------------------------------------------
>
> Key: AMQ-2541
> URL: https://issues.apache.org/activemq/browse/AMQ-2541
> Project: ActiveMQ
> Issue Type: Bug
> Components: Broker, Connector
> Affects Versions: 5.3.0
> Reporter: Stirling Chow
> Attachments: patch.txt
>
>
> Symptom
> ========
> An AMQ broker that is configured to join a statically-defined (i.e., using
> uri="static:(tcp://host1:61616,tcp://host2:61616,tcp://..) ) network of
> brokers can have an extremely long startup time (in the order of 5+ minutes),
> if many of the brokers in the network are not alive.
> The following log entires show the startup of an AMQ Broker
> (http://192.168.170.112:50000) that is configured to join a network with
> three other brokers:
> http://10.10.60.78:50000
> http://10.9.62.135:50000
> http://10.10.60.75:50000)
> The three other have not yet started.
> The log file shows that it takes nearly 4 minutes from BrokerService#start()
> to return control to the calling thread (AlarmPoint Node-main):
> 2009-12-18 15:24:46,783 [AlarmPoint Node-main] INFO - - ActiveMQ 5.3.0
> JMS Message Broker (localhost) is starting
> ...
> 2009-12-18 15:24:47,158 [AlarmPoint Node-main] INFO - - Connector
> http://192.168.170.112:50000 Started
> 2009-12-18 15:24:47,158 [AlarmPoint Node-main] INFO - - Establishing
> network connection from vm://localhost to http://10.10.60.78:50000
> ...
> 2009-12-18 15:26:11,314 [AlarmPoint Node-main] WARN - - Could not start
> network bridge between: vm://localhost and: http://10.10.60.78:50000 due to:
> java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:26:11,314 [AlarmPoint Node-main] DEBUG - - Start failure
> exception: java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:26:11,314 [AlarmPoint Node-main] INFO - - Establishing
> network connection from vm://localhost to http://10.9.62.135:50000
> ...
> 2009-12-18 15:27:35,299 [AlarmPoint Node-main] WARN - - Could not start
> network bridge between: vm://localhost and: http://10.9.62.135:50000 due to:
> java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:27:35,299 [AlarmPoint Node-main] DEBUG - - Start failure
> exception: java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:27:35,299 [AlarmPoint Node-main] INFO - - Establishing
> network connection from vm://localhost to http://10.10.60.75:50000
> ...
> 2009-12-18 15:28:59,314 [AlarmPoint Node-main] WARN - - Could not start
> network bridge between: vm://localhost and: http://10.10.60.75:50000 due to:
> java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:28:59,314 [AlarmPoint Node-main] DEBUG - - Start failure
> exception: java.net.ConnectException: Connection timed out: connect
> 2009-12-18 15:28:59,314 [AlarmPoint Node-main] INFO - - Network Connector
> bridge Started
> 2009-12-18 15:28:59,314 [AlarmPoint Node-main] INFO - - ActiveMQ JMS
> Message Broker (localhost, ID:vic-esx4-ns1-1280-1261178686846-0:0) started
> Cause
> =====
> The broker's network connector is implemented by
> org.apache.activemq.network.DiscoveryNetworkConnector, which in turn uses
> org.apache.activemq.transport.discovery.simple.SimpleDiscoveryAgent to
> determine whether the URLs configured in
> uri="static:(tcp://host1:61616,tcp://host2:61616,tcp://..) " are active.
> SimpleDiscoveryAgent#start() has this loop:
> public void start() throws Exception {
> running.set(true);
> for (int i = 0; i < services.length; i++) {
> listener.onServiceAdd(new SimpleDiscoveryEvent(services[i]));
> }
> }
> "listener.onServiceAdd(...) " is called for each URL and is implemented by
> DiscoveryNetworkConnector#onServiceAdd(...). The main thread calls
> BrokerService#start() which calls DiscoveryNetworkConnector#start() which
> calls SimpleDiscoveryAgent#start(), which sequentially calls
> DiscoveryNetworkConnector#onServiceAdd(...). Since the URLs being
> "discovered" are inactive, DiscoveryNetworkConnector#onServiceAdd(...)
> blocks ~1m30s (this will depend on network configuration) for each URL. This
> blocks the main thread that is trying to start the broker. If there are
> several inactive URLs, then the blocking time becomes excessive.
> Solution
> =======
> If you follow through the DiscoveryNetworkConnector#onServiceAdd(...) method,
> it eventially calls SimpleDiscoveryAgent#serviceFailed(...) for each inactive
> URL. In turn SimpleDiscoveryAgent#serviceFailed(...) launches an
> asynchronous task that pauses for the configured reconnect delay, and then
> retries the call to DiscoveryNetworkConnector#onServiceAdd(...). So it must
> be safe to call DiscoveryNetworkConnector#onServiceAdd(...) concurrently.
> Therefore, SimpleDiscoveryAgent#start()'s loop should be changed to launch
> asynchronous tasks to make the DiscoveryNetworkConnector#onServiceAdd(...)
> calls concurrently rather than synchronously.
> This solution has the benefit of returning control immediately to the caller
> of SimpleDiscoveryAgent#start(...), thus starting the broker faster, and
> allows the network discovery to find "active" URLs much faster (i.e., with
> the sequential loop, if the "active" URL is the last one, its discovery is
> significantly delayed).
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.