[ https://issues.apache.org/jira/browse/AMQ-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timothy Bish resolved AMQ-3016.
-------------------------------
Resolution: Fixed
Fix Version/s: 5.8.0
Noticed this also while doing some work in the bridging code.
> Race condition in DemandForwardingBridgeSupport can cause remote connection to be leaked.
> -----------------------------------------------------------------------------------------
>
> Key: AMQ-3016
> URL: https://issues.apache.org/jira/browse/AMQ-3016
> Project: ActiveMQ
> Issue Type: Bug
> Components: Connector, Transport
> Affects Versions: 5.4.1
> Reporter: Stirling Chow
> Fix For: 5.8.0
>
> Attachments: ConnectionLeakTest.java, patch.txt
>
>
> Symptom
> ========
> I set up two Brokers and a network bridge from Broker A to Broker B over
> HTTP. When the bridge is established, each Broker has a single transport
> connection (VM on broker A and HTTP on broker B) as recorded in
> RegionBroker.connections
> I noticed that when Broker A was stopped (normally), the HTTP connection
> would occasionally remain in Broker B's RegionBroker.connections until the
> InactivityMonitor on the connection timed out. If the InactivityMonitor was
> disabled, the connection would remain indefinitely.
> If Broker A was restarted, the bridge would be restarted and a second
> connection would be recorded in Broker B's RegionBroker.connections.
> Repeating restarts of Broker A would cause an accumulation of "dead"
> connections, which would eventually lead to an OOM.
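> For reference, a minimal sketch of the topology described above (this is not
> the attached ConnectionLeakTest.java; the port, broker names, and the static
> discovery URI are assumptions for illustration, and the HTTP connector needs
> the ActiveMQ HTTP transport module on the classpath):
>
>     import org.apache.activemq.broker.BrokerService;
>
>     public class BridgeLeakSketch {
>         public static void main(String[] args) throws Exception {
>             // Broker B: the remote side of the bridge, listening over HTTP
>             BrokerService brokerB = new BrokerService();
>             brokerB.setBrokerName("brokerB");
>             brokerB.setPersistent(false);
>             brokerB.addConnector("http://localhost:61616");
>             brokerB.start();
>
>             // Broker A: bridges to Broker B via a static network connector
>             BrokerService brokerA = new BrokerService();
>             brokerA.setBrokerName("brokerA");
>             brokerA.setPersistent(false);
>             brokerA.addNetworkConnector("static:(http://localhost:61616)");
>             brokerA.start();
>
>             // Stopping Broker A should remove the HTTP connection from Broker B's
>             // RegionBroker.connections; with the race described below it can linger.
>             brokerA.stop();
>         }
>     }
>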
> Cause
> =====
> When Broker A is stopped, DemandForwardingBridgeSupport.stop() is called and
> sends a ShutdownInfo command to the local and remote transports. When the
> transports receive the ShutdownInfo, they remove the connection from their
> associated RegionBroker.connections as part of
> TransportConnection.processRemoveConnection(ConnectionId, long):
>     public synchronized Response processRemoveConnection(ConnectionId id,
>             long lastDeliveredSequenceId) throws InterruptedException {
>         ...
>         try {
>             broker.removeConnection(cs.getContext(), cs.getInfo(), null);
>         } catch (Throwable e) {
>             SERVICELOG.warn("Failed to remove connection " + cs.getInfo(), e);
>         }
> In the cases where Broker B did not clean up its connection, I traced the
> code and discovered that the ShutdownInfo message was not being sent: the
> remote transport (HttpClientTransport) already had its "stopped" flag set to
> true, so the check at the top of HttpClientTransport.oneway(Object command)
> rejected the send:
>     public void oneway(Object command) throws IOException {
>         if (isStopped()) {
>             throw new IOException("stopped.");
>         }
>         ...
> DemandForwardingBridgeSupport's stop() method has the following code:
>     public void stop() throws Exception {
>         ...
>             ASYNC_TASKS.execute(new Runnable() {
>                 public void run() {
>                     try {
>                         localBroker.oneway(new ShutdownInfo());
>                         sendShutdown.countDown();
>                         remoteBroker.oneway(new ShutdownInfo());
>                     } catch (Throwable e) {
>                         LOG.debug("Caught exception sending shutdown", e);
>                     } finally {
>                         sendShutdown.countDown();
>                     }
>                 }
>             });
>             if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
>                 LOG.info("Network Could not shutdown in a timely manner");
>             }
>         } finally {
>             ServiceStopper ss = new ServiceStopper();
>             ss.stop(remoteBroker);
>             ss.stop(localBroker);
>             // Release the started Latch since another thread could be
>             // stuck waiting for it to start up.
>             startedLatch.countDown();
>             startedLatch.countDown();
>             localStartedLatch.countDown();
>             ss.throwFirstException();
>         }
>     }
> ShutdownInfo is sent asynchronously to the local and remote transports by a
> slave thread:
>     localBroker.oneway(new ShutdownInfo());
>     sendShutdown.countDown();
>     remoteBroker.oneway(new ShutdownInfo());
> The sendShutdown latch is used by the master thread to prevent running the
> finally clause until the ShutdownInfo has been sent:
>         if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
>             LOG.info("Network Could not shutdown in a timely manner");
>         }
>     } finally {
>         ServiceStopper ss = new ServiceStopper();
>         ss.stop(remoteBroker);
>         ss.stop(localBroker);
>         ...
>         }
>     }
> However, because the latch is counted down *before*
> remoteBroker.oneway(new ShutdownInfo()), there is a race condition: in most
> cases the main thread calls ss.stop(remoteBroker) before the slave thread has
> completed its call to remoteBroker.oneway(new ShutdownInfo()). As a result,
> the remote transport appears already stopped and the ShutdownInfo is
> discarded. This leaves the connection dangling on the remote broker until the
> InactivityMonitor closes it.
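> To make the interleaving concrete, here is a self-contained sketch of the
> same latch pattern (plain Java, not ActiveMQ code; the messages are only
> illustrative). The slave thread releases the latch between its two sends, so
> the main thread can "stop" the remote side before the second send runs:
>
>     import java.util.concurrent.CountDownLatch;
>     import java.util.concurrent.TimeUnit;
>     import java.util.concurrent.atomic.AtomicBoolean;
>
>     public class ShutdownRaceSketch {
>         public static void main(String[] args) throws Exception {
>             CountDownLatch sendShutdown = new CountDownLatch(1);
>             AtomicBoolean remoteStopped = new AtomicBoolean(false);
>
>             Thread slave = new Thread(() -> {
>                 // localBroker.oneway(new ShutdownInfo()) would happen here
>                 sendShutdown.countDown();   // latch released too early
>                 // race window: the master may already have stopped the remote side
>                 if (remoteStopped.get()) {
>                     System.out.println("remote ShutdownInfo discarded: transport stopped");
>                 } else {
>                     System.out.println("remote ShutdownInfo delivered");
>                 }
>             });
>             slave.start();
>
>             sendShutdown.await(10, TimeUnit.SECONDS);
>             remoteStopped.set(true);        // stands in for ss.stop(remoteBroker)
>             slave.join();
>         }
>     }
>
> Which message prints depends on thread scheduling, which is exactly the
> intermittent behaviour seen in the bug.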
> Solution
> ======
> The sendShutdown latch should be counted down *after*
> remoteBroker.oneway(new ShutdownInfo()).
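> A hedged sketch of that reordering, patterned on the stop() snippet quoted
> above (fragment only; the surrounding fields and method body are elided): the
> premature countDown() between the two oneway calls is removed, so the latch
> is released only in the finally block once both ShutdownInfo commands have
> been attempted.
>
>     ASYNC_TASKS.execute(new Runnable() {
>         public void run() {
>             try {
>                 localBroker.oneway(new ShutdownInfo());
>                 remoteBroker.oneway(new ShutdownInfo());
>             } catch (Throwable e) {
>                 LOG.debug("Caught exception sending shutdown", e);
>             } finally {
>                 // released only after both sends have been attempted, so the
>                 // master thread cannot stop the remote transport first
>                 sendShutdown.countDown();
>             }
>         }
>     });
>
> With this ordering the master thread's ss.stop(remoteBroker) cannot run until
> the remote ShutdownInfo has at least been handed to the transport.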
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira