Race condition in DemandForwardingBridgeSupport can cause remote connection to
be leaked.
-----------------------------------------------------------------------------------------
Key: AMQ-3016
URL: https://issues.apache.org/activemq/browse/AMQ-3016
Project: ActiveMQ
Issue Type: Bug
Components: Connector, Transport
Affects Versions: 5.4.1
Reporter: Stirling Chow
Symptom
========
I set up two Brokers and a network bridge from Broker A to Broker B over HTTP.
When the bridge is established, each Broker has a single transport connection
(VM on broker A and HTTP on broker B) as recorded in RegionBroker.connections
I noticed that when Broker A was stopped (normally), periodically the HTTP
connection would remain in Broker B's RegionBroker.connections until the
InactivityMonitor on the connection timed out. If the InactivityMonitor was
disbled, then the connection would remain indefinitely.
If Broker A was restarted, the bridge would be restarted and a second
connection would be recorded in Broker B's RegionBroker.connections. Repeating
restarts of Broker A would cause an accumulation of "dead" connections, which
would eventually lead to an OOM.
Cause
=====
When Broker A is stopped, DemandForwardingBridgeSupport.stop() is called and
sends a ShutdownInfo command to the local and remote transports. When the
transports receive the ShutdownInfo, they remove the connection from their
associated RegionBroker.connections as part of
TransportConnection.processRemoveConnection(ConnectionId, long):
public synchronized Response processRemoveConnection(ConnectionId id, long
lastDeliveredSequenceId)
throws InterruptedException {
...
try {
broker.removeConnection(cs.getContext(), cs.getInfo(), null);
} catch (Throwable e) {
SERVICELOG.warn("Failed to remove connection " + cs.getInfo(),
e);
}
In the cases were Broker B would not clean up its connection, I traced the code
and discovered that the ShutdownInfo message was not being sent because the
remote transport (HttpClientTransport) had already had its "stopped" flag set
to true as part of HttpClientTransport.oneway(Object command):
public void oneway(Object command) throws IOException {
if (isStopped()) {
throw new IOException("stopped.");
}
...
DemandForwardingBridgeSupport's stop() method has the following code:
public void stop() throws Exception {
...
ASYNC_TASKS.execute(new Runnable() {
public void run() {
try {
localBroker.oneway(new ShutdownInfo());
sendShutdown.countDown();
remoteBroker.oneway(new ShutdownInfo());
} catch (Throwable e) {
LOG.debug("Caught exception sending shutdown",
e);
} finally {
sendShutdown.countDown();
}
}
});
if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
LOG.info("Network Could not shutdown in a timely
manner");
}
} finally {
ServiceStopper ss = new ServiceStopper();
ss.stop(remoteBroker);
ss.stop(localBroker);
// Release the started Latch since another thread could be
// stuck waiting for it to start up.
startedLatch.countDown();
startedLatch.countDown();
localStartedLatch.countDown();
ss.throwFirstException();
}
}
ShutdownInfo is sent asynchronously to the local and remote transports by a
slave thread:
localBroker.oneway(new ShutdownInfo());
sendShutdown.countDown();
remoteBroker.oneway(new ShutdownInfo());
The sendShutdown latch is used by the master thread to prevent running the
finally clause until the ShutdownInfo has been sent:
if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
LOG.info("Network Could not shutdown in a timely
manner");
}
} finally {
ServiceStopper ss = new ServiceStopper();
ss.stop(remoteBroker);
ss.stop(localBroker);
...
}
}
However, because the latch is countdown *before* remoteTransport.oneway(new
ShutdownInfo()) there is a race condition and in most cases the main thread
calls ss.stop(remoteBroker) before the slave thread has completed its call to
remoteTransport.oneway(new ShutdownInfo()). As a result, the remoteTransport
appears already stopped and the ShutdownInfo is discarded. This leaves the
connection dangling on the remote broker until the InactivityMonitor closes it.
Solution
======
The sendShutdown latch should be countdown *after* remoteTransport.oneway(new
ShutdownInfo()).
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.