[ https://issues.apache.org/jira/browse/AMQ-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timothy Bish resolved AMQ-3016.
-------------------------------
Resolution: Fixed
Fix Version/s: 5.8.0
Noticed this also while doing some work in the bridging code.
> Race condition in DemandForwardingBridgeSupport can cause remote connection to be leaked.
> -----------------------------------------------------------------------------------------
>
> Key: AMQ-3016
> URL: https://issues.apache.org/jira/browse/AMQ-3016
> Project: ActiveMQ
> Issue Type: Bug
> Components: Connector, Transport
> Affects Versions: 5.4.1
> Reporter: Stirling Chow
> Fix For: 5.8.0
>
> Attachments: ConnectionLeakTest.java, patch.txt
>
>
> Symptom
> ========
> I set up two Brokers and a network bridge from Broker A to Broker B over
> HTTP. When the bridge is established, each Broker has a single transport
> connection (VM on broker A and HTTP on broker B) as recorded in
> RegionBroker.connections
> I noticed that when Broker A was stopped (normally), the HTTP connection
> would occasionally remain in Broker B's RegionBroker.connections until the
> InactivityMonitor on the connection timed out. If the InactivityMonitor was
> disabled, the connection would remain indefinitely.
> If Broker A was restarted, the bridge would be restarted and a second
> connection would be recorded in Broker B's RegionBroker.connections.
> Repeating restarts of Broker A would cause an accumulation of "dead"
> connections, which would eventually lead to an OOM.
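> For reference, a minimal sketch of the topology described above (this is not
> the attached ConnectionLeakTest.java; the port, broker names, and the static
> discovery URI are assumptions for illustration, and the HTTP connector needs
> the ActiveMQ HTTP transport module on the classpath):
>
>     import org.apache.activemq.broker.BrokerService;
>
>     public class BridgeLeakSketch {
>         public static void main(String[] args) throws Exception {
>             // Broker B: the remote side of the bridge, listening over HTTP
>             BrokerService brokerB = new BrokerService();
>             brokerB.setBrokerName("brokerB");
>             brokerB.setPersistent(false);
>             brokerB.addConnector("http://localhost:61616");
>             brokerB.start();
>
>             // Broker A: bridges to Broker B via a static network connector
>             BrokerService brokerA = new BrokerService();
>             brokerA.setBrokerName("brokerA");
>             brokerA.setPersistent(false);
>             brokerA.addNetworkConnector("static:(http://localhost:61616)");
>             brokerA.start();
>
>             // Stopping Broker A should remove the HTTP connection from Broker B's
>             // RegionBroker.connections; with the race described below it can linger.
>             brokerA.stop();
>         }
>     }
>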
> Cause
> =====
> When Broker A is stopped, DemandForwardingBridgeSupport.stop() is called and
> sends a ShutdownInfo command to the local and remote transports. When the
> transports receive the ShutdownInfo, they remove the connection from their
> associated RegionBroker.connections as part of
> TransportConnection.processRemoveConnection(ConnectionId, long):
>     public synchronized Response processRemoveConnection(ConnectionId id,
>             long lastDeliveredSequenceId) throws InterruptedException {
>         ...
>         try {
>             broker.removeConnection(cs.getContext(), cs.getInfo(), null);
>         } catch (Throwable e) {
>             SERVICELOG.warn("Failed to remove connection " + cs.getInfo(), e);
>         }
> In the cases where Broker B did not clean up its connection, I traced the
> code and discovered that the ShutdownInfo message was not being sent: the
> remote transport (HttpClientTransport) already had its "stopped" flag set to
> true, so the check at the top of HttpClientTransport.oneway(Object command)
> rejected the send:
>     public void oneway(Object command) throws IOException {
>         if (isStopped()) {
>             throw new IOException("stopped.");
>         }
>         ...
> DemandForwardingBridgeSupport's stop() method has the following code:
>     public void stop() throws Exception {
>         ...
>             ASYNC_TASKS.execute(new Runnable() {
>                 public void run() {
>                     try {
>                         localBroker.oneway(new ShutdownInfo());
>                         sendShutdown.countDown();
>                         remoteBroker.oneway(new ShutdownInfo());
>                     } catch (Throwable e) {
>                         LOG.debug("Caught exception sending shutdown", e);
>                     } finally {
>                         sendShutdown.countDown();
>                     }
>                 }
>             });
>             if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
>                 LOG.info("Network Could not shutdown in a timely manner");
>             }
>         } finally {
>             ServiceStopper ss = new ServiceStopper();
>             ss.stop(remoteBroker);
>             ss.stop(localBroker);
>             // Release the started Latch since another thread could be
>             // stuck waiting for it to start up.
>             startedLatch.countDown();
>             startedLatch.countDown();
>             localStartedLatch.countDown();
>             ss.throwFirstException();
>         }
>     }
> ShutdownInfo is sent asynchronously to the local and remote transports by a
> slave thread:
>     localBroker.oneway(new ShutdownInfo());
>     sendShutdown.countDown();
>     remoteBroker.oneway(new ShutdownInfo());
> The sendShutdown latch is used by the master thread to prevent running the
> finally clause until the ShutdownInfo has been sent:
>         if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
>             LOG.info("Network Could not shutdown in a timely manner");
>         }
>     } finally {
>         ServiceStopper ss = new ServiceStopper();
>         ss.stop(remoteBroker);
>         ss.stop(localBroker);
>         ...
>         }
>     }
> However, because the latch is counted down *before*
> remoteBroker.oneway(new ShutdownInfo()), there is a race condition: in most
> cases the main thread calls ss.stop(remoteBroker) before the slave thread has
> completed its call to remoteBroker.oneway(new ShutdownInfo()). As a result,
> the remote transport appears already stopped and the ShutdownInfo is
> discarded. This leaves the connection dangling on the remote broker until the
> InactivityMonitor closes it.
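> To make the interleaving concrete, here is a self-contained sketch of the
> same latch pattern (plain Java, not ActiveMQ code; the messages are only
> illustrative). The slave thread releases the latch between its two sends, so
> the main thread can "stop" the remote side before the second send runs:
>
>     import java.util.concurrent.CountDownLatch;
>     import java.util.concurrent.TimeUnit;
>     import java.util.concurrent.atomic.AtomicBoolean;
>
>     public class ShutdownRaceSketch {
>         public static void main(String[] args) throws Exception {
>             CountDownLatch sendShutdown = new CountDownLatch(1);
>             AtomicBoolean remoteStopped = new AtomicBoolean(false);
>
>             Thread slave = new Thread(() -> {
>                 // localBroker.oneway(new ShutdownInfo()) would happen here
>                 sendShutdown.countDown();   // latch released too early
>                 // race window: the master may already have stopped the remote side
>                 if (remoteStopped.get()) {
>                     System.out.println("remote ShutdownInfo discarded: transport stopped");
>                 } else {
>                     System.out.println("remote ShutdownInfo delivered");
>                 }
>             });
>             slave.start();
>
>             sendShutdown.await(10, TimeUnit.SECONDS);
>             remoteStopped.set(true);        // stands in for ss.stop(remoteBroker)
>             slave.join();
>         }
>     }
>
> Which message prints depends on thread scheduling, which is exactly the
> intermittent behaviour seen in the bug.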
> Solution
> ======
> The sendShutdown latch should be counted down *after*
> remoteBroker.oneway(new ShutdownInfo()).
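> A hedged sketch of that reordering, patterned on the stop() snippet quoted
> above (fragment only; the surrounding fields and method body are elided): the
> premature countDown() between the two oneway calls is removed, so the latch
> is released only in the finally block once both ShutdownInfo commands have
> been attempted.
>
>     ASYNC_TASKS.execute(new Runnable() {
>         public void run() {
>             try {
>                 localBroker.oneway(new ShutdownInfo());
>                 remoteBroker.oneway(new ShutdownInfo());
>             } catch (Throwable e) {
>                 LOG.debug("Caught exception sending shutdown", e);
>             } finally {
>                 // released only after both sends have been attempted, so the
>                 // master thread cannot stop the remote transport first
>                 sendShutdown.countDown();
>             }
>         }
>     });
>
> With this ordering the master thread's ss.stop(remoteBroker) cannot run until
> the remote ShutdownInfo has at least been handed to the transport.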
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira