Mitchell Wagner created AMQ-9787:
------------------------------------

             Summary: FailoverTransport deadlock on half-open broker connection
                 Key: AMQ-9787
                 URL: https://issues.apache.org/jira/browse/AMQ-9787
             Project: ActiveMQ Classic
          Issue Type: Bug
    Affects Versions: 6.1.2, 5.17.1
            Reporter: Mitchell Wagner


When the network cable is pulled from the host of a remote broker, clients 
hang for an extended period (up to several minutes) while waiting for the 
network stack to close the connection. This occurs even though the client 
InactivityMonitor is configured with the default 30-second timeout 
(maxInactivityDuration).

A thread dump of the client captured during this state indicates that the 
InactivityMonitor is blocked because a write operation holds a mutex, and that 
write is itself blocked while waiting for the network stack to complete TCP 
transmission attempts. This behavior can lead to significantly delayed client 
failover in scenarios where the network connection is abruptly lost, even with 
inactivity monitoring enabled.

 

Analysis:
 * The producer holds a lock (reconnectMutex) in FailoverTransport while 
performing a write operation.
 * TCP retransmissions caused by network disconnection block the write.
 * Because the write retains the lock, the InactivityMonitor is blocked when 
attempting to fire the InactivityIOException, preventing timely detection of 
the inactive connection.
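
The contention described above can be reproduced in isolation with a minimal 
sketch. This is not ActiveMQ code: the class ReconnectMutexContention and its 
latches are hypothetical stand-ins, with a CountDownLatch simulating the socket 
write that is stuck on TCP retransmissions. Only the names reconnectMutex, 
oneway and handleTransportFailure are taken from the report.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the locking pattern in the report; not ActiveMQ code.
final class ReconnectMutexContention {
    // Plays the role of reconnectMutex (a plain Object monitor in the thread dump).
    private final Object reconnectMutex = new Object();
    // Signals that the simulated write has started while holding the mutex.
    private final CountDownLatch writeStarted = new CountDownLatch(1);
    // Released to simulate the network stack finally giving up on the connection.
    private final CountDownLatch writeReleased = new CountDownLatch(1);

    // Simplified oneway(): takes the mutex, then blocks in a simulated socket write.
    void oneway() {
        synchronized (reconnectMutex) {
            writeStarted.countDown();
            awaitQuietly(writeReleased);
        }
    }

    // Simplified handleTransportFailure(): needs the same mutex before it can react.
    void handleTransportFailure() {
        synchronized (reconnectMutex) {
            // Failure handling and reconnect scheduling would happen here.
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns the state of the monitor thread while the simulated write is stuck.
    static Thread.State observeMonitorState() {
        ReconnectMutexContention transport = new ReconnectMutexContention();
        Thread producer = new Thread(transport::oneway, "producer");
        Thread monitor = new Thread(transport::handleTransportFailure, "inactivity-monitor");
        try {
            producer.start();
            transport.writeStarted.await();   // producer now owns the mutex
            monitor.start();
            // Poll (bounded to 5 s) until the monitor thread is parked on the monitor.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (monitor.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            Thread.State observed = monitor.getState();
            transport.writeReleased.countDown(); // let the "write" finish
            producer.join();
            monitor.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing threads", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("monitor state during stuck write: " + observeMonitorState());
    }
}
```

In this sketch the monitor thread is expected to stay in the BLOCKED state for 
as long as the simulated write holds the mutex, which mirrors the "waiting for 
monitor entry" state in the thread dump.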

 

Steps to reproduce:
 # Configure a client to connect to a failover URL specifying two or more 
ActiveMQ brokers over TCP/SSL.
 # Configure the client inactivity monitor with the default 30-second timeout.
 # Pull the network cable (or otherwise sever the network connection) to the 
broker the clients are connected to.
 # Observe that the clients hang for several minutes until the network stack 
eventually closes the connection, and only then fail over to an alternate host 
address.
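
For steps 1 and 2, a client URL along the following lines could be used. This 
is only a sketch: the host names and port are placeholders, the helper class 
FailoverUriExample is hypothetical, and the 30000 ms value simply spells out 
the default explicitly through the wireFormat.maxInactivityDuration option on 
each nested broker URI.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that assembles a failover URL for the reproduction steps.
final class FailoverUriExample {
    // Appends the inactivity timeout to each broker URI and wraps them in failover:(...).
    static String failoverUri(List<String> brokerUris, long maxInactivityMillis) {
        return brokerUris.stream()
                .map(uri -> uri + "?wireFormat.maxInactivityDuration=" + maxInactivityMillis)
                .collect(Collectors.joining(",", "failover:(", ")"));
    }

    public static void main(String[] args) {
        // Placeholder hosts; replace with the two (or more) brokers under test.
        List<String> brokers = List.of(
                "ssl://broker-a.example:61617",
                "ssl://broker-b.example:61617");
        System.out.println(failoverUri(brokers, 30_000));
    }
}
```

The resulting string would typically be passed as the broker URL when creating 
an org.apache.activemq.ActiveMQConnectionFactory.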

 

Expected behavior:

Clients should detect inactivity within the configured timeout period, even if 
the network transport is experiencing retransmissions.

 

+Thread dump snippet – InactivityMonitor blocked:+

"ActiveMQ InactivityMonitor Worker 4" #3386 daemon prio=5 os_prio=0 cpu=16.18ms 
elapsed=655.78s tid=0x00007f5a78003020 nid=0xce84 waiting for monitor entry  
[0x00007f59ecd99000]

   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.activemq.transport.failover.FailoverTransport.handleTransportFailure(FailoverTransport.java:276)
    - waiting to lock <0x00000000c15946f0> (a java.lang.Object)
    at org.apache.activemq.transport.failover.FailoverTransport$3.onException(FailoverTransport.java:226)
    at org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:114)
    at org.apache.activemq.transport.WireFormatNegotiator.onException(WireFormatNegotiator.java:173)
    at org.apache.activemq.transport.AbstractInactivityMonitor.onException(AbstractInactivityMonitor.java:346)
    at org.apache.activemq.transport.AbstractInactivityMonitor$5.run(AbstractInactivityMonitor.java:248)
    at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
    at java.lang.Thread.run([email protected]/Thread.java:840)

 

+Producer thread blocked in the socket write while holding mutex 0x00000000c15946f0:+

"Camel (camelContext) thread #34 - seda://publish" #383 daemon prio=5 os_prio=0 
cpu=1600.83ms elapsed=1357.89s tid=0x00007f5acc2582b0 nid=0x725a runnable

   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.FileDispatcherImpl.write0([email protected]/Native Method)
    at sun.nio.ch.SocketDispatcher.write([email protected]/SocketDispatcher.java:62)
    at sun.nio.ch.NioSocketImpl.tryWrite([email protected]/NioSocketImpl.java:403)
    at sun.nio.ch.NioSocketImpl.implWrite([email protected]/NioSocketImpl.java:418)
    at sun.nio.ch.NioSocketImpl.write([email protected]/NioSocketImpl.java:445)
    at sun.nio.ch.NioSocketImpl$2.write([email protected]/NioSocketImpl.java:831)
    at java.net.Socket$SocketOutputStream.write([email protected]/Socket.java:1035)
    at sun.security.ssl.SSLSocketOutputRecord.deliver([email protected]/SSLSocketOutputRecord.java:345)
    at sun.security.ssl.SSLSocketImpl$AppOutputStream.write([email protected]/SSLSocketImpl.java:1308)
    at org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:115)
    at java.io.DataOutputStream.flush([email protected]/DataOutputStream.java:128)
    at org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:194)
    at org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:336)
    at org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:318)
    at org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:94)
    at org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:116)
    at org.apache.activemq.transport.failover.FailoverTransport.oneway(FailoverTransport.java:670)
    - locked <0x00000000c15946f0> (a java.lang.Object)
    at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
    at org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
    at org.apache.activemq.ActiveMQConnection.doAsyncSendPacket(ActiveMQConnection.java:1311)
    at org.apache.activemq.ActiveMQConnection.asyncSendPacket(ActiveMQConnection.java:1305)
    at org.apache.activemq.ActiveMQSession.send(ActiveMQSession.java:1965)
    - locked <0x00000000c0c29e08> (a java.lang.Object)
    at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:288)
    at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:223)
    at org.apache.activemq.jms.pool.PooledProducer.send(PooledProducer.java:95)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


