[ https://issues.apache.org/jira/browse/AMQ-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18031236#comment-18031236 ]
Mitchell Wagner commented on AMQ-9787:
--------------------------------------
Transport connector:
ssl://0.0.0.0:${activemq.port}?maximumConnections=1000&wireFormat.maxFrameSize=104857600
We use the default maxInactivityDuration.
We believe the issue is on the client side.
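
For reference, a client-side failover URI with the inactivity timeout spelled out explicitly might be assembled as in the sketch below. The host names, port, and the FailoverUriSketch class are hypothetical and are not taken from the reporter's configuration; 30000 ms is the default maxInactivityDuration referred to in the issue.

```java
import java.util.List;
import java.util.stream.Collectors;

class FailoverUriSketch {
    // Default inactivity timeout (30 seconds), expressed in milliseconds.
    static final long MAX_INACTIVITY_MS = 30_000L;

    /** Wraps each broker URI in a failover: composite and sets the timeout per transport. */
    static String failoverUri(List<String> brokers) {
        String nested = brokers.stream()
                .map(b -> b + "?wireFormat.maxInactivityDuration=" + MAX_INACTIVITY_MS)
                .collect(Collectors.joining(","));
        return "failover:(" + nested + ")";
    }

    public static void main(String[] args) {
        // Hypothetical broker addresses; substitute the real ones.
        System.out.println(failoverUri(List.of("ssl://broker-a:61617", "ssl://broker-b:61617")));
    }
}
```

The resulting string would be passed to the client's connection factory. Setting the option explicitly only makes the default visible; it is not expected to change the behavior reported here.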
> FailoverTransport deadlock on half-open broker connection
> ---------------------------------------------------------
>
> Key: AMQ-9787
> URL: https://issues.apache.org/jira/browse/AMQ-9787
> Project: ActiveMQ Classic
> Issue Type: Bug
> Affects Versions: 5.17.1, 6.1.2
> Reporter: Mitchell Wagner
> Priority: Major
>
> When the network cable is pulled from the host of a remote broker, clients
> hang for up to several minutes while waiting for the network stack to close
> the connection. This occurs even though the client InactivityMonitor is
> configured with the default 30-second timeout (maxInactivityDuration).
> A thread dump of the client captured during this state indicates that the
> InactivityMonitor is blocked because a write operation holds a mutex, and
> that write is itself blocked while waiting for the network stack to complete
> TCP transmission attempts. This behavior can lead to significantly delayed
> client failover in scenarios where the network connection is abruptly lost,
> even with inactivity monitoring enabled.
> Analysis:
> * The producer holds a lock (reconnectMutex) in FailoverTransport while
> performing a write operation.
> * TCP retransmissions caused by network disconnection block the write.
> * Because the write retains the lock, the InactivityMonitor is blocked when
> attempting to fire the InactivityIOException, preventing timely detection of
> the inactive connection.
> Steps to reproduce:
> # Configure a client to connect to a failover URL specifying two or more
> ActiveMQ brokers over TCP/SSL.
> # Configure the client inactivity monitor with the default 30-second timeout.
> # Pull the network cable (or otherwise sever the network connection) to the
> broker the clients are connected to.
> # Observe that the clients hang for several minutes until the network stack
> eventually closes the connection, and only then fail over to an alternate
> host address.
> Expected behavior:
> Clients should detect inactivity within the configured timeout period, even
> if the network transport is experiencing retransmissions.
> +Thread dump snippet – InactivityMonitor blocked:+
> {noformat}
> "ActiveMQ InactivityMonitor Worker 4" #3386 daemon prio=5 os_prio=0 cpu=16.18ms elapsed=655.78s tid=0x00007f5a78003020 nid=0xce84 waiting for monitor entry [0x00007f59ecd99000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.activemq.transport.failover.FailoverTransport.handleTransportFailure(FailoverTransport.java:276)
>     - waiting to lock <0x00000000c15946f0> (a java.lang.Object)
>     at org.apache.activemq.transport.failover.FailoverTransport$3.onException(FailoverTransport.java:226)
>     at org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:114)
>     at org.apache.activemq.transport.WireFormatNegotiator.onException(WireFormatNegotiator.java:173)
>     at org.apache.activemq.transport.AbstractInactivityMonitor.onException(AbstractInactivityMonitor.java:346)
>     at org.apache.activemq.transport.AbstractInactivityMonitor$5.run(AbstractInactivityMonitor.java:248)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base/ThreadPoolExecutor.java:1136)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base/ThreadPoolExecutor.java:635)
>     at java.lang.Thread.run(java.base/Thread.java:840)
> {noformat}
> +Producer thread blocked in the write while holding the mutex 0x00000000c15946f0:+
> {noformat}
> "Camel (camelContext) thread #34 - seda://publish" #383 daemon prio=5 os_prio=0 cpu=1600.83ms elapsed=1357.89s tid=0x00007f5acc2582b0 nid=0x725a runnable
>    java.lang.Thread.State: RUNNABLE
>     at sun.nio.ch.FileDispatcherImpl.write0(java.base/Native Method)
>     at sun.nio.ch.SocketDispatcher.write(java.base/SocketDispatcher.java:62)
>     at sun.nio.ch.NioSocketImpl.tryWrite(java.base/NioSocketImpl.java:403)
>     at sun.nio.ch.NioSocketImpl.implWrite(java.base/NioSocketImpl.java:418)
>     at sun.nio.ch.NioSocketImpl.write(java.base/NioSocketImpl.java:445)
>     at sun.nio.ch.NioSocketImpl$2.write(java.base/NioSocketImpl.java:831)
>     at java.net.Socket$SocketOutputStream.write(java.base/Socket.java:1035)
>     at sun.security.ssl.SSLSocketOutputRecord.deliver(java.base/SSLSocketOutputRecord.java:345)
>     at sun.security.ssl.SSLSocketImpl$AppOutputStream.write(java.base/SSLSocketImpl.java:1308)
>     at org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:115)
>     at java.io.DataOutputStream.flush(java.base/DataOutputStream.java:128)
>     at org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:194)
>     at org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:336)
>     at org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:318)
>     at org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:94)
>     at org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:116)
>     at org.apache.activemq.transport.failover.FailoverTransport.oneway(FailoverTransport.java:670)
>     - locked <0x00000000c15946f0> (a java.lang.Object)
>     at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
>     at org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
>     at org.apache.activemq.ActiveMQConnection.doAsyncSendPacket(ActiveMQConnection.java:1311)
>     at org.apache.activemq.ActiveMQConnection.asyncSendPacket(ActiveMQConnection.java:1305)
>     at org.apache.activemq.ActiveMQSession.send(ActiveMQSession.java:1965)
>     - locked <0x00000000c0c29e08> (a java.lang.Object)
>     at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:288)
>     at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:223)
>     at org.apache.activemq.jms.pool.PooledProducer.send(PooledProducer.java:95)
> {noformat}
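
The contention described in the Analysis section of the quoted issue can be reproduced in miniature without ActiveMQ. The following is a minimal sketch, not the FailoverTransport code: ReconnectMutexSketch and its methods are hypothetical stand-ins, and a CountDownLatch takes the place of the socket write that stalls during TCP retransmission.

```java
import java.util.concurrent.CountDownLatch;

class ReconnectMutexSketch {
    // Stand-in for the mutex shown as <0x00000000c15946f0> in the thread dumps.
    private final Object reconnectMutex = new Object();
    // Counted down once the producer thread owns the mutex.
    private final CountDownLatch writerHoldsLock = new CountDownLatch(1);
    // Counted down when the simulated network stack finally gives up on the write.
    private final CountDownLatch writeReturns = new CountDownLatch(1);

    /** Mimics oneway(): take the mutex, then stall in a write that does not return. */
    void oneway() {
        synchronized (reconnectMutex) {
            writerHoldsLock.countDown();
            awaitQuietly(writeReturns); // the monitor stays held while this waits
        }
    }

    /** Mimics handleTransportFailure(): it needs the same mutex before it can act. */
    void handleTransportFailure() {
        synchronized (reconnectMutex) {
            // A real transport would start failover here.
        }
    }

    /** Returns the monitor thread's state observed while the simulated write is stuck. */
    static Thread.State observeMonitorState() {
        ReconnectMutexSketch transport = new ReconnectMutexSketch();
        Thread producer = new Thread(transport::oneway, "producer");
        Thread monitor = new Thread(transport::handleTransportFailure, "inactivity-monitor");
        producer.setDaemon(true);
        monitor.setDaemon(true);
        try {
            producer.start();
            transport.writerHoldsLock.await();
            monitor.start();
            // Poll for up to about 5 s until the monitor thread is stuck on the mutex.
            for (int i = 0; i < 500 && monitor.getState() != Thread.State.BLOCKED; i++) {
                Thread.sleep(10);
            }
            Thread.State observed = monitor.getState();
            transport.writeReturns.countDown(); // let the "write" finish so both threads exit
            producer.join();
            monitor.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("monitor state while write is stuck: " + observeMonitorState());
    }
}
```

Running main should report the monitor thread as BLOCKED for as long as the simulated write holds the mutex, which mirrors the two thread dumps. The stand-in producer shows as WAITING rather than RUNNABLE because it parks on a latch instead of a native socket write.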