[ 
https://issues.apache.org/jira/browse/AMQ-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18030862#comment-18030862
 ] 

Jean-Baptiste Onofré commented on AMQ-9787:
-------------------------------------------

Can you share the transport connector URL from your activemq.xml? In 
particular, does it include any inactivity monitor configuration?
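
For reference, the inactivity timeout is normally carried as the 
wireFormat.maxInactivityDuration URI option (in milliseconds). The sketch 
below only assembles a client-side failover URI of the kind described in the 
report, with two brokers over SSL and a 30-second timeout. The host names, the 
port and the failoverUri helper are placeholders invented for illustration, 
not values taken from the reporter's setup.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FailoverUriSketch {

    // Hypothetical helper (not part of ActiveMQ): appends the
    // wireFormat.maxInactivityDuration option to each broker URI and wraps
    // the result in a failover:(...) composite URI.
    static String failoverUri(List<String> brokers, long maxInactivityMillis) {
        return brokers.stream()
                .map(b -> b + "?wireFormat.maxInactivityDuration=" + maxInactivityMillis)
                .collect(Collectors.joining(",", "failover:(", ")"));
    }

    public static void main(String[] args) {
        // Placeholder hosts and port; substitute the real broker addresses.
        List<String> brokers = List.of(
                "ssl://broker-a.example:61617",
                "ssl://broker-b.example:61617");
        System.out.println(failoverUri(brokers, 30_000));
    }
}
```

Seeing the actual URI from both the client and the broker side makes it 
possible to confirm which timeout is in effect on each end.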

> FailoverTransport deadlock on half-open broker connection
> ---------------------------------------------------------
>
>                 Key: AMQ-9787
>                 URL: https://issues.apache.org/jira/browse/AMQ-9787
>             Project: ActiveMQ Classic
>          Issue Type: Bug
>    Affects Versions: 5.17.1, 6.1.2
>            Reporter: Mitchell Wagner
>            Priority: Major
>
> When the network cable is pulled from the host of a remote broker, clients 
> hang for up to several minutes while waiting for the network stack to close 
> the connection. This occurs even though the client InactivityMonitor is 
> configured with the default 30-second timeout (maxInactivityDuration).
> A thread dump of the client captured during this state indicates that the 
> InactivityMonitor is blocked because a write operation holds a mutex, and 
> that write is itself blocked while waiting for the network stack to complete 
> TCP transmission attempts. This behavior can lead to significantly delayed 
> client failover in scenarios where the network connection is abruptly lost, 
> even with inactivity monitoring enabled.
> Analysis:
>  * The producer holds a lock (reconnectMutex) in FailoverTransport while 
> performing a write operation.
>  * TCP retransmissions caused by network disconnection block the write.
>  * Because the write retains the lock, the InactivityMonitor is blocked when 
> attempting to fire the InactivityIOException, preventing timely detection of 
> the inactive connection.
> Steps to reproduce:
>  # Configure a client to connect to a failover URL specifying two or more 
> ActiveMQ brokers over TCP/SSL.
>  # Configure the client inactivity monitor with the default 30-second timeout.
>  # Pull the network cable (or otherwise sever the network connection) to the 
> broker the clients are connected to.
>  # Observe that the clients hang for several minutes until the network stack 
> eventually closes the connection, and only then fail over to an alternate 
> host address.
> Expected behavior:
> Clients should detect inactivity within the configured timeout period, even 
> if the network transport is experiencing retransmissions.
> +Thread dump snippet – InactivityMonitor blocked:+
> {noformat}
> "ActiveMQ InactivityMonitor Worker 4" #3386 daemon prio=5 os_prio=0 cpu=16.18ms elapsed=655.78s tid=0x00007f5a78003020 nid=0xce84 waiting for monitor entry  [0x00007f59ecd99000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.activemq.transport.failover.FailoverTransport.handleTransportFailure(FailoverTransport.java:276)
>     - waiting to lock <0x00000000c15946f0> (a java.lang.Object)
>     at org.apache.activemq.transport.failover.FailoverTransport$3.onException(FailoverTransport.java:226)
>     at org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:114)
>     at org.apache.activemq.transport.WireFormatNegotiator.onException(WireFormatNegotiator.java:173)
>     at org.apache.activemq.transport.AbstractInactivityMonitor.onException(AbstractInactivityMonitor.java:346)
>     at org.apache.activemq.transport.AbstractInactivityMonitor$5.run(AbstractInactivityMonitor.java:248)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
>     at java.lang.Thread.run([email protected]/Thread.java:840)
> {noformat}
> +Producer thread holding the mutex 0x00000000c15946f0 while blocked in the socket write:+
> {noformat}
> "Camel (camelContext) thread #34 - seda://publish" #383 daemon prio=5 os_prio=0 cpu=1600.83ms elapsed=1357.89s tid=0x00007f5acc2582b0 nid=0x725a runnable
>    java.lang.Thread.State: RUNNABLE
>     at sun.nio.ch.FileDispatcherImpl.write0([email protected]/Native Method)
>     at sun.nio.ch.SocketDispatcher.write([email protected]/SocketDispatcher.java:62)
>     at sun.nio.ch.NioSocketImpl.tryWrite([email protected]/NioSocketImpl.java:403)
>     at sun.nio.ch.NioSocketImpl.implWrite([email protected]/NioSocketImpl.java:418)
>     at sun.nio.ch.NioSocketImpl.write([email protected]/NioSocketImpl.java:445)
>     at sun.nio.ch.NioSocketImpl$2.write([email protected]/NioSocketImpl.java:831)
>     at java.net.Socket$SocketOutputStream.write([email protected]/Socket.java:1035)
>     at sun.security.ssl.SSLSocketOutputRecord.deliver([email protected]/SSLSocketOutputRecord.java:345)
>     at sun.security.ssl.SSLSocketImpl$AppOutputStream.write([email protected]/SSLSocketImpl.java:1308)
>     at org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:115)
>     at java.io.DataOutputStream.flush([email protected]/DataOutputStream.java:128)
>     at org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:194)
>     at org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:336)
>     at org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:318)
>     at org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:94)
>     at org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:116)
>     at org.apache.activemq.transport.failover.FailoverTransport.oneway(FailoverTransport.java:670)
>     - locked <0x00000000c15946f0> (a java.lang.Object)
>     at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
>     at org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
>     at org.apache.activemq.ActiveMQConnection.doAsyncSendPacket(ActiveMQConnection.java:1311)
>     at org.apache.activemq.ActiveMQConnection.asyncSendPacket(ActiveMQConnection.java:1305)
>     at org.apache.activemq.ActiveMQSession.send(ActiveMQSession.java:1965)
>     - locked <0x00000000c0c29e08> (a java.lang.Object)
>     at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:288)
>     at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:223)
>     at org.apache.activemq.jms.pool.PooledProducer.send(PooledProducer.java:95)
> {noformat}
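
The two dumps in the quoted report show one thread holding a monitor while 
stuck in a write, and a second thread BLOCKED on that same monitor. The 
following JDK-only sketch reproduces that shape in isolation. It is not 
ActiveMQ code: the reconnectMutex field here is only a stand-in named after 
the lock in the report, and the stalled socket write is simulated with a latch 
rather than real TCP retransmissions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BlockedMonitorSketch {

    // Stand-in for the lock named in the report (hypothetical, not ActiveMQ code).
    private static final Object reconnectMutex = new Object();

    // Returns the state observed for the "monitor" thread while the "writer"
    // thread holds the mutex inside its simulated write.
    static Thread.State observeMonitorState() {
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch writeDone = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            synchronized (reconnectMutex) {
                lockHeld.countDown();
                try {
                    // Simulates a socket write stalled by TCP retransmissions.
                    writeDone.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "writer");

        Thread monitor = new Thread(() -> {
            synchronized (reconnectMutex) {
                // A real monitor would handle the transport failure here.
            }
        }, "inactivity-monitor");

        try {
            writer.start();
            lockHeld.await();          // writer now owns the mutex
            monitor.start();

            // Poll (for at most 5 seconds) until the monitor thread is
            // parked on the mutex.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            Thread.State state = monitor.getState();
            while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(10);
                state = monitor.getState();
            }

            writeDone.countDown();     // let the simulated write finish
            writer.join();
            monitor.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("monitor state while write is stalled: "
                + observeMonitorState());
    }
}
```

In this sketch the monitor thread can only proceed once the writer leaves the 
synchronized block, which mirrors why the inactivity timeout in the report 
cannot take effect until the stalled write returns.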



