https://bz.apache.org/bugzilla/show_bug.cgi?id=70103
暴兴 <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

--- Comment #2 from 暴兴 <[email protected]> ---
Thank you for looking into this. I have captured a thread dump from the
production environment that shows this deadlock. The issue is not
OS-specific; it comes from a behavioral difference between the TCP and UDS
connect() system calls when the listening side is not accepting.

Here is the deadlock as captured in production:

1. The Acceptor Thread (not consuming connections)

"http-nio-uds-Acceptor" #177 [255] daemon prio=5 os_prio=0 cpu=379673.17ms elapsed=89634.05s tid=0x0000557f50ca7750 nid=255 sleeping [0x00007f63ff595000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep0([email protected]/Native Method)
        at java.lang.Thread.sleep([email protected]/Unknown Source)
        at org.apache.tomcat.util.net.Acceptor.run(Acceptor.java:98)
        at java.lang.Thread.run([email protected]/Unknown Source)

Analysis: The Acceptor thread is in TIMED_WAITING (sleeping) at
Acceptor.java:98 (likely a brief error-recovery sleep after a transient
exception, or a sleep between loop cycles). Crucially, it is NOT blocked in
accept(), so it cannot consume any incoming connections from the kernel's
accept queue.

2. The Shutdown Thread (blocked indefinitely on UDS connect)

"tomcat-shutdown" #178 [670] prio=5 os_prio=0 cpu=15.99ms elapsed=89381.36s tid=0x0000557f51d36890 nid=670 runnable [0x00007f63fc008000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.UnixDomainSockets.connect0([email protected]/Native Method)
        at sun.nio.ch.UnixDomainSockets.connect([email protected]/Unknown Source)
        at sun.nio.ch.UnixDomainSockets.connect([email protected]/Unknown Source)
        at sun.nio.ch.SocketChannelImpl.connect([email protected]/Unknown Source)
        at org.apache.tomcat.util.net.NioEndpoint.unlockAccept(NioEndpoint.java:417)
        at org.apache.tomcat.util.net.AbstractEndpoint.pause(AbstractEndpoint.java:1506)
        at org.apache.coyote.AbstractProtocol.pause(AbstractProtocol.java:712)
        at org.apache.catalina.connector.Connector.pause(Connector.java:1010)
        ...

Analysis: The shutdown thread is executing unlockAccept() and is stuck inside
the native method UnixDomainSockets.connect0(). Because the Acceptor thread
is not calling accept(), the kernel cannot complete the UDS connection.
Unlike TCP (where the kernel buffers the handshake and connect() returns
immediately), UDS connect() blocks until the listening side consumes it.
Since no timeout is configured, it blocks forever.

Deadlock summary:

- The Acceptor thread is sleeping and not calling accept().
- The shutdown thread is blocked indefinitely at unlockAccept() ->
  connect0(), waiting for an accept() that will never happen.
- The shutdown process is permanently hung.

This shows the race condition is real and explains why it is hard to
reproduce in a quiet local environment: if the Acceptor happens to be
blocked in accept() when unlockAccept() is called, connect() succeeds
instantly. The bug triggers only when the Acceptor is temporarily outside
the accept() call during the exact window of the shutdown sequence.
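
For illustration only, here is a minimal sketch of one way the blocking
connect() described above could be bounded by a timeout. It is not Tomcat
code and not a proposed patch: the class name BoundedConnect and its
connect() helper are hypothetical. It assumes only standard JDK behavior,
namely that SocketChannel is an interruptible channel, so closing it from
another thread (or interrupting the connecting thread) aborts a blocked
connect().

```java
import java.io.IOException;
import java.net.SocketAddress;
import java.nio.channels.SocketChannel;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Tomcat): a blocking connect with an upper time bound. */
public final class BoundedConnect {

    private BoundedConnect() {
    }

    /**
     * Runs channel.connect(address) on a daemon helper thread and waits at most
     * timeoutMillis for it to finish.
     *
     * @return true if the connection was established in time; false if the
     *         attempt timed out or failed (the channel is closed in that case)
     */
    public static boolean connect(SocketChannel channel, SocketAddress address, long timeoutMillis)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-connect");
            t.setDaemon(true); // never keep the JVM alive because of a stuck connect
            return t;
        });
        try {
            Callable<Boolean> task = () -> channel.connect(address);
            Future<Boolean> result = executor.submit(task);
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // Closing the channel unblocks a helper thread still stuck in connect().
            closeQuietly(channel);
            return false;
        } finally {
            // Interrupts the helper thread if it is still running.
            executor.shutdownNow();
        }
    }

    private static void closeQuietly(SocketChannel channel) {
        try {
            channel.close();
        } catch (IOException ignored) {
            // Best effort: the caller is already giving up on this channel.
        }
    }
}
```

The helper takes an already-opened channel, so it works for any address
family. For a Unix domain socket on Java 16 or later, the caller would open
the channel with SocketChannel.open(StandardProtocolFamily.UNIX) and pass a
UnixDomainSocketAddress. A helper thread is used here rather than a
non-blocking connect because it relies only on the documented
interruptible-channel contract; whether that is the right trade-off for
unlockAccept() is a design question for the maintainers.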
