DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://issues.apache.org/bugzilla/show_bug.cgi?id=37896>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=37896

           Summary: FastAsyncSocketSender blocks all threads on socket error
           Product: Tomcat 5
           Version: 5.5.12
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Catalina:Cluster
        AssignedTo: tomcat-dev@jakarta.apache.org
        ReportedBy: [EMAIL PROTECTED]

If one server fails "badly" (I believe resulting in a socket timeout error), the FastAsyncSocketSender is locked by one thread. This causes a backlog on all subsequent http threads, and the entire machine runs out of sockets.

Details below:

Default cluster settings:

<Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster" />

We have multiple web machines (6 of them). Something really bad happened at our data center (not sure what, cable fault, some dweeb tripped on our ethernet, don't quite know yet), causing one of our web servers to die. The rest of the machines then backlogged trying to replicate to the dead machine, which filled up the max threads on all the web servers and caused a site outage.

We took stack traces at the point in time where we had to restart the tomcat process. What I believe to be the relevant stack traces are included below.

You can see that one of the http threads (143) is trying to replicate synchronously (which I found odd for FastAsyncSocketSender, but okay). I believe this thread is stuck on a 2 minute socket timeout and currently holds the lock on the FastAsyncSocketSender.

Notice that the Cluster-MembershipReceiver thread is waiting for the FastAsyncSocketSender object and currently holds the lock on the ReplicationTransmitter.

Notice that http thread (147) is waiting on the ReplicationTransmitter.

As a result I have about 298 other http threads all waiting on the ReplicationTransmitter. I had 300 threads configured.
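
The lock chain described above can be sketched in isolation. This is a minimal illustration only, not the Tomcat source: the classes Sender and Transmitter, the peerIsDead flag, and the latch that stands in for a socket write hanging on a dead peer are all hypothetical names invented for this sketch. It assumes, as the traces suggest, that the sender's monitor is held for the whole write and that remove() and addStats() synchronize on the transmitter.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LockChainDemo {

    /** Hypothetical stand-in for a per-member sender (not the Tomcat class). */
    static class Sender {
        final CountDownLatch writeStarted = new CountDownLatch(1);
        final CountDownLatch writeUnblocked = new CountDownLatch(1);
        private final boolean peerIsDead;

        Sender(boolean peerIsDead) { this.peerIsDead = peerIsDead; }

        // The sender's monitor is held for the whole write.
        synchronized void sendMessage() {
            writeStarted.countDown();
            if (peerIsDead) {
                try {
                    // Simulates a socket write that hangs until a timeout fires.
                    writeUnblocked.await(10, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }

        synchronized void disconnect() { /* would close the socket */ }
    }

    /** Hypothetical stand-in for the transmitter that owns the senders. */
    static class Transmitter {
        // Holds the transmitter monitor while waiting for the sender monitor.
        synchronized void remove(Sender s) { s.disconnect(); }

        synchronized void addStats() { /* would update counters */ }

        void sendMessageData(Sender s) { s.sendMessage(); addStats(); }
    }

    // Polls until the thread reaches the wanted state or five seconds pass.
    static void awaitState(Thread t, Thread.State wanted) {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (t.getState() != wanted && System.nanoTime() < deadline) {
            Thread.yield();
        }
    }

    static void joinQuietly(Thread t) {
        try {
            t.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns the observed states of the membership thread and the second http thread. */
    public static String[] runDemo() {
        Sender dead = new Sender(true);
        Sender healthy = new Sender(false);
        Transmitter tx = new Transmitter();

        Thread http143 = new Thread(() -> tx.sendMessageData(dead), "http-143");
        Thread membership = new Thread(() -> tx.remove(dead), "membership");
        Thread http147 = new Thread(() -> tx.sendMessageData(healthy), "http-147");

        http143.start();
        try {
            dead.writeStarted.await(5, TimeUnit.SECONDS); // http-143 now owns the sender
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        membership.start();
        awaitState(membership, Thread.State.BLOCKED);     // owns tx, wants the sender
        http147.start();
        awaitState(http147, Thread.State.BLOCKED);        // wants tx in addStats()

        String[] states = { membership.getState().name(), http147.getState().name() };

        dead.writeUnblocked.countDown();                  // the "timeout" finally fires
        joinQuietly(http143);
        joinQuietly(membership);
        joinQuietly(http147);
        return states;
    }

    public static void main(String[] args) {
        String[] states = runDemo();
        System.out.println("membership=" + states[0] + " http-147=" + states[1]);
    }
}
```

In this sketch the third thread sends to a healthy member and still stalls, because addStats() needs the transmitter monitor that the membership thread holds while it waits for the stuck sender. Every further request thread would queue up behind it in the same way until the simulated write returns.
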

I realise that after a "while" the socket will time out and it will all work itself out, but our site was stuck in this mode for over 10 minutes. I think this is a bug, on the basis that one machine dying (albeit badly) shouldn't cause all the other machines to back up at all.

------------

"http-80-Processor143" daemon prio=1 tid=0x084ad748 nid=0x6953 runnable [0x7e7bf000..0x7e7bf63c]
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:124)
        at org.apache.catalina.cluster.tcp.DataSender.writeData(DataSender.java:830)
        at org.apache.catalina.cluster.tcp.DataSender.pushMessage(DataSender.java:772)
        at org.apache.catalina.cluster.tcp.DataSender.sendMessage(DataSender.java:598)
        - locked <0x4e7864f8> (a org.apache.catalina.cluster.tcp.FastAsyncSocketSender)
        at org.apache.catalina.cluster.tcp.ReplicationTransmitter.sendMessageData(ReplicationTransmitter.java:868)
        at org.apache.catalina.cluster.tcp.ReplicationTransmitter.sendMessageClusterDomain(ReplicationTransmitter.java:460)
        at org.apache.catalina.cluster.tcp.SimpleTcpCluster.sendClusterDomain(SimpleTcpCluster.java:1017)
        at org.apache.catalina.cluster.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:333)
        at org.apache.catalina.cluster.tcp.ReplicationValve.invoke(ReplicationValve.java:271)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
        at org.apache.catalina.valves.FastCommonAccessLogValve.invoke(FastCommonAccessLogValve.java:495)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:868)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:663)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
        at java.lang.Thread.run(Thread.java:595)

"Cluster-MembershipReceiver" daemon prio=1 tid=0x78804ad8 nid=0x661c waiting for monitor entry [0x786ff000..0x786ff73c]
        at org.apache.catalina.cluster.tcp.DataSender.disconnect(DataSender.java:560)
        - waiting to lock <0x4e7864f8> (a org.apache.catalina.cluster.tcp.FastAsyncSocketSender)
        at org.apache.catalina.cluster.tcp.FastAsyncSocketSender.disconnect(FastAsyncSocketSender.java:295)
        at org.apache.catalina.cluster.tcp.ReplicationTransmitter.remove(ReplicationTransmitter.java:689)
        - locked <0x4e7a4e68> (a org.apache.catalina.cluster.tcp.ReplicationTransmitter)
        at org.apache.catalina.cluster.tcp.SimpleTcpCluster.memberDisappeared(SimpleTcpCluster.java:1124)
        at org.apache.catalina.cluster.mcast.McastService.memberDisappeared(McastService.java:455)
        at org.apache.catalina.cluster.mcast.McastServiceImpl.receive(McastServiceImpl.java:221)
        at org.apache.catalina.cluster.mcast.McastServiceImpl$ReceiverThread.run(McastServiceImpl.java:253)

"http-80-Processor147" daemon prio=1 tid=0x084b1208 nid=0x6957 waiting for monitor entry [0x7e8bf000..0x7e8bf83c]
        at org.apache.catalina.cluster.tcp.ReplicationTransmitter.addStats(ReplicationTransmitter.java:702)
        - waiting to lock <0x4e7a4e68> (a org.apache.catalina.cluster.tcp.ReplicationTransmitter)
        at org.apache.catalina.cluster.tcp.ReplicationTransmitter.sendMessageData(ReplicationTransmitter.java:870)
        at org.apache.catalina.cluster.tcp.ReplicationTransmitter.sendMessageClusterDomain(ReplicationTransmitter.java:460)
        at org.apache.catalina.cluster.tcp.SimpleTcpCluster.sendClusterDomain(SimpleTcpCluster.java:1017)
        at org.apache.catalina.cluster.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:333)
        at org.apache.catalina.cluster.tcp.ReplicationValve.invoke(ReplicationValve.java:271)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
        at org.apache.catalina.valves.FastCommonAccessLogValve.invoke(FastCommonAccessLogValve.java:495)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:868)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:663)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
        at java.lang.Thread.run(Thread.java:595)

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]