Chad David wrote:
> Has anyone noticed (or fixed) a bug in -current where socket connections
> on the local machine do not shutdown properly? During stress testing
> I'm seeing thousands (2316 right now) of these:
>
> tcp4    0    0    192.168.1.2.8080    192.168.1.2.2215    FIN_WAIT_2
> tcp4    0    0    192.168.1.2.2215    192.168.1.2.8080    LAST_ACK
>
> Both the client and the server are dead, but the connections stay in
> this state.
>
> I tested with the server on -current and the client on another box,
> and all of the server sockets end up in TIME_WAIT. Is there something
> delaying the last ack on local connections?
A connection goes into FIN_WAIT_2 when it has sent a FIN and received
the ACK of that FIN, but has not yet received (and ACKed) a FIN from the
other side, which is what would let it enter the TIME_WAIT state for
2MSL before proceeding to CLOSED. A connection goes into LAST_ACK when,
after the other side initiated the close, it has sent its own FIN but
has not yet received the ACK of that FIN that would let it proceed to
CLOSED. So what your netstat output shows is a server-initiated close
where the client's FIN is never making it back to the server: the server
is stuck waiting in FIN_WAIT_2, and the client is stuck in LAST_ACK
waiting for an ACK that never arrives (the P.S. below sketches what a
clean close looks like from userland).

Since it's showing IP addresses, you appear to be using real network
connections rather than loopback connections. There are basically
several ways to cause this:

1) You have something on your network, like a dummynet, that is
deterministically dropping the ACK to the client, so that the server
goes from FIN_WAIT_1 to CLOSING instead of to FIN_WAIT_2 (client closes
first), or dropping the FIN in the other direction, so that the server
doesn't go from FIN_WAIT_2 to TIME_WAIT (server closes first).

2) You have intentionally disabled KEEPALIVE, so that a close results
in an RST instead of a normal shutdown of the TCP connection. (I can't
tell if you are doing a real call to shutdown(2), or if you are just
relying on the OS resource tracking behaviour that is implicit in
close(2); but that only applies if you don't set KEEPALIVE, and have
disabled the sysctl default of always doing KEEPALIVE on every
connection.) In this case, it's possible that the RST was lost on the
wire, and since RSTs are not retransmitted, you have shot yourself in
the foot (the zero-linger sketch in the P.S. shows the usual userland
way to get an RST on close).

Note: You often see this type of foolish foot shooting when running
MAST, WAST, or webbench, which try to factor out response speed and
measure connection speed, so that they benchmark the server, not the FS
or other OS latencies in the document delivery path (which is why these
tools suck, as real-world benchmarks go). You could also cause this
(unlikely) with a bad firewall rule.

3) You've exhausted your mbufs before you've exhausted the number of
simultaneous connections you are permitted, because you have
incorrectly tuned your kernel, and therefore all your connections are
sitting in a starvation deadlock, waiting for packets that can never be
sent because there are no mbufs available (the P.S. has a quick sysctl
check for this).

4) You've got local hacks that you aren't telling us about (shame on
you!).

5) You have found an introduced bug in -current. Note: I personally
think this one is unlikely.

6) Maybe something I haven't thought of... Note: I personally think
this one is unlikely, too... ;^)

See RFC 793 (or Stevens) for details on the state machine for both ends
of the connection, and you will see how your machine got into this mess
in the first place.

-- Terry
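P.S. The promised sketches follow. First, roughly what a clean,
server-initiated close looks like from userland (illustrative only;
the function name and buffer size are mine, and error handling is
trimmed):

    #include <sys/socket.h>
    #include <unistd.h>

    /*
     * shutdown(2) sends our FIN and puts us in FIN_WAIT_1; once the
     * peer ACKs it we're in FIN_WAIT_2.  Draining until read(2)
     * returns 0 means we've seen the peer's FIN, so close(2) lets the
     * kernel finish up via TIME_WAIT instead of leaving the socket
     * hanging.
     */
    void
    clean_close(int fd)
    {
            char buf[512];

            shutdown(fd, SHUT_WR);
            while (read(fd, buf, sizeof(buf)) > 0)
                    ;
            close(fd);
    }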
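For 2), the usual userland way to turn close(2) into an RST is the
zero-timeout linger trick; whether your harness does this, or hits the
KEEPALIVE path I described, I can't tell from here (again, a sketch):

    #include <sys/socket.h>
    #include <unistd.h>

    /*
     * With l_onoff set and l_linger zero, close(2) discards any
     * unsent data and sends an RST instead of doing the FIN
     * handshake.  If that RST gets lost, the other end is left
     * hanging, since RSTs are never retransmitted.
     */
    void
    abortive_close(int fd)
    {
            struct linger l = { 1, 0 };

            setsockopt(fd, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
            close(fd);
    }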
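And for 3), compare your mbuf cluster limit against what "netstat -m"
reports in use; the limit itself is just a sysctl, readable with a
standalone sketch like this:

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    /* Print the configured mbuf cluster limit (kern.ipc.nmbclusters). */
    int
    main(void)
    {
            int clusters;
            size_t len = sizeof(clusters);

            if (sysctlbyname("kern.ipc.nmbclusters", &clusters, &len,
                NULL, 0) == -1) {
                    perror("sysctlbyname");
                    return (1);
            }
            printf("nmbclusters: %d\n", clusters);
            return (0);
    }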