On Tue, Sep 06, 2011 at 07:01:44PM -0400, Chris Burroughs wrote:
> On 09/01/2011 09:04 PM, Chris Burroughs wrote:
> > I've looked at the source code and I think that's what's going on, but
> > it has been a while since I've read C networking code.
>
> If someone is in a particularly explanatory mood, I'm also trying to
> figure out how haproxy handles the SO_LINGER blocking/throws-away-data
> trap. Apache httpd for example does this:
> https://github.com/apache/httpd/blob/trunk/server/connection.c#L43

Those are complex issues and we had to perform some changes in the past. To make it short, by default the system handles "orphans", which are connections that have been closed but still have unacked data. This is very common with protocols working in question/response/close mode, as the server closes after sending the response.

An issue was introduced with keep-alive support in HTTP: the client may send a new request after the first one. As long as the client waits for the whole server response, it doesn't cause any issue. But if the client talks before the end of the response, we risk causing the server to emit an RST and destroy part of the in-flight response, because the server's socket is already closed when that data arrives. This situation happens with pipelining, because the client is pushing new requests before the server responds. In practice, browsers generally don't pipeline after the first request, so they can detect a server that would systematically close. But this can still happen if the server wishes to close a few objects later.

What haproxy does is read everything it can from the request while sending the response, so that we limit the risk of having unacked data in the kernel buffers in the event of a close. We had to do this recently because a browser was systematically sending a CRLF approximately one second after each POST, and this CRLF was not consumed.

Since you have no way to be notified when the client has ACKed all the data, the only remaining solution to this mess is to drain everything from the client when you want to close. But this is a real mess when you're sending a 302 or 403 on a POST request! You have to read all the data you're not interested in, causing it to pass over the network and taking a lot of client time, just because you can't be notified that your FIN was read.

Under Linux, we're also able to issue a getsockopt() at the TCP level to check if our data were completely ACKed. But still, this requires active polling, because you're not notified for that.
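
To make the two ideas above more concrete, here is a rough sketch in Python (chosen for brevity; this is not haproxy's code, and haproxy itself is written in C). The names lingering_close() and unacked_segments(), as well as the timeout and max_bytes limits, are made up for the illustration. The second helper assumes that the "getsockopt() at the TCP level" refers to Linux's TCP_INFO and reads its tcpi_unacked field, which is an assumption on my side.

```python
import socket
import struct
import time


def lingering_close(sock, timeout=2.0, max_bytes=1 << 20):
    """Half-close, then drain what the peer still sends before closing.

    Returns the number of bytes read and discarded. The timeout and
    max_bytes limits are arbitrary values chosen for this illustration.
    """
    drained = 0
    deadline = time.monotonic() + timeout
    try:
        # Send our FIN but keep the receive side open.
        sock.shutdown(socket.SHUT_WR)
        while drained < max_bytes:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # give up: we cannot wait for the client forever
            sock.settimeout(remaining)
            try:
                chunk = sock.recv(65536)
            except socket.timeout:
                break
            if not chunk:
                # The peer closed its side too, so no unread data is left
                # in our receive buffer to provoke an RST on close().
                break
            drained += len(chunk)
    except OSError:
        pass  # peer already reset the connection, nothing more to do
    finally:
        sock.close()
    return drained


def unacked_segments(sock):
    """Linux only: number of sent segments not yet ACKed by the peer.

    Assumes the Linux struct tcp_info layout, where tcpi_unacked is a
    32-bit field at byte offset 24.
    """
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    return struct.unpack_from("=I", info, 24)[0]
```

A server would call lingering_close(sock) instead of a plain sock.close() after sending, say, a 403 in response to a POST. unacked_segments() still has to be polled from a timer, since nothing wakes you up when the value reaches zero.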

So if the client receives your data and disconnects from the net without closing the other side, you're never notified.

Ideally we should adapt systems so that they can inform apps when it's possible to close, because the systems themselves do know it. For instance, we could have poll() return POLLOUT after a shutdown(SHUT_WR) to indicate that it's now safe to close. But without this, we're doing as most other products do: cover the common cases in a reasonable way, not the perfect way.

Regards,
Willy

