On Tue, Sep 06, 2011 at 07:01:44PM -0400, Chris Burroughs wrote:
> On 09/01/2011 09:04 PM, Chris Burroughs wrote:
> > I've looked at the source code and I think that's what's going on, but
> > it has been a while since I've read C networking code.
> 
> If someone is in a particularly explanatory mood, I'm also trying to
> figure out how haproxy handles the SO_LINGER blocking/throws-away-data
> trap.  Apache httpd for example does this:
> https://github.com/apache/httpd/blob/trunk/server/connection.c#L43

Those are complex issues, and we have had to make several changes over time.
In short, by default the system handles "orphans", which are connections
that have been closed but still have unacked data. This is very common
with protocols working in question/response/close mode, as the server
closes after sending the response.

An issue was introduced with keep-alive support in HTTP: the client may
send a new request after the first one. As long as the client waits for
the whole server response, it doesn't cause any issue. But if the client
talks before the end of the response, we risk causing the server to emit
an RST on close, which destroys part of the in-flight response. This
situation happens with pipelining, because the client is pushing new
requests before the server responds. In practice, browsers generally don't
pipeline until the first response has arrived, so they can detect a server
that systematically closes. But this can still happen if the server
decides to close a few objects later. What haproxy does is read everything
it can from the request side while sending a response, so that we limit
the risk of having unacked data in the kernel buffers in the event of a
close. We had to do this recently because a browser was systematically
sending a CRLF approximately one second after each POST, and this CRLF
was not consumed.
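As a rough illustration of that "read everything you can" step (this is
not haproxy's actual code), a minimal C sketch could look like the one
below. The helper name drain_pending_input() is made up for the example,
and MSG_DONTWAIT assumes Linux or a BSD-like system:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: discard whatever the peer has already sent,
 * without blocking. Returns the number of bytes thrown away, or -1 on
 * a real socket error. *eof is set to 1 if the peer closed its side. */
long drain_pending_input(int fd, int *eof)
{
    char buf[4096];
    long total = 0;

    *eof = 0;
    for (;;) {
        /* MSG_DONTWAIT makes this single call non-blocking. */
        ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

        if (n > 0) {
            total += n;          /* data we do not care about */
            continue;
        }
        if (n == 0) {
            *eof = 1;            /* peer sent its FIN */
            return total;
        }
        if (errno == EINTR)
            continue;            /* interrupted, just retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;        /* receive buffer is empty for now */
        return -1;               /* genuine error */
    }
}
```

Calling something like this right before close() would have swallowed the
stray CRLF mentioned above, provided it had already arrived.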

Since you have no way to be notified when the client has ACKed all the data,
the only remaining solution to this mess is to drain everything from the
client when you want to close. But this is a real mess when you're sending
a 302 or 403 on a POST request! You have to read all the data you're not
interested in, forcing it across the network and taking a lot of client
time, just because you can't be notified that your FIN was read.
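The "drain before close" idea is similar in spirit to the lingering close
in the httpd code linked above. A minimal sketch of it in C, under a few
assumptions: lingering_close() and idle_ms are names invented for this
example, and the timeout only bounds each individual wait, so a real
implementation would also want a cap on the total time spent here.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send our FIN, discard what the client still
 * sends, and close once the client's FIN is seen or it stays silent
 * for idle_ms milliseconds.
 * Returns 1 if the peer's FIN was seen, 0 if we gave up on a timeout,
 * -1 on error. The descriptor is closed in every case. */
int lingering_close(int fd, int idle_ms)
{
    char buf[4096];
    int result = 0;

    /* Half-close: our response and FIN go out, reads still work. */
    if (shutdown(fd, SHUT_WR) < 0) {
        close(fd);
        return -1;
    }

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, idle_ms);

        if (ready < 0) {
            if (errno == EINTR)
                continue;
            result = -1;
            break;
        }
        if (ready == 0)
            break;               /* client stayed silent, give up */

        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n == 0) {
            result = 1;          /* client closed its side too */
            break;
        }
        if (n < 0) {
            if (errno == EINTR)
                continue;
            result = -1;
            break;
        }
        /* n > 0: unwanted request data, simply discarded */
    }

    close(fd);
    return result;
}
```

This shows the cost described above: on a large POST, the loop keeps
reading data nobody wants until the client finally stops sending.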

Under Linux, we're also able to issue a getsockopt() at the TCP level to
check whether our data has been completely ACKed. But this still requires
active polling, because there is no notification for it. So if the client
receives your data and drops off the network without closing its side,
you're never notified.

Ideally we should adapt systems so that they can inform apps when it's
possible to close, because the systems themselves do know it. For instance,
we could have poll() return POLLOUT after a shutdown(SHUT_WR) to indicate
that it's now safe to close.

But without this, we're doing what most other products do: covering the
common cases in a reasonable way, not the perfect way.

Regards,
Willy

