Hi guys,

today we got our third regression caused by the client-side timeout
changes introduced in 1.5-dev25. This one is major: it causes FD leaks
and CPU spins when servers do not advertise a content-length and the
client does not respond to the FIN. Worst of all, I have no idea how
to fix it.

I had that bitter feeling when doing these changes a month ago that
they were so tricky that something was bound to break. It has broken
twice already, and we managed to fix those issues. The second time was
much harder, and we now see the effects of the regressions and their
workarounds spreading like an oil stain on paper, with the workarounds
becoming more and more complex and less and less under control.

So in the end I have reverted all the patches responsible for these
regressions. The purpose of these patches was to report "cD" instead
of "sD" in the logs in the case where a client disappears during a
POST and haproxy has a shorter timeout than the server's.

I'll issue 1.5.1 shortly with the fix before everyone gets hit by busy
loops and file descriptor exhaustion. If we find another way to do it
later, we'll try it in 1.6 and may consider backporting it to 1.5 if
the new solution is absolutely safe. But we're very far from that
situation now.

I'm sorry for this mess just before the release. Next time I'll be
stricter about such dangerous changes that I don't feel at ease with.

Willy

