Hey Willy,

Thanks for the quick response!

On Thu, Feb 20, 2014 at 1:57 PM, Willy Tarreau <[email protected]> wrote:
> Hi Andy,
>
> On Thu, Feb 20, 2014 at 11:41:20AM -0600, Andy Walker wrote:
> > This is something that looks like it's come up a couple of times on the
> > mailing list:
> > http://thread.gmane.org/gmane.comp.web.haproxy/3253
> > http://thread.gmane.org/gmane.comp.web.haproxy/6494
> >
> > and maybe even a couple of times on StackExchange:
> > http://serverfault.com/questions/291467/haproxy-badreq-errors
> > http://serverfault.com/questions/285850/haproxy-display-a-badreq-badreqs-by-the-thousands
> >
> > Basically, we're having the same problem, but I think I may have come up
> > with the cause -- at least in our situation. It all began as sporadic
> > reports of users getting "The website is too busy to show the webpage --
> > HTTP 408/HTTP 409" errors. The common thread seemed to be that it was
> > always IE users. We weren't seeing anything in the logs (because we had
> > *option dontlognull* turned on), and we were unable to reproduce this
> > in-house. However, yesterday, we were finally able to, and I was able to
> > get some packet dumps at the same time.
> >
> > As it turns out, IE seemed to speak a somewhat mangled version of TCP.
> > Instead of honoring keep-alive timeouts and closing the connection when a
> > server would send a FIN, it would do the following.
> >
> > - Client makes request (success!)
> > - timeout http-keep-alive is reached, haproxy sends FIN to client
> > - IE ACKs the FIN, but doesn't send its own FIN (connection is in a
> >   half-open/half-closed state)
>
> That's somewhat common, many clients (often webservice clients) do not
> monitor their idle connections at all and they don't care about them
> being closed, dead or in error. It's just a matter of total incompetence
> of their developers but the end user is directly affected when some
> requests are sent over a dead connection.
>
> > - timeout http-request is reached
> > - IE tries to send another request on the connection, haproxy replies
> >   with a 408
>
> Here there's something I don't understand: if the FIN was already sent
> by haproxy, then nothing else can be sent over the connection in that
> direction. So I understand that you get the 408 in the logs instead,
> right? The other possibility is that IE just reads the 408 that was
> reported *before* the FIN.
>
> By the way, I'm thinking about something else: haproxy does not emit
> 408 when waiting for a second request, precisely to avoid returning
> an error in this case. It should silently close the connection. Could
> you indicate what exact version you're using? It's possible that we
> have a regression or a corner case which is not correctly handled.
>

I think you're onto something here. I spent the better part of two days
trying to recreate this again, since my foolish self forgot to save the
pcap file from before, but I finally came up with something. Here's what
it looks like is happening, but it still doesn't quite make sense to me.

- IE opens a handful of connections, whether or not it's going to use all
  of them.
- It makes the requests it needs on one/some of the connections. All is
  good so far.
- 10 seconds later (what I am using as both the http-request and
  http-keep-alive timeouts), one of the unused connections receives a
  "408 Request Time-out" from haproxy, seemingly out of the blue. There
  were no requests sent, AFAICT.
- After a while, IE tries to make an actual request on this connection,
  and at this point, I believe it reads the previously received response,
  which was a 408.
- BAM!

We're running version 1.5-dev19 2013/06/17 -- sorry, I've been planning on
getting that more up-to-date, but my schedule and problems with
procrastination are making that tough :)

Also, I'm not sure if this makes a difference, but the two times I was able
to reproduce this today (yes, two times...
the first time, Wireshark crashed *sigh*), the connections/requests were
HTTP over SSL. The whole SSL bit is going to make supplying a packet
capture a bit difficult, but I'll see what I can do, and likely email it
directly to you if that's ok.

> > Apparently, as it turns out, IIS plays nice in this situation, and just
> > returns successful responses.
>
> It is not possible after the FIN. From what I remember, IIS does not really
> apply keep-alive timeout but uses the same request timeout for all
> requests, but I may be wrong.
>
> > Other webservers' possible responses are either to send a RST, or not
> > respond at all -- in both of those situations, IE will apparently resend
> > the request. However, in our case, HAProxy is returning with a 408 (which
> > seems pretty polite to me, and the right thing to do), and IE isn't
> > bothering to resend the request.
> >
> > Two articles that I found that support my theory and findings are:
> > http://grotto11.com/blog/slash.html?+1039831658
>
> This one is slightly different.
>
> > and
> > http://support.f5.com/kb/en-us/solutions/public/1000/600/sol1672.html
>
> This one is closer to what you describe and supports the broken MSIE
> behaviour.
>
> > So, our temporary solution was to keep the http-keep-alive timeout at
> > 10s, but to raise the http-request time limit to 150s. Not ideal, and
> > it's definitely causing our concurrent connections to be much higher, but
> > it's preventing IE users from getting shafted for now.
> >
> > So, my question -- is there a current way that I can get around HAProxy
> > being polite and sending 408 responses? It looks like I could use
> > *option nolinger*, but that seems a little dirty for what I'm trying to
> > accomplish.
>
> Normally it should not do this during keep-alives (except due to a bug).
>
> > Aside from that question, I mainly just wanted to report my findings back
> > to the community.
> > My apologies if this was common knowledge, but it took long enough for
> > me to track down that I'm thinking it might still be a mystery to some
> > people.
>
> No don't worry, your report is very welcome, because I'm sure you have
> not made it up, and at the same time it should not happen if we did our
> job correctly, so let's try to find out exactly what happens to fix it
> correctly!
>

Thanks for your amazing willingness to help with problems!

>
> Thanks,
> Willy
>

-- 
Andy Walker
System Administrator
FBS - creators of flexmls
3415 39th St S
Fargo, ND 58104
701-235-7300
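The timeline Andy reports (an unused, preconnected connection receives a 408 followed by a FIN, and the browser only reads that stale 408 when it later reuses the socket) can be reproduced on localhost with a minimal Python sketch. It is an illustration under stated assumptions, not haproxy's code: `toy_proxy` and `preconnect_then_reuse` are hypothetical names, the 0.2 s timeout stands in for the 10 s `timeout http-request` used in the thread, and the response bytes are a simplified 408, not haproxy's exact error page.

```python
import socket
import threading
import time

# Hypothetical stand-ins: the thread uses 10 s, shortened so the demo is fast.
REQUEST_TIMEOUT = 0.2  # plays the role of "timeout http-request"
RESPONSE_408 = (
    b"HTTP/1.0 408 Request Time-out\r\n"
    b"Connection: close\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html><body><h1>408 Request Time-out</h1></body></html>\n"
)


def toy_proxy(listener: socket.socket) -> None:
    """Accept one connection; if no request arrives in time, send 408 then FIN."""
    conn, _ = listener.accept()
    with conn:
        conn.settimeout(REQUEST_TIMEOUT)
        try:
            data = conn.recv(4096)
        except socket.timeout:
            data = None  # nothing arrived before the request timeout
        if data is not None:
            return  # a request (or a close) arrived in time: not the case modeled
        conn.sendall(RESPONSE_408)
        conn.shutdown(socket.SHUT_WR)  # send our FIN; the read side stays open
        conn.settimeout(None)
        # Drain whatever the client sends later until it closes, so that
        # closing with unread data does not turn into an RST.
        while conn.recv(4096):
            pass


def preconnect_then_reuse() -> bytes:
    """Open a connection, leave it idle past the timeout, then reuse it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free port
    listener.listen(1)
    server = threading.Thread(target=toy_proxy, args=(listener,), daemon=True)
    server.start()

    client = socket.create_connection(listener.getsockname())
    time.sleep(REQUEST_TIMEOUT * 3)  # "preconnected" but unused; no request sent
    # By now the 408 and the FIN are queued on the client side, unread.
    client.sendall(b"GET / HTTP/1.1\r\nHost: example.test\r\n\r\n")
    reply = b""
    while chunk := client.recv(4096):  # reads the stale 408, then EOF
        reply += chunk
    client.close()
    server.join(timeout=2)
    listener.close()
    return reply
```

Calling `preconnect_then_reuse()` returns the 408 bytes even though the only request was sent after the timeout, which matches Andy's reading of the capture: the client is answered with a response that was queued before it ever asked.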

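Willy's remark that many clients never monitor their idle connections points at the check a well-behaved client could run before reusing a keep-alive socket: an idle connection should have nothing to read, so if it polls as readable, the server has either sent its FIN or an unsolicited response such as the 408 discussed here, and the socket should be discarded. This is a hypothetical helper (`idle_state` and `reusable` are illustrative names, not functions from any browser or library), assuming a plain Python socket.

```python
import select
import socket


def idle_state(sock: socket.socket, wait: float = 0.0) -> str:
    """Classify an idle keep-alive socket before reusing it.

    Returns "idle" (safe to send on), "closed" (peer sent its FIN),
    "stale-data" (peer sent something unsolicited, e.g. a 408), or
    "reset" (peer reset the connection).
    """
    readable, _, _ = select.select([sock], [], [], wait)
    if not readable:
        return "idle"  # nothing pending: no FIN, no stray response
    try:
        peeked = sock.recv(1, socket.MSG_PEEK)  # look without consuming
    except ConnectionError:
        return "reset"
    return "closed" if peeked == b"" else "stale-data"


def reusable(sock: socket.socket) -> bool:
    """Only a connection with nothing to read is safe for a new request."""
    return idle_state(sock) == "idle"
```

In the scenario from this thread, the preconnected socket would classify as `stale-data` (the 408 is waiting in the receive buffer), so a client using such a check would open a fresh connection instead of taking the old response as the answer to its new request.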
