Hey Willy,
Thanks for the quick response!

On Thu, Feb 20, 2014 at 1:57 PM, Willy Tarreau <[email protected]> wrote:

> Hi Andy,
>
> On Thu, Feb 20, 2014 at 11:41:20AM -0600, Andy Walker wrote:
> > This is something that looks like it's come up a couple of times on the
> > mailing list:
> > http://thread.gmane.org/gmane.comp.web.haproxy/3253
> > http://thread.gmane.org/gmane.comp.web.haproxy/6494
> >
> > and maybe even a couple of times on StackExchange:
> > http://serverfault.com/questions/291467/haproxy-badreq-errors
> > http://serverfault.com/questions/285850/haproxy-display-a-badreq-badreqs-by-the-thousands
> >
> > Basically, we're having the same problem, but I think I may have come up
> > with the cause -- at least in our situation. It all began as sporadic
> > reports of users getting "The website is too busy to show the webpage --
> > HTTP 408/HTTP 409" errors. The common thread seemed to be that it was
> > always IE users. We weren't seeing anything in the logs (because we had
> > *option dontlognull* turned on), and we were unable to reproduce this in-house.
> > However, yesterday, we were finally able to, and I was able to get some
> > packet dumps at the same time.
> >
> > As it turns out, IE seemed to speak a somewhat mangled version of TCP.
> > Instead of honoring keep-alive timeouts and closing the connection when a
> > server would send a FIN, it would do the following.
> >
> > - Client makes request (success!)
> > - timeout http-keep-alive is reached, haproxy sends FIN to client
> > - IE ACKs the FIN, but doesn't send its own FIN (connection is in a
> > half-open/half-closed state)
>
> That's somewhat common, many clients (often webservice clients) do not
> monitor their idle connections at all and they don't care about them
> being closed, dead or in error. It's just a matter of total incompetence
> of their developers but the end user is directly affected when some
> requests are sent over a dead connection.
>
> > - timeout http-request is reached
> > - IE tries to send another request on the connection, haproxy replies
> > with a 408
>
> Here there's something I don't understand : if the FIN was already sent
> by haproxy, then nothing else can be sent over the connection in that
> direction. So I understand that you get the 408 in the logs instead,
> right ? The other possibility is that IE just reads the 408 that was
> reported *before* the FIN.
>
> By the way, I'm thinking about something else : haproxy does not emit
> 408 when waiting for a second request, precisely to avoid returning
> an error in this case. It should silently close the connection. Could
> you indicate what exact version you're using, it's possible that we
> have a regression or a corner case which is not correctly handled ?
>

I think you're onto something here. I spent the better part of two days
trying to recreate this again, since my foolish self forgot to save the
pcap file from before, but I finally came up with something. Here's what it
looks like is happening, but it still doesn't quite make sense to me.
- IE opens a handful of connections, whether or not it's going to use all
of them
- It makes the requests it needs on one/some of the connections. All is
good so far
- 10 seconds later (the value I am using for both the http-request and
http-keep-alive timeouts), one of the unused connections receives a "408
Request Time-out" from haproxy, seemingly out of the blue. There were no
requests sent, AFAICT.
- After a while, IE tries to make an actual request on this connection, and
at this point, I believe it reads the previously received response, which
was a 408.
- BAM!
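
For anyone who wants to check this without IE, here is a rough Python sketch
of the idle-connection case in the steps above: open a connection, send
nothing, and see whether the server pushes anything before closing. It is only
a sketch under assumptions. The function names are mine, the host and port are
whatever frontend you point it at, and it speaks plain TCP only, so it will
not work against an SSL frontend like the one I reproduced this on. The
timeout applies to each read, so set it a bit longer than the haproxy timeout
being tested.

```python
import socket


def read_unsolicited(host, port, wait_seconds):
    """Connect, send nothing, and return whatever the peer sends.

    Reading stops when the peer closes the connection or when a single
    read has waited wait_seconds without receiving data.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=wait_seconds) as sock:
        try:
            while True:
                data = sock.recv(4096)
                if not data:  # peer closed the connection
                    break
                chunks.append(data)
        except socket.timeout:
            pass  # nothing (more) arrived within wait_seconds
    return b"".join(chunks)


def looks_like_408(raw):
    """Return True if raw bytes start with an HTTP status line carrying 408."""
    status_line = raw.split(b"\r\n", 1)[0]
    parts = status_line.split(b" ", 2)
    return len(parts) >= 2 and parts[0].startswith(b"HTTP/") and parts[1] == b"408"
```

If read_unsolicited() returns bytes for which looks_like_408() is true, the
server answered a connection that never carried a request, which matches what
I saw in the capture.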

We're running version 1.5-dev19 2013/06/17 -- sorry, I've been planning on
getting that more up-to-date, but my schedule and problems with
procrastination are making that tough :)

Also, I'm not sure if this makes a difference, but the two times I was able
to reproduce this today (yes, two times... the first time, Wireshark
crashed *sigh*), the connections/requests were HTTP over SSL. The whole SSL
bit is going to make supplying a packet capture a bit difficult, but I'll
see what I can do, and likely email it directly to you if that's ok.


> > Apparently, as it turns out, IIS plays nice in this situation, and just
> > returns successful responses.
>
> It is not possible after the FIN. From what I remember, IIS does not really
> apply keep-alive timeout but uses the same request timeout for all
> requests, but I may be wrong.
>
> > Other webservers' possible responses are
> > either to send a RST, or not respond at all -- in both of those
> > situations, IE will apparently resend the request. However, in our case,
> > HAProxy is returning with a 408 (which seems pretty polite to me, and the
> > right thing to do), and IE isn't bothering to resend the request.
> >
> > Two articles that I found that support my theory and findings are:
> > http://grotto11.com/blog/slash.html?+1039831658
>
> This one is slightly different.
>
> > and
> > http://support.f5.com/kb/en-us/solutions/public/1000/600/sol1672.html
>
> This one is closer to what you describe and supports the broken MSIE
> behaviour.
>
> > So, our temporary solution was to keep the http-keep-alive timeout at 10s,
> > but to raise the http-request time limit to 150s. Not ideal, and it's
> > definitely causing our concurrent connections to be much higher, but it's
> > preventing IE users from getting shafted for now.
> >
> > So, my question -- is there a current way that I can get around HAProxy
> > being polite and sending 408 responses? It looks like I could use
> > *option nolinger*, but that seems a little dirty for what I'm trying to
> > accomplish.
>
> Normally it should not do this during keep-alives (except due to a bug).
>
> > Aside from that question, I mainly just wanted to report my findings back
> > to the community. My apologies if this was common knowledge, but it took
> > long enough for me to track down that I'm thinking it might still be a
> > mystery to some people.
>
> No don't worry, your report is very welcome, because I'm sure you have
> not made it up, and at the same time it should not happen if we did our
> job correctly, so let's try to find out exactly what happens to fix it
> correctly!
>

Thanks for your amazing willingness to help with problems!


>
> Thanks,
> Willy
>
> --
Andy Walker
System Administrator
FBS - creators of flexmls
3415 39th St S
Fargo, ND  58104
701-235-7300
