On Mon, Oct 08, 2012 at 05:52:40PM -0700, Jesper Noehr wrote:
> On Fri, Oct 5, 2012 at 11:39 PM, Willy Tarreau <[email protected]> wrote:
> > On Fri, Oct 05, 2012 at 05:01:39PM -0700, Jesper Noehr wrote:
> >> >> I realize 1.5-dev12 has SSL support, but this is quite
> >> >> recent, so we're using the stud->haproxy setup still.
> >> >
> >> > I understand :-) There are some brave users anyway who helped us spot
> >> > a number of issues, but we're not finding that many bugs anymore.
> >>
> >> We'd love to have SSL termination inside haproxy for all our needs;
> >> less moving parts makes these things a lot easier!
> >
> > well, we intend to use the upcoming -dev13 in our next aloha LB
> > appliance, so you can expect that we're doing our best to ensure it
> > will be rock solid :-)
> >
> >> I can reproduce this on a fairly consistent basis on a Windows laptop
> >> we have sitting around. It fails less often on linux/OSX, if at all.
> >> We haven't been able to reproduce it on those systems, and we haven't
> >> had any reports from customers either, although they could've just
> >> never reported it.
> >
> > That's very possible. Everytime I take network traces on a production
> > system, I ask myself how the hell people don't complain!
> >
> >> Is there anything else you can think of? I'm almost willing to try
> >> anything at this point.
> >
> > If you can easily reproduce it with this laptop, it would be interesting
> > to test it on the LAN and over the net (eg: ADSL line) to see the effect
> > of latency. I'd really bet it's only a timing issue during some operation.
> >
> > You can also sniff the traffic between this laptop and stud, we might
> > already notice something strange once you isolate a faulty session from
> > haproxy's logs.
>
> I have a packet dump from a failing session. Can anyone make any sense
> of this? http://cl.ly/21332L3r1e35

Thanks for this capture. What I'm seeing there (port 1473) is that the
client is sending data and suddenly the ssl server closes the connection
during the transfer, after 800kB of exchanged data and well before the end.

One possibility could have been that the server returned a response before
the end of the upload, but this is ruled out by the haproxy log, which says
that it's the client that aborted the request before the server responded.
And from haproxy's point of view, the client is stud.

So we're left with this: all we know is that at some point during the
transfer, stud closes the connection to both the client and haproxy,
supposedly at the same time.

I'm seeing something particularly unusual which could explain why it
happens more with this type of client than with others. If you look
closely, you'll see that the client uploads data in 536-byte segments.
This results in jerky transfers because the congestion window takes a long
time to converge, with "long" pauses observed until the end of the transfer.
It's not a problem in itself, but it may induce a different behaviour from
what other clients do, which could possibly trigger a race condition
somewhere (typically an unexpected behaviour with an empty buffer).

At this point it would be nice to take a similar capture between stud and
haproxy to confirm that the first FIN or RST is sent by stud (just take a
look at it, do not send it as it will be in clear text). That would confirm
we have a bug in stud, and increasing its debugging level could probably
help.

If the bug is confirmed in stud, then I think the stud guys will need more
info, because from experience, such issues are very hard to debug!

Regards,
Willy
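
For the check suggested above (which side sends the first FIN or RST on
the stud-to-haproxy leg), a minimal sketch of how the capture could be
scanned locally without sending it anywhere. It is only an illustration
under several assumptions: the function name first_close is made up, the
file is a classic libpcap file (not pcapng) captured on an Ethernet link,
and only unfragmented IPv4 is handled.

```python
import struct

FIN, RST = 0x01, 0x04  # TCP flag bits

def first_close(pcap_bytes):
    """Return (src, dst, 'FIN' or 'RST', packet_number) for the first TCP
    segment carrying FIN or RST, or None if the capture has none.
    Assumes classic pcap, Ethernet link type, IPv4 (hypothetical helper)."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    # Link type is the last field of the 24-byte global header.
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled")

    offset, number = 24, 0
    while offset + 16 <= len(pcap_bytes):
        # Per-packet header: ts_sec, ts_usec, captured length, wire length.
        _sec, _usec, incl_len, _orig = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16])
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        number += 1
        # 14-byte Ethernet header, ethertype 0x0800 means IPv4.
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        if frame[23] != 6:  # IP protocol field, 6 means TCP
            continue
        ihl = (frame[14] & 0x0F) * 4  # IP header length in bytes
        tcp = frame[14 + ihl:]
        if len(tcp) < 14:
            continue
        sport, dport = struct.unpack("!HH", tcp[:4])
        flags = tcp[13]
        if flags & (FIN | RST):
            src = ".".join(map(str, frame[26:30]))
            dst = ".".join(map(str, frame[30:34]))
            kind = "RST" if flags & RST else "FIN"
            return (f"{src}:{sport}", f"{dst}:{dport}", kind, number)
    return None
```

It would be called on the raw file contents, for example
first_close(open(path, "rb").read()) where path points to a file written
by tcpdump -w. If the source address and port it reports belong to stud,
that supports the theory that stud closes first. A capture holding several
connections should be filtered down to the faulty session beforehand.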

