On Mon, Oct 08, 2012 at 05:52:40PM -0700, Jesper Noehr wrote:
> On Fri, Oct 5, 2012 at 11:39 PM, Willy Tarreau <[email protected]> wrote:
> > On Fri, Oct 05, 2012 at 05:01:39PM -0700, Jesper Noehr wrote:
> >> >> I realize 1.5-dev12 has SSL support, but this is quite
> >> >> recent, so we're using the stud->haproxy setup still.
> >> >
> >> > I understand :-) There are some brave users anyway who helped us spot
> >> > a number of issues, but we're not finding that many bugs anymore.
> >>
> >> We'd love to have SSL termination inside haproxy for all our needs;
> >> less moving parts makes these things a lot easier!
> >
> > well, we intend to use the upcoming -dev13 in our next aloha LB
> > appliance, so you can expect that we're doing our best to ensure it
> > will be rock solid :-)
> >
> >> I can reproduce this on a fairly consistent basis on a Windows laptop
> >> we have sitting around. It fails less often on linux/OSX, if at all.
> >> We haven't been able to reproduce it on those systems, and we haven't
> >> had any reports from customers either, although they could've just
> >> never reported it.
> >
> > That's very possible. Every time I take network traces on a production
> > system, I ask myself how the hell people don't complain!
> >
> >> Is there anything else you can think of? I'm almost willing to try
> >> anything at this point.
> >
> > If you can easily reproduce it with this laptop, it would be interesting
> > to test it on the LAN and over the net (eg: ADSL line) to see the effect
> > of latency. I'd really bet it's only a timing issue during some operation.
> >
> > You can also sniff the traffic between this laptop and stud, we might
> > already notice something strange once you isolate a faulty session from
> > haproxy's logs.
> 
> I have a packet dump from a failing session. Can anyone make any sense
> of this? http://cl.ly/21332L3r1e35

Thanks for this capture. What I'm seeing there (port 1473) is that
the client is sending data and suddenly the ssl server closes the
connection during the transfer after 800kB of exchanged data and
well before the end.

One possibility could have been that the server returned a response
before the end of the upload, but this possibility is ruled out by
the haproxy log which says that it's the client that has aborted the
request before the server responded. And from haproxy's point of view,
the client is stud.

So we're left with this: at some point during the transfer, all we
know is that stud closes the connection to both the client and haproxy,
supposedly at the same time.

I'm seeing something particularly unusual which could explain why it
happens more with this type of client than with others. If you
look closely, you'll see that the client uploads data in 536-byte
segments. This results in jerky transfers because the congestion window
takes a long time to converge, with "long" pauses observed until the end
of the transfer. It's not a problem in itself, but it may induce a
different behaviour than what other clients do, which could possibly
trigger a race condition somewhere (typically an unexpected behaviour
with an empty buffer).
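
For scale, 536 bytes is the default TCP MSS that a sender falls back to
when no MSS option is received (RFC 1122), whereas 1460 is common on an
Ethernet path. Why this client ends up with 536 is not known from the
capture. The rough sketch below only counts how many segments the upload
needs at each size. Reading "800kB" as 800 * 1000 bytes and using 1460
as the comparison MSS are both assumptions, and segments_needed is a
hypothetical helper name.

```python
def segments_needed(total_bytes: int, mss: int) -> int:
    """Number of full-or-partial segments needed to carry total_bytes."""
    # Integer ceiling division, avoids float rounding.
    return (total_bytes + mss - 1) // mss


# Assumption: "800kB" is read as 800 * 1000 bytes.
UPLOAD_BYTES = 800_000

small_mss_segments = segments_needed(UPLOAD_BYTES, 536)    # size seen in the capture
large_mss_segments = segments_needed(UPLOAD_BYTES, 1460)   # assumed Ethernet MSS

print(small_mss_segments, large_mss_segments)
```

With these assumptions the 536-byte case needs a bit under three times
as many segments, so the receiver sees many more small reads, which is
where an empty-buffer corner case would have more chances to show up.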

At this point it would be nice to take a similar capture between stud
and haproxy to confirm that the first FIN or RST is sent by stud (just
take a look at it, do not send it as it will be in clear text). That
would confirm that the bug is in stud, and increasing its debugging
level would probably help. If the bug is confirmed in stud, then I
think the stud guys will need more info, because from experience, such
issues are very hard to debug!
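
The "who sent the first FIN or RST" check can be sketched as follows
once the segments of one session have been extracted from the capture
by whatever tool you prefer. This is only an illustration: the Segment
record, the first_closer helper, the flag letters and the addresses are
all made up for the example and do not come from the real capture.

```python
from typing import Iterable, NamedTuple, Optional


class Segment(NamedTuple):
    ts: float    # capture timestamp, in seconds
    src: str     # sender as "ip:port" (hypothetical values below)
    flags: str   # TCP flags as letters: S=SYN, A=ACK, P=PSH, F=FIN, R=RST


def first_closer(segments: Iterable[Segment]) -> Optional[Segment]:
    """Return the earliest segment carrying FIN or RST, or None if there is none."""
    closing = [s for s in segments if "F" in s.flags or "R" in s.flags]
    return min(closing, key=lambda s: s.ts) if closing else None


# Made-up session: "stud" connects from port 40000 to "haproxy" on port 8080.
STUD = "127.0.0.1:40000"
HAPROXY = "127.0.0.1:8080"
session = [
    Segment(0.000, STUD, "S"),
    Segment(0.001, HAPROXY, "SA"),
    Segment(0.002, STUD, "A"),
    Segment(0.010, STUD, "PA"),
    Segment(1.500, STUD, "FA"),     # stud closes first in this made-up trace
    Segment(1.501, HAPROXY, "FA"),
]

closer = first_closer(session)
print(closer.src if closer else "no FIN/RST seen")
```

If the sender reported for the real session is stud's side of the
connection, that points at stud. If it is haproxy's side, the haproxy
logs need another look.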

Regards,
Willy

