Hi Ryan,

On Tue, May 18, 2021 at 10:54:11AM -0500, Ryan O'Hara wrote:
> > > I confirmed haproxy's log messages corresponded to the
> > > shorter HTML responses; the following line is one of them.
> > >
> > > 2021-03-23T15:54:46.869626+09:00 lb01 [err] haproxy[703957]:
> > > 192.168.1.73:60572 [23/Mar/2021:15:54:46.868] vs_http01
> > rs_http01web/web01
> > > 0/0/0/0/0 200 2896 - - SD-- 12/12/7/2/0 0/0 "GET
> > > /content01.html?x=170026897 HTTP/1.1"
> >
> 
> This is exactly the same problem I reported on the mailing list a couple
> of weeks ago. I accidentally replied off-list to Kazuhiro but will
> continue the conversation here.

Excellent, that will be one less issue to chase!

> > > So I commented out "fdtab[fd].ev &= FD_POLL_STICKY;"
> > > from both haproxy-1.8.25 and haproxy-1.8.30,
> > > and the problem went away.
> >
> > This is very strange. I could almost have understood the opposite, i.e.
> > remove an activity report before leaving so that we don't wake up, but
> > here *not* removing the flags indicates that we're leaving the FD
> > reports for another call.
> >
> > > I don't know why this change resolves the behavior;
> > > I just tried it and ran the test.
> >
> > What I *suspect* is happening is the following:
> >
> >   - the server closes with a TCP reset
> >   - sometimes for an unknown reason we do not process the event
> >     before leaving this function
> >   - we then flush the activity flags
> >   - the reset flag is still there, reported as POLLERR, forcing
> >     an immediate subsequent wakeup
> >   - the I/O handler sees POLLERR without POLLIN and concludes this is
> >     a hard error and doesn't even bother to try to read, resulting in
> >     the loss of the pending segments.
> >
> 
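For illustration, the failure mode in the last two points above can be sketched outside haproxy with plain sockets. Below is a minimal Linux-only sketch (Python, not haproxy code; all names are mine): after the peer performs an abortive close, the poller flags POLLERR on a socket whose receive buffer still holds valid data, so a handler that gives up on POLLERR without even attempting a read loses those bytes, while reading anyway recovers them, because Linux only reports the reset once the queued data has been consumed.

```python
import select
import socket
import struct
import time

# Loopback pair standing in for the server (peer) and the proxy (cli).
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
peer, _ = srv.accept()

peer.sendall(b"pending response bytes")
# SO_LINGER with a zero timeout turns close() into an abortive close:
# the kernel emits a RST right behind the data above.
peer.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
peer.close()

# Wait until the poller reports the error condition on the client side.
poller = select.poll()
poller.register(cli, select.POLLIN)
flags = 0
deadline = time.monotonic() + 2.0
while time.monotonic() < deadline:
    for _, flags in poller.poll(100):
        pass
    if flags & select.POLLERR:
        break

# A handler that treats POLLERR as a hard error would stop here and lose
# the segment. Reading anyway drains the receive buffer first; only a
# subsequent read would report the reset.
data = cli.recv(4096)
print(bool(flags & select.POLLERR), data)
```

Note that in this sketch POLLIN is reported alongside POLLERR; the scenario above supposes the poller somehow reported POLLERR alone, but the safe strategy is the same either way: attempt the read regardless of the error flag.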
> In my original message, I included a portion of the tcpdump output, and
> the RST packet is being sent by haproxy to the server. I never see a TCP
> reset from the server itself.

Ah yes, seeing it now. Thanks, this will help!

> Under wireshark, I can see that the HTTP response is a total of 3 segments,
> and as far as I can tell all three segments were received by haproxy.

In fact what you know from a trace is that the segments were received by
the NIC; you know they were received by the TCP stack when you see them
ACKed; and you only know they were received by haproxy when you see
haproxy pass them on to the other side. In most cases something that
reaches the stack will reach haproxy, but there are situations where it
will not, such as when haproxy closes first, or when some errors are
detected early and prevent it from reading. The difference is subtle,
but it explains how and why some error flags may be reported indicating
an error at the transport layer, with the transport layer including the
lower layers of haproxy itself.
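The "received by the stack but never by the application" case is easy to show in isolation. A minimal sketch (plain Python, not haproxy code): on Linux, closing a socket that still has unread data in its receive buffer makes the kernel answer with a RST instead of a clean FIN, so the peer learns that bytes the stack had already ACKed were thrown away before the application read them.

```python
import socket
import time

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
app, _ = srv.accept()

cli.sendall(b"bytes the application never reads")
time.sleep(0.2)  # data is now ACKed into app's kernel receive buffer
app.close()      # unread data pending -> Linux sends a RST, not a FIN
time.sleep(0.2)  # let the RST reach the client

try:
    cli.recv(4096)
    outcome = "clean end of stream"
except ConnectionResetError:
    outcome = "connection reset"
print(outcome)
```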

> Pardon if this next bit doesn't make sense, but the third and final segment
> is shown under wireshark as the HTTP response itself. In other words:
> 
> Segment #1: 2896 bytes
> Segment #2: 1448 bytes
> Segment #3: 1059 bytes <- this final frame also includes the HTTP response
> header

No, that's a common and confusing quirk of wireshark/tshark: it logs the
status code on the last segment. Someone once told me why he found that
convenient, but I've since forgotten, given all the cases where I find
it misleading! Otherwise just use tcpdump; it doesn't try to be smart or
to report lies, and it will not cheat on you.

So in your trace we can see that the stack ACKs receipt of the first two
segments (2896 bytes of data), then the 3rd segment (1448). Then haproxy
closes "cleanly" (something it shouldn't do on this side, at the risk of
leaving TIME_WAIT sockets). Maybe it found an HTTP close in the response,
but regardless it should not close like this.

I'm not sure I understand this "HTTP 1125" response, whether it's the
last bytes of the response or a late retransmit for some reason, and as
tshark is still unable to display sequence and ack numbers on *all*
packets, it's impossible to reliably know what it corresponds to :-/
The next reset is apparently the brutal nolinger close at the end,
avoiding the TIME_WAIT: its Seq is the same as the one on the FIN,
indicating it wipes the FIN.

> When the problem occurs, haproxy logs a termination state of "SD--" and the
> byte count is short by exactly the size of the last segment (1059 in this
> case). I think this is well established here but I just wanted to be clear.

I think it is possible that the response is not correct (i.e. a
content-length too short compared to what's actually sent back, or an
incorrect chunked-encoding length). Ryan, if you could maybe pass the
trace between .152 and .153 through tcpdump -nvvX, that could be very
useful (and possibly even help figure out a reproducer). Be careful, it
may need to be edited to mask private info.
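For what it's worth, a content-length mismatch of this kind is trivial to reproduce in isolation. In the sketch below (plain Python, not haproxy; the 3955/2896/1059 values are hypothetical, chosen only to mirror the log line earlier in the thread), a server declares a Content-Length larger than the body it actually sends, and a strict client then sees the connection end exactly one segment short of the announced length:

```python
import http.client
import socket
import threading

def short_server(srv):
    conn, _ = srv.accept()
    conn.recv(4096)  # consume the request
    # Declare 3955 bytes but send only 2896, then close: the body ends
    # 1059 bytes short of the announced Content-Length.
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 3955\r\n\r\n"
                 + b"x" * 2896)
    conn.close()

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=short_server, args=(srv,), daemon=True).start()

c = http.client.HTTPConnection("127.0.0.1", srv.getsockname()[1])
c.request("GET", "/content01.html")
resp = c.getresponse()
try:
    resp.read()
    outcome = "complete"
except http.client.IncompleteRead as e:
    outcome = "short by %d bytes" % (3955 - len(e.partial))
print(outcome)
```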

> I can help test these in my environment. Thanks.

Great, thank you! I suspect that there is something else going on,
though, considering that haproxy closed first. Or maybe it *thought*
the other end had already closed, in which case it could have inherited
a hangup flag from somewhere else, but I don't understand how that
could propagate between threads :-/

Thanks,
Willy
