Hi Tim,

On Wed, Aug 26, 2020 at 12:59:55PM +0200, Tim Düsterhus, WoltLab GmbH wrote:
> Within the HAProxy logs we're sometimes seeing largish 'Tr' values for
> static file requests. For example 734ms for the example request below.

That sounds huge. I could have imagined a 200ms time, possibly indicating
that somewhere between the two, an extra MSG_MORE was present and prevented
a request and/or response from leaving the machine. But 734ms doesn't
correspond to anything usual.

> The two requests in nginx' log:
> 
> > Aug 26 10:23:21 *snip* abd21141cdb4[5615]: 5F46381900A0 "GET *snip* 
> > HTTP/2.0" 200 27049 0.070
> > Aug 26 10:23:26 *snip* abd21141cdb4[5615]: 5F46381E00F5 "GET *snip*.gif 
> > HTTP/2.0" 200 6341 0.000
> 
> Note how the timing for the PHP request matches up what HAProxy was
> seeing: Roughly 70ms / 0.070s. The timing for the GIF request is *way*
> off, though.

Stupid question, do the PHP request *always* match ? I'm asking because
one point is that haproxy cannot tell the difference between the two.
However the static one very likely responds instantly, which could justify
a less common code path being taken and a bug to be triggered.

However static files might also cause some disk access, and not knowing
anything about the environment, I'd say it's also possible that the whole
nginx process gets frozen for the time it takes open() to respond, and if
for whatever reason there are lots of I/Os and it takes that long, it might
be possible that it doesn't notice it, depending where the total time is
retrieved.

But it could definitely be a bug in the connection reuse or mux code, or
course. It's just that I have no idea what could cause this for now.

If you see it often enough, it might be useful to enable h2 traces and
watch them on the CLI, or even run a tcpdump between the two to observe
what's going on. The traces might be easier as a first step as you'll
see already decoded stuff. For example:

  (echo "trace h2 sink buf0; trace h2 level user; trace h2 verbosity \
   complete; trace h2 start now;show events buf0 -w -n" ; cat ) | \
  socat - /path/to/socket > /path/to/logs

Once you're confident you got it, you can disable all traces using the
short form "trace 0".

Another thing you can try is to artificially limit
tune.h2.max-concurrent-streams just in case there is contention in
the server's connection buffers. By default it's 100, you can try with
much less (e.g. 20) and see if it seems to make any difference at all.

Hoping this helps,
Willy

Reply via email to