Willy,

On 26.08.20 at 17:51, Willy Tarreau wrote:
>> As I said: It's not really reproducible.
> 
> Yeah but we all have a definition of "not really reproducible". As you
> consider that it doesn't happen in HTTP/1 after only 4 hours, for me
> this means you expect it to usually happen at least once in 4 hours.
> If the load is not too high (no more than a few thousands requests per
> second) and you have some disk space, the h2 trace method could prove
> to be useful once you have isolated a culprit in the logs. You could
> even gzip the output on the fly to take less space, they compress very
> well.

My definition of reproducible is "I can write up a list of steps that
makes the issue happen somewhat reliably". This is not the case here. If
I test with e.g. nghttp -a to pull the HTML + all resources then
everything is working smoothly. Same if I attempt to pull down the same
static file after I am seeing an issue within the logs.

An update regarding the H1 numbers: In the 20 hours or so with HTTP/1
enabled, a total of 15 (!) static requests took longer than 45ms, the
maximum being 77ms. I still consider that a lot, but it is nothing
compared to the H2 performance.

This morning I re-enabled H2 for the backend communication and then
plugged in the tracing. In the half hour since I re-enabled H2 I'm
seeing 160 static requests taking longer than 45ms, with the worst ones
being > 800ms.
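
For reference, figures like these (count above a threshold, worst case) can
be derived with a few lines of Python. This is only a sketch: it assumes the
per-request times have already been extracted from the HAProxy log as
milliseconds (that step depends on the configured log format and is not
shown), and `summarize_slow` is a hypothetical helper name.

```python
def summarize_slow(durations_ms, threshold_ms=45):
    """Return (count, worst) for requests slower than threshold_ms.

    worst is None when no request exceeds the threshold.
    """
    slow = [d for d in durations_ms if d > threshold_ms]
    return len(slow), (max(slow) if slow else None)
```

The comparison is strict, so a request taking exactly 45ms is not counted.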

I now have the trace results and my HAProxy log where I can correlate
the slow requests using the timestamp and path. Unfortunately the trace
does not appear to contain the unique-id of the request.

Can I somehow filter down the trace file to just the offending requests
+ possibly the requests within the same H2 connection? For privacy
reasons I would not like to provide the full trace log, even if it's in
a non-public email.
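
A minimal sketch of the kind of filter meant here, in Python. It rests on
assumptions about the trace layout that would need checking against the real
output: that every trace line carries an ISO-8601-like timestamp, that lines
of the same H2 connection share an "h2c=0x..." pointer, and that the request
path appears on at least one line of the connection. The names `parse_ts`
and `filter_trace` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout: a timestamp and an "h2c=0x..." connection pointer
# somewhere in each trace line. Adjust both patterns to the real output.
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?")
CONN_RE = re.compile(r"h2c=(0x[0-9a-fA-F]+)")


def parse_ts(line):
    """Return the first timestamp in the line as a naive datetime, or None."""
    m = TS_RE.search(line)
    if not m:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    frac = m.group(2) or ""
    return ts.replace(microsecond=int(frac[:6].ljust(6, "0")) if frac else 0)


def filter_trace(lines, slow_requests, window=timedelta(seconds=1)):
    """Keep trace lines of connections that carried a slow request.

    lines: list of trace lines (it is iterated twice).
    slow_requests: list of (datetime, path) pairs taken from the HAProxy log.
    """
    # Pass 1: find connections with a line naming a slow path close in time.
    conns = set()
    for line in lines:
        ts, conn = parse_ts(line), CONN_RE.search(line)
        if ts is None or conn is None:
            continue
        for when, path in slow_requests:
            if abs(ts - when) <= window and path in line:
                conns.add(conn.group(1))
                break
    # Pass 2: keep every line that belongs to one of those connections.
    kept = []
    for line in lines:
        conn = CONN_RE.search(line)
        if conn and conn.group(1) in conns:
            kept.append(line)
    return kept
```

A gzip-compressed trace can be read with `gzip.open(path, "rt")` from the
standard library before passing the lines in. Matching on the pointer alone
may keep too much if the same address is reused by a later connection, so the
result would still need a manual look before sharing.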

>>> Another thing you can try is to artificially limit
>>> tune.h2.max-concurrent-streams just in case there is contention in
>>> the server's connection buffers. By default it's 100, you can try with
>>> much less (e.g. 20) and see if it seems to make any difference at all.
>>>
>>
>> The fact that disabling HTTP/2 helps could indicate that something like
>> this is the case here. I'll try that tomorrow, thanks.
> 

I've not done this yet; I'd first like to hear how we should proceed with
the trace I've collected.

Best regards
Tim Düsterhus
Developer WoltLab GmbH

-- 

WoltLab GmbH
Nedlitzer Str. 27B
14469 Potsdam

Tel.: +49 331 96784338

[email protected]
www.woltlab.com

Managing director:
Marcel Werk

AG Potsdam HRB 26795 P
