Willy,

On 26.08.20 at 17:51, Willy Tarreau wrote:
>> As I said: It's not really reproducible.
>
> Yeah but we all have a definition of "not really reproducible". As you
> consider that it doesn't happen in HTTP/1 after only 4 hours, for me
> this means you expect it to usually happen at least once in 4 hours.
> If the load is not too high (no more than a few thousands requests per
> second) and you have some disk space, the h2 trace method could prove
> to be useful once you have isolated a culprit in the logs. You could
> even gzip the output on the fly to take less space, they compress very
> well.
My definition of reproducible is "I can write up a list of steps that
make the issue happen somewhat reliably". This is not the case here. If
I test with e.g. nghttp -a to pull the HTML plus all resources,
everything works smoothly. The same is true if I attempt to pull down the
same static file after I see an issue in the logs.

An update regarding the H1 numbers: In the 20 hours or so with HTTP/1
enabled, a total of 15 (!) static requests took longer than 45ms, the
maximum being 77ms. I still consider that a lot, but it is nothing
compared to the H2 performance.

This morning I re-enabled H2 for the backend communication and then
plugged in the tracing. In the half hour since I re-enabled H2, I have
seen 160 static requests take longer than 45ms, with the worst ones
taking more than 800ms.

I now have the trace results and my HAProxy log, in which I can correlate
the slow requests using the timestamp and path. Unfortunately, the trace
does not appear to contain the unique-id of the request. Can I somehow
filter the trace file down to just the offending requests, plus possibly
the other requests within the same H2 connection? For privacy reasons I
would rather not provide the full trace log, even in a non-public email.

>>> Another thing you can try is to artificially limit
>>> tune.h2.max-concurrent-streams just in case there is contention in
>>> the server's connection buffers. By default it's 100, you can try with
>>> much less (e.g. 20) and see if it seems to make any difference at all.
>>>
>>
>> The fact that disabling HTTP/2 helps could indicate that something like
>> this is the case here. I'll try that tomorrow, thanks.
>

I have not done this yet. I'd first like to hear how we should proceed
with the trace I've collected.

Best regards
Tim Düsterhus
Developer WoltLab GmbH

-- 
WoltLab GmbH
Nedlitzer Str. 27B
14469 Potsdam

Tel.: +49 331 96784338

www.woltlab.com

Managing director:
Marcel Werk

AG Potsdam HRB 26795 P

