On 19.12.2025 at 17:43, Willy Tarreau wrote:
On Fri, Dec 19, 2025 at 05:20:08PM +0100, Hank Bowen wrote:
However, having touched on the subject of HTTP/2, I'm wondering whether I
understand correctly why, with http-reuse set to "aggressive" or
"always", one client can cause a head-of-line blocking problem for the
rest of the clients.
Yes in theory, though since 3.1 or so, it has been significantly mitigated
by the fact that we now support a dynamic Rx buffer size and we advertise
only the allocated size. Prior to 3.1 we'd advertise 64kB despite a 16kB
per-stream buffer by default. Most of the time it would be fine thanks to
other internal buffering but not always. But HoL is inherent to H2 and is
always a trade-off between HoL risk and BDP (bandwidth-delay product).
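If I may illustrate the point about advertising only the allocated size, here is a minimal sketch of the idea as I understand it (the names and numbers are mine, not haproxy internals): the advertised window tracks what the proxy can actually absorb, instead of a fixed 64 kB over a 16 kB per-stream buffer.

```python
# Sketch, assuming a 16 kB buffer unit as in the discussion above.
BUF = 16 * 1024

def advertised_window(allocated_buffers):
    """Advertise only the space we can really absorb right now."""
    return allocated_buffers * BUF

print(advertised_window(1))  # 16384: pre-3.1 the peer would still see ~64 kB
print(advertised_window(4))  # 65536: grows as more buffers get allocated
```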
OK, I generally understand that, i.e. there was over-advertising, which
was inefficient, but if you have time I'd like to learn about the
details. I write "if you have time" here on purpose, as I'm asking this
more out of curiosity than from a real need.
That is, if we advertise a 64 kB Rx buffer size, is it a problem when a
stream established between the client and haproxy has a 16 kB buffer, or
when a stream established between haproxy and a backend server has such a
value (or do both cases cause inefficiency)? Is this buffer related to
some setting that haproxy exposes (after glancing at the possible
settings that could correspond to it, I cannot tell; it looks like it
could be tune.h2.max-frame-size, but I'm not sure)? How exactly does
advertising an Rx buffer size larger than the stream's buffer cause a
problem?
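For reference, the global settings that look closest to this (my reading of the haproxy configuration manual, so possibly off the mark) would be along these lines; note that tune.h2.max-frame-size only caps the size of a single DATA frame, which is a different thing from the advertised Rx window:

```
global
    # Per-stream window advertised to the peer; this looks like the
    # "64 kB" figure discussed above (hedged guess, check your version's docs).
    tune.h2.initial-window-size 65535
    # Internal buffer size, 16384 by default: the 16 kB per-stream buffer.
    tune.bufsize 16384
    # Maximum size of one DATA frame; unrelated to the Rx window itself.
    tune.h2.max-frame-size 16384
```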
That is, I suppose we then have to deal with a situation like this (I
constructed it after some questioning of ChatGPT on the matter; I do not
know how trustworthy it is here, but its explanation looks reasonable):
haproxy advertises an Rx buffer size of 64 kB, but the buffer on a
stream is only 16 kB. Haproxy downloads 64 kB for a given stream,
exhausting the HTTP/2 connection-level window, and because of that it
cannot download more data. Now, note that it only makes sense for
haproxy to send WINDOW_UPDATE to the server after the client has read
the data (otherwise haproxy would eventually overflow its own buffers).
Haproxy passes the first 16 kB to the mentioned stream and waits for the
client to read it; the client does eventually take all 16 kB of data,
but the process is slow, and while it lasts, downloading data for other
streams is blocked (it always is, but the point here is that it is
blocked for a long time). After the client completes that operation,
haproxy sends WINDOW_UPDATE to the server and only then takes (only) 16
kB for another stream (the stream given by the next request frame in the
frame sequence). Does my description correspond to reality?
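To check my own understanding, here is a toy model of the window accounting in that scenario (entirely my own naming, not haproxy internals): one connection-level window, 16 kB per-stream buffers, and a WINDOW_UPDATE sent only once the downstream client drains its stream.

```python
# Toy model of the scenario above; numbers match the discussion.
CONN_WINDOW = 64 * 1024
STREAM_BUF = 16 * 1024

class Conn:
    def __init__(self):
        self.window = CONN_WINDOW   # bytes the server may still send
        self.stream_buf = {}        # stream id -> bytes buffered for it

    def recv(self, stream, n):
        """Server sends n bytes on a stream, bounded by the windows."""
        n = min(n, self.window, STREAM_BUF - self.stream_buf.get(stream, 0))
        self.window -= n
        self.stream_buf[stream] = self.stream_buf.get(stream, 0) + n
        return n

    def client_read(self, stream):
        """Client drains its stream; only now do we send WINDOW_UPDATE."""
        freed = self.stream_buf.pop(stream, 0)
        self.window += freed        # WINDOW_UPDATE re-opens the window
        return freed

c = Conn()
c.recv(1, 64 * 1024)        # slow client's stream fills its 16 kB buffer
for sid in (3, 5, 7):       # three more streams can still receive...
    c.recv(sid, 16 * 1024)
print(c.window)   # 0: connection window exhausted, everyone is blocked
c.client_read(1)  # the slow reader finally drains its stream
print(c.window)   # 16384: window re-opens only now
```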
Is it that the given connection's TCP buffer then fills up
significantly, and when other (fast) clients request data, haproxy can
only download as much of it as there is space remaining in its TCP
buffer, which is low, so it must perform this operation as many separate
downloads, each of which carries some overhead (compared to the
situation where all the data is downloaded into haproxy's TCP buffer at
once)?
The issue is that if you aggregate a slow and a fast reader into the same
connection, and the connection is filled with data for the slow reader,
there's no way to make the data for the fast reader bypass it since TCP
is in-order (something that QUIC addresses).
Well, I'm not quite sure I understand that properly. That is, under
normal circumstances, i.e. with a direct client <-> server interaction,
the HTTP/2 frames must indeed be sent from server to client in order,
and the TCP packets naturally also have to be sent in order. And with a
client <-> proxy <-> servers setup, the frames from a given backend
server do indeed also have to be transmitted (to the proxy) in order,
and so do the TCP packets. But then haproxy can, as you said below, send
the n+1-th frame to client B without waiting for the n-th frame to be
sent to client A (assuming, of course, that clients A and B are
different). So there is a sort of bypassing. Does what I wrote about the
TCP buffer and the overhead of multiple downloads (instead of quite
possibly just one) hold true?
If we have a sequence of frames from the server and they are for different
clients, haproxy does not have to wait for the n-th frame to be sent to a
client in order to send the n+1-th frame to another client, am I right?
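Put another way, my mental model of the demultiplexing is something like this toy sketch (again my own construction, not how haproxy is structured): frames must be read from the server-side connection in TCP order, but each one goes to its own client's queue, so one client's frame is not stuck behind another's.

```python
# Toy demux: in-order arrival on one server connection, independent
# forwarding per client.
from collections import defaultdict

frames = [("A", "frame-1"), ("B", "frame-2"), ("A", "frame-3")]
queues = defaultdict(list)
for client, frame in frames:      # must be consumed in TCP order...
    queues[client].append(frame)  # ...but forwarded per client independently
print(queues["B"])  # ['frame-2']: B need not wait for A's frames
```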
That's it, you just cannot realistically do it otherwise, or you send
only one frame per network round-trip, which can limit the connection's
performance to 16kB per round-trip, e.g. 160kB per second for 100ms. But
as explained, with 3.1+ and dynamic buffers we can now modulate what we
advertise and do our best to adjust to the number of readers on the same
connection. However, slow readers will still reserve a number of buffers
that will possibly be under-utilized and not usable by faster ones. But
that's a minor issue compared to the initial one.
I also have some more questions, although I'm not sure whether it's best
to send them here or to create a new topic; they are rather closely
related to this discussion.
If they're related, let's keep going on this thread ;-)
Willy