Hello Glenn,

On Tue, Jun 07, 2022 at 05:24:09PM -0400, Glenn Strauss wrote:
> On Tue, Jun 07, 2022 at 09:27:43AM -0700, Willy Tarreau wrote:
> > Hello Glenn,
> > 
> > Thanks for your report, I understand the problem, that's very interesting.
> > I would say it's not even an optim, rather an approach trying to help the
> > whole ecosystem at a very low cost by preventing known degenerative cases
> > from happening (or helping them recover).
> 
> References to the issues (for the list):
> https://github.com/haproxy/haproxy/pull/1732
> https://github.com/nghttp2/nghttp2/issues/1722
> https://github.com/curl/curl/pull/8965

Yeah, I've read those as well; thanks for the background and for reaching
out to the various implementations, by the way. You could even have brought
the discussion to the IETF HTTP working group to reach all of them at once,
and maybe someone would remember why 65535 was chosen back then (I suspect
it was done with IoT in mind, where the window size had to fit in 16 bits
to save space, but I could be wrong).

> Summary: degenerative HTTP/2 client behavior sending mostly 1-byte DATA
> frames for a fast client which continually exhausts the HTTP/2 send
> window (e.g. for a large file upload) *and* for a simple client which
> empties and reuses a single power-2 sized buffer.  A simple server
> sending WINDOW_UPDATE for each DATA frame to replenish the window may
> reflect back the exact size of the DATA frame.
> 
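
For readers of the list, that degenerative loop is easy to model in a few
lines. The program below is purely my own illustration (not code from curl,
nghttp2 nor haproxy): a client that refills its 65536-byte buffer only once
it is empty, the default 65535-byte stream window, and a server that echoes
every DATA frame with a WINDOW_UPDATE of exactly the same size. The DATA
frame sizes progressively drift towards 1 byte:

  #include <stdio.h>

  #define MAX_FRAME 16384    /* SETTINGS_MAX_FRAME_SIZE */
  #define BUF_SIZE  65536    /* power-of-2 client buffer */
  #define QLEN      8192     /* more than enough pending WUs for this run */

  static int wu[QLEN];       /* FIFO of the WUs the server will echo back */
  static int wu_head, wu_tail;

  int main(void)
  {
      int window = 65535;    /* default SETTINGS_INITIAL_WINDOW_SIZE */
      int buffered = BUF_SIZE;
      int frames = 0, tiny = 0;

      while (frames < 5000) {
          /* client: send as much as the stream window and the buffer allow */
          while (window > 0 && buffered > 0) {
              int len = buffered < MAX_FRAME ? buffered : MAX_FRAME;

              if (len > window)
                  len = window;
              wu[wu_tail++] = len;      /* the server echoes this exact size */
              window   -= len;
              buffered -= len;
              if (buffered == 0)
                  buffered = BUF_SIZE;  /* buffer refilled only once empty */
              frames++;
              if (len == 1)
                  tiny++;
          }
          /* server: one WINDOW_UPDATE comes back per DATA frame received */
          window += wu[wu_head++];
      }
      printf("%d of %d DATA frames were 1 byte long\n", tiny, frames);
      return 0;
  }

Once the first 1-byte tail is sent, its 1-byte WU comes back and only lets
one more byte out, so each buffer cycle adds one more tiny frame.
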
> > However I don't fully agree with the solution, it might just push the issue
> > a bit sidewards.
> 
> Changing the default SETTINGS_INITIAL_WINDOW_SIZE does push the issue a
> bit sideways, but in doing so mitigates a demonstrated (not theoretical)
> issue which might befall 6+ years of curl installations, a sizable
> number of which might not be upgraded for a very long time.
> 
> > For example, a client might already have worked around this issue by using a
> > 65537-byte buffer and will now face issues again.
> 
> I do not follow.  How might that be?
> Also, do you know of any such implementations?

No, it's just to illustrate. In the HTTP ecosystem, you rarely work
around one implementation's corner case without affecting another
workload. Similarly, a client that reads exactly 65535 bytes at a
time could end up doing this with the change (please note that I'm
purposely making this up to illustrate the point):

 read 65535
 send 16384, 16384, 16384, 16383
 read 1
 send 1
 receive WU for 65535 (the first 4 frames)
 read 65535
 send 16384, 16384, 16384, 16383
 receive WU for 1
 read 1
 send 1
 ...

This would make such a client less efficient by doubling its number
of read() syscalls. I'm not particularly worried about this, to be
honest; it's just the typical kind of thing that could get an issue
filed about a regression observed in the field. In that case we'd
have to revert the change, and that would reopen the problem with curl.

> For a slightly more intelligent client actively trying to avoid the
> degenerative behavior, a simple approach is to avoid injecting tiny DATA
> frames when the window is larger and there is more data to be sent.  I
> have proposed a patch to curl in https://github.com/curl/curl/pull/8965

I totally agree, but we also know that it's harder to fix embedded
clients in cars and central heating devices than to deploy a
reasonable workaround on the few servers that receive their connections.

> > Thus I'd first prefer to solve it at the core, then possibly merge this
> > patch later as an optimization but not for the problem, rather to make sure
> > clients can make 4 full 16kB frames, and improve bandwidth and CPU
> > efficiency.
> 
> Yes, having the server merge the sizes of small DATA frames into a
> larger WINDOW_UPDATE might be useful, too, but is independent and
> complementary to my patch proposed in
> https://github.com/haproxy/haproxy/pull/1732

I agree that it's independent, but the WINDOW_UPDATE coalescing is the
one that is not expected to cause any regression with any possible
client. That's why I'd like to have both: first that one, because it
should be durable; second, your patch as an optimization, so that if
anyone complains about trouble with it we can safely revert it or decide
what to do without worrying about all the deployed curl clients that
will never be updated.
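
To make it concrete, the coalescing I have in mind on the server side
amounts to something like the sketch below (a rough illustration only,
not haproxy code, with a printf standing in for whatever actually queues
the frame): account the bytes of every processed DATA frame per stream,
and emit a single WINDOW_UPDATE when the output is flushed:

  #include <stdio.h>

  /* hypothetical stand-in for whatever queues a WINDOW_UPDATE frame */
  static void send_window_update(unsigned int stream_id, unsigned int increment)
  {
      printf("WINDOW_UPDATE stream=%u increment=%u\n", stream_id, increment);
  }

  struct stream_ack {
      unsigned int stream_id;
      unsigned int unacked;     /* processed DATA bytes not yet acked */
  };

  /* called once per processed DATA frame: just merge the acknowledgement */
  static void ack_data(struct stream_ack *sa, unsigned int len)
  {
      sa->unacked += len;
  }

  /* called when frames are flushed to the socket: one WU covers everything */
  static void flush_acks(struct stream_ack *sa)
  {
      if (sa->unacked) {
          send_window_update(sa->stream_id, sa->unacked);
          sa->unacked = 0;
      }
  }

  int main(void)
  {
      struct stream_ack sa = { 1, 0 };
      int i;

      /* a burst of degenerative 1-byte DATA frames processed in one cycle... */
      for (i = 0; i < 1000; i++)
          ack_data(&sa, 1);
      flush_acks(&sa);          /* ...is acknowledged with a single WU of 1000 */
      return 0;
  }

That way the WU sizes no longer mirror the DATA frame sizes, which is what
feeds the degenerative loop in the first place, whatever the client's
buffer size.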

> > Given the nature of the problem, I don't imagine we'd meet this issue with
> > interleaved 1-byte frames from multiple streams over the same connection.
> 
> Actually, haproxy sets the session window size to be a very large
> number, effectively removing HTTP/2 application-level flow-control
> at the session level.  That leaves independent streams with the
> default SETTINGS_INITIAL_WINDOW_SIZE of 65535, so multiple uploads
> can independently exhaust the stream send window and run into this
> degenerative behavior provided the client is fast enough, the bandwidth
> delay product is large enough, and the client is fully emptying a
> power-2 sized buffer before reusing it.

Sure, but I'm speaking about practicality. The problem is currently
observed with a client that uses a fixed-size buffer with no sliding
window and which is not used for parallel uploads. Thus in practice
we'll get a single stream on such connections, which is extremely
efficient to acknowledge at once.
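
For the list, "session window set to a very large number" concretely means
sending a WINDOW_UPDATE on stream 0 right after the connection preface,
since SETTINGS_INITIAL_WINDOW_SIZE only affects stream windows. A minimal
sketch of such a frame (illustrative only, not the haproxy code; the ~1 GB
increment is an arbitrary value picked for the example):

  #include <stdio.h>
  #include <string.h>

  /* Build a WINDOW_UPDATE for stream 0 (the connection) with the given
   * increment.  Frame layout per RFC 7540: 3-byte length, 1-byte type
   * (0x8 = WINDOW_UPDATE), 1-byte flags, 4-byte stream id, then the
   * 31-bit window size increment. */
  static size_t build_conn_window_update(unsigned char *out, unsigned int incr)
  {
      memset(out, 0, 13);
      out[2] = 4;                        /* payload length = 4 */
      out[3] = 0x8;                      /* type = WINDOW_UPDATE */
                                         /* flags = 0, stream id = 0 */
      out[9]  = (incr >> 24) & 0x7f;     /* reserved bit cleared */
      out[10] = (incr >> 16) & 0xff;
      out[11] = (incr >> 8) & 0xff;
      out[12] = incr & 0xff;
      return 13;
  }

  int main(void)
  {
      unsigned char frame[13];
      size_t i, len;

      /* grow the 65535-byte connection window by roughly 1 GB, so that
       * only the per-stream windows matter afterwards */
      len = build_conn_window_update(frame, (1u << 30) - 65535);
      for (i = 0; i < len; i++)
          printf("%02x ", (unsigned)frame[i]);
      printf("\n");
      return 0;
  }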

> One approach might be to collect small DATA frames and send a
> WINDOW_UPDATE once a threshold is reached.

I thought about that one, focusing on 1/16 of a max frame size at once,
but I don't like it much because it means we'd delay some WUs, and the
H2 spec is very clear that we ought to do our best to ack data in a
timely manner in order to avoid deadlocks. You can have clients that
stop sending until they're able to read a new block of a preconfigured
window size, for example (and that's where 65535 was an absurd choice
in the first place). For instance, a client that is hard-coded to use
a 65535-byte buffer and needs to wait for the WU before reusing that
buffer would never receive an update for the block that spans from
61440 to 65535, and would never refill the buffer. Thus I really want
to be certain that we do not leave processed DATA frames without the
equivalent WU sent back.

> The upcoming lighttpd 1.4.65
> does just that, collecting updates until 16384 is reached, and then
> sending a WINDOW_UPDATE for 16384.  Cost: a single 1-RTT delay in
> sending a WINDOW_UPDATE to continue uploads larger than 112k-1.

It can definitely cause some inefficient clients to hang if they need
their buffer to be fully acked and its size is not a multiple of the
frame size, so be careful about this.
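
To put numbers on the concern (just the arithmetic, assuming a server that
only sends a WU once 16384 unacked bytes have accumulated and a client that
waits for its whole buffer to be acked before reusing it):

  #include <stdio.h>

  int main(void)
  {
      const int threshold = 16384;  /* WU only sent once this much is unacked */
      const int buf = 65535;        /* client buffer, not a frame-size multiple */
      const int acked = (buf / threshold) * threshold;

      printf("acked: %d bytes, never acked: %d bytes\n", acked, buf - acked);
      printf("a client waiting for a full ack of its %d-byte buffer hangs\n", buf);
      return 0;
  }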

> Another approach might be to pre-emptively effectively double the
> window size by sending WINDOW_UPDATE with the entire window size (65535)
> back to the client upon receive of first DATA frame, and then keeping
> track on server-side until that 65535 is exhausted before sending
> another WINDOW_UPDATE.

Advertising overly large stream windows is problematic when some
streams may be processed slowly, as it easily causes head-of-line
blocking: if the advertised window is larger than the available
storage, a paused stream will block the other ones.

> Again, instead of 65535, I would recommend a
> 65536 or other multiple of SETTINGS_MAX_FRAME_SIZE.

Don't get me wrong, I totally agree with this change. It's just that it
will work around this problem at the risk of causing other, much less
serious, ones for *some* clients. That's why I want to distinguish between
the root cause of the problem (DATA frames not being coalesced on the
sender *and* WU frames not being coalesced on the receiver) and what makes
it a problem (a window size that is not a multiple of the frame size). We
can do our part of the job by coalescing adjacent WU frames, and then we
can see whether increasing the window size improves the situation for most
clients without degrading it for others.
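
As a quick illustration of that last point, here is the plain arithmetic
behind the multiple-of-frame-size recommendation (nothing
implementation-specific):

  #include <stdio.h>

  int main(void)
  {
      const int max_frame = 16384;   /* SETTINGS_MAX_FRAME_SIZE */
      int window;

      for (window = 65535; window <= 65536; window++)
          printf("window %d -> %d full frames plus a %d-byte leftover\n",
                 window, window / max_frame, window % max_frame);
      return 0;
  }

The 16383-byte leftover is what ends up as a runt DATA frame (and a runt
WU) on every window cycle with 65535, while 65536 divides into exactly 4
full frames.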

Cheers,
Willy
