Hi Eric,

Thank you for the quick review.

On Mon, Mar 09, 2026 at 10:22:39AM +0100, Eric Dumazet wrote:
> On Mon, Mar 9, 2026 at 9:03 AM Simon Baatz via B4 Relay
> <[email protected]> wrote:
> >
> > From: Simon Baatz <[email protected]>
> >
> > By default, the Linux TCP implementation does not shrink the
> > advertised window (RFC 7323 calls this "window retraction") with the
> > following exceptions:
> >
> > - When an incoming segment cannot be added due to the receive buffer
> >   running out of memory. Since commit 8c670bdfa58e ("tcp: correct
> >   handling of extreme memory squeeze") a zero window will be
> >   advertised in this case. It turns out that reaching the required
> >   memory pressure is easy when window scaling is in use. In the
> >   simplest case, sending a sufficient number of segments smaller than
> >   the scale factor to a receiver that does not read data is enough.
> >
> > - Commit b650d953cd39 ("tcp: enforce receive buffer memory limits by
> >   allowing the tcp window to shrink") addressed the "eating memory"
> >   problem by introducing a sysctl knob that allows shrinking the
> >   window before running out of memory.
> >
> > However, RFC 7323 does not only state that shrinking the window is
> > necessary in some cases, it also formulates requirements for TCP
> > implementations when doing so (Section 2.4).
> >
> > This commit addresses the receiver-side requirements: After retracting
> > the window, the peer may have a snd_nxt that lies within a previously
> > advertised window but is now beyond the retracted window. This means
> > that all incoming segments (including pure ACKs) will be rejected
> > until the application happens to read enough data to let the peer's
> > snd_nxt be in window again (which may be never).
> >
> > To comply with RFC 7323, the receiver MUST honor any segment that
> > would have been in window for any ACK sent by the receiver and, when
> > window scaling is in effect, SHOULD track the maximum window sequence
> > number it has advertised. This patch tracks that maximum window
> > sequence number rcv_mwnd_seq throughout the connection and uses it in
> > tcp_sequence() when deciding whether a segment is acceptable.
> >
> > rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in
> > tcp_select_window(). If we count tcp_sequence() as fast path, it is
> > read in the fast path. Therefore, rcv_mwnd_seq is put into rcv_wnd's
> > cacheline group.
> >
> > The logic for handling received data in tcp_data_queue() is already
> > sufficient and does not need to be updated.
> >
> > Signed-off-by: Simon Baatz <[email protected]>
> 
> ...
> 
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index 
> > f0ebcc7e287173be6198fd100130e7ba1a1dbf03..c86910d147f2394bf414d7691d8f90ed41c1b0e3
> >  100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -293,6 +293,7 @@ static u16 tcp_select_window(struct sock *sk)
> >                 tp->pred_flags = 0;
> >                 tp->rcv_wnd = 0;
> >                 tp->rcv_wup = tp->rcv_nxt;
> > +               tcp_update_max_rcv_wnd_seq(tp);
> 
> Presumably we do not need  tcp_update_max_rcv_wnd_seq() here ?

When we don't update rcv_mwnd_seq here and are forced to accept a
beyond-window packet because the receive queue is empty, we can reach
a state where

 rcv_mwnd_seq < rcv_wup + rcv_wnd == rcv_nxt

I noticed this case when instrumenting the kernel and got violations
of the invariant rcv_wup + rcv_wnd <= rcv_mwnd_seq.

So, while the call is not strictly needed (tcp_max_receive_window()
would still be 0, since rcv_nxt > rcv_mwnd_seq), I opted to include it
here to keep rcv_mwnd_seq equal to the actual maximum advertised
sequence number at all times.

> 
> Otherwise patch looks good, thanks.

-- 
Simon Baatz <[email protected]>
