Hi Wesley,
On Tue, Mar 17, 2026 at 12:51:37AM -0600, Wesley Atwell wrote:
> Add a packetdrill reproducer for the scaled no-shrink quantization case.
>
> The sequence leaves slightly more than 84 scaled units of backed credit
> after one skb is drained. The buggy ALIGN() path rounds that up and
> exposes a fresh extra unit, so the wire-visible window becomes 85.
I ran this test on current net-next with assertions on all advertised
windows. The following passes:
...
+0 < P. 10001:11024(1023) ack 1 win 257
* > . 1:1(0) ack 11024 win 85
// Free one skb, then force an outbound packet so the current advertised
// window is observable both on the wire and via TCP_INFO.
+0 read(4, ..., 10000) = 10000
+0 write(4, ..., 1) = 1
* > P. 1:2(1) ack 11024 win 85
+0 %{ assert (tcpi_rcv_wnd >> 10) == 85, tcpi_rcv_wnd }%
// Queue a tiny OOO skb. This should not create fresh sender-visible credit
// on the next ACK after the first post-drain window update.
+0 < P. 12024:12025(1) ack 2 win 257
* > . 2:2(0) ack 11024 win 85
+0 %{ assert (tcpi_rcv_wnd >> 10) == 85, tcpi_rcv_wnd }%
I do not see an âextra unitâ after draining; the sender-visible
window stays at 85 throughout.
In this flow the receive window is limited by rcv_ssthresh (86286)
and not by memory (free_space > 92000). The effect of the change
looks to be that the previous code settles at rcv_ssthresh rounded
up, while the patched code settles at rcv_ssthresh rounded down.
The choice in the current code seem to be explicit, in the
shrink_window_allowed branch we have:
if (free_space > tp->rcv_ssthresh) {
free_space = tp->rcv_ssthresh;
/* new window should always be an exact multiple of scaling
factor
*
* For this case, we ALIGN "up" (increase free_space) because
* we know free_space is not zero here, it has been reduced from
* the memory-based limit, and rcv_ssthresh is not a hard limit
* (unlike sk_rcvbuf).
*/
free_space = ALIGN(free_space, (1 << tp->rx_opt.rcv_wscale));
}
Thus, we explicitly round up when we are constrained by rcv_ssthresh.
As Paolo noted, once free_space gets smaller we already round down,
so the current behavior appears to be: round up while there is
headroom (possibly constrained by rcv_ssthresh), and round down once
memory starts to get tight, which makes sense IMHO.
Given that, this test case demonstrates that the new code now expects
rounding down even when there is ample memory, but it does not show a
concrete improvement in the situation the patch is meant to fix
(i.e. a case where the current rounding-up behavior actually hurts).
> Then queue a tiny OOO skb so the next ACK re-runs the no-shrink path
> after a small receive-memory change without advancing rcv_nxt. With the
> fix in place, both ACKs keep the sender-visible window at 84.
And without it, both ACKs keep the sender-visible window at 85.
> This provides fail-before/pass-after coverage for both the immediate
> quantization bug and the follow-on ACK transition that reuses the stored
> window state.
>
> Signed-off-by: Wesley Atwell <[email protected]>
> ---
> .../packetdrill/tcp_rcv_quantization_credit.pkt | 45
> ++++++++++++++++++++++
> 1 file changed, 45 insertions(+)
> create mode 100644
> tools/testing/selftests/net/packetdrill/tcp_rcv_quantization_credit.pkt
>
> diff --git
> a/tools/testing/selftests/net/packetdrill/tcp_rcv_quantization_credit.pkt
> b/tools/testing/selftests/net/packetdrill/tcp_rcv_quantization_credit.pkt
> new file mode 100644
> index
> 0000000000000000000000000000000000000000..8ea96281b601f2d161cfd84967cad91cedb03151
> --- /dev/null
> +++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_quantization_credit.pkt
> @@ -0,0 +1,45 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +--ip_version=ipv4
Is there a particular reason to restrict this to IPv4 only?
> +--mss=1000
> +
> +`./defaults.sh
> +sysctl -q net.ipv4.tcp_moderate_rcvbuf=0
> +sysctl -q net.ipv4.tcp_shrink_window=0
> +sysctl -q net.ipv4.tcp_rmem="4096 131072 $((32*1024*1024))"`
> +
> +// Exercise the scaled no-shrink path in __tcp_select_window().
> +// The sequence below leaves slightly more than 84 scaled units of backed
> +// credit after one skb is drained. The buggy ALIGN() path rounds that up and
I'd drop the "buggy ALIGN() path" wording in the comment. If the
change lands, there is no longer such a path, and the comment should
just describe the behavior the test is checking.
> +// exposes a fresh extra unit; the fixed path keeps the sender-visible window
> +// at 84. Then queue a tiny OOO skb so the next ACK re-runs the no-shrink
> +// path after a small receive-memory change without advancing rcv_nxt.
- Simon
--
Simon Baatz <[email protected]>