A BPF_SOCK_OPS program can enable BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG
and then call bpf_setsockopt(TCP_NODELAY) from
BPF_SOCK_OPS_HDR_OPT_LEN_CB.

That reaches __tcp_sock_set_nodelay(), which may call
tcp_push_pending_frames(). The transmit path then computes TCP options
again, re-enters bpf_skops_hdr_opt_len(), and invokes the same BPF
callback recursively. This can loop until the kernel stack overflows.

TCP_NODELAY is not safe from the header option callback context.
Reject it with -EOPNOTSUPP when TCP header option callbacks are enabled
on the socket, so the callback cannot recurse back into
tcp_push_pending_frames() through do_tcp_setsockopt().

Reported-by: Quan Sun <[email protected]>
Reported-by: Yinhao Hu <[email protected]>
Reported-by: Kaiyan Mei <[email protected]>
Closes: https://lore.kernel.org/bpf/[email protected]/
Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
Signed-off-by: KaFai Wan <[email protected]>
---
 net/ipv4/tcp.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 202a4e57a218..7ac4c98be19d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4004,7 +4004,10 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
 
 	switch (optname) {
 	case TCP_NODELAY:
-		__tcp_sock_set_nodelay(sk, val);
+		if (val && BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG))
+			err = -EOPNOTSUPP;
+		else
+			__tcp_sock_set_nodelay(sk, val);
 		break;
 
 	case TCP_THIN_LINEAR_TIMEOUTS:
-- 
2.43.0
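The call chain described in the commit message can be illustrated with a small userspace model. This is only a sketch and not kernel code: every `model_*` name, the `struct model_sock` layout, and the `MODEL_MAX_DEPTH` limit (a stand-in for the finite kernel stack) are hypothetical and exist only to show how the callback re-enters itself and how a check on the flag breaks the cycle.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model of the recursion; all names here are hypothetical. */
#define MODEL_MAX_DEPTH 64	/* stand-in for a finite kernel stack */

struct model_sock {
	bool hdr_opt_cb_enabled; /* models BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG */
	bool guard;		 /* true: apply a check like the one in the patch */
	int depth;		 /* current nesting of the modelled callback */
	int max_depth;		 /* deepest nesting observed */
	int last_err;		 /* last result seen by the modelled callback */
	bool overflowed;	 /* set when the depth limit is reached */
};

static int model_setsockopt_nodelay(struct model_sock *sk, int val);

/* Models a HDR_OPT_LEN callback that calls bpf_setsockopt(TCP_NODELAY, 1). */
static void model_hdr_opt_len_cb(struct model_sock *sk)
{
	if (sk->depth >= MODEL_MAX_DEPTH) {
		sk->overflowed = true;	/* the real kernel would overflow its stack */
		return;
	}
	sk->depth++;
	if (sk->depth > sk->max_depth)
		sk->max_depth = sk->depth;
	sk->last_err = model_setsockopt_nodelay(sk, 1);
	sk->depth--;
}

/* Models the transmit path: computing options runs the callback again. */
static void model_push_pending_frames(struct model_sock *sk)
{
	if (sk->hdr_opt_cb_enabled)
		model_hdr_opt_len_cb(sk);
}

/* Models the TCP_NODELAY case, with or without the guard. */
static int model_setsockopt_nodelay(struct model_sock *sk, int val)
{
	if (sk->guard && val && sk->hdr_opt_cb_enabled)
		return -EOPNOTSUPP;
	if (val)
		model_push_pending_frames(sk);
	return 0;
}

/* Runs one modelled transmit and returns the resulting socket state. */
static struct model_sock model_run(bool guard)
{
	struct model_sock sk = { .hdr_opt_cb_enabled = true, .guard = guard };

	model_push_pending_frames(&sk);
	assert(sk.depth == 0);	/* every modelled frame has unwound */
	return sk;
}
```

In this model, `model_run(false)` nests until it hits `MODEL_MAX_DEPTH` and sets `overflowed`, while `model_run(true)` enters the callback once and sees `-EOPNOTSUPP`.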

