Starting from Go 1.22.0, TCPConn implements the io.WriterTo interface through its WriteTo method [1], which internally uses the splice(2) syscall to transfer data between file descriptors [2].
However, for sockets with sockmap enabled, sk_prot is replaced with tcp_bpf_prots which does not provide a splice_read callback. When data is redirected to a socket's psock ingress queue via bpf_msg_redirect, splice(2) cannot read from it because the splice path has no knowledge of the psock queue. This causes TCPConn.WriteTo to return 0 bytes, effectively breaking Go applications that rely on io.Copy between TCP connections when sockmap/BPF is in use [3].

The simplest fix would be registering a splice callback that just calls copy_splice_read(), but this results in redundant copies (socket -> kernel buffer -> pipe -> destination), which defeats the purpose of splice.

Patch 1 adds splice_read to struct proto and sets it in TCP.

Patch 2 adds inet_splice_read and uses it in inet_stream_ops.

Patch 3 refactors tcp_bpf recvmsg with a read actor abstraction.

Patch 4 adds basic splice_read support for sockmap, but this still involves 2 data copies.

Patch 5 optimizes the splice implementation by transferring page ownership directly into the pipe, achieving true zero-copy. Benchmarks show performance on par with the read(2) path.

Patch 6 adds splice selftests. Since splice can seamlessly replace read operations, we redefine read to splice in the existing selftests so that all existing test cases also cover the splice path.

Patch 7 adds splice to the sockmap benchmark, which also serves to verify the effectiveness of our zero-copy implementation.

Benchmark results with rx-verdict-ingress mode (loopback, 8 CPUs):

  read(2):                  ~4292 MB/s
  splice(2) + zero-copy:    ~4270 MB/s
  splice(2) + always-copy:  ~2770 MB/s

Zero-copy splice achieves near-parity with read(2), while the always-copy fallback is ~35% slower.

[1] https://github.com/golang/go/blob/master/src/net/tcpsock.go#L173
[2] https://github.com/golang/go/blob/fdf3bee/src/net/tcpsock_posix.go#L57
[3] https://github.com/jschwinger233/bpf_msg_redirect_bug_reproducer

Jiayuan Chen (7):
  net: add splice_read to struct proto and set it in tcp_prot/tcpv6_prot
  inet: add inet_splice_read() and use it in inet_stream_ops/inet6_stream_ops
  tcp_bpf: refactor recvmsg with read actor abstraction
  tcp_bpf: add splice_read support for sockmap
  tcp_bpf: optimize splice_read with zero-copy for non-slab pages
  selftests/bpf: add splice_read tests for sockmap
  selftests/bpf: add splice option to sockmap benchmark

 include/linux/skmsg.h                        |  12 +-
 include/net/inet_common.h                    |   3 +
 include/net/sock.h                           |   3 +
 net/core/skmsg.c                             |  34 ++-
 net/ipv4/af_inet.c                           |  15 +-
 net/ipv4/tcp_bpf.c                           | 227 +++++++++++++++---
 net/ipv4/tcp_ipv4.c                          |   1 +
 net/ipv6/af_inet6.c                          |   2 +-
 net/ipv6/tcp_ipv6.c                          |   1 +
 .../selftests/bpf/benchs/bench_sockmap.c     |  57 ++++-
 .../selftests/bpf/prog_tests/sockmap_basic.c |  28 ++-
 .../bpf/prog_tests/sockmap_helpers.h         |  62 +++++
 .../selftests/bpf/prog_tests/sockmap_strp.c  |  28 ++-
 13 files changed, 421 insertions(+), 52 deletions(-)

-- 
2.43.0

