On Sat, Jan 24, 2026 at 3:34 AM Jiayuan Chen <[email protected]> wrote:
>
> A socket using sockmap has its own independent receive queue: ingress_msg.
> This queue may contain data from its own protocol stack or from other
> sockets.
>
> Therefore, for sockmap, relying solely on copied_seq and rcv_nxt to
> calculate FIONREAD is not enough.
>
> This patch adds a new msg_tot_len field in the psock structure to record
> the data length in ingress_msg. Additionally, we implement new ioctl
> interfaces for TCP and UDP to intercept FIONREAD operations.
>
> Note that we intentionally do not include sk_receive_queue data in the
> FIONREAD result. Data in sk_receive_queue has not yet been processed by
> the BPF verdict program, and may be redirected to other sockets or
> dropped. Including it would create semantic ambiguity since this data
> may never be readable by the user.
>
> Unix and VSOCK sockets have similar issues, but fixing them is outside
> the scope of this patch as it would require more intrusive changes.
>
> Previous work by John Fastabend made some efforts towards FIONREAD support:
> commit e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq")
> Although the current patch is based on the previous work by John Fastabend,
> it is acceptable for our Fixes tag to point to the same commit.
>
>                                                       FD1:read()
>                                                       --  FD1->copied_seq++
>                                                           |  [read data]
>                                                           |
>                                    [enqueue data]         v
>                   [sockmap]     -> ingress to self ->  ingress_msg queue
> FD1 native stack  ------>                                 ^
> -- FD1->rcv_nxt++               -> redirect to other      | [enqueue data]
>                                        |                  |
>                                        |             ingress to FD1
>                                        v                  ^
>                                       ...                 |  [sockmap]
>                                                      FD2 native stack
>
> Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
> Signed-off-by: Jiayuan Chen <[email protected]>

Jakub, John, pls take another look at it and ack?

