On Wed, 13 May 2026 at 17:21, Alexander Lobakin <[email protected]> wrote:
>
> From: Lorenz Brun <[email protected]>
> Date: Tue, 12 May 2026 17:26:56 +0200
>
> > xdp_build_skb_from_zc() allocated xdp->frame_sz bytes from the per-cpu
> > system_page_pool and built the skb head with napi_build_skb(). The
> > latter places skb_shared_info at the tail of the buffer, but the
> > helper sized the allocation as if the whole frame_sz were usable for
> > data. Whenever the packet plus reserved headroom approached frame_sz,
> > the head memcpy overran shinfo with packet content, corrupting
> > ->flags (SKBFL_ZEROCOPY_ENABLE) and ->nr_frags, which then drove
> > skb_copy_ubufs() off the end of frags[] on the RX path:
> >
> >   UBSAN: array-index-out-of-bounds in include/linux/skbuff.h:2541
> >   index 113 is out of range for type 'skb_frag_t [17]'
> >   skb_copy_ubufs+0x7da/0x960
> >   ip_local_deliver_finish+0xcd/0x110
> >   ice_napi_poll+0xe4/0x2a0 [ice]
> >
> > The overrun bytes come from the packet, so an on-wire sender can
> > corrupt kernel memory remotely whenever the XDP program returns
> > XDP_PASS.
> >
> > Rather than patch the sizing math, switch to the pattern used by other
> > in-tree AF_XDP zero-copy drivers like mlx5 and i40e which use
> > napi_alloc_skb() sized to the actual packet plus skb_put_data().
> > This sizes the head exactly for the data being copied, drops the
> > system_page_pool local_lock from this path, and removes the
> > structural mismatch between frame_sz and the skb head buffer. Frags
> > are allocated with alloc_page() per frag, matching the other drivers.
>
> I used napi_build_skb() + system page_pool to enable PP recycling
> improving XSk XDP_PASS performance a lot.
> Are you sure there's no other way to approach this?
>
> napi_alloc_skb() used in other drivers works, but it's sorta old
> approach which is way slower.
>
> System page_pools always allocate a full page, why can it create an skb
> prone to overruns?
>
> >
> > Fixes: 560d958c6c68 ("xsk: add generic XSk &xdp_buff -> skb conversion")
> > Cc: [email protected]
> > Signed-off-by: Lorenz Brun <[email protected]>
> Thanks,
> Olek

Hi Olek,

I looked at the code again. While your approach is indeed faster, it is
only faster for traffic bypassing AF_XDP, which is generally not that
relevant for performance. More critically, it currently corrupts kernel
memory and panics the kernel very quickly when running with frame-size
set to 2048, 1500 MTU, and passing received packets.

To be honest, I'm not familiar enough with the XSK subsystem to know
exactly what specific sizing assumption was violated here. By
comparison, the approach taken by the other drivers is a lot more
obviously correct and works perfectly.

If you want to preserve the current approach, I'm perfectly happy with
that. However, I don't feel comfortable sending patches for it, as I
don't understand exactly what the expectations of the various data
blocks are.

AFAIK, reproduction should be fairly easy. You just need to run a TCP
connection to the receiving node (which gets passed to the kernel) while
receiving some UDP packets via AF_XDP at the same time. As mentioned, it
also needs frame-size 2048 to reproduce quickly.

I checked if I could get you an easy reproducer, but xdp-tools is quite
limited. If you want to keep your approach and can't reproduce the panic
yourself, let me know and I can see if I can synthesize a minimal
reproducer.

Regards,
Lorenz
