On Mon, May 18, 2026 at 02:57:55PM +0200, Lorenz Brun wrote:
> On Wed, 13 May 2026 at 17:21, Alexander Lobakin
> <[email protected]> wrote:
> >
> > From: Lorenz Brun <[email protected]>
> > Date: Tue, 12 May 2026 17:26:56 +0200
> >
> > > xdp_build_skb_from_zc() allocated xdp->frame_sz bytes from the per-cpu
> > > system_page_pool and built the skb head with napi_build_skb(). The
> > > latter places skb_shared_info at the tail of the buffer, but the
> > > helper sized the allocation as if the whole frame_sz were usable for
> > > data. Whenever the packet plus reserved headroom approached frame_sz,
> > > the head memcpy overran shinfo with packet content, corrupting
> > > ->flags (SKBFL_ZEROCOPY_ENABLE) and ->nr_frags, which then drove
> > > skb_copy_ubufs() off the end of frags[] on the RX path:
> > >
> > >   UBSAN: array-index-out-of-bounds in include/linux/skbuff.h:2541
> > >   index 113 is out of range for type 'skb_frag_t [17]'
> > >    skb_copy_ubufs+0x7da/0x960
> > >    ip_local_deliver_finish+0xcd/0x110
> > >    ice_napi_poll+0xe4/0x2a0 [ice]
> > >
> > > The overrun bytes come from the packet, so an on-wire sender can
> > > corrupt kernel memory remotely whenever the XDP program returns
> > > XDP_PASS.
> > >
> > > Rather than patch the sizing math, switch to the pattern used by other
> > > in-tree AF_XDP zero-copy drivers like mlx5 and i40e which use
> > > napi_alloc_skb() sized to the actual packet plus skb_put_data().
> > > This sizes the head exactly for the data being copied, drops the
> > > system_page_pool local_lock from this path, and removes the
> > > structural mismatch between frame_sz and the skb head buffer. Frags
> > > are allocated with alloc_page() per frag, matching the other drivers.
> >
> > I used napi_build_skb() + system page_pool to enable PP recycling
> > improving XSk XDP_PASS performance a lot.
> > Are you sure there's no other way to approach this?
> >
> > napi_alloc_skb() used in other drivers works, but it's sorta old
> > approach which is way slower.
> >
> > System page_pools always allocate a full page, why can it create an skb
> > prone to overruns?
> >
> > >
> > > Fixes: 560d958c6c68 ("xsk: add generic XSk &xdp_buff -> skb conversion")
> > > Cc: [email protected]
> > > Signed-off-by: Lorenz Brun <[email protected]>
> > Thanks,
> > Olek
>
> Hi Olek
>
> I looked at the code again. While your approach is indeed faster, it
> is only faster for traffic bypassing AF_XDP, which is generally not
> that relevant for performance.
>
> More critically, it currently corrupts kernel memory and panics the
> kernel very quickly when running with frame-size set to 2048, 1500
> MTU, and passing received packets. To be honest, I'm not familiar
> enough with the XSK subsystem to know exactly what specific sizing
> assumption was violated here. By comparison, the approach taken by the
> other drivers is a lot more obviously correct and works perfectly.
>
> If you want to preserve the current approach, I'm perfectly happy with
> that. However, I don't feel comfortable sending patches for it, as I
> don't understand exactly what the expectations of the various data
> blocks are.
>
> AFAIK, reproduction should be fairly easy. You just need to run a TCP
> connection to the receiving node (which gets passed to the kernel)
> while receiving some UDP packets via AF_XDP at the same time. As
> mentioned, it also needs frame-size 2048 to reproduce quickly.
>
> I checked if I could get you an easy reproducer, but xdp-tools is
> quite limited. If you want to keep your approach and can't reproduce
> the panic yourself, let me know and I can see if I can synthesize a
> minimal reproducer.
We now respect the tailroom in UMEM, which is supposed to address the
shinfo overwrite cases. Could you re-test this on your side with the
cited patchset applied to your tree?

https://lore.kernel.org/bpf/[email protected]/

>
> Regards,
> Lorenz
