On Wed, 13 May 2026 at 17:21, Alexander Lobakin
<[email protected]> wrote:
>
> From: Lorenz Brun <[email protected]>
> Date: Tue, 12 May 2026 17:26:56 +0200
>
> > xdp_build_skb_from_zc() allocated xdp->frame_sz bytes from the per-cpu
> > system_page_pool and built the skb head with napi_build_skb(). The
> > latter places skb_shared_info at the tail of the buffer, but the
> > helper sized the allocation as if the whole frame_sz were usable for
> > data. Whenever the packet plus reserved headroom approached frame_sz,
> > the head memcpy overran shinfo with packet content, corrupting
> > ->flags (SKBFL_ZEROCOPY_ENABLE) and ->nr_frags, which then drove
> > skb_copy_ubufs() off the end of frags[] on the RX path:
> >
> >   UBSAN: array-index-out-of-bounds in include/linux/skbuff.h:2541
> >   index 113 is out of range for type 'skb_frag_t [17]'
> >    skb_copy_ubufs+0x7da/0x960
> >    ip_local_deliver_finish+0xcd/0x110
> >    ice_napi_poll+0xe4/0x2a0 [ice]
> >
> > The overrun bytes come from the packet, so an on-wire sender can
> > corrupt kernel memory remotely whenever the XDP program returns
> > XDP_PASS.
> >
> > Rather than patch the sizing math, switch to the pattern used by other
> > in-tree AF_XDP zero-copy drivers like mlx5 and i40e which use
> > napi_alloc_skb() sized to the actual packet plus skb_put_data().
> > This sizes the head exactly for the data being copied, drops the
> > system_page_pool local_lock from this path, and removes the
> > structural mismatch between frame_sz and the skb head buffer. Frags
> > are allocated with alloc_page() per frag, matching the other drivers.
>
> I used napi_build_skb() + system page_pool to enable PP recycling
> improving XSk XDP_PASS performance a lot.
> Are you sure there's no other way to approach this?
>
> napi_alloc_skb() used in other drivers works, but it's sorta old
> approach which is way slower.
>
> System page_pools always allocate a full page, why can it create an skb
> prone to overruns?
>
> >
> > Fixes: 560d958c6c68 ("xsk: add generic XSk &xdp_buff -> skb conversion")
> > Cc: [email protected]
> > Signed-off-by: Lorenz Brun <[email protected]>
> Thanks,
> Olek

Hi Olek

I looked at the code again. While your approach is indeed faster, it
only speeds up the XDP_PASS path, i.e. traffic that bypasses the
AF_XDP socket, which is generally not where performance matters.

More critically, it currently corrupts kernel memory and panics the
kernel very quickly when running with a frame size of 2048, a 1500
MTU, and an XDP program that passes received packets to the stack
(XDP_PASS). To be honest, I'm not familiar enough with the XSK
subsystem to know exactly which sizing assumption is violated here.
By comparison, the approach taken by the other drivers is much more
obviously correct and works perfectly.

If you want to preserve the current approach, I'm perfectly happy with
that. However, I don't feel comfortable sending patches for it, as I
don't understand exactly what the expectations of the various data
blocks are.

AFAIK, reproduction should be fairly easy. You just need to run a TCP
connection to the receiving node (its packets get passed to the
kernel) while simultaneously receiving some UDP packets via AF_XDP. As
mentioned, a frame size of 2048 is also needed to reproduce it quickly.

I checked whether I could get you an easy reproducer, but xdp-tools is
quite limited. If you want to keep your approach and can't reproduce
the panic yourself, let me know and I can see if I can put together a
minimal reproducer.

Regards,
Lorenz
