On Thu, Feb 05, 2026 at 02:23:15PM +0100, Larysa Zaremba wrote:
> On Thu, Feb 05, 2026 at 02:46:38PM +0200, Vladimir Oltean wrote:
> > On Thu, Feb 05, 2026 at 01:41:03PM +0100, Larysa Zaremba wrote:
> > > On Thu, Feb 05, 2026 at 02:29:53PM +0200, Vladimir Oltean wrote:
> > > > On Wed, Feb 04, 2026 at 05:34:01PM -0800, Jakub Kicinski wrote:
> > > > > On Thu, 5 Feb 2026 02:59:01 +0200 Vladimir Oltean wrote:
> > > > > > Thanks! This is an extremely subtle corner case. I appreciate the
> > > > > > patch and explanation.
> > > > > >
> > > > > > I did run tests on the blamed commit (which I still have), but
> > > > > > to catch a real issue in a meaningful way it would have been
> > > > > > required to have a program which calls bpf_xdp_adjust_tail()
> > > > > > with a very large offset. I'm noting that I'm seeing the
> > > > > > WARN_ON() much easier after your fix, but before, it was
> > > > > > mostly inconsequential for practical cases.
> > > > > >
> > > > > > Namely, the ENETC truesize is 2048, and XDP_PACKET_HEADROOM is
> > > > > > 256. First buffers also contain the skb_shared_info (320 bytes),
> > > > > > while subsequent buffers don't.
> > > > > I can't wrap my head around this series, hope you can tell me
> > > > > where I'm going wrong. AFAICT enetc splits the page into two
> > > > > halves for small MTU.
> > > > >
> > > > > So we have
> > > > >
> > > > > |             2k              |             2k              |
> > > > > ----------------------------- -----------------------------
> > > > > | hroom | data | troom/shinfo | hroom | data | troom/shinfo |
> > > > > ----------------------------- -----------------------------
> > > > >
> > > > > If we attach the second chunk as frag we'll have:
> > > > >   offset = 2k + hroom
> > > > >   size = data.len
> > > > > But we use
> > > > >   truesize / frag_size = 2k
> > > > > so
> > > > >   tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag);
> > > > >   tailroom = 2k - data.len - 2k
> > > > >   tailroom = -data.len
> > > > >   WARN(tailroom < 0) -> yes
> > > > >
> > > > > The frag_size thing is unusable for any driver that doesn't hand
> > > > > out full pages to frags?
> > > > This is an excellent question.
> > > >
> > > > Yes, you're right, bpf_xdp_frags_increase_tail() only has a 50%
> > > > chance of working - the paged data has to be in the first half of
> > > > the page, otherwise the tailroom calculations are not correct due
> > > > to rxq->frag_size, and the WARN_ON() will trigger.
> > > >
> > > > The reason why I didn't notice this during my testing is stupid.
> > > > I was attaching the BPF program to the interface and then
> > > > detaching it after each test, and each test was sending less than
> > > > the RX ring size (2048) worth of packets. So all multi-buffer
> > > > frames were using buffers which were fresh out of
> > > > enetc_setup_rxbdr() -> ... -> enetc_new_page() (first halves) and
> > > > never out of flipped pages (enetc_bulk_flip_buff()).
> > > >
> > > > This seems to be a good reason to convert this driver to use page
> > > > pool, which I can look into. I'm not sure that there's anything
> > > > that can be done to make the rxq->frag_size mechanism compatible
> > > > with the current buffer allocation scheme.
> > > I was just about to send an answer.
> > >
> > > Seems like my mistake here.
> > > I actually think adjusting the tail should work, if we set
> > > rxq->frag_size to PAGE_SIZE in enetc and i40e_rx_pg_size() in i40e,
> > > and not to (PAGE_SIZE / 2), as I did at first, but in such case
> > > naming this frag_size is just utterly wrong. Glad Jakub has pointed
> > > this out.
> > I mean, it should "work" given the caveat that calling
> > bpf_xdp_adjust_tail() on a first-half page buffer with a large
> > offset risks leaking into the second half, which may also be in
> > use, and this will go undetected, right? Although the practical
> > chances of that happening are low, the requested offset needs to be
> > in the order of hundreds still.
>
> Oh, I did get carried away there...
> Well, one thing is shared page memory model in enetc and i40e,
> another thing is xsk_buff_pool, where chunk size can be between 2K
> and PAGE_SIZE. What about
>
> tailroom = rxq->frag_size - skb_frag_size(frag) -
>            (skb_frag_off(frag) % rxq->frag_size);
>
> When frag_size is set to 2K, headroom is let's say 256, so aligned
> DMA write size is 1420.
> last frag at the start of the page: offset=256, size<=1420
> tailroom >= 2K - 1420 - 256 = 372
> last frag in the middle of the page: offset=256+2K, size<=1420
> tailroom >= 2K - 1420 - ((256 + 2K) % 2K) = 372
>
> And for drivers that do not fragment pages for multi-buffer packets,
> nothing changes, since offset is always less than rxq->frag_size.
>
> This brings us back to rxq->frag_size being half of a page for enetc
> and i40e, and seems like in ZC mode it should be pool->chunk_size to
> work properly.
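If I'm reading the proposal right, the delta over the tailroom calculation
quoted above boils down to this one-liner (just restating the two formulas
from the thread side by side, not a formal patch):

-	tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag);
+	tailroom = rxq->frag_size - skb_frag_size(frag) -
+		   (skb_frag_off(frag) % rxq->frag_size);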
With skb_frag_off() taken modulo 2K in the tailroom calculation, I can
confirm that bpf_xdp_frags_increase_tail() works well for ENETC. I haven't
fully considered the side effects, though.
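For reference, the arithmetic for the two buffer placements that enetc can
produce (frag in the first vs. the second half of the page) can be
sanity-checked with a throwaway user-space sketch like the one below. This
is not kernel code; FRAG_SIZE, HEADROOM and DATA_LEN are just local
stand-ins for rxq->frag_size, XDP_PACKET_HEADROOM and skb_frag_size(frag),
picked to match the numbers in your example:

/* Throwaway user-space sketch of the tailroom arithmetic discussed above. */
#include <stdio.h>

#define FRAG_SIZE	2048	/* half of a 4K page, as enetc uses it */
#define HEADROOM	256	/* XDP_PACKET_HEADROOM */
#define DATA_LEN	1420	/* DMA write size from the example above */

/* current calculation: goes negative for frags in the second page half */
static long tailroom_cur(long frag_off)
{
	return FRAG_SIZE - DATA_LEN - frag_off;
}

/* proposed calculation: frag offset taken modulo the frag size */
static long tailroom_mod(long frag_off)
{
	return FRAG_SIZE - DATA_LEN - (frag_off % FRAG_SIZE);
}

int main(void)
{
	long first_half = HEADROOM;			/* frag in first half of the page */
	long second_half = HEADROOM + FRAG_SIZE;	/* frag in second half */

	/* prints 372 and 372 */
	printf("first half:  cur=%ld mod=%ld\n",
	       tailroom_cur(first_half), tailroom_mod(first_half));
	/* prints -1676 and 372 */
	printf("second half: cur=%ld mod=%ld\n",
	       tailroom_cur(second_half), tailroom_mod(second_half));
	return 0;
}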

