On Thu, Feb 05, 2026 at 02:23:15PM +0100, Larysa Zaremba wrote:
> On Thu, Feb 05, 2026 at 02:46:38PM +0200, Vladimir Oltean wrote:
> > On Thu, Feb 05, 2026 at 01:41:03PM +0100, Larysa Zaremba wrote:
> > > On Thu, Feb 05, 2026 at 02:29:53PM +0200, Vladimir Oltean wrote:
> > > > On Wed, Feb 04, 2026 at 05:34:01PM -0800, Jakub Kicinski wrote:
> > > > > On Thu, 5 Feb 2026 02:59:01 +0200 Vladimir Oltean wrote:
> > > > > > Thanks! This is an extremely subtle corner case. I appreciate the
> > > > > > patch and explanation.
> > > > > >
> > > > > > I did run tests on the blamed commit (which I still have), but
> > > > > > to catch a real issue in a meaningful way it would have been
> > > > > > required to have a program which calls bpf_xdp_adjust_tail()
> > > > > > with a very large offset. I'm noting that I'm seeing the
> > > > > > WARN_ON() much easier after your fix, but before, it was
> > > > > > mostly inconsequential for practical cases.
> > > > > >
> > > > > > Namely, the ENETC truesize is 2048, and XDP_PACKET_HEADROOM is
> > > > > > 256. First buffers also contain the skb_shared_info (320 bytes),
> > > > > > while subsequent buffers don't.
> > > > > I can't wrap my head around this series, hope you can tell me
> > > > > where I'm going wrong. AFAICT enetc splits the page into two
> > > > > halves for small MTU.
> > > > >
> > > > > So we have
> > > > >
> > > > > |             2k              |             2k              |
> > > > > ----------------------------- -----------------------------
> > > > > | hroom | data | troom/shinfo | hroom | data | troom/shinfo |
> > > > > ----------------------------- -----------------------------
> > > > >
> > > > > If we attach the second chunk as frag we'll have:
> > > > >   offset = 2k + hroom
> > > > >   size = data.len
> > > > > But we use
> > > > >   truesize / frag_size = 2k
> > > > > so
> > > > >   tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag);
> > > > >   tailroom = 2k - data.len - 2k
> > > > >   tailroom = -data.len
> > > > >   WARN(tailroom < 0) -> yes
> > > > >
> > > > > The frag_size thing is unusable for any driver that doesn't hand
> > > > > out full pages to frags?
> > > > This is an excellent question.
> > > >
> > > > Yes, you're right, bpf_xdp_frags_increase_tail() only has a 50%
> > > > chance of working - the paged data has to be in the first half of
> > > > the page, otherwise the tailroom calculations are not correct due
> > > > to rxq->frag_size, and the WARN_ON() will trigger.
> > > >
> > > > The reason why I didn't notice this during my testing is stupid.
> > > > I was attaching the BPF program to the interface and then
> > > > detaching it after each test, and each test was sending less than
> > > > the RX ring size (2048) worth of packets. So all multi-buffer
> > > > frames were using buffers which were fresh out of
> > > > enetc_setup_rxbdr() -> ... -> enetc_new_page() (first halves) and
> > > > never out of flipped pages (enetc_bulk_flip_buff()).
> > > >
> > > > This seems to be a good reason to convert this driver to use page
> > > > pool, which I can look into. I'm not sure that there's anything
> > > > that can be done to make the rxq->frag_size mechanism compatible
> > > > with the current buffer allocation scheme.
> > > I was just about to send an answer.
> > >
> > > Seems like my mistake here.
> > > I actually think adjusting the tail should work, if we set
> > > rxq->frag_size to PAGE_SIZE in enetc and i40e_rx_pg_size() in i40e,
> > > and not to (PAGE_SIZE / 2), as I did at first, but in such case
> > > naming this frag_size is just utterly wrong. Glad Jakub has pointed
> > > this out.
> > I mean, it should "work" given the caveat that calling
> > bpf_xdp_adjust_tail() on a first-half page buffer with a large
> > offset risks leaking into the second half, which may also be in
> > use, and this will go undetected, right? Although the practical
> > chances of that happening are low, the requested offset needs to be
> > in the order of hundreds still.
>
> Oh, I did get carried away there...
> Well, one thing is shared page memory model in enetc and i40e,
> another thing is xsk_buff_pool, where chunk size can be between 2K
> and PAGE_SIZE. What about
>
> tailroom = rxq->frag_size - skb_frag_size(frag) -
>            (skb_frag_off(frag) % rxq->frag_size);
>
> When frag_size is set to 2K, headroom is let's say 256, so aligned
> DMA write size is 1420.
> last frag at the start of the page: offset=256, size<=1420
> tailroom >= 2K - 1420 - 256 = 372
> last frag in the middle of the page: offset=256+2K, size<=1420
> tailroom >= 2K - 1420 - ((256 + 2K) % 2K) = 372
>
> And for drivers that do not fragment pages for multi-buffer packets,
> nothing changes, since offset is always less than rxq->frag_size.
>
> This brings us back to rxq->frag_size being half of a page for enetc
> and i40e, and seems like in ZC mode it should be pool->chunk_size to
> work properly.
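If I'm reading the proposal right, the delta over the tailroom calculation
quoted above boils down to this one-liner (just restating the two formulas
from the thread side by side, not a formal patch):

-	tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag);
+	tailroom = rxq->frag_size - skb_frag_size(frag) -
+		   (skb_frag_off(frag) % rxq->frag_size);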
With skb_frag_off() taken modulo 2K in the tailroom calculation, I can
confirm that bpf_xdp_frags_increase_tail() works well for ENETC. I haven't
fully considered the side effects, though.
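For reference, the arithmetic for the two buffer placements that enetc can
produce (frag in the first vs. the second half of the page) can be
sanity-checked with a throwaway user-space sketch like the one below. This
is not kernel code; FRAG_SIZE, HEADROOM and DATA_LEN are just local
stand-ins for rxq->frag_size, XDP_PACKET_HEADROOM and skb_frag_size(frag),
picked to match the numbers in your example:

/* Throwaway user-space sketch of the tailroom arithmetic discussed above. */
#include <stdio.h>

#define FRAG_SIZE	2048	/* half of a 4K page, as enetc uses it */
#define HEADROOM	256	/* XDP_PACKET_HEADROOM */
#define DATA_LEN	1420	/* DMA write size from the example above */

/* current calculation: goes negative for frags in the second page half */
static long tailroom_cur(long frag_off)
{
	return FRAG_SIZE - DATA_LEN - frag_off;
}

/* proposed calculation: frag offset taken modulo the frag size */
static long tailroom_mod(long frag_off)
{
	return FRAG_SIZE - DATA_LEN - (frag_off % FRAG_SIZE);
}

int main(void)
{
	long first_half = HEADROOM;			/* frag in first half of the page */
	long second_half = HEADROOM + FRAG_SIZE;	/* frag in second half */

	/* prints 372 and 372 */
	printf("first half:  cur=%ld mod=%ld\n",
	       tailroom_cur(first_half), tailroom_mod(first_half));
	/* prints -1676 and 372 */
	printf("second half: cur=%ld mod=%ld\n",
	       tailroom_cur(second_half), tailroom_mod(second_half));
	return 0;
}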

