On 08.02.26 13:59, Vladimir Oltean wrote:
> On Thu, Feb 05, 2026 at 05:54:08PM -0800, Jakub Kicinski wrote:
>> FWIW my feeling is that instead of nickel and diming leftover space 
>> in the frags if someone actually cared about growing mbufs we should
>> have the helper allocate a new page from the PP and append it to the
>> shinfo. Much simpler, "infinite space", and works regardless of the
>> driver. I don't mean that to suggest you implement it, purely to point
>> out that I think nobody really uses positive offsets.. So we can as
>> well switch more complicated drivers back to xdp_rxq_info_reg().
> 
> FWIW, I do have a use case at least in the theoretical sense for
> bpf_xdp_adjust_tail() with positive offsets, although it's still under
> development.
> 
> I'm working on a DSA data path library for XDP, and one of the features
> it supports is redirecting from one user port to another, with in-place
> tag modification.
> 
> If the path to the egress port goes through a tail-tagging switch but
> the path from the ingress port didn't, bpf_xdp_adjust_tail() with a
> positive offset will be called to make space for the tail tags.
>
Jumping into the conversation a bit late...

We were recently discussing this limitation when trying to add growable
tail support for multi-buf XDP buffers in the mlx5 driver (Striding RQ
mode) after lifting the page_size stride limitation for XDP [1].

It turns out that it is quite complicated to do in this mode with the
existing frag_size configuration...

The issue is that the HW can write a packet across multiple smaller
strides (256B for example), and setting rxq->frag_size to the stride
size would not work for the following reasons:

1) The tailroom formula would yield a negative value, as pointed out by
   this series (see the arithmetic sketch after this list).

2) Even if the formula itself were not an issue, frag_size currently
   means that the underlying storage of each fragment has a size of
   frag_size, so the number of fragments would explode in the driver.
   That's a no-go.

3) And even if we changed the semantics of frag_size to mean something
   else, so that the XDP code would use frag_size as the available
   growth size in the fragment
   (tailroom = skb_frag_off() + frag_size - skb_frag_size()),
   we'd still be very much in the "nickel and diming" space, with less
   than 256B to spare.

4) The only sane way to do it would be to use a large stride, but this
   would kill the small-packet optimization.
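
To make 1) concrete, here's a small back-of-the-envelope sketch (plain
userspace C, illustrative numbers only). The "current" expression is my
rough paraphrase of the tailroom check in bpf_xdp_frags_increase_tail(),
not a verbatim quote, and the frag layout is made up:

#include <stdio.h>

int main(void)
{
	int frag_size = 256;	/* rxq->frag_size set to the HW stride size */
	int frag_off  = 512;	/* skb_frag_off(): packet starts at stride 2 */
	int frag_len  = 1500;	/* skb_frag_size(): packet spans ~6 strides */

	/* 1): with frag_size meaning "storage size of the frag", the
	 * tailroom computation goes negative.
	 */
	int tailroom = frag_size - frag_len - frag_off;
	printf("tailroom = %d\n", tailroom);		/* -1756 */

	/* 3): even with redefined semantics, the spare room inside the
	 * last 256B stride can never exceed the stride size itself.
	 */
	int spare = frag_size - (frag_off + frag_len) % frag_size;
	printf("spare in last stride = %d\n", spare);	/* 36 here */

	return 0;
}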

So +1 for the direction of having a helper that allocates an extra page
from the page_pool instead.
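
For illustration, such a helper could look roughly like the sketch
below. This is purely hypothetical (no such helper exists today): the
name, the way the page_pool pointer is passed in and the size checks
are all assumptions, and it deliberately skips details like setting the
frags flag via xdp_buff_set_frags_flag() when nr_frags was 0,
pfmemalloc tracking and DMA sync:

#include <net/xdp.h>
#include <net/page_pool/helpers.h>

/* Hypothetical: append one freshly allocated page_pool page as a new
 * fragment so bpf_xdp_adjust_tail() has room to grow into.
 */
static int xdp_buff_append_pp_frag(struct xdp_buff *xdp,
				   struct page_pool *pool,
				   unsigned int size)
{
	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
	struct page *page;

	if (sinfo->nr_frags >= MAX_SKB_FRAGS)
		return -ENOMEM;
	if (size > PAGE_SIZE)
		return -EINVAL;

	page = page_pool_dev_alloc_pages(pool);
	if (!page)
		return -ENOMEM;

	skb_frag_fill_page_desc(&sinfo->frags[sinfo->nr_frags++],
				page, 0, size);
	sinfo->xdp_frags_size += size;

	return 0;
}

Since the page would come from the rx queue's own page_pool, the
mem_type also stays homogeneous (more on that below).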

> I'm not sure about the "regardless of the driver" part of your comment.
> Is it possible to mix and match allocation models and still keep track
> of how each individual page needs to be freed? AFAICS in xdp_return_frame(),
> the mem_type is assumed to be the same for the entire xdp_frame.
> 
Wouldn't the allocations happen from the page_pool of the rx queue, so
the mem_type would be homogeneous? I was initially worried about the
cpumap case, but it doesn't seem to allow tail growth anyway (frag_size
is initialized to 0 in cpu_map_bpf_prog_run_xdp()).

[1] - Disclaimer: XDP multi-buf fragment growth is still supported for
      the non-Striding RQ mode.

Thanks,
Dragos

