On Wed, Jan 14, 2026 at 04:31:31PM +0100, Morten Brørup wrote:
> > > If I'm not mistaken, the mbuf library is not a barrier for fast-freeing
> > > segmented packet mbufs, and thus fast-free of jumbo frames is possible.
> > >
> > > We need a driver developer to confirm that my suggested approach -
> > > resetting the mbuf fields, incl. 'm->nb_segs' and 'm->next', when
> > > preparing the Tx descriptor - is viable.
> > >
> > Excellent analysis, Morten. If I get a chance some time this release cycle,
> > I will try implementing this change in our drivers, see if any difference
> > is made.
> 
> Bruce,
> 
> Have you had a chance to look into the driver change requirements?
> If not, could you please try scratching the surface, to build a gut feeling.

I'll try and take a look this week. Juggling a few things at the moment, so
I had forgotten about this. Sorry.

More comments inline below.

/Bruce

> 
> I wonder if the vector implementations have strong requirements that packets 
> are not segmented...
> 
> The i40e driver only sets "tx_simple_allowed" and "tx_vec_allowed" flags when 
> MBUF_FAST_FREE is set:
> https://elixir.bootlin.com/dpdk/v25.11/source/drivers/net/intel/i40e/i40e_rxtx.c#L3502
>

Actually, it allows but does not require FAST_FREE. The check just verifies
that the offload flags, with everything *but* FAST_FREE masked out, are the
same as the original flags, i.e. no offload other than FAST_FREE is set, and
FAST_FREE itself is ignored.
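
To illustrate the shape of that check, here is a minimal standalone sketch.
It is not the driver code: the OFFLOAD_* macros and simple_tx_allowed() are
hypothetical stand-ins with made-up bit values, and any other conditions the
driver applies are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for offload bits; the values are illustrative only. */
#define OFFLOAD_MULTI_SEGS     (UINT64_C(1) << 0)
#define OFFLOAD_MBUF_FAST_FREE (UINT64_C(1) << 1)

/*
 * Keep only the FAST_FREE bit and compare with the original flags.
 * The result is true when no offload other than FAST_FREE is requested,
 * whether or not FAST_FREE itself is set.
 */
static bool
simple_tx_allowed(uint64_t offloads)
{
	return offloads == (offloads & OFFLOAD_MBUF_FAST_FREE);
}
```

So both "no offloads" and "FAST_FREE only" pass, while any other bit, with or
without FAST_FREE, fails.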
 
> And only when these two flags are set, it uses a vector Tx function:
> https://elixir.bootlin.com/dpdk/v25.11/source/drivers/net/intel/i40e/i40e_rxtx.c#L3550
> And a special Tx Prep function:
> https://elixir.bootlin.com/dpdk/v25.11/source/drivers/net/intel/i40e/i40e_rxtx.c#L3584
> Which fails if nb_segs != 1:
> https://elixir.bootlin.com/dpdk/v25.11/source/drivers/net/intel/i40e/i40e_rxtx.c#L1675
> 
> So currently it does.
> But does it need to?... That is the question.
> Paraphrasing:
> Can the Tx function only be vectorized when the code path doesn't have 
> branches depending on the number of segments?
> If so, then this may be the main reason for not supporting segmented packets 
> with FAST_FREE.
> 
> In that case, we cannot remove the single-segment requirement from FAST_FREE 
> without sacrificing the performance boost from vectorizing.

No, based on what I state above, this should not be a blocker. The vector
paths do require us to guarantee only one segment per packet, without
additional context descriptors, so there is only one descriptor per packet
(generally; in one code path it is always one data descriptor plus one
context descriptor). FAST_FREE can be used in conjunction with that, but it
should not be a requirement. See [1], where in vector cleanup we explicitly
check for FAST_FREE.
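
As a rough illustration of what a cleanup-time FAST_FREE check buys, here is a
standalone sketch. It is not the code in [1]: struct pool, struct buf,
pool_put_bulk() and tx_free_bufs() are hypothetical stand-ins for the mempool
and mbuf APIs. The only thing it relies on is the FAST_FREE contract, i.e.
that all mbufs on the queue come from the same mempool and have refcnt 1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a mempool and an mbuf. */
struct pool { size_t nb_free; };
struct buf  { struct pool *pool; unsigned int refcnt; };

/* Return n buffers to the pool in one operation. */
static void
pool_put_bulk(struct pool *p, size_t n)
{
	p->nb_free += n;
}

/* Free the buffers of completed Tx descriptors. */
static void
tx_free_bufs(struct buf **bufs, size_t n, bool fast_free)
{
	if (n == 0)
		return;

	if (fast_free) {
		/* Same pool and refcnt == 1 are guaranteed: one bulk put. */
		pool_put_bulk(bufs[0]->pool, n);
		return;
	}

	/* General case: per-buffer refcnt check and per-buffer pool lookup. */
	for (size_t i = 0; i < n; i++) {
		if (--bufs[i]->refcnt == 0)
			pool_put_bulk(bufs[i]->pool, 1);
	}
}
```

The point is that the branch sits in the cleanup step only, so the choice of
Tx path does not have to depend on it.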

Similarly for scalar code path, in my latest rework, I am attempting to
standardize the use of FAST_FREE optimizations even when we have a slightly
slower Tx path [2].

[1] https://github.com/DPDK/dpdk/blob/main/drivers/net/intel/common/tx.h
[2] https://patches.dpdk.org/project/dpdk/patch/[email protected]/

> 
> But then we can proceed pursuing alternative optimizations, as suggested by 
> Konstantin.
> 
> Here's another idea:
> The Tx function could pre-scan each Tx burst for multi-segment packets, to 
> decide if the burst should be processed by the vector code path or a fallback 
> code path (which can also handle multi-segment packets).
> 
> 
> -Morten
> 
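
To make the pre-scan idea above concrete, here is a minimal standalone sketch.
It is an assumption about how such a check could look, not existing driver
code: struct pkt, enum tx_path and choose_tx_path() are hypothetical names,
and struct pkt models only the 'nb_segs' field of an mbuf.

```c
#include <stdint.h>

/* Hypothetical packet type; only the segment count matters here. */
struct pkt { uint16_t nb_segs; };

enum tx_path { TX_PATH_VECTOR, TX_PATH_SCALAR };

/*
 * Scan the burst once. If every packet has exactly one segment the
 * vector path can be used; otherwise fall back to a path that can
 * handle multi-segment packets.
 */
static enum tx_path
choose_tx_path(struct pkt *const *pkts, uint16_t nb_pkts)
{
	for (uint16_t i = 0; i < nb_pkts; i++) {
		if (pkts[i]->nb_segs != 1)
			return TX_PATH_SCALAR;
	}
	return TX_PATH_VECTOR;
}
```

The scan costs one extra pass over the burst and touches each packet's
metadata before the real Tx work, so whether it pays off would have to be
measured.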
