On Wed, Jan 14, 2026 at 04:31:31PM +0100, Morten Brørup wrote:
> > > If I'm not mistaken, the mbuf library is not a barrier for fast-freeing
> > > segmented packet mbufs, and thus fast-free of jumbo frames is possible.
> > >
> > > We need a driver developer to confirm that my suggested approach -
> > > resetting the mbuf fields, incl. 'm->nb_segs' and 'm->next', when
> > > preparing the Tx descriptor - is viable.
> > >
> > Excellent analysis, Morten. If I get a chance some time this release cycle,
> > I will try implementing this change in our drivers, see if any difference
> > is made.
>
> Bruce,
>
> Have you had a chance to look into the driver change requirements?
> If not, could you please try scratching the surface, to build a gut feeling.

I'll try and take a look this week. Juggling a few things at the moment, so
I had forgotten about this. Sorry. More comments inline below.

/Bruce

>
> I wonder if the vector implementations have strong requirements that packets
> are not segmented...
>
> The i40e driver only sets "tx_simple_allowed" and "tx_vec_allowed" flags when
> MBUF_FAST_FREE is set:
> https://elixir.bootlin.com/dpdk/v25.11/source/drivers/net/intel/i40e/i40e_rxtx.c#L3502
>

Actually, it allows but does not require FAST_FREE. The check is just
verifying that the flags with everything *but* FAST_FREE masked out is the
same as the original flags, i.e. FAST_FREE is just ignored.

> And only when these two flags are set, it uses a vector Tx function:
> https://elixir.bootlin.com/dpdk/v25.11/source/drivers/net/intel/i40e/i40e_rxtx.c#L3550
> And a special Tx Prep function:
> https://elixir.bootlin.com/dpdk/v25.11/source/drivers/net/intel/i40e/i40e_rxtx.c#L3584
> Which fails if nb_segs != 1:
> https://elixir.bootlin.com/dpdk/v25.11/source/drivers/net/intel/i40e/i40e_rxtx.c#L1675
>
> So currently it does.
> But does it need to?... That is the question.
> Paraphrasing:
> Can the Tx function only be vectorized when the code path doesn't have
> branches depending on the number of segments?
> If so, then this may be the main reason for not supporting segmented packets
> with FAST_FREE.
>
> In that case, we cannot remove the single-segment requirement from FAST_FREE
> without sacrificing the performance boost from vectorizing.

No, based on what I state above, this should not be a blocker. The vector
paths do require us to guarantee only one segment per packet - without
additional context descriptors - so only one descriptor per packet
(generally, or always one + ctx, in one code-path case). FAST_FREE can be
used in conjunction with that but should not be a requirement. See [1] where
in vector cleanup we explicitly check for FAST_FREE.
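
To make the point above concrete, here is a minimal standalone sketch of the
two places FAST_FREE shows up: path selection, where it is ignored, and
cleanup, where it is checked explicitly. This is not the driver code. The
offload bit values, the mock 'struct pool' / 'struct mbuf', and the helper
names simple_tx_allowed(), pool_put_bulk() and tx_free_bufs() are all
assumptions made up for illustration, and the threshold checks the real
driver also performs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in offload bits; hypothetical values, not the DPDK definitions. */
#define TX_OFFLOAD_MULTI_SEGS (UINT64_C(1) << 0)
#define TX_OFFLOAD_FAST_FREE  (UINT64_C(1) << 1)

/* Mock pool and mbuf, only here to keep the sketch self-contained. */
struct pool { size_t returned; size_t bulk_calls; };
struct mbuf { struct pool *pool; };

/* Path selection: true when no offload other than FAST_FREE is requested.
 * FAST_FREE itself is masked away, so it is allowed but not required. */
static bool simple_tx_allowed(uint64_t offloads)
{
    return offloads == (offloads & TX_OFFLOAD_FAST_FREE);
}

/* Mock bulk return: only counts buffers and calls. */
static void pool_put_bulk(struct pool *p, struct mbuf **bufs, size_t n)
{
    (void)bufs;
    p->returned += n;
    p->bulk_calls++;
}

/* Cleanup: FAST_FREE only changes how completed buffers are returned. */
static void tx_free_bufs(struct mbuf **done, size_t n, uint64_t offloads)
{
    assert(done != NULL || n == 0);
    if (n == 0)
        return;
    if (offloads & TX_OFFLOAD_FAST_FREE) {
        /* Application promised a single pool: one bulk return. */
        pool_put_bulk(done[0]->pool, done, n);
        return;
    }
    /* General case: return each run of same-pool buffers separately.
     * A real driver would also deal with reference counts here. */
    size_t start = 0;
    for (size_t i = 1; i <= n; i++) {
        if (i == n || done[i]->pool != done[start]->pool) {
            pool_put_bulk(done[start]->pool, &done[start], i - start);
            start = i;
        }
    }
}
```

In this sketch, simple_tx_allowed() gives the same answer with or without the
FAST_FREE bit set, while tx_free_bufs() is the only place that looks at it.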
Similarly for scalar code path, in my latest rework, I am attempting to
standardize the use of FAST_FREE optimizations even when we have a slightly
slower Tx path [2].

[1] https://github.com/DPDK/dpdk/blob/main/drivers/net/intel/common/tx.h
[2] https://patches.dpdk.org/project/dpdk/patch/[email protected]/

>
> But then we can proceed pursuing alternative optimizations, as suggested by
> Konstantin.
>
> Here's another idea:
> The Tx function could pre-scan each Tx burst for multi-segment packets, to
> decide if the burst should be processed by the vector code path or a fallback
> code path (which can also handle multi-segment packets).
>
>
> -Morten
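
For reference, the pre-scan idea quoted above could look roughly like the
sketch below. It is standalone and uses a mock 'struct mbuf' holding only
nb_segs. The names 'enum tx_path', burst_is_single_seg() and select_tx_path()
are hypothetical and do not exist in any driver. It shows only the decision
step, not the Tx paths themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mock mbuf with just the field the pre-scan looks at. */
struct mbuf { uint16_t nb_segs; };

/* Hypothetical labels for the two code paths. */
enum tx_path { TX_PATH_VECTOR, TX_PATH_SCALAR };

/* Return true if every packet in the burst has exactly one segment. */
static bool burst_is_single_seg(struct mbuf *const *pkts, uint16_t nb_pkts)
{
    for (uint16_t i = 0; i < nb_pkts; i++) {
        if (pkts[i]->nb_segs != 1)
            return false;
    }
    return true;
}

/* Pick the vector path only when the whole burst is single-segment;
 * any multi-segment packet sends the whole burst to the fallback path. */
static enum tx_path select_tx_path(struct mbuf *const *pkts, uint16_t nb_pkts)
{
    assert(pkts != NULL || nb_pkts == 0);
    return burst_is_single_seg(pkts, nb_pkts) ? TX_PATH_VECTOR
                                              : TX_PATH_SCALAR;
}
```

The scan adds one extra pass over the burst before any descriptor is written.
Whether that cost is acceptable compared with the gain from staying on the
vector path would need measuring.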

