On Wed, Oct 29, 2025 at 03:57:40PM +0100, Thomas Monjalon wrote:
> 29/10/2025 13:23, Morten Brørup:
> > > From: Bruce Richardson [mailto:[email protected]]
> > > On Wed, Oct 29, 2025 at 12:16:37PM +0300, Andrew Rybchenko wrote:
> > > > On 9/18/25 5:12 PM, Konstantin Ananyev wrote:
> > > > > > > From: Bruce Richardson [mailto:[email protected]]
> > > > > > > On Thu, Sep 18, 2025 at 10:50:11AM +0200, Morten Brørup wrote:
> > > > > > > > Dear NIC driver maintainers (CC: DPDK Tech Board),
> > > > > > > >
> > > > > > > > The DPDK Tech Board has discussed that patch [1] (included in
> > > > > > > > DPDK 25.07) extended the documented requirements to the
> > > > > > > > RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE offload.
> > > > > > > > These changes put additional limitations on applications' use
> > > > > > > > of the MBUF_FAST_FREE TX offload, and made MBUF_FAST_FREE
> > > > > > > > mutually exclusive with MULTI_SEGS (which is typically used for
> > > > > > > > jumbo frame support).
> > > > > > > > The Tech Board discussed that these changes do not reflect the
> > > > > > > > intention of the MBUF_FAST_FREE TX offload, and wants to fix it.
> > > > > > > > Mainly, MBUF_FAST_FREE and MULTI_SEGS should not be mutually
> > > > > > > > exclusive.
> > > > > > > >
> > > > > > > > The original RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE requirements were:
> > > > > > > > When set, application must guarantee that
> > > > > > > > 1) per-queue all mbufs come from the same mempool, and
> > > > > > > > 2) mbufs have refcnt = 1.
> > > > > > > >
> > > > > > > > The patch added the following requirements to the MBUF_FAST_FREE
> > > > > > > > offload, reflecting rte_pktmbuf_prefree_seg() postconditions:
> > > > > > > > 3) mbufs are direct,
> > > > > > > > 4) mbufs have next = NULL and nb_segs = 1.
> > > > > > > >
> > > > > > > > Now, the key question is:
> > > > > > > > Can we roll back to the original two requirements?
> > > > > > > > Or do the drivers also depend on the third and/or fourth
> > > > > > > > requirements?
> > > > > > > >
> > > > > > > > <advertisement>
> > > > > > > > Drivers freeing mbufs directly to a mempool should use the new
> > > > > > > > rte_mbuf_raw_free_bulk() instead of rte_mempool_put_bulk(), so
> > > > > > > > the preconditions for freeing mbufs directly into a mempool are
> > > > > > > > validated in mbuf debug mode (with RTE_LIBRTE_MBUF_DEBUG
> > > > > > > > enabled).
> > > > > > > > Similarly, rte_mbuf_raw_alloc_bulk() should be used instead of
> > > > > > > > rte_mempool_get_bulk().
> > > > > > > > </advertisement>
> > > > > > > >
> > > > > > > > PS: The feature documentation [2] still reflects the original
> > > > > > > > requirements.
> > > > > > > >
> > > > > > > > [1]: https://github.com/DPDK/dpdk/commit/55624173bacb2becaa67793b71391884876673c1
> > > > > > > > [2]: https://elixir.bootlin.com/dpdk/v25.07/source/doc/guides/nics/features.rst#L125
> > > > > > > >
> > > > > > > > Venlig hilsen / Kind regards,
> > > > > > > > -Morten Brørup
> > > > > > > >
> > > > > > > I'm a little torn on this question, because I can see benefits
> > > > > > > for both approaches. Firstly, it would be nice if we made
> > > > > > > FAST_FREE as accessible for driver use as it was originally, with
> > > > > > > minimal requirements. However, on looking at the code, I believe
> > > > > > > that many drivers actually took it to mean that scattered packets
> > > > > > > couldn't occur in that case either, so the use was incorrect.
> > > > > >
> > > > > > I primarily look at Intel drivers, and that's how I read the
> > > > > > driver code too.
> > > > > >
> > > > > > > Similarly, and secondly, if we do have the extra requirements
> > > > > > > for FAST_FREE, it does mean that any use of it can be very, very
> > > > > > > minimal and efficient, since we don't need to check anything
> > > > > > > before freeing the buffers.
> > > > > > >
> > > > > > > Given where we are now, I think keeping the more restrictive
> > > > > > > definition of FAST_FREE is the way to go - keeping it exclusive
> > > > > > > with MULTI_SEGS - because it means that we are less likely to
> > > > > > > have bugs. If we look to change it back, I think we'd have to
> > > > > > > check all drivers to ensure they are using the flag safely.
> > > > > >
> > > > > > However, those driver bugs are not new.
> > > > > > If we haven't received bug reports from users affected by them,
> > > > > > maybe we can disregard them (in this discussion about pros and
> > > > > > cons).
> > > > > > I prefer we register them as driver bugs, instead of changing the
> > > > > > API to accommodate bugs in the drivers.
> > > > > >
> > > > > > From an application perspective, here's an idea for consideration:
> > > > > > Assuming that indirect mbufs are uncommon, we keep requirement #3.
> > > > > > To allow MULTI_SEGS (jumbo frames) with FAST_FREE, we get rid of
> > > > > > requirement #4.
> > > > >
> > > > > Do we really need to enable FAST_FREE for jumbo-frames?
> > > > > Jumbo-frames usually means much smaller PPS number and actual RX/TX
> > > > > overhead becomes really tiny.
> > > >
> > > > +1
> > > >
> > > > > > Since the driver knows that refcnt == 1, the driver can set
> > > > > > next = NULL and nb_segs = 1 at any time, either when writing the
> > > > > > TX descriptor (when it reads the mbuf anyway), or when freeing the
> > > > > > mbuf.
> > > > > > Regarding performance, this means that the driver's TX code path
> > > > > > has to write to the mbufs (i.e. adding the performance cost of
> > > > > > memory store operations) when segmented - but that is a universal
> > > > > > requirement when freeing segmented mbufs to the mempool.
> > > > >
> > > > > It might work, but I think it will become way too complicated.
> > > > > Again I don't know who is going to inspect/fix all the drivers.
> > > > > Just not allowing FAST_FREE for multi-seg seems like a much simpler
> > > > > and safer approach.
> > > > >
> > > > > > For even more optimized driver performance, as Bruce mentions...
> > > > > > If a port is configured for FAST_FREE and not MULTI_SEGS, the
> > > > > > driver can use a super lean transmit function.
> > > > > > Since the driver's transmit function pointer is per port (not per
> > > > > > queue), this would require the driver to provide the MULTI_SEGS
> > > > > > capability only per port, and not per queue. (Or we would have to
> > > > > > add a NOT_MULTI_SEGS offload flag, to ensure that no queue is
> > > > > > configured for MULTI_SEGS.)
> > > > > >
> > > >
> > > > FAST_FREE is not a real Tx offload, since there is no promise from
> > > > driver to do something (like other Tx offloads, e.g. checksumming or
> > > > TSO). Is it a promise to ignore refcount or take a look at memory pool
> > > > of some packets only? I guess no. If so, basically any driver may
> > > > advertise it and simply ignore if the offload is requested, but
> > > > driver can do nothing with these limitations on input data.
> > > >
> > > > It is a performance hint in fact and promise from application to
> > > > follow specified limitations on Tx mbufs.
> > > >
> > > > So, if application specifies both FAST_FREE and MULTI_SEG, but driver
> > > > code can't FAST_FREE with MULTI_SEG, it should just ignore FAST_FREE.
> > > > That's it. The performance hint is simply useless in this case.
> > > > There is no point to make FAST_FREE and MULTI_SEG mutually exclusive.
> > > > If some drivers can really support both - great. If no, just ignore
> > > > FAST_FREE and support MULTI_SEG.
> > > >
> > > > "mbufs are direct" must be FAST_FREE requirement. Since otherwise
> > > > freeing is not simple. I guess it was simply lost in the original
> > > > definition of FAST_FREE.
> >
> > Agree about the "mbufs are direct" statement being lost in the original
> > definition.
> > It can be extended to include mbufs using "pinned external buffer with
> > refcnt==1", because freeing those is just as simple as freeing "direct"
> > mbufs.
> >
> > > >
> > > That's a good point and explanation of things. Perhaps we are better to
> > > deprecate FAST_FREE and replace it with a couple of explicit hints that
> > > better explain what they are?
> > >
> > > - RTE_ETH_TX_HINT_DIRECT_MBUFS
> >
> > In the FAST_FREE case, this hint would be
> > TX_HINT_MBUF_DIRECT_OR_SINGLE_OWNER_PINNED_EXTBUF.
> >
> > > - RTE_ETH_TX_HINT_SINGLE_MEMPOOL
> >
> > Prefer TX_HINT_SINGLE_MEMPOOL -> TX_HINT_SAME_MEMPOOL, so we can add a
> > globally scoped TX_HINT_SINGLE_MEMPOOL later.
> >
> > Also, RTE_ETH_TX_HINT_NON_SEGMENTED can be added later.
> >
> > I strongly agree with the finer granularity for the hints; the optimization
> > of freeing to one mempool instead of a variety of mempools is orthogonal to
> > the optimization of not having to consider indirect mbufs.
> > And the drivers are free to only optimize if multiple hints are present; so
> > there is no downside to using a finer granularity for hints.
>
> Yes we can have finer granularity.
>
> > Although we are reusing "offload" fields for hints, there's no need for
> > drivers to announce capability for such hints, including FAST_FREE; since
> > the drivers can freely ignore any hints, hint capability doesn't contain
> > any information about the driver's ability to do anything useful with the
> > hints.
>
> Capability does not need to be announced,
> but it would be useful to have debug logs when an optimization is enabled.
> I'm not sure how we can enforce such logs in drivers.
>
> > Regarding naming, we should use "promise" instead of "hint",
> > to emphasize that the "hint" is not allowed to be violated.
>
> I'm not sure why but I'm not comfortable with the word "promise".
> To me, a "hint" is already something strong.
>
Agree. Also, promise is too long a word. Hint is shorter.
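
To make the split between the two original requirements, the two added
ones, and the finer-grained hints concrete, here is a minimal sketch. It is
only an illustration under stated assumptions: the toy_* types, helpers and
TOY_TX_HINT_* bits are hypothetical stand-ins that borrow names floated in
this thread. They are not the real rte_mbuf layout or any existing ethdev
flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; NOT the real DPDK structures. */
struct toy_pool { int id; };

struct toy_mbuf {
	struct toy_pool *pool;  /* mempool the buffer was allocated from */
	struct toy_mbuf *next;  /* next segment, NULL for the last one */
	uint16_t refcnt;        /* reference count */
	uint16_t nb_segs;       /* number of segments in the chain */
	bool indirect;          /* true if attached to another mbuf's data */
};

/* Hypothetical hint bits, named after the suggestions in the thread. */
#define TOY_TX_HINT_SAME_MEMPOOL  (1u << 0)
#define TOY_TX_HINT_DIRECT_MBUFS  (1u << 1)
#define TOY_TX_HINT_NON_SEGMENTED (1u << 2)

/* Requirements 1 and 2: same mempool for the queue, and refcnt == 1. */
static bool
toy_meets_original_reqs(const struct toy_mbuf *m,
			const struct toy_pool *queue_pool)
{
	return m->pool == queue_pool && m->refcnt == 1;
}

/* Requirements 3 and 4: direct mbuf, next == NULL and nb_segs == 1. */
static bool
toy_meets_added_reqs(const struct toy_mbuf *m)
{
	return !m->indirect && m->next == NULL && m->nb_segs == 1;
}

/*
 * A driver is free to ignore hints: here it only selects its lean free
 * path when all three are present, and otherwise falls back to the
 * normal path.
 */
static bool
toy_use_lean_free(uint32_t hints)
{
	const uint32_t needed = TOY_TX_HINT_SAME_MEMPOOL |
				TOY_TX_HINT_DIRECT_MBUFS |
				TOY_TX_HINT_NON_SEGMENTED;

	return (hints & needed) == needed;
}

/* Debug-build style validation of what the application promised. */
static void
toy_debug_check(const struct toy_mbuf *m, const struct toy_pool *queue_pool)
{
	assert(toy_meets_original_reqs(m, queue_pool));
	assert(toy_meets_added_reqs(m));
}
```

In this sketch a two-segment chain with refcnt 1 from the queue's pool
passes the original check but fails the added one, which is the
jumbo-frame case discussed above. A driver built this way never has to
reject a configuration; it just drops back to its normal free path when
a hint is missing.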

