On Wed, 26 Nov 2025 10:57:13 +0100 Morten Brørup <[email protected]> wrote:
> > From: Pavan Nikhilesh <[email protected]>
> >
> > Add RTE_OPTIMAL_BURST_SIZE to allow platforms to configure the
> > optimal burst size.
> >
> > Set default value to 64 for soc_cn10k and 32 generally.
> >
> > Signed-off-by: Pavan Nikhilesh <[email protected]>
> > ---
> >  This improves performance by 5% on l2fwd, other examples showed
> >  negligible difference on CN10K.
> >
>
> I support the concept of having a recommended mbuf burst size, targeting the
> majority of generic applications.
> Making it CPU dependent seems like a good choice.
>
> It should be named differently.
> First of all, "optimal" depends on the use case; if targeting low latency,
> shorter bursts are better, so "OPTIMAL" should not be part of the name.
> Second, I would guess that it only targets mbuf bursts, not also bursts of
> other operations (e.g. hash lookups), so "MBUF" should be part of the name.
>
> Suggestion:
> /* Recommended burst size for generic applications, striking a balance
> between throughput and latency. */
> dpdk_conf.set('RTE_MBUF_BURST_SIZE_MAX' (or _DEFAULT), 64)
>
> <feature creep>
> /* Recommended burst size for generic applications targeting low latency. */
> dpdk_conf.set('RTE_MBUF_BURST_SIZE_MIN', 4)
> </feature creep>
>
> Having these standardized will also allow libraries and drivers to optimize
> for them, e.g. drivers should support bursts sizes all the way down to
> RTE_MBUF_BURST_SIZE_MIN, and can static_assert() that the
> RTE_MBUF_BURST_SIZE_MIN is not lower than supported by the driver/hardware.
>
> <more feature creep>
> rte_config.h could have "#define RTE_MBUF_BURST_SIZE
> RTE_MBUF_BURST_SIZE_MAX", for the application developer to change to
> RTE_MBUF_BURST_SIZE_MIN for low latency applications.
> This will let the libraries and drivers optimize for the specific burst size
> used by the application.
> </more feature creep>
>
> <rambling>
> Intuitively, I would assume that the optimal burst size essentially depends
> on the CPU's L1D cache size and the application's number of non-mbuf cache
> lines accessed per burst.
> Let's say a CPU core has 32 KiB cache (= 512 cache lines), and each burst
> touches 4 cache lines per packet:
> 2 cache lines for the mbuf
> 1 cache line for the packet data
> 1 cache line per packet for some table lookup/forwarding entry
>
> Then the mbuf burst should be max 512/4 = 128.
> But local variables also use memory during processing, so using a burst of 64
> would leave room for that and some more.
> </rambling>
>
> >  config/arm/meson.build | 1 +
> >  config/meson.build     | 1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/config/arm/meson.build b/config/arm/meson.build
> > index 523b0fc0ed50..fa64c07016b1 100644
> > --- a/config/arm/meson.build
> > +++ b/config/arm/meson.build
> > @@ -481,6 +481,7 @@ soc_cn10k = {
> >          ['RTE_MAX_LCORE', 24],
> >          ['RTE_MAX_NUMA_NODES', 1],
> >          ['RTE_MEMPOOL_ALIGN', 128],
> > +        ['RTE_OPTIMAL_BURST_SIZE', 64],
> >      ],
> >      'part_number': '0xd49',
> >      'extra_march_features': ['crypto'],
> > diff --git a/config/meson.build b/config/meson.build
> > index 0cb074ab95b7..95367ae88e2d 100644
> > --- a/config/meson.build
> > +++ b/config/meson.build
> > @@ -386,6 +386,7 @@ if get_option('mbuf_refcnt_atomic')
> >      dpdk_conf.set('RTE_MBUF_REFCNT_ATOMIC', true)
> >  endif
> >  dpdk_conf.set10('RTE_IOVA_IN_MBUF', get_option('enable_iova_as_pa'))
> > +dpdk_conf.set('RTE_OPTIMAL_BURST_SIZE', 32)
> >
> >  compile_time_cpuflags = []
> >  subdir(arch_subdir)
> > --
> > 2.50.1 (Apple Git-155)

I understand the motivation, and it makes sense for a pure embedded system.
But then again, on an embedded system the application can just set its own
burst size; this config option only impacts the performance of testpmd and
the examples. And the performance of testpmd is mostly irrelevant; what
matters is the real application.
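
For an application that sets its own burst size, the L1D estimate from the
quoted <rambling> block could be written down roughly as below. This is only
a sketch: burst_upper_bound() and round_down_pow2() are hypothetical helpers,
not DPDK APIs, and halving the bound to leave room for local variables is an
assumption taken from the 128 -> 64 reasoning above.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a DPDK API): upper bound on the mbuf burst size
 * such that the per-packet working set of one burst still fits in L1D.
 */
static unsigned int
burst_upper_bound(unsigned int l1d_bytes, unsigned int cache_line_bytes,
		  unsigned int lines_per_pkt)
{
	assert(cache_line_bytes != 0 && lines_per_pkt != 0);

	/* Number of cache lines in L1D, divided by lines touched per packet. */
	return (l1d_bytes / cache_line_bytes) / lines_per_pkt;
}

/* Hypothetical helper: largest power of two that is <= v (0 for v == 0). */
static unsigned int
round_down_pow2(unsigned int v)
{
	unsigned int p = 1;

	if (v == 0)
		return 0;
	while (p <= v / 2)
		p *= 2;
	return p;
}

/*
 * Assumption of this sketch: use half of the power-of-two bound, leaving
 * the other half of L1D for locals and other per-burst state.
 */
static unsigned int
suggested_burst(unsigned int l1d_bytes, unsigned int cache_line_bytes,
		unsigned int lines_per_pkt)
{
	unsigned int bound = burst_upper_bound(l1d_bytes, cache_line_bytes,
					       lines_per_pkt);

	return round_down_pow2(bound) / 2;
}
```

With the numbers from the quoted text (32 KiB L1D, 64-byte lines, 4 lines per
packet), the bound is 128 and the suggested burst is 64.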

Making it a DPDK config option is a problem for DPDK builds in distros, where
a single binary has to serve many platforms. The optimal burst size would
also be driver dependent, etc. It is perhaps better off in the existing
Rx/Tx descriptor hints. Most of those device configs really need to be
revisited, since they were inherited from how old Intel drivers worked.
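
To illustrate what using the existing hints might look like from the
application side, here is a minimal sketch. struct rte_eth_dev_info already
carries default_rxportconf and default_txportconf, each with a burst_size
field. pick_burst_size() is a hypothetical helper, and treating a hint of 0
as "driver has no preference" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a DPDK API): choose the burst size at run time.
 *
 * In a DPDK application the hint would come from the driver, e.g.:
 *
 *     struct rte_eth_dev_info dev_info;
 *     rte_eth_dev_info_get(port_id, &dev_info);
 *     hint = dev_info.default_rxportconf.burst_size;
 *
 * Assumption: a hint of 0 means the driver states no preference, so the
 * application's own fallback value is used instead.
 */
static uint16_t
pick_burst_size(uint16_t hint, uint16_t fallback, uint16_t max_burst)
{
	uint16_t burst;

	assert(fallback != 0 && max_burst != 0);

	burst = (hint != 0) ? hint : fallback;

	/* Never exceed what the application's burst array can hold. */
	if (burst > max_burst)
		burst = max_burst;

	return burst;
}
```

Because the value is picked per port at run time, a generic distro build
needs no platform-specific compile-time constant.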

