>>> From: Pavan Nikhilesh <[email protected]>
>>>
>>> Add RTE_OPTIMAL_BURST_SIZE to allow platforms to configure the
>>> optimal burst size.
>>>
>>> Set default value to 64 for soc_cn10k and 32 generally.
>>>
>>> Signed-off-by: Pavan Nikhilesh <[email protected]>
>>> ---
>>> This improves performance by 5% on l2fwd, other examples showed
>>> negligible difference on CN10K.
>>>
>>
>>I support the concept of having a recommended mbuf burst size, targeting the 
>>majority of generic applications.
>>Making it CPU dependent seems like a good choice.
>>
>>It should be named differently.
>>First of all, "optimal" depends on the use case; if targeting low latency, 
>>shorter bursts are better, so "OPTIMAL" should not be part of the name.
>>Second, I would guess that it only targets mbuf bursts, not also bursts of 
>>other operations (e.g. hash lookups), so "MBUF" should be part of the name.
>>
>>Suggestion:
>>/* Recommended burst size for generic applications, striking a balance 
>>between throughput and latency. */
>>dpdk_conf.set('RTE_MBUF_BURST_SIZE_MAX' (or _DEFAULT), 64)
>>
>
>Agreed, would the comment be enough to say that it is a recommendation and not 
>an enforcement? or should it be added to the macro name?
>I am sceptical of changing burst size of 64 since most of the applications 
>_today_ use 32, might cause unintended regression.
>
>RTE_MBUF_BURST_SIZE_(REC)_PERF?
>
>><feature creep>
>>/* Recommended burst size for generic applications targeting low latency. */
>>dpdk_conf.set('RTE_MBUF_BURST_SIZE_MIN', 4)
>></feature creep>
>
>RTE_MBUF_BURST_SIZE_(REC)_LAT?
>
>(I am bad at names)
>>
>>Having these standardized will also allow libraries and drivers to optimize 
>>for them, e.g. drivers should support bursts sizes all the way down to 
>>RTE_MBUF_BURST_SIZE_MIN, and can static_assert() that the 
>>RTE_MBUF_BURST_SIZE_MIN is not lower than supported by the driver/hardware.
>>
>><more feature creep>
>>rte_config.h could have "#define RTE_MBUF_BURST_SIZE 
>>RTE_MBUF_BURST_SIZE_MAX", for the application developer to change to 
>>RTE_MBUF_BURST_SIZE_MIN for low latency applications.
>>This will let the libraries and drivers optimize for the specific burst size 
>>used by the application.
>></more feature creep>
>
>This is fine with me, we can wrap it around a meson option to avoid manually 
>changing rte_config.h
>
>>
>><rambling>
>>Intuitively, I would assume that the optimal burst size essentially depends 
>>on the CPU's L1D cache size and the application's number of non-mbuf cache 
>>lines accessed per burst.
>>Let's say a CPU core has 32 KiB cache (= 512 cache lines), and each burst 
>>touches 4 cache lines per packet:
>>2 cache lines for the mbuf
>>1 cache line for the packet data
>>1 cache line per packet for some table lookup/forwarding entry
>>
>>Then the mbuf burst should be max 512/4 = 128.
>>But local variables also use memory during processing, so using a burst of 64 
>>would leave room for that and some more.
>></rambling>
>
>We could probably read `/sys/devices/system/cpu/cpu0/cache/index0/size` in 
>meson and calculate the number of lines and burst but, I dont think its
>that simple, for example, CN10K has 64KiB L1D cache and anything above 64 
>burst size causes performance loss.
>
>Thanks,
>Pavan
>

@Morten, Any further thoughts? I can spin up a v1.

Reply via email to