> From: Scott <[email protected]>
> 
> __rte_raw_cksum uses a loop with memcpy on each iteration.
> GCC 15+ is able to vectorize the loop but Clang 18.1 is not.
> Replacing the memcpy with unaligned_uint16_t pointer access enables
> both GCC and Clang to vectorize with SSE/AVX/AVX-512.
> 
> This patch adds comprehensive fuzz testing and updates the performance
> test to measure the optimization impact.
> 
> Performance results from cksum_perf_autotest on Intel Xeon
> (Cascade Lake, AVX-512) built with Clang 18.1 (TSC cycles/byte):
> 
>   Block size    Before    After    Improvement
>          100      0.40     0.24        ~40%
>         1500      0.50     0.06        ~8x
>         9000      0.49     0.06        ~8x
> 
> Signed-off-by: Scott Mitchell <[email protected]>
> ---

Probably makes no practical difference, but consider marking the 
__rte_raw_cksum() function __rte_pure:
https://elixir.bootlin.com/dpdk/v25.11/source/lib/eal/include/rte_common.h#L228
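
For illustration only, here is a minimal standalone sketch of the two loop
shapes described in the commit message, with the pure attribute applied.
It is not the code from the patch: sketch_unaligned_u16 and SKETCH_PURE are
hypothetical stand-ins for unaligned_uint16_t and __rte_pure, the function
names are made up, and the typedef's attributes are an assumption about how
such a type can be declared with GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, not the DPDK definitions. */
typedef uint16_t sketch_unaligned_u16 __attribute__((aligned(1), may_alias));
#define SKETCH_PURE __attribute__((pure))

/* "Before" shape: one memcpy per 16-bit word. */
SKETCH_PURE static uint32_t
sum_memcpy(const void *buf, size_t len, uint32_t sum)
{
	const uint8_t *p = buf;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2) {
		uint16_t v;

		memcpy(&v, p + i, sizeof(v));
		sum += v;
	}
	if (len & 1) {
		/* Trailing odd byte goes into the first byte of a zeroed word. */
		uint16_t last = 0;

		memcpy(&last, p + len - 1, 1);
		sum += last;
	}
	return sum;
}

/* "After" shape: direct loads through an alignment-1 pointer type. */
SKETCH_PURE static uint32_t
sum_unaligned(const void *buf, size_t len, uint32_t sum)
{
	const sketch_unaligned_u16 *w = buf;
	size_t nwords = len / 2;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i < nwords; i++)
		sum += w[i];
	if (len & 1) {
		uint16_t last = 0;

		memcpy(&last, (const uint8_t *)buf + len - 1, 1);
		sum += last;
	}
	return sum;
}
```

Both functions only read memory and return a value, which is the property
the pure attribute describes. To see whether a given compiler vectorizes
either loop, building with -O3 and -fopt-info-vec (GCC) or
-Rpass=loop-vectorize (Clang) should report it.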

With or without __rte_pure marking,
Acked-by: Morten Brørup <[email protected]>
