Hello Scott,
On Fri, 23 Jan 2026 at 17:03, <[email protected]> wrote:
>
> From: Scott Mitchell <[email protected]>
>
> __rte_raw_cksum uses a loop with memcpy on each iteration.
> GCC 15+ is able to vectorize the loop but Clang 18.1 is not.
>
> Replace memcpy with direct pointer access using unaligned_uint16_t.
> This enables both GCC and Clang to vectorize the loop while handling
> unaligned access safely on all architectures.
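The two loop shapes the commit message contrasts can be sketched as standalone C. This is an approximation under stated assumptions, not the DPDK code. `sum16_memcpy`, `sum16_direct` and the `u16_any` typedef are hypothetical names. `u16_any` relies on the GCC/Clang `aligned(1)` and `may_alias` type attributes so that the direct load is well defined at any byte offset; a plain `uint16_t` pointer does not give that guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-bit type that may sit at any address and alias any object
 * (GCC/Clang extension: in a typedef, aligned(1) lowers the alignment). */
typedef uint16_t u16_any __attribute__((aligned(1), may_alias));

/* "Before" shape (sketch): copy each 16-bit word out with memcpy. */
static uint32_t
sum16_memcpy(const void *buf, size_t len, uint32_t sum)
{
	const unsigned char *p = buf;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		uint16_t v;

		memcpy(&v, p + i, sizeof(v));
		sum += v;
	}
	if (len & 1) {
		uint16_t left = 0;

		/* Odd trailing byte is copied into the lowest-addressed byte. */
		memcpy(&left, p + len - 1, 1);
		sum += left;
	}
	return sum;
}

/* "After" shape (sketch): load each word directly through a pointer. */
static uint32_t
sum16_direct(const void *buf, size_t len, uint32_t sum)
{
	const u16_any *p = buf;
	size_t i;

	for (i = 0; i < len / 2; i++)
		sum += p[i];
	if (len & 1) {
		uint16_t left = 0;

		memcpy(&left, (const unsigned char *)buf + len - 1, 1);
		sum += left;
	}
	return sum;
}
```

Both functions pair bytes identically, so they return the same unfolded 32-bit sum for any start offset. The direct form gives the compiler a plain load per element, which is typically easier for it to vectorize than a per-iteration memcpy.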
>
> Performance results from cksum_perf_autotest on Intel Xeon
> (Cascade Lake, AVX-512) built with Clang 18.1 (TSC cycles/byte):
>
> Block size (bytes)   Before   After   Improvement
> 100                  0.40     0.24    ~40%
> 1500                 0.50     0.06    ~8x
> 9000                 0.49     0.06    ~8x
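The Improvement column appears to mix two measures. For 100-byte blocks it reads as the fraction of cycles saved, (0.40 - 0.24) / 0.40 = 40%, which is a speedup of about 1.67x. For the larger blocks it is the ratio before/after: 0.50 / 0.06 ≈ 8.3 and 0.49 / 0.06 ≈ 8.2. The helper names below are illustrative only:

```c
#include <assert.h>

/* Fraction of cycles saved, e.g. 0.40 -> 0.24 cycles/byte saves 40%. */
static double
cycles_saved(double before, double after)
{
	return (before - after) / before;
}

/* Throughput speedup, e.g. 0.50 -> 0.06 cycles/byte is about 8.3x. */
static double
speedup(double before, double after)
{
	return before / after;
}
```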
>
> Signed-off-by: Scott Mitchell <[email protected]>
Unfortunately, UBSan in the clang 14 build (Ubuntu 22.04) reports a misaligned access in the new test.
Could you have a look?
RTE>>cksum_fuzz_autotest
../lib/net/rte_cksum.h:49:10: runtime error: load of misaligned address 0x0001816c2e81 for type 'const unaligned_uint16_t' (aka 'const unsigned short'), which requires 2 byte alignment
0x0001816c2e81: note: pointer points here
00 00 00 00 70 f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../lib/net/rte_cksum.h:49:10 in
The whole backtrace is as follows:
RTE>>cksum_fuzz_autotest
../lib/net/rte_cksum.h:49:10: runtime error: load of misaligned address 0x0001816c2e81 for type 'const unaligned_uint16_t' (aka 'const unsigned short'), which requires 2 byte alignment
0x0001816c2e81: note: pointer points here
00 00 00 00 0e ce 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
#0 0x55a725ec25e7 in __rte_raw_cksum test_cksum_fuzz.c
#1 0x55a725ec21ce in test_cksum_fuzz_length_aligned test_cksum_fuzz.c
#2 0x55a725ec1f65 in test_cksum_fuzz_length test_cksum_fuzz.c
#3 0x55a725ec1c8f in test_cksum_fuzz_edge_cases test_cksum_fuzz.c
#4 0x55a725ec1ab2 in test_cksum_fuzz test_cksum_fuzz.c
#5 0x55a725ceece9 in cmd_autotest_parsed commands.c
#6 0x7fdb96d7e668 in __cmdline_parse cmdline_parse.c
#7 0x7fdb96d7dcb1 in cmdline_parse (/home/runner/work/dpdk/dpdk/build/app/../lib/librte_cmdline.so.26+0x1bcb1) (BuildId: bcf9387da4939ba68c89cec1938166c878fca318)
#8 0x7fdb96d74b69 in cmdline_valid_buffer cmdline.c
#9 0x7fdb96d8b9c3 in rdline_char_in (/home/runner/work/dpdk/dpdk/build/app/../lib/librte_cmdline.so.26+0x299c3) (BuildId: bcf9387da4939ba68c89cec1938166c878fca318)
#10 0x7fdb96d752d3 in cmdline_in (/home/runner/work/dpdk/dpdk/build/app/../lib/librte_cmdline.so.26+0x132d3) (BuildId: bcf9387da4939ba68c89cec1938166c878fca318)
#11 0x55a725cf0f0b in main (/home/runner/work/dpdk/dpdk/build/app/dpdk-test+0x4ddf0b) (BuildId: 5905b821f00329f9c5b95c7064ea051d7aacac48)
#12 0x7fdb94629d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#13 0x7fdb94629e3f in __libc_start_main csu/../csu/libc-start.c:392:3
#14 0x55a725cc5ed4 in _start (/home/runner/work/dpdk/dpdk/build/app/dpdk-test+0x4b2ed4) (BuildId: 5905b821f00329f9c5b95c7064ea051d7aacac48)
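Both reports point at the same condition. UBSan resolves the loaded type to 'const unsigned short' with a 2-byte alignment requirement, and the faulting address 0x0001816c2e81 is odd. Because the loop advances in 2-byte steps, the buffer reached __rte_raw_cksum at an odd start address (frame #1 is test_cksum_fuzz_length_aligned). To see which calls in a test would trip this check, a helper like the hypothetical one below (not a DPDK API) makes the condition explicit:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical debugging aid: true when p can be dereferenced as a plain
 * uint16_t without UBSan (-fsanitize=alignment) reporting a misaligned load. */
static bool
u16_load_is_aligned(const void *p)
{
	return ((uintptr_t)p % alignof(uint16_t)) == 0;
}
```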
[snip]
> diff --git a/app/test/test_cksum_fuzz.c b/app/test/test_cksum_fuzz.c
> new file mode 100644
> index 0000000000..3df11e3dc2
> --- /dev/null
> +++ b/app/test/test_cksum_fuzz.c
> @@ -0,0 +1,240 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2026 Apple Inc.
> + */
> +
> +#include <stdio.h>
> +#include <string.h>
> +
> +#include <rte_common.h>
> +#include <rte_cycles.h>
> +#include <rte_hexdump.h>
> +#include <rte_cksum.h>
> +#include <rte_malloc.h>
> +#include <rte_random.h>
> +
> +#include "test.h"
> +
> +/*
> + * Fuzz test for __rte_raw_cksum optimization.
> + * Compares the optimized implementation against the original reference
> + * implementation across random data of various lengths.
> + */
> +
> +#define DEFAULT_ITERATIONS 1000
> +#define MAX_TEST_LEN 65536 /* 64K to match GRO frame sizes */
> +
> +/*
> + * Original (reference) implementation of __rte_raw_cksum from DPDK v23.11.
> + * This is retained here for comparison testing against the optimized version.
> + */
> +static inline uint32_t
> +__rte_raw_cksum_reference(const void *buf, size_t len, uint32_t sum)
> +{
Just a nit: I would prefer that test functions not be declared with the same prefix as a public DPDK API (here __rte_raw_cksum_reference next to __rte_raw_cksum).
It is confusing when reading the test code.
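To illustrate the nit, here is a minimal sketch of a reference copy named without the `__rte_` prefix. The fuzz loop still exercises even and odd start offsets. `test_cksum_ref`, `test_cksum_fold`, `test_cksum_wide` and `test_cksum_compare` are hypothetical names. `test_cksum_wide` (32-bit loads, 64-bit accumulator) only stands in for the optimized routine under test. Both results are folded to a 16-bit one's complement sum, so they agree even though the raw accumulators differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference: native-endian 16-bit words via memcpy,
 * so it is safe at any start offset. */
static uint32_t
test_cksum_ref(const void *buf, size_t len, uint32_t sum)
{
	const unsigned char *p = buf;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		uint16_t v;

		memcpy(&v, p + i, sizeof(v));
		sum += v;
	}
	if (len & 1) {
		uint16_t left = 0;

		memcpy(&left, p + len - 1, 1);
		sum += left;
	}
	return sum;
}

/* Fold a wide accumulator to a 16-bit one's complement sum. */
static uint16_t
test_cksum_fold(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical stand-in for an optimized routine: 32-bit loads. */
static uint16_t
test_cksum_wide(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t sum = 0;
	size_t i = 0;

	for (; i + 4 <= len; i += 4) {
		uint32_t v;

		memcpy(&v, p + i, sizeof(v));
		/* 2^16 == 1 (mod 0xffff): one 32-bit add folds like two 16-bit adds. */
		sum += v;
	}
	/* The 0..3 tail bytes start at an even offset, so reuse the reference. */
	sum += test_cksum_ref(p + i, len - i, 0);
	return test_cksum_fold(sum);
}

/* Fill buf (which must hold len + 3 bytes) with pseudo-random data and
 * compare both implementations at start offsets 0..3. */
static bool
test_cksum_compare(unsigned char *buf, size_t len, unsigned int seed)
{
	size_t off, i;

	srand(seed);
	for (i = 0; i < len + 3; i++)
		buf[i] = (unsigned char)rand();
	for (off = 0; off < 4; off++) {
		uint16_t ref = test_cksum_fold(test_cksum_ref(buf + off, len, 0));

		if (ref != test_cksum_wide(buf + off, len))
			return false;
	}
	return true;
}
```

Keeping every helper under a `test_` prefix makes it obvious at a glance which functions belong to the test and which come from the library.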
--
David Marchand