Here are my test results:

    buildtype             : debugoptimized
    default_library       : shared
    -march=x86-64-v4 (Cascade Lake)
    gcc 15.2.1
    clang 21.1.6

GCC - BEFORE
Alignment  Block size    TSC cycles/block  TSC cycles/byte
Aligned           20                20.5             1.02
Unaligned         20                14.1             0.70
Aligned           21                15.8             0.75
Unaligned         21                15.8             0.75
Aligned         1500               148.2             0.10
Unaligned       1500               148.3             0.10
Aligned         1501               148.4             0.10
Unaligned       1501               148.2             0.10

GCC - AFTER
Alignment  Block size    TSC cycles/block  TSC cycles/byte
Aligned           20                20.8             1.04
Unaligned         20                15.6             0.78
Aligned           21                16.9             0.81
Unaligned         21                16.9             0.80
Aligned         1500               109.5             0.07
Unaligned       1500               111.6             0.07
Aligned         1501               111.1             0.07
Unaligned       1501               113.0             0.08
Aligned         9000               612.4             0.07
Unaligned       9000               612.6             0.07
Aligned         9001               581.5             0.06
Unaligned       9001               601.7             0.07

CLANG - BEFORE
Alignment  Block size    TSC cycles/block  TSC cycles/byte
Aligned           20                14.2             0.71
Unaligned         20                 9.5             0.47
Aligned           21                11.7             0.56
Unaligned         21                11.8             0.56
Aligned         1500               610.7             0.41
Unaligned       1500               632.0             0.42
Aligned         1501               610.4             0.41
Unaligned       1501               627.6             0.42

CLANG - AFTER
Alignment  Block size    TSC cycles/block  TSC cycles/byte
Aligned           20                14.0             0.70
Unaligned         20                 9.1             0.45
Aligned           21                 9.7             0.46
Unaligned         21                 9.6             0.46
Aligned         1500                77.9             0.05
Unaligned       1500                79.4             0.05
Aligned         1501                79.4             0.05
Unaligned       1501                80.4             0.05
Aligned         9000               447.8             0.05
Unaligned       9000               492.1             0.05
Aligned         9001               448.5             0.05
Unaligned       9001               492.6             0.05

Before your patch, clang is faster than GCC at small block sizes,
and GCC is faster than clang at large block sizes.
After your patch, clang is faster than GCC at every block size tested.
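
To put numbers on that comparison, here is a small sketch that computes the speedup at the 1500-byte aligned case. The cycle counts are copied from my tables above. The helper names (speedup, cycles_per_byte) are only illustrative and are not part of DPDK or the test suite.

```python
# Cycles per block for the aligned 1500-byte case, copied from the tables above.
cycles_1500 = {
    "gcc":   {"before": 148.2, "after": 109.5},
    "clang": {"before": 610.7, "after": 77.9},
}

def speedup(before, after):
    """Return how many times fewer cycles 'after' needs compared to 'before'."""
    return before / after

def cycles_per_byte(cycles_per_block, block_size):
    """Derive the 'TSC cycles/byte' column from 'TSC cycles/block'."""
    return cycles_per_block / block_size

for compiler, c in cycles_1500.items():
    print(f"{compiler}: {speedup(c['before'], c['after']):.2f}x speedup, "
          f"{cycles_per_byte(c['after'], 1500):.2f} cycles/byte after the patch")
```

With these inputs the ratio comes out to about 1.35x for GCC and about 7.84x for clang, which is why clang overtakes GCC at large block sizes once the patch is applied.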


07/02/2026 02:29, Scott Mitchell:
> Thanks for testing! I included my build/host config, results on the
> main branch, and then with this patch applied below. What are your build
> flags/configuration (e.g., cpu_instruction_set, march, optimization
> level, etc.)? I wasn't able to get any Clang version (18, 19, 20) to
> vectorize on Godbolt https://godbolt.org/z/8149r7sq8, and curious if
> your config enables vectorization.
> 
> #### build / host config
>   User defined options
>     b_lto              : false
>     buildtype          : release
>     c_args             : -fno-omit-frame-pointer -DPACKET_QDISC_BYPASS=1 -DRTE_MEMCPY_AVX512=1
>     cpu_instruction_set: cascadelake
>     default_library    : static
>     max_lcores         : 128
>     optimization       : 3
> $ clang --version
> clang version 18.1.8 (Red Hat, Inc. 18.1.8-3.el9)
> $ cat /etc/redhat-release
> Red Hat Enterprise Linux release 9.4 (Plow)
> 
> #### main branch
> $ echo "cksum_perf_autotest" | /usr/local/bin/dpdk-test
> ### rte_raw_cksum() performance ###
> Alignment  Block size    TSC cycles/block  TSC cycles/byte
> Aligned           20                10.0             0.50
> Unaligned         20                10.1             0.50
> Aligned           21                11.1             0.53
> Unaligned         21                11.6             0.55
> Aligned          100                39.4             0.39
> Unaligned        100                67.3             0.67
> Aligned          101                43.3             0.43
> Unaligned        101                41.5             0.41
> Aligned         1500               728.2             0.49
> Unaligned       1500               805.8             0.54
> Aligned         1501               768.8             0.51
> Unaligned       1501               787.3             0.52
> Test OK
> 
> #### with this patch
> $ echo "cksum_perf_autotest" | /usr/local/bin/dpdk-test
> ### rte_raw_cksum() performance ###
> Alignment  Block size    TSC cycles/block  TSC cycles/byte
> Aligned           20                12.6             0.63
> Unaligned         20                12.3             0.62
> Aligned           21                13.6             0.65
> Unaligned         21                13.6             0.65
> Aligned          100                22.7             0.23
> Unaligned        100                22.6             0.23
> Aligned          101                47.4             0.47
> Unaligned        101                23.9             0.24
> Aligned         1500                73.9             0.05
> Unaligned       1500                73.9             0.05
> Aligned         1501                95.7             0.06
> Unaligned       1501                73.9             0.05
> Aligned         9000               459.8             0.05
> Unaligned       9000               523.5             0.06
> Aligned         9001               536.7             0.06
> Unaligned       9001               507.5             0.06
> Aligned        65536              3158.4             0.05
> Unaligned      65536              3506.1             0.05
> Aligned        65537              3277.6             0.05
> Unaligned      65537              3697.6             0.06
> Test OK
> 