On Mon, 12 Jan 2026 11:00:36 -0500 Scott Mitchell <[email protected]> wrote:
> > The discussion about the optimized checksum function [1] has shown us that
> > memcpy() sometimes prevents Clang from optimizing (loop unrolling and
> > vectorizing) and potentially causes strict aliasing bugs with GCC, so I
> > will work on a new patch version that keeps using the above types, instead
> > of introducing memcpy() inside rte_memcpy().
> >
> > [1]:
> > https://inbox.dpdk.org/dev/CAFn2buBzBLFLVN-K=u3mgbebq-hqbgjlvpdx3vsxvkjpa0y...@mail.gmail.com/
> >
>
> Great timing for this thread :)
>
> My observation:
> - clang is unable to apply optimizations with RTE_PTR_[ADD,SUB]
>   like loop unrolling and vectorization (e.g. cksum)
> - Even when clang/gcc do apply optimizations the assembly can be non-optimal
> - direct usage of unaligned_NN_t types can cause incorrect results
>   (due to gcc bugs)
>
> I don't think "rte_NN_alias" structs are safe on architectures that don't
> allow unaligned access because the inner "val" needs to indicate it may be
> for unaligned access.
>
> My suggestion:
> 1. Fix unaligned_NN_t types to ensure compiler doesn't aggressively
>    apply strict-alias optimizations resulting in incorrect results
>    (https://patches.dpdk.org/project/dpdk/patch/[email protected]/).
>    Intermediate structs rte_NN_alias are then unnecessary and we can
>    directly use unaligned_NN_t instead (e.g.
>    https://patches.dpdk.org/project/dpdk/patch/[email protected]/)
>
> 2. Improve RTE_PTR_[ADD,SUB] to be more compiler friendly
>    (https://patches.dpdk.org/project/dpdk/patch/[email protected]/)

FYI the Linux kernel avoids the memcpy silliness, mostly by identifying
architectures where unaligned access is a non-issue.

On x86, unaligned access works fine. As I remember, it works on ARM as well.
The only place where unaligned access can break badly is when it is an
atomic operation.
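
To make the idea concrete, here is a minimal sketch, assuming GCC or Clang.
All names (ex_unaligned_u32, EX_EFFICIENT_UNALIGNED, ex_load32*) are
hypothetical and are not the DPDK or Linux definitions, and the architecture
list is only illustrative. It selects a direct load through an alignment-1,
may_alias type on architectures assumed to handle unaligned access cheaply,
and falls back to memcpy() elsewhere:

```c
#include <stdint.h>
#include <string.h>

/* Assumption: an illustrative list of targets treated as "unaligned is cheap". */
#if defined(__x86_64__) || defined(__i386__) || defined(__aarch64__)
#define EX_EFFICIENT_UNALIGNED 1
#else
#define EX_EFFICIENT_UNALIGNED 0
#endif

/*
 * Hypothetical 32-bit type with alignment 1 that is allowed to alias other
 * types (GCC/Clang type attributes), so the compiler may assume neither
 * natural alignment nor strict-aliasing separation for accesses through it.
 */
typedef uint32_t __attribute__((__aligned__(1), __may_alias__)) ex_unaligned_u32;

/* Direct load through the attributed type. */
static inline uint32_t ex_load32_typed(const void *p)
{
	return *(const ex_unaligned_u32 *)p;
}

/* Portable fallback: copy the bytes into a properly aligned local. */
static inline uint32_t ex_load32_memcpy(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Pick the variant at compile time based on the architecture switch. */
static inline uint32_t ex_load32(const void *p)
{
#if EX_EFFICIENT_UNALIGNED
	return ex_load32_typed(p);
#else
	return ex_load32_memcpy(p);
#endif
}
```

Both variants should return the same value for any address; the difference
is only in what the compiler is allowed to assume while optimizing the
surrounding loop. Neither variant makes an unaligned access atomic.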

