>
> The discussion about the optimized checksum function [1] has shown us that
> memcpy() sometimes prevents Clang from optimizing (loop unrolling and
> vectorizing) and potentially causes strict aliasing bugs with GCC, so I will
> work on a new patch version that keeps using the above types, instead of
> introducing memcpy() inside rte_memcpy().
>
> [1]:
> https://inbox.dpdk.org/dev/CAFn2buBzBLFLVN-K=u3mgbebq-hqbgjlvpdx3vsxvkjpa0y...@mail.gmail.com/
>
Great timing for this thread :)

My observations:
- Clang is unable to apply optimizations such as loop unrolling and
  vectorization when RTE_PTR_[ADD,SUB] is used (e.g. cksum).
- Even when Clang/GCC do apply optimizations, the generated assembly can be
  suboptimal.
- Direct use of the unaligned_NN_t types can produce incorrect results (due
  to GCC bugs).

I don't think the "rte_NN_alias" structs are safe on architectures that don't
allow unaligned access, because the inner "val" needs to indicate that it may
be used for unaligned access.

My suggestions:
1. Fix the unaligned_NN_t types to ensure the compiler doesn't aggressively
   apply strict-aliasing optimizations that lead to incorrect results
   (https://patches.dpdk.org/project/dpdk/patch/[email protected]/).
   The intermediate rte_NN_alias structs are then unnecessary, and we can
   use unaligned_NN_t directly instead
   (e.g. https://patches.dpdk.org/project/dpdk/patch/[email protected]/).
2. Improve RTE_PTR_[ADD,SUB] to be more compiler friendly
   (https://patches.dpdk.org/project/dpdk/patch/[email protected]/).
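
To make suggestion 1 a bit more concrete, here is a minimal sketch of the idea. It is not the code from the patches. It assumes the fix amounts to marking the unaligned type with the GCC/Clang `__may_alias__` attribute in addition to `__aligned__(1)`. All `ex_` names are hypothetical and exist only for this illustration. The memcpy() variant is included purely as a reference to compare results against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for an unaligned 16-bit type (not DPDK's definition).
 * __may_alias__ exempts accesses through this type from strict-aliasing
 * assumptions; __aligned__(1) tells the compiler the address may be odd.
 * Both attributes are GCC/Clang extensions.
 */
typedef uint16_t ex_unaligned_u16
	__attribute__((__may_alias__, __aligned__(1)));

/* Sum 16-bit words by dereferencing the attributed type directly. */
static uint32_t
ex_sum16_alias(const void *buf, size_t len)
{
	const ex_unaligned_u16 *p = (const ex_unaligned_u16 *)buf;
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len / 2; i++)
		sum += p[i];
	return sum;
}

/* Reference version: memcpy() each word into a local before adding it. */
static uint32_t
ex_sum16_memcpy(const void *buf, size_t len)
{
	const uint8_t *p = (const uint8_t *)buf;
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		uint16_t v;

		memcpy(&v, p + i, sizeof(v));
		sum += v;
	}
	return sum;
}

/* Run both variants on the same buffer and insist that they agree. */
static uint32_t
ex_sum16_checked(const void *buf, size_t len)
{
	uint32_t a = ex_sum16_alias(buf, len);

	assert(a == ex_sum16_memcpy(buf, len));
	return a;
}
```

Calling ex_sum16_checked() on a deliberately odd address (e.g. a byte array plus one) exercises the unaligned path. This is only a plain word sum without carry folding, so it is not a replacement for the real cksum code. Whether the compiler unrolls or vectorizes either loop has to be checked in the generated assembly for each compiler version.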

