>
> The discussion about the optimized checksum function [1] has shown us that 
> memcpy() sometimes prevents Clang from optimizing (loop unrolling and 
> vectorizing) and potentially causes strict aliasing bugs with GCC, so I will 
> work on a new patch version that keeps using the above types, instead of 
> introducing memcpy() inside rte_memcpy().
>
> [1]: 
> https://inbox.dpdk.org/dev/CAFn2buBzBLFLVN-K=u3mgbebq-hqbgjlvpdx3vsxvkjpa0y...@mail.gmail.com/
>

Great timing for this thread :)

My observation:
- clang is unable to apply optimizations such as loop unrolling and
vectorization to code that uses RTE_PTR_[ADD,SUB] (e.g. cksum)
- Even when clang/gcc do apply optimizations, the resulting assembly can
be suboptimal
- direct usage of unaligned_NN_t types can cause incorrect results
(due to gcc bugs)

I don't think the "rte_NN_alias" structs are safe on architectures that
don't allow unaligned access, because the inner "val" needs to indicate
that it may be accessed at an unaligned address.
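
To make concrete what "indicating it" could look like: with GCC and Clang
a type can carry both properties through attributes. This is only a
minimal sketch under that assumption. The names sketch_unaligned_u16 and
sketch_sum16 are hypothetical, and it is not the definition used in DPDK
or in the patches linked in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical type, for illustration only.
 * __aligned__(1): tells the compiler the object may sit at any address.
 * __may_alias__:  exempts accesses through this type from type-based
 *                 (strict) aliasing assumptions.
 */
typedef uint16_t sketch_unaligned_u16
	__attribute__((__may_alias__, __aligned__(1)));

/* Sum the 16-bit words of an arbitrarily aligned buffer. */
static uint32_t
sketch_sum16(const void *buf, size_t len)
{
	const sketch_unaligned_u16 *p = (const sketch_unaligned_u16 *)buf;
	uint32_t sum = 0;
	size_t i;

	/* A trailing odd byte, if any, is ignored in this sketch. */
	for (i = 0; i < len / 2; i++)
		sum += p[i];

	return sum;
}
```

Calling sketch_sum16() on an odd address is then well defined as far as
the compiler is concerned, since the alignment requirement is carried by
the type itself rather than assumed from uint16_t.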

My suggestion:
1. Fix the unaligned_NN_t types so that the compiler doesn't aggressively
apply strict-aliasing optimizations that produce incorrect results
(https://patches.dpdk.org/project/dpdk/patch/[email protected]/).
The intermediate rte_NN_alias structs are then unnecessary and we can use
unaligned_NN_t directly (e.g.
https://patches.dpdk.org/project/dpdk/patch/[email protected]/)

2. Improve RTE_PTR_[ADD,SUB] to be more compiler friendly
(https://patches.dpdk.org/project/dpdk/patch/[email protected]/)
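
To illustrate the kind of change point 2 is about (a sketch only, not
necessarily what the linked patch does): to my knowledge RTE_PTR_ADD has
traditionally gone through a uintptr_t round trip, which hides from the
compiler which object the result points into. Plain byte-pointer
arithmetic keeps that information. The SKETCH_* macros and
sketch_sum_bytes below are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Integer round trip: pointer -> integer -> pointer. */
#define SKETCH_PTR_ADD_INT(ptr, x) ((void *)((uintptr_t)(ptr) + (x)))

/*
 * Byte-pointer arithmetic: the result is still derived from ptr, so the
 * optimizer can reason about it. Note that, like the integer form, this
 * casts away const.
 */
#define SKETCH_PTR_ADD_BYTE(ptr, x) ((void *)((char *)(ptr) + (x)))

/* Sum all bytes of a buffer, addressing each one through the macro. */
static uint32_t
sketch_sum_bytes(const void *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += *(const uint8_t *)SKETCH_PTR_ADD_BYTE(buf, i);

	return sum;
}
```

Both macros yield the same address; whether the second form actually
unlocks unrolling or vectorization depends on the compiler and version,
so it is worth checking the generated assembly for the loops in question.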
