> Does this still happen after you replaced the RTE_PTR_ADD() with native
> pointer arithmetic in the checksum function?
> In other words: Is this workaround still necessary?

Yes, unfortunately it is still necessary with native pointer access. I
updated the reproducer to show this case:
https://gist.github.com/Scottmitch/bf23748b4588e68c9bdb8d124f92f1bd

> This is a showstopper:
> If the workaround is necessary, applications with similar use cases also need
> to apply the workaround.
> If we cannot somehow enforce that, the series is likely to break some
> applications, which is unacceptable.

That is a great point. This API isn't internal-only, so this would
effectively be an API-breaking change, which doesn't seem justified.

Given what I've learned through this process (thank you and Stephen for the
valuable feedback), we have a few paths to achieve my goal (clang optimizes
__rte_raw_cksum). I've verified that if the RTE_PTR_ADD macros are changed
to use char*, clang optimizes (and gcc still does too) [1]. To achieve this
we have some options:

A. Modify RTE_PTR_[ADD|SUB] to use pointers
   pros:
   - [if API can be preserved] provides benefits to all use cases w/out
     usage changes
   - no additional API surface to expose
   cons:
   - more complex macro implementation to preserve API compatibility

B. Add RTE_CONST_PTR_[ADD|SUB] with const [void*|char*] & use it in
   __rte_raw_cksum
   pros:
   - no risk of impacting existing RTE_PTR_[ADD|SUB] APIs
   - simple implementation using pointers from the start
   cons:
   - API may not support all use cases as RTE_PTR_[ADD|SUB] (e.g. ptr arg
     as raw integer)
   - requires manual opt-in to new API to get any benefit

I have a draft of A that I will submit as a patch, and we can discuss
whether it makes sense or fall back to B (or other approaches).

[1] https://godbolt.org/z/5bc1bTrhe
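
For readers following along, here is a rough sketch of the difference between
the two macro styles discussed above. It is illustrative only and is not the
draft patch: the SKETCH_* names and the sketch_sum16_* helpers are
hypothetical, and the integer-based form assumes the existing macro casts the
pointer through uintptr_t. The helpers sum 16-bit words (ignoring a trailing
odd byte) and are not __rte_raw_cksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Integer-based form (assumed shape of the existing macro): the pointer
 * round-trips through uintptr_t, so the arithmetic is done on an integer. */
#define SKETCH_PTR_ADD_INT(ptr, x) ((void *)((uintptr_t)(ptr) + (x)))

/* Pointer-based form (hypothetical, option A style): the arithmetic stays
 * in the pointer domain by going through char *. */
#define SKETCH_PTR_ADD_CHAR(ptr, x) ((void *)((char *)(ptr) + (x)))

/* Const-preserving form (hypothetical, option B style). */
#define SKETCH_CONST_PTR_ADD(ptr, x) \
	((const void *)((const char *)(ptr) + (x)))

/* Sum the 16-bit words of buf using the integer-based macro.
 * A trailing odd byte is ignored to keep the sketch short. */
static inline uint32_t
sketch_sum16_int(const void *buf, size_t len)
{
	const void *end = SKETCH_PTR_ADD_INT(buf, len & ~(size_t)1);
	uint32_t sum = 0;

	for (const void *p = buf; p != end;
	     p = SKETCH_PTR_ADD_INT(p, sizeof(uint16_t))) {
		uint16_t v;

		memcpy(&v, p, sizeof(v)); /* avoids alignment/aliasing issues */
		sum += v;
	}
	return sum;
}

/* Same loop, using the pointer-based const macro. */
static inline uint32_t
sketch_sum16_char(const void *buf, size_t len)
{
	const void *end = SKETCH_CONST_PTR_ADD(buf, len & ~(size_t)1);
	uint32_t sum = 0;

	for (const void *p = buf; p != end;
	     p = SKETCH_CONST_PTR_ADD(p, sizeof(uint16_t))) {
		uint16_t v;

		memcpy(&v, p, sizeof(v));
		sum += v;
	}
	return sum;
}

/* Both styles must compute the same address and the same sum. */
static inline void
sketch_self_check(void)
{
	uint8_t buf[6] = {1, 1, 2, 2, 3, 3};

	assert(SKETCH_PTR_ADD_INT(buf, 3) == SKETCH_PTR_ADD_CHAR(buf, 3));
	assert(sketch_sum16_int(buf, sizeof(buf)) ==
	       sketch_sum16_char(buf, sizeof(buf)));
}
```

Both forms yield the same address; the difference is only in what the
compiler can see. Note that the pointer-based forms expect a pointer argument,
which is the "ptr arg as raw integer" compatibility concern listed under the
options above.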

