On 2022-07-11 11:53, Olivier Matz wrote: > Hi, > > On Fri, Jul 08, 2022 at 02:56:08PM +0200, Mattias Rönnblom wrote: >> __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed its >> data through an uint16_t pointer, which allowed the compiler to assume >> the data was 16-bit aligned. This in turn would, with certain >> architectures and compiler flag combinations, result in code with SIMD >> load or store instructions with restrictions on data alignment. >> >> This patch keeps the old algorithm, but data is read using memcpy() >> instead of direct pointer access, forcing the compiler to always >> generate code that handles unaligned input. The __may_alias__ GCC >> attribute is no longer needed. >> >> The data on which the Internet checksum functions operates are almost >> always 16-bit aligned, but there are exceptions. In particular, the >> PDCP protocol header may (literally) have an odd size. >> >> Performance impact seems to range from none to a very slight >> regression. >> >> Bugzilla ID: 1035 >> Cc: sta...@dpdk.org >> >> --- > > Using memcpy() looks to be a good solution fix the issue, while avoiding a > branch and the __may_alias__. > > I just have one minor comment below. > >> >> v2: >> * Simplified the odd-length conditional (Morten Brørup). >> >> Reviewed-by: Morten Brørup <m...@smartsharesystems.com> >> >> Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com> >> --- >> lib/net/rte_ip.h | 17 ++++++++++------- >> 1 file changed, 10 insertions(+), 7 deletions(-) >> >> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h >> index b502481670..a0334d931e 100644 >> --- a/lib/net/rte_ip.h >> +++ b/lib/net/rte_ip.h >> @@ -160,18 +160,21 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr *ipv4_hdr) >> static inline uint32_t >> __rte_raw_cksum(const void *buf, size_t len, uint32_t sum) >> { >> - /* extend strict-aliasing rules */ >> - typedef uint16_t __attribute__((__may_alias__)) u16_p; >> - const u16_p *u16_buf = (const u16_p *)buf; >> - const u16_p *end = u16_buf + len / sizeof(*u16_buf); >> + const void *end; >> >> - for (; u16_buf != end; ++u16_buf) >> - sum += *u16_buf; >> + for (end = RTE_PTR_ADD(buf, (len/sizeof(uint16_t)) * sizeof(uint16_t)); > > What do you think about this form: > > for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t))); > > This also has the good property to solve the debate about the > spaces around the '/' :) >
Shorter, more readable. Looks good to me. Thanks. > >> + buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) { >> + uint16_t v; >> + >> + memcpy(&v, buf, sizeof(uint16_t)); >> + sum += v; >> + } >> >> /* if length is odd, keeping it byte order independent */ >> if (unlikely(len % 2)) { >> uint16_t left = 0; >> - *(unsigned char *)&left = *(const unsigned char *)end; >> + >> + memcpy(&left, end, 1); >> sum += left; >> } >> >> -- >> 2.25.1 >>