On 2022-07-07 23:44, Morten Brørup wrote: >> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] >> Sent: Thursday, 7 July 2022 20.35 >> >> From: Mattias Rönnblom <mattias.ronnb...@ericsson.com> >> >> __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed its >> data through an uint16_t pointer, which allowed the compiler to assume >> the data was 16-bit aligned. This in turn would, with certain >> architectures and compiler flag combinations, result in code with SIMD >> load or store instructions with restrictions on data alignment. >> >> This patch keeps the old algorithm, but data is read using memcpy() >> instead of direct pointer access, forcing the compiler to always >> generate code that handles unaligned input. The __may_alias__ GCC >> attribute is no longer needed. >> >> The data on which the Internet checksum functions operates are almost >> always 16-bit aligned, but there are exceptions. In particular, the >> PDCP protocol header may (literally) have an odd size. >> >> Performance impact seems to range from none to a very slight >> regression. >> >> Bugzilla ID: 1035 >> Cc: sta...@dpdk.org >> >> Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com> >> --- >> lib/net/rte_ip.h | 19 ++++++++++++------- >> 1 file changed, 12 insertions(+), 7 deletions(-) >> >> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h >> index b502481670..a9e6251f14 100644 >> --- a/lib/net/rte_ip.h >> +++ b/lib/net/rte_ip.h >> @@ -160,18 +160,23 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr >> *ipv4_hdr) >> static inline uint32_t >> __rte_raw_cksum(const void *buf, size_t len, uint32_t sum) >> { >> - /* extend strict-aliasing rules */ >> - typedef uint16_t __attribute__((__may_alias__)) u16_p; >> - const u16_p *u16_buf = (const u16_p *)buf; >> - const u16_p *end = u16_buf + len / sizeof(*u16_buf); >> + const void *end; > > I would set "end" here instead, possibly making the pointer const too. And > add spaces around '/'. > const void * const end = RTE_PTR_ADD(buf, (len / sizeof(uint16_t)) * > sizeof(uint16_t)); >
I don't think that makes the code more readable. >> >> - for (; u16_buf != end; ++u16_buf) >> - sum += *u16_buf; >> + for (end = RTE_PTR_ADD(buf, (len/sizeof(uint16_t)) * >> sizeof(uint16_t)); >> + buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) { >> + uint16_t v; >> + >> + memcpy(&v, buf, sizeof(uint16_t)); >> + sum += v; >> + } >> >> /* if length is odd, keeping it byte order independent */ >> if (unlikely(len % 2)) { >> + uint8_t last; >> uint16_t left = 0; >> - *(unsigned char *)&left = *(const unsigned char *)end; >> + >> + memcpy(&last, end, 1); >> + *(unsigned char *)&left = last; > > Couldn't you just memcpy(&left, end, 1), and omit the temporary variable > "last"? > Good point. I don't like how this code is clever vis-à-vis byte order, but then I also don't have a better suggestion. >> sum += left; >> } >> >> -- >> 2.25.1 >> > > With our without my suggested changes, it looks good. > > Reviewed-by: Morten Brørup <m...@smartsharesystems.com> > Thanks!