> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > Sent: Thursday, 7 July 2022 20.35 > > From: Mattias Rönnblom <mattias.ronnb...@ericsson.com> > > __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed its > data through an uint16_t pointer, which allowed the compiler to assume > the data was 16-bit aligned. This in turn would, with certain > architectures and compiler flag combinations, result in code with SIMD > load or store instructions with restrictions on data alignment. > > This patch keeps the old algorithm, but data is read using memcpy() > instead of direct pointer access, forcing the compiler to always > generate code that handles unaligned input. The __may_alias__ GCC > attribute is no longer needed. > > The data on which the Internet checksum functions operates are almost > always 16-bit aligned, but there are exceptions. In particular, the > PDCP protocol header may (literally) have an odd size. > > Performance impact seems to range from none to a very slight > regression. > > Bugzilla ID: 1035 > Cc: sta...@dpdk.org > > Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com> > --- > lib/net/rte_ip.h | 19 ++++++++++++------- > 1 file changed, 12 insertions(+), 7 deletions(-) > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h > index b502481670..a9e6251f14 100644 > --- a/lib/net/rte_ip.h > +++ b/lib/net/rte_ip.h > @@ -160,18 +160,23 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr > *ipv4_hdr) > static inline uint32_t > __rte_raw_cksum(const void *buf, size_t len, uint32_t sum) > { > - /* extend strict-aliasing rules */ > - typedef uint16_t __attribute__((__may_alias__)) u16_p; > - const u16_p *u16_buf = (const u16_p *)buf; > - const u16_p *end = u16_buf + len / sizeof(*u16_buf); > + const void *end;
I would set "end" here instead, possibly making the pointer const too. And add spaces around '/'. const void * const end = RTE_PTR_ADD(buf, (len / sizeof(uint16_t)) * sizeof(uint16_t)); > > - for (; u16_buf != end; ++u16_buf) > - sum += *u16_buf; > + for (end = RTE_PTR_ADD(buf, (len/sizeof(uint16_t)) * > sizeof(uint16_t)); > + buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) { > + uint16_t v; > + > + memcpy(&v, buf, sizeof(uint16_t)); > + sum += v; > + } > > /* if length is odd, keeping it byte order independent */ > if (unlikely(len % 2)) { > + uint8_t last; > uint16_t left = 0; > - *(unsigned char *)&left = *(const unsigned char *)end; > + > + memcpy(&last, end, 1); > + *(unsigned char *)&left = last; Couldn't you just memcpy(&left, end, 1), and omit the temporary variable "last"? > sum += left; > } > > -- > 2.25.1 > With our without my suggested changes, it looks good. Reviewed-by: Morten Brørup <m...@smartsharesystems.com>