On 2022-07-08 15:02, Morten Brørup wrote: >> From: Mattias Rönnblom [mailto:mattias.ronnb...@ericsson.com] >> Sent: Friday, 8 July 2022 14.44 >> >> On 2022-07-07 23:44, Morten Brørup wrote: >>>> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] >>>> Sent: Thursday, 7 July 2022 20.35 >>>> >>>> From: Mattias Rönnblom <mattias.ronnb...@ericsson.com> >>>> >>>> __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed >> its >>>> data through an uint16_t pointer, which allowed the compiler to >> assume >>>> the data was 16-bit aligned. This in turn would, with certain >>>> architectures and compiler flag combinations, result in code with >> SIMD >>>> load or store instructions with restrictions on data alignment. >>>> >>>> This patch keeps the old algorithm, but data is read using memcpy() >>>> instead of direct pointer access, forcing the compiler to always >>>> generate code that handles unaligned input. The __may_alias__ GCC >>>> attribute is no longer needed. >>>> >>>> The data on which the Internet checksum functions operates are >> almost >>>> always 16-bit aligned, but there are exceptions. In particular, the >>>> PDCP protocol header may (literally) have an odd size. >>>> >>>> Performance impact seems to range from none to a very slight >>>> regression. >>>> >>>> Bugzilla ID: 1035 >>>> Cc: sta...@dpdk.org >>>> >>>> Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com> >>>> --- >>>> lib/net/rte_ip.h | 19 ++++++++++++------- >>>> 1 file changed, 12 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h >>>> index b502481670..a9e6251f14 100644 >>>> --- a/lib/net/rte_ip.h >>>> +++ b/lib/net/rte_ip.h >>>> @@ -160,18 +160,23 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr >>>> *ipv4_hdr) >>>> static inline uint32_t >>>> __rte_raw_cksum(const void *buf, size_t len, uint32_t sum) >>>> { >>>> - /* extend strict-aliasing rules */ >>>> - typedef uint16_t __attribute__((__may_alias__)) u16_p; >>>> - const u16_p *u16_buf = (const u16_p *)buf; >>>> - const u16_p *end = u16_buf + len / sizeof(*u16_buf); >>>> + const void *end; >>> >>> I would set "end" here instead, possibly making the pointer const >> too. And add spaces around '/'. >>> const void * const end = RTE_PTR_ADD(buf, (len / sizeof(uint16_t)) * >> sizeof(uint16_t)); >>> >> >> I don't think that makes the code more readable. > > It's only a matter of taste... Your code, your decision. :-) > > I think the spaces are required by the coding standard; not sure, though. >
If it isn't in the coding standard, it should be. But if you add spaces, you have to break the line, to fit into 80 characters. A net loss, IMO. >> >>>> >>>> - for (; u16_buf != end; ++u16_buf) >>>> - sum += *u16_buf; >>>> + for (end = RTE_PTR_ADD(buf, (len/sizeof(uint16_t)) * >>>> sizeof(uint16_t)); >>>> + buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) { >>>> + uint16_t v; >>>> + >>>> + memcpy(&v, buf, sizeof(uint16_t)); >>>> + sum += v; >>>> + } >>>> >>>> /* if length is odd, keeping it byte order independent */ >>>> if (unlikely(len % 2)) { >>>> + uint8_t last; >>>> uint16_t left = 0; >>>> - *(unsigned char *)&left = *(const unsigned char *)end; >>>> + >>>> + memcpy(&last, end, 1); >>>> + *(unsigned char *)&left = last; >>> >>> Couldn't you just memcpy(&left, end, 1), and omit the temporary >> variable "last"? >>> >> >> Good point. >> >> I don't like how this code is clever vis-à-vis byte order, but then I >> also don't have a better suggestion. > > The byte ordering cleverness has its roots in RFC 1071. > > Stephen suggested using a union, although in a slightly different context. > I'm not sure it will be more readable here, because it will require #ifdef to > support byte ordering. Just thought I'd mention it, for your consideration. > > Your patch v2 just reached my inbox, and it looks good. No further response > to this email is expected. >