On 2022-07-07 23:44, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
>> Sent: Thursday, 7 July 2022 20.35
>>
>> From: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
>>
>> __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed its
>> data through an uint16_t pointer, which allowed the compiler to assume
>> the data was 16-bit aligned. This in turn would, with certain
>> architectures and compiler flag combinations, result in code with SIMD
>> load or store instructions with restrictions on data alignment.
>>
>> This patch keeps the old algorithm, but data is read using memcpy()
>> instead of direct pointer access, forcing the compiler to always
>> generate code that handles unaligned input. The __may_alias__ GCC
>> attribute is no longer needed.
>>
>> The data on which the Internet checksum functions operates are almost
>> always 16-bit aligned, but there are exceptions. In particular, the
>> PDCP protocol header may (literally) have an odd size.
>>
>> Performance impact seems to range from none to a very slight
>> regression.
>>
>> Bugzilla ID: 1035
>> Cc: sta...@dpdk.org
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
>> ---
>>   lib/net/rte_ip.h | 19 ++++++++++++-------
>>   1 file changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
>> index b502481670..a9e6251f14 100644
>> --- a/lib/net/rte_ip.h
>> +++ b/lib/net/rte_ip.h
>> @@ -160,18 +160,23 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr
>> *ipv4_hdr)
>>   static inline uint32_t
>>   __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
>>   {
>> -    /* extend strict-aliasing rules */
>> -    typedef uint16_t __attribute__((__may_alias__)) u16_p;
>> -    const u16_p *u16_buf = (const u16_p *)buf;
>> -    const u16_p *end = u16_buf + len / sizeof(*u16_buf);
>> +    const void *end;
> 
> I would set "end" here instead, possibly making the pointer const too. And 
> add spaces around '/'.
> const void * const end = RTE_PTR_ADD(buf, (len / sizeof(uint16_t)) * 
> sizeof(uint16_t));
> 

I don't think that makes the code more readable.

>>
>> -    for (; u16_buf != end; ++u16_buf)
>> -            sum += *u16_buf;
>> +    for (end = RTE_PTR_ADD(buf, (len/sizeof(uint16_t)) *
>> sizeof(uint16_t));
>> +         buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
>> +            uint16_t v;
>> +
>> +            memcpy(&v, buf, sizeof(uint16_t));
>> +            sum += v;
>> +    }
>>
>>      /* if length is odd, keeping it byte order independent */
>>      if (unlikely(len % 2)) {
>> +            uint8_t last;
>>              uint16_t left = 0;
>> -            *(unsigned char *)&left = *(const unsigned char *)end;
>> +
>> +            memcpy(&last, end, 1);
>> +            *(unsigned char *)&left = last;
> 
> Couldn't you just memcpy(&left, end, 1), and omit the temporary variable 
> "last"?
> 

Good point.

I don't like how this code is clever vis-à-vis byte order, but then I 
also don't have a better suggestion.

>>              sum += left;
>>      }
>>
>> --
>> 2.25.1
>>
> 
> With our without my suggested changes, it looks good.
> 
> Reviewed-by: Morten Brørup <m...@smartsharesystems.com>
> 

Thanks!

Reply via email to