> -----Original Message-----
> From: Mattias Rönnblom <hof...@lysator.liu.se>
> Sent: den 27 juni 2022 14:28
> To: Morten Brørup <m...@smartsharesystems.com>; Emil Berg <emil.b...@ericsson.com>;
> bruce.richard...@intel.com; dev@dpdk.org
> Cc: step...@networkplumber.org; sta...@dpdk.org; bugzi...@dpdk.org; olivier.m...@6wind.com
> Subject: Re: [PATCH v4] net: fix checksum with unaligned buffer
>
> On 2022-06-23 14:51, Morten Brørup wrote:
> >> From: Morten Brørup [mailto:m...@smartsharesystems.com]
> >> Sent: Thursday, 23 June 2022 14.39
> >>
> >> With this patch, the checksum can be calculated on an unaligned buffer.
> >> I.e. the buf parameter is no longer required to be 16 bit aligned.
> >>
> >> The checksum is still calculated using a 16 bit aligned pointer, so
> >> the compiler can auto-vectorize the function's inner loop.
> >>
> >> When the buffer is unaligned, the first byte of the buffer is handled
> >> separately. Furthermore, the calculated checksum of the buffer is
> >> byte shifted before being added to the initial checksum, to
> >> compensate for the checksum having been calculated on the buffer
> >> shifted by one byte.
> >>
> >> v4:
> >> * Add copyright notice.
> >> * Include stdbool.h (Emil Berg).
> >> * Use RTE_PTR_ADD (Emil Berg).
> >> * Fix one more typo in commit message. Is 'unligned' even a word?
> >> v3:
> >> * Remove braces from single statement block.
> >> * Fix typo in commit message.
> >> v2:
> >> * Do not assume that the buffer is part of an aligned packet buffer.
> >>
> >> Bugzilla ID: 1035
> >> Cc: sta...@dpdk.org
> >>
> >> Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
> >> ---
> >>  lib/net/rte_ip.h | 32 +++++++++++++++++++++++++++-----
> >>  1 file changed, 27 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> >> index b502481670..738d643da0 100644
> >> --- a/lib/net/rte_ip.h
> >> +++ b/lib/net/rte_ip.h
> >> @@ -3,6 +3,7 @@
> >>   * The Regents of the University of California.
> >>   * Copyright(c) 2010-2014 Intel Corporation.
> >>   * Copyright(c) 2014 6WIND S.A.
> >> + * Copyright(c) 2022 SmartShare Systems.
> >>   * All rights reserved.
> >>   */
> >>
> >> @@ -15,6 +16,7 @@
> >>   * IP-related defines
> >>   */
> >>
> >> +#include <stdbool.h>
> >>  #include <stdint.h>
> >>
> >>  #ifdef RTE_EXEC_ENV_WINDOWS
> >> @@ -162,20 +164,40 @@ __rte_raw_cksum(const void *buf, size_t len,
> >>  	uint32_t sum)
> >>  {
> >>  	/* extend strict-aliasing rules */
> >>  	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> >> -	const u16_p *u16_buf = (const u16_p *)buf;
> >> -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> >> +	const u16_p *u16_buf;
> >> +	const u16_p *end;
> >> +	uint32_t bsum = 0;
> >> +	const bool unaligned = (uintptr_t)buf & 1;
> >> +
> >> +	/* if buffer is unaligned, keeping it byte order independent */
> >> +	if (unlikely(unaligned)) {
> >> +		uint16_t first = 0;
> >> +		if (unlikely(len == 0))
> >> +			return 0;
> >> +		((unsigned char *)&first)[1] = *(const unsigned char *)buf;
> >> +		bsum += first;
> >> +		buf = RTE_PTR_ADD(buf, 1);
> >> +		len--;
> >> +	}
> >>
> >> +	/* aligned access for compiler auto-vectorization */
>
> The compiler will be able to auto-vectorize even unaligned accesses,
> just with different instructions. From what I can tell, there's no
> performance impact, at least not on the x86_64 systems I tried on.
>
> I think you should remove the first special-case conditional and use
> memcpy() instead of the cumbersome __may_alias__ construct to retrieve
> the data.
>
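[For readers following along: a memcpy()-based inner loop of the kind suggested here might look roughly like the sketch below. This is an illustration only, not the DPDK code; `raw_cksum_memcpy` is a hypothetical name. Mainstream compilers turn the fixed-size memcpy() into a single (possibly unaligned) 16-bit load, so no `__may_alias__` typedef is needed and the buffer may start at any address.]

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch: accumulate 16-bit words via memcpy(), which is both
 * strict-aliasing-safe and alignment-safe. */
static uint32_t
raw_cksum_memcpy(const void *buf, size_t len, uint32_t sum)
{
	const unsigned char *p = buf;

	while (len >= sizeof(uint16_t)) {
		uint16_t v;

		memcpy(&v, p, sizeof(v));	/* single 16-bit load */
		sum += v;
		p += sizeof(v);
		len -= sizeof(v);
	}

	/* if length is odd, keep the result byte order independent */
	if (len == 1) {
		uint16_t left = 0;

		*(unsigned char *)&left = *p;
		sum += left;
	}

	return sum;
}
```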
Here: https://www.agner.org/optimize/instruction_tables.pdf it lists the
latency of vmovdqa (aligned) as 6 cycles and the latency for vmovdqu
(unaligned) as 7 cycles. So I guess there can be some difference.
Although in practice I'm not sure what difference it makes. I've not
seen any difference in runtime between the two versions.

> >> +	u16_buf = (const u16_p *)buf;
> >> +	end = u16_buf + len / sizeof(*u16_buf);
> >>  	for (; u16_buf != end; ++u16_buf)
> >> -		sum += *u16_buf;
> >> +		bsum += *u16_buf;
> >>
> >>  	/* if length is odd, keeping it byte order independent */
> >>  	if (unlikely(len % 2)) {
> >>  		uint16_t left = 0;
> >>  		*(unsigned char *)&left = *(const unsigned char *)end;
> >> -		sum += left;
> >> +		bsum += left;
> >>  	}
> >>
> >> -	return sum;
> >> +	/* if buffer is unaligned, swap the checksum bytes */
> >> +	if (unlikely(unaligned))
> >> +		bsum = (bsum & 0xFF00FF00) >> 8 | (bsum & 0x00FF00FF) << 8;
> >> +
> >> +	return sum + bsum;
> >>  }
> >>
> >>  /**
> >> --
> >> 2.17.1
> >
> > @Emil, thank you for thoroughly reviewing the previous versions.
> >
> > If your test succeeds and you are satisfied with the patch, remember
> > to reply with a "Tested-by" tag for patchwork.
> >
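[To illustrate why the patch's final byte swap compensates for computing the sum on a buffer shifted by one byte, here is a self-contained sketch, with hypothetical helper names and plain C in place of the `unlikely()`/`RTE_PTR_ADD` macros. It checks the shifted-then-swapped sum against a straight 16-bit sum: after folding, the two agree, which is the well-known byte-order-independence property of the ones'-complement checksum (RFC 1071).]

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 32-bit partial sum into a 16-bit ones'-complement sum. */
static uint16_t
fold32(uint32_t sum)
{
	while (sum > 0xFFFF)
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)sum;
}

/* Straight partial sum over 16-bit words; memcpy() keeps the loads
 * alias- and alignment-safe. Odd trailing byte handled as in the patch. */
static uint32_t
sum16(const unsigned char *p, size_t len)
{
	uint32_t sum = 0;

	for (; len >= 2; p += 2, len -= 2) {
		uint16_t v;

		memcpy(&v, p, sizeof(v));
		sum += v;
	}
	if (len == 1) {
		uint16_t left = 0;

		*(unsigned char *)&left = *p;
		sum += left;
	}
	return sum;
}

/* The patch's strategy for an odd start address: place the first byte in
 * the second byte of a 16-bit word, sum the remaining bytes starting one
 * byte in, then swap the bytes within each 16-bit lane of the result. */
static uint32_t
sum16_shifted(const unsigned char *p, size_t len)
{
	uint16_t first = 0;
	uint32_t bsum;

	((unsigned char *)&first)[1] = p[0];
	bsum = first + sum16(p + 1, len - 1);
	return (bsum & 0xFF00FF00) >> 8 | (bsum & 0x00FF00FF) << 8;
}
```

After folding, `sum16_shifted()` reproduces `sum16()` for both even and odd lengths, on either endianness, which is exactly the compensation the commit message describes.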