> From: Morten Brørup [mailto:m...@smartsharesystems.com] > Sent: Thursday, 23 June 2022 14.39 > > With this patch, the checksum can be calculated on an unaligned buffer. > I.e. the buf parameter is no longer required to be 16 bit aligned. > > The checksum is still calculated using a 16 bit aligned pointer, so the > compiler can auto-vectorize the function's inner loop. > > When the buffer is unaligned, the first byte of the buffer is handled > separately. Furthermore, the calculated checksum of the buffer is byte > shifted before being added to the initial checksum, to compensate for > the > checksum having been calculated on the buffer shifted by one byte. > > v4: > * Add copyright notice. > * Include stdbool.h (Emil Berg). > * Use RTE_PTR_ADD (Emil Berg). > * Fix one more typo in commit message. Is 'unligned' even a word? > v3: > * Remove braces from single statement block. > * Fix typo in commit message. > v2: > * Do not assume that the buffer is part of an aligned packet buffer. > > Bugzilla ID: 1035 > Cc: sta...@dpdk.org > > Signed-off-by: Morten Brørup <m...@smartsharesystems.com> > --- > lib/net/rte_ip.h | 32 +++++++++++++++++++++++++++----- > 1 file changed, 27 insertions(+), 5 deletions(-) > > diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h > index b502481670..738d643da0 100644 > --- a/lib/net/rte_ip.h > +++ b/lib/net/rte_ip.h > @@ -3,6 +3,7 @@ > * The Regents of the University of California. > * Copyright(c) 2010-2014 Intel Corporation. > * Copyright(c) 2014 6WIND S.A. > + * Copyright(c) 2022 SmartShare Systems. > * All rights reserved. > */ > > @@ -15,6 +16,7 @@ > * IP-related defines > */ > > +#include <stdbool.h> > #include <stdint.h> > > #ifdef RTE_EXEC_ENV_WINDOWS > @@ -162,20 +164,40 @@ __rte_raw_cksum(const void *buf, size_t len, > uint32_t sum) > { > /* extend strict-aliasing rules */ > typedef uint16_t __attribute__((__may_alias__)) u16_p; > - const u16_p *u16_buf = (const u16_p *)buf; > - const u16_p *end = u16_buf + len / sizeof(*u16_buf); > + const u16_p *u16_buf; > + const u16_p *end; > + uint32_t bsum = 0; > + const bool unaligned = (uintptr_t)buf & 1; > + > + /* if buffer is unaligned, keeping it byte order independent */ > + if (unlikely(unaligned)) { > + uint16_t first = 0; > + if (unlikely(len == 0)) > + return 0; > + ((unsigned char *)&first)[1] = *(const unsigned char *)buf; > + bsum += first; > + buf = RTE_PTR_ADD(buf, 1); > + len--; > + } > > + /* aligned access for compiler auto-vectorization */ > + u16_buf = (const u16_p *)buf; > + end = u16_buf + len / sizeof(*u16_buf); > for (; u16_buf != end; ++u16_buf) > - sum += *u16_buf; > + bsum += *u16_buf; > > /* if length is odd, keeping it byte order independent */ > if (unlikely(len % 2)) { > uint16_t left = 0; > *(unsigned char *)&left = *(const unsigned char *)end; > - sum += left; > + bsum += left; > } > > - return sum; > + /* if buffer is unaligned, swap the checksum bytes */ > + if (unlikely(unaligned)) > + bsum = (bsum & 0xFF00FF00) >> 8 | (bsum & 0x00FF00FF) << 8; > + > + return sum + bsum; > } > > /** > -- > 2.17.1
@Emil, thank you for thoroughly reviewing the previous versions. If your test succeeds and you are satisfied with the patch, remember to reply with a "Tested-by" tag for patchwork.