On 2022-06-23 14:51, Morten Brørup wrote:
From: Morten Brørup [mailto:m...@smartsharesystems.com]
Sent: Thursday, 23 June 2022 14.39

With this patch, the checksum can be calculated on an unaligned buffer.
I.e. the buf parameter is no longer required to be 16 bit aligned.

The checksum is still calculated using a 16 bit aligned pointer, so the
compiler can auto-vectorize the function's inner loop.

When the buffer is unaligned, the first byte of the buffer is handled
separately. Furthermore, the calculated checksum of the buffer is byte
shifted before being added to the initial checksum, to compensate for
the
checksum having been calculated on the buffer shifted by one byte.

v4:
* Add copyright notice.
* Include stdbool.h (Emil Berg).
* Use RTE_PTR_ADD (Emil Berg).
* Fix one more typo in commit message. Is 'unligned' even a word?
v3:
* Remove braces from single statement block.
* Fix typo in commit message.
v2:
* Do not assume that the buffer is part of an aligned packet buffer.

Bugzilla ID: 1035
Cc: sta...@dpdk.org

Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
---
  lib/net/rte_ip.h | 32 +++++++++++++++++++++++++++-----
  1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
index b502481670..738d643da0 100644
--- a/lib/net/rte_ip.h
+++ b/lib/net/rte_ip.h
@@ -3,6 +3,7 @@
   *      The Regents of the University of California.
   * Copyright(c) 2010-2014 Intel Corporation.
   * Copyright(c) 2014 6WIND S.A.
+ * Copyright(c) 2022 SmartShare Systems.
   * All rights reserved.
   */

@@ -15,6 +16,7 @@
   * IP-related defines
   */

+#include <stdbool.h>
  #include <stdint.h>

  #ifdef RTE_EXEC_ENV_WINDOWS
@@ -162,20 +164,40 @@ __rte_raw_cksum(const void *buf, size_t len,
uint32_t sum)
  {
        /* extend strict-aliasing rules */
        typedef uint16_t __attribute__((__may_alias__)) u16_p;
-       const u16_p *u16_buf = (const u16_p *)buf;
-       const u16_p *end = u16_buf + len / sizeof(*u16_buf);
+       const u16_p *u16_buf;
+       const u16_p *end;
+       uint32_t bsum = 0;
+       const bool unaligned = (uintptr_t)buf & 1;
+
+       /* if buffer is unaligned, keeping it byte order independent */
+       if (unlikely(unaligned)) {
+               uint16_t first = 0;
+               if (unlikely(len == 0))
+                       return 0;
+               ((unsigned char *)&first)[1] = *(const unsigned char *)buf;
+               bsum += first;
+               buf = RTE_PTR_ADD(buf, 1);
+               len--;
+       }

+       /* aligned access for compiler auto-vectorization */

The compiler will be able to auto vectorize even unaligned accesses, just with different instructions. From what I can tell, there's no performance impact, at least not on the x86_64 systems I tried on.

I think you should remove the first special case conditional and use memcpy() instead of the cumbersome __may_alias__ construct to retrieve the data.

+       u16_buf = (const u16_p *)buf;
+       end = u16_buf + len / sizeof(*u16_buf);
        for (; u16_buf != end; ++u16_buf)
-               sum += *u16_buf;
+               bsum += *u16_buf;

        /* if length is odd, keeping it byte order independent */
        if (unlikely(len % 2)) {
                uint16_t left = 0;
                *(unsigned char *)&left = *(const unsigned char *)end;
-               sum += left;
+               bsum += left;
        }

-       return sum;
+       /* if buffer is unaligned, swap the checksum bytes */
+       if (unlikely(unaligned))
+               bsum = (bsum & 0xFF00FF00) >> 8 | (bsum & 0x00FF00FF) << 8;
+
+       return sum + bsum;
  }

  /**
--
2.17.1

@Emil, thank you for thoroughly reviewing the previous versions.

If your test succeeds and you are satisfied with the patch, remember to reply with a 
"Tested-by" tag for patchwork.

Reply via email to