On 2024-05-22 21:01, Tyler Retzlaff wrote:
On Wed, May 22, 2024 at 07:57:01PM +0200, Morten Brørup wrote:
From: Stephen Hemminger [mailto:step...@networkplumber.org]
Sent: Wednesday, 22 May 2024 17.38
On Wed, 22 May 2024 10:31:39 +0200
Morten Brørup <m...@smartsharesystems.com> wrote:
+/* On 32 bit platform, need to use atomic to avoid load/store tearing */
+typedef RTE_ATOMIC(uint64_t) rte_counter64_t;
As shown by Godbolt experiments discussed in a previous thread [2],
non-tearing 64 bit counters can be implemented without using atomic
instructions on all 32 bit architectures supported by DPDK. So we should
use the counter/offset design pattern for RTE_ARCH_32 too.
[2]: https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9F433@smartserver.smartshare.dk/
This code, built with -O3 and -m32 on Godbolt, shows the split (tearing) problem:
#include <stdint.h>

typedef uint64_t rte_counter64_t;

void
rte_counter64_add(rte_counter64_t *counter, uint32_t val)
{
        *counter += val;
}

uint64_t
rte_counter64_read(const rte_counter64_t *counter)
{
        return *counter;
}

void
rte_counter64_set(rte_counter64_t *counter, uint64_t val)
{
        *counter = val;
}
rte_counter64_add:
        push    ebx
        mov     eax, DWORD PTR [esp+8]
        xor     ebx, ebx
        mov     ecx, DWORD PTR [esp+12]
        add     DWORD PTR [eax], ecx      ; low half written first...
        adc     DWORD PTR [eax+4], ebx    ; ...high half separately: a concurrent reader can see a torn value
        pop     ebx
        ret
rte_counter64_read:
        mov     eax, DWORD PTR [esp+4]
        mov     edx, DWORD PTR [eax+4]    ; two separate 32-bit loads: the read can tear, too
        mov     eax, DWORD PTR [eax]
        ret
rte_counter64_set:
        movq    xmm0, QWORD PTR [esp+8]
        mov     eax, DWORD PTR [esp+4]
        movq    QWORD PTR [eax], xmm0     ; single 64-bit movq: this store does not tear
        ret
Sure, atomics might be required on some 32-bit architectures and/or with some compilers.
In theory, I think you should be able to use generic atomics and, depending on the target, get codegen that works. It might be something more expensive on 32-bit and nothing on 64-bit, etc.

What's the damage if we just use generic atomics with relaxed ordering? Is the codegen not optimal?
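For reference, a minimal C11 sketch of that "generic atomics, relaxed ordering" variant (the rte_counter64_* names mirror the patch; this is an illustration of the idea, not the patch itself):

#include <stdint.h>
#include <stdatomic.h>

typedef _Atomic uint64_t rte_counter64_t;

static inline void
rte_counter64_add(rte_counter64_t *counter, uint32_t val)
{
        /* Relaxed RMW: a single locked add on 64-bit x86, but
         * typically a cmpxchg8b loop when compiled with -m32. */
        atomic_fetch_add_explicit(counter, val, memory_order_relaxed);
}

static inline uint64_t
rte_counter64_read(const rte_counter64_t *counter)
{
        /* Relaxed load: a plain mov on 64-bit; on -m32 a non-tearing
         * movq or cmpxchg8b sequence, depending on -march. */
        return atomic_load_explicit(counter, memory_order_relaxed);
}

That answers the correctness question, but the cost of the RMW on 32-bit is exactly the "damage" being asked about.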
Below is what I originally proposed in the "make stats reset reliable"
thread.
struct counter
{
        uint64_t count;
        uint64_t offset;
};

/../

struct counter rx_pkts;
struct counter rx_bytes;

/../

static uint64_t
counter_value(const struct counter *counter)
{
        uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
        uint64_t offset = __atomic_load_n(&counter->offset, __ATOMIC_RELAXED);

        return count - offset;
}

static void
counter_reset(struct counter *counter)
{
        uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);

        __atomic_store_n(&counter->offset, count, __ATOMIC_RELAXED);
}

static void
counter_add(struct counter *counter, uint64_t operand)
{
        __atomic_store_n(&counter->count, counter->count + operand,
            __ATOMIC_RELAXED);
}
I think this solution generally compiles to something that's equivalent to just using non-atomic loads/stores and hoping for the best.

Using a non-atomic load in counter_add() will generate better code, but doesn't work if you're using _Atomic (without casts).

Atomic loads/stores seem to have volatile semantics, so multiple updates to the same counter cannot be merged by the compiler. That is a drawback.
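To make the _Atomic point concrete, here is a hedged sketch: with an _Atomic-qualified count, the plain read in counter_add() becomes an implicit seq_cst load, unless the qualifier is cast away (which is not strictly portable):

#include <stdint.h>
#include <stdatomic.h>

struct counter {
        _Atomic uint64_t count;
        _Atomic uint64_t offset;
};

static void
counter_add(struct counter *counter, uint64_t operand)
{
        /* Casting away _Atomic restores the cheap plain read; the
         * updating thread is the only writer, so it cannot observe a
         * torn value itself. This assumes _Atomic uint64_t and
         * uint64_t share representation, which common ABIs provide
         * but the standard does not guarantee. */
        uint64_t count = *(const uint64_t *)&counter->count;

        atomic_store_explicit(&counter->count, count + operand,
            memory_order_relaxed);
}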
I envision a variety of 32-bit implementations, optimized for certain architectures/compilers. Some of them can provide non-tearing 64-bit load/store, so we should use the counter/offset design pattern for those too.
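As a hypothetical example of one such implementation: on 32-bit x86 with SSE2, a single movq gives a non-tearing 64-bit load/store, matching the movq the compiler already emitted for rte_counter64_set above (function names here are illustrative):

#include <stdint.h>
#include <emmintrin.h> /* SSE2 */

/* Non-tearing 64-bit load on 32-bit x86: one movq from the shared
 * location; the spill to the local is private to the caller. */
static inline uint64_t
counter64_load_sse2(const uint64_t *addr)
{
        uint64_t val;
        __m128i v = _mm_loadl_epi64((const __m128i *)addr);

        _mm_storel_epi64((__m128i *)&val, v);
        return val;
}

/* Non-tearing 64-bit store on 32-bit x86: one movq to the shared
 * location. */
static inline void
counter64_store_sse2(uint64_t *addr, uint64_t val)
{
        __m128i v = _mm_loadl_epi64((const __m128i *)&val);

        _mm_storel_epi64((__m128i *)addr, v);
}

With primitives like these underneath, the counter/offset pattern works on 32-bit without any lock prefix.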