Acquire loads don't require lfence, IIUC. x86 reads don't get reordered
with other reads. The processor may speculate loads, but it still has to
observe the rule even if it internally reorders (i.e. it can reorder,
but has to check afterwards and redo the load if the ordering was
violated).
A read can get reordered with a streaming read, but I'd leave that out.
Whoever uses streaming reads can add the appropriate extra fences.
On Fri, 2023-08-18 at 08:09 -0700, Aleksandr Ivanov wrote:
> Hello everyone :wave:
>
> I'm trying to understand the intricacies of low-level concurrent
> programming, focusing on x86 for the time being. Specifically I'd
> like to write atomics that would work correctly for x86, but the more
> I dig into this, the more confused I'm getting given all the parts
> involved.
>
> Here's my current code, I was wondering if people with a better
> knowledge of the architecture could point me to the issues in my
> reasoning how this should be done.
>
> enum struct Memory_Order: u32 {
>     Whatever,
>     Acquire,
>     Release,
>     Acquire_Release,
>     Sequential
> };
>
> template <typename T>
> struct Atomic {
>     using Value_Type = T;
>     volatile T value;
> };
>
> template <typename T>
> using Atomic_Value = typename Atomic<T>::Value_Type;
>
> #define compiler_barrier() do { asm volatile ("" ::: "memory"); } while (0)
> #define full_fence()       do { asm volatile ("mfence" ::: "memory"); } while (0)
>
> template <Memory_Order order = Memory_Order::Whatever, typename T>
> static T atomic_load (const Atomic<T> *atomic) {
>     using enum Memory_Order;
>
>     static_assert(sizeof(T) <= sizeof(void*));
>     static_assert((order == Whatever) || (order == Acquire) || (order == Sequential));
>
>     if constexpr (order == Sequential) full_fence();
>     auto result = atomic->value;
>     if constexpr (order != Whatever) compiler_barrier();
>
>     return result;
> }
>
> template <Memory_Order order = Memory_Order::Whatever, typename T>
> static void atomic_store (Atomic<T> *atomic, Atomic_Value<T> value) {
>     using enum Memory_Order;
>
>     static_assert(sizeof(T) <= sizeof(void*));
>     static_assert((order == Whatever) || (order == Release) || (order == Sequential));
>
>     if constexpr (order == Whatever) {
>         atomic->value = value;
>     }
>     else if constexpr (order == Release) {
>         compiler_barrier();
>         atomic->value = value;
>     }
>     else {
>         asm volatile (
>             "lock xchg %1, %0"
>             : "+r"(value), "+m"(atomic->value)
>             :
>             : "memory"
>         );
>     }
> }
>
> On x86, a load can only be reordered with an earlier store to a
> different memory location: the core checks the store buffer first, so
> a load from the same location sees the pending store. Loads are also
> not reordered with other loads.
>
> This effectively guarantees acquire semantics by default for all
> loads on x86; thus we don't need any explicit memory barrier and only
> have to prevent the compiler from reordering instructions.
>
> Since loads can be reordered with earlier stores, we need `mfence`
> to force the core to serialize instructions and drain the store
> buffer before proceeding.
>
> In the case of atomic_store, my understanding is that locked
> instructions guarantee sequential consistency, so xchg is good enough
> for that case. For Release semantics, a compiler barrier is also
> enough.
>
> My uncertainty is about speculative execution of loads: does that
> require an `lfence` before the load in the Acquire case?
>
> Kind regards,
> Aleksandr.
>
--
You received this message because you are subscribed to the Google Groups
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web, visit
https://groups.google.com/d/msgid/mechanical-sympathy/94a0ede086f15a162829501643be6d9f7ae0e0e6.camel%40scylladb.com.