Without volatile, is there any guarantee the consumer thread will see
the writes from the producer thread? For example, couldn't the
compiler cache rx_pkts in a register on the consumer side?

// Consumer thread:
uint64_t prev = 0;
while (monitoring) {
    uint64_t curr = rx_pkts;  // Compiler may cache in register
    if (curr != prev) {       // Without volatile: may always be false
        update_display();     // Never executes
        prev = curr;
    }
}

I put together a Godbolt comparison of the different approaches
(https://godbolt.org/z/1Gaz6jPxh) showing the generated assembly for
x86-64 and ARM at -O3. Summary:
1. Plain load/store: Fastest. No visibility guarantees. May tear on 32-bit.
2. Volatile: Forces the compiler to emit every load and store, so
   updates become visible in practice. May tear on 32-bit.
3. Atomic load/store (relaxed): Same performance as volatile. Provides
   visibility and guarantees no tearing (even on 32-bit).
4. atomic_fetch_add: Heaviest cost (LOCK-prefixed read-modify-write on x86).
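
For reference, the four variants can be sketched roughly as below. This is
a minimal illustration written for this mail, not necessarily the exact
code behind the Godbolt link; the counter and function names are
hypothetical. Option 3 assumes a single writer per counter, since a
separate load and store is not an atomic increment.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counters, one per variant, so that each function can be
 * inspected in isolation in the compiler output. */
static uint64_t          plain_pkts;
static volatile uint64_t volatile_pkts;
static _Atomic uint64_t  atomic_pkts;

/* 1. Plain: the compiler is free to merge, hoist, or drop accesses. */
void inc_plain(void)
{
    plain_pkts++;
}

/* 2. Volatile: every access is emitted, but a 64-bit access may be
 *    split into two 32-bit accesses on a 32-bit target. */
void inc_volatile(void)
{
    volatile_pkts = volatile_pkts + 1;
}

/* 3. Relaxed atomic load + store: no tearing, no ordering.
 *    Only correct with a single writer per counter. */
void inc_atomic_load_store(void)
{
    uint64_t v = atomic_load_explicit(&atomic_pkts, memory_order_relaxed);
    atomic_store_explicit(&atomic_pkts, v + 1, memory_order_relaxed);
}

/* 4. Atomic read-modify-write: safe with multiple writers. */
void inc_atomic_fetch_add(void)
{
    atomic_fetch_add_explicit(&atomic_pkts, 1, memory_order_relaxed);
}
```

Compiling this at -O3 for the targets of interest should make the
instruction-level difference between options 3 and 4 easy to see.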

My concern is that Option 1 (Plain) has no guaranteed visibility - the
compiler may optimize away loads entirely. Since Option 3 (Atomic
Load/Store) has identical instruction cost to Volatile but provides
formal guarantees (visibility + no tearing), would that be the
preferred solution?
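
To make Option 3 concrete, here is a minimal sketch of the consumer loop
body from above rewritten with relaxed atomics. The helper names
rx_pkts_inc() and rx_pkts_changed() are made up for illustration, and a
single producer per counter is assumed. On 32-bit targets a 64-bit atomic
may not be lock-free; atomic_is_lock_free() can be used to check.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared counter: written by one producer, read by the consumer. */
static _Atomic uint64_t rx_pkts;

/* Producer side (hypothetical helper). A relaxed load followed by a
 * relaxed store avoids a LOCK-prefixed instruction, but it is only
 * correct when exactly one thread writes the counter. */
static inline void rx_pkts_inc(void)
{
    uint64_t v = atomic_load_explicit(&rx_pkts, memory_order_relaxed);
    atomic_store_explicit(&rx_pkts, v + 1, memory_order_relaxed);
}

/* Consumer side (hypothetical helper). The atomic load is performed on
 * every call, so the value cannot be cached in a register across
 * iterations. Returns true and updates *prev if the counter changed. */
static inline bool rx_pkts_changed(uint64_t *prev)
{
    uint64_t curr = atomic_load_explicit(&rx_pkts, memory_order_relaxed);
    if (curr == *prev)
        return false;
    *prev = curr;
    return true;
}
```

The consumer loop would then call update_display() whenever
rx_pkts_changed(&prev) returns true. Relaxed ordering gives no ordering
with respect to other memory, which should be fine for a statistics
counter that is only displayed.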
