On Sun, 1 Feb 2026 23:02:53 -0800
Scott Mitchell <[email protected]> wrote:

> Without volatile, is there any guarantee the consumer thread will see
> the writes from the producer thread? For example, couldn't the
> compiler cache rx_pkts in a register on the consumer side?
> // Consumer thread:
> uint64_t prev = 0;
> while (monitoring) {
>     uint64_t curr = rx_pkts;  // Compiler may cache in register
>     if (curr != prev) {       // Without volatile: may always be false
>       update_display();       // Never executes
>       prev = curr;
>     }
> }
> 
> I created a godbolt comparing different approaches
> (https://godbolt.org/z/1Gaz6jPxh) showing assembly for x86-64 and ARM
> at -O3. Summary:
> 1. Plain load/store: Fastest. No visibility guarantees. May tear on 32-bit.
> 2. Volatile: Provides visibility. May tear on 32-bit.
> 3. Atomic load/store (relaxed): Same performance as volatile. Provides
> visibility and guarantees no tearing (even on 32-bit).
> 4. atomic_fetch_add: Heaviest cost (LOCK-prefixed RMW on x86).
> 
> My concern is that Option 1 (Plain) has no guaranteed visibility - the
> compiler may optimize away loads entirely. Since Option 3 (Atomic
> Load/Store) has identical instruction cost to Volatile but provides
> formal guarantees (visibility + no tearing), would that be the
> preferred solution?

In the normal case the compiler can't see across the function boundary,
especially through the indirection of the eth_dev_ops table, so those
loads won't be optimized away in practice. With LTO it might become
possible, but it's unlikely.
