Thank you for the clarification about the function pointer indirection acting as a compiler barrier - that makes sense for the typical case.
I have one remaining question about 32-bit architectures: even with the implicit barrier, plain uint64_t reads aren't atomic on 32-bit platforms, so a reader could see torn high/low halves. Is this acceptable for stats counters in DPDK, or is 32-bit support not a concern? For context, atomic_load_explicit/atomic_store_explicit with memory_order_relaxed formally guarantee tear-free accesses that the compiler cannot elide or cache in a register, on every architecture, with minimal (GCC) or zero (Clang) overhead on x86-64. I see there's precedent in DPDK for plain uint64_t stats, and I'm hoping to understand the assumptions and trade-offs better.
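To make the alternative concrete, here is a minimal sketch of what I have in mind, using plain C11 <stdatomic.h>. The struct and function names (queue_stats, stats_add, stats_read) are hypothetical and not taken from the patch or from DPDK, and I am assuming a single writer per counter, which is why the update is a relaxed load plus a relaxed store rather than an atomic read-modify-write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue stats block; not an actual DPDK structure. */
struct queue_stats {
    _Atomic uint64_t packets;
};

/*
 * Datapath side. Assumes exactly one writer per counter, so a relaxed
 * load followed by a relaxed store is enough; no locked RMW is needed.
 */
static inline void stats_add(struct queue_stats *s, uint64_t n)
{
    assert(s != NULL);
    uint64_t cur = atomic_load_explicit(&s->packets, memory_order_relaxed);
    atomic_store_explicit(&s->packets, cur + n, memory_order_relaxed);
}

/*
 * Control-path side. The relaxed load returns a whole 64-bit value,
 * never a mix of old and new halves, but imposes no ordering with
 * respect to other memory accesses.
 */
static inline uint64_t stats_read(struct queue_stats *s)
{
    assert(s != NULL);
    return atomic_load_explicit(&s->packets, memory_order_relaxed);
}
```

One caveat I am aware of: on some 32-bit targets a 64-bit atomic may not be lock-free and can be lowered to a libatomic call, so the cost there is likely higher than on x86-64. Checking atomic_is_lock_free() on the counter would show what a given target actually does.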

