https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80817
Bug ID: 80817
Summary: [missed optimization][x86] relaxed atomics
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: Joost.VandeVondele at mat dot ethz.ch
Target Milestone: ---

Using gcc 7.1 on x86, the following

#include <atomic>
#include <cstdint>

void increment_relaxed(std::atomic<std::uint64_t>& counter) {
    atomic_store_explicit(&counter,
        atomic_load_explicit(&counter, std::memory_order_relaxed) + 1,
        std::memory_order_relaxed);
}

compiles to:

        .cfi_startproc
        movq    (%rdi), %rax
        addq    $1, %rax
        movq    %rax, (%rdi)
        ret
        .cfi_endproc

while I would expect that

        .cfi_startproc
        addq    $1, (%rdi)
        ret
        .cfi_endproc

would be fine and more efficient. I also looked at

atomic_fetch_add_explicit(&counter, std::uint64_t(1), std::memory_order_relaxed);

but that surprised me with

        .cfi_startproc
        lock addq       $1, (%rdi)
        ret
        .cfi_endproc