https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80817

            Bug ID: 80817
           Summary: [missed optimization][x86] relaxed atomics
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Joost.VandeVondele at mat dot ethz.ch
  Target Milestone: ---

Using gcc 7.1 on x86, the following

#include <atomic>
#include <cstdint>

void increment_relaxed(std::atomic<uint64_t>& counter) {
  atomic_store_explicit(&counter,
          atomic_load_explicit(&counter, std::memory_order_relaxed) + 1,
          std::memory_order_relaxed);
}

compiles to:

        .cfi_startproc
        movq    (%rdi), %rax
        addq    $1, %rax
        movq    %rax, (%rdi)
        ret
        .cfi_endproc

while I would expect that 

        .cfi_startproc
        addq    $1, (%rdi)
        ret
        .cfi_endproc

would be equally correct and more efficient: the aligned load and store performed by a non-locked addq are each atomic on x86, and the relaxed load/store pair does not require the read-modify-write to be atomic as a whole.

I also looked at 

atomic_fetch_add_explicit(&counter, uint64_t(1), std::memory_order_relaxed); 

but that surprised me with

        .cfi_startproc
        lock addq       $1, (%rdi)
        ret
        .cfi_endproc
