https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83285

            Bug ID: 83285
           Summary: non-atomic stores can reorder more aggressively with
                    seq_cst on AArch64 than x86: missed x86 optimization?
           Product: gcc
           Version: 6.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---

This is either an x86-64 missed optimization or an AArch64 bug.  I *think* it's
an x86-64 missed optimization, and that the AArch64 code is not a bug only
because any observer that could tell the difference would have data race UB.

#include <atomic>
// int na;
// std::atomic_int sync;

void seq_cst(int &na, std::atomic_int &sync) {
    na = 1;
    sync = 2;
    na = 3;
}
https://godbolt.org/g/bUwZaM

On x86, all 3 stores appear in the asm in source order (with mo_seq_cst, but
not with mo_release).
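For comparison, here is a sketch of the mo_release variant referred to above. The godbolt source isn't reproduced in this report, so `release_version` and `check_release_final_state` are hypothetical names. The check only confirms the single-threaded result that any valid reordering must preserve; it cannot observe the reordering itself.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical mo_release version of the report's seq_cst() function.
void release_version(int &na, std::atomic_int &sync) {
    na = 1;
    sync.store(2, std::memory_order_release);  // release, not seq_cst
    na = 3;
}

// Single-threaded sanity check: whatever reordering the compiler applies,
// the caller must end up with na == 3 and sync == 2.
void check_release_final_state() {
    int na = 0;
    std::atomic_int sync{0};
    release_version(na, sync);
    assert(na == 3 && sync.load() == 2);
}
```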

On AArch64, gcc 6.3 emits only `sync=2; na=3;`.  If `na` were using relaxed
atomic stores, this would be a bug: a thread that saw `sync==2` could then see
the original value of na, instead of na==1 or na==3.
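A minimal sketch of that relaxed-atomic case, assuming globals `na_atomic` and `sync_flag` (hypothetical names) in place of the reference parameters. The assert in `reader()` encodes what the memory model guarantees: once the reader's acquire load sees 2, the `na=1` store happens-before its load of na, so 0 is not a permitted result. A single run won't reliably expose a miscompile; the point is to show why sinking `na=1` past `sync=2` would be illegal here.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical variant: na is a relaxed atomic instead of a plain int.
std::atomic_int na_atomic{0};
std::atomic_int sync_flag{0};

void writer() {
    na_atomic.store(1, std::memory_order_relaxed);
    sync_flag.store(2);  // seq_cst store, as in the report
    na_atomic.store(3, std::memory_order_relaxed);
}

int reader() {
    // Spin until the writer's sync store is visible (a seq_cst load is also acquire).
    while (sync_flag.load() != 2) {
    }
    // na=1 happens-before this load, so only 1 or 3 may be observed, never 0.
    int seen = na_atomic.load(std::memory_order_relaxed);
    assert(seen == 1 || seen == 3);
    return seen;
}

// Runs one writer/reader pair and returns the value of na the reader saw.
int run_once() {
    na_atomic.store(0);
    sync_flag.store(0);
    int seen = 0;
    std::thread r([&seen] { seen = reader(); });
    std::thread w(writer);
    w.join();
    r.join();
    return seen;
}
```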

But for non-atomic na, reading na even after Synchronizing With the `sync=2`
store (via an acquire load) would be UB, because the thread that writes sync
writes na again *after* that, so the read would race with `na=3`.  It seems that
gcc's AArch64 backend is using this as license to sink the na=1 store past the
sync=2 and merge it with the na=3.

seq_cst(int&, std::atomic<int>&):
        mov     w2, 2     // tmp79,
        stlr    w2, [x1]        // tmp79,* sync
        mov     w1, 3     // tmp78,
        str     w1, [x0]  // tmp78, *na_2(D)
        ret
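At the source level, the AArch64 asm above corresponds roughly to the following sketch (`seq_cst_as_compiled` and `same_final_state` are hypothetical names). For a single thread, or any data-race-free program, the two versions are indistinguishable, which is what makes the transformation a valid as-if optimization for non-atomic na.

```cpp
#include <atomic>
#include <cassert>

// The function from the report, unchanged.
void seq_cst(int &na, std::atomic_int &sync) {
    na = 1;
    sync = 2;  // seq_cst store (stlr on AArch64)
    na = 3;
}

// Hypothetical source-level equivalent of gcc 6.3's AArch64 output:
// na=1 has been sunk past the seq_cst store and merged into na=3.
// Valid only because na is non-atomic: any thread that could observe
// na between the two stores would have a data race (UB).
void seq_cst_as_compiled(int &na, std::atomic_int &sync) {
    sync = 2;  // stlr w2, [x1]
    na = 3;    // str  w1, [x0]
}

// Single-threaded check: both versions leave the same final state.
bool same_final_state() {
    int na1 = 0, na2 = 0;
    std::atomic_int s1{0}, s2{0};
    seq_cst(na1, s1);
    seq_cst_as_compiled(na2, s2);
    assert(na1 == 3 && na2 == 3);
    return na1 == na2 && s1.load() == s2.load();
}
```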

-----

If sync=2 is a release store (not seq_cst), then gcc for x86 does sink the na=1
store past the release and merge it with na=3.  (See the godbolt link.)  In this
case it would also be allowed to hoist the na=3 store ahead of the release
instead, because a plain release store is only a one-way barrier: it keeps
earlier stores from sinking past it, but later stores may move ahead of it.
Hoisting would be safe even if na were a relaxed atomic (unlike sinking na=1,
which is only safe because na is non-atomic), but gcc doesn't do that.
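The two single-store forms discussed above can be sketched at the source level as follows, assuming na is a plain int. `release_sunk`, `release_hoisted` and `check_release_forms` are hypothetical names, and this is an illustration of the reordering argument, not a claim about gcc's internal passes.

```cpp
#include <atomic>
#include <cassert>

// Two hypothetical source-level forms of the release variant
// (na = 1; sync.store(2, release); na = 3;) when na is a plain int.

// What the report says gcc does on x86: sink na=1 past the release
// store and merge it with na=3.
void release_sunk(int &na, std::atomic_int &sync) {
    sync.store(2, std::memory_order_release);
    na = 3;
}

// The alternative: hoist na=3 above the release store (later stores may
// move ahead of a release), after which na=1 is dead and can be dropped.
void release_hoisted(int &na, std::atomic_int &sync) {
    na = 3;
    sync.store(2, std::memory_order_release);
}

// Single-threaded check: both forms leave na == 3 and sync == 2.
void check_release_forms() {
    int na1 = 0, na2 = 0;
    std::atomic_int s1{0}, s2{0};
    release_sunk(na1, s1);
    release_hoisted(na2, s2);
    assert(na1 == 3 && na2 == 3);
    assert(s1.load() == 2 && s2.load() == 2);
}
```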

I'm slightly worried that this is unintentional and might also happen for
relaxed atomics, where it would be illegal.  (On AArch64 with seq_cst or
release, and on x86 only with release.)

But hopefully this is just gcc being clever and taking advantage of the fact
that writing a non-atomic after a possible synchronization point means that the
sync point is irrelevant for programs without data race UB.
