https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

            Bug ID: 108659
           Summary: Suboptimal 128 bit atomics codegen on AArch64 and x64
           Product: gcc
           Version: 12.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: s_gccbugzilla at nedprod dot com
  Target Milestone: ---

Related:
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94649
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

I got bitten by this again: the latest GCC still does not emit single-instruction
128-bit atomics, even when the -march is easily new enough. Here is a godbolt
comparing the latest MSVC, GCC and clang for the skylake-avx512
architecture, which unquestionably supports cmpxchg16b. Only clang emits the
single-instruction atomic:

https://godbolt.org/z/EnbeeW4az
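
For reference, here is a minimal sketch of the kind of test case involved (the
exact source behind the Godbolt link is not reproduced here, so the struct and
function names below are illustrative):

#include <atomic>
#include <cstdint>

// 16-byte, 16-byte-aligned value; std::atomic<u128> can be lock-free when the
// target guarantees 128-bit atomic accesses (e.g. via cmpxchg16b on x64).
struct alignas(16) u128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

std::atomic<u128> g;

u128 load_seq_cst() {
    // With -march=skylake-avx512 this could be done inline with cmpxchg16b;
    // GCC currently emits a libatomic call here instead.
    return g.load(std::memory_order_seq_cst);
}

bool cas_seq_cst(u128& expected, u128 desired) {
    return g.compare_exchange_strong(expected, desired,
                                     std::memory_order_seq_cst);
}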

I gather from the issue comments, and from the comments at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688, that you are going to wait
for AMD to guarantee the atomicity of SSE instructions before changing the
codegen here, which makes sense. However, I also want to raise the potentially
suboptimal 128-bit atomic codegen GCC produces for AArch64, as compared to clang:

https://godbolt.org/z/oKv4o81nv

GCC emits `dmb` to force a global memory fence, whereas clang does not.

I think clang is in the right here: seq_cst atomic semantics are not supposed
to require a global memory fence.
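
To illustrate the distinction (a sketch, not taken from the Godbolt link): a
seq_cst atomic operation only needs the ordering of that one operation, which
AArch64 can provide with acquire/release instructions such as ldar/stlr/casal,
whereas an explicit seq_cst fence genuinely asks for a full barrier (dmb ish):

#include <atomic>

std::atomic<int> x;

int seq_cst_load() {
    // A seq_cst load compiles to a plain ldar on AArch64; no dmb is needed.
    return x.load(std::memory_order_seq_cst);
}

void seq_cst_fence() {
    // An explicit seq_cst fence is the construct that maps to dmb ish.
    std::atomic_thread_fence(std::memory_order_seq_cst);
}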
