https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659
Bug ID: 108659
Summary: Suboptimal 128 bit atomics codegen on AArch64 and x64
Product: gcc
Version: 12.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: s_gccbugzilla at nedprod dot com
Target Milestone: ---
Related:
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94649
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
I got bitten by this again: the latest GCC still does not emit single-instruction
128-bit atomics, even when the -march is easily new enough. Here is a Godbolt
link comparing the latest MSVC, GCC and clang for the skylake-avx512
architecture, which unquestionably supports cmpxchg16b. Only clang emits the
single-instruction atomic:
https://godbolt.org/z/EnbeeW4az
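
For reference, the kind of test case involved is roughly the following (a
minimal sketch; the exact source and flags in the Godbolt link may differ,
e.g. -O2 -march=skylake-avx512, and linking with GCC may additionally need
-latomic):

#include <atomic>
#include <cstdint>

struct alignas(16) u128
{
    std::uint64_t lo, hi;
};

std::atomic<u128> v;

// 16-byte seq_cst load: per the comparison above, only clang emits a
// single-instruction atomic load here.
u128 load() { return v.load(std::memory_order_seq_cst); }

// 16-byte seq_cst store.
void store(u128 x) { v.store(x, std::memory_order_seq_cst); }

// 16-byte compare-exchange.
bool cas(u128 &expected, u128 desired)
{
    return v.compare_exchange_strong(expected, desired);
}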
I'm gathering from the issue comments and from the comments at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 that you're going to wait
for AMD to guarantee atomicity of SSE instructions before changing the codegen
here, which makes sense. However, I also wanted to raise potentially suboptimal
128-bit atomic codegen by GCC for AArch64 as compared to clang:
https://godbolt.org/z/oKv4o81nv
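
A seq_cst 128-bit load/store pair is enough to show the AArch64 difference.
Again a sketch rather than the exact Godbolt source, and the flags are my
assumption (e.g. -O2 -march=armv8.4-a, where FEAT_LSE2 makes 16-byte aligned
ldp/stp single-copy atomic):

#include <atomic>

// Per the Godbolt comparison above, GCC emits a dmb for these seq_cst
// operations where clang does not.
__uint128_t load_seq_cst(std::atomic<__uint128_t> &v)
{
    return v.load(std::memory_order_seq_cst);
}

void store_seq_cst(std::atomic<__uint128_t> &v, __uint128_t x)
{
    v.store(x, std::memory_order_seq_cst);
}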
GCC emits a `dmb` to force a full memory barrier, whereas clang does not.
I think clang is in the right here: seq_cst semantics require a single total
order over all seq_cst operations (plus acquire/release ordering), not a global
memory fence, and the usual AArch64 mapping expresses that with load-acquire
and store-release instructions rather than `dmb`.