https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
Bug ID: 104688
Summary: gcc and libatomic can use SSE for 128-bit atomic loads on Intel CPUs with AVX
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: xry111 at mengyan1223 dot wang
Target Milestone: ---

In Dec 2021, Intel updated the SDM and added the following content:

> Processors that enumerate support for Intel® AVX (by setting the feature flag
> CPUID.01H:ECX.AVX[bit 28]) guarantee that the 16-byte memory operations
> performed by the following instructions will always be carried out atomically:
> - MOVAPD, MOVAPS, and MOVDQA.
> - VMOVAPD, VMOVAPS, and VMOVDQA when encoded with VEX.128.
> - VMOVAPD, VMOVAPS, VMOVDQA32, and VMOVDQA64 when encoded with EVEX.128 and
>   k0 (masking disabled).
>
> (Note that these instructions require the linear addresses of their memory
> operands to be 16-byte aligned.)

(see Change 13, https://cdrdv2.intel.com/v1/dl/getContent/671294)

So we can use SSE for 128-bit atomic loads on Intel CPUs with AVX, instead of a
loop with LOCK CMPXCHG16B.

AMD has no such guarantee (at least for now), so we still need LOCK CMPXCHG16B
on old Intel CPUs and (old or new) AMD CPUs.
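For illustration, the run-time dispatch this implies could look roughly like the sketch below. It is not taken from GCC or libatomic: the names u128_pair, can_use_sse_for_atomic_16_load and load_16_movdqa are hypothetical, and it assumes GCC or Clang on x86-64 (for <cpuid.h> and inline asm). It checks the vendor string and CPUID.01H:ECX.AVX[bit 28], then performs the load with a single MOVDQA. The non-x86 branches exist only so the sketch compiles elsewhere and are not atomic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical 16-byte value type used only in this sketch. */
typedef struct { uint64_t lo, hi; } u128_pair;

/* Returns 1 on an Intel CPU that sets CPUID.01H:ECX.AVX[bit 28], else 0.
   When this returns 0, a caller would keep using the LOCK CMPXCHG16B path. */
static int can_use_sse_for_atomic_16_load(void)
{
#if defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    char vendor[12];

    /* Leaf 0: the vendor string is returned in EBX, EDX, ECX (in that order). */
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    memcpy(vendor, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    if (memcmp(vendor, "GenuineIntel", 12) != 0)
        return 0;   /* no such guarantee documented for other vendors */

    /* Leaf 1: ECX bit 28 is the AVX feature flag. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 28) & 1);
#else
    return 0;
#endif
}

/* Loads 16 bytes with one MOVDQA. p must be 16-byte aligned.
   The load is only guaranteed atomic where the check above returns 1. */
static u128_pair load_16_movdqa(const u128_pair *p)
{
    u128_pair out;
    assert(((uintptr_t)p & 15) == 0);
#if defined(__x86_64__)
    typedef long long v2di __attribute__((vector_size(16)));
    v2di v;
    /* Inline asm so that exactly one 16-byte MOVDQA is emitted. */
    __asm__ volatile("movdqa %1, %0" : "=x"(v) : "m"(*p));
    memcpy(&out, &v, sizeof out);
#else
    memcpy(&out, p, sizeof out);   /* placeholder only, NOT atomic */
#endif
    return out;
}
```

The inline asm is used instead of an intrinsic because the compiler is otherwise free to split or merge the load; an actual implementation inside GCC would presumably do this in the target's atomic-load expander rather than in user code.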