https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688

            Bug ID: 104688
           Summary: gcc and libatomic can use SSE for 128-bit atomic loads
                    on Intel CPUs with AVX
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: xry111 at mengyan1223 dot wang
  Target Milestone: ---

In Dec 2021, Intel updated the SDM and added the following content:

> Processors that enumerate support for Intel® AVX (by setting the feature flag
> CPUID.01H:ECX.AVX[bit 28]) guarantee that the 16-byte memory operations 
> performed by the following instructions will always be carried out atomically:
> - MOVAPD, MOVAPS, and MOVDQA.
> - VMOVAPD, VMOVAPS, and VMOVDQA when encoded with VEX.128.
> - VMOVAPD, VMOVAPS, VMOVDQA32, and VMOVDQA64 when encoded with EVEX.128 and 
> k0 (masking disabled).
> 
> (Note that these instructions require the linear addresses of their memory 
> operands to be 16-byte aligned.)

(see Change 13, https://cdrdv2.intel.com/v1/dl/getContent/671294)

So we can use SSE for 128-bit atomic loads on Intel CPUs with AVX, instead of
a loop with LOCK CMPXCHG16B.

AMD has no such guarantee (at least for now), so we still need LOCK CMPXCHG16B
on Intel CPUs without AVX and on all (old or new) AMD CPUs.
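
To illustrate the dispatch I have in mind, here is a rough sketch for x86-64
(GNU C, inline asm). It is not what GCC or libatomic does today, and the
names intel_with_avx, load_movdqa, load_cmpxchg16b and atomic_load_16 are
made up for this example. It checks the CPUID vendor string and the AVX bit,
then uses a single MOVDQA if both match, or LOCK CMPXCHG16B otherwise:

```c
#include <assert.h>
#include <cpuid.h>
#include <emmintrin.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef unsigned __int128 u128;

/* Hypothetical helper: true if CPUID reports vendor "GenuineIntel"
   and CPUID.01H:ECX.AVX[bit 28] is set.  */
static bool intel_with_avx(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return false;
    /* Leaf 0 returns the vendor string in EBX, EDX, ECX.  */
    if (ebx != signature_INTEL_ebx || edx != signature_INTEL_edx
        || ecx != signature_INTEL_ecx)
        return false;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & bit_AVX) != 0;
}

/* 16-byte load with a single MOVDQA.  Only atomic on CPUs covered by the
   SDM guarantee; P must be 16-byte aligned.  */
static u128 load_movdqa(const u128 *p)
{
    __m128i v;
    u128 out;

    __asm__ __volatile__("movdqa %1, %0" : "=x"(v) : "m"(*p) : "memory");
    memcpy(&out, &v, sizeof out);
    return out;
}

/* Fallback: LOCK CMPXCHG16B with expected == desired == 0.  If *P is 0 it
   is rewritten with 0, otherwise the current value is returned in RDX:RAX.
   Either way RDX:RAX ends up holding the value of *P.  */
static u128 load_cmpxchg16b(u128 *p)
{
    uint64_t lo = 0, hi = 0;

    __asm__ __volatile__("lock cmpxchg16b %0"
                         : "+m"(*p), "+a"(lo), "+d"(hi)
                         : "b"((uint64_t)0), "c"((uint64_t)0)
                         : "cc", "memory");
    return ((u128)hi << 64) | lo;
}

/* Assumed dispatch: MOVDQA on Intel with AVX, CMPXCHG16B elsewhere.  */
static u128 atomic_load_16(u128 *p)
{
    assert(((uintptr_t)p & 15) == 0);
    return intel_with_avx() ? load_movdqa(p) : load_cmpxchg16b(p);
}
```

A real implementation would cache the CPUID result instead of querying it on
every load. Note also that the CMPXCHG16B path needs write access to the
memory, while the MOVDQA path only reads it.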
