================
@@ -55,11 +55,27 @@ __chkfeat(uint64_t __features) {
 /* 7.5 Swap */
 static __inline__ uint32_t __attribute__((__always_inline__, __nodebug__))
 __swp(uint32_t __x, volatile uint32_t *__p) {
----------------
efriedma-quic wrote:

"relaxed" basically means we don't emit any barriers, so the compiler and CPU 
can move memory operations to unrelated addresses across it.
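
To illustrate (everything below is a made-up sketch, not part of the patch): with a relaxed exchange via the `__atomic` builtins, nothing orders the unrelated store with respect to the exchange, so either the compiler or the CPU is free to move it to the other side.

```c
#include <stdint.h>

static uint32_t ready;  /* made-up name; an unrelated memory location */

uint32_t relaxed_swap(volatile uint32_t *p, uint32_t x) {
  ready = 1;  /* no barrier is emitted: this store may be reordered
                 past the exchange below */
  return __atomic_exchange_n(p, x, __ATOMIC_RELAXED);
}
```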

In the "modern" case, it might make sense to also use the atomicrmw sequence; 
it should lower to the same thing, and the compiler understands atomics better 
than ldrex and strex (and the rules for when ldrex and strex are well-defined 
are generally weird).
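
Roughly something like the following (just a sketch; `__swp_sketch` is a placeholder name, and whether relaxed is the right ordering for the intrinsic is exactly what's being discussed here):

```c
static __inline__ uint32_t __attribute__((__always_inline__, __nodebug__))
__swp_sketch(uint32_t __x, volatile uint32_t *__p) {
  /* Lowers to 'atomicrmw xchg'; the backend then picks ldrex/strex
     (or whatever the right expansion is) for the target. */
  return __atomic_exchange_n(__p, __x, __ATOMIC_RELAXED);
}
```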

The backend can't select `atomicrmw volatile xchg` to SWP on targets that don't 
have lock-free atomics: if a target doesn't support cmpxchg for a given width, 
we have to go through libatomic for all atomics of that width so the locking 
works consistently.
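
(For reference, `__atomic_always_lock_free` is the compile-time way to ask whether a given width is guaranteed lock-free on the target; the function below is only an illustration, not something the header needs.)

```c
#include <stdint.h>

/* Returns nonzero when 4-byte atomics are guaranteed lock-free; when
   this is false, atomics of that width go through libatomic, so the
   backend never sees an 'atomicrmw xchg' it could turn into SWP. */
int u32_atomics_always_lock_free(void) {
  return __atomic_always_lock_free(sizeof(uint32_t), 0);
}
```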

There are some targets that have lock-free atomics even though they don't have 
ldrex/strex.  In particular, on armv6 in Thumb mode, we can just switch to ARM 
mode.  And on all Linux targets, the kernel exposes stubs that implement atomic 
ops.  On those targets, it's probably better to use the __sync_ libcall, not 
SWP, for the sake of forward-compatibility with systems that don't support SWP. 
Performance should be reasonable; it's a libcall, but it doesn't involve any 
locks or anything like that.
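
A rough sketch of that variant (which specific __sync_ entry point to use is a separate question; `__sync_lock_test_and_set` is used here only as an illustration of the family, and `__swp_sync_sketch` is a made-up name):

```c
static __inline__ uint32_t __attribute__((__always_inline__, __nodebug__))
__swp_sync_sketch(uint32_t __x, volatile uint32_t *__p) {
  /* On targets without inline atomics this becomes a call to the
     __sync_lock_test_and_set_4 libcall, which on Linux/ARM is backed
     by the kernel's atomic helpers rather than a lock. */
  return __sync_lock_test_and_set(__p, __x);
}
```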

https://github.com/llvm/llvm-project/pull/151354