On Thu, Feb 02, 2017 at 02:48:42PM +0000, Ramana Radhakrishnan wrote:
> On 30/01/17 18:54, Torvald Riegel wrote:
> > This patch fixes the __atomic builtins to not implement supposedly
> > lock-free atomic loads based on just a compare-and-swap operation.
> >
> > If there is no hardware-backed atomic load for a certain memory
> > location, the current implementation can implement the load with a CAS
> > while claiming that the access is lock-free.  This is a bug in the cases
> > of volatile atomic loads and atomic loads to read-only-mapped memory; it
> > also creates a lot of contention in case of concurrent atomic loads,
> > which results in at least counter-intuitive performance because most
> > users probably understand "lock-free" to mean hardware-backed (and thus
> > "fast") instead of just in the progress-criteria sense.
> >
> > This patch implements option 3b of the choices described here:
> > https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html
>
> Will Deacon pointed me at this thread asking if something similar could
> be done on ARM.
>
> On armv8-a we can implement an atomic load of 16 bytes using an
> LDXP / STXP loop, as a 16-byte load isn't single-copy atomic. On
> armv8.1-a we do have a CAS on 16 bytes.
If the AArch64 ISA guarantees LDXP is atomic, then yes, you can do that.
The problem we have on x86_64 is that I think neither Intel nor AMD gave
us guarantees that aligned SSE or AVX loads are guaranteed to be atomic.

	Jakub