On 03/02/17 15:13, Jakub Jelinek wrote:
On Fri, Feb 03, 2017 at 04:07:22PM +0100, Torvald Riegel wrote:
On Fri, 2017-02-03 at 13:44 +0000, Ramana Radhakrishnan wrote:
__atomic_load on ARM appears to be ok as well

except for

__atomic_load_di, which should really be an ldrexd / strexd loop, but we
could ameliorate that similarly to your option 3b.

This uses just ldrexd now, and thus is not guaranteed to be atomic?

On AArch64

* <16-byte loads have always been fine. The architecture allows
single-copy atomic loads using single load instructions for all other sizes and
memory models, so we are fine there.

* We have gone through the libatomic locks from day one of the port for
16-byte loads.  This has been a bit of a bugbear for a number of users
within ARM who would really like to get performance without heavyweight
locks for 16-byte atomic ops.

Would it be acceptable for those users to have loads that perform like
CAS loops, especially under contention?  Or are these users more
concerned about aarch64 not offering a true atomic 16-byte load?

Can the store you need for atomicity be into an automatic var on the stack?

No, it has to be to the same location.

Ramana
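
For readers following the thread, a load that "performs like a CAS loop" can be sketched in portable C11 as below. This is only an illustration, not the code GCC or libatomic emits: cas_load is a hypothetical helper name, and uint64_t stands in for the 16-byte type so the sketch stays lock-free on common targets. The point it shows is the one made above: the compare-and-swap targets the same location that is being read, not a temporary on the stack.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: read *obj using only a compare-and-swap.
   uint64_t is a stand-in for the 16-byte type discussed in the thread. */
static uint64_t cas_load(_Atomic uint64_t *obj)
{
    uint64_t expected = 0;  /* arbitrary guess at the current value */

    /* If *obj equals expected, the CAS succeeds and stores that same
       value back to *obj, so the contents do not change.  Otherwise it
       fails and copies the current value of *obj into expected.
       Either way, expected now holds a value obtained atomically. */
    atomic_compare_exchange_strong(obj, &expected, expected);
    return expected;
}
```

Because the operation may write to *obj, such a load needs the object to live in writable memory, and concurrent readers compete for exclusive ownership of the cache line, which is the contention cost raised in the question above.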
