On Fri, Feb 03, 2017 at 04:07:22PM +0100, Torvald Riegel wrote:
> On Fri, 2017-02-03 at 13:44 +0000, Ramana Radhakrishnan wrote:
> > __atomic_load on ARM appears to be ok as well
> > 
> > except for
> > 
> > __atomic_load_di which should really be the ldrexd / strexd loop but we 
> > could ameliorate that similar to your option 3b.
> 
> This uses just ldrexd now, and thus is not guaranteed to be atomic?
> 
> > On AArch64
> > 
> > * <16 byte loads have always been fine. The architecture allows single 
> > copy atomic loads using single load instructions for all other sizes and 
> > memory models, so we are fine there.
> > 
> > * we have gone through the libatomic locks from day one of the port for 
> > 16 byte loads.  This has been a bit of a bugbear for a number of users 
> > within ARM who would really like to get performance without heavy weight 
> > locks for 16 byte atomic ops.
> 
> Would it be acceptable for those users to have loads that perform like
> CAS loops, especially under contention?  Or are these users more
> concerned about aarch64 not offering a true atomic 16-byte load?

Can the store you need for atomicity go to an automatic variable on the stack?
Then there really shouldn't be any contention on that variable (especially if
it is padded so that its cache line contains nothing else).  Does load
contention on the atomic variable matter if there are no stores to it?
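
For reference, a minimal sketch of what a load that "performs like a CAS"
looks like with the GCC __atomic builtins.  It is only an illustration, not
what libatomic or any backend actually emits.  It uses a 64-bit object so that
it builds without libatomic; the 16-byte case would have the same shape with
unsigned __int128.  The helper name load_via_cas is made up.

```c
#include <stdint.h>

/* Hypothetical helper: read *p atomically using only compare-and-swap.
   Because the CAS may write to *p, this cannot be used on read-only
   memory, and concurrent callers contend on the cache line like writers. */
static uint64_t load_via_cas(uint64_t *p)
{
    uint64_t expected = 0;

    /* Strong CAS with desired == expected == 0:
       - if *p is 0, it stores 0 back (no visible change) and succeeds,
         leaving expected at 0;
       - otherwise it fails and copies the current value of *p into
         expected.
       Either way expected ends up holding the value of *p. */
    __atomic_compare_exchange_n(p, &expected, 0, 0 /* strong */,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}
```

Calling load_via_cas(&x) returns the current value of x and leaves x
unchanged, but every call is a read-modify-write as far as the memory system
is concerned.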

        Jakub
