On Tue, Mar 11, 2014 at 09:12:53PM +0000, John Carr wrote:
> Will Deacon <[email protected]> wrote:
> > On Tue, Mar 11, 2014 at 02:54:18AM +0000, John Carr wrote:
> > > A comment in arm/sync.md notes "We should consider issuing a inner
> > > shareability zone barrier here instead." Here is my first attempt
> > > at a patch to emit weaker memory barriers. Three instructions seem
> > > to be relevant for user mode code on my Cortex A9 Linux box:
> > >
> > > dmb ishst, dmb ish, dmb sy
> > >
> > > I believe these correspond to a release barrier, a full barrier
> > > with respect to other CPUs, and a full barrier that also orders
> > > relative to I/O.
> >
> > Not quite; DMB ISHST only orders writes with other writes, so loads can move
> > across it in both directions. That means it's not sufficient for releasing a
> > lock, for example.
>
> Release in this context doesn't mean "lock release". I understand
> it to mean release in the specific context of the C++11 memory model.
> (Similarly, if you're arguing standards compliance "inline" really
> means "relax the one definition rule for this function.")
>
> I don't see a prohibition on moving non-atomic loads across a release
> store. Can you point to an analysis that shows a full barrier is needed?

Well, you can use acquire/release to implement a lock easily enough. For
example, try feeding the following to cppmem:

  int main() {
    int x = 0, y = 0;
    atomic_int z = 0;
    {{{ { r1 = x; y = 1;
          z.store(1, memory_order_release); }
    ||| { r0 = z.load(memory_order_acquire).readsvalue(1);
          r1 = y; x = 1; }
    }}}
    return 0;
  }

There is one consistent execution, which requires the first thread to have
r1 == 0 (i.e. read x as zero) and the second thread to have r1 == 1 (i.e.
read y as 1).

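For anyone without cppmem to hand, the same test can be approximated in plain C++11. This is only a sketch: the names LitmusResult, run_litmus and check_litmus are made up for illustration, and the spin loop on z stands in for cppmem's readsvalue(1) constraint. The program is race-free because the acquire load that reads 1 synchronizes with the release store, so the read of x in the first thread happens before the write to x in the second.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper type: the two register values from the litmus test.
struct LitmusResult {
    int t0_r1;  // value the first thread read from x
    int t1_r1;  // value the second thread read from y
};

LitmusResult run_litmus() {
    int x = 0, y = 0;        // plain (non-atomic) shared data
    std::atomic<int> z{0};   // the only atomic location
    LitmusResult res{-1, -1};

    std::thread t0([&] {
        res.t0_r1 = x;                          // r1 = x
        y = 1;                                  // y = 1
        z.store(1, std::memory_order_release);  // store-release
    });
    std::thread t1([&] {
        // Wait until the release store is visible (load-acquire reads 1).
        while (z.load(std::memory_order_acquire) != 1) {
        }
        res.t1_r1 = y;  // r1 = y
        x = 1;          // x = 1
    });

    t0.join();
    t1.join();
    return res;
}

// The only outcome the C++11 model allows for this program.
void check_litmus() {
    LitmusResult r = run_litmus();
    assert(r.t0_r1 == 0);
    assert(r.t1_r1 == 1);
}
```

Call check_litmus() from main and build with something like -std=c++11 -pthread. A passing run on real hardware proves nothing about a weaker barrier being safe, since the forbidden outcome may simply be rare; the assertions only record what the memory model requires of a correct implementation.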

If we implement store-release using DMB ISHST, the assembly code would look
something like the following (I've treated the atomic accesses like normal
load/store instructions for clarity, since they don't affect the ordering
here):

T0:
        LDR     r1, [x]
        STR     #1, [y]
        DMB     ISHST
        STR     #1, [z]

T1:
        LDR     r0, [z]         // Reads 1
        DMB     ISH
        LDR     r1, [y]
        STR     #1, [x]

The problem with this is that DMB ISHST only orders stores, so the LDR in T0
can be re-ordered *past* the rest of the sequence, potentially resulting in
r1 == 1 in T0, which is forbidden.
It's just like reading from a shared, lock-protected data structure without
the lock held.

> If we assume that gcc is used to generate code for processes running
> within a single inner shareable domain, then we can start by demoting
> "dmb sy" to "dmb ish" for the memory barrier with no other change.

I'm all for such a change.

> If a store-store barrier has no place in the gcc atomic memory model,
> that supports my hypothesis that a twisty maze of ifdefs is superior to
> a "portable" attractive nuisance.

I don't understand your point here.

Will