Re: [PATCH] ARM: Weaker memory barriers

Will Deacon Wed, 12 Mar 2014 06:44:19 -0700

On Tue, Mar 11, 2014 at 09:12:53PM +0000, John Carr wrote:
> Will Deacon <[email protected]> wrote:
> > On Tue, Mar 11, 2014 at 02:54:18AM +0000, John Carr wrote:
> > > A comment in arm/sync.md notes "We should consider issuing a inner
> > > shareability zone barrier here instead."  Here is my first attempt
> > > at a patch to emit weaker memory barriers.  Three instructions seem
> > > to be relevant for user mode code on my Cortex A9 Linux box:
> > > 
> > > dmb ishst, dmb ish, dmb sy
> > > 
> > > I believe these correspond to a release barrier, a full barrier
> > > with respect to other CPUs, and a full barrier that also orders
> > > relative to I/O.
> > 
> > Not quite; DMB ISHST only orders writes with other writes, so loads can move
> > across it in both directions. That means it's not sufficient for releasing a
> > lock, for example.
> 
> Release in this context doesn't mean "lock release".  I understand
> it to mean release in the specific context of the C++11 memory model.
> (Similarly, if you're arguing standards compliance "inline" really
> means "relax the one definition rule for this function.")
> 
> I don't see a prohibition on moving non-atomic loads across a release
> store.  Can you point to an analysis that shows a full barrier is needed?


Well, you can use acquire/release to implement a lock easily enough. For
example, try feeding the following to cppmem:

  int main() {
    int x = 0, y = 0;
    atomic_int z = 0;

    {{{ { r1 = x; y = 1;
          z.store(1, memory_order_release); }
    ||| { r0 = z.load(memory_order_acquire).readsvalue(1);
          r1 = y; x = 1;}
    }}}

    return 0;
  }

There is one consistent execution, which requires the first thread to have
r1 == 0 (i.e. read x as zero) and the second thread to have r1 == 1 (i.e.
read y as 1).

If we implement store-release using DMB ISHST, the assembly code would look
something like the following (I've treated the atomic accesses like normal
load/store instructions for clarity, since they don't affect the ordering
here):

  T0:

  LDR r1, [x]
  STR #1, [y]
  DMB ISHST
  STR #1, [z]

  T1:

  LDR r0, [z] // Reads 1
  DMB ISH
  LDR r1, [y]
  STR #1, [x]

The problem with this is that the LDR in T0 can be re-ordered *past* the
rest of the sequence, potentially resulting in r1 == 1, which is forbidden.
It's just like reading from a shared, lock-protected data structure without
the lock held.

> If we assume that gcc is used to generate code for processes running
> within a single inner shareable domain, then we can start by demoting
> "dmb sy" to "dmb ish" for the memory barrier with no other change.

I'm all for such a change.

> If a store-store barrier has no place in the gcc atomic memory model,
> that supports my hypothesis that a twisty maze of ifdefs is superior to
> a "portable" attractive nuisance.

I don't understand your point here.

Will

Re: [PATCH] ARM: Weaker memory barriers

Reply via email to