Hi Hans,
Given that IRIW is a thorn in everyone's side, has no known
useful benefit, and can hopefully be killed off in the
future, let's not get bogged down in IRIW. But none of what
you say below relates to multi-copy-atomicity.
Cheers,
David
-----Original Message-----
*From:* hjkhbo...@gmail.com [mailto:hjkhbo...@gmail.com] *On Behalf Of* Hans Boehm
*Sent:* Wednesday, 26 November 2014 12:04 PM
*To:* dhol...@ieee.org
*Cc:* Stephan Diestelhorst; concurrency-inter...@cs.oswego.edu; core-libs-dev
*Subject:* Re: [concurrency-interest] RFR: 8065804: JEP171:Clarifications/corrections for fence intrinsics
To be concrete here, on Power, loads can normally be
ordered by an address dependency or a lightweight fence
(lwsync). However, neither is enough to prevent the
questionable outcome for IRIW, since neither ensures
that the stores in T1 and T2 will be made visible to
other threads in a consistent order. That outcome can
be prevented by using heavyweight fence (sync)
instructions between the loads instead. Peter Sewell's
group concluded that to enforce correct volatile
behavior on Power, you essentially need a heavyweight
fence between every pair of volatile operations.
That cannot be understood based on simple
ordering constraints.
As Stephan pointed out, there are similar issues on ARM,
but they're less commonly encountered in a Java
implementation. If you're lucky, you can get to the
right implementation recipe by looking at only
reordering, I think.
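For Java readers, the IRIW shape discussed in this thread can be sketched with volatile fields. This is only an illustrative sketch and not code from the thread: the class name IriwVolatile and its methods are hypothetical. The Java memory model requires volatile accesses to appear sequentially consistent, so the outcome r1 = 1, r2 = 0, r3 = 1, r4 = 0 should never be counted here. That guarantee is what forces the heavyweight fences on Power.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical litmus-test harness; names are illustrative assumptions.
final class IriwVolatile {
    // Volatile accesses must appear sequentially consistent under the JMM.
    static volatile int foo;
    static volatile int bar;

    /** Runs one IRIW round and returns {r1, r2, r3, r4}. */
    static int[] runOnce() {
        foo = 0;
        bar = 0;
        int[] r = new int[4];
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = {
            new Thread(() -> { awaitStart(start); foo = 1; }),               // T1
            new Thread(() -> { awaitStart(start); bar = 1; }),               // T2
            new Thread(() -> { awaitStart(start); r[0] = bar; r[1] = foo; }), // T3
            new Thread(() -> { awaitStart(start); r[2] = foo; r[3] = bar; })  // T4
        };
        for (Thread t : threads) {
            t.start();
        }
        start.countDown(); // release all four threads together
        try {
            for (Thread t : threads) {
                t.join(); // join makes the writes to r visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return r;
    }

    static void awaitStart(CountDownLatch start) {
        try {
            start.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** True if T3 and T4 saw the two stores in opposite orders. */
    static boolean forbidden(int[] r) {
        return r[0] == 1 && r[1] == 0 && r[2] == 1 && r[3] == 0;
    }

    /** Counts how many of the given rounds produced the IRIW outcome. */
    static int countForbidden(int rounds) {
        int count = 0;
        for (int i = 0; i < rounds; i++) {
            if (forbidden(runOnce())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("IRIW outcomes seen: " + countForbidden(2000));
    }
}
```

A harness this crude rarely lines the threads up tightly enough to expose the outcome even if the fields were made non-volatile, so a zero count demonstrates nothing about the hardware; a dedicated litmus-testing tool is the better instrument for that.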
On Tue, Nov 25, 2014 at 4:36 PM, David Holmes <davidchol...@aapt.net.au> wrote:
Stephan Diestelhorst writes:
>
> David Holmes wrote:
> > Stephan Diestelhorst writes:
> > > On Tuesday, 25 November 2014, 11:15:36, Hans Boehm wrote:
> > > > I'm no hardware architect, but fundamentally it seems to me that
> > > >
> > > > load x
> > > > acquire_fence
> > > >
> > > > imposes a much more stringent constraint than
> > > >
> > > > load_acquire x
> > > >
> > > > Consider the case in which the load from x is an L1 hit, but a
> > > > preceding load (from say y) is a long-latency miss. If we enforce
> > > > ordering by just waiting for completion of prior operation, the
> > > > former has to wait for the load from y to complete; while the
> > > > latter doesn't. I find it hard to believe that this doesn't leave
> > > > an appreciable amount of performance on the table, at least for
> > > > some interesting microarchitectures.
> > >
> > > I agree, Hans, that this is a reasonable assumption. Load_acquire x
> > > does allow roach motel, whereas the acquire fence does not.
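The two forms contrasted above can be written side by side in Java. This is a hedged sketch only: it uses the VarHandle API, which was added to Java later (Java 9) than the Unsafe fence intrinsics this thread is reviewing, and the class AcquireForms with its fields x and y is hypothetical. The first method orders every earlier load, including the one from y, ahead of later accesses; the second attaches the ordering to the load of x alone.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical holder class; field and method names are illustrative assumptions.
final class AcquireForms {
    int x;
    int y;

    private static final VarHandle X;

    static {
        try {
            X = MethodHandles.lookup().findVarHandle(AcquireForms.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** "load x; acquire_fence": the fence also covers the earlier load from y. */
    int loadThenFence() {
        int earlier = y;          // plain load that precedes the fence
        int value = x;            // plain load of x
        VarHandle.acquireFence(); // loads before this are not reordered with later loads/stores
        return value + earlier;
    }

    /** "load_acquire x": only the load of x carries acquire semantics. */
    int loadAcquire() {
        int earlier = y;                 // plain load, not constrained by the acquire on x
        int value = (int) X.getAcquire(this);
        return value + earlier;
    }
}
```

Both methods return the same value in a single-threaded run; the difference lies only in which reorderings the compiler and processor remain free to perform.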
> > >
> > > > In addition, for better or worse, fencing requirements on at least
> > > > Power are actually driven as much by store atomicity issues, as by
> > > > the ordering issues discussed in the cookbook. This was not
> > > > understood in 2005, and unfortunately doesn't seem to be amenable to
> > > > the kind of straightforward explanation as in Doug's cookbook.
> > >
> > > Coming from a strongly ordered architecture to a weakly ordered one
> > > myself, I also needed some mental adjustment about store (multi-copy)
> > > atomicity. I can imagine others will be unaware of this difference,
> > > too, even in 2014.
> >
> > Sorry I'm missing the connection between fences and multi-copy
> > atomicity.
>
> One example is the classic IRIW. With non-multi copy atomic stores, but
> ordered (say through a dependency) loads in the following example:
>
> Memory: foo = bar = 0
> _T1_         _T2_         _T3_                              _T4_
> st (foo),1   st (bar),1   ld r1,(bar)                       ld r3,(foo)
>                           <addr dep / local "fence" here>   <addr dep>
>                           ld r2,(foo)                       ld r4,(bar)
>
> You may observe r1 = 1, r2 = 0, r3 = 1, r4 = 0 on non-multi-copy atomic
> machines. On TSO boxes, this is not possible. That means that the
> memory fence that will prevent such a behaviour (DMB on ARM) needs to
> carry some additional oomph in ensuring multi-copy atomicity, or rather
> prevent you from seeing it (which is the same thing).
I take it as given that any code for which you may have ordering
constraints, must first have basic atomicity properties for loads and
stores. I would not expect any kind of fence to add multi-copy-atomicity
where there was none.
David
> Stephan
>
> _______________________________________________
> Concurrency-interest mailing list
> concurrency-inter...@cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest