This certainly does not mean that linearizable algorithms cannot be implemented on non-TSO architectures.

Nor does it mean that the algorithm will contain /independent/ writes.

Count the proofs that rely on synchronization order: that is the measure of how common the IRIW pattern is. And yet, even then, there is hope.

Suppose a<b means a happens before b. There are only a few axioms coming from the memory model:

order consistency: a<b -> b<a -> _|_ -- if you can prove both that a happens before b and that b happens before a, that implies absurdity

(semi-formally)
read consistency: (r v x) -> (exists (w v x) such that (w v x) < (r v x) && (forall (w' v y), (w' v y) < (w v x) || (r v x) < (w' v y))) -- for every read r of variable v returning value x there is a write w of x to v such that the write happens before the read, and every other write w' to v either happens before that write or happens after the read.

That's about it.
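
For concreteness, here is one way these two axioms could be transcribed into Lean 4. This is just a sketch: the abstract type Op, the relation hb, and the read/write predicates are assumptions of mine, not part of any memory model specification, and I added the w' ≠ w side condition that the semi-formal statement leaves implicit.

-- Abstract carrier types and relations (assumed, not derived).
axiom Op  : Type                         -- volatile memory operations
axiom Var : Type                         -- variables
axiom Val : Type                         -- values
axiom hb  : Op → Op → Prop               -- hb a b: "a happens before b"
axiom isRead  : Op → Var → Val → Prop    -- isRead r v x: r reads x from v
axiom isWrite : Op → Var → Val → Prop    -- isWrite w v x: w writes x to v

-- order consistency: proving both a < b and b < a is absurd
axiom order_consistency : ∀ a b : Op, hb a b → hb b a → False

-- read consistency: every read is justified by a happens-before write,
-- and every other write to the same variable falls before that write
-- or after the read
axiom read_consistency :
  ∀ (r : Op) (v : Var) (x : Val), isRead r v x →
    ∃ w : Op, isWrite w v x ∧ hb w r ∧
      ∀ (w' : Op) (y : Val), isWrite w' v y → w' ≠ w →
        (hb w' w ∨ hb r w')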

Then the totality-of-order axiom is:
total order : a -> b -> ((a < b || b < a) && ((a < b -> _|_) -> b < a) && ((b < a -> _|_) -> a < b)) -- any two volatile operations are ordered one way or the other, and if you can prove that a happening before b is absurd, then b happens before a, and conversely. (The latter two clauses follow from order consistency, but are stated explicitly for comparison with what follows.)

A DMB barrier on non-TSO is a different axiom:
dmb : a -> b -> c -> (a `po` b) -> ((a < c || c < b) && ((c < b -> _|_) -> a < c) && ((a < c -> _|_) -> c < b)) -- for any two operations a and b related by program order with a DMB barrier between them, and for any other operation c, either a happens before c or c happens before b; if you can prove that c happening before b is absurd, then a happens before c, and conversely. The slight difference in reasoning from total order is that proving a<c does not exclude the possibility that, in some other way, c<b.
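
Transcribed the same way (the declarations are repeated so this sketch stands alone; distinctness of a, b, and c is left implicit, as in the semi-formal statements above):

axiom Op : Type
axiom hb : Op → Op → Prop    -- happens-before
axiom po : Op → Op → Prop    -- program order

-- total order: any two volatile operations are ordered, and refuting
-- one direction proves the other
axiom total_order : ∀ a b : Op,
  (hb a b ∨ hb b a) ∧
  ((hb a b → False) → hb b a) ∧
  ((hb b a → False) → hb a b)

-- dmb: given program order between a and b, any third operation c is
-- either after a or before b; refuting one disjunct proves the other,
-- but proving hb a c does not exclude hb c b
axiom dmb : ∀ a b c : Op, po a b →
  (hb a c ∨ hb c b) ∧
  ((hb c b → False) → hb a c) ∧
  ((hb a c → False) → hb c b)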

You can see many similarities between total order and DMB. The difficulty in the IRIW case is only that there is just one write in each writer thread, so DMB has to enforce ordering between the reads instead. And the difficulty with DMB is that, unlike volatile stores, there is no language-level idiom for it: you need to consider more than one entity, and more than one place in the code, to understand its effect. Total order, of course, is far easier to reason about: any two operations are ordered, and their ordering is exclusive, so you only need to look in one place.
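
To make the IRIW pattern concrete in Java terms (a minimal sketch; the class and field names are mine):

// IRIW in Java. With volatile fields, the JMM's total synchronization
// order forbids the outcome r1 == 1, r2 == 0, r3 == 1, r4 == 0: the two
// reader threads cannot disagree on the order of the independent writes.
// With plain (non-volatile) fields, that outcome would be allowed.
class Iriw {
    static volatile int foo = 0, bar = 0;
    static int r1, r2, r3, r4;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> foo = 1);                  // writer 1
        Thread t2 = new Thread(() -> bar = 1);                  // writer 2
        Thread t3 = new Thread(() -> { r1 = bar; r2 = foo; });  // reader 1
        Thread t4 = new Thread(() -> { r3 = foo; r4 = bar; });  // reader 2
        t1.start(); t2.start(); t3.start(); t4.start();
        t1.join(); t2.join(); t3.join(); t4.join();
        System.out.printf("r1=%d r2=%d r3=%d r4=%d%n", r1, r2, r3, r4);
    }
}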


Alex


On 26/11/2014 19:00, Roman Elizarov wrote:

I'd suggest starting with the original paper by Herlihy, who came up with the concept of linearizability in 1990:

Linearizability: A Correctness Condition for Concurrent Objects

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.5315


There has been a lot of research on linearizability since then (there are almost a thousand citations of this article), expanding and improving proof techniques and applying them. There have been no breakthroughs of comparable magnitude since. All "thread-safe" objects that you encounter in the modern world are linearizable. It is the de facto "gold standard" correctness condition for concurrent objects.


This position is well deserved, because having linearizable objects as your building blocks makes it super-easy to formally reason about the correctness of your code. You will rarely encounter concurrent algorithms that provide weaker guarantees (like quiescent consistency), because they are all too hard to reason about -- they are either not composable or not local. But when all your concurrent objects are linearizable, you can ditch happens-before, forget that everything is actually parallel, and simply reason about your code in terms of interleavings of "atomic" operations that happen in some global order. That is the beauty of linearizability.
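
As a small illustration of that style of reasoning (a sketch of mine, not from the paper): a bounded counter built from a linearizable AtomicInteger, whose correctness argument never mentions happens-before -- it only considers interleavings of atomic steps.

import java.util.concurrent.atomic.AtomicInteger;

// Correctness can be argued by interleaving alone: each get/CAS takes
// effect atomically at a single point in the global order, and the CAS
// succeeds only if the value it read is still current, so the count can
// never be decremented below zero in any interleaving.
class BoundedCounter {
    private final AtomicInteger count = new AtomicInteger();

    void increment() {
        count.incrementAndGet();
    }

    // Decrements only if the count is positive; returns false otherwise.
    boolean tryDecrement() {
        while (true) {
            int c = count.get();
            if (c == 0) return false;                  // nothing to take
            if (count.compareAndSet(c, c - 1)) return true;
        }
    }
}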


But linearizability is indeed a pretty strong requirement. Linearizability of your shared memory requires that Independent Reads of Independent Writes (IRIW) be consistent. Can you get away with some weaker requirement and still get all the same goodies that linearizability gets you? I have not seen anything promising in this direction. Whoever makes this breakthrough will surely reap the world's recognition and respect.


/Roman


------------------------------------------------------------------------
*From:* DT <d...@flyingtroika.com>
*Sent:* 26 November 2014 20:24
*To:* Roman Elizarov; dhol...@ieee.org; Hans Boehm
*Cc:* core-libs-dev; concurrency-inter...@cs.oswego.edu
*Subject:* Re: [concurrency-interest] RFR: 8065804:JEP171:Clarifications/corrections for fence intrinsics
Roman,
Can you point to any specific article that states a concurrency problem and then uses linearizability to reason about its solution?

Thanks,
DT

On 11/26/2014 2:59 AM, Roman Elizarov wrote:

Whether IRIW has any _/practical/_ uses is definitely subject to debate. However, there is no tractable way to reason formally about the properties of large concurrent systems except via linearizability. Linearizability is the only property that is both local and hierarchical. It lets you build more complex linearizable algorithms from simpler ones, with quite succinct and compelling proofs at each step.

In other words, if you want to be able to construct a formal proof that your [large] concurrent system is correct, then you must have IRIW consistency. Do you need a formal proof of correctness? Maybe not. In many applications hand-waving is enough, but there are many other applications where hand-waving does not count as a proof. It may be possible to construct formal correctness proofs for some very simple algorithms even on a system that does not provide IRIW consistency, but this is beyond the state of the art of formal verification for anything sufficiently complex.

/Roman

*From:*David Holmes [mailto:davidchol...@aapt.net.au]
*Sent:* Wednesday, November 26, 2014 11:54 AM
*To:* Roman Elizarov; Hans Boehm
*Cc:* concurrency-inter...@cs.oswego.edu; core-libs-dev
*Subject:* RE: [concurrency-interest] RFR: 8065804:JEP171:Clarifications/corrections for fence intrinsics

Can you expand on that please. All previous discussion of IRIW I have seen indicated that the property, while a consequence of existing JMM rules, had no practical use.

Thanks,

David

    -----Original Message-----
    *From:* Roman Elizarov [mailto:eliza...@devexperts.com]
    *Sent:* Wednesday, 26 November 2014 6:49 PM
    *To:* dhol...@ieee.org <mailto:dhol...@ieee.org>; Hans Boehm
    *Cc:* concurrency-inter...@cs.oswego.edu
    <mailto:concurrency-inter...@cs.oswego.edu>; core-libs-dev
    *Subject:* RE: [concurrency-interest] RFR:
    8065804:JEP171:Clarifications/corrections for fence intrinsics

    There is no conceivable way to kill the IRIW consistency
    requirement while retaining the ability to prove correctness of
    large software systems. If IRIWs of volatile variables are not
    consistent, then volatile reads and writes are not linearizable,
    which breaks the linearizability of all higher-level primitives
    built on top of them and makes formal reasoning about the
    behavior of concurrent systems practically impossible. There are
    many fields where this is not acceptable.

    /Roman

    *From:*concurrency-interest-boun...@cs.oswego.edu
    <mailto:concurrency-interest-boun...@cs.oswego.edu>
    [mailto:concurrency-interest-boun...@cs.oswego.edu] *On Behalf Of
    *David Holmes
    *Sent:* Wednesday, November 26, 2014 5:11 AM
    *To:* Hans Boehm
    *Cc:* concurrency-inter...@cs.oswego.edu
    <mailto:concurrency-inter...@cs.oswego.edu>; core-libs-dev
    *Subject:* Re: [concurrency-interest] RFR: 8065804:
    JEP171:Clarifications/corrections for fence intrinsics

    Hi Hans,

    Given IRIW is a thorn in everyone's side and has no known useful
    benefit, and can hopefully be killed off in the future, let's not
    get bogged down in IRIW. But none of what you say below relates
    to multi-copy-atomicity.

    Cheers,

    David

        -----Original Message-----
        *From:* hjkhbo...@gmail.com
        <mailto:hjkhbo...@gmail.com> [mailto:hjkhbo...@gmail.com] *On
        Behalf Of *Hans Boehm
        *Sent:* Wednesday, 26 November 2014 12:04 PM
        *To:* dhol...@ieee.org <mailto:dhol...@ieee.org>
        *Cc:* Stephan Diestelhorst;
        concurrency-inter...@cs.oswego.edu
        <mailto:concurrency-inter...@cs.oswego.edu>; core-libs-dev
        *Subject:* Re: [concurrency-interest] RFR: 8065804:
        JEP171:Clarifications/corrections for fence intrinsics

        To be concrete here, on Power, loads can normally be ordered
        by an address dependency or a light-weight fence (lwsync).
        However, neither is enough to prevent the questionable
        outcome for IRIW, since neither ensures that the stores in
        T1 and T2 will be made visible to other threads in a
        consistent order. That outcome can be prevented by using
        heavyweight fence (sync) instructions between the loads
        instead. Peter Sewell's group concluded that to enforce
        correct volatile behavior on Power, you essentially need a
        heavyweight fence between every pair of volatile operations.
        That cannot be understood based on simple ordering
        constraints.
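
        A sketch of that recipe with the JEP 171 intrinsics
        (illustrative only; the class is hypothetical, and Unsafe is
        obtained reflectively just for the sketch): a full fence
        between the two loads of each reader, which on Power is
        expected to be implemented with sync:

        import java.lang.reflect.Field;
        import sun.misc.Unsafe;

        class IriwReaders {
            private static final Unsafe U = unsafe();

            static int foo, bar;   // plain fields, set by two writer threads
            static int r1, r2, r3, r4;

            static void reader1() {
                r1 = bar;
                U.fullFence();     // heavyweight fence; sync on Power
                r2 = foo;
            }

            static void reader2() {
                r3 = foo;
                U.fullFence();
                r4 = bar;
            }

            private static Unsafe unsafe() {
                try {
                    Field f = Unsafe.class.getDeclaredField("theUnsafe");
                    f.setAccessible(true);
                    return (Unsafe) f.get(null);
                } catch (ReflectiveOperationException e) {
                    throw new AssertionError(e);
                }
            }
        }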

        As Stephan pointed out, there are similar issues on ARM, but
        they're less commonly encountered in a Java implementation.
        If you're lucky, you can get to the right implementation
        recipe by looking only at reordering, I think.

        On Tue, Nov 25, 2014 at 4:36 PM, David Holmes
        <davidchol...@aapt.net.au <mailto:davidchol...@aapt.net.au>>
        wrote:

            Stephan Diestelhorst writes:
            >
            > David Holmes wrote:
            > > Stephan Diestelhorst writes:
            > > > Am Dienstag, 25. November 2014, 11:15:36 schrieb
            Hans Boehm:
            > > > > I'm no hardware architect, but fundamentally it
            seems to me that
            > > > >
            > > > > load x
            > > > > acquire_fence
            > > > >
            > > > > imposes a much more stringent constraint than
            > > > >
            > > > > load_acquire x
            > > > >
            > > > > Consider the case in which the load from x is an
            L1 hit, but a
            > > > > preceding load (from say y) is a long-latency
            miss.  If we enforce
            > > > > ordering by just waiting for completion of prior
            operation, the
            > > > > former has to wait for the load from y to
            complete; while the
            > > > > latter doesn't.  I find it hard to believe that
            this doesn't leave
            > > > > an appreciable amount of performance on the
            table, at least for
            > > > > some interesting microarchitectures.
            > > >
            > > > I agree, Hans, that this is a reasonable
            assumption.  Load_acquire x
            > > > does allow roach motel, whereas the acquire fence
            does not.
            > > >
            > > > >  In addition, for better or worse, fencing
            requirements on at least
            > > > >  Power are actually driven as much by store
            atomicity issues, as by
            > > > >  the ordering issues discussed in the cookbook. This was not
            > > > >  understood in 2005, and unfortunately doesn't
            seem to be
            > amenable to
            > > > >  the kind of straightforward explanation as in
            Doug's cookbook.
            > > >
            > > > Coming from a strongly ordered architecture to a
            weakly ordered one
            > > > myself, I also needed some mental adjustment about
            store (multi-copy)
            > > > atomicity.  I can imagine others will be unaware of
            this difference,
            > > > too, even in 2014.
            > >
            > > Sorry I'm missing the connection between fences and
            multi-copy
            > atomicity.
            >
            > One example is the classic IRIW. With non-multi copy
            atomic stores, but
            > ordered (say through a dependency) loads in the
            following example:
            >
            > Memory: foo = bar = 0
            > _T1_         _T2_         _T3_                              _T4_
            > st (foo),1   st (bar),1   ld r1, (bar)                      ld r3, (foo)
            >                           <addr dep / local "fence" here>   <addr dep>
            >                           ld r2, (foo)                      ld r4, (bar)
            >
            > You may observe r1 = 1, r2 = 0, r3 = 1, r4 = 0 on
            non-multi-copy atomic
            > machines.  On TSO boxes, this is not possible.  That
            means that the
            > memory fence that will prevent such a behaviour (DMB on
            ARM) needs to
            > carry some additional oomph in ensuring multi-copy
            atomicity, or rather
            > prevent you from seeing it (which is the same thing).

            I take it as given that any code for which you may have
            ordering
            constraints, must first have basic atomicity properties
            for loads and
            stores. I would not expect any kind of fence to add
            multi-copy-atomicity
            where there was none.

            David


            > Stephan
            >
            > _______________________________________________
            > Concurrency-interest mailing list
            > concurrency-inter...@cs.oswego.edu
            <mailto:concurrency-inter...@cs.oswego.edu>
            > http://cs.oswego.edu/mailman/listinfo/concurrency-interest

            _______________________________________________
            Concurrency-interest mailing list
            concurrency-inter...@cs.oswego.edu
            <mailto:concurrency-inter...@cs.oswego.edu>
            http://cs.oswego.edu/mailman/listinfo/concurrency-interest



_______________________________________________
Concurrency-interest mailing list
concurrency-inter...@cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest



_______________________________________________
Concurrency-interest mailing list
concurrency-inter...@cs.oswego.edu
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
