Re: [concurrency-interest] RFR: 8065804: JEP171:Clarifications/corrections for fence intrinsics

Oleksandr Otenko Tue, 09 Dec 2014 16:26:26 -0800

Yes, I read the part of that paper about IRIW.

My thought that the ordering of stores would be the more contentious point seems to be about the same thing.


In IRIW there are two parts of the chain that it is reasonable to expect to work:

x=1 <-- sw -- r1=x <-- po -- r2=y
y=1 <-- sw -- r3=y <-- po -- r4=x

Suppose r2==0. Then, to show that the outcome r4==0 is forbidden, we need to show an edge from r4=x to x=1. A total ordering of stores provides such an edge: x=1 <-- so -- y=1 (if the stores were ordered the other way, we could prove r2!=0).

Enforcing sw and po looked reasonable and cheap. But the total ordering of stores isn't justified in the spec; it is just there, hence my question.

On a non-TSO machine there is no edge between the stores. So to construct the proof that the outcome r4==0 is forbidden, they had to enforce an ordering between other instructions: r2==0 implies that the preceding barrier also precedes r3=y, which observed r3==1; it then follows that r1=x <-- so -- r4=x, and a contradiction shows that r4==0 is forbidden.
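To make the outcome under discussion concrete, here is a minimal Java litmus sketch of IRIW. The class and method names are my own, not from the thread. Two threads store x=1 and y=1 into volatile fields, and two readers load them in opposite orders, matching the chains above. Because volatile accesses are sequentially consistent under the JMM, a conforming JVM should never report r1==1, r2==0, r3==1, r4==0. A naive harness like this one rarely produces interesting interleavings anyway, so a serious test would use a stress tool such as jcstress.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical IRIW litmus harness (not from the thread). x and y are volatile,
// so the JMM forbids the outcome r1 == 1, r2 == 0, r3 == 1, r4 == 0.
final class IriwLitmus {
    static volatile int x, y;

    /** Runs one IRIW trial and returns {r1, r2, r3, r4}. */
    static int[] trial() {
        x = 0;
        y = 0;
        int[] r = new int[4];
        CountDownLatch go = new CountDownLatch(1);
        Thread[] threads = {
            new Thread(() -> { awaitQuietly(go); x = 1; }),              // T1
            new Thread(() -> { awaitQuietly(go); y = 1; }),              // T2
            new Thread(() -> { awaitQuietly(go); r[0] = x; r[1] = y; }), // T3: r1=x; r2=y
            new Thread(() -> { awaitQuietly(go); r[2] = y; r[3] = x; }), // T4: r3=y; r4=x
        };
        for (Thread t : threads) t.start();
        go.countDown();                          // release all four threads together
        for (Thread t : threads) joinQuietly(t); // join makes the r[] writes visible here
        return r;
    }

    /** True for the single outcome that sequential consistency rules out. */
    static boolean forbidden(int[] r) {
        return r[0] == 1 && r[1] == 0 && r[2] == 1 && r[3] == 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int bad = 0;
        for (int i = 0; i < 1_000; i++) {
            if (forbidden(trial())) bad++;
        }
        System.out.println("forbidden outcomes observed: " + bad);
    }
}
```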

The effort (and the need) to order the reads is tremendous and doesn't seem right, so I see why it raises questions.


Alex

On 10/12/2014 00:00, David Holmes wrote:

The "no known useful benefit" is based on the paper, which states "we are not aware of any cases where IRIW arises as a natural programming idiom".

I think your example would be written:
Thread 1:
x = 1; storestore; y = 1;
Thread 2:
r1 = y; r2 = x;
Or, more clearly, the most common pattern would be:
Thread 1:
data = 1; storestore; dataReady = true;
Thread 2:
if (dataReady)
  r2 = data;

The above does not require IRIW. Conversely, if you have IRIW you don't need the storestore.
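The data/dataReady pattern can be sketched in Java as follows, assuming Java 9+ and using the public fence methods on java.lang.invoke.VarHandle rather than the sun.misc.Unsafe intrinsics under review here. The class and field names are illustrative. The reader-side loadLoadFence is my addition: the pseudocode leaves the reader's ordering implicit, but on weakly ordered hardware the flag load and the data load need ordering too. There is only one writer, so no IRIW-style guarantee is involved.

```java
import java.lang.invoke.VarHandle;

// Sketch of the data/dataReady pattern with explicit fences (Java 9+).
// Names are hypothetical; ordering relies on the VarHandle fence specifications.
final class FencedMessagePassing {
    // Plain fields: all ordering comes from the explicit fences below.
    static int data;
    static boolean dataReady;

    static void writer() {
        data = 1;
        VarHandle.storeStoreFence(); // stores before the fence stay before stores after it
        dataReady = true;
    }

    /** Returns the value of data if dataReady was seen, otherwise -1. */
    static int reader() {
        if (dataReady) {
            VarHandle.loadLoadFence(); // loads before the fence stay before loads after it
            return data;
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        int stale = 0;
        for (int i = 0; i < 1_000; i++) {
            data = 0;
            dataReady = false;
            int[] seen = new int[1];
            Thread w = new Thread(FencedMessagePassing::writer);
            Thread r = new Thread(() -> seen[0] = reader());
            r.start();
            w.start();
            w.join();
            r.join();
            if (seen[0] == 0) stale++; // flag observed but data not: should not happen
        }
        System.out.println("stale reads: " + stale);
    }
}
```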

David

    -----Original Message-----
    *From:* [email protected]
    [mailto:[email protected]]*On Behalf Of
    *Oleksandr Otenko
    *Sent:* Wednesday, 10 December 2014 8:21 AM
    *Cc:* [email protected]; core-libs-dev
    *Subject:* Re: [concurrency-interest] RFR: 8065804:
    JEP171:Clarifications/corrections for fence intrinsics

    In that case I must say I can't see why you mentioned "no known
    useful benefit". The known useful benefit from ordering reads can
    be seen here:

    store in one order:
    Thread 1:
    x=1
    y=1

    load in reverse order:
    Thread 2:
    r1=y;
    r2=x;

    This is a common pattern, so ordering loads is already useful.
    Here, even though the JMM talks about a total order of all volatile
    operations, in practice the order of loads is weaker, as long as
    this weakening cannot be observed; e.g. on x86, enforcing the order
    of loads among themselves is an entirely local matter.
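The x/y example with only its load-side ordering made explicit, as a hedged Java 9+ sketch (class and field names are illustrative): the store to y is a release and the load of y is an acquire, while x stays a plain field. Under the VarHandle access-mode rules this forbids r1 == 1 with r2 == 0, yet it does not require a total order over all volatile operations, which is the weaker load ordering described above.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch: Thread 1 stores x then y; Thread 2 loads y then x.
// Only y carries ordering: a release store and an acquire load.
final class ReleaseAcquire {
    static int x; // plain field
    static int y; // accessed through Y for the ordered accesses
    static final VarHandle Y;

    static {
        try {
            Y = MethodHandles.lookup().findStaticVarHandle(ReleaseAcquire.class, "y", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Thread 1: x=1; y=1. The release keeps the store to x before the store to y. */
    static void writer() {
        x = 1;
        Y.setRelease(1);
    }

    /** Thread 2: r1=y; r2=x. The acquire keeps the load of x after the load of y. */
    static int[] reader() {
        int r1 = (int) Y.getAcquire();
        int r2 = x;
        return new int[] {r1, r2};
    }

    public static void main(String[] args) throws InterruptedException {
        int bad = 0;
        for (int i = 0; i < 1_000; i++) {
            x = 0;
            y = 0;
            int[][] out = new int[1][];
            Thread t1 = new Thread(ReleaseAcquire::writer);
            Thread t2 = new Thread(() -> out[0] = reader());
            t2.start();
            t1.start();
            t1.join();
            t2.join();
            if (out[0][0] == 1 && out[0][1] == 0) bad++; // r1 == 1 but r2 == 0
        }
        System.out.println("r1 == 1 && r2 == 0 observed: " + bad);
    }
}
```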

    IRIW extends the store part to many threads, thus guaranteeing
    total store order for volatiles. I thought the total ordering of
    stores would be a more contentious point (but I agree with the
    point Hans makes about easier reasoning).

    Alex

    On 09/12/2014 21:36, David Holmes wrote:

    The "thorn" is the need for barriers in the readers, not the
    writers (or perhaps in addition to the writers in some cases;
    that is part of the problem).
    David

        -----Original Message-----
        *From:* [email protected]
        [mailto:[email protected]]*On Behalf
        Of *Oleksandr Otenko
        *Sent:* Wednesday, 10 December 2014 6:34 AM
        *To:* [email protected]; Hans Boehm
        *Cc:* core-libs-dev; [email protected]
        *Subject:* Re: [concurrency-interest] RFR: 8065804:
        JEP171:Clarifications/corrections for fence intrinsics

        Is the thorn the many allowed outcomes, or the single
        disallowed outcome? (e.g. is order consistency too strict for
        stores with no synchronizes-with between them?)

        Alex


        On 26/11/2014 02:10, David Holmes wrote:

        Hi Hans,
        Given IRIW is a thorn in everyone's side, has no known
        useful benefit, and can hopefully be killed off in the
        future, let's not get bogged down in IRIW. But none of what
        you say below relates to multi-copy atomicity.
        Cheers,
        David

            -----Original Message-----
            *From:* [email protected]
            [mailto:[email protected]]*On Behalf Of *Hans Boehm
            *Sent:* Wednesday, 26 November 2014 12:04 PM
            *To:* [email protected]
            *Cc:* Stephan Diestelhorst;
            [email protected]; core-libs-dev
            *Subject:* Re: [concurrency-interest] RFR: 8065804:
            JEP171:Clarifications/corrections for fence intrinsics

            To be concrete here, on Power, loads can normally be
            ordered by an address dependency or a lightweight fence
            (lwsync).  However, neither is enough to prevent the
            questionable outcome for IRIW, since neither ensures
            that the stores in T1 and T2 will be made visible to
            other threads in a consistent order.  That outcome can
            be prevented by using heavyweight fence (sync)
            instructions between the loads instead.  Peter Sewell's
            group concluded that to enforce correct volatile
            behavior on Power, you essentially need a heavyweight
            fence between every pair of volatile operations.  That
            cannot be understood based on simple ordering
            constraints.

            As Stephan pointed out, there are similar issues on ARM,
            but they're less commonly encountered in a Java
            implementation.  If you're lucky, you can get to the
            right implementation recipe by looking at only
            reordering, I think.
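Hans's earlier contrast between `load x; acquire_fence` and `load_acquire x`, quoted further down, can be expressed with the Java 9+ VarHandle API roughly as follows. This is an illustrative mapping under my own assumptions (the field names and the choice of VarHandle.acquireFence and getAcquire are mine), not a claim about what HotSpot emits on Power or ARM. Here y stands for the long-latency miss, x for the L1 hit, and z for a later access.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.Arrays;

// Hypothetical sketch contrasting a trailing acquire fence with an acquire load.
final class AcquireForms {
    static int x, y, z;
    static final VarHandle X;

    static {
        try {
            X = MethodHandles.lookup().findStaticVarHandle(AcquireForms.class, "x", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** "load x; acquire_fence": the fence orders EVERY earlier load, including y, before z. */
    static int[] loadThenFence() {
        int a = y;                // possibly a long-latency miss
        int b = x;                // possibly an L1 hit
        VarHandle.acquireFence(); // loads above may not be reordered with accesses below
        int c = z;
        return new int[] {a, b, c};
    }

    /** "load_acquire x": only the load of x is ordered before z; the earlier
     *  load of y may still move below it ("roach motel"). */
    static int[] acquireLoad() {
        int a = y;
        int b = (int) X.getAcquire(); // later accesses may not move above this load
        int c = z;
        return new int[] {a, b, c};
    }

    public static void main(String[] args) {
        x = 1;
        y = 2;
        z = 3;
        System.out.println(Arrays.toString(loadThenFence())); // prints [2, 1, 3]
        System.out.println(Arrays.toString(acquireLoad()));   // prints [2, 1, 3]
    }
}
```

Single-threaded, both forms return the same values; the difference lies only in how much reordering each permits when other threads are involved.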


            On Tue, Nov 25, 2014 at 4:36 PM, David Holmes
            <[email protected]
            <mailto:[email protected]>> wrote:

                Stephan Diestelhorst writes:
                >
                > David Holmes wrote:
                > > Stephan Diestelhorst writes:
                > > > On Tuesday, 25 November 2014, 11:15:36, Hans Boehm wrote:
                > > > > I'm no hardware architect, but fundamentally it seems to me that
                > > > >
                > > > > load x
                > > > > acquire_fence
                > > > >
                > > > > imposes a much more stringent constraint than
                > > > >
                > > > > load_acquire x
                > > > >
                > > > > Consider the case in which the load from x is an L1 hit, but a
                > > > > preceding load (from say y) is a long-latency miss.  If we enforce
                > > > > ordering by just waiting for completion of prior operation, the
                > > > > former has to wait for the load from y to complete; while the
                > > > > latter doesn't.  I find it hard to believe that this doesn't leave
                > > > > an appreciable amount of performance on the table, at least for
                > > > > some interesting microarchitectures.
                > > >
                > > > I agree, Hans, that this is a reasonable assumption.  Load_acquire x
                > > > does allow roach motel, whereas the acquire fence does not.
                > > >
                > > > >  In addition, for better or worse, fencing requirements on at least
                > > > >  Power are actually driven as much by store atomicity issues, as by
                > > > >  the ordering issues discussed in the cookbook.  This was not
                > > > >  understood in 2005, and unfortunately doesn't seem to be amenable to
                > > > >  the kind of straightforward explanation as in Doug's cookbook.
                > > >
                > > > Coming from a strongly ordered architecture to a weakly ordered one
                > > > myself, I also needed some mental adjustment about store (multi-copy)
                > > > atomicity.  I can imagine others will be unaware of this difference,
                > > > too, even in 2014.
                > >
                > > Sorry I'm missing the connection between fences and multi-copy
                > > atomicity.
                >
                > One example is the classic IRIW.  With non-multi-copy atomic stores,
                > but ordered (say through a dependency) loads in the following example:
                >
                > Memory: foo = bar = 0
                > _T1_         _T2_         _T3_                              _T4_
                > st (foo),1   st (bar),1   ld r1, (bar)                      ld r3, (foo)
                >                           <addr dep / local "fence" here>   <addr dep>
                >                           ld r2, (foo)                      ld r4, (bar)
                >
                > You may observe r1 = 1, r2 = 0, r3 = 1, r4 = 0 on non-multi-copy atomic
                > machines.  On TSO boxes, this is not possible.  That means that the
                > memory fence that will prevent such a behaviour (DMB on ARM) needs to
                > carry some additional oomph in ensuring multi-copy atomicity, or rather
                > prevent you from seeing it (which is the same thing).

                I take it as given that any code for which you may
                have ordering constraints must first have basic
                atomicity properties for loads and stores. I would
                not expect any kind of fence to add multi-copy
                atomicity where there was none.

                David

                > Stephan
                >
                > _______________________________________________
                > Concurrency-interest mailing list
                > [email protected]
                <mailto:[email protected]>
                >
                http://cs.oswego.edu/mailman/listinfo/concurrency-interest

