Re: [lock-free] One reason why I like atomic_thread_fence...

Dmitry Vyukov Fri, 13 Apr 2018 23:46:56 -0700

On Mon, Apr 9, 2018 at 3:38 AM, Chris M. Thomasson <[email protected]> wrote:
> On Saturday, April 7, 2018 at 1:46:20 AM UTC-7, Dmitry Vyukov wrote:
>>
>> On Thu, Apr 5, 2018 at 10:03 PM, Chris M. Thomasson <[email protected]>
>> wrote:
>> > On Tuesday, April 3, 2018 at 5:44:38 AM UTC-7, Dmitry Vyukov wrote:
>> >>
>> >> On Sat, Mar 31, 2018 at 10:41 PM, Chris M. Thomasson
>> >> <[email protected]> wrote:
>> >> > Notice how there is an acquire barrier inside of the CAS loop within
>> >> > the
>> >> > enqueue and dequeue functions of:
>> >> >
>> >> >
>> >> > http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue
>> [...]
>> > Executing an acquire barrier on every iteration of the CAS loop is not
>> > necessary. The actual version count keeps everything in order.
>> >
>> > However, you do need a single acquire fence _after_ the CAS loop
>> > succeeds in
>> > order to get a clear view of the element.
>>
>> This is true.
>
>
> Agreed. I personally like the ability to see the membars being separated out
> and
>
> standing alone. It is a habit of mine from SPARC. Now, tagged standalone
> membars
>
> aside for a moment, perhaps ones that can include memory locations they are
>
> interested in... ;^)
>
>
>>
>>
>> I don't like standalone fences because they are plague for
>> verification. Consider, a single release fences turns _all_ subsequent
>> relaxed atomic stores ever executed by the thread into release
>> operations (they release memory state up to the fence point) and
>> handling of acquire/release operations is an O(N) operation (and
>> generally done under a mutex).
>
>
> A release operation should make sure all _prior_ operations are visible
> _before_
>
> they are visible to another thread. They have no effect on subsequent
> relaxed
>
> operations. For instance:
>
>
> // producer
>
> A = 1
>
> B = 2
>
> RELEASE
>
> C = 3
>
> D = 4
>
>
> // consumer
>
> while (D != 4) backoff;
>
> ACQUIRE
>
> assert(A == 1 && B == 2);
>
>
> Well, A and B are going be in sync with an acquire such that the assert will
> never
> fail, however C can be hoisted up and not be in sync at all! C is incoherent
> wrt the
> consumer because it was not covered by the standalone release barrier.



In this case the RELEASE turned store to D into a release-store (a
subsequent store).
And ACQUIRE turned load of D into an acquire-load (a preceding load).

At lease this is how this is defined in C/C++ standards.
ACQUIRE/RELEASE fences do not establish any happens-before relations
themselves. You still need a load in one thread to observe a value
stored in another thread. And only that "materializes" standalone
fence synchronization. So a store that materializes RELEASE fence will
always be a subsequent store.


>> The same for acquire fences: a single
>> acquire fences turns _all_ loads ever executed by the thread into
>> acquire operations ton he corresponding memory locations, which means
>> that you need to handle all relaxed loads as a "shadow" acquire loads
>> for the case they will be materialized by a subsequent acquire fence.
>
>
> An acquire operation should make sure all operations wrt the release are
> visible
>
> _before_ any subsequent operations can be performed _after_ that fact is
>
> accomplished.
>
>
> Well, fwiw, the membars that can be embedded into the CAS wrt acquire and
>
> release do effect prior and subsequent activity anyway, standalone or not. A
> release will
>
> dump prior stores such that an acquire barrier will see them all. Now, when
> we
>
> are dealing with a consume barrier, well that is targeting the release
> dependency
>
> chain wrt the pointer. A consume barrier is more precisely targeted when
> compared
>
> to the wider spectrum of an acquire. Btw, iirc consume is emulated in Relacy
> as
>
> acquire right?
>
>
> Also, think of popping all nodes at once from an atomic LIFO:
>
>
> https://groups.google.com/d/topic/comp.lang.c++/V0s__czQwa0/discussion
>
>
> Well, how can we accomplish the following without using standalone fences?:
>
>
>       // try to flush all of our nodes
>       node* flush()
>       {
>           node* n = m_head.exchange(nullptr, mb_relaxed);
>
>           if (n)
>           {
>               mb_fence(mb_acquire);
>           }
>
>           return n;
>       }

I can't disagree. They are definitely more flexible.


>> The same is actually true for human reasoning. Say, I am reading your
>> code. We have 3 load operations in the loop and an acquire fences
>> after the loop. Now the question is: which of the loads we wanted to
>> turn into acquire by adding the fence? Or is it maybe 2 of them?
>> Which? Or maybe 1 in the loop and 1 somewhere before the loop, in a
>> different function?
>> One can, of course, comment that, but Relacy won't check comments, so
>> I won't trust them ;)
>
>
>
> Interesting. Still makes me think of tagged membars. I will get back to you
> with
>
> a more detailed response.


You mean something like:

       // try to flush all of our nodes
       node* flush()
       {
           node* n = m_head.exchange(nullptr, mb_relaxed);

           if (n)
           {
               mb_fence(mb_acquire, m_head);  // <---- HERE
           }

           return n;
       }

? Interesting.

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"Scalable Synchronization Algorithms" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/lock-free/CAEeQi3v2C%3D3FfV%3DiudZhxuOpCBgabxE6za6mW_WzT3anHvpksg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: [lock-free] One reason why I like atomic_thread_fence...

Reply via email to