Re: [lock-free] One reason why I like atomic_thread_fence...

Dmitry Vyukov Sat, 07 Apr 2018 01:47:22 -0700

On Thu, Apr 5, 2018 at 10:03 PM, Chris M. Thomasson <[email protected]> wrote:
> On Tuesday, April 3, 2018 at 5:44:38 AM UTC-7, Dmitry Vyukov wrote:
>>
>> On Sat, Mar 31, 2018 at 10:41 PM, Chris M. Thomasson
>> <[email protected]> wrote:
>> > Notice how there is an acquire barrier inside of the CAS loop within the
>> > enqueue and dequeue functions of:
>> >
>> >
>> >
>> > http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue
>> >
>> >
>> > ?
>> >
>> >
>> > Well, fwiw we can get rid of them by using stand alone fences:
>> >
>> >
>> > Btw, sorry for the membar abstraction. I can quickly get sick and tired
>> > of
>> > having to type out (std::memory_order_relaxed) all the damn time. ;^)
>> >
>> >
>> > Anyway, here is the simple code using Relacy:
>> >
>> > ___________________
>> [...]
>> > ___________________
>> >
>> >
>> >
>> > There is no need for any membars within the loops. The version count
>> > keeps
>> > everything in order.
>>
>> compare_exchange in C/C++ has optional failure memory ordering, so the
>> following looks equivalent to me ;)
>>
>> m_tail.compare_exchange_weak(ver, ver + 1, acquire, relaxed)
>
>
> Yes, you are correct about that. However, I personally like to keep
> everything relaxed, and only add memory barriers when they are absolutely
> needed. The stand alone fence can do this. Ahh, perhaps I am thinking too
> much of SPARC with its awesome membar instruction. For instance, the acquire
> barrier you have in the CAS loop in your MPMC queue code is what a stand
> alone fence can get rid of. Think of the following line of code, that is
> within the loop:
>
>  size_t seq =
>
>         cell->sequence_.load(std::memory_order_acquire);
>
>
> Executing an acquire barrier on every iteration of the CAS loop is not
> necessary. The actual version count keeps everything in order.
>
>
> However, you do need a single acquire fence _after_ the CAS loop succeeds in
> order to get a clear view of the element.


This is true.

I don't like standalone fences because they are plague for
verification. Consider, a single release fences turns _all_ subsequent
relaxed atomic stores ever executed by the thread into release
operations (they release memory state up to the fence point) and
handling of acquire/release operations is an O(N) operation (and
generally done under a mutex). The same for acquire fences: a single
acquire fences turns _all_ loads ever executed by the thread into
acquire operations ton he corresponding memory locations, which means
that you need to handle all relaxed loads as a "shadow" acquire loads
for the case they will be materialized by a subsequent acquire fence.

The same is actually true for human reasoning. Say, I am reading your
code. We have 3 load operations in the loop and an acquire fences
after the loop. Now the question is: which of the loads we wanted to
turn into acquire by adding the fence? Or is it maybe 2 of them?
Which? Or maybe 1 in the loop and 1 somewhere before the loop, in a
different function?
One can, of course, comment that, but Relacy won't check comments, so
I won't trust them ;)

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"Scalable Synchronization Algorithms" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/lock-free/CAEeQi3vVaRy52%3DUr%2Bnxk6RCmQ9LxyjjzgPuhEb6CHv2vv%2B8YeA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: [lock-free] One reason why I like atomic_thread_fence...

Reply via email to