On Thu, Apr 5, 2018 at 10:03 PM, Chris M. Thomasson <[email protected]> wrote: > On Tuesday, April 3, 2018 at 5:44:38 AM UTC-7, Dmitry Vyukov wrote: >> >> On Sat, Mar 31, 2018 at 10:41 PM, Chris M. Thomasson >> <[email protected]> wrote: >> > Notice how there is an acquire barrier inside of the CAS loop within the >> > enqueue and dequeue functions of: >> > >> > >> > >> > http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue >> > >> > >> > ? >> > >> > >> > Well, fwiw we can get rid of them by using stand alone fences: >> > >> > >> > Btw, sorry for the membar abstraction. I can quickly get sick and tired >> > of >> > having to type out (std::memory_order_relaxed) all the damn time. ;^) >> > >> > >> > Anyway, here is the simple code using Relacy: >> > >> > ___________________ >> [...] >> > ___________________ >> > >> > >> > >> > There is no need for any membars within the loops. The version count >> > keeps >> > everything in order. >> >> compare_exchange in C/C++ has optional failure memory ordering, so the >> following looks equivalent to me ;) >> >> m_tail.compare_exchange_weak(ver, ver + 1, acquire, relaxed) > > > Yes, you are correct about that. However, I personally like to keep > everything relaxed, and only add memory barriers when they are absolutely > needed. The stand alone fence can do this. Ahh, perhaps I am thinking too > much of SPARC with its awesome membar instruction. For instance, the acquire > barrier you have in the CAS loop in your MPMC queue code is what a stand > alone fence can get rid of. Think of the following line of code, that is > within the loop: > > size_t seq = > > cell->sequence_.load(std::memory_order_acquire); > > > Executing an acquire barrier on every iteration of the CAS loop is not > necessary. The actual version count keeps everything in order. > > > However, you do need a single acquire fence _after_ the CAS loop succeeds in > order to get a clear view of the element.
This is true. I don't like standalone fences because they are plague for verification. Consider, a single release fences turns _all_ subsequent relaxed atomic stores ever executed by the thread into release operations (they release memory state up to the fence point) and handling of acquire/release operations is an O(N) operation (and generally done under a mutex). The same for acquire fences: a single acquire fences turns _all_ loads ever executed by the thread into acquire operations ton he corresponding memory locations, which means that you need to handle all relaxed loads as a "shadow" acquire loads for the case they will be materialized by a subsequent acquire fence. The same is actually true for human reasoning. Say, I am reading your code. We have 3 load operations in the loop and an acquire fences after the loop. Now the question is: which of the loads we wanted to turn into acquire by adding the fence? Or is it maybe 2 of them? Which? Or maybe 1 in the loop and 1 somewhere before the loop, in a different function? One can, of course, comment that, but Relacy won't check comments, so I won't trust them ;) -- --- You received this message because you are subscribed to the Google Groups "Scalable Synchronization Algorithms" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/lock-free/CAEeQi3vVaRy52%3DUr%2Bnxk6RCmQ9LxyjjzgPuhEb6CHv2vv%2B8YeA%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
