On Mon, Apr 16, 2018 at 12:32 AM, Chris M. Thomasson
<cris...@charter.net> wrote:
>
>
> On Friday, April 13, 2018 at 11:45:51 PM UTC-7, Dmitry Vyukov wrote:
>>
>> On Mon, Apr 9, 2018 at 3:38 AM, Chris M. Thomasson <cri...@charter.net>
>> wrote:
>> > On Saturday, April 7, 2018 at 1:46:20 AM UTC-7, Dmitry Vyukov wrote:
>> >>
>> >> On Thu, Apr 5, 2018 at 10:03 PM, Chris M. Thomasson
>> >> <cri...@charter.net>
>> >> wrote:
>> >> > On Tuesday, April 3, 2018 at 5:44:38 AM UTC-7, Dmitry Vyukov wrote:
>> >> >>
>> >> >> On Sat, Mar 31, 2018 at 10:41 PM, Chris M. Thomasson
>> >> >> <cri...@charter.net> wrote:
>> >> >> > Notice how there is an acquire barrier inside of the CAS loop
>> >> >> > within
>> >> >> > the
>> >> >> > enqueue and dequeue functions of:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > http://www.1024cores.net/home/lock-free-algorithms/queues/bounded-mpmc-queue
>> >> [...]
>> >> > Executing an acquire barrier on every iteration of the CAS loop is not
>> >> > necessary. The actual version count keeps everything in order.
>> >> >
>> >> > However, you do need a single acquire fence _after_ the CAS loop
>> >> > succeeds in order to get a clear view of the element.
>> >>
>> >> This is true.
>> >
>> >
>> > Agreed. I personally like the ability to see the membars being separated
>> > out and standing alone. It is a habit of mine from SPARC. Now, tagged
>> > standalone membars aside for a moment, perhaps ones that can include
>> > memory locations they are interested in... ;^)
>> >
>> >
>> >>
>> >>
>> >> I don't like standalone fences because they are a plague for
>> >> verification. Consider: a single release fence turns _all_ subsequent
>> >> relaxed atomic stores ever executed by the thread into release
>> >> operations (they release memory state up to the fence point), and
>> >> handling of acquire/release operations is an O(N) operation (and
>> >> generally done under a mutex).
>> >
>> >
>> > A release operation should make sure all _prior_ operations are visible
>> > _before_ it is visible to another thread. It has no effect on subsequent
>> > relaxed operations. For instance:
>> >
>> >
>> > // producer
>> > A = 1
>> > B = 2
>> > RELEASE
>> > C = 3
>> > D = 4
>> >
>> > // consumer
>> > while (D != 4) backoff;
>> > ACQUIRE
>> > assert(A == 1 && B == 2);
>> >
>> >
>> > Well, A and B are going to be in sync with an acquire, such that the
>> > assert will never fail. However, C can be hoisted up and not be in sync
>> > at all! C is incoherent wrt the consumer because it was not covered by
>> > the standalone release barrier.
>>
>>
>> In this case the RELEASE turned the store to D into a release-store (a
>> subsequent store).
>> And the ACQUIRE turned the load of D into an acquire-load (a preceding load).
>
>
> D should be a pure relaxed store, and C should not be covered by the
> RELEASE. Iirc, it works this way on SPARC in RMO mode. However, on x86, C
> will be covered because each store has implied release characteristics, WB
> memory aside for a moment.

C and D are completely symmetric wrt the RELEASE. Later you can
discover that there is also a thread that does:

// consumer 2
while (C != 3) backoff;
ACQUIRE
assert(A == 1 && B == 2);

And now suddenly C is a release operation, exactly the same way D is.


>> At least this is how it is defined in the C/C++ standards.
>> ACQUIRE/RELEASE fences do not establish any happens-before relations
>> themselves. You still need a load in one thread to observe a value
>> stored in another thread, and only that "materializes" standalone
>> fence synchronization. So a store that materializes a RELEASE fence will
>> always be a subsequent store.
>
>
> Humm... That is too strict, and has to be there whether we use standalone
> fences or not.

No, in C/C++ a memory ordering constraint tied to a memory operation acts
only on that memory operation.

Consider:

DATA = 1;
C.store(1, memory_order_release);
D.store(1, memory_order_relaxed);

vs:

DATA = 1;
atomic_thread_fence(memory_order_release);
C.store(1, memory_order_relaxed);
D.store(1, memory_order_relaxed);


And 2 consumers:

// consumer 1
while (C.load(memory_order_acquire) == 0) backoff();
assert(DATA == 1);

// consumer 2
while (D.load(memory_order_acquire) == 0) backoff();
assert(DATA == 1);

Both consumers are correct wrt the atomic_thread_fence version of the
producer. But only the first one is correct wrt the store(1,
memory_order_release) version of the producer.
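
Fwiw, here is a self-contained sketch of the fence-based producer together
with both consumers, compilable as standard C++. The names DATA/C/D are kept
from the fragments above; the backoff() helper and the thread setup are just
placeholders of mine:

#include <atomic>
#include <cassert>
#include <thread>

int DATA = 0;
std::atomic<int> C{0}, D{0};

// placeholder backoff; any yield/pause would do
static void backoff() { std::this_thread::yield(); }

void producer_fence()
{
    DATA = 1;
    // the standalone release fence covers the plain store to DATA for
    // *any* later relaxed store that another thread happens to observe
    std::atomic_thread_fence(std::memory_order_release);
    C.store(1, std::memory_order_relaxed);
    D.store(1, std::memory_order_relaxed);
}

void consumer1()
{
    while (C.load(std::memory_order_acquire) == 0) backoff();
    assert(DATA == 1); // holds for the fence version of the producer
}

void consumer2()
{
    while (D.load(std::memory_order_acquire) == 0) backoff();
    assert(DATA == 1); // holds for the fence version, but NOT if the
                       // producer had used C.store(1, release) + relaxed D
}

int main()
{
    std::thread t1(producer_fence), t2(consumer1), t3(consumer2);
    t1.join(); t2.join(); t3.join();
}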

And this can actually break on x86 because:

DATA = 1;
C.store(1, memory_order_release);
D.store(1, memory_order_relaxed);

can be compiled to machine code as:

D.store(1, memory_order_relaxed);
DATA = 1;
C.store(1, memory_order_release);

But:

DATA = 1;
atomic_thread_fence(memory_order_release);
C.store(1, memory_order_relaxed);
D.store(1, memory_order_relaxed);

cannot be compiled to machine code as (store to D cannot hoist above
the release fence):

D.store(1, memory_order_relaxed);
DATA = 1;
atomic_thread_fence(memory_order_release);
C.store(1, memory_order_relaxed);


> The store D = 4 makes A and B (wrt the RELEASE) visible to the consumer
> threads that look for D = 4 and execute the ACQUIRE barrier after that fact
> has been observed. Afaict, C should NOT be covered.
>
>
>>
>>
>>
>> >> The same for acquire fences: a single
>> >> acquire fence turns _all_ loads ever executed by the thread into
>> >> acquire operations on the corresponding memory locations, which means
>> >> that you need to handle all relaxed loads as "shadow" acquire loads
>> >> for the case they will be materialized by a subsequent acquire fence.
>
>
> That sounds too coarse.

That's the semantics of a stand-alone fence. Consider:

foo = 1;

... gazillion lines of code and hours later ...

atomic_thread_fence(memory_order_release);

... gazillion lines of code and hours later ...

x.store(1, relaxed);


and then in another thread:

if (x.load(acquire)) assert(foo == 1);

and now suddenly these 3 lines of code, separated by gazillions of lines of
code and hours of execution, fold into a connected synchronization
pattern.
With a release-store we at least have the release and the store on the
same line _and_ we know that the release acts only on this store and
not on an unknown set of other stores.
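
Just to spell out that contrast, a minimal sketch of the release-store form,
where the pairing is visible at the single store/load pair (the writer/reader
split is mine):

#include <atomic>
#include <cassert>

int foo = 0;
std::atomic<int> x{0};

void writer()
{
    foo = 1;
    // ... gazillion lines of code and hours later ...
    // the release is tied to this one store of x and nothing else
    x.store(1, std::memory_order_release);
}

void reader()
{
    // the acquire is tied to this one load of x
    if (x.load(std::memory_order_acquire) == 1)
        assert(foo == 1);
}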



>> > An acquire operation should make sure all operations wrt the release are
>> > visible _before_ any subsequent operations can be performed _after_ that
>> > fact is accomplished.
>> >
>> > Well, fwiw, the membars that can be embedded into the CAS wrt acquire and
>> > release do affect prior and subsequent activity anyway, standalone or not.
>> > A release will dump prior stores such that an acquire barrier will see
>> > them all. Now, when we are dealing with a consume barrier, well, that is
>> > targeting the release dependency chain wrt the pointer. A consume barrier
>> > is more precisely targeted when compared to the wider spectrum of an
>> > acquire. Btw, iirc consume is emulated in Relacy as acquire, right?
>> >
>> >
>> > Also, think of popping all nodes at once from an atomic LIFO:
>> >
>> >
>> > https://groups.google.com/d/topic/comp.lang.c++/V0s__czQwa0/discussion
>> >
>> >
>> > Well, how can we accomplish the following without using standalone
>> > fences?:
>> >
>> >
>> >       // try to flush all of our nodes
>> >       node* flush()
>> >       {
>> >           node* n = m_head.exchange(nullptr, mb_relaxed);
>> >
>> >           if (n)
>> >           {
>> >               mb_fence(mb_acquire);
>> >           }
>> >
>> >           return n;
>> >       }
>>
>> I can't disagree. They are definitely more flexible.
>
>
> Agreed.
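
For reference, here is roughly what that flush pattern looks like spelled out
in plain std::atomic terms. The trivial node/stack types and the push side are
mine, just to make the sketch self-contained:

#include <atomic>

struct node
{
    node* next;
    int payload;
};

struct stack
{
    std::atomic<node*> m_head{nullptr};

    // try to flush all of our nodes
    node* flush()
    {
        node* n = m_head.exchange(nullptr, std::memory_order_relaxed);

        if (n)
        {
            // a single standalone acquire fence after the successful
            // exchange, instead of making the exchange itself acquire
            std::atomic_thread_fence(std::memory_order_acquire);
        }

        return n;
    }

    void push(node* n)
    {
        node* head = m_head.load(std::memory_order_relaxed);
        do
        {
            n->next = head;
        } while (!m_head.compare_exchange_weak(
            head, n,
            std::memory_order_release,
            std::memory_order_relaxed));
    }
};
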
>
>>
>>
>>
>> >> The same is actually true for human reasoning. Say, I am reading your
>> >> code. We have 3 load operations in the loop and an acquire fence
>> >> after the loop. Now the question is: which of the loads did we want to
>> >> turn into an acquire by adding the fence? Or is it maybe 2 of them?
>> >> Which? Or maybe 1 in the loop and 1 somewhere before the loop, in a
>> >> different function?
>> >> One can, of course, comment that, but Relacy won't check comments, so
>> >> I won't trust them ;)
>> >
>> >
>> >
>> > Interesting. Still makes me think of tagged membars. I will get back to
>> > you with a more detailed response.
>>
>>
>> You mean something like:
>>
>>        // try to flush all of our nodes
>>        node* flush()
>>        {
>>            node* n = m_head.exchange(nullptr, mb_relaxed);
>>
>>            if (n)
>>            {
>>                mb_fence(mb_acquire, m_head);  // <---- HERE
>>            }
>>
>>            return n;
>>        }
>>
>> ? Interesting.
>
>
> Yes! The standalone fence can say we want to perform an acquire barrier
> wrt m_head. Something like that should be able to create more fine-grained
> setups. Perhaps even something like the following pseudo-code:
> ______________________
> // setup
> int a = 0;
> int b = 0;
> int c = 0;
> signal = false;
>
> // producer
> a = 1;
> b = 2;
> RELEASE(&signal, &a, &b);
> c = 3;
> STORE_RELAXED(&signal, true);
>
> // consumers
> while (LOAD_RELAXED(&signal) != true) backoff;
> ACQUIRE(&signal, &a, &b);
> assert(a == 1 && b == 2);
> ______________________
>
> The consumers would always see a and b as 1 and 2; however, c was not
> covered, so it is in an incoherent state wrt said consumers.
>
> The acquire would only target a and b, as would the release.
>
> Hummm... Just thinking out loud here. :^)

This would help verification tools tremendously... but unfortunately
it's not the reality we are living in :)
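
Fwiw, the closest one can express today is the untagged version of your
pseudo-code, something like the sketch below (the thread split and the names
are mine). The fences cannot be told "only signal, a and b"; the release fence
orders every prior store, and pairs with whatever subsequent relaxed store
another thread happens to read:

#include <atomic>
#include <cassert>

// setup
int a = 0;
int b = 0;
int c = 0;
std::atomic<bool> signal{false};

void producer()
{
    a = 1;
    b = 2;
    std::atomic_thread_fence(std::memory_order_release);
    c = 3; // sequenced after the fence, so not covered by it
    signal.store(true, std::memory_order_relaxed);
}

void consumer()
{
    while (!signal.load(std::memory_order_relaxed)) { /* backoff */ }
    // likewise pairs with any prior atomic load, not just the load of signal
    std::atomic_thread_fence(std::memory_order_acquire);
    assert(a == 1 && b == 2); // guaranteed
    // c is not guaranteed to be visible here; it was not covered by the fence
}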
