Agreed with Gil. You do not care about hardware memory barriers. I have seen many blog posts about them that mix them up with compiler memory barriers; there is, however, a clear distinction you should make between the two.
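The classic illustration of the compiler-side reordering is load hoisting out of a loop. Here is a minimal, self-contained sketch (class and field names are mine, not from any of the linked posts): with a plain `boolean`, the JIT is allowed to hoist the read of `running` out of the loop and cache it in a register, so the reader may spin forever; declaring the field `volatile` forbids that code motion and makes the program terminate.

```java
// Sketch only -- names are hypothetical. With `volatile` removed, the JIT
// may hoist the load of `running` above the loop and cache it in a
// register, so the reader thread can spin forever. With `volatile`, each
// iteration must re-load the field, and the writer's store becomes visible.
public class HoistDemo {
    private static volatile boolean running = true; // try removing volatile

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (running) { /* spin: must re-load `running` each time */ }
            System.out.println("reader saw stop");
        });
        reader.start();
        Thread.sleep(100);   // let the reader enter the loop
        running = false;     // volatile store: guaranteed visible to reader
        reader.join();
    }
}
```

Note that this is exactly the "repeated load floating up above the loop" mutation Gil describes below: a compiler concern first, a hardware one only second.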
For an example of instruction reordering done by the compiler: http://jpbempel.blogspot.fr/2013/05/volatile-and-memory-barriers.html

Cheers

On Tuesday, June 21, 2016 at 7:37:54 PM UTC+2, Gil Tene wrote:
>
> The main tip I give people who try to understand fences, barriers, and
> memory ordering rules is to completely (and very intentionally) ignore and
> avoid thinking about any CPU's internal details (like buffers or queues or
> flushes or such). The simple rule to remember is this: before a CPU ever
> sees the instructions you think you are running, a compiler (static, JIT,
> whatever) will have had a chance to completely shuffle those instructions
> around and give them to the CPU in any order it sees fit. So what a CPU
> actually does with barriers has no functional/correctness implications, as
> it is the compiler's job to make the CPU do what it needs to.
> Barriers/fences/memory models are first and foremost compiler concerns,
> and have both correctness and (potentially profound) performance
> implications. The CPU concerns are only secondary, and only affect
> performance (never correctness). This is critical to understand and get
> your head around before trying to reason about what fences/barriers/etc.
> *mean* to program code.
>
> For example, the sort of code mutation we are talking about is not just a
> simple change in order. It is also a combination of reordering with other
> optimizations. E.g. a repeated load from the same location in a loop can
> (and will) float "up above" the loop, execute only once (instead of once
> per loop iteration), and cache the loaded result in a register, unless
> barriers/fences/ordering rules prevent it from doing so. Similarly,
> redundant stores to the same location can be folded into a single store,
> unless rules prevent it.
>
> With that in mind, you can start looking at the various barrier and fence
> semantics available in various languages (that actually define them) and
> in various ad-hoc tooling (like e.g. libraries and conventions used in C).
>
> I like to use the simplistic LoadLoad, LoadStore, StoreStore, StoreLoad
> mental model as a starting point for thinking about barriers, as it can be
> used to model most (not all) barrier semantics. Those are simple to
> understand:
> - LoadStore means "all loads that precede this fence will appear (from the
> point of view of any other thread) to execute before the next store
> appears to execute."
> - LoadLoad means "all loads that precede this fence will appear (from the
> point of view of any other thread) to execute before the next load appears
> to execute."
> etc. etc.
> The name tells you what the rule is, clear and simple. Nothing more is
> promised, and nothing less.
>
> I like to think of Acquire and Release (which are somewhat more
> restrictive) in terms of "what would a lock need to do?" (even if no lock
> is involved). Thinking in terms of "in a critical section protected by a
> lock, what sort of code motion would the acquire/release fences associated
> with the lock acquire and release actions allow?" lets me answer questions
> about them without looking up a definition or thinking about the more
> abstract notions that define them. E.g. a lock acquire can allow preceding
> loads and stores that (in the program) appear "before" the lock acquire to
> "float forward past the acquire and into the locked region". But it cannot
> allow loads and stores that (in the program) appear "after" the lock
> acquire to "float backwards past the acquire and out of the locked
> region". So an acquire fence needs to enforce those rules. Similarly,
> release can allow loads and stores that follow the release to "float back
> past the release and into the locked region", but cannot allow loads and
> stores that precede it to "float forward past the release and out of the
> locked region".
>
> Since acquiring doesn't (necessarily) involve a load or a store, Acquire
> doesn't quite equate to LoadLoad|LoadStore.
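Gil's lock-free acquire/release case can be made concrete with the standalone fence methods Java 9 added to `VarHandle` (this sketch is mine, not from the thread; the class and field names are hypothetical). The writer publishes `data` before the `ready` flag; the release fence keeps the data store from floating below the flag store, and the acquire fence keeps the data load from floating above the flag load -- no lock involved. In the vocabulary above, the acquire fence here acts like LoadLoad|LoadStore after a load, and the release fence like LoadStore|StoreStore before a store.

```java
import java.lang.invoke.VarHandle;

// Sketch of acquire/release publication without a lock (assumed names).
public class AcquireRelease {
    static int data;
    static boolean ready;

    static void writer() {
        data = 42;                 // plain store of the payload
        VarHandle.releaseFence();  // prior loads/stores may not float below
        ready = true;              // publish the flag
    }

    static Integer reader() {
        if (!ready) return null;   // plain load of the flag
        VarHandle.acquireFence();  // later loads/stores may not float above
        return data;               // sees 42 whenever ready was observed true
    }

    public static void main(String[] args) {
        writer();
        System.out.println(reader());
    }
}
```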
> It can be when acquiring the lock involves a load followed by
> LoadLoad|LoadStore, but there are lock implementations where that doesn't
> actually happen, and uses of Acquire fences with no locking or loads
> involved. Similarly, since a release doesn't (necessarily) involve a load
> or a store, it is not quite equivalent to LoadStore|StoreStore. (But it
> can be when releasing involves a LoadStore|StoreStore followed by a
> store.)
>
> HTH
>
> On Sunday, June 12, 2016 at 4:28:14 PM UTC-7, Dain Ironfoot wrote:
>>
>> Hi Guys,
>>
>> I am attempting to understand memory barriers at a level useful for Java
>> lock-free programmers. This level, I feel, is somewhere between learning
>> about volatile from Java books and learning about the workings of
>> Store/Load buffers from an x86 manual. I spent some time reading a bunch
>> of blogs/cookbooks and have come up with the summary shown below.
>>
>> LFENCE
>> ====================================================
>> Name            : LFENCE/Load Barrier/Acquire Fence
>> Barriers        : LoadLoad + LoadStore
>> Details         : Given the sequence {Load1, LFENCE, Load2, Store1}, the
>>                   barrier ensures that Load1 can't be moved south and
>>                   Load2 and Store1 can't be moved north of the barrier.
>>                   Note that Load2 and Store1 can still be reordered.
>> Buffer Effect   : Causes the contents of the *LoadBuffer* (pending loads)
>>                   to be processed for that CPU. This makes program state
>>                   exposed by other CPUs visible to this CPU before Load2
>>                   and Store1 are executed.
>> Cost on x86     : Either very cheap or a no-op.
>> Java instruction: Reading a volatile variable, Unsafe.loadFence()
>>
>> SFENCE
>> ====================================================
>> Name            : SFENCE/Store Barrier/Release Fence
>> Barriers        : StoreStore + LoadStore
>> Details         : Given the sequence {Load1, Store1, SFENCE, Store2,
>>                   Load2}, the barrier ensures that Load1 and Store1 can't
>>                   be moved south and Store2 can't be moved north of the
>>                   barrier. Note that Load1 and Store1 can still be
>>                   reordered AND Load2 can be moved north of the barrier.
>> Buffer Effect   : Causes the contents of the *StoreBuffer* to be flushed
>>                   to cache for the CPU on which it is issued. This makes
>>                   program state visible to other CPUs before Store2 and
>>                   Load2 are executed.
>> Cost on x86     : Either very cheap or a no-op.
>> Java instruction: lazySet(), Unsafe.storeFence(), Unsafe.putOrdered*()
>>
>> MFENCE
>> ====================================================
>> Name            : MFENCE/Full Barrier/Fence
>> Barriers        : StoreLoad
>> Details         : Obtains the effects of the other three barriers, so it
>>                   can serve as a general-purpose barrier. Given the
>>                   sequence {Load1, Store1, MFENCE, Store2, Load2}, the
>>                   barrier ensures that Load1 and Store1 can't be moved
>>                   south and Store2 and Load2 can't be moved north of the
>>                   barrier. Note that Load1 and Store1 can still be
>>                   reordered AND Store2 and Load2 can still be reordered.
>> Buffer Effect   : Causes the contents of the *LoadBuffer* (pending loads)
>>                   to be processed for that CPU, AND causes the contents
>>                   of the *StoreBuffer* to be flushed to cache for the CPU
>>                   on which it is issued.
>> Cost on x86     : The most expensive kind.
>> Java instruction: Writing to a volatile, Unsafe.fullFence(), using locks
>>
>> My questions are:
>>
>> a) Is my understanding correct, or have I missed something critical?
>> b) If both SFENCE and MFENCE drain the store buffer (invalidate the
>> cache line and wait for acks from other CPUs), why is one a no-op and the
>> other a very expensive op?
>>
>> Thanks
>>
>> ** Edited to make the info easier to read.
>>

-- 
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
