Agreed with Gil. You do not care about hardware memory barriers. I have seen many blog posts about them that mix them up with compiler memory barriers; there is, however, a clear distinction you should make between the two.
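The classic illustration of the compiler-side reordering is load hoisting out of a loop. Here is a minimal, self-contained sketch (class and field names are mine, not from any of the linked posts): with a plain `boolean`, the JIT is allowed to hoist the read of `running` out of the loop and cache it in a register, so the reader may spin forever; declaring the field `volatile` forbids that code motion and makes the program terminate.

```java
// Sketch only -- names are hypothetical. With `volatile` removed, the JIT
// may hoist the load of `running` above the loop and cache it in a
// register, so the reader thread can spin forever. With `volatile`, each
// iteration must re-load the field, and the writer's store becomes visible.
public class HoistDemo {
    private static volatile boolean running = true; // try removing volatile

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (running) { /* spin: must re-load `running` each time */ }
            System.out.println("reader saw stop");
        });
        reader.start();
        Thread.sleep(100);   // let the reader enter the loop
        running = false;     // volatile store: guaranteed visible to reader
        reader.join();
    }
}
```

Note that this is exactly the "repeated load floating up above the loop" mutation Gil describes below: a compiler concern first, a hardware one only second.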
For an example of instruction reordering done by the compiler: http://jpbempel.blogspot.fr/2013/05/volatile-and-memory-barriers.html

Cheers

On Tuesday, June 21, 2016 at 7:37:54 PM UTC+2, Gil Tene wrote:
>
> The main tip I give people who try to understand fences, barriers, and
> memory ordering rules is to completely (and very intentionally) ignore and
> avoid thinking about any CPU's internal details (like buffers or queues or
> flushes or such). The simple rule to remember is this: before a CPU ever
> sees the instructions you think you are running, a compiler (static, JIT,
> whatever) will have had a chance to completely shuffle those instructions
> around and give them to the CPU in any order it sees fit. So what a CPU
> actually does with barriers has no functional/correctness implications, as
> it is the compiler's job to make the CPU do what it needs to.
> Barriers/fences/memory models are first and foremost compiler concerns,
> and have both correctness and (potentially profound) performance
> implications. The CPU concerns are only secondary, and only affect
> performance (never correctness). This is critical to understand and get
> your head around before trying to reason about what fences/barriers/etc.
> *mean* to program code.
>
> For example, the sort of code mutation we are talking about is not just a
> simple change in order. It is also a combination of reordering with other
> optimizations. E.g. a repeated load from the same location in a loop can
> (and will) float "up above" the loop, execute only once (instead of once
> per loop iteration), and cache the loaded result in a register, unless
> barriers/fences/ordering rules prevent it from doing so. Similarly,
> redundant stores to the same location can be folded into a single store,
> unless rules prevent it.
>
> With that in mind, you can start looking at the various barrier and fence
> semantics available in various languages (that actually define them) and
> in various ad-hoc tooling (like e.g. libraries and conventions used in C).
>
> I like to use the simplistic LoadLoad, LoadStore, StoreStore, StoreLoad
> mental model as a starting point for thinking about barriers, as it can be
> used to model most (not all) barrier semantics. Those are simple to
> understand:
> - LoadStore means "all loads that precede this fence will appear (from the
> point of view of any other thread) to execute before the next store
> appears to execute."
> - LoadLoad means "all loads that precede this fence will appear (from the
> point of view of any other thread) to execute before the next load appears
> to execute."
> etc. etc.
> The name tells you what the rule is, clear and simple. Nothing more is
> promised, and nothing less.
>
> I like to think of Acquire and Release (which are somewhat more
> restrictive) in terms of "what would a lock need to do?" (even if no lock
> is involved). Thinking in terms of "in a critical section protected by a
> lock, what sort of code motion would the acquire/release fences associated
> with the lock acquire and release actions allow?" lets me answer questions
> about them without looking up a definition or thinking about the more
> abstract notions that define them. E.g. a lock acquire can allow preceding
> loads and stores that (in the program) appear "before" the lock acquire to
> "float forward past the acquire and into the locked region". But it cannot
> allow loads and stores that (in the program) appear "after" the lock
> acquire to "float backwards past the acquire and out of the locked
> region". So an acquire fence needs to enforce those rules. Similarly,
> release can allow loads and stores that follow the release to "float back
> past the release and into the locked region", but cannot allow loads and
> stores that precede it to "float forward past the release and out of the
> locked region".
>
> Since acquiring doesn't (necessarily) involve a load or a store, Acquire
> doesn't quite equate to LoadLoad|LoadStore.
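Gil's lock-free acquire/release case can be made concrete with the standalone fence methods Java 9 added to `VarHandle` (this sketch is mine, not from the thread; the class and field names are hypothetical). The writer publishes `data` before the `ready` flag; the release fence keeps the data store from floating below the flag store, and the acquire fence keeps the data load from floating above the flag load -- no lock involved. In the vocabulary above, the acquire fence here acts like LoadLoad|LoadStore after a load, and the release fence like LoadStore|StoreStore before a store.

```java
import java.lang.invoke.VarHandle;

// Sketch of acquire/release publication without a lock (assumed names).
public class AcquireRelease {
    static int data;
    static boolean ready;

    static void writer() {
        data = 42;                 // plain store of the payload
        VarHandle.releaseFence();  // prior loads/stores may not float below
        ready = true;              // publish the flag
    }

    static Integer reader() {
        if (!ready) return null;   // plain load of the flag
        VarHandle.acquireFence();  // later loads/stores may not float above
        return data;               // sees 42 whenever ready was observed true
    }

    public static void main(String[] args) {
        writer();
        System.out.println(reader());
    }
}
```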
> It can be when acquiring the lock involves a load followed by
> LoadLoad|LoadStore, but there are lock implementations where that doesn't
> actually happen, and uses of Acquire fences with no locking or loads
> involved. Similarly, since a release doesn't (necessarily) involve a load
> or a store, it is not quite equivalent to LoadStore|StoreStore. (But it
> can be when releasing involves a LoadStore|StoreStore followed by a
> store.)
>
> HTH
>
> On Sunday, June 12, 2016 at 4:28:14 PM UTC-7, Dain Ironfoot wrote:
>>
>> Hi Guys,
>>
>> I am attempting to understand memory barriers at a level useful for Java
>> lock-free programmers. This level, I feel, is somewhere between learning
>> about volatile from Java books and learning about the workings of
>> Store/Load buffers from an x86 manual. I spent some time reading a bunch
>> of blogs/cookbooks and have come up with the summary shown below.
>>
>> LFENCE
>> ====================================================
>> Name            : LFENCE/Load Barrier/Acquire Fence
>> Barriers        : LoadLoad + LoadStore
>> Details         : Given the sequence {Load1, LFENCE, Load2, Store1}, the
>>                   barrier ensures that Load1 can't be moved south and
>>                   Load2 and Store1 can't be moved north of the barrier.
>>                   Note that Load2 and Store1 can still be reordered.
>> Buffer Effect   : Causes the contents of the *LoadBuffer* (pending loads)
>>                   to be processed for that CPU. This makes program state
>>                   exposed by other CPUs visible to this CPU before Load2
>>                   and Store1 are executed.
>> Cost on x86     : Either very cheap or a no-op.
>> Java instruction: Reading a volatile variable, Unsafe.loadFence()
>>
>> SFENCE
>> ====================================================
>> Name            : SFENCE/Store Barrier/Release Fence
>> Barriers        : StoreStore + LoadStore
>> Details         : Given the sequence {Load1, Store1, SFENCE, Store2,
>>                   Load2}, the barrier ensures that Load1 and Store1 can't
>>                   be moved south and Store2 can't be moved north of the
>>                   barrier. Note that Load1 and Store1 can still be
>>                   reordered AND Load2 can be moved north of the barrier.
>> Buffer Effect   : Causes the contents of the *StoreBuffer* to be flushed
>>                   to cache for the CPU on which it is issued. This makes
>>                   program state visible to other CPUs before Store2 and
>>                   Load2 are executed.
>> Cost on x86     : Either very cheap or a no-op.
>> Java instruction: lazySet(), Unsafe.storeFence(), Unsafe.putOrdered*()
>>
>> MFENCE
>> ====================================================
>> Name            : MFENCE/Full Barrier/Fence
>> Barriers        : StoreLoad
>> Details         : Obtains the effects of the other three barriers, so it
>>                   can serve as a general-purpose barrier. Given the
>>                   sequence {Load1, Store1, MFENCE, Store2, Load2}, the
>>                   barrier ensures that Load1 and Store1 can't be moved
>>                   south and Store2 and Load2 can't be moved north of the
>>                   barrier. Note that Load1 and Store1 can still be
>>                   reordered AND Store2 and Load2 can still be reordered.
>> Buffer Effect   : Causes the contents of the *LoadBuffer* (pending loads)
>>                   to be processed for that CPU, AND causes the contents
>>                   of the *StoreBuffer* to be flushed to cache for the CPU
>>                   on which it is issued.
>> Cost on x86     : The most expensive kind.
>> Java instruction: Writing to a volatile, Unsafe.fullFence(), using locks
>>
>> My questions are:
>>
>> a) Is my understanding correct, or have I missed something critical?
>> b) If both SFENCE and MFENCE drain the store buffer (invalidate the
>> cache line and wait for acks from other CPUs), why is one a no-op and the
>> other a very expensive op?
>>
>> Thanks
>>
>> ** Edited to make the info easier to read.
>>

-- 
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
