On 14-11-2012 15:50, deadalnix wrote:
On 14/11/2012 15:39, Alex Rønne Petersen wrote:
On 14-11-2012 15:14, Andrei Alexandrescu wrote:
On 11/14/12 1:19 AM, Walter Bright wrote:
On 11/13/2012 11:56 PM, Jonathan M Davis wrote:
Being able to have double-checked locking work would be valuable, and
having memory barriers would reduce race condition weirdness when locks
aren't used properly, so I think that it would be desirable to have
memory barriers.
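
For readers who have not seen it, here is a rough sketch of the double-checked locking pattern referred to above. It is written with C++11 atomics rather than D, purely as an illustration; the names Config, g_instance, g_init_mutex and get_config are made up for this example and do not come from the thread.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical payload type, used only for illustration.
struct Config {
    int value = 42;
};

std::atomic<Config*> g_instance{nullptr};  // published pointer
std::mutex g_init_mutex;                   // serializes the slow path

Config* get_config() {
    // Fast path: the acquire load pairs with the release store below,
    // so a non-null pointer implies the Config it points to is fully built.
    Config* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(g_init_mutex);
        // Second check, now under the lock; the mutex already orders this load.
        p = g_instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Config();  // intentionally never freed in this sketch
            g_instance.store(p, std::memory_order_release);
        }
    }
    return p;
}
```

Without the acquire/release pair (or an equivalent barrier), another thread could in principle see the non-null pointer before the object's fields are visible, which is the failure mode that makes this pattern hard to get right.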

I'm not saying "memory barriers are bad". I'm saying that having the
compiler blindly insert them for shared reads/writes is far from the
right way to do it.

Let's not hasten. That works for Java and C#, and is allowed in C++.

Andrei



I need some clarification here: By memory barrier, do you mean x86's
mfence, sfence, and lfence? Because as Walter said, inserting those
blindly when unnecessary can lead to terrible performance because it
practically murders pipelining.


In fact, x86 is mostly sequentially consistent due to its memory model.
It only requires an mfence when a shared store is followed by a shared
load.
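
The store-followed-by-load case can be sketched with the classic two-thread litmus test. This is an illustration in C++11, not D; the variable and function names (x, y, both_saw_zero) are assumptions made for the example. With sequentially consistent atomics, the outcome where both threads read 0 is forbidden; on x86, compilers typically enforce that by emitting a fenced or locked instruction (mfence or xchg) for the store.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> x{0};
std::atomic<int> y{0};

// Runs one round of the litmus test and reports whether both threads
// read 0, i.e. whether each load was effectively reordered before the
// other thread's store became visible.
bool both_saw_zero() {
    x.store(0);
    y.store(0);
    int r1 = -1;
    int r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);   // shared store ...
        r1 = y.load(std::memory_order_seq_cst);  // ... followed by shared load
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}
```

If the stores and loads were relaxed (or plain machine stores and loads with no fence), the hardware's store buffer would be allowed to let both threads read 0, even on x86.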

I just used x86's fencing instructions as an example because most people here are familiar with them. The problem is much, much bigger on architectures like ARM, MIPS, and PowerPC, which have much weaker memory ordering.


See http://g.oswego.edu/dl/jmm/cookbook.html for more information on
the barriers required on different architectures.

(And note that you can't optimize this either; since the dependencies
memory barriers are supposed to express are subtle and not detectable by
a compiler, the compiler would always have to insert them because it
can't know when it would be safe not to.)


The compiler is aware of what is thread-local and what isn't. That means
the compiler can fully optimize thread-local stores and loads (for example,
promoting them to registers or reordering them across shared stores/loads).

Thread-local loads and stores are not atomic and thus do not take part in the reordering constraints that atomic operations impose. See e.g. the LLVM docs for atomicrmw and atomic load/store.


This has a cost, indeed, but it is useful, and Walter's solution of casting
away shared once a mutex is acquired is always available.
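
As a loose C++ analogue of that last point (C++ has no shared qualifier, so this is only an approximation of the D idiom), data that is touched only while a mutex is held can be a plain, non-atomic variable, and the compiler is free to optimize accesses to it inside the critical section. The names GuardedCounter and run_two_threads are invented for this sketch.

```cpp
#include <mutex>
#include <thread>

struct GuardedCounter {
    std::mutex m;
    long value = 0;  // plain, non-atomic; only accessed while m is held

    void add(long n) {
        std::lock_guard<std::mutex> lock(m);
        // Inside the critical section no per-access barriers are needed;
        // the lock and unlock operations provide the ordering.
        for (long i = 0; i < n; ++i) {
            ++value;
        }
    }

    long get() {
        std::lock_guard<std::mutex> lock(m);
        return value;
    }
};

// Two threads each add n to the same counter; returns the final total.
long run_two_threads(long n) {
    GuardedCounter c;
    std::thread t1([&] { c.add(n); });
    std::thread t2([&] { c.add(n); });
    t1.join();
    t2.join();
    return c.get();
}
```

The cost is paid once per lock and unlock rather than once per access, which is the trade-off being discussed.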

--
Alex Rønne Petersen
http://lycus.org
