* Paolo Bonzini ([email protected]) wrote:
> On 09/05/2011 05:12 PM, Mathieu Desnoyers wrote:
>>> In userspace we can assume no accesses to write-combining memory
>>> occur, and also that there are no non-temporal load/stores (people
>>> would presumably write those with assembly or intrinsics and put
>>> appropriate lfence/sfence manually). So rmb and wmb are no-ops on
>>> x86.
>>
>> What about memory barriers for DMA with devices? For these, we might
>> want to define cmm_wmb/rmb and cmm_smp_wmb/rmb differently (keep the
>> fences for DMA accesses).
>
> Yes, splitting wmb/rmb and smp_wmb/rmb makes sense.
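
Agreed. As a rough sketch of what the split could look like on x86
(assuming GCC-style inline asm; the lock;addl fallback for pre-SSE2
CPUs and the out-of-order-store mode bit are left out here):

#define cmm_barrier()   __asm__ __volatile__ ("" : : : "memory")

/* Mandatory barriers: keep the fence instructions so that SSE/3DNOW
 * non-temporal accesses and DMA are ordered. */
#define cmm_mb()        __asm__ __volatile__ ("mfence" : : : "memory")
#define cmm_rmb()       __asm__ __volatile__ ("lfence" : : : "memory")
#define cmm_wmb()       __asm__ __volatile__ ("sfence" : : : "memory")

/* smp_ variants: for ordinary memory, x86 only reorders stores after
 * loads, so cmm_smp_mb() still needs a real fence, while the read and
 * write variants reduce to compiler barriers. */
#define cmm_smp_mb()    cmm_mb()
#define cmm_smp_rmb()   cmm_barrier()
#define cmm_smp_wmb()   cmm_barrier()
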
Quoting www.rdrop.com/users/paulmck/scalability/paper/ordering.2007.09.19a.pdf:

"AMD64
AMD64 is compatible with x86, and has recently updated its memory
model [1] to enforce the tighter ordering that actual implementations
have provided for some time. The AMD64 implementation of the Linux
smp_mb() primitive is mfence, smp_rmb() is lfence, and smp_wmb() is
sfence. In theory, these might be relaxed, but any such relaxation
must take SSE and 3DNOW instructions into account."

-> So I think we should document that cmm_wmb/rmb/mb take care of SSE,
3DNOW and DMA accesses, but that cmm_smp_*mb does not.

"x86
Since the x86 CPUs provide "process ordering" so that all CPUs agree
on the order of a given CPU's writes to memory, the smp_wmb()
primitive is a no-op for the CPU [7]. However, a compiler directive
is required to prevent the compiler from performing optimizations
that would result in reordering across the smp_wmb() primitive.

On the other hand, x86 CPUs have traditionally given no ordering
guarantees for loads, so the smp_mb() and smp_rmb() primitives expand
to lock;addl. This atomic instruction acts as a barrier to both loads
and stores.

More recently, Intel has published a memory model for x86 [8]. It
turns out that Intel's actual CPUs enforced tighter ordering than was
claimed in the previous specifications, so this model is in effect
simply mandating the earlier de-facto behavior. However, note that
some SSE instructions are weakly ordered (clflush and non-temporal
move instructions [6]). CPUs that have SSE can use mfence for
smp_mb(), lfence for smp_rmb(), and sfence for smp_wmb(). A few
versions of the x86 CPU have a mode bit that enables out-of-order
stores, and for these CPUs, smp_wmb() must also be defined to be
lock;addl.

Although many older x86 implementations accommodated self-modifying
code without the need for any special instructions, newer revisions
of the x86 architecture no longer require x86 CPUs to be so
accommodating. Interestingly enough, this relaxation comes just in
time to inconvenience JIT implementors."

-> So for Intel x86, it would make sense to document that
cmm_rmb/wmb/mb take care of SSE, DMA accesses, non-temporal moves and
clflush, and that the "smp" variants of those primitives offer no
guarantee for these cases. None of our fences offer ordering
guarantees wrt prefetch instructions.
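
To illustrate where the distinction bites (a hypothetical example,
reusing the cmm_wmb() sketch above; a local fallback define is
included so it stands alone):

#include <emmintrin.h>  /* _mm_stream_si32 (SSE2) */

#ifndef cmm_wmb
#define cmm_wmb()       __asm__ __volatile__ ("sfence" : : : "memory")
#endif

/* Producer filling a buffer with a non-temporal store, then
 * publishing a ready flag to a consumer thread or device. Because
 * non-temporal stores are weakly ordered, the compiler-barrier-only
 * cmm_smp_wmb() would NOT be enough here: the mandatory cmm_wmb()
 * (sfence) is needed to drain the write-combining buffers before the
 * flag becomes visible. */
static void publish(int *buf, int value, volatile int *ready)
{
        _mm_stream_si32(buf, value);    /* weakly ordered NT store */
        cmm_wmb();                      /* sfence: order it before the flag */
        *ready = 1;
}
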
Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com