Hi Ian,

I know you were not giving any type of definitive treatise on how Go treats atomics across different processors...
but is a related aspect restricting instruction reordering by the compiler itself?

I don't know what the modern Go compiler does at this point, but I think at least circa Go 1.5 there was a nop function that seemed to be used to help prevent the compiler from inlining and then reordering instructions (first snippet below), and I think I've seen you make related comments more recently (e.g., the FreeBSD atomics discussion snippet I included at the end of this post)? I haven't followed the more recent atomics-related changes (including, it seems, that in 1.10 there might have been some work around intrinsics, such as CL 28076: "cmd/compile: intrinsify sync/atomic for amd64"?)...

And yes, on the one hand the answer is "respect the memory model and get a clean report from the race detector, etc., etc."... but of course sometimes the performance of the current compiler does matter beyond mere curiosity about how the Go compiler does what it does (performance was the context in which I had looked at this more closely in the past).

Two related snippets:

====================================================
from go 1.5
https://github.com/golang/go/blob/release-branch.go1.5/src/runtime/atomic_amd64x.go#L11
====================================================

// The calls to nop are to keep these functions from being inlined.
// If they are inlined we have no guarantee that later rewrites of the
// code by optimizers will preserve the relative order of memory accesses.

//go:nosplit
func atomicload(ptr *uint32) uint32 {
	nop()
	return *ptr
}

====================================================

====================================================
Ian Lance Taylor response to question on FreeBSD atomics discussion on golang-dev:
https://groups.google.com/forum/#!topic/golang-dev/f3PS8hp4Jfs
====================================================

> The second issue I have is translating FreeBSD atomic operations to runtime
> atomic ops.
> If I understand it correctly then atomic_load_acq_32 has weaker requirements
> compared to runtime/internal/atomic.Load.
> On x86 the FreeBSD variant is just a compiler barrier to prevent it
> re-ordering instructions.

The Go compiler does reorder instructions. But it doesn't reorder instructions across a non-inlined function call.

On x86 a simple memory load suffices for atomic.Load because x86 has a fairly strict memory order in any case. Most other processors are more lenient, and require more work in the atomic operation.

====================================================

--thepudds

On Monday, March 19, 2018 at 1:55:07 AM UTC-4, Ian Lance Taylor wrote:
> On Sun, Mar 18, 2018 at 9:47 PM, shivaram via golang-nuts
> <golan...@googlegroups.com> wrote:
> >
> > I noticed that internally, the language implementation seems to rely on the
> > atomicity of reads to single-word values:
> >
> > https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/runtime/chan.go#L160
>
> At the machine level, words like "atomicity" are overloaded with
> different meanings. I think what you are saying is that the runtime
> package is assuming that a load of a machine word will never read an
> interleaving of two different stores of a machine word. It will always
> read the value written by a single store, though exactly which store
> it sees is unknown. This is true on all the processors that Go
> supports.
>
> > As I understand it, this atomicity is provided by the cache coherence
> > algorithms of modern architectures.
> > Accordingly, the implementations in
> > sync/atomic of word-sized loads (e.g., LoadUint32 on 386 and LoadUint64 on
> > amd64) use ordinary MOV instructions:
> >
> > https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/sync/atomic/asm_386.s#L146
> >
> > https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/sync/atomic/asm_amd64.s#L103
> >
> > However, word-sized stores on these architectures use special instructions:
> >
> > https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/sync/atomic/asm_amd64.s#L133
> >
> > Given that the APIs being implemented don't provide any global ordering
> > guarantees, what's the reason they can't be implemented solely with MOV?
>
> You are not giving the correct reason for why atomic.LoadUint32 and
> LoadUint64 can use ordinary MOV instructions on x86 processors. The
> LoadUint32, etc., functions guarantee much more than that they read a
> value that is not an interleaving of multiple writes. They are also
> load-acquire operations, meaning that when the function completes, the
> caller will see not only the value that was loaded but also all other
> values that some other processor core wrote before writing to the
> address being loaded (assuming the write was done using StoreUint32,
> etc.). It happens that on x86 you can implement load-acquire using a
> simple MOV instruction. Most other multicore processors use a more
> complex memory model, and their sync/atomic implementations are
> accordingly more complex.
>
> Ian
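
For anyone skimming the thread, the load-acquire guarantee Ian describes in the quoted message can be sketched in user code roughly as follows. This is only an illustration, not how the runtime does it: the `mailbox` type and its `publish` and `consume` methods are hypothetical names made up for this post, and the sketch assumes the StoreUint32/LoadUint32 pairing behaves as described above, so that a reader who observes the flag also observes the plain write made before it.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// mailbox is a hypothetical type for illustration only.
// payload is written with an ordinary store; ready is the
// flag that is only ever accessed through sync/atomic.
type mailbox struct {
	payload int
	ready   uint32
}

// publish writes the payload with a plain store and then sets the
// flag with an atomic store, which publishes the earlier write.
func (m *mailbox) publish(v int) {
	m.payload = v
	atomic.StoreUint32(&m.ready, 1)
}

// consume spins until the atomic load observes the flag. Once it
// does, the plain write to payload made before the atomic store
// is visible to this goroutine as well.
func (m *mailbox) consume() int {
	for atomic.LoadUint32(&m.ready) == 0 {
		runtime.Gosched() // yield instead of burning the CPU
	}
	return m.payload
}

func main() {
	m := &mailbox{}
	go m.publish(42)
	fmt.Println(m.consume()) // prints 42
}
```

If the flag were read with a plain load instead of atomic.LoadUint32, the race detector would be expected to flag the program, and nothing would guarantee that the payload write is visible once the flag is seen.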