Re: [go-nuts] implementation of sync.atomic primitives

thepudds1460 Mon, 19 Mar 2018 09:31:20 -0700

Hi Ian,

I know you were not giving any type of definitive treatise on how Go treats 
atomics across different processors...

but is a related aspect restricting instruction reordering by the compiler 
itself?

I don't know what the modern Go compiler does at this point, but I think at 
least circa Go 1.5 there was a nop function that seemed to be used to 
prevent the compiler from inlining and then reordering instructions 
(first snippet below), and I think I've seen you make related comments more 
recently (e.g., the FreeBSD atomics discussion snippet I included at the end of 
this post)?

I haven't followed the more recent atomics-related changes (it seems Go 1.10 
may have included some work on intrinsics, such as CL 28076: "cmd/compile: 
intrinsify sync/atomic for amd64"?)...

And yes, on the one hand the answer is "respect the memory model, get a 
clean report from the race detector, etc., etc." But sometimes the 
performance of the current compiler does matter, beyond mere curiosity 
about how the Go compiler does what it does (performance was the context in 
which I looked at this more closely in the past).

Two related snippets:

====================================================
from go 1.5 
https://github.com/golang/go/blob/release-branch.go1.5/src/runtime/atomic_amd64x.go#L11
====================================================
// The calls to nop are to keep these functions from being inlined.
// If they are inlined we have no guarantee that later rewrites of the
// code by optimizers will preserve the relative order of memory accesses.

//go:nosplit
func atomicload(ptr *uint32) uint32 {
	nop()
	return *ptr
}
====================================================
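From the caller's side, the point of those ordering guarantees is that code using sync/atomic never has to reason about inlining. A minimal sketch (the names spinUntilSet and flag are hypothetical): one goroutine busy-waits on a flag that another goroutine sets. Each iteration performs a fresh atomic.LoadUint32; with a plain `*flag` read instead, the program would contain a data race and the Go memory model would not guarantee that the loop ever finishes.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// spinUntilSet starts a goroutine that sets *flag after a short delay, then
// busy-waits until it observes that store. Names are illustrative only.
func spinUntilSet(flag *uint32) uint32 {
	go func() {
		time.Sleep(10 * time.Millisecond)
		atomic.StoreUint32(flag, 1)
	}()

	// Every iteration performs a real atomic load. A plain `*flag == 0`
	// read would be a data race, and the loop would not be guaranteed to end.
	for atomic.LoadUint32(flag) == 0 {
		runtime.Gosched() // yield so the setter can run even with GOMAXPROCS=1
	}
	return atomic.LoadUint32(flag)
}

func main() {
	var flag uint32
	fmt.Println("observed:", spinUntilSet(&flag)) // prints "observed: 1"
}
```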

====================================================
Ian Lance Taylor response to question on FreeBSD atomics discussion on 
golang-dev: https://groups.google.com/forum/#!topic/golang-dev/f3PS8hp4Jfs
====================================================

> The second issue I have is translating FreeBSD atomic operations to runtime
> atomic ops.
> If I understand it correctly then atomic_load_acq_32 has weaker requirements
> compared to runtime/internal/atomic.Load.
> On x86 the FreeBSD variant is just a compiler barrier to prevent it from
> reordering instructions.

The Go compiler does reorder instructions.  But it doesn't reorder 
instructions across a non-inlined function call.  On x86 a simple 
memory load suffices for atomic.Load because x86 has a fairly strict 
memory order in any case.  Most other processors are more lenient, and 
require more work in the atomic operation. 

====================================================
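Even x86's fairly strict ordering allows one reordering: a load may complete before an earlier store to a different address becomes visible to other cores (store buffering). A plain MOV store cannot prevent that, while a locked instruction such as XCHG acts as a full barrier, which is plausibly one reason atomic stores on x86 use special instructions. The sketch below (function name storeBuffering is hypothetical) runs the classic store-buffering litmus test through sync/atomic. The current Go memory model documents sync/atomic operations as behaving in some sequentially consistent order, so no run should see both loads return 0. A test like this can only fail to find a violation, not prove the guarantee.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// storeBuffering runs the store-buffering litmus test n times and counts the
// runs in which both goroutines read 0. Sequential consistency forbids that
// outcome, so the expected count is 0.
func storeBuffering(n int) int {
	bothZero := 0
	for i := 0; i < n; i++ {
		var x, y, r1, r2 uint32
		var wg sync.WaitGroup
		wg.Add(2)
		go func() {
			defer wg.Done()
			atomic.StoreUint32(&x, 1)
			r1 = atomic.LoadUint32(&y)
		}()
		go func() {
			defer wg.Done()
			atomic.StoreUint32(&y, 1)
			r2 = atomic.LoadUint32(&x)
		}()
		wg.Wait() // r1 and r2 are safe to read after Wait returns
		if r1 == 0 && r2 == 0 {
			bothZero++
		}
	}
	return bothZero
}

func main() {
	fmt.Println("both zero:", storeBuffering(10000)) // prints "both zero: 0"
}
```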

--thepudds

On Monday, March 19, 2018 at 1:55:07 AM UTC-4, Ian Lance Taylor wrote:
>
> On Sun, Mar 18, 2018 at 9:47 PM, shivaram via golang-nuts 
> <golan...@googlegroups.com> wrote: 
> > 
> > I noticed that internally, the language implementation seems to rely on the
> > atomicity of reads to single-word values: 
> > 
> > 
> https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/runtime/chan.go#L160
>  
>
> In the machine level, words like "atomicity" are overloaded with 
> different meanings.  I think what you are saying is that the runtime 
> package is assuming that a load of a machine word will never read an 
> interleaving of two different stores of a machine word.  It will always 
> read the value written by a single store, though exactly which store 
> it sees is unknown.  This is true on all the processors that Go 
> supports. 
>
>
> > As I understand it, this atomicity is provided by the cache coherence 
> > algorithms of modern architectures. Accordingly, the implementations in 
> > sync.atomic of word-sized loads (e.g., LoadUint32 on 386 and LoadUint64 on
> > amd64) use ordinary MOV instructions: 
> > 
> > 
> https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/sync/atomic/asm_386.s#L146
>  
> > 
> > 
> https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/sync/atomic/asm_amd64.s#L103
>  
> > 
> > However, word-sized stores on these architectures use special instructions:
> > 
> > 
> https://github.com/golang/go/blob/bd859439e72a0c48c64259f7de9f175aae3b9c37/src/sync/atomic/asm_amd64.s#L133
>  
> > 
> > Given that the APIs being implemented don't provide any global ordering 
> > guarantees, what's the reason they can't be implemented solely with MOV? 
>
> You are not giving the correct reason for why atomic.LoadUint32 and 
> LoadUint64 can use ordinary MOV instructions on x86 processors.  The 
> LoadUint32, etc., functions guarantee much more than that they read a 
> value that is not an interleaving of multiple writes.  They are also 
> load-acquire operations, meaning that when the function completes, the 
> caller will see not only the value that was loaded but also all other 
> values that some other processor core wrote before writing to the 
> address being loaded (assuming the write was done using StoreUint32, 
> etc.).  It happens that on x86 you can implement load-acquire using a 
> simple MOV instruction.  Most other multicore processors use a more 
> complex memory model, and their sync/atomic implementations are 
> accordingly more complex. 
>
> Ian 
>
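Ian's first point is that a single-word load never returns a mix of two different stores. A 64-bit value on a 32-bit platform such as 386 spans two machine words, so a plain read there can tear; the 64-bit sync/atomic functions rule that out on every platform. A minimal sketch, assuming Go 1.19 or later for the atomic.Uint64 type (which is always 8-byte aligned); the pattern constants and tornReads are hypothetical names. One goroutine alternates between two bit patterns whose halves differ, and a reader counts any value matching neither. Like any such test, it can only fail to find tearing, not prove its absence.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Two patterns whose upper and lower halves differ, so a torn read
// (half of one store, half of the other) would yield a third value.
const (
	patternA uint64 = 0x0000000000000000
	patternB uint64 = 0xFFFFFFFFFFFFFFFF
)

// tornReads runs one writer and one reader concurrently and returns how many
// loaded values matched neither pattern.
func tornReads(iterations int) int {
	var word atomic.Uint64 // zero value equals patternA
	var wg sync.WaitGroup
	torn := 0 // touched only by the reader until wg.Wait returns

	wg.Add(2)
	go func() { // writer
		defer wg.Done()
		for i := 0; i < iterations; i++ {
			if i%2 == 0 {
				word.Store(patternB)
			} else {
				word.Store(patternA)
			}
		}
	}()
	go func() { // reader
		defer wg.Done()
		for i := 0; i < iterations; i++ {
			if v := word.Load(); v != patternA && v != patternB {
				torn++
			}
		}
	}()
	wg.Wait()
	return torn
}

func main() {
	fmt.Println("torn reads:", tornReads(100000)) // prints "torn reads: 0"
}
```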

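Ian's description of load-acquire corresponds to the usual message-passing pattern: a producer writes a payload with ordinary stores and then publishes it with an atomic store; a consumer that observes the flag through an atomic load is then guaranteed to see the payload. A minimal sketch (the message type and passMessage are hypothetical names); under the current Go memory model, the atomic store is synchronized before the atomic load that observes it, so the plain reads of msg below are not a data race.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// message is an illustrative payload written with ordinary stores.
type message struct {
	a, b int
}

// passMessage publishes msg through an atomic flag and returns what the
// consumer observed after seeing the flag set.
func passMessage() message {
	var msg, got message
	var ready uint32
	var wg sync.WaitGroup
	wg.Add(2)

	go func() { // producer
		defer wg.Done()
		msg.a = 1                     // ordinary store
		msg.b = 2                     // ordinary store
		atomic.StoreUint32(&ready, 1) // publishes the two writes above
	}()
	go func() { // consumer
		defer wg.Done()
		for atomic.LoadUint32(&ready) == 0 {
			runtime.Gosched()
		}
		got = msg // ordinary loads, ordered after the producer's writes
	}()

	wg.Wait()
	return got
}

func main() {
	fmt.Println(passMessage()) // prints "{1 2}"
}
```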
-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
