I'm writing a simple benchmark to observe speedup as the number of cores increases. I'm having an issue where, for a core count of 1, it still uses the default GOMAXPROCS (in my case 4, hyper-threading turned off).
I suspect a flag is simply not set right, but I'm putting forward everything I know, because I think this is still pretty weird for default behaviour. Below are *1.* version and environment info, *2.* benchmark source code, *3.* benchmark results with the commands used, *4.* CPU info.

*1.* version and environment info:

    $ go version
    go version go1.12.1 linux/amd64

    $ go env
    GOARCH="amd64"
    GOBIN=""
    GOCACHE="/home/point/.cache/go-build"
    GOEXE=""
    GOFLAGS=""
    GOHOSTARCH="amd64"
    GOHOSTOS="linux"
    GOOS="linux"
    GOPATH="/home/point/go"
    GOPROXY=""
    GORACE=""
    GOROOT="/usr/lib/go"
    GOTMPDIR=""
    GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
    GCCGO="gccgo"
    CC="gcc"
    CXX="g++"
    CGO_ENABLED="1"
    GOMOD=""
    CGO_CFLAGS="-g -O2"
    CGO_CPPFLAGS=""
    CGO_CXXFLAGS="-g -O2"
    CGO_FFLAGS="-g -O2"
    CGO_LDFLAGS="-g -O2"
    PKG_CONFIG="pkg-config"
    GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build577448577=/tmp/go-build -gno-record-gcc-switches"

*2.* benchmark source code

scale_test.go

    package scale

    import "testing"

    func BenchmarkScale(b *testing.B) {
    	for i := 0; i < b.N; i++ {
    		scale(b)
    	}
    }

scale.go

    package scale

    import (
    	"math"
    	"runtime"
    	"testing"
    )

    func scale(b *testing.B) {
    	const max int = 12
    	l := int(math.Pow(2, 30))
    	NP := runtime.GOMAXPROCS(0) // query the current setting without changing it
    	var a [max]int
    	slice := max / NP // number of elements of a handled by each worker
    	done := make(chan bool)
    	// Worker P increments its own slice of a, then signals completion.
    	worker := func(P int, prod chan bool) {
    		for i := P * slice; i < (P+1)*slice; i++ {
    			for j := 0; j < l; j++ {
    				a[i]++
    			}
    		}
    		prod <- true
    	}
    	b.ResetTimer()
    	for p := 0; p < NP; p++ {
    		go worker(p, done)
    	}
    	// Wait for every worker to finish.
    	for p := 0; p < NP; p++ {
    		<-done
    	}
    }

*3.* benchmark results with the commands used

Ran with 1 core multiple times; the timings are consistent and as expected:

    $ go test -bench=. -cpu 1 -trace trace.out -count 5
    goos: linux
    goarch: amd64
    BenchmarkScale     1    20830211783 ns/op
    BenchmarkScale     1    20846636262 ns/op
    BenchmarkScale     1    20913565823 ns/op
    BenchmarkScale     1    20811254293 ns/op
    BenchmarkScale     1    20819699212 ns/op
    PASS
    ok      _/Demo  104.225s

Similar for 2 cores, relatively reasonable timings:

    $ go test -bench=. -cpu 2 -trace trace.out -count 5
    goos: linux
    goarch: amd64
    BenchmarkScale-2   1    12717158624 ns/op
    BenchmarkScale-2   1    12811024997 ns/op
    BenchmarkScale-2   1    12774018625 ns/op
    BenchmarkScale-2   1    12770698880 ns/op
    BenchmarkScale-2   1    12757689969 ns/op
    PASS
    ok      _/Demo  63.835s

Similar for 3 cores, relatively reasonable timings:

    $ go test -bench=. -cpu 3 -trace trace.out -count 5
    goos: linux
    goarch: amd64
    BenchmarkScale-3   1    10820517159 ns/op
    BenchmarkScale-3   1    10761484928 ns/op
    BenchmarkScale-3   1    10730973532 ns/op
    BenchmarkScale-3   1    10834395953 ns/op
    BenchmarkScale-3   1    11004999207 ns/op
    PASS
    ok      _/Demo  54.158s

Similar for 4 cores, relatively reasonable timings:

    $ go test -bench=. -cpu 4 -trace trace.out -count 5
    goos: linux
    goarch: amd64
    BenchmarkScale-4   1    8597505567 ns/op
    BenchmarkScale-4   1    8523662979 ns/op
    BenchmarkScale-4   1    8578160668 ns/op
    BenchmarkScale-4   1    8585568729 ns/op
    BenchmarkScale-4   1    8526519203 ns/op
    PASS
    ok      _/Demo  42.817s

Running all of them in one invocation, repeatedly: the first reading is similar to the readings at 4 cores and not consistent with the other readings at 1 core:

    $ go test -bench=. -cpu 1,2,3,4 -trace trace.out -count 5
    goos: linux
    goarch: amd64
    BenchmarkScale     1    8164198465 ns/op
    BenchmarkScale     1    20826667395 ns/op
    BenchmarkScale     1    20804635058 ns/op
    BenchmarkScale     1    20794385469 ns/op
    BenchmarkScale     1    20816750891 ns/op
    BenchmarkScale-2   1    12706414818 ns/op
    BenchmarkScale-2   1    12708898913 ns/op
    BenchmarkScale-2   1    12770280120 ns/op
    BenchmarkScale-2   1    12775216602 ns/op
    BenchmarkScale-2   1    12636852361 ns/op
    BenchmarkScale-3   1    10756340731 ns/op
    BenchmarkScale-3   1    11045028650 ns/op
    BenchmarkScale-3   1    11089450739 ns/op
    BenchmarkScale-3   1    10771597158 ns/op
    BenchmarkScale-3   1    10991225203 ns/op
    BenchmarkScale-4   1    8142770549 ns/op
    BenchmarkScale-4   1    8448329795 ns/op
    BenchmarkScale-4   1    8436012552 ns/op
    BenchmarkScale-4   1    8011377874 ns/op
    BenchmarkScale-4   1    8069116380 ns/op
    PASS
    ok      _/Demo  250.779s

The problem shows up more clearly with a single run per core count:

    $ go test -bench=. -cpu 1,2,3,4 -trace trace.out -count 1
    goos: linux
    goarch: amd64
    BenchmarkScale     1    8493942504 ns/op
    BenchmarkScale-2   1    12755406326 ns/op
    BenchmarkScale-3   1    11021525399 ns/op
    BenchmarkScale-4   1    8535553001 ns/op
    PASS
    ok      _/Demo  40.812s

Tried running with -parallel 1, still unreasonable:

    $ go test -bench=. -cpu 1,2,3,4 -trace trace.out -count 1 -parallel 1
    goos: linux
    goarch: amd64
    BenchmarkScale     1    9133347693 ns/op
    BenchmarkScale-2   1    12719535088 ns/op
    BenchmarkScale-3   1    11691925150 ns/op
    BenchmarkScale-4   1    8255112076 ns/op
    PASS
    ok      _/Demo  41.806s

Tried also running with -p 1, still unreasonable:

    $ go test -bench=. -cpu 1,2,3,4 -trace trace.out -count 1 -parallel 1 -p 1
    goos: linux
    goarch: amd64
    BenchmarkScale     1    8773369668 ns/op
    BenchmarkScale-2   1    12913794360 ns/op
    BenchmarkScale-3   1    11678645565 ns/op
    BenchmarkScale-4   1    8864651538 ns/op
    PASS
    ok      _/Demo  42.236s

*4.* CPU info

    $ lscpu
    Architecture:        x86_64
    CPU op-mode(s):      32-bit, 64-bit
    Byte Order:          Little Endian
    Address sizes:       39 bits physical, 48 bits virtual
    CPU(s):              4
    On-line CPU(s) list: 0-3
    Thread(s) per core:  1
    Core(s) per socket:  4
    Socket(s):           1
    NUMA node(s):        1
    Vendor ID:           GenuineIntel
    CPU family:          6
    Model:               94
    Model name:          Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
    Stepping:            3
    CPU MHz:             2700.492
    CPU max MHz:         3500.0000
    CPU min MHz:         800.0000
    BogoMIPS:            5186.00
    Virtualization:      VT-x
    L1d cache:           32K
    L1i cache:           32K
    L2 cache:            256K
    L3 cache:            6144K
    NUMA node0 CPU(s):   0-3
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d
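For reference, here is a rough sketch of how the speedup could be computed from the separate `-cpu 1` through `-cpu 4` runs reported above (the ones with `-count 5`). The names `separateRuns`, `meanNs` and `speedup` are made up for this illustration and are not part of the benchmark. The numbers are copied from the ns/op readings in section *3.*, and the mean of the 1-core run is used as the baseline.

```go
package main

import "fmt"

// separateRuns holds the ns/op readings from the four separate
// "-cpu N -count 5" invocations, keyed by the -cpu value.
// (Hypothetical helper data, copied from the results above.)
var separateRuns = map[int][]float64{
	1: {20830211783, 20846636262, 20913565823, 20811254293, 20819699212},
	2: {12717158624, 12811024997, 12774018625, 12770698880, 12757689969},
	3: {10820517159, 10761484928, 10730973532, 10834395953, 11004999207},
	4: {8597505567, 8523662979, 8578160668, 8585568729, 8526519203},
}

// meanNs returns the arithmetic mean of a non-empty set of readings.
func meanNs(samples []float64) float64 {
	sum := 0.0
	for _, s := range samples {
		sum += s
	}
	return sum / float64(len(samples))
}

// speedup returns how many times faster t is than the baseline time.
func speedup(baseline, t float64) float64 {
	return baseline / t
}

func main() {
	base := meanNs(separateRuns[1])
	for _, cpus := range []int{1, 2, 3, 4} {
		m := meanNs(separateRuns[cpus])
		s := speedup(base, m)
		// Efficiency is the speedup divided by the number of cores used.
		fmt.Printf("cpu=%d mean=%.2fs speedup=%.2fx efficiency=%.0f%%\n",
			cpus, m/1e9, s, 100*s/float64(cpus))
	}
}
```

With these readings the speedup over 1 core comes out at roughly 1.6x, 1.9x and 2.4x for 2, 3 and 4 cores. A first reading of about 8.2s at "1 core" would not fit that curve at all, which is why it stands out.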