Sorry for the late response. Do you mean the total number of executions was not the same? If so, that is not true: as you can see below, both benchmarks executed 50000000 iterations:
goos: linux
goarch: amd64
BenchmarkMutexWrite/goroutines-2400-8   50000000   46.5 ns/op

Type: cpu
Time: Aug 26, 2019 at 6:19pm (CEST)
Duration: 2.50s, Total samples = 5.47s (218.62%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top20
Showing nodes accounting for 5.14s, 93.97% of 5.47s total
Dropped 15 nodes (cum <= 0.03s)
Showing top 20 nodes out of 48
      flat  flat%   sum%        cum   cum%
     1.77s 32.36% 32.36%      1.77s 32.36%  runtime.procyield
     1.08s 19.74% 52.10%      2.20s 40.22%  sync.(*Mutex).Lock
     0.40s  7.31% 59.41%      1.34s 24.50%  sync.(*Mutex).Unlock
     0.32s  5.85% 65.27%      2.06s 37.66%  runtime.lock
     0.23s  4.20% 69.47%      1.53s 27.97%  runtime.findrunnable
     0.20s  3.66% 73.13%      0.25s  4.57%  runtime.unlock
     0.18s  3.29% 76.42%      0.19s  3.47%  runtime.pidleput
     0.13s  2.38% 78.79%      0.13s  2.38%  runtime.cansemacquire
     0.12s  2.19% 80.99%      0.12s  2.19%  runtime.futex
     0.10s  1.83% 82.82%      0.11s  2.01%  sync.runtime_nanotime
     0.09s  1.65% 84.46%      3.64s 66.54%  _/home/changkun/dev/tests_test.BenchmarkMutexWrite.func1.1
     0.09s  1.65% 86.11%      0.09s  1.65%  runtime.casgstatus
     0.08s  1.46% 87.57%      0.08s  1.46%  runtime.(*semaRoot).dequeue
     0.08s  1.46% 89.03%      0.94s 17.18%  runtime.semrelease1
     0.07s  1.28% 90.31%      0.07s  1.28%  runtime.gopark
     0.06s  1.10% 91.41%      0.97s 17.73%  runtime.semacquire1
     0.04s  0.73% 92.14%      0.04s  0.73%  runtime.runqempty
     0.04s  0.73% 92.87%      0.04s  0.73%  sync.runtime_canSpin
     0.03s  0.55% 93.42%      0.03s  0.55%  runtime.(*guintptr).cas
     0.03s  0.55% 93.97%      0.03s  0.55%  runtime.gogo

--------------------

goos: linux
goarch: amd64
BenchmarkMutexWrite/goroutines-4800-8   50000000   317 ns/op
PASS
ok   _/home/changkun/dev/tests   16.020s

Type: cpu
Time: Aug 26, 2019 at 6:18pm (CEST)
Duration: 16.01s, Total samples = 17.03s (106.35%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top20
Showing nodes accounting for 14640ms, 85.97% of 17030ms total
Dropped 19 nodes (cum <= 85.15ms)
Showing top 20 nodes out of 51
      flat  flat%   sum%        cum   cum%
    2130ms 12.51% 12.51%     2160ms 12.68%  runtime.gopark
    1940ms 11.39% 23.90%     7210ms 42.34%  sync.(*Mutex).Lock
    1640ms  9.63% 33.53%     1810ms 10.63%  sync.runtime_nanotime
    1490ms  8.75% 42.28%     4560ms 26.78%  runtime.semacquire1
    1130ms  6.64% 48.91%     1130ms  6.64%  runtime.casgstatus
     800ms  4.70% 53.61%     3740ms 21.96%  runtime.semrelease1
     590ms  3.46% 57.08%      590ms  3.46%  runtime.(*guintptr).cas
     560ms  3.29% 60.36%      560ms  3.29%  runtime.futex
     530ms  3.11% 63.48%      610ms  3.58%  runtime.lock
     520ms  3.05% 66.53%      520ms  3.05%  runtime.unlock
     440ms  2.58% 69.11%      440ms  2.58%  runtime.semroot
     430ms  2.52% 71.64%      430ms  2.52%  runtime.usleep
     430ms  2.52% 74.16%     4210ms 24.72%  sync.(*Mutex).Unlock
     370ms  2.17% 76.34%     1320ms  7.75%  runtime.ready
     370ms  2.17% 78.51%     4930ms 28.95%  sync.runtime_SemacquireMutex
     340ms  2.00% 80.50%      340ms  2.00%  runtime.(*semaRoot).dequeue
     270ms  1.59% 82.09%    11820ms 69.41%  _/home/changkun/dev/tests_test.BenchmarkMutexWrite.func1.1
     220ms  1.29% 83.38%      220ms  1.29%  runtime.cansemacquire
     220ms  1.29% 84.67%      290ms  1.70%  runtime.releaseSudog
     220ms  1.29% 85.97%     2490ms 14.62%  runtime.schedule

On Friday, August 23, 2019 at 5:44:46 AM UTC+2, robert engels wrote:
>
> As I expected, the test is not testing what you think it is. Many of the
> Go routines created do not perform the same number of iterations. The
> benchmark harness is only designed to try and perform enough iterations to
> get a time per op while "running N routines".
>
> You need to rework the test so that all Go routines run the same number of
> iterations - and time the entire process - then you can see how the
> concurrency/scheduling latency affects things (or the time of the operation
> when fully contended). Then the time per op = total time / (iterations *
> nRoutines).
>
> Here is the code that shows the number of iterations per routine:
> https://play.golang.org/p/LkAvB39X3_Z
>
>
> On Aug 21, 2019, at 6:31 PM, Robert Engels <ren...@ix.netcom.com> wrote:
>
> I don't think you've posted code for the atomic version...
>
> Each Go routine has its own stack.
> So when you cycle through many Go routines you will be destroying the
> cache, as each touches N memory addresses (that are obviously not shared).
>
> That's my guess anyway - the performance profile certainly looks like a
> cache issue to me. Once the cache is exhausted, the kernel-based scheduler
> is more efficient - so it does suggest to me that there are some
> optimizations that can be done in the Go scheduler.
>
> I will look at a few things this evening.
>
> -----Original Message-----
> From: changkun
> Sent: Aug 21, 2019 4:51 PM
> To: golang-nuts
> Subject: Re: [go-nuts] sync.Mutex encounter large performance drop when
> goroutine contention more than 3400
>
> "less than N Go routines it fits in the L1 CPU cache" - I am guessing that
> you are thinking of the local run queues on each M; the scheduler's local
> queue size is strictly limited to 256 goroutines. However, in our case, all
> blocking goroutines don't go to the run queue but are parked and stored in
> the semtable, which is a forest where each tree is an unlimited balanced
> tree. When a lock is released, only a single goroutine is detached and put
> into the local queue (so the scheduler only schedules a runq with a single
> goroutine, without contending for the global queue).
> How could an L1/L2 problem appear here? Do you think this is still some
> kind of "limited L1 cache to store a large amount of goroutines"?
>
> What interests me is a newly created issue; I am not sure if this question
> is relevant to https://github.com/golang/go/issues/33747
> The issue talks about small contention with a large number of Ps, but the
> full scale of my benchmark is shown as follows:
>
> [image: performance_ channel v.s. sync.Mutex v.s. atomic.png]
>
>
> On Tuesday, August 20, 2019 at 6:10:32 PM UTC+2, Robert Engels wrote:
>>
>> I am assuming that there is an internal Go structure/process that, when
>> there are fewer than N Go routines, fits in the L1 CPU cache, and beyond a
>> certain point it spills to the L2 or higher - thus the nearly order of
>> magnitude performance decrease, yet consistent times within a range.
>>
>> Since the worker code is so trivial, you are seeing this. Most worker
>> code is not as trivial, so the overhead of the locking/scheduler constructs
>> has far less effect (or the worker is causing L1 evictions anyway - so you
>> never see the optimum performance possible of the scheduler).
>>
>> -----Original Message-----
>> From: changkun
>> Sent: Aug 20, 2019 3:33 AM
>> To: golang-nuts
>> Subject: Re: [go-nuts] sync.Mutex encounter large performance drop when
>> goroutine contention more than 3400
>>
>> Hi Robert,
>>
>> Thanks for your explanation. But how could I "log the number of
>> operations done per Go routine"? Which particular debug settings are you
>> referring to?
>> It is reasonable that sync.Mutex relies on the runtime scheduler but
>> channels do not. However, it is unclear why a significant performance drop
>> appears. Is it possible to determine when the drop will appear?
>>
>> Best,
>> Changkun
>>
>> On Monday, August 19, 2019 at 10:27:19 PM UTC+2, Robert Engels wrote:
>>>
>>> I think you'll find the reason is that the Mutex uses the Go scheduler.
>>> The chan is controlled by a 'mutex' which eventually defers to the OS
>>> futex - and the OS futex is probably more efficient at scheduling in the
>>> face of large contention - although you would think it should be the
>>> other way around.
>>>
>>> I am guessing that if you logged the number of operations done per Go
>>> routine, you would see that the Mutex version is very fair, and the
>>> chan/futex version is unfair - meaning many are starved.
>>>
>>> -----Original Message-----
>>> From: changkun
>>> Sent: Aug 19, 2019 12:50 PM
>>> To: golang-nuts
>>> Subject: [go-nuts] sync.Mutex encounter large performance drop when
>>> goroutine contention more than 3400
>>>
>>> I am comparing the performance of sync.Mutex and Go channels.
>>> Here is my benchmark: https://play.golang.org/p/zLjVtsSx9gd
>>>
>>> The performance comparison visualization is as follows:
>>>
>>> [image: sync.Mutex performance (1).png]
>>>
>>> What are the reasons that
>>>
>>> 1. sync.Mutex encounters a large performance drop when the number of
>>> goroutines goes higher than roughly 3400?
>>> 2. Go channels are pretty stable, but slower than sync.Mutex before the
>>> drop?
>>>
>>> Raw bench data by benchstat (go test -bench=. -count=5):
>>>
>>> MutexWrite/goroutines-2400-8  48.6ns ± 1%
>>> MutexWrite/goroutines-2480-8  49.1ns ± 0%
>>> MutexWrite/goroutines-2560-8  49.7ns ± 1%
>>> MutexWrite/goroutines-2640-8  50.5ns ± 3%
>>> MutexWrite/goroutines-2720-8  50.9ns ± 2%
>>> MutexWrite/goroutines-2800-8  51.8ns ± 3%
>>> MutexWrite/goroutines-2880-8  52.5ns ± 2%
>>> MutexWrite/goroutines-2960-8  54.1ns ± 4%
>>> MutexWrite/goroutines-3040-8  54.5ns ± 2%
>>> MutexWrite/goroutines-3120-8  56.1ns ± 3%
>>> MutexWrite/goroutines-3200-8  63.2ns ± 5%
>>> MutexWrite/goroutines-3280-8  77.5ns ± 6%
>>> MutexWrite/goroutines-3360-8   141ns ± 6%
>>> MutexWrite/goroutines-3440-8   239ns ± 8%
>>> MutexWrite/goroutines-3520-8   248ns ± 3%
>>> MutexWrite/goroutines-3600-8   254ns ± 2%
>>> MutexWrite/goroutines-3680-8   256ns ± 1%
>>> MutexWrite/goroutines-3760-8   261ns ± 2%
>>> MutexWrite/goroutines-3840-8   266ns ± 3%
>>> MutexWrite/goroutines-3920-8   276ns ± 3%
>>> MutexWrite/goroutines-4000-8   278ns ± 3%
>>> MutexWrite/goroutines-4080-8   286ns ± 5%
>>> MutexWrite/goroutines-4160-8   293ns ± 4%
>>> MutexWrite/goroutines-4240-8   295ns ± 2%
>>> MutexWrite/goroutines-4320-8   280ns ± 8%
>>> MutexWrite/goroutines-4400-8   294ns ± 9%
>>> MutexWrite/goroutines-4480-8   285ns ±10%
>>> MutexWrite/goroutines-4560-8   290ns ± 8%
>>> MutexWrite/goroutines-4640-8   271ns ± 3%
>>> MutexWrite/goroutines-4720-8   271ns ± 4%
>>>
>>> ChanWrite/goroutines-2400-8    158ns ± 3%
>>> ChanWrite/goroutines-2480-8    159ns ± 2%
>>> ChanWrite/goroutines-2560-8    161ns ± 2%
>>> ChanWrite/goroutines-2640-8    161ns ± 1%
>>> ChanWrite/goroutines-2720-8    163ns ± 1%
>>> ChanWrite/goroutines-2800-8    166ns ± 3%
>>> ChanWrite/goroutines-2880-8    168ns ± 1%
>>> ChanWrite/goroutines-2960-8    176ns ± 4%
>>> ChanWrite/goroutines-3040-8    176ns ± 2%
>>> ChanWrite/goroutines-3120-8    180ns ± 1%
>>> ChanWrite/goroutines-3200-8    180ns ± 1%
>>> ChanWrite/goroutines-3280-8    181ns ± 2%
>>> ChanWrite/goroutines-3360-8    183ns ± 2%
>>> ChanWrite/goroutines-3440-8    188ns ± 3%
>>> ChanWrite/goroutines-3520-8    190ns ± 2%
>>> ChanWrite/goroutines-3600-8    193ns ± 2%
>>> ChanWrite/goroutines-3680-8    196ns ± 3%
>>> ChanWrite/goroutines-3760-8    199ns ± 2%
>>> ChanWrite/goroutines-3840-8    206ns ± 2%
>>> ChanWrite/goroutines-3920-8    209ns ± 2%
>>> ChanWrite/goroutines-4000-8    206ns ± 2%
>>> ChanWrite/goroutines-4080-8    209ns ± 2%
>>> ChanWrite/goroutines-4160-8    208ns ± 2%
>>> ChanWrite/goroutines-4240-8    209ns ± 3%
>>> ChanWrite/goroutines-4320-8    213ns ± 2%
>>> ChanWrite/goroutines-4400-8    209ns ± 2%
>>> ChanWrite/goroutines-4480-8    211ns ± 1%
>>> ChanWrite/goroutines-4560-8    213ns ± 2%
>>> ChanWrite/goroutines-4640-8    215ns ± 1%
>>> ChanWrite/goroutines-4720-8    218ns ± 3%
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "golang-nuts" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to golan...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/golang-nuts/3275fb21-dfbd-411d-be42-683386e7ebe2%40googlegroups.com
>>> .
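[Editor's note: the real benchmark is in the playground link above; for readers without it handy, here is a hypothetical reconstruction of the kind of measurement being compared. The helper names `benchMutex`/`benchChan` and the parallelism value are invented; `testing.Benchmark` lets the sketch run outside `go test`. `SetParallelism(p)` yields p × GOMAXPROCS contending goroutines, so e.g. p=300 gives the "goroutines-2400-8" case on 8 cores.]

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// benchMutex measures ns/op for p*GOMAXPROCS goroutines contending on a
// single sync.Mutex while incrementing a shared counter.
func benchMutex(p int) testing.BenchmarkResult {
	var mu sync.Mutex
	var v uint64
	return testing.Benchmark(func(b *testing.B) {
		b.SetParallelism(p)
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				mu.Lock()
				v++
				mu.Unlock()
			}
		})
	})
}

// benchChan does the same work, but guards the counter with a 1-buffered
// channel used as a binary semaphore.
func benchChan(p int) testing.BenchmarkResult {
	sem := make(chan struct{}, 1)
	var v uint64
	return testing.Benchmark(func(b *testing.B) {
		b.SetParallelism(p)
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				sem <- struct{}{}
				v++
				<-sem
			}
		})
	})
}

func main() {
	// Small parallelism for a quick run; the thread sweeps 2400-4720
	// goroutines, i.e. p from 300 to 590 at GOMAXPROCS=8.
	fmt.Println("mutex:", benchMutex(8))
	fmt.Println("chan: ", benchChan(8))
}
```

Sweeping p upward and plotting ns/op against goroutine count should reproduce the cliff discussed above, if the hypothesis holds on your hardware.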
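[Editor's note: Robert's earlier methodology suggestion — make every goroutine run the same number of iterations, time the entire process, and compute time per op = total time / (iterations * nRoutines) — can be sketched as below. This is a minimal illustration under those assumptions; the function name `timePerOp` and the parameter values are invented, not the playground code.]

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// timePerOp runs nRoutines goroutines that each perform exactly iters
// mutex-protected increments, times the whole run, and returns the
// average time per operation: total time / (iters * nRoutines).
func timePerOp(nRoutines, iters int) time.Duration {
	var (
		mu sync.Mutex
		v  uint64
		wg sync.WaitGroup
	)
	wg.Add(nRoutines)
	start := time.Now()
	for g := 0; g < nRoutines; g++ {
		go func() {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				mu.Lock()
				v++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	elapsed := time.Since(start)
	return elapsed / time.Duration(nRoutines*iters)
}

func main() {
	// Unlike the testing.B harness, every goroutine here does identical
	// work, so scheduling latency under contention is fully included.
	fmt.Println("ns/op:", timePerOp(100, 10000).Nanoseconds())
}
```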