I don't think you've posted code for the atomic version...
Each Go routine has its own stack. So when you cycle through many Go routines you will be destroying the cache as each touches N memory addresses (that are obviously not shared).
That's my guess anyway - the performance profile certainly looks like a cache issue to me. Once the cache is exhausted, the kernel-based scheduler is more efficient, so it does suggest to me that there are some optimizations that can be done in the Go scheduler.
I will look at a few things this evening.
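For reference, a minimal sketch of what an "atomic version" of the write benchmark might look like - the actual code was never posted in this thread, so the structure below is only an assumption, with sync/atomic standing in for the mutex:

    package bench

    import (
        "sync/atomic"
        "testing"
    )

    var sharedCounter uint64

    // BenchmarkAtomicWrite hammers a single shared word with atomic adds.
    // Contention is resolved entirely in hardware; goroutines never park
    // in the scheduler, unlike the mutex and channel variants.
    func BenchmarkAtomicWrite(b *testing.B) {
        b.RunParallel(func(pb *testing.PB) {
            for pb.Next() {
                atomic.AddUint64(&sharedCounter, 1)
            }
        })
    }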
-----Original Message-----
From: changkun
Sent: Aug 21, 2019 4:51 PM
To: golang-nuts
Subject: Re: [go-nuts] sync.Mutex encounter large performance drop when goroutine contention more than 3400

Regarding "less than N Go routines it fits in the L1 CPU cache": I am guessing that you are thinking of the local run queues on each M; the scheduler's local queue size is strictly limited to 256 goroutines. However, in our case, all blocking goroutines don't go to the run queue: they are blocked and stored in the semtable, which is a forest where each tree is an unbounded balanced tree. When the lock is released, only a single goroutine is detached and put into the local queue (so the scheduler only schedules a runq containing a single goroutine, without contending for the globalq).

How could an L1/L2 problem appear here? Do you think this is still some kind of "limited L1 cache to store a large amount of goroutines"?

What interests me is a newly created issue; I am not sure if my question is related to https://github.com/golang/go/issues/33747. That issue talks about small contention on a large number of Ps, but the full range of my benchmark is shown as follows:
On Tuesday, August 20, 2019 at 6:10:32 PM UTC+2, Robert Engels wrote:

I am assuming that there is an internal Go structure/process that fits in the L1 CPU cache when there are fewer than N Go routines, and beyond a certain point it spills to L2 or higher - thus the nearly order-of-magnitude performance decrease, yet consistent times within a range.

Since the worker code is so trivial, you are seeing this. Most worker code is not as trivial, so the overhead of the locking/scheduler constructs has far less effect (or the worker is causing L1 evictions anyway, so you never see the optimum performance possible from the scheduler).

-----Original Message-----
From: changkun
Sent: Aug 20, 2019 3:33 AM
To: golang-nuts
Subject: Re: [go-nuts] sync.Mutex encounter large performance drop when goroutine contention more than 3400

Hi Robert,

Thanks for your explanation. But how could I "log the number of operations done per Go routine"? Which particular debug settings are you referring to?

It is reasonable that sync.Mutex relies on the runtime scheduler while channels do not. However, it is unclear why such a significant performance drop appears. Is it possible to predict when the drop will appear?

Best,
Changkun
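One possible way to do that per-goroutine logging (just a sketch, not a specific runtime debug setting): give each goroutine its own counter and compare the spread afterwards. A large min/max gap would indicate starvation, i.e. unfairness:

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    func main() {
        const n = 3400 // number of contending goroutines (assumed)
        var (
            mu   sync.Mutex
            x    int64
            stop = make(chan struct{})
            ops  = make([]int64, n) // one counter per goroutine
            wg   sync.WaitGroup
        )
        wg.Add(n)
        for i := 0; i < n; i++ {
            go func(id int) {
                defer wg.Done()
                for {
                    select {
                    case <-stop:
                        return
                    default:
                    }
                    mu.Lock()
                    x++
                    mu.Unlock()
                    ops[id]++ // each goroutine touches only its own slot
                }
            }(i)
        }
        time.Sleep(time.Second)
        close(stop)
        wg.Wait()

        min, max := ops[0], ops[0]
        for _, v := range ops {
            if v < min {
                min = v
            }
            if v > max {
                max = v
            }
        }
        fmt.Printf("ops per goroutine: min=%d max=%d\n", min, max)
    }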
On Monday, August 19, 2019 at 10:27:19 PM UTC+2, Robert Engels wrote:

I think you'll find the reason is that the Mutex uses the Go scheduler. The chan is controlled by a 'mutex' which eventually defers to the OS futex - and the OS futex is probably more efficient at scheduling in the face of large contention - although you would think it should be the other way around.

I am guessing that if you logged the number of operations done per Go routine, you would see that the Mutex version is very fair, and the chan/futex version is unfair - meaning many are starved.

-----Original Message-----
From: changkun
Sent: Aug 19, 2019 12:50 PM
To: golang-nuts
Subject: [go-nuts] sync.Mutex encounter large performance drop when goroutine contention more than 3400

I am comparing the performance of sync.Mutex and Go channels. Here is my benchmark: https://play.golang.org/p/zLjVtsSx9gd

The performance comparison visualization is as follows:
What are the reasons that:

1. sync.Mutex encounters a large performance drop when the number of goroutines goes higher than roughly 3400?
2. Go channels are pretty stable, but slower than sync.Mutex before that point?

Raw bench data by benchstat (go test -bench=. -count=5):
MutexWrite/goroutines-2400-8    48.6ns ± 1%
MutexWrite/goroutines-2480-8    49.1ns ± 0%
MutexWrite/goroutines-2560-8    49.7ns ± 1%
MutexWrite/goroutines-2640-8    50.5ns ± 3%
MutexWrite/goroutines-2720-8    50.9ns ± 2%
MutexWrite/goroutines-2800-8    51.8ns ± 3%
MutexWrite/goroutines-2880-8    52.5ns ± 2%
MutexWrite/goroutines-2960-8    54.1ns ± 4%
MutexWrite/goroutines-3040-8    54.5ns ± 2%
MutexWrite/goroutines-3120-8    56.1ns ± 3%
MutexWrite/goroutines-3200-8    63.2ns ± 5%
MutexWrite/goroutines-3280-8    77.5ns ± 6%
MutexWrite/goroutines-3360-8     141ns ± 6%
MutexWrite/goroutines-3440-8     239ns ± 8%
MutexWrite/goroutines-3520-8     248ns ± 3%
MutexWrite/goroutines-3600-8     254ns ± 2%
MutexWrite/goroutines-3680-8     256ns ± 1%
MutexWrite/goroutines-3760-8     261ns ± 2%
MutexWrite/goroutines-3840-8     266ns ± 3%
MutexWrite/goroutines-3920-8     276ns ± 3%
MutexWrite/goroutines-4000-8     278ns ± 3%
MutexWrite/goroutines-4080-8     286ns ± 5%
MutexWrite/goroutines-4160-8     293ns ± 4%
MutexWrite/goroutines-4240-8     295ns ± 2%
MutexWrite/goroutines-4320-8     280ns ± 8%
MutexWrite/goroutines-4400-8     294ns ± 9%
MutexWrite/goroutines-4480-8     285ns ±10%
MutexWrite/goroutines-4560-8     290ns ± 8%
MutexWrite/goroutines-4640-8     271ns ± 3%
MutexWrite/goroutines-4720-8     271ns ± 4%
ChanWrite/goroutines-2400-8      158ns ± 3%
ChanWrite/goroutines-2480-8      159ns ± 2%
ChanWrite/goroutines-2560-8      161ns ± 2%
ChanWrite/goroutines-2640-8      161ns ± 1%
ChanWrite/goroutines-2720-8      163ns ± 1%
ChanWrite/goroutines-2800-8      166ns ± 3%
ChanWrite/goroutines-2880-8      168ns ± 1%
ChanWrite/goroutines-2960-8      176ns ± 4%
ChanWrite/goroutines-3040-8      176ns ± 2%
ChanWrite/goroutines-3120-8      180ns ± 1%
ChanWrite/goroutines-3200-8      180ns ± 1%
ChanWrite/goroutines-3280-8      181ns ± 2%
ChanWrite/goroutines-3360-8      183ns ± 2%
ChanWrite/goroutines-3440-8      188ns ± 3%
ChanWrite/goroutines-3520-8      190ns ± 2%
ChanWrite/goroutines-3600-8      193ns ± 2%
ChanWrite/goroutines-3680-8      196ns ± 3%
ChanWrite/goroutines-3760-8      199ns ± 2%
ChanWrite/goroutines-3840-8      206ns ± 2%
ChanWrite/goroutines-3920-8      209ns ± 2%
ChanWrite/goroutines-4000-8      206ns ± 2%
ChanWrite/goroutines-4080-8      209ns ± 2%
ChanWrite/goroutines-4160-8      208ns ± 2%
ChanWrite/goroutines-4240-8      209ns ± 3%
ChanWrite/goroutines-4320-8      213ns ± 2%
ChanWrite/goroutines-4400-8      209ns ± 2%
ChanWrite/goroutines-4480-8      211ns ± 1%
ChanWrite/goroutines-4560-8      213ns ± 2%
ChanWrite/goroutines-4640-8      215ns ± 1%
ChanWrite/goroutines-4720-8      218ns ± 3%
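Since the benchmark itself is only available behind the playground link above, here is a rough sketch of the kind of comparison being measured; the loop bodies and the way the goroutine count is stepped are assumptions, not the actual linked code:

    package bench

    import (
        "fmt"
        "runtime"
        "sync"
        "testing"
    )

    // BenchmarkMutexWrite and BenchmarkChanWrite step the goroutine count
    // from 2400 to 4720, mirroring the sub-benchmark names in the data above.
    func BenchmarkMutexWrite(b *testing.B) {
        var mu sync.Mutex
        var x int64
        for n := 2400; n <= 4720; n += 80 {
            b.Run(fmt.Sprintf("goroutines-%d", n), func(b *testing.B) {
                // SetParallelism is multiplied by GOMAXPROCS, so this only
                // approximates running exactly n goroutines.
                b.SetParallelism(n / runtime.GOMAXPROCS(0))
                b.RunParallel(func(pb *testing.PB) {
                    for pb.Next() {
                        mu.Lock()
                        x++
                        mu.Unlock()
                    }
                })
            })
        }
    }

    func BenchmarkChanWrite(b *testing.B) {
        ch := make(chan struct{}, 1) // buffered channel standing in for the lock
        var x int64
        for n := 2400; n <= 4720; n += 80 {
            b.Run(fmt.Sprintf("goroutines-%d", n), func(b *testing.B) {
                b.SetParallelism(n / runtime.GOMAXPROCS(0))
                b.RunParallel(func(pb *testing.PB) {
                    for pb.Next() {
                        ch <- struct{}{} // acquire
                        x++
                        <-ch // release
                    }
                })
            })
        }
    }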