Sorry for the late response. Do you mean that the total number of executions was 
not the same? If so, that is not the case: as you can see below, both benchmarks 
were executed 50000000 times:

goos: linux
goarch: amd64
BenchmarkMutexWrite/goroutines-2400-8           50000000                46.5 ns/op

Type: cpu
Time: Aug 26, 2019 at 6:19pm (CEST)
Duration: 2.50s, Total samples = 5.47s (218.62%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top20
Showing nodes accounting for 5.14s, 93.97% of 5.47s total
Dropped 15 nodes (cum <= 0.03s)
Showing top 20 nodes out of 48
      flat  flat%   sum%        cum   cum%
     1.77s 32.36% 32.36%      1.77s 32.36%  runtime.procyield
     1.08s 19.74% 52.10%      2.20s 40.22%  sync.(*Mutex).Lock
     0.40s  7.31% 59.41%      1.34s 24.50%  sync.(*Mutex).Unlock
     0.32s  5.85% 65.27%      2.06s 37.66%  runtime.lock
     0.23s  4.20% 69.47%      1.53s 27.97%  runtime.findrunnable
     0.20s  3.66% 73.13%      0.25s  4.57%  runtime.unlock
     0.18s  3.29% 76.42%      0.19s  3.47%  runtime.pidleput
     0.13s  2.38% 78.79%      0.13s  2.38%  runtime.cansemacquire
     0.12s  2.19% 80.99%      0.12s  2.19%  runtime.futex
     0.10s  1.83% 82.82%      0.11s  2.01%  sync.runtime_nanotime
     0.09s  1.65% 84.46%      3.64s 66.54%  _/home/changkun/dev/tests_test.BenchmarkMutexWrite.func1.1
     0.09s  1.65% 86.11%      0.09s  1.65%  runtime.casgstatus
     0.08s  1.46% 87.57%      0.08s  1.46%  runtime.(*semaRoot).dequeue
     0.08s  1.46% 89.03%      0.94s 17.18%  runtime.semrelease1
     0.07s  1.28% 90.31%      0.07s  1.28%  runtime.gopark
     0.06s  1.10% 91.41%      0.97s 17.73%  runtime.semacquire1
     0.04s  0.73% 92.14%      0.04s  0.73%  runtime.runqempty
     0.04s  0.73% 92.87%      0.04s  0.73%  sync.runtime_canSpin
     0.03s  0.55% 93.42%      0.03s  0.55%  runtime.(*guintptr).cas
     0.03s  0.55% 93.97%      0.03s  0.55%  runtime.gogo


--------------------

goos: linux
goarch: amd64
BenchmarkMutexWrite/goroutines-4800-8           50000000               317 ns/op
PASS
ok      _/home/changkun/dev/tests       16.020s

Type: cpu
Time: Aug 26, 2019 at 6:18pm (CEST)
Duration: 16.01s, Total samples = 17.03s (106.35%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top20
Showing nodes accounting for 14640ms, 85.97% of 17030ms total
Dropped 19 nodes (cum <= 85.15ms)
Showing top 20 nodes out of 51
      flat  flat%   sum%        cum   cum%
    2130ms 12.51% 12.51%     2160ms 12.68%  runtime.gopark
    1940ms 11.39% 23.90%     7210ms 42.34%  sync.(*Mutex).Lock
    1640ms  9.63% 33.53%     1810ms 10.63%  sync.runtime_nanotime
    1490ms  8.75% 42.28%     4560ms 26.78%  runtime.semacquire1
    1130ms  6.64% 48.91%     1130ms  6.64%  runtime.casgstatus
     800ms  4.70% 53.61%     3740ms 21.96%  runtime.semrelease1
     590ms  3.46% 57.08%      590ms  3.46%  runtime.(*guintptr).cas
     560ms  3.29% 60.36%      560ms  3.29%  runtime.futex
     530ms  3.11% 63.48%      610ms  3.58%  runtime.lock
     520ms  3.05% 66.53%      520ms  3.05%  runtime.unlock
     440ms  2.58% 69.11%      440ms  2.58%  runtime.semroot
     430ms  2.52% 71.64%      430ms  2.52%  runtime.usleep
     430ms  2.52% 74.16%     4210ms 24.72%  sync.(*Mutex).Unlock
     370ms  2.17% 76.34%     1320ms  7.75%  runtime.ready
     370ms  2.17% 78.51%     4930ms 28.95%  sync.runtime_SemacquireMutex
     340ms  2.00% 80.50%      340ms  2.00%  runtime.(*semaRoot).dequeue
     270ms  1.59% 82.09%    11820ms 69.41%  _/home/changkun/dev/tests_test.BenchmarkMutexWrite.func1.1
     220ms  1.29% 83.38%      220ms  1.29%  runtime.cansemacquire
     220ms  1.29% 84.67%      290ms  1.70%  runtime.releaseSudog
     220ms  1.29% 85.97%     2490ms 14.62%  runtime.schedule
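
For reference, profiles like the two above can be collected with something along 
these lines (the -bench regexp selects whichever sub-benchmark is of interest):

    go test -bench='MutexWrite/goroutines-2400' -cpuprofile=cpu.out
    go tool pprof cpu.out
    (pprof) top20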




On Friday, August 23, 2019 at 5:44:46 AM UTC+2, robert engels wrote:
>
> As I expected, the test is not testing what you think it is. Many of the 
> Go routines created do not perform the same number of iterations. The 
> benchmark harness is only designed to try and perform enough iterations to 
> get a time per op while “running N routines”.
>
> You need to rework the test so that all Go routines run the same number of 
> iterations - and time the entire process - then you can see how the 
> concurrency/scheduling latency affects things (or the time of the operation 
> when fully contended). Then the time per op = total time / (iterations * 
> nRoutines).
>
> Here is the code that shows the number of iterations per routine: 
> https://play.golang.org/p/LkAvB39X3_Z
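
If I understand the suggestion correctly, a rework along those lines would look 
roughly like the sketch below (nRoutines and itersPerG are arbitrary illustrative 
values, not your harness): every goroutine performs exactly the same number of 
iterations, the whole run is timed, and ns/op = total / (itersPerG * nRoutines).

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    func main() {
        const (
            nRoutines = 4800  // number of contending goroutines
            itersPerG = 10000 // identical iteration count for every goroutine
        )

        var (
            mu      sync.Mutex
            counter int64
            wg      sync.WaitGroup
        )

        start := time.Now()
        wg.Add(nRoutines)
        for g := 0; g < nRoutines; g++ {
            go func() {
                defer wg.Done()
                for i := 0; i < itersPerG; i++ {
                    mu.Lock()
                    counter++
                    mu.Unlock()
                }
            }()
        }
        wg.Wait()
        total := time.Since(start)

        // time per op = total time / (iterations * nRoutines)
        fmt.Printf("total=%v  ns/op=%.1f  counter=%d\n",
            total,
            float64(total.Nanoseconds())/float64(itersPerG*nRoutines),
            counter)
    }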
>
>
>
> On Aug 21, 2019, at 6:31 PM, Robert Engels <ren...@ix.netcom.com> wrote:
>
> I don't think you've posted code for the atomic version...
>
> Each Go routine has its own stack. So when you cycle through many Go 
> routines you will be destroying the cache as each touches N memory 
> addresses (that are obviously not shared).
>
> That's my guess anyway - the performance profile certainly looks like a 
> cache issue to me. Once the cache is exhausted - the kernel based scheduler 
> is more efficient - so it does suggest to me that there are some 
> optimizations that can be done in the Go scheduler.
>
> I will look at a few things this evening.
>
> -----Original Message----- 
> From: changkun 
> Sent: Aug 21, 2019 4:51 PM 
> To: golang-nuts 
> Subject: Re: [go-nuts] sync.Mutex encounter large performance drop when 
> goroutine contention more than 3400 
>
> "less than N Go routines it fits in the L1 CPU cache," I am guessing that 
> you are thinking of local queues on each M, the scheduler's local queue 
> size is strict to 256 goroutines. However, in our case, all blocking 
> goroutines don't go the run queue but blocked and stored on semtable, which 
> is a forest and each tree is an unlimited balanced tree. When a lock is 
> released, only a single goroutine will be detached and put into the local 
> queue (so scheduler only schedules runq with a single goroutine without 
> content to globalq). 
> How could an L1/L2 problem appear here? Do you think this is still some 
> kind of "limited L1 cache to store large mount of goroutines" ?
>
> What interests me is a newly created issue; I am not sure whether this question 
> is related to https://github.com/golang/go/issues/33747.
> That issue is about small contention with a large number of Ps, whereas the full 
> range of my benchmark looks as follows:
>
> [image: performance_ channel v.s. sync.Mutex v.s. atomic.png]
>
>
>
> On Tuesday, August 20, 2019 at 6:10:32 PM UTC+2, Robert Engels wrote:
>>
>> I am assuming that there is an internal Go structure/process such that when 
>> there are fewer than N Go routines it fits in the L1 CPU cache, and beyond a 
>> certain point it spills to the L2 or higher - thus the nearly order-of-magnitude 
>> performance decrease, yet consistent times within a range.
>>
>> Since the worker code is so trivial, you are seeing this. Most worker code is 
>> not as trivial, so the overhead of the locking/scheduler constructs has far 
>> less effect (or the worker is causing L1 evictions anyway - so you never see 
>> the optimum performance possible of the scheduler).
>>
>> -----Original Message----- 
>> From: changkun 
>> Sent: Aug 20, 2019 3:33 AM 
>> To: golang-nuts 
>> Subject: Re: [go-nuts] sync.Mutex encounter large performance drop when 
>> goroutine contention more than 3400 
>>
>> Hi Robert,
>>
>> Thanks for your explanation. But how could I "log the number of operations 
>> done per Go routine"? Which particular debug settings are you referring to?
>> It is reasonable that sync.Mutex relies on the runtime scheduler while channels 
>> do not. However, it is unclear why such a significant performance drop appears. 
>> Is it possible to predict when the drop will occur?
>>
>> Best,
>> Changkun
>>
>> On Monday, August 19, 2019 at 10:27:19 PM UTC+2, Robert Engels wrote:
>>>
>>> I think you'll find the reason is that the Mutex uses the Go scheduler. The 
>>> chan is controlled by a 'mutex' which eventually defers to the OS futex - 
>>> and the OS futex is probably more efficient at scheduling in the face of 
>>> large contention - although you would think it should be the other way 
>>> around.
>>>
>>> I am guessing that if you logged the number of operations done per Go 
>>> routine, you will see that the Mutex version is very fair, and the 
>>> chan/futex version is unfair - meaning many are starved.
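
To make that measurable, logging the number of operations done per goroutine 
could look roughly like the sketch below (the names and the 2-second window are 
just illustrative): each goroutine owns one slot in counts, and the spread 
between min and max after the run shows how fair or starved the goroutines were.

    package main

    import (
        "fmt"
        "sort"
        "sync"
        "time"
    )

    func main() {
        const nRoutines = 4800

        var (
            mu     sync.Mutex
            shared int64
            wg     sync.WaitGroup
        )
        counts := make([]int64, nRoutines) // ops completed by each goroutine
        stop := make(chan struct{})

        wg.Add(nRoutines)
        for g := 0; g < nRoutines; g++ {
            go func(g int) {
                defer wg.Done()
                for {
                    select {
                    case <-stop:
                        return
                    default:
                    }
                    mu.Lock()
                    shared++
                    mu.Unlock()
                    counts[g]++ // only goroutine g writes counts[g]
                }
            }(g)
        }

        time.Sleep(2 * time.Second) // fixed measurement window
        close(stop)
        wg.Wait()

        sort.Slice(counts, func(i, j int) bool { return counts[i] < counts[j] })
        fmt.Printf("min=%d median=%d max=%d total=%d\n",
            counts[0], counts[nRoutines/2], counts[nRoutines-1], shared)
    }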
>>>
>>> -----Original Message----- 
>>> From: changkun 
>>> Sent: Aug 19, 2019 12:50 PM 
>>> To: golang-nuts 
>>> Subject: [go-nuts] sync.Mutex encounter large performance drop when 
>>> goroutine contention more than 3400 
>>>
>>> I am comparing the performance of sync.Mutex and Go channels. 
>>> Here is my benchmark: https://play.golang.org/p/zLjVtsSx9gd
>>>
>>> The performance comparison visualization is as follows:
>>>
>>> [image: sync.Mutex performance (1).png]
>>> What are the reasons that:
>>>
>>> 1. sync.Mutex encounters a large performance drop when the number of 
>>> goroutines goes higher than roughly 3400?
>>> 2. Go channels are pretty stable, but slower than sync.Mutex before that 
>>> point?
>>>
>>>
>>>
>>> Raw bench data by benchstat (go test -bench=. -count=5):
>>>
>>> MutexWrite/goroutines-2400-8  48.6ns ± 1%
>>> MutexWrite/goroutines-2480-8  49.1ns ± 0%
>>> MutexWrite/goroutines-2560-8  49.7ns ± 1%
>>> MutexWrite/goroutines-2640-8  50.5ns ± 3%
>>> MutexWrite/goroutines-2720-8  50.9ns ± 2%
>>> MutexWrite/goroutines-2800-8  51.8ns ± 3%
>>> MutexWrite/goroutines-2880-8  52.5ns ± 2%
>>> MutexWrite/goroutines-2960-8  54.1ns ± 4%
>>> MutexWrite/goroutines-3040-8  54.5ns ± 2%
>>> MutexWrite/goroutines-3120-8  56.1ns ± 3%
>>> MutexWrite/goroutines-3200-8  63.2ns ± 5%
>>> MutexWrite/goroutines-3280-8  77.5ns ± 6%
>>> MutexWrite/goroutines-3360-8   141ns ± 6%
>>> MutexWrite/goroutines-3440-8   239ns ± 8%
>>> MutexWrite/goroutines-3520-8   248ns ± 3%
>>> MutexWrite/goroutines-3600-8   254ns ± 2%
>>> MutexWrite/goroutines-3680-8   256ns ± 1%
>>> MutexWrite/goroutines-3760-8   261ns ± 2%
>>> MutexWrite/goroutines-3840-8   266ns ± 3%
>>> MutexWrite/goroutines-3920-8   276ns ± 3%
>>> MutexWrite/goroutines-4000-8   278ns ± 3%
>>> MutexWrite/goroutines-4080-8   286ns ± 5%
>>> MutexWrite/goroutines-4160-8   293ns ± 4%
>>> MutexWrite/goroutines-4240-8   295ns ± 2%
>>> MutexWrite/goroutines-4320-8   280ns ± 8%
>>> MutexWrite/goroutines-4400-8   294ns ± 9%
>>> MutexWrite/goroutines-4480-8   285ns ±10%
>>> MutexWrite/goroutines-4560-8   290ns ± 8%
>>> MutexWrite/goroutines-4640-8   271ns ± 3%
>>> MutexWrite/goroutines-4720-8   271ns ± 4%
>>>
>>> ChanWrite/goroutines-2400-8  158ns ± 3%
>>> ChanWrite/goroutines-2480-8  159ns ± 2%
>>> ChanWrite/goroutines-2560-8  161ns ± 2%
>>> ChanWrite/goroutines-2640-8  161ns ± 1%
>>> ChanWrite/goroutines-2720-8  163ns ± 1%
>>> ChanWrite/goroutines-2800-8  166ns ± 3%
>>> ChanWrite/goroutines-2880-8  168ns ± 1%
>>> ChanWrite/goroutines-2960-8  176ns ± 4%
>>> ChanWrite/goroutines-3040-8  176ns ± 2%
>>> ChanWrite/goroutines-3120-8  180ns ± 1%
>>> ChanWrite/goroutines-3200-8  180ns ± 1%
>>> ChanWrite/goroutines-3280-8  181ns ± 2%
>>> ChanWrite/goroutines-3360-8  183ns ± 2%
>>> ChanWrite/goroutines-3440-8  188ns ± 3%
>>> ChanWrite/goroutines-3520-8  190ns ± 2%
>>> ChanWrite/goroutines-3600-8  193ns ± 2%
>>> ChanWrite/goroutines-3680-8  196ns ± 3%
>>> ChanWrite/goroutines-3760-8  199ns ± 2%
>>> ChanWrite/goroutines-3840-8  206ns ± 2%
>>> ChanWrite/goroutines-3920-8  209ns ± 2%
>>> ChanWrite/goroutines-4000-8  206ns ± 2%
>>> ChanWrite/goroutines-4080-8  209ns ± 2%
>>> ChanWrite/goroutines-4160-8  208ns ± 2%
>>> ChanWrite/goroutines-4240-8  209ns ± 3%
>>> ChanWrite/goroutines-4320-8  213ns ± 2%
>>> ChanWrite/goroutines-4400-8  209ns ± 2%
>>> ChanWrite/goroutines-4480-8  211ns ± 1%
>>> ChanWrite/goroutines-4560-8  213ns ± 2%
>>> ChanWrite/goroutines-4640-8  215ns ± 1%
>>> ChanWrite/goroutines-4720-8  218ns ± 3%
>>>
>>>
>>>
