This sounds like https://github.com/golang/go/issues/13086
On Mon, 16 Oct 2017, at 09:01 PM, Caleb Spare wrote:

> I have a server which is doing log processing, and I'm trying to
> improve its throughput. The bottleneck is access to a shared resource
> protected by a sync.Mutex. The issue is that even though there are
> nearly always worker goroutines blocked on Lock, only about 10% of the
> time (in aggregate) is spent running the code inside the critical
> section.
>
> To be concrete, I have 300 worker goroutines, and they collectively
> spend about 280 seconds/second blocked on the mutex, but only about
> 100 ms/second inside the critical section (measured from Lock
> returning to Unlock returning).
>
> To investigate this, I added some instrumentation around sync.Mutex: I
> log an event when Lock is called, when Lock returns (the lock is
> acquired), and when Unlock returns. This shows that my code typically
> takes around 5-15 μs between acquire and unlock, but tens or hundreds
> of μs pass between an unlock and the next goroutine acquiring the
> lock.
>
> Here's a short section of the logged events. The number in square
> brackets is the goroutine ID; the timestamp is μs since the trace
> started.
>
> [209] 73018.628 acquire lock (+38.798)
> [209] 73028.707 unlock
> [298] 73049.222 call lock
> [199] 73068.748 call lock
> [214] 73153.182 call lock
> [355] 73259.473 call lock
> [157] 73308.336 call lock
> [256] 73339.965 call lock
> [389] 73517.476 call lock
> [325] 73541.077 acquire lock (+512.370)
> [325] 73545.650 unlock
> [396] 73581.819 call lock
> [174] 73645.743 call lock
> [224] 73708.319 call lock
> [292] 73801.945 call lock
> [285] 73855.717 call lock
> [357] 73915.478 call lock
> [276] 74013.148 acquire lock (+467.498)
>
> Here you can see that goroutine 209 held the lock for only about 10 μs
> before returning from mu.Unlock at t=73028.707, but the next goroutine
> (325) didn't return from mu.Lock until t=73541.077, more than 500 μs
> later. Goroutine 325 then returned from mu.Unlock after about 5 μs,
> but it took another 467 μs for the next goroutine (276) to return
> from mu.Lock.
>
> What is the best way to debug (and, hopefully, fix) this issue? I'm
> not really sure how to diagnose scheduling problems. I looked at
> GODEBUG=schedtrace=1000 output, but I'm not sure how to interpret it.
> Here are a few seconds of that output for this server:
>
> 2017/10/16 10:38:00 SCHED 253295ms: gomaxprocs=36 idleprocs=0 threads=56 spinningthreads=18 idlethreads=15 runqueue=1 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:01 SCHED 254305ms: gomaxprocs=36 idleprocs=1 threads=56 spinningthreads=18 idlethreads=14 runqueue=0 [0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:02 SCHED 255307ms: gomaxprocs=36 idleprocs=2 threads=56 spinningthreads=18 idlethreads=15 runqueue=0 [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:03 SCHED 256311ms: gomaxprocs=36 idleprocs=1 threads=56 spinningthreads=18 idlethreads=13 runqueue=0 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
> 2017/10/16 10:38:04 SCHED 257313ms: gomaxprocs=36 idleprocs=1 threads=56 spinningthreads=17 idlethreads=13 runqueue=0 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:05 SCHED 258318ms: gomaxprocs=36 idleprocs=1 threads=56 spinningthreads=18 idlethreads=15 runqueue=0 [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:06 SCHED 259318ms: gomaxprocs=36 idleprocs=3 threads=56 spinningthreads=17 idlethreads=17 runqueue=0 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:07 SCHED 260323ms: gomaxprocs=36 idleprocs=1 threads=56 spinningthreads=18 idlethreads=14 runqueue=0 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:08 SCHED 261324ms: gomaxprocs=36 idleprocs=2 threads=56 spinningthreads=18 idlethreads=15 runqueue=0 [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:09 SCHED 262327ms: gomaxprocs=36 idleprocs=0 threads=56 spinningthreads=18 idlethreads=15 runqueue=1 [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:10 SCHED 263333ms: gomaxprocs=36 idleprocs=1 threads=56 spinningthreads=18 idlethreads=14 runqueue=0 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:11 SCHED 264333ms: gomaxprocs=36 idleprocs=4 threads=56 spinningthreads=18 idlethreads=15 runqueue=0 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:12 SCHED 265336ms: gomaxprocs=36 idleprocs=3 threads=56 spinningthreads=19 idlethreads=14 runqueue=1 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:13 SCHED 266339ms: gomaxprocs=36 idleprocs=0 threads=56 spinningthreads=14 idlethreads=15 runqueue=1 [0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
>
> Any ideas?
>
> Thanks in advance!
> Caleb
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
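For anyone who wants to reproduce the kind of event log quoted above, here is one possible shape for the instrumentation Caleb describes. Only the three event names ("call lock", "acquire lock", "unlock") come from the trace; the `tracedMutex` type and everything else are my own invention. Since Go deliberately does not expose goroutine IDs, callers pass a worker ID explicitly:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tracedMutex wraps sync.Mutex and prints the three events from the
// trace above: "call lock" when Lock is entered, "acquire lock" (with
// the wait time) when Lock returns, and "unlock" after Unlock returns.
// Timestamps are μs since the trace started, matching the log format.
type tracedMutex struct {
	mu    sync.Mutex
	start time.Time // trace epoch for the μs offsets
}

func newTracedMutex() *tracedMutex {
	return &tracedMutex{start: time.Now()}
}

// since reports μs elapsed since the trace epoch.
func (t *tracedMutex) since() float64 {
	return float64(time.Since(t.start)) / float64(time.Microsecond)
}

func (t *tracedMutex) Lock(id int) {
	fmt.Printf("[%d] %.3f call lock\n", id, t.since())
	before := time.Now()
	t.mu.Lock()
	wait := float64(time.Since(before)) / float64(time.Microsecond)
	fmt.Printf("[%d] %.3f acquire lock (+%.3f)\n", id, t.since(), wait)
}

func (t *tracedMutex) Unlock(id int) {
	t.mu.Unlock()
	fmt.Printf("[%d] %.3f unlock\n", id, t.since())
}

func main() {
	tm := newTracedMutex()
	var wg sync.WaitGroup
	for id := 1; id <= 3; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			tm.Lock(id)
			time.Sleep(10 * time.Microsecond) // stand-in for the critical section
			tm.Unlock(id)
		}(id)
	}
	wg.Wait()
}
```

One caveat: fmt.Printf serializes on its own internal lock, so logging inline like this perturbs the very timings being measured. For a real investigation you would buffer events per goroutine and merge them by timestamp afterwards, as the per-event timestamps in the quoted trace suggest.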