This sounds like https://github.com/golang/go/issues/13086

On Mon, 16 Oct 2017, at 09:01 PM, Caleb Spare wrote:
> I have a server which is doing log processing and I'm trying to
> improve throughput. The bottleneck is access to a shared resource
> protected by a sync.Mutex. The issue is that even though there are
> nearly always worker goroutines blocked on Lock, only about 10% of the
> time (in aggregate) is spent running the code inside the critical
> section.
> 
> To be concrete, I have 300 worker goroutines, and they collectively
> spend about 280 seconds / second blocked on the mutex, and about 100
> ms / second inside (as measured from Lock returning to Unlock
> returning).
> 
> To investigate this, I added some instrumentation around sync.Mutex. I
> log events when a Lock is called, when Lock returns (acquire lock),
> and when Unlock returns. This shows me that, typically, my code takes
> around 5-15 μs between acquire and unlock, but it takes tens or
> hundreds of μs between unlock and some other goroutine acquiring the
> lock.
> 
> Here's a short section of logged events. The number in square brackets
> is the goroutine ID; the timestamp is μs since the trace started.
> 
> [209]    73018.628   acquire lock (+38.798)
> [209]    73028.707   unlock
> [298]    73049.222   call lock
> [199]    73068.748   call lock
> [214]    73153.182   call lock
> [355]    73259.473   call lock
> [157]    73308.336   call lock
> [256]    73339.965   call lock
> [389]    73517.476   call lock
> [325]    73541.077   acquire lock (+512.370)
> [325]    73545.650   unlock
> [396]    73581.819   call lock
> [174]    73645.743   call lock
> [224]    73708.319   call lock
> [292]    73801.945   call lock
> [285]    73855.717   call lock
> [357]    73915.478   call lock
> [276]    74013.148   acquire lock (+467.498)
> 
> Here you can see that goroutine 209 held the lock for only about 10 μs
> before returning from mu.Unlock at t=73028.707 but the next goroutine
> (325) didn't return from mu.Lock until t=73541.077, more than 500 μs
> after that. Goroutine 325 returned from mu.Unlock after about 5 μs,
> but then it took 467 μs for another goroutine (276) to return from
> mu.Lock.
> 
> What is the best way to debug (and, hopefully, fix) this issue? I'm
> not really sure how to diagnose scheduling problems. I looked at
> GODEBUG=schedtrace=1000 output, but I'm not sure how to interpret it.
> Here are a few seconds of that output for this server:
> 
> 2017/10/16 10:38:00 SCHED 253295ms: gomaxprocs=36 idleprocs=0
> threads=56 spinningthreads=18 idlethreads=15 runqueue=1 [0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:01 SCHED 254305ms: gomaxprocs=36 idleprocs=1
> threads=56 spinningthreads=18 idlethreads=14 runqueue=0 [0 0 0 0 0 0 0
> 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:02 SCHED 255307ms: gomaxprocs=36 idleprocs=2
> threads=56 spinningthreads=18 idlethreads=15 runqueue=0 [0 1 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:03 SCHED 256311ms: gomaxprocs=36 idleprocs=1
> threads=56 spinningthreads=18 idlethreads=13 runqueue=0 [0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
> 2017/10/16 10:38:04 SCHED 257313ms: gomaxprocs=36 idleprocs=1
> threads=56 spinningthreads=17 idlethreads=13 runqueue=0 [0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:05 SCHED 258318ms: gomaxprocs=36 idleprocs=1
> threads=56 spinningthreads=18 idlethreads=15 runqueue=0 [0 0 0 0 0 0 0
> 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:06 SCHED 259318ms: gomaxprocs=36 idleprocs=3
> threads=56 spinningthreads=17 idlethreads=17 runqueue=0 [0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:07 SCHED 260323ms: gomaxprocs=36 idleprocs=1
> threads=56 spinningthreads=18 idlethreads=14 runqueue=0 [0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:08 SCHED 261324ms: gomaxprocs=36 idleprocs=2
> threads=56 spinningthreads=18 idlethreads=15 runqueue=0 [0 0 0 0 0 0 0
> 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:09 SCHED 262327ms: gomaxprocs=36 idleprocs=0
> threads=56 spinningthreads=18 idlethreads=15 runqueue=1 [0 0 1 0 0 0 0
> 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:10 SCHED 263333ms: gomaxprocs=36 idleprocs=1
> threads=56 spinningthreads=18 idlethreads=14 runqueue=0 [0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:11 SCHED 264333ms: gomaxprocs=36 idleprocs=4
> threads=56 spinningthreads=18 idlethreads=15 runqueue=0 [0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:12 SCHED 265336ms: gomaxprocs=36 idleprocs=3
> threads=56 spinningthreads=19 idlethreads=14 runqueue=1 [0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 2017/10/16 10:38:13 SCHED 266339ms: gomaxprocs=36 idleprocs=0
> threads=56 spinningthreads=14 idlethreads=15 runqueue=1 [0 0 0 0 0 0 1
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
> 
> Any ideas?
> 
> Thanks in advance!
> Caleb
> 
> -- 
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
