Hi! We're trying to investigate a case of pathological behaviour of one of one HTTP services (built with Go 1.8) under heavy load.
The current hypothesis is that under the load being considered, the scheduler (or the set of available Ps) have hard time serving that load. We came to this conclusion by recording the runtime trace and then analyzing the delays between the pairs of linked events "a goroutine is unblocked by the netpoller" and "that goroutine starts executing" (these delays in our case peak to as much as 3-5 seconds). To verify our observations, we've decided to try looking at the "scheduler latency profile" which can be obtained from the runtime trace -- something like: $ go tool trace -pprof=sched runtime.trace >sched.pprof $ go tool pprof -evince <sched.pprof What I fail to make sense of, is that why the scheduler latency profile presents us with a chart of functions. I would expect to see there Gs or Ps or a combination of them or something like this. That's because I thought the scheduler is only concerned with functions because their entry points can serve as preemption points. Another property of these charts I get, and of which I'm curious, is that the function chains on the chains list all "trunk" functions as contributing about 0 amount of delay -- with only the leaf (or root?) functions (those which end the chains at the bottom of the chart) appearing to contribute the most. What makes this interesting is that almost all of these functions are internals of the "runtime" package -- such as runtime.chansend1, runtime.selectnbsend, runtime.selectgo, runtime.mallocgc. There's also sync.(*Mutex).Unlock, and two surprising functions: net/http.(*Server).Serve and net/http.(*connReader).startBackgroundRead. Based on the facts net/http.(*Server).Serve spends most of its time doing syscall.Accept() and creating goroutines, and net/http.(*connReader).startBackgroundRead also creates a goroutine, I hypothesize that the functions which end up accounted for most of the scheduler's delays are those which make a goroutine executing it get parked waiting on some resource, and the time spent being parked is considered to be the scheduling delay. Can anyone please explain how am I supposed to interpret the scheduler lateny profile charts? I've attached the result of running `go tool pprof -dot` on the scheduler latency profile obtained from a sample trace capture, as well as the SVG render of that chart. -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
sched.dot.gz
Description: application/gzip
sched.svg.gz
Description: application/gzip