[go-nuts] Understanding "scheduler latency profile" charts

Konstantin Khomoutov Thu, 24 Aug 2017 03:34:36 -0700

Hi!

We're trying to investigate a case of pathological behaviour of one of
one HTTP services (built with Go 1.8) under heavy load.


The current hypothesis is that under the load being considered,
the scheduler (or the set of available Ps) have hard time serving that
load.  We came to this conclusion by recording the runtime trace and
then analyzing the delays between the pairs of linked events "a goroutine
is unblocked by the netpoller" and "that goroutine starts executing"
(these delays in our case peak to as much as 3-5 seconds).

To verify our observations, we've decided to try looking at the
"scheduler latency profile" which can be obtained from the runtime
trace -- something like:

  $ go tool trace -pprof=sched runtime.trace >sched.pprof
  $ go tool pprof -evince <sched.pprof

What I fail to make sense of, is that why the scheduler latency profile
presents us with a chart of functions.  I would expect to see there Gs or
Ps or a combination of them or something like this.  That's because I
thought the scheduler is only concerned with functions because their
entry points can serve as preemption points.

Another property of these charts I get, and of which I'm curious, is
that the function chains on the chains list all "trunk" functions as
contributing about 0 amount of delay -- with only the leaf (or root?)
functions (those which end the chains at the bottom of the chart)
appearing to contribute the most.  What makes this interesting is that
almost all of these functions are internals of the "runtime" package --
such as runtime.chansend1, runtime.selectnbsend, runtime.selectgo,
runtime.mallocgc.

There's also sync.(*Mutex).Unlock, and two surprising functions:
net/http.(*Server).Serve and net/http.(*connReader).startBackgroundRead.

Based on the facts net/http.(*Server).Serve spends most of its time
doing syscall.Accept() and creating goroutines, and
net/http.(*connReader).startBackgroundRead also creates a goroutine,
I hypothesize that the functions which end up accounted for most of the
scheduler's delays are those which make a goroutine executing it get
parked waiting on some resource, and the time spent being parked is
considered to be the scheduling delay.

Can anyone please explain how am I supposed to interpret the scheduler
lateny profile charts?


I've attached the result of running `go tool pprof -dot` on the
scheduler latency profile obtained from a sample trace capture, as well
as the SVG render of that chart.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

sched.dot.gz
Description: application/gzip

sched.svg.gz
Description: application/gzip

[go-nuts] Understanding "scheduler latency profile" charts

Reply via email to