Does your program set a very large Timeout on its mdns requests (maybe tens
of hours long)?
It looks like your program is consuming a lot of CPU cycles on managing
timers. On the left half of the flame graph, lots of CPU cycles are spent
in runtime.timerproc. Time here indicates a large number of active timers
(from time.NewTimer or time.After). The CPU cycles attributed to
runtime.sysmon in the right half of the flame graph are a side effect of
the runtime.timerproc goroutine doing a large number of short sleeps.
So why are there a large number of active timers in your process? It looks
like the mdns package has a bug wherein it sets a timeout on operations,
but never cancels that timeout if the operation completes successfully.
Instead of using time.After, the mdns package should use time.NewTimer and
then defer a call to the timer's Stop method.
The default timeout is one second, but it seems likely that your process
specifies a much larger timeout, probably a couple of days, which would
match how long it takes before the CPU usage levels out.
You can work around this behavior in your program by using a smaller
timeout: the "leaked" timers are then released sooner, so fewer of them are
active at any time. The mdns package should also be changed to clean up its
timer before the query method returns, via time.NewTimer and a deferred Stop.
On Monday, October 17, 2016 at 11:51:52 AM UTC-7, Abhay Bothra wrote:
> We are using Hashicorp's mdns library (https://github.com/hashicorp/mdns)
> for node discovery, with a frequency of 1 mdns query / minute. The CPU
> consumption by the process increases very gradually over a couple of days,
> going from 2-3% to 20-30% over 3-4 days. From the runtime instrumentation
> we have done, the number of go-routines seems to be fairly static.
> The attached flame-graph from the pprof output suggests that a lot of CPU
> is being spent on runtime.goexit and runtime.mstart. To me this seems to
> suggest that we are starting very short lived go-routines.
> - Is it fair to blame lots of short-lived go-routines for this?
> - What else can lead to this sort of behavior?
> - How should we go about instrumenting our code in order to be able to get
> a root cause?
> Really appreciate any help.