Does your program set a very large Timeout on its mdns requests (maybe tens of hours long)?
It looks like your program is spending a lot of CPU cycles managing timers. On the left half of the flame graph, many cycles are spent in runtime.timerproc; time there indicates a large number of active timers (from time.NewTimer or time.After). The cycles attributed to runtime.sysmon on the right half of the flame graph are a side effect of the runtime.timerproc goroutine doing a large number of short sleeps.

So why does your process have so many active timers? The mdns package has a bug: it sets a timeout on operations but never cancels that timeout when the operation completes successfully. Instead of using time.After, it should use time.NewTimer and then defer a call to Stop:

https://github.com/hashicorp/mdns/blob/9d85cf22f9f8d53cb5c81c1b2749f438b2ee333f/client.go#L235

The default timeout is one second, but it seems likely that your process specifies a much larger one, probably a couple of days, to match how long it takes before the CPU usage levels out. You can work around this in your program by using a smaller timeout: the "leaked" timers are then released sooner, so fewer are active at any one time. The mdns package should also be fixed to clean up its timer before the query method returns, via time.NewTimer and a deferred Stop.

On Monday, October 17, 2016 at 11:51:52 AM UTC-7, Abhay Bothra wrote:
>
> We are using Hashicorp's mdns library (https://github.com/hashicorp/mdns)
> for node discovery, with a frequency of 1 mdns query / minute. The CPU
> consumption by the process increases very gradually over a couple of days,
> going from 2-3% to 20-30% over 3-4 days. From the runtime instrumentation
> we have done, the number of go-routines seems to be fairly static.
>
> The attached flame-graph from the pprof output suggests that a lot of CPU
> is being spent on runtime.goexit and runtime.mstart. To me this seems to
> suggest that we are starting very short-lived go-routines.
> - Is it fair to blame lots of short-lived go-routines for this?
> - What else can lead to this sort of behavior?
> - How should we go about instrumenting our code in order to be able to get
> a root cause?
>
> Really appreciate any help.
>
> Thanks!