Does your program set a very large Timeout on its mdns requests (maybe tens 
of hours long)?

It looks like your program is consuming a lot of CPU cycles on managing 
timers. On the left half of the flame graph, lots of CPU cycles are spent 
in runtime.timerproc. Time spent there indicates a large number of active 
timers (created by time.NewTimer or time.After). The CPU cycles attributed to 
runtime.sysmon in the right half of the flame graph are a side effect of 
the runtime.timerproc goroutine doing a large number of short sleeps.

So why are there a large number of active timers in your process? It looks 
like the mdns package has a bug wherein it sets a timeout on operations, 
but never cancels that timeout if the operation completes successfully. 
Instead of using time.After, the mdns package should use time.NewTimer and 
then defer a call to Stop:
https://github.com/hashicorp/mdns/blob/9d85cf22f9f8d53cb5c81c1b2749f438b2ee333f/client.go#L235

The default timeout is one second, but it seems likely that your process 
specifies a much larger timeout, probably a couple of days, to match how 
long it takes for the CPU usage to level out.

You can fix this behavior in your program by using a smaller timeout, so 
that the "leaked" timers are released sooner and fewer of them are active 
at any time. The mdns package should also be changed to clean up its timer 
before the query method returns, via time.NewTimer and defer Stop.

On Monday, October 17, 2016 at 11:51:52 AM UTC-7, Abhay Bothra wrote:
>
> We are using Hashicorp's mdns library (https://github.com/hashicorp/mdns) 
> for node discovery, with a frequency of 1 mdns query / minute. The CPU 
> consumption by the process increases very gradually over a couple of days, 
> going from 2-3% to 20-30% over 3-4 days. From the runtime instrumentation 
> we have done, the number of go-routines seems to be fairly static.
>
> The attached flame-graph from the pprof output suggests that a lot of CPU 
> is being spent on runtime.goexit and runtime.mstart. To me this seems to 
> suggest that we are starting very short lived go-routines.
> - Is it fair to blame lots of short-lived go-routines for this?
> - What else can lead to this sort of behavior?
> - How should we go about instrumenting our code in order to be able to get 
> to a root cause?
>
> Really appreciate any help.
>
> Thanks!
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.