What you are saying makes a lot of sense (thanks!), but I see that my
program is only using a 5-second timeout. Is it possible that this can still
lead to this performance profile?
On Tuesday, October 18, 2016 at 10:22:09 AM UTC-7, rhys.h...@gmail.com wrote:
> Does your program set a very large Timeout on its mdns requests (maybe
> tens of hours long)?
> It looks like your program is consuming a lot of CPU cycles on managing
> timers. On the left half of the flame graph, lots of CPU cycles are spent
> in runtime.timerproc. Time here indicates a large number of active timers
> (from time.NewTimer or time.After). The CPU cycles attributed to
> runtime.sysmon in the right half of the flame graph are a side effect of
> the runtime.timerproc goroutine doing a large number of short sleeps.
> So why are there a large number of active timers in your process? It looks
> like the mdns package has a bug wherein it sets a timeout on operations,
> but never cancels that timeout if the operation completes successfully.
> Instead of using time.After, the mdns package should use time.NewTimer and
> then defer a call to Stop:
> The default timeout is one second, but it seems likely that your process
> specifies a much larger timeout, perhaps a couple of days, to match how long
> it takes before the CPU usage levels out.
> You can fix this behavior in your program by using a smaller timeout. The
> "leaked" timers are then released sooner, so fewer of them are active
> at any time. The mdns package should also be changed to clean up its timer
> before the query method returns, via time.NewTimer and defer Stop.
> On Monday, October 17, 2016 at 11:51:52 AM UTC-7, Abhay Bothra wrote:
>> We are using Hashicorp's mdns library (https://github.com/hashicorp/mdns)
>> for node discovery, with a frequency of 1 mdns query / minute. The CPU
>> consumption by the process increases very gradually over a couple of days,
>> going from 2-3% to 20-30% over 3-4 days. From the runtime instrumentation
>> we have done, the number of goroutines seems to be fairly static.
>> The attached flame graph from the pprof output suggests that a lot of CPU
>> is being spent on runtime.goexit and runtime.mstart. To me this seems to
>> suggest that we are starting very short-lived goroutines.
>> - Is it fair to blame lots of short-lived goroutines for this?
>> - What else can lead to this sort of behavior?
>> - How should we go about instrumenting our code in order to be able to
>> get a root cause?
>> Really appreciate any help.