Hi Bruno, thanks for your comments.
On 3/31/11, Bruno Prémont <[email protected]> wrote:
> Hi,
>
> On Thu, 31 March 2011 Vincent McIntyre <[email protected]> wrote:
>> I started turning things off and found that collectd seems to be the
>> culprit - the peaks go away entirely if I turn it off. If I turn off
>> say half of the plugins, the load peak still occurs but with half the
>> amplitude. I have a cron job printing the process table when the peak
>> is occurring but nothing obvious shows up; the only process with %CPU
>> larger than 0.0 is collectd. Neither does anything in the various
>> plots (we use collection3), related to collectd or the other processes
>> that are showing any activity (see Processes config below).
>
> I would say this is due to the scheduling of the various threads used by
> collectd.
> The "load" varies across different kernel versions e.g. for some kernels
> you get those peaks, for others you don't. What kernel are you running on?
>

It's the stock Debian Lenny kernel, 2.6.26-2-amd64.

Should I not see thread activity reflected in, say, 'top', or in the
process plugin plots for collectd? I don't see any peaks there really,
certainly nothing with the same pattern in time. I'm not sure if the
processes plugin tracks thread count or per-thread statistics;
collection3 does not show plots of quantities like this, anyway.

> You could reduce collectd to less worker threads in order to not have that
> scheduling artifact.
>

Thanks, I'll give this a try.

>> Has anyone seen this before? Any debugging tips?
>
> Yes, I've even seen machines where there was no CPU activity and load
> average kept steadily climbing (going up and down but average climbing).
>
> Other, much more loaded systems remained with small load value.
>
> Except using a different kernel version or playing with scheduler settings
> I don't see very much you could do... (but remember that load alone is not
> a very good indicator, at best it's a hint to look at the other values)
>
> Possibly multiple processes/threads get blocking each-other in some mutex
> in their syscalls because they want to do their job at same time.
>

Ok. I raised this because I've not seen it before, though we have
collectd on quite a few machines, some of which are running the same
hardware & kernel.

Thanks again,
Vince

_______________________________________________
collectd mailing list
[email protected]
http://mailman.verplant.org/listinfo/collectd
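
One way to test the thread-scheduling explanation discussed above is to look at per-thread states directly, since the Linux load average counts tasks that are runnable (R) or in uninterruptible sleep (D) at the moment the kernel samples, not CPU time as shown by 'top'. The following is only a minimal sketch, assuming a Linux /proc layout with /proc/<pid>/task/<tid>/stat; the function names (parse_state, thread_states, load_contributors) and the one-second polling interval are made up for illustration and are not part of collectd.

```python
import os
import sys
import time
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter task state from a /proc/<pid>/task/<tid>/stat line.

    The comm field is wrapped in parentheses and may itself contain spaces
    or parentheses, so split on the last ')' rather than on whitespace.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def thread_states(pid, proc_root="/proc"):
    """Count the threads of `pid` by state letter (R, S, D, ...)."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    counts = Counter()
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as fh:
                counts[parse_state(fh.read())] += 1
        except OSError:
            # The thread exited between listdir() and open(); skip it.
            continue
    return counts


def load_contributors(counts):
    """Number of threads in R or D state, i.e. those that count towards load."""
    return counts.get("R", 0) + counts.get("D", 0)


if __name__ == "__main__":
    # Hypothetical usage: python threadstates.py <pid of collectd>
    pid = int(sys.argv[1])
    while True:
        counts = thread_states(pid)
        with open("/proc/loadavg") as fh:
            load1 = fh.read().split()[0]  # 1-minute load average
        print(time.strftime("%H:%M:%S"), load1,
              load_contributors(counts), dict(counts))
        time.sleep(1)
```

If several collectd threads show up in R or D at the same instants while %CPU stays near zero, that would be consistent with the scheduling-artifact explanation; if not, the cause probably lies elsewhere. A one-second poll can easily miss short wakeups, so an empty result does not rule the explanation out.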
