On 18 November 2010 20:18, Daniel Wittenberg <daniel.wittenberg.r...@statefarm.com> wrote:
> I’m looking at minimizing the CPU impact that nagios has on our server, and
> done some of the basic performance tuning stuff, but what I see right now is
> a lot of the nagios worker procs generating a lot of CPU and curious if
> there was a way people have used to watch what those processes and threads
> were doing that might be taking the most cycles to try and reduce it?

I've just been looking at this myself. I'm a bit suspicious about the external_command_check_interval directive (see http://nagios.sourceforge.net/docs/3_0/configmain.html). If it's set to "-1" (as mine was until recently), Nagios will check for external commands as often as possible. I suspect it helps to set it to a definite interval, for example 15s, but check nagiostats to make sure your external command buffers don't fill up.

IME Nagios itself is usually quite light on CPU. It's the plugins, and how frequently they run, that affect performance the most. I always set check_interval and retry_interval as long as possible in service definitions to spread the load as much as possible.

Some plugins can be real performance hogs too, especially check_esx3.pl if you use that (I don't mean to diss it, as it's a super plugin; it just eats CPU). Run 'top' and you will probably see which plugins are the biggest hogs on your system.

ndo (the interface with MySQL, if you have that installed) can be a real performance hog as well. That's a whole other topic!

If you're using pnp4nagios for graphing performance data, consider setting it up in bulk mode, ideally on a separate server. It won't make a huge difference but might help a bit.

If it's more important to you to stop Nagios hammering your server than it is for Nagios to work right, you can use max_concurrent_checks to limit the number of checks Nagios can run at any one time. Keep an eye on your service check latency if you do that, though: if latency gets too high (more than a minute or so), you will find Nagios' usefulness diminishes quite rapidly! Personally, I think you should give Nagios a dedicated server and let it use as much CPU as it needs.

Oh, and v3.1.3 includes a fix which improves performance of the status CGIs. I'm looking forward to trying that myself next week.

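For reference, the directives mentioned above would look roughly like this in the config files. This is only a sketch: the 15s comes from the suggestion above, but every other value, the host name, the service description and the use of the generic-service template from the sample configs are placeholders I've made up for illustration, so adjust them to your own setup.

```
# nagios.cfg (main config file)

# Check for external commands every 15 seconds instead of
# "as often as possible" (-1). The "s" suffix means seconds;
# without it the value is counted in interval_length units.
external_command_check_interval=15s

# Optional cap on checks running in parallel (0 = no limit).
# 50 is an arbitrary example value; watch check latency if you set it.
max_concurrent_checks=50

# Object config: a hypothetical service with longer intervals.
# Intervals are in interval_length units (60 seconds by default).
define service{
        use                     generic-service
        host_name               example-host
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        check_interval          10
        retry_interval          2
        }
```

After changing anything, it is worth verifying the config with "nagios -v" against your nagios.cfg before restarting, and then watching nagiostats for a while to see whether the latency and command buffer numbers stay sane.
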
Ah, yes, if you have quite a few users, consider setting "refresh_rate" in cgi.cfg to a longer time. Otherwise everyone who leaves a status screen open in their browser will hit your Nagios server every 90 seconds (or whatever value it's set to on your system). If I recall correctly, I set mine to 180.

I'm not sure if any of this will help you, but hopefully it will give you an idea or two.

Cheers,
Jim