On Oct 24, 2010, at 3:02 PM, Andreas Ericsson wrote:

> On 10/24/2010 10:14 PM, Litwin, Matthew wrote:
>> Hi Matthieu (and anyone else who might want to throw their hat into
>> the ring):
>>
>
> I'll chip in. Your MUA seems to not wrap lines at all though, which
> makes replying inline a bit tricky.

Sorry. Blame Apple. :-)

> Note that you should wipe your status.sav files between restarts to
> not let old latency affect the numbers you're seeing.

I don't seem to have them on my system.

> What system are you running this on? Nagios has been known to have
> issues with older non-linux systems where thread libraries aren't
> as forgiving as the nptl library shipped with glibc. Also, Nagios
> should never run as a virtual guest.

It is an 8-core x86 server running CentOS 5.3.

> As for the check_result_reaper_frequency things, we ship those unset
> so they take the Nagios defaults. We used to have it at 2. I'm unsure
> if removing the setting was a conscious choice or just by accident.

I will give it a try, thanks.

> In general, you should keep your performance-data and checkresult
> files on ramdisks. That will help prevent IO from becoming a
> bottleneck.

IO wait on the server is 1% on average, so I doubt that is the problem, but it is certainly worth investigating.

>> So after identifying that I have latency times that are around
>> 500-600 seconds I have tried the tuning tips from the nagios docs
>> and fiddled with them; however, while latency drops briefly after a
>> restart, it then just comes back up to the high levels again. At
>> this point I have only been working with
>> check_result_reaper_frequency and max_check_result_reaper_time by
>> doubling and halving them from their default values.
>> max_concurrent_checks remains at 0. Load on the server is very low.
>> The machine is an 8-core machine so I really wish I could make
>> better use of it. Load is a measly 1.5 on average. Finally, I tried
>> enable_environment_macros = 0 which actually made it
>> worse, once things quiesced after startup.
>> use_large_installation_tweaks=1 did improve the latency by maybe 30%
>> and I did actually start seeing RRD data come in solid for about 15
>> minutes, but then it returned to being sparse again. So while it is
>> a modest improvement, it still doesn't fill the RRDs with enough
>> data to be useful.
>>
>> Any other tuning suggestions? I think I have done everything in the
>> performance tweaks section that seems relevant, including all of
>> those that have been suggested here.
>>
>
> Make sure you haven't got "parallelize_check" set to 0 anywhere. That
> will make Nagios try to run the checks one at a time, which obviously
> doesn't work too well. If that's the case, you should have a latency
> that corresponds to the number of checks you're running times the
> average check execution time minus the normal check-interval.
>
> In other words: if you've got 900 checks in total, the average check
> execution time is 1 second and you plan to run all checks in a 5 minute
> interval (300 secs), you should get a latency of roughly 600 seconds.
>
> If you've got it set for a few checks, Nagios will still fail to run
> any other checks during the time the unparallelizable check runs,
> but it doesn't check if such checks are scheduled at the same time as
> other checks when it schedules them, so latency will always be a bit
> higher when not all checks are run in parallel.
>
>> In summary, I am looking for some way to make nagios "do more" with
>> the system resources as the host is barely working at all. I really
>> wish there was some way to just make nagios do things more in
>> parallel for cases where a system has plenty of horsepower and RAM.
>> If I have to resort to compiling things with different settings I
>> would be open to trying it, but I just feel like I am grasping at
>> straws now.
>>
>
> Are you using any eventbroker modules? If so, which ones and what
> happens when you disable them?

Not that I know of.

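For reference, the serialized-check estimate quoted above (900 checks, 1 second each, 300 second interval) can be reproduced with a small Python sketch. The function name is hypothetical and the clamp at zero is an added assumption; this only restates the rule of thumb, it is not how Nagios itself computes latency.

```python
def serialized_latency(num_checks, avg_exec_seconds, interval_seconds):
    """Rough latency estimate when checks run one at a time.

    Rule of thumb from the quoted reply: total time needed to run every
    check back to back, minus the interval they were meant to fit in.
    Clamping at zero is an assumption added here for the case where
    everything fits inside the interval.
    """
    time_to_run_all = num_checks * avg_exec_seconds
    return max(0.0, time_to_run_all - interval_seconds)


# The numbers from the quoted example: 900 checks, 1 s each, 5 minute interval.
print(serialized_latency(900, 1.0, 300))  # 600.0
```

A result in the 500-600 second range from real numbers would be consistent with checks being run serially rather than in parallel.
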
> What happens when you disable performance-data parsing and writing?

Actually, that is what I am trying to get working properly. My RRD data files are sparse as a result.

> Is the system running as a virtual guest?

No, it is a physical server.

> Do you have any checks with a check_interval that differs wildly
> from the average check_interval?

All of my check_interval settings are 5, with a few that are a little bit less. I am running 3.2.1. The documentation suggests I set the check_interval for hosts to 0. Is that appropriate?

> A while back there was a bug
> that caused Nagios to spread the first service-check in a window
> as big as the largest check_interval. Once all checks had been
> executed, latency slowly normalized again. This doesn't seem to
> match what you're describing, but it could be a similar bug
> somewhere else. Using the same check_interval for all hosts and
> services should tell if that's the case.
>
> --
> Andreas Ericsson                   andreas.erics...@op5.se
> OP5 AB                             www.op5.se
> Tel: +46 8-230225                  Fax: +46 8-230231
>
> Considering the successes of the wars on alcohol, poverty, drugs and
> terror, I think we should give some serious thought to declaring war
> on peace.
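
To act on the suggestion about check_interval outliers, one option is to tally the values that appear in the object configuration. The Python sketch below assumes definitions use plain `check_interval <number>` lines, optionally followed by a `;` comment; the function name and the sample definitions are hypothetical, not taken from the configuration discussed in this thread.

```python
import re
from collections import Counter

# Matches an object-definition line such as "    check_interval   5".
# Anchored at the start so that other directives are not picked up.
_INTERVAL_RE = re.compile(r"^\s*check_interval\s+(\d+(?:\.\d+)?)\s*(?:;.*)?$")


def interval_histogram(config_text):
    """Count how many definitions use each check_interval value."""
    counts = Counter()
    for line in config_text.splitlines():
        match = _INTERVAL_RE.match(line)
        if match:
            counts[float(match.group(1))] += 1
    return counts


# Hypothetical sample; real input would be the contents of the object files.
sample = """
define service {
    service_description  PING
    check_interval       5
}
define service {
    service_description  Nightly backup
    check_interval       1440   ; hypothetical outlier
}
define service {
    service_description  Disk
    check_interval       5
}
"""

for interval, count in sorted(interval_histogram(sample).items()):
    print(f"check_interval {interval:g}: {count} definition(s)")
```

Any value far from the bulk of the distribution is a candidate for the scheduling-window behaviour described in the quoted reply.
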