On 10/25/2010 01:19 AM, Litwin, Matthew wrote:
> On Oct 24, 2010, at 3:02 PM, Andreas Ericsson wrote:
>>
>> Note that you should wipe your status.sav files between restarts to
>> not let old latency affect the numbers you're seeing.
>
> I don't seem to have them on my system.

Perhaps you haven't enabled state retention then. If you have, there
should be something of the kind in nagios' /var directory. grep your
nagios.cfg for retent and see what comes up.
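For instance (assuming the default source-install layout under
/usr/local/nagios; adjust the path to wherever your nagios.cfg actually
lives), you'd typically see something like:

  $ grep retent /usr/local/nagios/etc/nagios.cfg | grep -v '^#'
  state_retention_file=/usr/local/nagios/var/retention.dat
  retention_update_interval=60

The on/off switch itself is retain_state_information, which that grep
won't catch since it's spelled "retain". If it's set to 1,
state_retention_file names the file that carries state and scheduling
info across restarts, so that's the one to move aside before a test
restart.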
>>
>> What system are you running this on? Nagios has been known to have
>> issues with older non-linux systems where thread libraries aren't
>> as forgiving as the nptl library shipped with glibc. Also, Nagios
>> should never run as a virtual guest.
>
> It is an 8 core x86 server running CentOS 5.3
>

Not virtual and not what you'd call underdimensioned for the task, then.

>>
>> In general, you should keep your performance-data and checkresult
>> files on ramdisks. That will help prevent IO from becoming a
>> bottleneck.
>
> IO wait on the server is on average 1% so I doubt that is the
> problem, but certainly worth investigating.

That won't be it then.

>>
>>
>>> So after identifying that I have latency times that are around
>>> 500-600 seconds, I have tried the tuning tips from the nagios docs.
>>> However, while latency drops briefly after a restart, it just comes
>>> back up to the high levels again. At this point I have only been
>>> working with check_reaper_frequency and max_check_result_reaper_time,
>>> by doubling and halving them from their default values.
>>> max_concurrent_checks remains at 0. Load on the server is very low.
>>> The machine is an 8 core machine so I really wish I could make better
>>> use of it. Load is a measly 1.5 on average. Finally, I tried
>>> enable_environment_macros = 0, which actually made it worse once
>>> things quiesced after startup. use_large_installation_tweaks=1 did
>>> improve the latency by maybe 30%, and I did actually start seeing RRD
>>> data come in solid for about 15 minutes, but then it returned to
>>> being sparse again. So while a modest improvement, it still doesn't
>>> fill the RRDs with useful data.
>>>
>>> Any other tuning suggestions? I think I have done everything in the
>>> performance tweaks section that seems relevant, including all of
>>> those that have been suggested here.
>>>
>>
>> Make sure you haven't got "parallelize_check" set to 0 anywhere. That
>> will make Nagios try to run the checks one at a time, which obviously
>> doesn't work too well. If that's the case, you should have a latency
>> that corresponds to the number of checks you're running times the
>> average check execution time, minus the normal check interval.
>>

Since you didn't respond to this, I'll just assume you haven't got it
set to 0 for any host or service.

>>
>>> In summary, I am looking for some way to make nagios "do more" with
>>> the system resources as the host is barely working at all. I really
>>> wish there was some way to make nagios do things more in parallel
>>> for cases where a system has plenty of horsepower and RAM. If I have
>>> to resort to compiling things with different settings I would be
>>> open to trying it, but I just feel like I am grasping at straws now.
>>>
>>
>> Are you using any eventbroker modules? If so, which ones, and what
>> happens when you disable them?
>
> Not that I know of.

grep broker /path/to/nagios.cfg will tell you.

>>
>> What happens when you disable performance-data parsing and writing?
>
> Actually, that was what I am trying to get working properly. My RRD
> data files are sparse as a result.

Even so, try disabling it for a bit and see if the way performance data
is handled is causing problems. What performance-data gathering solution
are you using?
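Quickest way to test that, by the way: the master switch for perfdata
processing is process_performance_data in nagios.cfg (directive name as
in a stock 3.x config). Something like:

  # nagios.cfg: temporarily disable all performance-data processing
  process_performance_data=0

then restart nagios and watch latency for a while. The host/service
perfdata command and file directives can stay as they are; with the
master switch off they're simply not run. If latency stays high even
with this off, the perfdata pipeline isn't your bottleneck.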
>>
>> Do you have any checks with a check_interval that differs wildly
>> from the average check_interval?
>
> All of my check_interval settings are 5, with a few that are a little
> bit less.
>
> I am running 3.2.1
>
> Documentation suggests I set the check_interval for hosts to 0. Is that
> appropriate?
>

That will make Nagios only run host checks when they're needed (ie, when
a service on the host changes from OK to any other state). It's
definitely worth trying.

It could also be worth setting check_for_updates=0 in your nagios.cfg.
The update checks are high-priority events which will block checks while
they're running. It shouldn't matter, since those checks run on a 22
hour interval, but every little bit helps, I guess. (Both settings are
sketched in the PS below.)

--
Andreas Ericsson                   andreas.erics...@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war on
peace.
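PS: in case it saves you a trip to the docs, here's roughly what those
two settings look like. The template name is just an example; put
check_interval wherever your hosts actually inherit from (or on the
host definitions themselves):

  # nagios.cfg
  check_for_updates=0

  # host template (example name)
  define host{
          name                    generic-host
          register                0
          check_interval          0   ; 0 = no regularly scheduled host checks
          }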