Here are my stats... I definitely have a problem if latencies are running between 5 and 10 minutes!
check_reaper_frequency was set at 10, which seems high. I am going to try 5, as used in the core Nagios guide, and see what that does. (I have also pasted two rough scripts at the bottom of this message: one for trending the nagiostats numbers and one for finding the worst-latency service in status.dat, along the lines Mark suggests.)

Nagios Stats 3.2.1
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 03-09-2010
License: GPL

CURRENT STATUS DATA
------------------------------------------------------
Status File:                            /usr/local/nagios/var/status.dat
Status File Age:                        0d 0h 0m 29s
Status File Version:                    3.2.1

Program Running Time:                   0d 0h 4m 9s
Nagios PID:                             17295
Used/High/Total Command Buffers:        0 / 0 / 4096

Total Services:                         4987
Services Checked:                       4987
Services Scheduled:                     4970
Services Actively Checked:              4987
Services Passively Checked:             0
Total Service State Change:             0.000 / 16.970 / 0.007 %
Active Service Latency:                 0.034 / 526.244 / 351.201 sec
Active Service Execution Time:          0.013 / 17.745 / 0.393 sec
Active Service State Change:            0.000 / 16.970 / 0.007 %
Active Services Last 1/5/15/60 min:     205 / 1353 / 3568 / 4970
Passive Service Latency:                0.000 / 0.000 / 0.000 sec
Passive Service State Change:           0.000 / 0.000 / 0.000 %
Passive Services Last 1/5/15/60 min:    0 / 0 / 0 / 0
Services Ok/Warn/Unk/Crit:              4969 / 11 / 1 / 6
Services Flapping:                      0
Services In Downtime:                   0

Total Hosts:                            241
Hosts Checked:                          241
Hosts Scheduled:                        241
Hosts Actively Checked:                 241
Hosts Passively Checked:                0
Total Host State Change:                0.000 / 0.000 / 0.000 %
Active Host Latency:                    0.000 / 487.501 / 216.928 sec
Active Host Execution Time:             0.149 / 4.310 / 3.780 sec
Active Host State Change:               0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min:        38 / 131 / 199 / 241
Passive Host Latency:                   0.000 / 0.000 / 0.000 sec
Passive Host State Change:              0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min:       0 / 0 / 0 / 0
Hosts Up/Down/Unreach:                  241 / 0 / 0
Hosts Flapping:                         0
Hosts In Downtime:                      0

Active Host Checks Last 1/5/15 min:     49 / 135 / 135
   Scheduled:                           48 / 131 / 131
   On-demand:                           1 / 4 / 4
   Parallel:                            48 / 131 / 131
   Serial:                              0 / 0 / 0
   Cached:                              1 / 4 / 4
Passive Host Checks Last 1/5/15 min:    0 / 0 / 0
Active Service Checks Last 1/5/15 min:  313 / 1353 / 1353
   Scheduled:                           313 / 1353 / 1353
   On-demand:                           0 / 0 / 0
   Cached:                              0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 0 / 0 / 0
External Commands Last 1/5/15 min:      0 / 0 / 0

On Oct 22, 2010, at 6:53 PM, Frost, Mark {PBC} wrote:

> Matthew,
>
> You don't say, but my guess would be that you have high latencies. That is,
> for one of several reasons, Nagios is not able to run checks when it thinks
> it should. You can see this information and other stats by looking at the
> Performance item near the bottom of the Nav pane in the Nagios web interface.
>
> You can also run, if memory serves, the "nagiostats" command located in your
> Nagios "bin" directory to see this information as well. I actually use that
> nagiostats data in a custom check and graph a lot of those latencies and
> other Nagios performance-related info.
>
> From my own experience, I found that I did not pay attention to this
> information when I started using Nagios, then read about it, made a few
> tweaks to make it better, then forgot about it. Then as our installation
> grew and grew, I found that some things got worse again and I had to
> consider different tuning options.
>
> I would recommend that you first read the "Tuning Nagios For Maximum
> Performance" section of the docs:
>
> http://nagios.sourceforge.net/docs/3_0/tuning.html
>
> If nothing else, this will give you an idea of some things that can affect
> latencies.
> Additionally, you may find that your average latencies look reasonable, but
> then see something with a whopping huge max latency. It can be hard to track
> down what that is in the UI. I've just looked up that max latency and then
> quickly looked in the status.dat file to find the service that had that same
> matching latency and dug into that. You could, for example, have a few
> checks that aren't really timing out, so a check may take 10 minutes or more
> to complete, which would really screw up your overall latencies. In other
> words, those checks wouldn't have finished before the next time they were
> supposed to run.
>
> Mark
>
> ________________________________________
> From: Litwin, Matthew [mlit...@stubhub.com]
> Sent: Friday, October 22, 2010 8:29 PM
> To: nagios-users@lists.sourceforge.net
> Subject: [Nagios-users] Scheduled checks falling far behind
>
> I have been chasing my tail trying to figure out why my RRD files were very
> sparsely populated, and I am realizing that my checks are falling behind
> their scheduled times by up to 3 times their set check interval, for example
> on a service that should be checked every 5 minutes. In the example below,
> the current time is 00:19:02, the last check was at 00:10:30, and the next
> scheduled check time is 00:13:28. That means it is almost 6 minutes behind
> schedule and almost 9 minutes since the last check!
>
> I find that even if I shorten the check interval to, say, 3 minutes, it still
> behaves about the same. The server has very low load and Nagios is hardly
> working at all (usually below 4% CPU). I haven't touched any of the tuning
> on this, and from what I have read the default settings appear unthrottled.
> Is there any way to make it "work harder"?
>
> --Service Information--
> Last Updated: Sat Oct 23 00:19:02 UTC 2010
>
> --Service State Information--
> Current Status: OK (for 7d 16h 14m 46s)
> Status Information: CPU STATISTICS OK : user=0.12% system=0.00% iowait=0.00% idle=99.88%
> Performance Data: 0.12;0.00;0.00;99.88;80;90
> Current Attempt: 1/3 (HARD state)
> >>> Last Check Time: 10-23-2010 00:10:30 <<<
> Check Type: ACTIVE
> Check Latency / Duration: 612.633 / 2.052 seconds
> >>> Next Scheduled Check: 10-23-2010 00:13:28 <<<
> Last State Change: 10-15-2010 08:04:16
> Last Notification: N/A (notification 0)
> Is This Service Flapping? NO (0.00% state change)
> In Scheduled Downtime? NO
> Last Update: 10-23-2010 00:18:33 (0d 0h 0m 29s ago)
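Mark, since you mentioned hanging a custom check and graphs off the nagiostats data, here is the rough wrapper I am putting together to trend the latency numbers. The --mrtg/--data flags and the AVGACTSVCLAT/MAXACTSVCLAT/AVGACTSVCEXT variable names are what I pulled out of the MRTG section of the docs from memory, so anyone copying this should verify them against their own "nagiostats --help" output before trusting it.

#!/usr/bin/env python
# Rough wrapper around nagiostats for trending latency numbers over time.
# The --mrtg/--data flags and the variable names below come from the MRTG
# section of the Nagios 3.x docs; verify them against "nagiostats --help"
# on your own build before relying on this.
import subprocess

NAGIOSTATS = "/usr/local/nagios/bin/nagiostats"   # adjust to your install
VARS = ["AVGACTSVCLAT",    # average active service check latency
        "MAXACTSVCLAT",    # worst-case active service check latency
        "AVGACTSVCEXT"]    # average active service check execution time

def collect():
    """Run nagiostats in MRTG mode and return {variable name: value}."""
    out = subprocess.check_output(
        [NAGIOSTATS, "--mrtg", "--data=" + ",".join(VARS)]).decode()
    # MRTG mode prints one value per requested variable, in request order.
    return dict(zip(VARS, (float(v) for v in out.split())))

if __name__ == "__main__":
    # One "name=value" pair per line, ready to feed a grapher or a custom
    # check like the one Mark describes.
    for name, value in sorted(collect().items()):
        print("%s=%s" % (name, value))

If those numbers line up with what the Performance page shows, I'll cron it and graph the output alongside everything else.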
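And for putting a host/service name on that whopping max latency, something like the quick status.dat scan below should save matching latencies by hand in the UI. The field names (check_latency, host_name, service_description) are what I see in my 3.2.1 status.dat, and the path should be whatever status_file points at in nagios.cfg, so adjust both if your layout differs.

#!/usr/bin/env python
# List the services with the worst check latency straight out of status.dat,
# so the max-latency outlier can be identified by host/service name.
# Field names match a Nagios 3.2.1 status.dat; point STATUS_FILE at the
# status_file setting from nagios.cfg.
STATUS_FILE = "/usr/local/nagios/var/status.dat"
TOP_N = 10

def worst_latencies(path, top):
    """Return (latency, host, service) for the 'top' slowest service checks."""
    results, block, in_service = [], {}, False
    for raw in open(path):
        line = raw.strip()
        if line.startswith("servicestatus {"):
            # Start of a new service block: collect its key=value fields.
            in_service, block = True, {}
        elif in_service and line == "}":
            # End of the block: record its latency along with host/service.
            in_service = False
            try:
                results.append((float(block.get("check_latency", "0")),
                                block.get("host_name", "?"),
                                block.get("service_description", "?")))
            except ValueError:
                pass
        elif in_service and "=" in line:
            key, _, value = line.partition("=")
            block[key] = value
    return sorted(results, reverse=True)[:top]

if __name__ == "__main__":
    for latency, host, service in worst_latencies(STATUS_FILE, TOP_N):
        print("%10.3f sec  %s / %s" % (latency, host, service))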