Hi!

On Wed, 30 Aug 2006, Marc Powell wrote:

> > Active Service Checks:
> > <= 1 minute: 81 (4.6%)
> > <= 5 minutes: 1719 (97.4%)
> > <= 15 minutes: 1727 (97.9%)
> > <= 1 hour: 1727 (97.9%)
> > Since program start: 1727 (97.9%)
>
> This seems mostly normal for a 5 minute check_interval. The small
> difference between the 5 and 15 minute counts is normal as checks may be
> just starting to execute or still in progress at the 5 minute mark. It
> does appear that you have some number of services that are not scheduled
> for execution or are executing at really long intervals. Look at Service
> Detail and sort by last check. Re-examine your configuration for those
> services that do not appear to be scheduled properly.

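As a rough cross-check of the figures quoted above, the percentages can be turned back into absolute numbers. This is only a minimal sketch: it assumes the percentages are taken relative to all configured services (an assumption, not something the statistics page states here), and the names `implied_total`, `total_services` and `never_checked` are made up for illustration.

```python
# Figures copied from the "Active Service Checks" table quoted above.
checked_since_start = 1727
pct_since_start = 97.9  # percent as displayed, rounded to one decimal

def implied_total(count, percent):
    """Estimate the total that `count` is `percent` percent of."""
    return round(count * 100.0 / percent)

# Assumption: the percentage is relative to all configured services.
total_services = implied_total(checked_since_start, pct_since_start)
never_checked = total_services - checked_since_start

print(total_services, never_checked)  # prints: 1764 37
```

Under that assumption the missing 2.1% corresponds to roughly 37 services that have not been actively checked since program start.
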
I have a few services that are disabled entirely (no active checks, no passive checks accepted). Would they count in the above statistic? They would account for the missing 2.1% (100 - 97.9).

Also, I saw a few checks that were last run about 20 minutes ago. Those are log checks via NRPE that complete in under 1 s (no noticeable delay) when run directly on the machine (as user nagios, of course). That seems acceptable (and I know neither why it would take 20 minutes nor how to find out), so I'm willing to let it slide ;).

> Looks pretty good to me. The high max check latency number may have been
> a one-off event. If that number regularly changes and is always very
> high then you might want to verify that you're not starving nagios for
> check by running /path/to/nagios/bin/nagios -s
> /path/to/nagios/etc/nagios and make sure you meet or exceed it's
> recommended values.

I guessed as much for the one-off event. It doesn't change, so I feel somewhat safe. As for the recommended values (-s), Nagios says it's okay the way it is.

> > Active Hosts Checks:
> > <= 1 minute: 0 (0.0%)
> > <= 5 minutes: 3 (1.2%)
> > <= 15 minutes: 3 (1.2%)
> > <= 1 hour: 4 (1.6%)
> > Since program start: 27 (10.8%)
> >
> > and
> >
> > Check Execution Time: 0.02 sec 10.05 sec 0.208 sec
> > Check Latency: 0.00 sec 17.48 sec 0.204 sec
> > Percent State Change: 0.00% 0.00% 0.00%
>
> These look normal and expected. You've had 27 service failures since
> program start necessitating host checks. That is in line with what I'd expect.

> > Am I the only one seeing a discrepancy here?
>
> The only discrepancy I see is likely due to configuration. You probably
> have check intervals or timeperiods misconfigured for ~30 services.

About that number of services are disabled entirely right now, so if they count toward the statistic, that explains the figures.

> > The only way I can make sense of this is that the "<= 15 minutes"
> > means "time from being scheduled to actually starting the
> > plugin".
> > In that case I wonder what makes it take so long, the
>
> Check Latency is that number. On average nagios is able to run your
> checks within 3.043 seconds of when they are scheduled to run. The
> number you are referring to is just a simple count of the number of
> plugins that have been run in that time interval.

So it means "in the last N minutes, this many services completed" and *not* "this many services needed N minutes to complete (from being started to delivering the retval)"? That would be an eye opener for me :)

Regards & Thanks,
Tobias

-- 
You don't need eyes to see, you need vision.