The trick is to be selective about what you actually check. You probably 
don't want to run 5000 checks every five minutes, but you only need one 
check per server, or a few at most, to tell you whether whatever you are 
monitoring is up; that should be enough for your SLA. Make sure that check is 
computationally inexpensive, and you can safely run it once per minute.
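
To put rough numbers on that, here is a small back-of-the-envelope sketch. The 5000 checks and the five-minute and one-minute intervals come from this thread; the figure of 100 servers is purely a made-up example, so substitute your own count.

```python
def checks_per_second(num_checks: int, interval_seconds: float) -> float:
    """Average number of checks the Nagios server has to start each second."""
    return num_checks / interval_seconds


# Every service check, every five minutes (numbers from this thread).
full_sweep = checks_per_second(5000, 5 * 60)

# One cheap "is it up?" check per server, once per minute.
# ASSUMPTION: 100 servers is a hypothetical count, not from the thread.
sla_only = checks_per_second(100, 60)

print(f"all 5000 checks / 5 min : {full_sweep:.2f} checks per second")
print(f"100 SLA checks / 1 min  : {sla_only:.2f} checks per second")
```

The point is only that a handful of one-minute checks adds little on top of whatever the full set already costs.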

For instance, for a Web site, check_http is a fairly inexpensive check, 
depending on the options you use.
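
If you would rather see what such a cheap check boils down to, here is a minimal sketch in Python. It is not check_http and does not reproduce its options; it only sends a HEAD request and maps the result onto the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). The names classify and check_url and the time thresholds are my own illustrative choices.

```python
import time
import urllib.error
import urllib.request

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def classify(status, elapsed, warn_s=1.0, crit_s=5.0):
    """Map an HTTP status (None = no response) and response time to a state.

    ASSUMPTION: the thresholds and the 4xx/5xx mapping are illustrative.
    """
    if status is None:
        return CRITICAL, "HTTP CRITICAL - no response"
    if status >= 500 or elapsed >= crit_s:
        return CRITICAL, f"HTTP CRITICAL - status {status}, {elapsed:.3f}s"
    if status >= 400 or elapsed >= warn_s:
        return WARNING, f"HTTP WARNING - status {status}, {elapsed:.3f}s"
    return OK, f"HTTP OK - status {status}, {elapsed:.3f}s"


def check_url(url, timeout=10.0):
    """Send one HEAD request and classify the outcome."""
    request = urllib.request.Request(url, method="HEAD")
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # server answered, but with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        status = None  # connection refused, DNS failure, timeout, ...
    elapsed = time.monotonic() - start
    return classify(status, elapsed)
```

Used as a plugin, you would call check_url with your site's URL, print the message and exit with the returned code. A HEAD request keeps the check cheap because no page body is transferred.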

That said, you may also want to look at other tools. I haven't used it myself, 
but I hear that many people use Cacti for this type of higher-resolution 
monitoring/measuring.

A third option is to create your own agent that monitors something important - 
for instance, it could watch the Web server log files and generate an alert 
if no new entries have been added for 20 seconds, or if it sees a 500 error, 
things like that. Such an agent can submit its findings to Nagios as passive 
check results, essentially as soon as the event occurs. Drawback: if the 
server as a whole is down, such an agent wouldn't report a problem. Advantage: 
such an agent can be crafted very specifically to measure whatever parameters 
your SLA defines.
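
A rough sketch of the core of such an agent, in Python, might look like the following. It assumes the access log uses the common/combined log format (status code right after the quoted request) and that passive results are written to the Nagios external command file using PROCESS_SERVICE_CHECK_RESULT. The host name, service description and command file path in the usage note are placeholders, and the function names are my own.

```python
import re
import time

OK, CRITICAL = 0, 2  # Nagios state codes used by this sketch

# ASSUMPTION: common/combined log format, e.g.
#   1.2.3.4 - - [date] "GET / HTTP/1.1" 500 1234
STATUS_RE = re.compile(r'" (\d{3}) ')


def evaluate(new_lines, idle_seconds, max_idle=20.0):
    """Decide the service state from new log lines and time since last entry."""
    for line in new_lines:
        match = STATUS_RE.search(line)
        if match and match.group(1) == "500":
            return CRITICAL, "HTTP 500 seen in access log"
    if idle_seconds > max_idle:
        return CRITICAL, f"no new log entries for {idle_seconds:.0f}s"
    return OK, "log is active, no 500 errors"


def format_passive_result(host, service, code, output, now=None):
    """Build one external-command line for a passive service check result."""
    timestamp = int(time.time() if now is None else now)
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{code};{output}\n")


def submit(command_file, line):
    """Append the command to the Nagios external command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

The agent's main loop would read whatever lines were appended to the log since the last pass, call evaluate, and hand the result to submit, for example with host "web01", service "HTTP log" and the path set by command_file in your nagios.cfg. The service has to exist in the Nagios configuration and accept passive checks. Enabling freshness checking on it is one way to soften the drawback above, since Nagios can then flag the service when the agent goes silent.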

-----Original Message-----
From: Breandan Dezendorf [mailto:brean...@dezendorf.com] 
Sent: Thursday, February 10, 2011 6:50 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Uptime Calculation Question

On Thu, Feb 10, 2011 at 9:12 PM, Yueh-Hung Liu <yuehung....@gmail.com> wrote:
> nothing will be known without checking.
> you want more precise data you have to do more checks, that is, 
> decrease the "check_interval" value.

And the lower you set the check_interval, the harder the servers have to work 
to keep up with all the checks.  While the servers we are running could very 
well run all 5000 service checks every 5 minutes (or even faster), it would 
chew up a lot of our growth capacity for the server.

--
Breandan Dezendorf
brean...@dezendorf.com
bwdez...@gmail.com

------------------------------------------------------------------------------
The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:
Pinpoint memory and threading errors before they happen.
Find and fix more than 250 security defects in the development cycle.
Locate bottlenecks in serial and parallel code that limit performance.
http://p.sf.net/sfu/intel-dev2devfeb
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null
