Re: [Nagios-users] Nagios 3.0.4 performance issue

Alloo, Vincent Thu, 20 Nov 2008 02:42:56 -0800

Andreas,
I have changed my configuration slightly, and I can confirm that the CPU load is 
NOT coming from the servicedependency but only from the big servicegroup definition.
I removed the servicedependency definition from the configuration, keeping only the 
servicegroup definition and association, and the CPU load is still huge. After 
removing the servicegroup, the CPU load is back to normal.
Regards,

Vincent Alloo
TI France Design Systems Operations Manager
Europe and Middle East IT Services
Texas Instruments France

E-Mail: [EMAIL PROTECTED]
Phone: +33 4 93 22 26 97
Mobile: +33 6 82 13 00 80

-----Original Message-----
From: Andreas Ericsson [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 20, 2008 10:44 AM
To: Alloo, Vincent
Cc: [email protected]
Subject: Re: [Nagios-users] Nagios 3.0.4 performance issue

Alloo, Vincent wrote:
> Andreas,
> Here is an extract of my setup:
> 
> define servicegroup{
>       servicegroup_name       nrpe_services
>       alias                   NRPE Services
> }
> 
> define servicedependency{
>       host_name                       svxnagios02
>       service_description             check_uname
>       dependent_servicegroup_name     nrpe_services
>     notification_failure_criteria     w,u,c
> }
> 
> define service {
> use                            unix_24_7
> host_name                      svxnagios02
> service_description            check_uname
> check_command                  check_nrpe_ssl!uname!0
> notification_options           c,r
> process_perf_data            0
> }
> 
> And a bunch of:
> define service {
> use                           unix_24_7
> hostgroup_name                sol-servers,linux-servers,sol-zone-servers,sol-servers-with_hotspare
> service_description           CPU load
> check_command                 check_nrpe_ssl!check_load!5,4,3!6,5,4
> servicegroups                 nrpe_services
> }
> .....(3600 services within the nrpe_services service group)
> 

Oh. Are you proxying all your NRPE checks through some other system? I
can't imagine why this would be a good idea, but to each his own, I suppose.

With this configuration, each of the 3600 services should depend on exactly
one other service, so the problem I initially foresaw does not apply.
However, as Sascha mentioned, Nagios instead seems to run that extra check
before any of the other 3600 service checks.

I'll need to run some manual testing on this. Since you've only specified
"notification_failure_criteria", Nagios should be able to avoid checking
the service being depended on until it's trying to send a notification. In
fact, it should probably switch the checking order around so that the service
being depended upon is checked *after* the dependent service. That would
solve your problem until NRPE starts failing. After that, there's no help
for it, but then you should definitely see some service check cache hits
which will at least make the load on the system bearable. I'll try to find
some time to look into this next week at the latest.
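
For reference, a minimal sketch of the settings this touches, assuming a stock
Nagios 3.x setup. The directive names are the standard ones from the Nagios 3
documentation; the values are illustrative only and have not been tested
against the configuration quoted above.

```
# Object config: the same dependency as in the quoted extract, with the
# execution criteria spelled out. "n" (none) means the dependency never
# blocks checks of the dependent services, so it only matters when a
# notification is about to be sent.
define servicedependency{
      host_name                       svxnagios02
      service_description             check_uname
      dependent_servicegroup_name     nrpe_services
      execution_failure_criteria      n
      notification_failure_criteria   w,u,c
}

# nagios.cfg: let dependency logic reuse a recent result for the service
# being depended upon instead of launching a fresh check every time.
# The horizon is in seconds; 15 is an example value, not a recommendation.
enable_predictive_service_dependency_checks=1
cached_service_check_horizon=15
```

Whether spelling out execution_failure_criteria actually changes the order in
which Nagios runs the checks is an assumption here, and is the part that still
needs manual testing.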

-- 
Andreas Ericsson                   [EMAIL PROTECTED]
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

