Andreas,

I have changed my configuration a little, and I can confirm that the CPU load is NOT coming from the servicedependency but only from the big servicegroup definition. I removed the servicedependency definition from the configuration, keeping only the servicegroup definition and its association, and the CPU load is huge. When I remove the servicegroup, the CPU load goes back to normal.

Regards,
Vincent Alloo
TI France Design Systems Operations Manager Europe and Middle East
IT Services
Texas Instruments France
E-Mail: [EMAIL PROTECTED]
Phone: +33 4 93 22 26 97
Mobile: +33 6 82 13 00 80

-----Original Message-----
From: Andreas Ericsson [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 20, 2008 10:44 AM
To: Alloo, Vincent
Cc: [email protected]
Subject: Re: [Nagios-users] Nagios 3.0.4 performance issue

Alloo, Vincent wrote:
> Andreas,
> Here is an extract of my setup:
>
> define servicegroup{
>     servicegroup_name   nrpe_services
>     alias               NRPE Services
> }
>
> define servicedependency{
>     host_name                       svxnagios02
>     service_description             check_uname
>     dependent_servicegroup_name     nrpe_services
>     notification_failure_criteria   w,u,c
> }
>
> define service {
>     use                   unix_24_7
>     host_name             svxnagios02
>     service_description   check_uname
>     check_command         check_nrpe_ssl!uname!0
>     notification_options  c,r
>     process_perf_data     0
> }
>
> And a bunch of:
> define service {
>     use                   unix_24_7
>     hostgroup_name        sol-servers,linux-servers,sol-zone-servers,sol-servers-with_hotspare
>     service_description   CPU load
>     check_command         check_nrpe_ssl!check_load!5,4,3!6,5,4
>     servicegroups         nrpe_services
> }
> .....(3600 services within the nrpe_services service group)
>

Oh. Are you proxying all your NRPE checks through some other system? I can't imagine why this would be a good idea, but to each his own, I suppose.

With this configuration, each of the 3600 services should depend on exactly one other service, so the problem I initially foresaw does not apply. However, as Sascha mentioned, Nagios instead seems to run that extra check before any of the other 3600 service checks. I'll need to do some manual testing on this.

Since you've only specified "notification_failure_criteria", Nagios should be able to avoid checking the service being depended on until it is about to send a notification.
In fact, it should probably switch the checking order around so that the service being depended upon is checked *after* the dependent service. That would solve your problem until NRPE starts failing. After that, there's no help for it, but then you should definitely see some service check cache hits, which will at least make the load on the system bearable.

I'll try to find some time to look into this next week at the latest.

--
Andreas Ericsson                   [EMAIL PROTECTED]
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
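For reference, the two mechanisms discussed in this thread (extra checks of the depended-upon service, and reuse of recent check results as "cache hits") are controlled in Nagios 3 by the predictive dependency check and cached check options in the main configuration file. The fragment below is a sketch for experimentation only, assuming a stock Nagios 3.x nagios.cfg; the horizon value of 60 seconds is an illustrative assumption rather than a recommendation from this thread, and it has not been verified whether either setting changes the load reported here.

```
# nagios.cfg (Nagios 3.x); illustrative values, not tested against this setup

# Maximum age, in seconds, at which a previous service check result is still
# considered current and may be reused instead of running a new check.
# This is the cache behind "service check cache hits"; 0 disables caching.
# The value 60 is an assumption chosen for illustration only.
cached_service_check_horizon=60

# Whether Nagios may run on-demand (predictive) checks of the services that
# a service depends on. 1 is the sample-config default; setting it to 0 is
# a hypothetical experiment for the extra dependency checks described above.
enable_predictive_service_dependency_checks=1
```

Nagios must be restarted or reloaded for changes to nagios.cfg to take effect, so any comparison of CPU load should be made after the reload.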
