[Nagios-users] Nagios Optimization on CentOS
Hello. I've installed Nagios Core 3.3.1, and can't get it to run very fast at all.

Machine specs:
OS: CentOS 5.7
Processor: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (8 cores)
RAM: 8G
HD: 800G, 22G used

Going by top, the load average of the machine hovers around 1.5-2.0, and CPU usage is around 12% across all cores. Memory usage shows about 7.5G being used for buffers, so memory is actually pretty unused too. This seems incredibly underused to me, because we have 6000 hosts we're pinging, and one full pass of that check across all of them takes around 6-7 minutes. Considering the lack of load on this box, I'm pretty sure we can improve the total time significantly.

We are aiming for distributed monitoring; we're just not there yet. I figure it will take another week or two before I'm comfortable implementing it.

I'm also attaching the main nagios.cfg file. If more information is needed, please let me know.

=nagios.cfg=
accept_passive_service_checks=1
admin_email=nagios
admin_pager=pagenagios
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
cfg_file=/usr/local/nagios/etc/checkcommands.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/hostgroups_auto.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/hostgroups_network_auto.cfg
cfg_file=/usr/local/nagios/etc/hosts_auto.cfg
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/hosts_nrpe.cfg
cfg_file=/usr/local/nagios/etc/hosts_routers_auto.cfg
cfg_file=/usr/local/nagios/etc/hosts_switches_auto.cfg
cfg_file=/usr/local/nagios/etc/misccommands.cfg
cfg_file=/usr/local/nagios/etc/nrpe_auto.cfg
cfg_file=/usr/local/nagios/etc/services_auto.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
cfg_file=/usr/local/nagios/etc/services_cisco.cfg
cfg_file=/usr/local/nagios/etc/services_manual.cfg
cfg_file=/usr/local/nagios/etc/services_nrpe.cfg
cfg_file=/usr/local/nagios/etc/services_routers_auto.cfg
cfg_file=/usr/local/nagios/etc/services_switches_auto.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
check_external_commands=1
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_result_reaper_frequency=2
check_service_freshness=1
command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd
date_format=us
enable_embedded_perl=1
enable_event_handlers=1
enable_flap_detection=0
enable_notifications=1
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
event_handler_timeout=30
execute_host_checks=1
execute_service_checks=1
external_command_buffer_slots=4096
freshness_check_interval=60
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
host_check_timeout=30
host_inter_check_delay_method=n
host_perfdata_command=process-host-perfdata
host_perfdata_file_mode=a
host_perfdata_file_processing_command=process-host-perfdata-file
host_perfdata_file_processing_interval=15
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
host_perfdata_file=/usr/local/pnp4nagios/var/host-perfdata
illegal_macro_output_chars=`~$|'
illegal_object_name_chars=`~!$%^*|'?,()=
interval_length=60
lock_file=/usr/local/nagios/var/nagios.lock
log_archive_path=/usr/local/nagios/var/archives
log_event_handlers=1
log_external_commands=1
log_file=/usr/local/nagios/var/nagios.log
log_host_retries=1
log_initial_states=1
log_notifications=1
log_rotation_method=d
log_service_retries=1
low_host_flap_threshold=5.0
low_service_flap_threshold=5.0
max_check_result_reaper_time=10
max_concurrent_checks=0
max_host_check_spread=2
max_service_check_spread=2
nagios_group=nagios
nagios_user=nagios
notification_timeout=30
obsess_over_services=0
ocsp_timeout=5
perfdata_timeout=5
process_performance_data=1
retain_state_information=1
retention_update_interval=60
service_check_timeout=60
service_inter_check_delay_method=n
service_interleave_factor=10
service_perfdata_command=process-service-perfdata
service_perfdata_file_mode=a
service_perfdata_file_processing_command=process-service-perfdata-file
service_perfdata_file_processing_interval=15
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file=/usr/local/pnp4nagios/var/service-perfdata
sleep_time=1
state_retention_file=/usr/local/nagios/var/status.sav
status_file=/usr/local/nagios/var/status.log
status_update_interval=15
temp_file=/usr/local/nagios/var/nagios.tmp
use_agressive_host_checking=0
use_large_installation_tweaks=1
use_retained_program_state=0
use_syslog=0
Re: [Nagios-users] Nagios Optimization on CentOS
Not meaning to toot my own horn, but I did a presentation on running Nagios at larger scales that includes config examples. It's based on RHEL 5, but should apply equally to RHEL/CentOS 6: http://planet.nagios.org/archives/84-nagios-exchange/3850-daniel-wittenberg-scaling-nagios-at-a-giant-insurance-company

I hope to keep building on it based on feedback I've gotten from other people, so if you have any other experiences or issues, definitely post them here!

Dan
Re: [Nagios-users] Nagios Optimization on CentOS
On Tue, 27 Dec 2011 12:25:07 -0500, mpeder...@choopa.com wrote:

Please feel free to toot your own horn there. That's the sort of writeup I needed, and I'll be reading it in a lot of detail today.

And now I will sound ungrateful. I've applied the tips in it, and others I've found online, but I'm still slower than I should be. From what I can tell, this system should be able to execute a ping check for all 6000 servers in a minute, two at most. Right now, the entire check takes 4.5 minutes. On occasion I've managed to get the system load up to 3.3, but that's it. CPU usage has stayed close to constant. Network traffic is minimal (2 Mbit/s), and so is disk traffic (writing about 3 MB/s). Any other ideas I can use?

___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
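To put the numbers in this message side by side, here is a back-of-the-envelope sketch (not from the thread). A 4.5-minute sweep of 6000 hosts is about 22 checks per second, the original 6-7 minutes was roughly 14-17 checks per second, and a one-minute sweep needs 100 per second. Little's law then gives how many checks must be in flight at once. The 0.5 s average ping check duration is a hypothetical figure chosen for illustration, not a measurement from this system.

```python
"""Back-of-the-envelope throughput for one full ping sweep.

Assumption (not from the thread): each ping check takes `avg_check_seconds`
of wall-clock time on average, and Nagios runs checks in parallel.
"""


def checks_per_second(num_hosts: int, sweep_seconds: float) -> float:
    """Average completed checks per second for one full sweep."""
    return num_hosts / sweep_seconds


def required_concurrency(num_hosts: int, target_seconds: float,
                         avg_check_seconds: float) -> float:
    """Little's law: checks in flight = completion rate x time per check."""
    return checks_per_second(num_hosts, target_seconds) * avg_check_seconds


if __name__ == "__main__":
    hosts = 6000
    observed = checks_per_second(hosts, 4.5 * 60)  # 4.5-minute sweep
    target = checks_per_second(hosts, 60)          # one-minute goal
    print(f"observed: {observed:.1f} checks/s, target: {target:.1f} checks/s")
    # Hypothetical 0.5 s average check duration, for illustration only.
    print(f"parallel checks needed: {required_concurrency(hosts, 60, 0.5):.0f}")
```

Run as a script, this prints `observed: 22.2 checks/s, target: 100.0 checks/s` and `parallel checks needed: 50`. If the box is idle while throughput stays low, the limit is likely somewhere other than raw check execution, which is consistent with how the thread resolves.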
Re: [Nagios-users] Nagios Optimization on CentOS
Not at all, we're all here to help... What are you using for your ping check? What is the output from 'nagiostats'?

Dan
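The latency lines in the `nagiostats` report are usually the first thing to look at for a question like Dan's: in Nagios, check latency is the delay between when a check was scheduled and when it actually started. A minimal sketch for pulling those figures out, assuming the human-readable report prints them as `Active Service Latency: min / max / avg sec` (verify against your own output). The binary and config paths are assumptions based on the `/usr/local/nagios` prefix in the posted nagios.cfg.

```python
import re
import subprocess
from typing import Dict, Tuple

# Assumed layout of nagiostats' latency lines (min / max / average), e.g.
#   "Active Service Latency:   0.000 / 2.500 / 0.750 sec"
LATENCY_RE = re.compile(
    r"^\s*(Active (?:Service|Host) Latency):\s*"
    r"([\d.]+)\s*/\s*([\d.]+)\s*/\s*([\d.]+)\s*sec",
    re.MULTILINE,
)


def parse_latency(report: str) -> Dict[str, Tuple[float, float, float]]:
    """Map each latency label to its (min, max, avg) values in seconds."""
    return {
        m.group(1): (float(m.group(2)), float(m.group(3)), float(m.group(4)))
        for m in LATENCY_RE.finditer(report)
    }


def run_nagiostats(binary: str = "/usr/local/nagios/bin/nagiostats",
                   config: str = "/usr/local/nagios/etc/nagios.cfg") -> str:
    """Run nagiostats against the main config and return its text report.

    Both paths are assumptions; adjust them to your installation.
    """
    result = subprocess.run([binary, "-c", config],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `parse_latency(run_nagiostats())["Active Host Latency"][2]` would give the average active host check latency in seconds.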
Re: [Nagios-users] Nagios Optimization on CentOS
Are you using fping instead of regular ping? It is a separate package from ping and needs to be installed separately. Here is an excerpt from its man page:

fping is a ping(1) like program which uses the Internet Control Message Protocol (ICMP) echo request to determine if a target host is responding. fping differs from ping in that you can specify any number of targets on the command line, or specify a file containing the lists of targets to ping. Instead of sending to one target until it times out or replies, fping will send out a ping packet and move on to the next target in a round-robin fashion.

Good Luck,
Gregg.
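To illustrate the multi-target mode the excerpt describes, here is a minimal sketch that sends one batch of hosts to a single fping run and splits them into up and down lists. It assumes fping is on the PATH, that `-a` prints each alive target one per line exactly as it was given, and that an exit status above 1 means a real error. The helper names are hypothetical, and this runs outside Nagios's own check scheduling; within Nagios the usual route is a per-host check plugin such as check_fping.

```python
import subprocess
from typing import Iterable, List, Tuple


def build_fping_command(hosts: Iterable[str], timeout_ms: int = 500,
                        retries: int = 1) -> List[str]:
    """Build one fping invocation covering every host in `hosts`.

    -a: print targets that are alive; -r: retry limit;
    -t: initial per-target timeout in milliseconds.
    """
    return ["fping", "-a", "-r", str(retries), "-t", str(timeout_ms), *hosts]


def split_alive(hosts: List[str], fping_stdout: str) -> Tuple[List[str], List[str]]:
    """Partition hosts into (up, down) using fping -a output, one name per line."""
    alive = {line.strip() for line in fping_stdout.splitlines() if line.strip()}
    up = [h for h in hosts if h in alive]
    down = [h for h in hosts if h not in alive]
    return up, down


def sweep(hosts: List[str]) -> Tuple[List[str], List[str]]:
    """Ping all hosts round-robin in one fping run (hypothetical helper)."""
    proc = subprocess.run(build_fping_command(hosts),
                          capture_output=True, text=True)
    # Assumption: 0 = all replied, 1 = some did not; anything higher is an error.
    if proc.returncode > 1:
        raise RuntimeError(proc.stderr.strip())
    return split_alive(hosts, proc.stdout)
```

For thousands of targets, fping's `-f` option (read targets from a file) avoids building a very long command line.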
Re: [Nagios-users] Nagios Optimization on CentOS
I have written a number of blog posts about optimizing and tuning Nagios performance as well; you might find some of them useful: http://www.semintelligent.com/blog/

- Max
Re: [Nagios-users] Nagios Optimization on CentOS
On Tue, 27 Dec 2011 21:26:15 +, Daniel Wittenberg daniel.wittenberg.r...@statefarm.com wrote:
Not at all, we're all here to help... What are you using for your ping check? What is the output from 'nagiostats'?

And now I'm going to admit to feeling like a blooming idiot. As it turns out, the problem was the performance data gathering. We had three separate performance-data pieces running at once, and I didn't know it (I apologize; I started a week ago, this was my first project, and I've been learning what I was handed since then): NDO, pnp4nagios, and the perfdata options in nagios.cfg. I turned all of them off, and suddenly my system is running the checks with less than 1 second of latency, versus the 90+ seconds I was seeing before (and that was at best).

I apologize, as I feel like I've wasted a bit of everybody's time. It wasn't deliberate, and I really did go crazy on doing my research beforehand. I just didn't catch what this bottleneck was until after I'd bothered you.
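A quick way to spot this kind of overlap is to scan nagios.cfg for the directives behind the three perfdata paths named here. The sketch below is an illustration under stated assumptions: the directive list is a hand-picked subset matching the config posted at the start of the thread, not an exhaustive inventory of Nagios options, and only `#` comment lines are skipped.

```python
from typing import List, Tuple

# Hand-picked directives tied to perfdata processing (an assumption,
# not a complete list of Nagios options).
PERFDATA_KEYS = (
    "host_perfdata_command",
    "service_perfdata_command",
    "host_perfdata_file",
    "service_perfdata_file",
    "host_perfdata_file_processing_command",
    "service_perfdata_file_processing_command",
)


def parse_cfg(text: str) -> List[Tuple[str, str]]:
    """Return (key, value) pairs in file order; keys such as cfg_file repeat."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may themselves contain '='
        entries.append((key.strip(), value.strip()))
    return entries


def perfdata_overhead(text: str) -> List[str]:
    """List settings that enable NDO or perfdata processing."""
    findings = []
    for key, value in parse_cfg(text):
        if key == "broker_module" and "ndomod" in value:
            findings.append(f"NDO broker module loaded: {value}")
        elif key == "process_performance_data" and value == "1":
            findings.append("process_performance_data=1 (perfdata processing on)")
        elif key in PERFDATA_KEYS and value:
            findings.append(f"{key}={value}")
    return findings
```

Against the posted config, this flags the ndomod.o broker module, `process_performance_data=1`, and the pnp4nagios perfdata commands and files. Setting `process_performance_data=0` switches off host and service perfdata processing globally, while NDO is disabled by removing its `broker_module` line.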
Re: [Nagios-users] Nagios Optimization on CentOS
No prob! Hopefully you learned some more about performance tuning while you were at it!

Dan