[Nagios-users] Nagios Optimization on CentOS

2011-12-27 Thread mpedersen
Hello.

I've installed Nagios Core 3.3.1, and can't get it to run very fast at
all.

Machine specs:
OS: CentOS 5.7
Processor: Intel(R) Xeon(R) CPU   E5420  @ 2.50GHz (8 cores)
RAM: 8G
HD: 800G, 22G used

Going by top, the load average of the machine hovers around 1.5-2.0. CPU
usage is around 12% across all cores. Memory usage shows about 7.5G being
used for buffers, so memory is actually pretty unused too.

This seems incredibly underused to me because we have 6000 hosts we're
pinging, and the total time for this check is around 6-7 minutes.
Considering the lack of load on this box, I'm pretty sure we can improve
the total time significantly.

We are going to aim for distributed monitoring; we're just not there yet.
I figure it will take another week or two before I'm comfortable
implementing it.

I'm also attaching the main nagios.cfg file. If there's more information
that's needed, please let me know.


=nagios.cfg=
accept_passive_service_checks=1
admin_email=nagios
admin_pager=pagenagios
broker_module=/usr/local/nagios/bin/ndomod.o
config_file=/usr/local/nagios/etc/ndomod.cfg
cfg_file=/usr/local/nagios/etc/checkcommands.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/hostgroups_auto.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/hostgroups_network_auto.cfg
cfg_file=/usr/local/nagios/etc/hosts_auto.cfg
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/hosts_nrpe.cfg
cfg_file=/usr/local/nagios/etc/hosts_routers_auto.cfg
cfg_file=/usr/local/nagios/etc/hosts_switches_auto.cfg
cfg_file=/usr/local/nagios/etc/misccommands.cfg
cfg_file=/usr/local/nagios/etc/nrpe_auto.cfg
cfg_file=/usr/local/nagios/etc/services_auto.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
cfg_file=/usr/local/nagios/etc/services_cisco.cfg
cfg_file=/usr/local/nagios/etc/services_manual.cfg
cfg_file=/usr/local/nagios/etc/services_nrpe.cfg
cfg_file=/usr/local/nagios/etc/services_routers_auto.cfg
cfg_file=/usr/local/nagios/etc/services_switches_auto.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
check_external_commands=1
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_result_reaper_frequency=2
check_service_freshness=1
command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd
date_format=us
enable_embedded_perl=1
enable_event_handlers=1
enable_flap_detection=0
enable_notifications=1
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
event_handler_timeout=30
execute_host_checks=1
execute_service_checks=1
external_command_buffer_slots=4096
freshness_check_interval=60
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
host_check_timeout=30
host_inter_check_delay_method=n
host_perfdata_command=process-host-perfdata
host_perfdata_file_mode=a
host_perfdata_file_processing_command=process-host-perfdata-file
host_perfdata_file_processing_interval=15
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
host_perfdata_file=/usr/local/pnp4nagios/var/host-perfdata
illegal_macro_output_chars=`~$|'
illegal_object_name_chars=`~!$%^*|'?,()=
interval_length=60
lock_file=/usr/local/nagios/var/nagios.lock
log_archive_path=/usr/local/nagios/var/archives
log_event_handlers=1
log_external_commands=1
log_file=/usr/local/nagios/var/nagios.log
log_host_retries=1
log_initial_states=1
log_notifications=1
log_rotation_method=d
log_service_retries=1
low_host_flap_threshold=5.0
low_service_flap_threshold=5.0
max_check_result_reaper_time=10
max_concurrent_checks=0
max_host_check_spread=2
max_service_check_spread=2
nagios_group=nagios
nagios_user=nagios
notification_timeout=30
obsess_over_services=0
ocsp_timeout=5
perfdata_timeout=5
process_performance_data=1
retain_state_information=1
retention_update_interval=60
service_check_timeout=60
service_inter_check_delay_method=n
service_interleave_factor=10
service_perfdata_command=process-service-perfdata
service_perfdata_file_mode=a
service_perfdata_file_processing_command=process-service-perfdata-file
service_perfdata_file_processing_interval=15
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file=/usr/local/pnp4nagios/var/service-perfdata
sleep_time=1
state_retention_file=/usr/local/nagios/var/status.sav
status_file=/usr/local/nagios/var/status.log
status_update_interval=15
temp_file=/usr/local/nagios/var/nagios.tmp
use_agressive_host_checking=0
use_large_installation_tweaks=1
use_retained_program_state=0
use_syslog=0



Re: [Nagios-users] Nagios Optimization on CentOS

2011-12-27 Thread Daniel Wittenberg
Not meaning to toot my own horn, but for larger scales I did a presentation
that has config examples and stuff. It's based on RHEL 5, but should apply
to RHEL/CentOS 6 as well.

http://planet.nagios.org/archives/84-nagios-exchange/3850-daniel-wittenberg-scaling-nagios-at-a-giant-insurance-company

I hope to keep building on that based on feedback I've gotten from some other 
people so if you have any other experiences or issues definitely post them here!

Dan


-Original Message-
From: mpeder...@choopa.com [mailto:mpeder...@choopa.com] 
Sent: Tuesday, December 27, 2011 10:09 AM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] Nagios Optimization on CentOS


Re: [Nagios-users] Nagios Optimization on CentOS

2011-12-27 Thread mpedersen
On Tue, 27 Dec 2011 12:25:07 -0500, mpeder...@choopa.com wrote:
 Please feel free to toot your own horn there. That's the sort of writeup I
 needed, and I'll be reading it in a lot of detail today.

And now I will sound ungrateful. I've applied the tips in here, and still
others I've found online, but I'm still slower than I should be. From what
I can tell, this system should be able to execute a ping check for all 6000
servers in a minute, two tops. As of right now, I'm getting 4.5 minutes for
the entire check.
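As a rough sanity check on the numbers in this message, here is a small Python sketch of the throughput arithmetic. The helper names are hypothetical, and the 0.5 s average check duration used to estimate how many checks must be in flight at once (via Little's law) is an assumption for illustration, not a value measured on this system.

```python
# Hypothetical helpers; the host count and timings come from the thread
# (6000 hosts, 4.5 minutes observed, 1-2 minutes hoped for).
HOSTS = 6000


def checks_per_second(hosts: int, seconds: float) -> float:
    """Average completion rate needed to get through `hosts` checks in `seconds`."""
    return hosts / seconds


def required_concurrency(rate_per_s: float, avg_check_seconds: float) -> float:
    """Little's law: checks in flight = completion rate * mean check duration."""
    return rate_per_s * avg_check_seconds


if __name__ == "__main__":
    # ASSUMPTION: 0.5 s mean duration per ping check, purely for illustration.
    ASSUMED_AVG_CHECK_SECONDS = 0.5
    for label, secs in (("observed (4.5 min)", 270), ("target (2 min)", 120), ("target (1 min)", 60)):
        rate = checks_per_second(HOSTS, secs)
        in_flight = required_concurrency(rate, ASSUMED_AVG_CHECK_SECONDS)
        print(f"{label}: {rate:.1f} checks/s, ~{in_flight:.0f} checks in flight")
```

Covering 6000 hosts in 4.5 minutes works out to roughly 22 checks per second, while one to two minutes would need 50 to 100 checks per second.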

I've managed to get my system load to (on occasion) hit 3.3, but that's
it. CPU usage has remained close to constant. Network traffic is minimal
(2Mbit), and disk traffic is minimal (writing out 3Mbytes/second).

Any other ideas I can use?

--
Write once. Port to many.
Get the SDK and tools to simplify cross-platform app development. Create 
new or port existing apps to sell to consumers worldwide. Explore the 
Intel AppUpSM program developer opportunity. appdeveloper.intel.com/join
http://p.sf.net/sfu/intel-appdev
___
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


Re: [Nagios-users] Nagios Optimization on CentOS

2011-12-27 Thread Daniel Wittenberg
Not at all, we're all here to help...

What are you using for your ping check?  
What is the output from 'nagiostats'?

Dan

-Original Message-
From: mpeder...@choopa.com [mailto:mpeder...@choopa.com] 
Sent: Tuesday, December 27, 2011 3:21 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Nagios Optimization on CentOS



Re: [Nagios-users] Nagios Optimization on CentOS

2011-12-27 Thread Gregory Phillips
Are you using fping instead of regular ping? It is a different package from
ping that needs to be installed separately. Here is an excerpt from its man
page:

fping is a ping(1) like program which uses the Internet Control Message
   Protocol (ICMP) echo request to determine if a target host is responding.
   fping differs from ping in that you can specify any number of targets
   on the command line, or specify a file containing the lists of targets
   to ping. Instead of sending to one target until it times out or
   replies, fping will send out a ping packet and move on to the next
   target in a round-robin fashion.
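The difference the excerpt describes can be illustrated with a simplified timing model in Python. This is a sketch, not fping's actual implementation: it ignores retries and per-packet overhead, the sample round-trip times, timeout and send interval are made-up assumptions, and it only models a single fping run given many targets, not how Nagios schedules its own host checks.

```python
from typing import List, Optional

# Each entry is a round-trip time in seconds, or None for a host that never answers.
Rtts = List[Optional[float]]


def sequential_time(rtts: Rtts, timeout: float) -> float:
    """ping-style: wait for a reply (or the timeout) before moving to the next host."""
    return sum(timeout if r is None else min(r, timeout) for r in rtts)


def round_robin_time(rtts: Rtts, timeout: float, send_interval: float) -> float:
    """fping-style: send to each host in turn and collect replies as they arrive."""
    finish = 0.0
    for i, r in enumerate(rtts):
        sent_at = i * send_interval            # packets go out back to back
        wait = timeout if r is None else min(r, timeout)
        finish = max(finish, sent_at + wait)   # the run ends when the last host resolves
    return finish


if __name__ == "__main__":
    # ASSUMPTIONS: 6000 hosts, every 20th unreachable, 20 ms RTT, 1 s timeout, 1 ms between sends.
    hosts: Rtts = [None if i % 20 == 0 else 0.02 for i in range(6000)]
    print("sequential :", round(sequential_time(hosts, 1.0), 1), "s")
    print("round-robin:", round(round_robin_time(hosts, 1.0, 0.001), 1), "s")
```

With those assumed numbers, waiting on each host in turn takes about 414 s, almost all of it spent on timeouts, while the round-robin model finishes in about 7 s.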

Good Luck,

Gregg.


On Tue, Dec 27, 2011 at 2:26 PM, Daniel Wittenberg 
daniel.wittenberg.r...@statefarm.com wrote:

 Not at all, we're all here to help...

 What are you using for your ping check?
 What is the output from 'nagiostats'?

 Dan


Re: [Nagios-users] Nagios Optimization on CentOS

2011-12-27 Thread Max Schubert
I have written a number of blog posts about optimizing and tuning
Nagios performance as well - you might find some of them useful:

http://www.semintelligent.com/blog/

- Max



Re: [Nagios-users] Nagios Optimization on CentOS

2011-12-27 Thread mpedersen
On Tue, 27 Dec 2011 21:26:15 +, Daniel Wittenberg
daniel.wittenberg.r...@statefarm.com wrote:
 Not at all, we're all here to help...
 
 What are you using for your ping check?  
 What is the output from 'nagiostats'?

And now I'm going to admit to feeling like a blooming idiot.

As it turns out, the problem was the performance data gathering. We had
three separate performance-gathering pieces running at once, and I didn't
know it (I apologize; I started a week ago, this was my first project,
and I've been learning what I was handed since then).

NDO, pnp4nagios, and perfdata options in nagios.cfg.

I turned all of them off, and suddenly my system is running the checks
with less than 1s of latency (versus the 90+ s I was seeing before, and
that was at best).
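The change described above amounts to switching off two things in nagios.cfg: the NDOUtils broker module and Nagios' own performance-data processing (which, in the config posted earlier, is also what writes the host-perfdata and service-perfdata files under /usr/local/pnp4nagios/var). A minimal Python sketch of that edit follows. `disable_perfdata` is a hypothetical helper, it assumes the broker line mentions `ndomod`, and it is not what the poster actually ran. Nagios has to be restarted for the change to take effect.

```python
def disable_perfdata(cfg_text: str) -> str:
    """Return nagios.cfg text with perfdata processing off and the NDO broker commented out.

    Hypothetical helper: only two kinds of lines are touched; everything else is kept as-is.
    """
    out = []
    for line in cfg_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == "process_performance_data":
            # 0 stops Nagios from processing perfdata (the host_/service_perfdata_* settings)
            out.append("process_performance_data=0")
        elif key == "broker_module" and "ndomod" in line:
            # A leading '#' makes Nagios ignore the line, so ndomod is not loaded
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "broker_module=/usr/local/nagios/bin/ndomod.o\n"
        "process_performance_data=1\n"
        "use_large_installation_tweaks=1\n"
    )
    print(disable_perfdata(sample), end="")
```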

I apologize, as I feel like I've wasted a bit of everybody's time. It
wasn't deliberate, and I really did go crazy on doing my research
beforehand. I just didn't catch what this bottleneck was until after I'd
bothered you.



Re: [Nagios-users] Nagios Optimization on CentOS

2011-12-27 Thread Daniel Wittenberg
No prob!  Hopefully learned some more about performance tuning while you were 
at it!

Dan

-Original Message-
From: mpeder...@choopa.com [mailto:mpeder...@choopa.com] 
Sent: Tuesday, December 27, 2011 4:16 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Nagios Optimization on CentOS
