If NDOUtils starts to put a heavy load on the system, you can also offload NDOUtils/MySQL to a second machine. We wrote the document below for Nagios XI, but it has the info you'd need to make this work for Nagios Core as well.
http://library.nagios.com/library/products/nagiosxi/documentation/462-offloading-mysql-to-remote-server

Javier Vela Diago wrote:
> I have a lot of custom checks, written mostly in Perl and bash, and some
> in Python. And some take a lot of time.
>
> Never mind, I think I found the solution, or at least part of it. I
> set use_large_installation_tweaks to 1. Six months ago this option
> almost crashed my system, so I discarded it. Now, with bigger problems,
> it was the last thing I wanted to test, but this afternoon I finally
> tested it.
>
> When I restarted Nagios, the load grew to 6-8 and the latency problems
> disappeared. I was sceptical about the usefulness of this option, but
> when the load goes from 2.5 to 6, it means the machine is doing a lot
> of work that it wasn't doing before.
>
> Now the problem is that NDOUtils is causing some latency because of
> MySQL, but well, at least I know what to optimize. Some tips would be
> appreciated :)
>
> Thank you and sorry for taking your time.
>
>
> From: Daniel Wittenberg <daniel.wittenberg.r...@statefarm.com>
> To: Nagios Users List <nagios-users@lists.sourceforge.net>
> Date: 11/10/2011 16:02
> Subject: Re: [Nagios-users] High check latency in a machine with low load
> ------------------------------------------------------------------------
>
> I think you have the enable_high_latency option enabled :) j/k
>
> Do you have any particular checks that are taking a long time? i.e.
> can you watch top and see checks taking a while?
>
> Dan
>
>
> *From:* Javier Vela Diago [mailto:jv...@s2grupo.es]
> *Sent:* Tuesday, October 11, 2011 6:23 AM
> *To:* nagios-users@lists.sourceforge.net
> *Subject:* [Nagios-users] High check latency in a machine with low load
>
> Hi,
>
> I have a Nagios 3.2.3 deployment with 1000+ hosts and 3000+ services.
> This Nagios runs together with NDO and PNP (in bulk mode) on a server
> with 4 GB of RAM and 4 CPUs.
>
> One day I realized that the check delay in the performance CGI was
> very high (300-400 seconds). That seemed very strange, so I took the
> tuning guide from Nagios
> (http://nagios.sourceforge.net/docs/3_0/tuning.html) and applied all
> the points I could. In particular, I set max_concurrent_checks
> to zero (no limit):
>
> max_concurrent_checks=0
>
> The reaper event:
>
> service_reaper_frequency=5
> max_check_result_reaper_time=15
>
> and checked that the host checks were not forced. In addition, I
> configured a host check cache of 15 seconds:
>
> cached_host_check_horizon=15
>
> But the problem remains, and the load on the server is not very high:
> a load of 2.5, 2 GB of free memory and an average disk utilization of
> 7%. I disabled NDO and PNP but it was useless. After the first round
> of checks the delay returns, while the load on the server doesn't grow.
>
> I have searched on Google, but all the problems I found are caused by
> load on the server, and here that is not the main problem. So my
> question is: what can I do now? Is there some variable that shows me
> where to look? I'm a bit lost right now and I don't know how to find
> the problem.
>
> Or maybe the only way is to configure a master-slave Nagios in order
> to maximize server utilization?
>
> In addition, I have pretty big timeouts (60 seconds) because of the
> high latency on the network. All your help is appreciated. Thank you
> in advance.
>
> *nagiostats*
>
> Nagios Stats 3.2.3
> Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
> Last Modified: 10-03-2010
> License: GPL
>
> CURRENT STATUS DATA
> ------------------------------------------------------
> Status File: /usr/local/argos/aplicaciones/nagios/var/status.dat
> Status File Age: 0d 0h 0m 11s
> Status File Version: 3.2.3
>
> Program Running Time: 0d 20h 56m 7s
> Nagios PID: 21834
> Used/High/Total Command Buffers: 0 / 0 / 4096
>
> Total Services: 4032
> Services Checked: 4032
> Services Scheduled: 4030
> Services Actively Checked: 4032
> Services Passively Checked: 0
> Total Service State Change: 0.000 / 37.300 / 0.163 %
> Active Service Latency: 32.876 / 442.138 / 415.816 sec
> Active Service Execution Time: 0.051 / 60.097 / 1.545 sec
> Active Service State Change: 0.000 / 37.300 / 0.163 %
> Active Services Last 1/5/15/60 min: 237 / 1530 / 4020 / 4020
> Passive Service Latency: 0.000 / 0.000 / 0.000 sec
> Passive Service State Change: 0.000 / 0.000 / 0.000 %
> Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
> Services Ok/Warn/Unk/Crit: 3766 / 38 / 44 / 184
> Services Flapping: 0
> Services In Downtime: 0
>
> Total Hosts: 931
> Hosts Checked: 931
> Hosts Scheduled: 931
> Hosts Actively Checked: 931
> Host Passively Checked: 0
> Total Host State Change: 0.000 / 12.370 / 0.077 %
> Active Host Latency: 0.000 / 441.308 / 416.063 sec
> Active Host Execution Time: 0.062 / 10.113 / 0.395 sec
> Active Host State Change: 0.000 / 12.370 / 0.077 %
> Active Hosts Last 1/5/15/60 min: 74 / 423 / 931 / 931
> Passive Host Latency: 0.000 / 0.000 / 0.000 sec
> Passive Host State Change: 0.000 / 0.000 / 0.000 %
> Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
> Hosts Up/Down/Unreach: 897 / 24 / 10
> Hosts Flapping: 0
> Hosts In Downtime: 1
>
> Active Host Checks Last 1/5/15 min: 109 / 535 / 1583
>   Scheduled: 87 / 433 / 1300
>   On-demand: 22 / 102 / 283
>   Parallel: 87 / 438 / 1323
>   Serial: 0 / 0 / 0
>   Cached: 22 / 97 / 260
> Passive Host Checks Last 1/5/15 min: 0 / 0 / 0
> Active Service Checks Last 1/5/15 min: 304 / 1605 / 4924
>   Scheduled: 304 / 1605 / 4923
>   On-demand: 0 / 0 / 1
>   Cached: 0 / 0 / 0
> Passive Service Checks Last 1/5/15 min: 0 / 0 / 0
>
> External Commands Last 1/5/15 min: 0 / 0 / 0
>
> *nagios -s*
>
> Nagios Core 3.2.3
> Copyright (c) 2009-2010 Nagios Core Development Team and Community Contributors
> Copyright (c) 1999-2009 Ethan Galstad
> Last Modified: 10-03-2010
> License: GPL
>
> Website: http://www.nagios.org
> Warning: aggregate_status_updates directive ignored. All status file
> updates are now aggregated.
> Warning: downtime_file variable ignored. Downtime entries are now
> stored in the status and retention files.
> Warning: comment_file variable ignored. Comments are now stored in
> the status and retention files.
> Timing information on object configuration processing is listed
> below. You can use this information to see if precaching your
> object configuration would be useful.
>
> Object Config Source: Config files (uncached)
>
> OBJECT CONFIG PROCESSING TIMES (* = Potential for precache savings with -u option)
> ----------------------------------
> Read: 0.080036 sec
> Resolve: 0.010660 sec *
> Recomb Contactgroups: 0.002666 sec *
> Recomb Hostgroups: 0.004086 sec *
> Dup Services: 0.034632 sec *
> Recomb Servicegroups: 0.001277 sec *
> Duplicate: 0.010939 sec *
> Inherit: 0.005594 sec *
> Recomb Contacts: 0.000001 sec *
> Sort: 0.000000 sec *
> Register: 0.074413 sec
> Free: 0.008730 sec
> ============
> TOTAL: 0.234920 sec * = 0.071741 sec (30.54%) estimated savings
>
>
> RETENTION DATA TIMES
> ----------------------------------
> Read and Process: 0.495480 sec
> ============
> TOTAL: 0.495480 sec
>
>
> Timing information on configuration verification is listed below.
>
> CONFIG VERIFICATION TIMES (* = Potential for speedup with -x option)
> ----------------------------------
> Object Relationships: 0.060039 sec
> Circular Paths: 0.026557 sec *
> Misc: 0.005999 sec
> ============
> TOTAL: 0.092595 sec * = 0.026557 sec (28.7%) estimated savings
>
>
> EVENT SCHEDULING TIMES
> -------------------------------------
> Get service info: 0.014509 sec
> Get host info info: 0.002853 sec
> Get service params: 0.000078 sec
> Schedule service times: 0.039947 sec
> Schedule service events: 0.034656 sec
> Get host params: 0.000001 sec
> Schedule host times: 0.007519 sec
> Schedule host events: 0.029519 sec
> ============
> TOTAL: 0.129082 sec
>
>
> Projected scheduling information for host and service checks
> is listed below. This information assumes that you are going
> to start running Nagios with your current config files.
>
> HOST SCHEDULING INFORMATION
> ---------------------------
> Total hosts: 931
> Total scheduled hosts: 931
> Host inter-check delay method: SMART
> Average host check interval: 259.01 sec
> Host inter-check delay: 0.28 sec
> Max host check spread: 30 min
> First scheduled check: Tue Oct 11 13:14:08 2011
> Last scheduled check: Tue Oct 11 13:18:26 2011
>
>
> SERVICE SCHEDULING INFORMATION
> -------------------------------
> Total services: 4032
> Total scheduled services: 4030
> Service inter-check delay method: SMART
> Average service check interval: 299.55 sec
> Inter-check delay: 0.07 sec
> Interleave factor method: SMART
> Average services per host: 4.33
> Service interleave factor: 5
> Max service check spread: 30 min
> First scheduled check: Tue Oct 11 13:15:07 2011
> Last scheduled check: Tue Oct 11 13:20:07 2011
>
>
> CHECK PROCESSING INFORMATION
> ----------------------------
> Check result reaper interval: 5 sec
> Max concurrent service checks: Unlimited
>
>
> PERFORMANCE SUGGESTIONS
> -----------------------
> I have no suggestions - things look okay.
>
> --
> Javier Vela Diago
> S2 GRUPO
> Ramiro de Maeztu, 7 bajo. 46022 Valencia
> Tel: 963.110.300 Fax: 963.106.086
> e-mail: jvela at s2grupo dot es
> http://www.s2grupo.es
> _______________________________________________
> Nagios-users mailing list
> Nagios-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null

--
Mike Guthrie
Technical Team
___
Nagios Enterprises, LLC
Email: mguth...@nagios.com
Web: www.nagios.com
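
P.S. As a rough aid for reading the nagiostats figures quoted above, here is a minimal Python sketch. It is not from the original thread and not an official Nagios tool. It pulls the min / max / avg triple out of a line such as "Active Service Latency" and estimates how many checks would need to be in flight at once to keep up with the schedule. The names parse_triple and required_concurrency are hypothetical, and the estimate assumes checks are spread evenly over the average check interval (a Little's law style approximation), which a real scheduler only roughly follows.

```python
import re

# Matches nagiostats lines of the form
#   "Active Service Latency: 32.876 / 442.138 / 415.816 sec"
# Lines ending in "%" or holding plain counters do not match.
TRIPLE_RE = re.compile(
    r"^(?P<label>[^:]+):\s*"
    r"(?P<min>\d+(?:\.\d+)?)\s*/\s*"
    r"(?P<max>\d+(?:\.\d+)?)\s*/\s*"
    r"(?P<avg>\d+(?:\.\d+)?)\s*sec$"
)


def parse_triple(line):
    """Return (label, (min, max, avg)) for a 'sec' triple line, else None.

    Leading mail-quote markers ("> ") are stripped first, so lines can be
    pasted straight from a quoted message.
    """
    text = line.strip().lstrip("> ").strip()
    match = TRIPLE_RE.match(text)
    if match is None:
        return None
    values = tuple(float(match.group(k)) for k in ("min", "max", "avg"))
    return match.group("label").strip(), values


def required_concurrency(total_checks, avg_interval_sec, avg_exec_sec):
    """Estimate the average number of checks running at the same time.

    Assumption: checks arrive evenly, so arrival rate is
    total_checks / avg_interval_sec, and concurrency = rate * execution time.
    """
    return total_checks / avg_interval_sec * avg_exec_sec


if __name__ == "__main__":
    # Figures taken from the nagiostats and "nagios -s" output in the thread.
    label, (lat_min, lat_max, lat_avg) = parse_triple(
        "> Active Service Latency: 32.876 / 442.138 / 415.816 sec"
    )
    print(f"{label}: avg {lat_avg} sec (min {lat_min}, max {lat_max})")

    needed = required_concurrency(4030, 299.55, 1.545)
    print(f"Estimated concurrent service checks needed: {needed:.1f}")
```

With the numbers from the thread the estimate comes out at only a couple of dozen concurrent service checks, which fits the observation that the machine itself was not short of CPU or memory while latency was high.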