> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:nagios-users-[EMAIL PROTECTED] On Behalf Of Frost, Mark {PBG}
> Sent: Tuesday, January 22, 2008 10:34 AM
> To: Nagios Users
> Subject: [Nagios-users] Problem with high latencies after going distributed
>
> As I'd mentioned in a previous message, I'm in the process of
> converting from a centralized Nagios 2.10 setup all running on a
> single host to a distributed setup running on at least 3 hosts
> (3 to start anyway). The centralized setup has 572 hosts and 2900
> services 99.9% of which are active checks.
>
Not quite to that level here but probably comparable. I'm submitting
~1200 service checks every 5 minutes from my 'largest' remote Nagios to
two central boxen receiving a total of 3790 passive checks each every
5 minutes (for redundancy).

> Distributed Node 1 (min/max/avg)
> Active Service Latency:         0.000 / 7267.198 / 4241.019 sec
> Active Service Execution Time:  0.000 / 60.014 / 0.651 sec
>
> Distributed Node 2 (min/max/avg)
> Active Service Latency:         0.000 / 11475.901 / 6393.641 sec
> Active Service Execution Time:  0.000 / 60.018 / 0.593 sec

Wow. How many services are being polled/sent on each collector? My
comparable stats for the collector above are --

Active Service Latency:         0.001 / 10.390 / 2.385 sec
Active Service Execution Time:  0.089 / 47.674 / 1.274 sec

This isn't even a dedicated Nagios box. It's also doing Cricket data
collection for 12831 rrd files at 5 minute intervals and other stuff.

My opinion is that unless there is some magic threshold that I haven't
crossed (I don't expect that there is), your numbers indicate some
network or configuration problem.

Others have indicated that the OCSP execution may be an issue. Your
OCSP command should execute _very_ quickly so I don't see how it's a
significant factor at your levels unless there's a problem there,
especially when spreading out 2900 checks over 15 minutes. That's about
3 checks per second versus my 4 per second.

For me to send results to _2_ central boxes takes an insignificant
amount of time --

$ time ./submit_check_result test test OK test
1 data packet(s) sent to host successfully.
1 data packet(s) sent to host successfully.

real    0m0.010s
user    0m0.000s
sys     0m0.010s

Even taking into account Nagios setting up the call to
submit_check_result it's still trivial. Just making you aware that this
is testable by you and may be a red herring.

--
Marc
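For anyone who wants to redo the arithmetic above or time their own OCSP command, here is a rough sketch in Python. The check counts, intervals and the 0.010 s figure come from this thread. The helper names are made up for illustration. The "busy fraction" assumes the OCSP command runs one call at a time, which may not match your setup. The timed command at the bottom is only a stand-in and should be swapped for your real OCSP command.

```python
import subprocess
import sys
import time
from typing import Sequence


def checks_per_second(num_checks: int, interval_minutes: float) -> float:
    """Average check rate if checks are spread evenly over the interval."""
    return num_checks / (interval_minutes * 60.0)


def ocsp_busy_fraction(rate_per_sec: float, seconds_per_call: float) -> float:
    """Share of each second spent in the OCSP command.

    Assumption: calls run strictly one after another (no overlap).
    """
    return rate_per_sec * seconds_per_call


def time_command(argv: Sequence[str], runs: int = 5) -> float:
    """Run argv `runs` times and return mean wall-clock seconds per run."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(
            list(argv),
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    yours = checks_per_second(2900, 15)  # 2900 checks over 15 minutes
    mine = checks_per_second(1200, 5)    # ~1200 checks every 5 minutes
    print(f"2900 checks / 15 min: {yours:.2f} checks/sec")
    print(f"1200 checks /  5 min: {mine:.2f} checks/sec")

    # 0.010 s is the `real` time reported by `time` in this thread.
    busy = ocsp_busy_fraction(yours, 0.010)
    print(f"OCSP share of wall-clock time at 0.010 s/call: {busy:.1%}")

    # Stand-in command so the script runs anywhere; replace with e.g.
    # ["./submit_check_result", "test", "test", "OK", "test"].
    per_call = time_command([sys.executable, "-c", "pass"])
    print(f"stand-in command: {per_call:.3f} s per call")
```

If the measured per-call time multiplied by your check rate gets anywhere near 1.0, the OCSP command would be worth a closer look; if it stays at a few percent, it is probably not the cause of the latency.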
