> -----Original Message-----
> From: Max Schubert [mailto:[email protected]]
> Sent: Sunday, 9 October 2011 2:19 AM
> Subject: Re: [Nagios-users] Average Check latency and execution time
> growth - 3.2.3

Sorry for the delay in response, went on break for a few weeks.

> What minor RHEL rev are you running? We had one poller that was
> running RHEL 5.3 that had constantly increasing latency - a Compaq /
> AMD based host. None of the optimizations / configuration changes we
> made to the other pollers we ran at the time seemed to help this one -
> we updated the poller in-box from 5.3 to 5.4 and voila - issue gone.

Fully up-to-date EL5.7.

> As Joerge mentioned, probably was a memory leak / bug in a library the
> parent Nagios poller process was using, we never did determine which
> one and we haven't hit that same issue since then with any 5.4 or 5.5
> pollers.

Embedded perl is still in use on this box (too many perl-written plugins to change it without serious thought).

> Even with stable software we end up bouncing our pollers every 2-3
> days - 1) because we have an active customer base who make config
> changes often and 2) because we take the metrics from the checks and
> put them in a time series data warehouse that is sensitive to interval
> skew...any poller that hits 10 seconds latency has to be bounced.
>
> We are at 12 pollers or so right now and we will be up to almost 20 by
> next year at this time.

Sounds fun ;)

> Max
>
> On 10/2/11, Stuart Browne <[email protected]> wrote:
> > Hi,
> >
> > I know this topic has been covered many times, but I've tried those
> > tweaks and I have the remaining issue.
> >
> > After a few days, the latency on checks explodes. It goes along quite
> > happily with small values, then after (about) 3 days, the values rise
> > quite sharply. I've recently been graphing performance statistics
> > (nagiostats, mrtg) and as you can see by the two attachments (day,
> > week), it's rather surprising.
> >
> > We restart Nagios every few days (for other reasons) so thankfully the
> > issue never gets completely out of control, but as you can see, it gets
> > a bit crazy.
> >
> > I can't think of any combination of settings that would cause such
> > growth after such a long period of time. Does anybody have any
> > knowledge as to why it would suddenly increase after running for days
> > without issue?
> >
> > Basic Nagios system stats:
> > 2 x dual-core Xeon 5160 (3Ghz)
> > 6GB Memory
> > 4 x SAS, RAID1 (hardware, BBU, LVM over RAID1)
> > RHEL5, fully patched
> > Load average between 0.5 and 3.2
> >
> > 'nagios -s /etc/nagios/nagios.cfg' output (trimmed):
> >
> > HOST SCHEDULING INFORMATION
> > ---------------------------
> > Total hosts:                     252
> > Total scheduled hosts:           252
> > Host inter-check delay method:   SMART
> > Average host check interval:     300.00 sec
> > Host inter-check delay:          1.19 sec
> > Max host check spread:           30 min
> > First scheduled check:           Mon Oct 3 14:31:17 2011
> > Last scheduled check:            Mon Oct 3 14:36:15 2011
> >
> >
> > SERVICE SCHEDULING INFORMATION
> > -------------------------------
> > Total services:                     1575
> > Total scheduled services:           1386
> > Service inter-check delay method:   SMART
> > Average service check interval:     878.40 sec
> > Inter-check delay:                  0.63 sec
> > Interleave factor method:           SMART
> > Average services per host:          6.25
> > Service interleave factor:          6
> > Max service check spread:           30 min
> > First scheduled check:              Mon Oct 3 14:33:43 2011
> > Last scheduled check:               Mon Oct 3 14:48:21 2011
> >
> > CHECK PROCESSING INFORMATION
> > ----------------------------
> > Check result reaper interval:       5 sec
> > Max concurrent service checks:      Unlimited
> >
> >
> > PERFORMANCE SUGGESTIONS
> > -----------------------
> > I have no suggestions - things look okay.
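
For what it's worth, the delay figures in the 'nagios -s' output above are consistent with simple division: the average check interval divided by the number of scheduled checks. The sketch below reproduces them under that assumption (checks spread evenly over one interval). The function name inter_check_delay is made up for illustration and is not part of Nagios.

```python
def inter_check_delay(avg_interval_sec: float, scheduled_count: int) -> float:
    """Seconds between consecutive checks if they are spread evenly over one interval.

    Assumption: this is plain division, not a copy of the Nagios scheduler code.
    """
    if scheduled_count <= 0:
        raise ValueError("scheduled_count must be positive")
    return avg_interval_sec / scheduled_count


# Figures taken from the quoted 'nagios -s' output.
host_delay = inter_check_delay(300.00, 252)        # 252 scheduled hosts
service_delay = inter_check_delay(878.40, 1386)    # 1386 scheduled services
services_per_host = 1575 / 252                     # total services / total hosts

print(f"host inter-check delay:    {host_delay:.2f} sec")
print(f"service inter-check delay: {service_delay:.2f} sec")
print(f"average services per host: {services_per_host:.2f}")
```

The three printed values round to 1.19, 0.63 and 6.25, matching the quoted output, so the initial schedule itself looks sane and the latency growth after a few days would have to come from somewhere else.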
