Hi, I know this topic has been covered many times, but I've tried the suggested tweaks and still have the following issue.
After a few days, the check latency explodes. It runs along quite happily with small values, then after about 3 days the values rise quite sharply. I've recently started graphing performance statistics (nagiostats, mrtg), and as the two attachments (day, week) show, the result is rather surprising.

We restart Nagios every few days (for other reasons), so thankfully the issue never gets completely out of control, but as you can see, it gets a bit crazy.

I can't think of any combination of settings that would cause such growth after such a long period of time. Does anybody know why latency would suddenly increase after running for days without issue?

Basic Nagios system stats:

  2 x dual-core Xeon 5160 (3 GHz)
  6 GB memory
  4 x SAS, RAID1 (hardware, BBU, LVM over RAID1)
  RHEL5, fully patched
  Load average between 0.5 and 3.2

'nagios -s /etc/nagios/nagios.cfg' output (trimmed):

HOST SCHEDULING INFORMATION
---------------------------
Total hosts: 252
Total scheduled hosts: 252
Host inter-check delay method: SMART
Average host check interval: 300.00 sec
Host inter-check delay: 1.19 sec
Max host check spread: 30 min
First scheduled check: Mon Oct 3 14:31:17 2011
Last scheduled check: Mon Oct 3 14:36:15 2011

SERVICE SCHEDULING INFORMATION
-------------------------------
Total services: 1575
Total scheduled services: 1386
Service inter-check delay method: SMART
Average service check interval: 878.40 sec
Inter-check delay: 0.63 sec
Interleave factor method: SMART
Average services per host: 6.25
Service interleave factor: 6
Max service check spread: 30 min
First scheduled check: Mon Oct 3 14:33:43 2011
Last scheduled check: Mon Oct 3 14:48:21 2011

CHECK PROCESSING INFORMATION
----------------------------
Check result reaper interval: 5 sec
Max concurrent service checks: Unlimited

PERFORMANCE SUGGESTIONS
-----------------------
I have no suggestions - things look okay.

Stuart J. Browne
Senior Linux Administrator
<<attachment: nagios-a-day[1].png>>
<<attachment: nagios-a-week[1].png>>
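As a sanity check on the scheduling figures quoted above, here is a rough Python sketch that recomputes the two inter-check delays. It assumes the SMART method simply divides the average check interval by the number of scheduled checks and caps the result so the initial spread fits within the max check spread. That rule, and the helper name smart_delay, are assumptions made for illustration, not taken from the Nagios source. If the assumption holds, the numbers are self-consistent, which would suggest the initial schedule itself is not what drives the latency growth.

```python
# Figures copied from the 'nagios -s' output quoted in the post.
SCHEDULED_HOSTS = 252
AVG_HOST_INTERVAL_SEC = 300.00
SCHEDULED_SERVICES = 1386
AVG_SERVICE_INTERVAL_SEC = 878.40
MAX_CHECK_SPREAD_SEC = 30 * 60  # "Max ... check spread: 30 min"


def smart_delay(avg_interval_sec, scheduled_count, max_spread_sec):
    """Hypothetical helper: assumed SMART inter-check delay.

    Assumption: one average interval is spread evenly over all
    scheduled checks, capped so the whole initial spread does not
    exceed max_spread_sec.
    """
    even_spread = avg_interval_sec / scheduled_count
    cap = max_spread_sec / scheduled_count
    return min(even_spread, cap)


host_delay = smart_delay(
    AVG_HOST_INTERVAL_SEC, SCHEDULED_HOSTS, MAX_CHECK_SPREAD_SEC
)
service_delay = smart_delay(
    AVG_SERVICE_INTERVAL_SEC, SCHEDULED_SERVICES, MAX_CHECK_SPREAD_SEC
)

# Average number of service checks the scheduler must start per second.
service_checks_per_sec = SCHEDULED_SERVICES / AVG_SERVICE_INTERVAL_SEC

print(f"host inter-check delay:    {host_delay:.2f} sec")
print(f"service inter-check delay: {service_delay:.2f} sec")
print(f"service checks per second: {service_checks_per_sec:.2f}")
```

Under that assumption the computed delays round to the 1.19 sec and 0.63 sec reported by 'nagios -s'.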