Robert Hajime Lanning wrote:
> I have also been having performance issues with Nagios 2.5 on
> a Sun E220R with two 400MHz procs and 1GB ram.
>
> Sys stats are at http://lanning.cc/kipper.html
>
> The large dips in load and system CPU time are when I restart
> Nagios. (cron'd twice a week, but I have also been making
> a lot of service updates lately, hence the almost once a day
> restarts.) For the restarts to fix the latency, I have
> "use_retained_scheduling_info=0".
>
> After about three days the Service Check latency will grow
> to over 300 seconds. It is usually steady at around 0-5
> seconds, for a couple of days, then it will rise over the
> course of a few hours to over the 300 second mark.
>

This is a bit bizarre and simply must be related to something else.
Does Nagios run out of command buffer slots? Aren't they freed
properly?

>
> I have noticed the Nagios seems to have a memory leak. As,
> I have watched over the last hour the process grow from 124M
> to 126M.
>

This can probably be attributed to the fact that Nagios fork()s, then
frees and allocates memory before running execve() in a thread. This
isn't prohibited per se, but it is strongly discouraged. I wouldn't be
surprised to find that other applications that do the same thing leak
memory on Sun.

On Linux, threads are created in a 1-1 fashion (meaning each thread is
actually its own process as far as the kernel is concerned). This holds
true for some other systems as well, and afaik there are 1-1 thread
implementations for Sun too. In any case, the 1-1 thing means that the
kernel cleans up any left-over memory for the processes when they exit,
which isn't necessarily the case in a 1-many thread implementation.
Possibly worth investigating.

> I use ePN with caching. Most of my checks are SNMP requests
> via ePN scripts (http://lanning.cc/custom_plugins/), with
> p1.pl modified with:
>
> use SNMP 5.0;
> SNMP::loadModules("ALL");
>

Forgive a novice, but doesn't this make it load all SNMP submodules
each time it runs a Perl module? That would certainly have a major
impact on load and could well lead to memory leaks (assuming the
submodules aren't always freed after having been loaded).

> We have put into our budget to move Nagios to a Linux/Intel
> server. But, what bugs me is the high CPU time in kernel
> space, because of Nagios.
>

Again, this is behaviour not regularly experienced on Linux (which is
the base for most Nagios installations). Linux is simply very, very
good at fork(). It doesn't even bother trying to do other things
properly (like 1-many threading), simply because it's so damn good at
forking. It would be interesting to see if your problems go away when
you move to Linux.
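
To illustrate the fork()/execve() ordering discussed above, here is a minimal sketch of the safer pattern: do all the preparation in the parent, so the child does nothing between fork() and execve(). This is not Nagios code. The function name run_check is hypothetical, and Python is used only to show the ordering (the interpreter still allocates memory behind the scenes, which a C implementation would avoid).

```python
import os
import sys


def run_check(argv, env=None):
    """Run an external command; return its exit status.

    Hypothetical helper: everything the child needs is built in the
    parent *before* fork(), so the child goes straight to execve().
    """
    # Preparation happens here, in the parent, before fork().
    argv = list(argv)
    env = dict(os.environ if env is None else env)

    pid = os.fork()
    if pid == 0:
        # Child: nothing but execve(). No logging, no cleanup.
        try:
            os.execve(argv[0], argv, env)
        finally:
            # Only reached if execve() failed; leave without running
            # any of the parent's exit handlers.
            os._exit(127)

    # Parent: reap the child so it does not linger as a zombie.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Child was killed by a signal; report it as a negative number.
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    # Run a trivial child that exits with a known status.
    code = run_check([sys.executable, "-c", "import sys; sys.exit(3)"])
    print("child exit status:", code)
```

Whether reordering along these lines would make the memory growth seen on Solaris go away is untested here; the sketch only shows what "nothing between fork() and execve()" looks like.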

I'm not saying it's superior to Solaris, but afaiu, Ethan runs all his
tests on Linux and would certainly have found bugs of this kind if they
had bitten him.

-- 
Andreas Ericsson                   [EMAIL PROTECTED]
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231