Page faults: 20-30k. That seems to be the source of most of the CPU system time (understandably), which sits at about 40-50%. So if I could reduce the page faults, I think we could gain back quite a bit of performance.
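For anyone who wants to put a rate on the page faults rather than a raw count, here is a minimal sketch, assuming a Linux host with /proc mounted. The function names are my own (nothing here is part of Nagios); it just samples the minflt and majflt counters from /proc/<pid>/stat twice and divides by the interval.

```python
import time


def parse_stat_faults(stat_line):
    """Return (minflt, majflt) from the text of /proc/<pid>/stat."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so split on the LAST ')' before tokenizing.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); minflt is field 10, majflt is field 12.
    return int(rest[7]), int(rest[9])


def read_faults(pid):
    """Read the current fault counters for one process (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        return parse_stat_faults(f.read())


def fault_rate(pid, interval=5.0):
    """Sample the counters twice; return (minor faults/s, major faults/s)."""
    min0, maj0 = read_faults(pid)
    time.sleep(interval)
    min1, maj1 = read_faults(pid)
    return (min1 - min0) / interval, (maj1 - maj0) / interval


# Hypothetical usage, with the PID of the main nagios process filled in:
#   minor, major = fault_rate(12345)
#   print("%.1f minor/s, %.1f major/s" % (minor, major))
```

Note that these two counters only cover the process itself. Faults from children it has waited for are kept in separate fields (cminflt and cmajflt, fields 11 and 13), which probably matters here since the checks run in forked children.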
I found one other huge issue: somehow, in the generic service check, check_interval was set to 5 minutes, but normal_check_interval wasn't set at all, and the checks appeared to be running every minute. I deleted check_interval and added normal_check_interval, and that helped a ton; latency went down to 0.5-1.5 seconds. That was with only 2 active checks and about a dozen passive checks running on 700 hosts.

I then added back the other 9 active checks, and latency once again shot back up to about 2000 *sigh*. I grabbed another VM and made it a DNX client, and that seemed to help, but I wish I could get the main server to handle more. Right now it has about 700 hosts and 12,100 service checks, of which about 7000 are active and the rest are passive.

Oh, and we do have obsessive turned off. I've even gone through as many configs as I could and removed the macros too, until I can write a caching mechanism for the macro statements.

Any more ideas?

-----Original Message-----
From: Andreas Ericsson [mailto:a...@op5.se]
Sent: Friday, December 03, 2010 5:39 AM
To: Nagios Users List
Cc: Daniel Wittenberg
Subject: Re: [Nagios-users] high latency

On 12/02/2010 08:38 PM, Daniel Wittenberg wrote:
> Someone else noticed that nagios is generating a ton of minor page
> faults, and curious if that's normal and if that could be causing some
> of the latency in the checks?

define "a ton"

$ /usr/bin/time php -r 'echo "marsipulami\n";'
marsipulami
0.01user 0.01system 0:00.09elapsed 34%CPU (0avgtext+0avgdata 29104maxresident)k
10208inputs+0outputs (70major+1962minor)pagefaults 0swaps

That's with a reasonably simple program, and it generates 70 major and
1962 minor pagefaults.

> I've also got a tmpfs setup for the
> status.dat and the checkresults directory to ease some of the disk i/o
> since we're on a san-backed vm host.
>

That's good, although if you're using a virtual system you'll never know
for sure if you're really using a ramdisk or not, since the host system
might well use swap to store the ramdisk anyway.

> I turned off embedded perl this morning and our latency has been holding
> at < 10 seconds so far, so that seemed to help a lot.
>

Neat. Did it affect your pagefaults? If so, how?

--
Andreas Ericsson andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null