Donald,

Thanks for the tips. Disk IO was my first thought, but I'm not sure of the best way to keep an eye on that. Is there a util/command that I can run/log to see the disk IO?
Mike

On Wed, Feb 15, 2012 at 1:16 PM, The Donald Cowart <[email protected]> wrote:
> Hmmm... based on this I have two ideas about what it might be,
>
> It could be something doing a bunch of DNS queries and so processes
> are waiting for dig results or TCP timeouts, not sure how to prove
> that though.
>
> It might be something compressing/rotating log files all at once, just
> enough to slow the system down while it's happening. Probably
> triggered within one of the applications for monitoring. You may want
> to record some iostat values too just to see if disk activity is
> spiking during an event.
>
> I hope this helps!
>
> --Donald
>
> On Wed, Feb 15, 2012 at 12:57 PM, Dean, Mike <[email protected]> wrote:
> > Unfortunately, I have not been able to get a snapshot from top when
> > it is running slow. The slowdowns typically only last a few seconds
> > and do not occur with any sort of regularity (that I've been able to
> > determine, anyway).
> >
> > I started a "top -b" piped to a file to see if I can catch a snapshot
> > during a slow period.
> >
> > As for the box itself (and apologies for not including this info
> > originally), it is used as a network monitor/management station. Two
> > of the applications that run on it are Nagios (up/down and other
> > monitoring), which has various checks that are shell based, Perl and
> > compiled, and Smokeping, which sends out TCP and ICMP probes every 5
> > minutes to some hosts (less than 50) to record round trip times.
> >
> > It is also one of our syslog machines with a script that runs every 5
> > minutes parsing the log files (none of the files have grown to a
> > large size or increased in syslog input/output).
> >
> > And, none of these systems has been changed in the last week.
> >
> > So, other than top, are there other things to check or monitors to
> > set?
> >
> > Thanks again, in advance!
> >
> > On Wed, Feb 15, 2012 at 11:02 AM, The Donald Cowart <[email protected]>
> > wrote:
> >>
> >> Can you get the output from top during a slowdown or just after?
> >> Also, is the box's function a webserver, fileserver, mathematical
> >> processing, etc? Was the box rebooted after the patching?
> >>
> >> Something that may help: run "top -b" in a while loop (with a sleep
> >> in between runs) and dump it to a file or series of files, so you've
> >> got snapshots over time of the system performance to help
> >> troubleshoot this.
> >>
> >> --Donald
> >>
> >> On Wed, Feb 15, 2012 at 10:49 AM, Dean, Mike <[email protected]> wrote:
> >> > Hello all, hoping you can help. We have a RedHat box that two days
> >> > ago started having periods of slow performance. The slowdown is
> >> > bad enough that you can see it when trying to type at a terminal
> >> > and some processes, such as SNMP, don't respond. Users have also
> >> > been disconnected.
> >> >
> >> > The last change that was made was applying the normal monthly
> >> > patches on 2/7 (the problem only started showing up yesterday).
> >> > According to the information from 'top', the system seems to be
> >> > fine. A typical snapshot looks like this:
> >> >
> >> > Tasks: 282 total, 8 running, 274 sleeping, 0 stopped, 0 zombie
> >> > Cpu(s): 1.1%us, 0.3%sy, 0.0%ni, 98.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> >> > Mem: 3909268k total, 1669580k used, 2239688k free, 231832k buffers
> >> > Swap: 6094840k total, 0k used, 6094840k free, 972756k cached
> >> >
> >> > Any ideas on where to look?
> >> >
> >> > Thanks!
> >> >
> >> > Mike
> >>
> >>
> >>
> >> --
> >> Donald Cowart
> >> http://www.rdex.net/
> >
> >
>
> --
> Donald Cowart
> http://www.rdex.net/
>
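
The two suggestions in this thread (looping "top -b" with a sleep in between, and recording iostat values to watch for disk activity spikes) could be combined roughly as follows. This is only a minimal sketch, not something posted in the thread: the function name capture_snapshots, its arguments, and the log file names are made up for illustration, and iostat is assumed to come from the sysstat package, which may not be installed on the box.

```shell
#!/bin/sh
# Hypothetical helper; the name, arguments, and file names are illustrative.
# Usage: capture_snapshots OUTDIR COUNT SLEEP_SECONDS
capture_snapshots() {
    outdir=$1
    count=$2
    interval=$3
    mkdir -p "$outdir" || return 1

    i=0
    while [ "$i" -lt "$count" ]; do
        stamp=$(date '+%Y-%m-%d %H:%M:%S')

        # One batch-mode iteration of top, appended under a timestamp header.
        {
            echo "===== $stamp ====="
            if command -v top >/dev/null 2>&1; then
                top -b -n 1 || echo "top failed"
            else
                echo "top not found"
            fi
        } >> "$outdir/top.log" 2>&1

        # Extended per-device statistics. iostat's first report is the
        # average since boot; the second covers the 1-second sample.
        {
            echo "===== $stamp ====="
            if command -v iostat >/dev/null 2>&1; then
                iostat -x 1 2 || echo "iostat failed"
            else
                echo "iostat not found (usually in the sysstat package)"
            fi
        } >> "$outdir/iostat.log" 2>&1

        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

For example, "capture_snapshots /var/tmp/perfsnap 720 5" would take 720 snapshots with a 5-second sleep between them. Since the slowdowns reportedly last only a few seconds, a short sleep gives a better chance of catching one; the timestamp headers make it possible to line up the top and iostat entries afterwards and check whether %wa or device utilization rises during an event.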

