Donald,

Thanks for the tips.  Disk IO was my first thought, but I'm not sure of the
best way to keep an eye on that.  Is there a util/command that I can
run/log to see the disk IO?
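
If there isn't a standard tool for this, one option might be to sample
/proc/diskstats directly.  Below is a minimal sketch, assuming a Linux kernel
that exposes the usual 14-column /proc/diskstats layout and Python 2.6 or
later on the box.  The log file name, the 5-second interval, the "--log"
switch, and the function names are my own placeholders, not anything standard:

```python
#!/usr/bin/env python
"""Log per-device disk activity by sampling /proc/diskstats (Linux only)."""
import sys
import time

DISKSTATS = "/proc/diskstats"  # standard Linux location
INTERVAL = 5                   # seconds between samples (arbitrary choice)
LOGFILE = "diskio.log"         # hypothetical output file name


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written, ms_doing_io)}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            # Older kernels print a shorter line for partitions; skip those.
            continue
        stats[fields[2]] = (int(fields[5]), int(fields[9]), int(fields[12]))
    return stats


def rates(prev, cur, seconds):
    """Return {device: (read_kB_per_s, write_kB_per_s, pct_busy)}."""
    out = {}
    for dev, (rd, wr, busy) in cur.items():
        if dev not in prev:
            continue
        prd, pwr, pbusy = prev[dev]
        out[dev] = (
            (rd - prd) * 512 / 1024.0 / seconds,  # diskstats sectors are 512 bytes
            (wr - pwr) * 512 / 1024.0 / seconds,
            (busy - pbusy) / 10.0 / seconds,      # ms busy per second, as a percent
        )
    return out


def read_sample():
    """Read and parse the current contents of /proc/diskstats."""
    with open(DISKSTATS) as fh:
        return parse_diskstats(fh.read())


def main():
    """Append one timestamped line per active device every INTERVAL seconds."""
    prev = read_sample()
    with open(LOGFILE, "a") as log:
        while True:
            time.sleep(INTERVAL)
            cur = read_sample()
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            for dev, (rkb, wkb, busy) in sorted(rates(prev, cur, INTERVAL).items()):
                if rkb or wkb or busy:
                    log.write("%s %s read=%.1fkB/s write=%.1fkB/s busy=%.1f%%\n"
                              % (stamp, dev, rkb, wkb, busy))
            log.flush()
            prev = cur


# Only start the endless logging loop when explicitly asked to.
if __name__ == "__main__" and sys.argv[1:] == ["--log"]:
    main()
```

Saved as, say, diskio.py and started with "python diskio.py --log &", it
should append a line for each active device every 5 seconds, which I could
then line up against the time of a slowdown.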

Mike

On Wed, Feb 15, 2012 at 1:16 PM, The Donald Cowart <[email protected]> wrote:

> Hmmm... based on this I have two ideas about what it might be:
>
> It could be something doing a bunch of DNS queries and so processes
> are waiting for dig results or TCP timeouts, not sure how to prove
> that though.
>
> It might be something compressing/rotating log files all at once, just
> enough to slow the system down while it's happening.  Probably
> triggered within one of the applications for monitoring.  You may want
> to record some iostat values too just to see if disk activity is
> spiking during an event.
>
> I hope this helps!
>
> --Donald
>
> On Wed, Feb 15, 2012 at 12:57 PM, Dean, Mike <[email protected]> wrote:
> > Unfortunately, I have not been able to get a snapshot from top when it is
> > running slow.  The slowdowns typically only last a few seconds and do not
> > occur with any sort of regularity (that I've been able to determine,
> > anyway).
> >
> > I started a "top -b" piped to a file to see if I can catch a snapshot
> > during a slow period.
> >
> > As for the box itself (and apologies for not including this info
> > originally), it is used as a network monitor/management station.  Two of
> > the applications that run on it are Nagios (up/down and other monitoring),
> > which has various checks that are shell-based, Perl, and compiled, and
> > Smokeping, which sends out TCP and ICMP probes every 5 minutes to some
> > hosts (fewer than 50) to record round-trip times.
> >
> > It is also one of our syslog machines with a script that runs every 5
> > minutes parsing the log files (none of the files have grown to a large
> > size or increased in syslog input/output).
> >
> > And, none of these systems has been changed in the last week.
> >
> > So, other than top, are there other things to check or monitors to set?
> >
> > Thanks again, in advance!
> >
> > On Wed, Feb 15, 2012 at 11:02 AM, The Donald Cowart <[email protected]>
> > wrote:
> >>
> >> Can you get the output from top during a slowdown or just after?
> >> Also, is the box's function a webserver, fileserver, mathematical
> >> processing, etc?  Was the box rebooted after the patching?
> >>
> >> Something that may help: run "top -b" in a while loop (with a sleep in
> >> between runs) and dump it to a file or series of files, so you've got
> >> snapshots over time of the system performance to help troubleshoot
> >> this.
> >>
> >> --Donald
> >>
> >> On Wed, Feb 15, 2012 at 10:49 AM, Dean, Mike <[email protected]> wrote:
> >> > Hello all, hoping you can help.  We have a RedHat box that two days ago
> >> > started having periods of slow performance.  The slowdown is bad enough
> >> > that you can see it when trying to type at a terminal, and some
> >> > processes, such as SNMP, don't respond.  Users have also been
> >> > disconnected.
> >> >
> >> > The last change that was made was applying the normal monthly patches on
> >> > 2/7 (the problem only started showing up yesterday).  According to the
> >> > information from 'top', the system seems to be fine.  A typical snapshot
> >> > looks like this:
> >> >
> >> > Tasks: 282 total,   8 running, 274 sleeping,   0 stopped,   0 zombie
> >> > Cpu(s):  1.1%us,  0.3%sy,  0.0%ni, 98.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> >> > Mem:   3909268k total,  1669580k used,  2239688k free,   231832k buffers
> >> > Swap:  6094840k total,        0k used,  6094840k free,   972756k cached
> >> >
> >> > Any ideas on where to look?
> >> >
> >> > Thanks!
> >> >
> >> > Mike
> >>
> >>
> >>
> >> --
> >> Donald Cowart
> >> http://www.rdex.net/
> >
> >
>
>
>
> --
> Donald Cowart
> http://www.rdex.net/
>
