It wouldn't hurt to put RHEL 5 or CentOS 5 on the box instead of FC. FC is
generally meant for desktops rather than servers.

 

Your default ulimit -n is only 1024. Just make sure you raise that to match
or exceed your HAProxy configuration before starting HAProxy. Even if
that is a problem, it wouldn't explain why things go wrong when you grep
the logs.
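
A minimal sketch of that check, assuming a target of 65536 descriptors (an
illustrative number, not something taken from your config; the function name
is made up as well):

```shell
# Hypothetical helper: succeeds if the soft limit on open files in this
# shell is at least the number passed in (or is unlimited).
fd_limit_ok() {
    needed=$1
    current=$(ulimit -n)
    [ "$current" = "unlimited" ] && return 0
    [ "$current" -ge "$needed" ]
}

# 65536 is an assumed target; use whatever covers your maxconn setting.
if fd_limit_ok 65536; then
    echo "ulimit -n is high enough: $(ulimit -n)"
else
    echo "ulimit -n is only $(ulimit -n); raise it before starting haproxy"
fi
```

The limit is per process and inherited, so it has to be raised (ulimit -n
65536, as root) in the same shell or init script that launches haproxy.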

 

The grep on /var/log/messages completed too quickly to really catch much. That
said, your sys time is a little high, especially after it finished. On an
8-core box, 12.5% means one core fully dedicated to the task, and it
rose from 4% to 16%. Given that it was counted as sys and not user, and
generated little I/O, that suggests the kernel is spending its time on slow
memory handling in the cache.
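
The 12.5% figure is just 100% divided by the number of cores. A small sketch
of the arithmetic (the function name is only for illustration):

```shell
# Percentage of total CPU time that one fully busy core represents.
per_core_pct() {
    awk -v cores="$1" 'BEGIN { printf "%.1f\n", 100 / cores }'
}

per_core_pct 8    # 8 is the core count of the box discussed here
```

So a sys value of 16 in vmstat on this box is a bit more than one whole core
spent in the kernel.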

 

What does uname -a give?

 

If it lists i386 or i686 (32-bit) instead of x86_64, you have too much memory
in your box for a 32-bit kernel to handle well (32-bit takes a big hit
accessing >4GB). Running "swapoff -a" to disable swap will help. A 32-bit
kernel wastes too much time trying to decide what to keep in memory and what
to swap, and swap really is pointless when its address space is only 4GB and
you have 8GB of RAM. With a 64-bit kernel, it shouldn't be an issue.

 

With a 32-bit kernel, running "swapoff -a" should help a lot (it would help a
little on 64-bit too, but not much), and/or reinstall with a 64-bit OS
(assuming your CPUs are capable).
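
A rough sketch of both checks, assuming an x86 box (the function name is
made up for illustration; other architectures just report "unknown"):

```shell
# Map the machine name printed by `uname -m` to a kernel word size.
kernel_bits() {
    case "$1" in
        x86_64)              echo 64 ;;
        i386|i486|i586|i686) echo 32 ;;
        *)                   echo unknown ;;
    esac
}

echo "uname -m says $(uname -m): $(kernel_bits "$(uname -m)")-bit kernel"

# The "lm" (long mode) flag in /proc/cpuinfo means the CPU itself can
# run a 64-bit kernel, whatever kernel is installed right now.
if grep -qw lm /proc/cpuinfo 2>/dev/null; then
    echo "CPU is 64-bit capable"
fi
```

If that reports a 32-bit kernel on a 64-bit capable CPU, "swapoff -a" is the
quick fix and a 64-bit reinstall is the real one.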

 

If you don't have a 32-bit kernel, I am out of ideas that would explain the
problem.

 

 

From: Michael Fortson [mailto:[email protected]] 
Sent: Thursday, February 12, 2009 11:23 PM
To: John Lauro
Cc: [email protected]
Subject: Re: Reducing I/O load of logging

 

Sorry, forgot to answer the disk question. I *think* this has six 10k rpm
drives in a RAID 10. It's a Dell running FC7.

 

 

On Thu, Feb 12, 2009 at 8:20 PM, Michael Fortson <[email protected]> wrote:

Here's the result:

http://pastie.org/387928

 

This box used to run everything (much of which has now been moved to other
clusters). If I can't get it to behave it'll be doing nothing soon :)

 

/var/log/messages isn't large enough to trigger the misbehavior, but hopefully
it'll show something... I can't really do it on the nginx log (which is
massive) because I always have to kill the grep before enough backend checks
fail to cause a site outage.

 

 

 

 

 

 

 

On Thu, Feb 12, 2009 at 6:44 PM, John Lauro <[email protected]>
wrote:

> I stopped logging so much in haproxy, but I get the same thing if I
> grep the nginx logs on this server: haproxy's mongrel backend checks
> start failing. I've noticed it only happens when using httpchk (or at
> least it happens much, much more quickly).
>
> Here's an iostat I ran -- the first two are during the grep on the
> nginx logs; the last one is after I finished:

The iostat looks ok.

Cut-n-paste the following (or run it from a script) so we can get a better idea
of the box's general load and see if anything turns up:

cat /proc/interrupts
free
netstat --inet -n | awk '{ print $6 }' | sort | uniq -c
ulimit -a
vmstat 1 10  & ( sleep 5 ;  grep whatever /var/log/messages >/dev/null )
cat /proc/interrupts
echo lsof count `lsof | wc -l`

What type of disk subsystem do you have?  Given how it chokes when doing a
grep, it almost sounds like you might have a faulty driver.  You do realize
8 cores is overkill for this, unless you are running other stuff on the box.
The two checks on the interrupts are there to see if something (especially
disk I/O) is generating too many; what we need to look at is the difference
between them.
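
If you save the two snapshots to files, something like this will do the
subtraction. It is only a sketch: the function name and the /tmp paths are
made up, and it assumes the usual layout where per-CPU counts follow the IRQ
label.

```shell
# Hypothetical helper: print how much each IRQ's count (summed over all
# CPUs) grew between two saved copies of /proc/interrupts, largest first.
irq_delta() {
    awk '
        FNR == 1 { next }                        # skip the CPU0 CPU1 ... header
        {
            total = 0
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
            desc = ""
            for (; i <= NF; i++) desc = desc " " $i   # controller and device names
            if (NR == FNR) before[$1] = total         # first file: remember count
            else print total - before[$1], $1 desc    # second file: print increase
        }
    ' "$1" "$2" | sort -rn
}

# Usage on the box (paths are only examples):
#   cat /proc/interrupts > /tmp/irq.before
#   grep whatever /var/log/messages > /dev/null
#   cat /proc/interrupts > /tmp/irq.after
#   irq_delta /tmp/irq.before /tmp/irq.after | head
```

Whatever lands at the top of that list during the grep is the thing to look
at first.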



 

 
