Sorry, forgot to answer the disk question. I *think* this has six 10k RPM drives in RAID 10. It's a Dell running FC7.

On Thu, Feb 12, 2009 at 8:20 PM, Michael Fortson <[email protected]> wrote:
> Here's the result: http://pastie.org/387928
>
> This box used to run everything (much of which has now been moved to other
> clusters). If I can't get it to behave it'll be doing nothing soon :)
>
> log/messages isn't large enough to trigger a misbehavior, but hopefully
> it'll show something... I can't really do it on the nginx log (which is
> massive) because I always have to kill that before enough backend tests flip
> over to cause a site outage.
>
> On Thu, Feb 12, 2009 at 6:44 PM, John Lauro <[email protected]> wrote:
>
>> > I stopped logging so much in haproxy, but I get the same thing if I
>> > grep the nginx logs on this server: haproxy's mongrel backend checks
>> > start failing. I've noticed it only happens when using httpchk (or at
>> > least it happens much, much more quickly).
>> >
>> > Here's an iostat I ran -- the first two are during the grep on the
>> > nginx logs; the last one is after I finished:
>>
>> The iostat looks ok.
>>
>> Cut-n-past the following (or run from a script) so we can get a better idea
>> of the box's general load and to see if they turn up anything:
>>
>> cat /proc/interrupts
>> free
>> netstat --inet -n | awk '{ print $6 }' | sort | uniq -c
>> ulimit -a
>> vmstat 1 10 & ( sleep 5 ; grep whatever /var/log/messages >/dev/null )
>> cat /proc/interrupts
>> echo lsof count `lsof | wc -l`
>>
>> What type of disk subsystem do you have? Given how it chokes when doing a
>> grep, it almost sounds like you might have a faulty driver. You do realize
>> 8 cores is overkill for this, unless you are running other stuff on the box.
>> The two checks on the interrupts is to see if something (especially disk
>> I/O) is generating too many as we need to look at the difference.
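
The quoted request runs `cat /proc/interrupts` twice so the two outputs can be compared. Below is a rough sketch of one way to do that comparison automatically rather than by eye. The function name `irq_diff` and the snapshot paths are made up for illustration, and it assumes the usual /proc/interrupts layout (a header row of CPU columns, then one row per IRQ).

```shell
#!/bin/sh
# irq_diff BEFORE AFTER
# Hypothetical helper: sums the per-CPU counters of every IRQ in two saved
# /proc/interrupts snapshots and prints "delta irq" for each IRQ whose total
# changed, largest delta first.
irq_diff() {
    awk '
        FNR == 1 { ncpu = NF; next }       # header row: one field per CPU
        {
            irq = $1
            sub(/:$/, "", irq)             # "16:" -> "16"
            total = 0
            # Only the first ncpu columns after the label are counters.
            for (i = 2; i <= ncpu + 1 && i <= NF; i++)
                if ($i ~ /^[0-9]+$/) total += $i
            if (NR == FNR)                 # still reading the first file
                before[irq] = total
            else if (total != before[irq])
                print total - before[irq], irq
        }
    ' "$1" "$2" | sort -rn
}

# Example use (paths are placeholders):
#   cat /proc/interrupts > /tmp/irq.before
#   grep whatever /var/log/messages > /dev/null
#   cat /proc/interrupts > /tmp/irq.after
#   irq_diff /tmp/irq.before /tmp/irq.after
```

An IRQ near the top of the output with a much larger delta than the rest is the one to look up in the last columns of /proc/interrupts to see which device it belongs to.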

