Here's the result: http://pastie.org/387928
This box used to run everything (much of which has now been moved to other clusters). If I can't get it to behave, it'll be doing nothing soon :) /var/log/messages isn't large enough to trigger the misbehavior, but hopefully the output will still show something... I can't really run the test against the nginx log (which is massive), because I always have to kill the grep before enough backend checks flip over to cause a site outage.

On Thu, Feb 12, 2009 at 6:44 PM, John Lauro <[email protected]> wrote:

> > I stopped logging so much in haproxy, but I get the same thing if I
> > grep the nginx logs on this server: haproxy's mongrel backend checks
> > start failing. I've noticed it only happens when using httpchk (or at
> > least it happens much, much more quickly).
> >
> > Here's an iostat I ran -- the first two are during the grep on the
> > nginx logs; the last one is after I finished:
>
> The iostat looks ok.
>
> Cut-n-paste the following (or run it from a script) so we can get a better
> idea of the box's general load and see if anything turns up:
>
> cat /proc/interrupts
> free
> netstat --inet -n | awk '{ print $6 }' | sort | uniq -c
> ulimit -a
> vmstat 1 10 & ( sleep 5 ; grep whatever /var/log/messages >/dev/null )
> cat /proc/interrupts
> echo lsof count `lsof | wc -l`
>
> What type of disk subsystem do you have? Given how it chokes when doing a
> grep, it almost sounds like you might have a faulty driver. You do realize
> 8 cores is overkill for this, unless you are running other stuff on the
> box.
>
> The two checks on the interrupts are there to see if something (especially
> disk I/O) is generating too many, since we need to look at the difference.
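For anyone wanting to repeat this, here is a minimal bash sketch of the checklist above that also does the "look at the difference" step for /proc/interrupts automatically. It assumes bash, a POSIX awk, and that netstat, vmstat and lsof are installed; irq_delta and collect_diagnostics are hypothetical helper names, not something from this thread, and "whatever" is the same placeholder grep pattern as in the quoted mail. irq_delta sums the per-CPU columns of each IRQ line in two snapshots and prints only the IRQs whose count went up, largest first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the original thread).

# irq_delta BEFORE AFTER
# Compare two saved copies of /proc/interrupts and print, per IRQ, how many
# interrupts were raised in between (summed over all CPUs), largest first.
irq_delta() {
  awk '
    # Sum the leading numeric (per-CPU) columns of the current line.
    function total(   i, s) {
      s = 0
      for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) s += $i
      return s
    }
    FNR == 1 { next }                        # skip the "CPU0 CPU1 ..." header
    NR == FNR { before[$1] = total(); next } # first file: remember counts
    {
      d = total() - before[$1]
      if (d > 0) {
        desc = ""                            # non-numeric fields: chip + device
        for (i = 2; i <= NF; i++) if ($i !~ /^[0-9]+$/) desc = desc " " $i
        printf "%-6s %10d%s\n", $1, d, desc
      }
    }
  ' "$1" "$2" | sort -k2,2nr
}

# collect_diagnostics [FILE]
# Run the quoted checks, grepping FILE (default /var/log/messages, an
# assumption; point it at a bigger log to reproduce) while vmstat samples.
collect_diagnostics() {
  local target=${1:-/var/log/messages}
  local before after vmstat_pid
  before=$(mktemp) && after=$(mktemp) || return 1
  cat /proc/interrupts > "$before"
  free
  netstat --inet -n | awk '{ print $6 }' | sort | uniq -c
  ulimit -a
  vmstat 1 10 &
  vmstat_pid=$!
  ( sleep 5; grep whatever "$target" > /dev/null )
  wait "$vmstat_pid"                         # let vmstat finish its 10 samples
  cat /proc/interrupts > "$after"
  echo "lsof count $(lsof | wc -l)"
  echo "Interrupts raised during the test (all CPUs):"
  irq_delta "$before" "$after"
  rm -f "$before" "$after"
}
```

Usage would be something like `source irqcheck.sh; collect_diagnostics` (the file name is arbitrary). If the disk controller's IRQ jumps by far more than the network card's during the grep, that would point toward the disk driver theory above.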

