Re: load spikes revisited

Greg Ames Thu, 17 Jan 2002 07:18:21 -0800

Justin Erenkrantz wrote:
> 
> On Wed, Jan 16, 2002 at 08:54:34PM -0500, Greg Ames wrote:
> > Anyway, there's about 20 seconds worth of ktrace output at
> > http://www.apache.org/~gregames/ktrace.  We might have some kind of thundering
> > herd problem - I see a whole bunch of unproductive context switches about the
> > time a select pops.
> 
> You mentioned that daedalus doesn't use any AcceptMutex.  I tried to
> find your config files,


Not hard to find: /usr/local/apache/conf/httpd.conf is the config for the
current production server.

> but couldn't find any reference to
> AcceptMutex (in /usr/local/apache*).  Could we try httpd with some
> locking mechanism?  (I'm guessing you use modified code to bypass
> AcceptMutex.)

As Jeff pointed out, FreeBSD's accept() is supposed to be free of the thundering
herd problem, so Single Listen Unserialized Accept (no accept mutex) kicks in
with the production config, which only listens on port 80.  I wouldn't jump to
the conclusion that accept() is causing the spikes before looking at the data.
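A minimal C sketch of that decision, assuming a hypothetical `struct listener` list (one node per Listen directive) and a hypothetical `accept_needs_mutex()` helper; this is an illustration, not httpd's actual code.  With a single listener, every idle child can block in accept() on the same socket, so on a platform whose accept() is herd-free no mutex is needed.  With two or more listeners the children have to select() across the sockets, and all of them wake when one becomes readable, so accept is serialized.  That is why listening on two ports disables S_L_U_A.

```c
#include <stddef.h>

/* Hypothetical stand-in for the list of listening sockets,
 * one node per Listen directive. */
struct listener {
    int fd;
    struct listener *next;
};

/* Return 1 if children must take an accept mutex before accept(),
 * 0 if they may all block in accept() directly.
 * platform_accept_is_herd_free models a build-time assumption that the
 * kernel's accept() wakes only one waiter (FreeBSD, per Jeff). */
static int accept_needs_mutex(const struct listener *head,
                              int platform_accept_is_herd_free)
{
    if (head == NULL)
        return 0;   /* nothing to accept on */
    if (head->next != NULL)
        return 1;   /* several listeners: children select(), so serialize */
    /* exactly one listener: serialize only if accept() itself herds */
    return !platform_accept_is_herd_free;
}
```

With only port 80 configured the helper returns 0 on a herd-free platform; adding a second Listen line flips it to 1.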

As a matter of fact, I typically listen on two ports while testing to disable
S_L_U_A, so I can easily figure out which process will get the next connection
in case I want to gdb it.  While trying out ktrace on my test config, I saw that
the fcntl() accept mutex has a thundering herd problem on daedalus.  After the
fcntl mutex is released, the trace shows the kernel context switching to all of
the idle httpd processes.  The first process that wakes up gets the mutex; the
rest of the context switches simply burn a little CPU before those processes
block again.  Moral:  the default cure looks as bad as the disease.
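For reference, an fcntl()-style accept mutex boils down to a whole-file write lock taken with F_SETLKW before accept() and dropped afterwards.  The helper names `accept_mutex_on()` and `accept_mutex_off()` below are hypothetical, and this is a sketch of the technique rather than httpd's implementation.  Every idle child sits inside the blocking F_SETLKW call, which is presumably where the wakeups in the ktrace output come from when the holder unlocks.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Take the accept mutex: block until this process holds an exclusive
 * lock on the whole lock file.  Each child process contends separately,
 * since fcntl() locks are owned per process. */
static int accept_mutex_on(int lock_fd)
{
    struct flock lk;
    int rc;

    memset(&lk, 0, sizeof lk);
    lk.l_type = F_WRLCK;     /* exclusive (write) lock */
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;            /* 0 = from l_start through end of file */
    do {
        rc = fcntl(lock_fd, F_SETLKW, &lk);  /* waits while another process holds it */
    } while (rc < 0 && errno == EINTR);      /* retry if a signal interrupted the wait */
    return rc;
}

/* Release the accept mutex so a waiting child can take it. */
static int accept_mutex_off(int lock_fd)
{
    struct flock lk;

    memset(&lk, 0, sizeof lk);
    lk.l_type = F_UNLCK;
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;
    return fcntl(lock_fd, F_SETLK, &lk);     /* unlocking never needs to wait */
}
```

A child's loop would be roughly: accept_mutex_on(), accept(), accept_mutex_off(), then serve the connection.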

> Based on looking at daedalus over the weekend, I saw the run-queue
> spike up to 50 

I don't get excited about 50.  When I put up code newer than Nov 15, the spikes
go up to the 250-350 range (i.e., _all_ httpd processes plus whatever else wants
to run).  After several minutes of running, the load averages start climbing
into the double digits.  If that continues long enough, Brian Behlendorf gets
paged.  He doesn't get paged when 2_0_28 is running.
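To see how short run-queue spikes turn into double-digit load averages, here is a simplified floating-point model of a 1-minute load average as an exponentially damped moving average.  The 5-second sampling interval and the names `load_update()` and `load_after()` are assumptions for illustration; real kernels typically use fixed-point arithmetic and their own sampling details.

```c
/* exp(-5/60): per-sample decay for a 1-minute average sampled every
 * 5 seconds (assumed interval). */
#define DECAY_1MIN 0.9200444146293232

/* One sample: keep DECAY_1MIN of the old average and blend in
 * (1 - DECAY_1MIN) of the current run-queue length. */
static double load_update(double load, int runnable)
{
    return load * DECAY_1MIN + (double)runnable * (1.0 - DECAY_1MIN);
}

/* Feed n run-queue samples, starting from 'start', and return the
 * resulting 1-minute average. */
static double load_after(const int *samples, int n, double start)
{
    double load = start;
    for (int i = 0; i < n; i++)
        load = load_update(load, samples[i]);
    return load;
}
```

In this model a single 5-second sample with 300 runnable processes, starting from an idle machine, lifts the 1-minute average to about 24, so repeated spikes quickly push it into the double digits.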

Greg
