Justin Erenkrantz wrote:
>
> On Wed, Jan 16, 2002 at 08:54:34PM -0500, Greg Ames wrote:
> > Anyway, there's about 20 seconds worth of ktrace output at
> > http://www.apache.org/~gregames/ktrace. We might have some kind of thundering
> > herd problem - I see a whole bunch of unproductive context switches about the
> > time a select pops.
>
> You mentioned that daedalus doesn't use any AcceptMutex. I tried to
> find your config files,

not hard to find - /usr/local/apache/conf/httpd.conf for the current
production server

> but couldn't find any reference to
> AcceptMutex (in /usr/local/apache*). Could we try httpd with some
> locking mechanism? (I'm guessing you use modified code to bypass
> AcceptMutex.)

as Jeff pointed out, FreeBSD's accept() is supposed to be free of thundering
herd, so Single Listen Unserialized Accept kicks in with the production config
which only listens on port 80. I wouldn't jump to the conclusion that accept()
is causing it before looking at the data.

As a matter of fact, I typically listen on two ports while testing to disable
S_L_U_A, so I can easily figure out which process will get the next connection
in case I want to gdb it.

While trying out ktrace on my test config, I saw that the fcntl() accept mutex
has got a thundering herd problem on daedalus. After releasing the fcntl mutex,
you see the kernel context switching to all of the idle httpd processes. The
first process that wakes up gets the mutex, the rest of the context switches
simply burn a little CPU, then block again.

Moral: the default cure looks as bad as the disease.

> Based on looking at daedalus over the weekend, I saw the run-queue
> spike up to 50

I don't get excited about 50. When I put up code beyond Nov 15, the spikes go
up to the 250 - 350 range (i.e. _all_ httpd processes plus whatever else wants
to run). After several minutes of running, the load averages start climbing
into the double digits. If that continues long enough, Brian Behlendorf gets
paged. He doesn't get paged when 2_0_28 is running.

Greg
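
For anyone who hasn't looked at an fcntl() accept mutex, here is a rough sketch in C of the serialization pattern discussed above. It is not httpd's actual code: the names accept_mutex_op, accept_mutex_on, accept_mutex_off and serialized_accept are made up for illustration, and the real accept loop (with select() across several listeners) is more involved. Every idle child blocks in F_SETLKW on the same lock file, and the release is the point where the ktrace showed the kernel waking all of them.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: take or drop a whole-file lock on lock_fd.
 * type is F_WRLCK to acquire, F_UNLCK to release. */
static int accept_mutex_op(int lock_fd, short type)
{
    struct flock fl;
    int rc;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "to end of file" */

    /* F_SETLKW sleeps until the lock is granted; retry if a signal
     * interrupts the wait. */
    do {
        rc = fcntl(lock_fd, F_SETLKW, &fl);
    } while (rc == -1 && errno == EINTR);

    return rc;
}

static int accept_mutex_on(int lock_fd)
{
    return accept_mutex_op(lock_fd, F_WRLCK);
}

static int accept_mutex_off(int lock_fd)
{
    return accept_mutex_op(lock_fd, F_UNLCK);
}

/* Simplified child step: only the lock holder calls accept().
 * Returns the connected socket, or -1 on error. */
static int serialized_accept(int lock_fd, int listen_fd)
{
    int conn;

    if (accept_mutex_on(lock_fd) == -1)
        return -1;

    conn = accept(listen_fd, NULL, NULL);

    /* Releasing here is what lets the other children sleeping in
     * F_SETLKW become runnable; only one of them will get the lock. */
    accept_mutex_off(lock_fd);

    return conn;
}
```

Each child would open the same lock file once at startup and then call serialized_accept() in its loop. With a single listener on a platform whose accept() doesn't stampede, the lock can be skipped altogether, which is the Single Listen Unserialized Accept case.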
