DO NOT REPLY [Bug 21322] - main worker process locks up and no longer accepts requests

bugzilla 12 Jul 2003 21:52:01 -0000

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=21322>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.


http://nagoya.apache.org/bugzilla/show_bug.cgi?id=21322

main worker process locks up and no longer accepts requests

------- Additional Comments From [EMAIL PROTECTED]  2003-07-12 21:54 -------
>What is making us seriously consider backing out 
>of Apache 2 are the problems with the accept mutex 
>semaphore which cause Apache 2 to stop accepting 
>requests.

A couple of things to try first:

1) Switch mutex types (the AcceptMutex directive).

2) Switch the MPM to prefork.

Maybe I didn't read the doc carefully enough and I missed something, but here
goes anyway:

The only way to know whether there is a deadlock due to the accept mutex is to
look at the listener thread in *every* Apache child process and check whether it
is in a call to apr_proc_mutex_lock().  If all but one are stuck in
apr_proc_mutex_lock(), that is normal behavior.  If all listener threads are
stuck in apr_proc_mutex_lock(), then you have the suspected deadlock.

If the listener thread is in pthread_cond_wait(), all workers in that process
are busy, but the listener doesn't hold the mutex in that state, and there
should be another child with idle workers (unless you've hit MaxClients and the
parent is unable to create another child due to your configuration).

There is a script at http://www.apache.org/~trawick/lsap that makes checking out
all listener threads pretty easy; it displays something like this:

parent: 17943
child: 17946    listener thread not waiting on mutex
child: 17944    cgid daemon
child: 17948    listener thread waiting on mutex
child: 17945    listener thread waiting on mutex
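
The rule described above (all but one listener waiting is normal, all listeners
waiting is the suspected deadlock) can be applied mechanically to output in this
format.  What follows is only a rough sketch in Python, not part of lsap itself:
the function name classify_listeners and the returned strings are made up for
illustration, and the line format is assumed from the sample shown here rather
than from any documented lsap behavior.

```python
# Hypothetical helper; assumes status lines look exactly like the sample above.
WAITING = "listener thread waiting on mutex"
NOT_WAITING = "listener thread not waiting on mutex"


def classify_listeners(report):
    """Return a rough verdict for lsap-style output."""
    waiting = 0
    not_waiting = 0
    for line in report.splitlines():
        # Expected shape: "child: <pid>    <status text>"
        fields = line.split(None, 2)
        if len(fields) < 3 or fields[0] != "child:":
            continue  # skips the "parent:" line and blank lines
        status = fields[2].strip()
        if status == WAITING:
            waiting += 1
        elif status == NOT_WAITING:
            not_waiting += 1
        # Anything else (for example "cgid daemon") has no listener thread.
    if waiting + not_waiting == 0:
        return "no listener threads found"
    if not_waiting == 0:
        # Every listener is blocked on the accept mutex.
        return "suspected deadlock"
    return "normal"


SAMPLE = """\
parent: 17943
child: 17946    listener thread not waiting on mutex
child: 17944    cgid daemon
child: 17948    listener thread waiting on mutex
child: 17945    listener thread waiting on mutex
"""

print(classify_listeners(SAMPLE))
```

A single snapshot reported as "suspected deadlock" is only a hint; it would make
sense to repeat the check a few times before drawing conclusions.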

I note that you're currently using proc pthread mutexes, which are supposed to
be robust on Solaris in that APR implements the protocol for recovering the
mutex after a process dies while holding it.  But in case that isn't working
perfectly for you, it is worth trying a different mutex type.  I've seen a
scenario on Solaris 8 where, if you send SIGSEGV to all worker child processes,
the pthread accept mutex isn't recovered properly and you get a deadlock.  When
I test with a child crashing once every several seconds, however, the special
Solaris recovery logic works consistently.
