http://nagoya.apache.org/bugzilla/show_bug.cgi?id=21322

main worker process locks up and no longer accepts requests

------- Additional Comments From [EMAIL PROTECTED] 2003-07-12 21:54 -------

>What is making us seriously consider backing out
>of Apache 2 are the problems with the accept mutex
>semaphore which cause Apache 2 to stop accepting
>requests.

A couple of things to try first:

1) switch mutex types
2) switch MPM to prefork

Maybe I didn't read the doc carefully enough and I missed something, but here goes anyway:

The only way to know whether there is a deadlock due to the accept mutex is to look at the listener thread in *every* Apache child process and verify whether each one is in a call to apr_proc_mutex_lock(). If all but one are stuck in apr_proc_mutex_lock(), that is normal behavior. If all listener threads are stuck in apr_proc_mutex_lock(), then you have the suspected deadlock.

If the listener thread is in pthread_cond_wait(), all workers in that process are busy. We don't hold the mutex in that condition, and there should be another child with idle worker threads (unless you've hit MaxClients and the parent is unable to create another child due to your configuration).

There is a script at http://www.apache.org/~trawick/lsap that makes checking all listener threads pretty easy; it displays something like this:

    parent: 17943
    child: 17946 listener thread not waiting on mutex
    child: 17944 cgid daemon
    child: 17948 listener thread waiting on mutex
    child: 17945 listener thread waiting on mutex

I note that you're currently using proc pthread mutexes, which are supposed to be robust on Solaris, in that APR implements the protocol for recovering the mutex after a process dies while holding it. But in case that isn't working perfectly for you, it is worth trying a different mutex type.
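As a rough illustration of the deadlock rule above, the following C sketch classifies a snapshot of listener-thread states, one entry per worker child (the parent and the cgid daemon excluded). The enum names and the diagnose() function are hypothetical, not part of httpd or APR, and collecting the states (for example with lsap or a pstack dump) is assumed to have been done already.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for where a child's listener thread was found. */
typedef enum {
    LISTENER_ON_MUTEX,   /* blocked in apr_proc_mutex_lock()                 */
    LISTENER_ON_CONDVAR, /* blocked in pthread_cond_wait(): all workers busy */
    LISTENER_OTHER       /* anything else, e.g. holding the mutex in accept() */
} listener_state;

typedef enum {
    DIAG_NO_MUTEX_DEADLOCK,        /* at least one listener is outside the mutex wait */
    DIAG_SUSPECTED_MUTEX_DEADLOCK, /* every listener waits on the accept mutex        */
    DIAG_ALL_WORKERS_BUSY          /* every child is saturated; check MaxClients      */
} diagnosis;

/* Apply the rule from the comment: all-but-one waiting on the mutex is
 * normal, all waiting on the mutex is the suspected deadlock. */
diagnosis diagnose(const listener_state *states, size_t n_children)
{
    size_t on_mutex = 0, on_condvar = 0;

    assert(states != NULL && n_children > 0);
    for (size_t i = 0; i < n_children; i++) {
        if (states[i] == LISTENER_ON_MUTEX)
            on_mutex++;
        else if (states[i] == LISTENER_ON_CONDVAR)
            on_condvar++;
    }

    if (on_mutex == n_children)
        return DIAG_SUSPECTED_MUTEX_DEADLOCK;
    if (on_condvar == n_children)
        return DIAG_ALL_WORKERS_BUSY;
    return DIAG_NO_MUTEX_DEADLOCK;
}
```

For the lsap output in the comment, treating child 17946 as LISTENER_OTHER and dropping the cgid daemon gives {LISTENER_OTHER, LISTENER_ON_MUTEX, LISTENER_ON_MUTEX}, which this sketch reports as DIAG_NO_MUTEX_DEADLOCK.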
I've seen a scenario on Solaris 8 where, if you send SIGSEGV to all of the worker child processes, the pthread accept mutex isn't recovered properly and you get a deadlock. However, when I test it with a single child crashing once every several seconds, the special Solaris recovery logic works consistently.
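The recovery protocol mentioned above can be sketched with the robust-mutex interface standardized in POSIX.1-2008 (pthread_mutexattr_setrobust(), EOWNERDEAD, pthread_mutex_consistent()). This is an illustration under assumptions, not APR's code: older Solaris releases exposed similar functionality through `_np`-suffixed calls, and the helper names below are hypothetical. A forked child locks the shared mutex and exits without unlocking, the way a crashing worker would; the next locker gets EOWNERDEAD and marks the mutex consistent. It only models one child dying, not the all-children SIGSEGV case above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a process-shared, robust mutex in anonymous shared
 * memory, so fork()ed children see the same mutex. NULL on failure. */
pthread_mutex_t *create_shared_robust_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t *m;
    int rc;

    m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;
    if (pthread_mutexattr_init(&attr) != 0) {
        munmap(m, sizeof *m);
        return NULL;
    }
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(m, sizeof *m);
        return NULL;
    }
    return m;
}

/* Hypothetical helper: lock, recovering if the previous owner died while
 * holding the mutex. Returns 0 on a normal lock, 1 after a recovery,
 * -1 on error. */
int lock_with_recovery(pthread_mutex_t *m)
{
    int rc;

    assert(m != NULL);
    rc = pthread_mutex_lock(m);
    if (rc == 0)
        return 0;
    if (rc == EOWNERDEAD) {
        /* We now own the mutex; repair any protected state here, then
         * mark it consistent so later lockers can use it normally. */
        return pthread_mutex_consistent(m) == 0 ? 1 : -1;
    }
    return -1;
}

/* Hypothetical helper: a child that "crashes" while holding the mutex. */
int simulate_owner_death(pthread_mutex_t *m)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(pthread_mutex_lock(m) == 0 ? 0 : 1); /* exits still holding it */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

On older glibc this needs `-pthread` when compiling; without the pthread_mutex_consistent() call, unlocking after EOWNERDEAD would leave the mutex permanently unusable.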
