http://nagoya.apache.org/bugzilla/show_bug.cgi?id=21322

main worker process locks up and no longer accepts requests

------- Additional Comments From [EMAIL PROTECTED] 2003-07-12 13:41 -------

I finally had a server fail in such a way that it was still running but not accepting requests. I used ps to check which httpd processes were running. There were three: one running as root, one child for the cgid daemon started at the same time as the root process, and one child for handling requests.

When reviewing the logs I found that an httpd child process had core dumped just before Apache 2 stopped accepting HTTP requests. Here is the backtrace for the thread in the child process that caused the core dump. Note that this is Apache 2.0.46 with two patches applied to mod_cgid.c: the cgid daemon restart patch I submitted a month ago, and the patch to prevent a double close on a socket.

[EMAIL PROTECTED] ([EMAIL PROTECTED]) terminated by signal ILL (illegal opcode)
Current function is send_parsed_content
 3152   apr_bucket_delete(tmp_bkt);
(/opt/SUNWspro/bin/dbx) where
current thread: [EMAIL PROTECTED]
  [1] 0x20f7d8(0xff317ad8, 0xfdc05860, 0x32ca50, 0x2fb558, 0x20fd10, 0xfdc05784), at 0x20f7d7
=>[2] send_parsed_content(bb = 0xfdc05860, r = 0x32ca50, f = 0x2fb558), line 3152 in "mod_include.c"
  [3] includes_filter(f = 0x2fb558, b = 0x2fc810), line 3430 in "mod_include.c"
  [4] ap_pass_brigade(next = 0x2fb558, bb = 0x2fb258), line 550 in "util_filter.c"
  [5] cgid_handler(r = 0x32ca50), line 1484 in "mod_cgid.c"
  [6] ap_run_handler(0x32ca50, 0x0, 0x32ca50, 0x2f89a8, 0x0, 0x0), at 0x7fa98
  [7] ap_invoke_handler(r = 0x32ca50), line 401 in "config.c"
  [8] ap_process_request(r = 0x32ca50), line 288 in "http_request.c"
  [9] ap_process_http_connection(c = 0x2f89a8), line 293 in "http_core.c"
  [10] ap_run_process_connection(0x2f89a8, 0x2f88e8, 0x2f88e8, 0x81, 0x2f89a0, 0x20f7f0), at 0x95b60
  [11] ap_process_connection(c = 0x2f89a8, csd = 0x2f88e8), line 211 in "connection.c"
  [12] process_socket(p = 0x2f88b0, sock = 0x2f88e8, my_child_num = 2, my_thread_num = 1, bucket_alloc = 0x20f7f0), line 632 in "worker.c"
  [13] worker_thread(thd = 0x1aa400, dummy = 0x1ea550), line 947 in "worker.c"
  [14] dummy_worker(opaque = 0x1aa400), line 127 in "thread.c"

Here is a summary of the threads for the child process which core dumped:

   [EMAIL PROTECTED] a [EMAIL PROTECTED] ?() LWP suspended in _read()
   [EMAIL PROTECTED] b [EMAIL PROTECTED] ?() LWP suspended in __signotifywait()
   [EMAIL PROTECTED] ?() sleep on (unknown) in _reap_wait()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
o> [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() signal SIGILL in ?()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() LWP suspended in _poll()

Here is the backtrace for its listener_thread:

Current function is listener_thread
  762   ret = apr_poll(pollset, num_listensocks, &n, -1);
[EMAIL PROTECTED] ([EMAIL PROTECTED]) stopped in _poll at 0xfef15f54
0xfef15f54: _poll+0x0008: bgeu _poll+0x30
(/opt/SUNWspro/bin/dbx) where
current thread: [EMAIL PROTECTED]
  [1] _poll(0x4, 0x2, 0xffffffff, 0x0, 0x0, 0x0), at 0xfef15f54
  [2] _ti_poll(0x20b818, 0x2, 0xfb807cd8, 0xffffffff, 0xffffffff, 0x0), at 0xfee4b438
=>[3] listener_thread(thd = 0x1aa700, dummy = 0x1edbd0), line 762 in "worker.c"
  [4] dummy_worker(opaque = 0x1aa700), line 127 in "thread.c"

Here is a list of threads for the one remaining child httpd process for handling HTTP requests:

   [EMAIL PROTECTED] a [EMAIL PROTECTED] ?() running in _read()
   [EMAIL PROTECTED] b [EMAIL PROTECTED] ?() running in __signotifywait()
   [EMAIL PROTECTED] ?() sleep on (unknown) in _reap_wait()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in __lwp_sema_wait()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] dummy_worker() sleep on 0x1aa180 in _cond_wait_cancel()
   [EMAIL PROTECTED] a [EMAIL PROTECTED] dummy_worker() running in ___lwp_mutex_lock()

Thread 30 looks like it is the worker listener_thread; here is its backtrace:

Current function is proc_mutex_proc_pthread_acquire
  412   if ((rv = pthread_mutex_lock(mutex->pthread_interproc))) {
[EMAIL PROTECTED] ([EMAIL PROTECTED]) stopped in ___lwp_mutex_lock at 0xfef17c70
0xfef17c70: ___lwp_mutex_lock+0x0008: ta 0x8
(/opt/SUNWspro/bin/dbx) where
current thread: [EMAIL PROTECTED]
  [1] ___lwp_mutex_lock(0xfed60000, 0x4d58, 0x20254, 0x0, 0x0, 0x0), at 0xfef17c70
  [2] _mutex_lwp_lock(0xfed60000, 0x1, 0x10000, 0xfee5ca0c, 0x10000, 0xfee69474), at 0xfee3c7f0
  [3] _pthread_mutex_lock(0xfed60000, 0xfee5ca0c, 0x329280, 0x0, 0x0, 0x0), at 0xfee3c914
=>[4] proc_mutex_proc_pthread_acquire(mutex = 0x1a88e0), line 412 in "proc_mutex.c"
  [5] apr_proc_mutex_lock(mutex = 0x1a88e0), line 857 in "proc_mutex.c"
  [6] listener_thread(thd = 0x1aa700, dummy = 0x1edbd0), line 736 in "worker.c"
  [7] dummy_worker(opaque = 0x1aa700), line 127 in "thread.c"

The 25 threads used for handling requests are in one of two states:

Current function is apr_thread_cond_wait
  118   rv = pthread_cond_wait(cond->cond, &mutex->mutex);
[EMAIL PROTECTED] ([EMAIL PROTECTED]) stopped in _cond_wait_cancel at 0xfee39b64
0xfee39b64: _cond_wait_cancel+0x00bc: call _swtch
(/opt/SUNWspro/bin/dbx) where
current thread: [EMAIL PROTECTED]
  [1] _cond_wait_cancel(0x1aa180, 0x1aa140, 0x4356, 0xfee5ca0c, 0x0, 0x3c863d), at 0xfee39b64
  [2] pthread_cond_wait(0x1aa180, 0x1aa140, 0xfb9099ec, 0x58, 0x3c85e0, 0x23e248), at 0xfee39a88
=>[3] apr_thread_cond_wait(cond = 0x1aa178, mutex = 0x1aa138), line 118 in "thread_cond.c"
  [4] ap_queue_pop(queue = 0x1aa120, sd = 0xfb909ce4, p = 0xfb909cd8), line 300 in "fdqueue.c"
  [5] worker_thread(thd = 0x1aa6e0, dummy = 0x1e7420), line 915 in "worker.c"
  [6] dummy_worker(opaque = 0x1aa6e0), line 127 in "thread.c"

Current function is apr_thread_cond_wait
  118   rv = pthread_cond_wait(cond->cond, &mutex->mutex);
[EMAIL PROTECTED] ([EMAIL PROTECTED]) stopped in __lwp_sema_wait at 0xfef17d34
0xfef17d34: __lwp_sema_wait+0x0008: ta 0x8
(/opt/SUNWspro/bin/dbx) where
current thread: [EMAIL PROTECTED]
  [1] __lwp_sema_wait(0xfbb0de78, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfef17d34
  [2] _park(0xfbb0ddc8, 0xfbb0de78, 0x0, 0xb, 0xfee5d7a0, 0xfc509dc8), at 0xfee3b1c8
  [3] _swtch(0x5, 0xfee5ca0c, 0xfbb0de58, 0xfbb0de54, 0xfbb0de50, 0xfbb0de4c), at 0xfee3aebc
  [4] _cond_wait_cancel(0x1aa180, 0x1aa140, 0x4356, 0xfee5ca0c, 0x0, 0x3293cd), at 0xfee39b64
  [5] pthread_cond_wait(0x1aa180, 0x1aa140, 0xfbb0d9ec, 0x56, 0x329370, 0x23a158), at 0xfee39a88
=>[6] apr_thread_cond_wait(cond = 0x1aa178, mutex = 0x1aa138), line 118 in "thread_cond.c"
  [7] ap_queue_pop(queue = 0x1aa120, sd = 0xfbb0dce4, p = 0xfbb0dcd8), line 300 in "fdqueue.c"
  [8] worker_thread(thd = 0x1aa6a0, dummy = 0x1e43d8), line 915 in "worker.c"
  [9] dummy_worker(opaque = 0x1aa6a0), line 127 in "thread.c"

This looks like what we would expect if this child were hung: all of the worker threads are waiting for a request to handle in the queue.

Conclusions:

The listener_thread for the child process which core dumped held the lock on the accept mutex when it failed. The process mutexes are implemented using a semaphore. The remaining child process is hung trying to obtain a lock on the accept mutex.

According to the man pages for Solaris, when a process receives a signal which triggers a core dump, all the actions of exit() are executed to clean up the process. This includes closing any open semaphores. So it appears that semaphores are not getting closed when a child process fails with a core dump on the two servers where we are seeing this problem.

Does this sound like a valid conclusion? The question now is whether this is a bug in Solaris 7 for which there is a patch.

Thanks,
Glenn
