http://nagoya.apache.org/bugzilla/show_bug.cgi?id=21322

main worker process locks up and no longer accepts requests





------- Additional Comments From [EMAIL PROTECTED]  2003-07-12 13:41 -------
I finally had a server fail in a way that it was still running but not
accepting requests.

I checked to see which httpd processes were running using ps.  There were
three httpd processes.  One running as root, one child started at the
same time as the root process for the cgid daemon, and one child for
handling requests.

When reviewing the logs I found that an httpd child process had core dumped
just prior to when apache 2 stopped accepting HTTP requests.
  
Here is the backtrace for the thread in the child process which caused it
to core dump.  Note that this is apache 2.0.46 with two patches applied
to mod_cgid.c: the cgid daemon restart patch I submitted a month ago
and the patch to prevent a double close on a socket.
  
[EMAIL PROTECTED] ([EMAIL PROTECTED]) terminated by signal ILL (illegal opcode)
Current function is send_parsed_content
 3152                       apr_bucket_delete(tmp_bkt);
(/opt/SUNWspro/bin/dbx) where
current thread: [EMAIL PROTECTED]
  [1] 0x20f7d8(0xff317ad8, 0xfdc05860, 0x32ca50, 0x2fb558, 0x20fd10, 0xfdc05784), at 0x20f7d7
=>[2] send_parsed_content(bb = 0xfdc05860, r = 0x32ca50, f = 0x2fb558), line 3152 in "mod_include.c"
  [3] includes_filter(f = 0x2fb558, b = 0x2fc810), line 3430 in "mod_include.c"
  [4] ap_pass_brigade(next = 0x2fb558, bb = 0x2fb258), line 550 in "util_filter.c"
  [5] cgid_handler(r = 0x32ca50), line 1484 in "mod_cgid.c"
  [6] ap_run_handler(0x32ca50, 0x0, 0x32ca50, 0x2f89a8, 0x0, 0x0), at 0x7fa98
  [7] ap_invoke_handler(r = 0x32ca50), line 401 in "config.c"
  [8] ap_process_request(r = 0x32ca50), line 288 in "http_request.c"
  [9] ap_process_http_connection(c = 0x2f89a8), line 293 in "http_core.c"
  [10] ap_run_process_connection(0x2f89a8, 0x2f88e8, 0x2f88e8, 0x81, 0x2f89a0, 0x20f7f0), at 0x95b60
  [11] ap_process_connection(c = 0x2f89a8, csd = 0x2f88e8), line 211 in "connection.c"
  [12] process_socket(p = 0x2f88b0, sock = 0x2f88e8, my_child_num = 2, my_thread_num = 1, bucket_alloc = 0x20f7f0), line 632 in "worker.c"
  [13] worker_thread(thd = 0x1aa400, dummy = 0x1ea550), line 947 in "worker.c"
  [14] dummy_worker(opaque = 0x1aa400), line 127 in "thread.c"


Here is a summary of the threads for the child process which core dumped:

      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  ?()   LWP suspended   in _read()
      [EMAIL PROTECTED]  b [EMAIL PROTECTED]  ?()   LWP suspended   in __signotifywait()
      [EMAIL PROTECTED]         ?()   sleep on (unknown)      in _reap_wait()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
o>    [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        signal SIGILL   in ?()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        LWP suspended   in _poll()

Here is the backtrace for its listener_thread:

Current function is listener_thread
  762                   ret = apr_poll(pollset, num_listensocks, &n, -1);
[EMAIL PROTECTED] ([EMAIL PROTECTED]) stopped in _poll at 0xfef15f54
0xfef15f54: _poll+0x0008:       bgeu    _poll+0x30
(/opt/SUNWspro/bin/dbx) where
current thread: [EMAIL PROTECTED]
  [1] _poll(0x4, 0x2, 0xffffffff, 0x0, 0x0, 0x0), at 0xfef15f54
  [2] _ti_poll(0x20b818, 0x2, 0xfb807cd8, 0xffffffff, 0xffffffff, 0x0), at 0xfee4b438
=>[3] listener_thread(thd = 0x1aa700, dummy = 0x1edbd0), line 762 in "worker.c"
  [4] dummy_worker(opaque = 0x1aa700), line 127 in "thread.c"

Here is a list of threads for the one remaining child httpd process for
handling HTTP requests:

      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  ?()   running                 in _read()
      [EMAIL PROTECTED]  b [EMAIL PROTECTED]  ?()   running                 in __signotifywait()
      [EMAIL PROTECTED]         ?()   sleep on (unknown)      in _reap_wait()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        sleep on 0x1aa180       in __lwp_sema_wait()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]         dummy_worker()        sleep on 0x1aa180       in _cond_wait_cancel()
      [EMAIL PROTECTED]  a [EMAIL PROTECTED]  dummy_worker()        running                 in ___lwp_mutex_lock()

Thread 30 appears to be the worker's listener_thread.  Here is its backtrace:

Current function is proc_mutex_proc_pthread_acquire
  412       if ((rv = pthread_mutex_lock(mutex->pthread_interproc))) {
[EMAIL PROTECTED] ([EMAIL PROTECTED]) stopped in ___lwp_mutex_lock at 0xfef17c70
0xfef17c70: ___lwp_mutex_lock+0x0008:   ta      0x8
(/opt/SUNWspro/bin/dbx) where
current thread: [EMAIL PROTECTED]
  [1] ___lwp_mutex_lock(0xfed60000, 0x4d58, 0x20254, 0x0, 0x0, 0x0), at 0xfef17c70
  [2] _mutex_lwp_lock(0xfed60000, 0x1, 0x10000, 0xfee5ca0c, 0x10000, 0xfee69474), at 0xfee3c7f0
  [3] _pthread_mutex_lock(0xfed60000, 0xfee5ca0c, 0x329280, 0x0, 0x0, 0x0), at 0xfee3c914
=>[4] proc_mutex_proc_pthread_acquire(mutex = 0x1a88e0), line 412 in "proc_mutex.c"
  [5] apr_proc_mutex_lock(mutex = 0x1a88e0), line 857 in "proc_mutex.c"
  [6] listener_thread(thd = 0x1aa700, dummy = 0x1edbd0), line 736 in "worker.c"
  [7] dummy_worker(opaque = 0x1aa700), line 127 in "thread.c"
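
The two listener backtraces show the accept mutex being taken at line 736 of
worker.c and the poll happening later at line 762, which suggests a listener
holds the mutex while it waits in poll.  A minimal sketch of that pattern
follows.  It is not the worker.c code: accept_one is a hypothetical name, and
it uses a plain pthread mutex with poll() and accept() in place of the APR
calls.

```c
#include <poll.h>
#include <pthread.h>
#include <sys/socket.h>

/*
 * Hypothetical sketch of one listener iteration: take the accept mutex,
 * wait for a connection, accept it, then release the mutex.  Returns the
 * accepted socket descriptor, or -1 on failure.
 */
int accept_one(pthread_mutex_t *accept_mutex, int listen_fd)
{
    struct pollfd pfd;
    int sd = -1;

    pfd.fd = listen_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Only one listener at a time may wait for a connection. */
    if (pthread_mutex_lock(accept_mutex) != 0)
        return -1;

    /* A listener that dies here still holds the accept mutex. */
    if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN))
        sd = accept(listen_fd, NULL, NULL);

    pthread_mutex_unlock(accept_mutex);
    return sd;
}
```

With this ordering, a process that is killed while blocked in poll never
reaches the unlock, so every other listener stays blocked in the lock call.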

The 25 threads used for handling requests are in one of two states:

Current function is apr_thread_cond_wait
  118       rv = pthread_cond_wait(cond->cond, &mutex->mutex);
[EMAIL PROTECTED] ([EMAIL PROTECTED]) stopped in _cond_wait_cancel at 0xfee39b64
0xfee39b64: _cond_wait_cancel+0x00bc:   call    _swtch
(/opt/SUNWspro/bin/dbx) where
current thread: [EMAIL PROTECTED]
  [1] _cond_wait_cancel(0x1aa180, 0x1aa140, 0x4356, 0xfee5ca0c, 0x0, 0x3c863d), at 0xfee39b64
  [2] pthread_cond_wait(0x1aa180, 0x1aa140, 0xfb9099ec, 0x58, 0x3c85e0, 0x23e248), at 0xfee39a88
=>[3] apr_thread_cond_wait(cond = 0x1aa178, mutex = 0x1aa138), line 118 in "thread_cond.c"
  [4] ap_queue_pop(queue = 0x1aa120, sd = 0xfb909ce4, p = 0xfb909cd8), line 300 in "fdqueue.c"
  [5] worker_thread(thd = 0x1aa6e0, dummy = 0x1e7420), line 915 in "worker.c"
  [6] dummy_worker(opaque = 0x1aa6e0), line 127 in "thread.c"

Current function is apr_thread_cond_wait
  118       rv = pthread_cond_wait(cond->cond, &mutex->mutex);
[EMAIL PROTECTED] ([EMAIL PROTECTED]) stopped in __lwp_sema_wait at 0xfef17d34
0xfef17d34: __lwp_sema_wait+0x0008:     ta      0x8
(/opt/SUNWspro/bin/dbx) where
current thread: [EMAIL PROTECTED]
  [1] __lwp_sema_wait(0xfbb0de78, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfef17d34
  [2] _park(0xfbb0ddc8, 0xfbb0de78, 0x0, 0xb, 0xfee5d7a0, 0xfc509dc8), at 0xfee3b1c8
  [3] _swtch(0x5, 0xfee5ca0c, 0xfbb0de58, 0xfbb0de54, 0xfbb0de50, 0xfbb0de4c), at 0xfee3aebc
  [4] _cond_wait_cancel(0x1aa180, 0x1aa140, 0x4356, 0xfee5ca0c, 0x0, 0x3293cd), at 0xfee39b64
  [5] pthread_cond_wait(0x1aa180, 0x1aa140, 0xfbb0d9ec, 0x56, 0x329370, 0x23a158), at 0xfee39a88
=>[6] apr_thread_cond_wait(cond = 0x1aa178, mutex = 0x1aa138), line 118 in "thread_cond.c"
  [7] ap_queue_pop(queue = 0x1aa120, sd = 0xfbb0dce4, p = 0xfbb0dcd8), line 300 in "fdqueue.c"
  [8] worker_thread(thd = 0x1aa6a0, dummy = 0x1e43d8), line 915 in "worker.c"
  [9] dummy_worker(opaque = 0x1aa6a0), line 127 in "thread.c"

This is what we would expect if this child were hung: the worker threads are
all waiting for a request to handle to appear in the queue.
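
To illustrate why idle workers sit in pthread_cond_wait, here is a minimal
sketch of a blocking queue pop.  It is not the fdqueue.c implementation: the
sketch_queue type and its functions are hypothetical, and the queue holds
plain integers standing in for socket descriptors.

```c
#include <pthread.h>

#define SKETCH_QUEUE_CAP 16

/* Hypothetical queue type; not the structure used by fdqueue.c. */
typedef struct {
    int items[SKETCH_QUEUE_CAP];
    int count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
} sketch_queue;

void sketch_queue_init(sketch_queue *q)
{
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Called by the listener: add a descriptor and wake one waiting worker. */
int sketch_queue_push(sketch_queue *q, int sd)
{
    pthread_mutex_lock(&q->lock);
    if (q->count == SKETCH_QUEUE_CAP) {
        pthread_mutex_unlock(&q->lock);
        return -1;
    }
    q->items[q->count++] = sd;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Called by a worker: block until the listener has pushed something. */
int sketch_queue_pop(sketch_queue *q)
{
    int sd;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    sd = q->items[--q->count];
    pthread_mutex_unlock(&q->lock);
    return sd;
}
```

If the listener never pushes, every worker stays parked in the wait loop
inside sketch_queue_pop, which matches the state seen in the thread list.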

Conclusions:

The listener_thread of the child process that core dumped held a lock on the
accept mutex when it failed. The process mutexes are implemented using a
semaphore. The remaining child process is hung trying to obtain a lock on the
accept mutex. According to the Solaris man pages, when a process receives a
signal that triggers a core dump, all the actions of exit() are executed to
clean up the process. This includes closing any open semaphores. So it appears
that semaphores are not getting closed when a child process fails with a core
dump on the two servers where we are seeing this problem.
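
The backtrace of the hung listener shows it blocked in pthread_mutex_lock on
mutex->pthread_interproc, so the failure mode can be sketched with a
process-shared pthread mutex.  This is only an illustration, assuming a
current POSIX system (for example Linux) that provides robust mutexes; it
says nothing about what Solaris 7 offers.  The helper name
trylock_after_owner_death is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of httpd or APR).  A child process locks a
 * process-shared mutex and exits without unlocking it.  The parent then
 * returns whatever pthread_mutex_trylock() reports, or -1 on setup failure.
 */
int trylock_after_owner_death(int robust)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t *m;
    pid_t pid;
    int rv;

    /* The mutex must live in memory shared by both processes. */
    m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (robust)
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);

    pid = fork();
    if (pid < 0) {
        munmap(m, sizeof(*m));
        return -1;
    }
    if (pid == 0) {
        /* Child: stands in for the listener that died holding the lock. */
        pthread_mutex_lock(m);
        _exit(0);
    }
    waitpid(pid, NULL, 0);

    /* Parent: stands in for the surviving child's listener thread. */
    rv = pthread_mutex_trylock(m);
    if (rv == EOWNERDEAD) {
        /* Robust mutex: we own the lock now and must mark it usable. */
        pthread_mutex_consistent(m);
        pthread_mutex_unlock(m);
    } else if (rv == 0) {
        pthread_mutex_unlock(m);
    }

    munmap(m, sizeof(*m));
    return rv;
}
```

Without the robust attribute the trylock reports EBUSY, and a blocking
pthread_mutex_lock in its place would wait forever, as the remaining child
does here.  With the robust attribute the next locker is told that the
previous owner died (EOWNERDEAD) and can recover the mutex.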

Does this sound like a valid conclusion of the problem?

The question now is whether this is a bug in Solaris 7 and, if so, whether
there is a patch for it.

Thanks,

Glenn
