> On 07.07.2021 at 11:49, Ruediger Pluem <rpl...@apache.org> wrote:
>
>
>
> On 7/7/21 11:45 AM, Stefan Eissing wrote:
>> In my h2 test suite, I have a setup that uses proxy configs pointing at the
>> server itself. We seem to have a problem performing a clean child exit with
>> that. Tested in 2.4.48 and trunk:
>> - run tests with several graceful restarts
>> - no proxied request, clean exit
>> - with proxied requests
>> AH00045: child process 53682 still did not exit, sending a SIGTERM
>> AH00045: child process 53682 still did not exit, sending a SIGTERM
>> [often stops here, sometimes]
>> ...
>> AH00046: child process 53682 still did not exit, sending a SIGKILL
>>
>> Questions:
>> - is such a test setup doomed to fail in general?
>> - are we sure that we cannot encounter such states in "normal" setups?
>>
>> If someone wants logs at whatever LogLevel, let me know. It seems
>> highly reproducible.
>
> Do you have stack traces where these processes are hanging and a simple
> config that causes this?
I added a TRACE1 log in event.c before/after join_workers (line 2921) and see:
[Wed Jul 07 10:06:03.144044 2021] [mpm_event:trace1] [pid 72886:tid 4493635072]
event.c(2921): graceful termination received, join workers
[Wed Jul 07 10:06:03.144213 2021] [mpm_event:trace1] [pid 72886:tid
123145435672576] event.c(1799): All workers are busy or dying, will close 1
keep-alive connections
[Wed Jul 07 10:06:08.079690 2021] [mpm_event:trace1] [pid 72886:tid
123145435672576] event.c(1799): All workers are busy or dying, will close 0
keep-alive connections
[Wed Jul 07 10:06:10.787777 2021] [core:warn] [pid 72813:tid 4493635072]
AH00045: child process 72886 still did not exit, sending a SIGTERM
[Wed Jul 07 10:06:12.789335 2021] [core:warn] [pid 72813:tid 4493635072]
AH00045: child process 72886 still did not exit, sending a SIGTERM
[Wed Jul 07 10:06:14.791281 2021] [core:warn] [pid 72813:tid 4493635072]
AH00045: child process 72886 still did not exit, sending a SIGTERM
[Wed Jul 07 10:06:16.792983 2021] [core:error] [pid 72813:tid 4493635072]
AH00046: child process 72886 still did not exit, sending a SIGKILL
So, I assume the keep-alive connection is the mod_proxy_http connection to the
server itself. Since join_workers() never returns, there seems to be a
thread that does not finish.
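The reasoning above can be illustrated with a minimal sketch. It is written in
Python rather than httpd's C, uses only the standard threading module, and every
name in it (join_with_idle_worker, the "terminated" flag) is hypothetical; it is
not the mpm_event code. It shows only the general pattern: a worker parked on a
condition variable keeps a join from completing until something flags
termination and wakes it. Whether that is what happens in the child here is an
assumption, not something the logs prove.

```python
import threading


def join_with_idle_worker(first_wait=0.2):
    """Return (alive_before_wakeup, alive_after_wakeup) for one idle worker."""
    cond = threading.Condition()
    # Hypothetical flag; stands in for "the work queue has been terminated".
    state = {"terminated": False}

    def worker():
        # Idle worker: sleeps on the condition until termination is flagged.
        with cond:
            while not state["terminated"]:
                cond.wait()

    t = threading.Thread(target=worker, daemon=True)
    t.start()

    # Join without waking the worker: the join can only time out.
    t.join(timeout=first_wait)
    alive_before = t.is_alive()

    # Flag termination and wake every waiter, then join again.
    with cond:
        state["terminated"] = True
        cond.notify_all()
    t.join(timeout=5.0)
    alive_after = t.is_alive()

    return alive_before, alive_after


if __name__ == "__main__":
    print(join_with_idle_worker())
```

With an untimed join in the first step, the caller would block for as long as
nobody wakes the worker, which would look from the outside like a child that
never exits.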
On another run, I got a stack trace of the child:
Call graph:
264 Thread_36766542 DispatchQueue_1: com.apple.main-thread (serial)
+ 264 start (in libdyld.dylib) + 1 [0x7fff2032cf5d]
+ 264 main (in httpd) + 2278 [0x104729b86] main.c:862
+ 264 ap_run_mpm (in httpd) + 75 [0x10473917b] mpm_common.c:100
+ 264 event_run (in mod_mpm_event.so) + 2994 [0x10502ae62]
event.c:3398
+ 264 make_child (in mod_mpm_event.so) + 436 [0x10502bbc4]
event.c:2997
+ 264 child_main (in mod_mpm_event.so) + 1734 [0x10502c2e6]
event.c:2924
+ 264 join_workers (in mod_mpm_event.so) + 386 [0x10502cc72]
event.c:2717
+ 264 apr_thread_join (in libapr-2.0.dylib) + 44
[0x1048b347c] thread.c:256
+ 264 _pthread_join (in libsystem_pthread.dylib) + 362
[0x7fff20312f60]
+ 264 __ulock_wait (in libsystem_kernel.dylib) + 10
[0x7fff202dd9ee]
264 Thread_36766548
+ 264 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
+ 264 _pthread_start (in libsystem_pthread.dylib) + 224 [0x7fff203118fc]
+ 264 dummy_worker (in libapr-2.0.dylib) + 30 [0x1048b33ee]
thread.c:148
+ 264 slot_run (in mod_http2.so) + 189 [0x104f99a8d]
h2_workers.c:260
+ 264 _pthread_cond_wait (in libsystem_pthread.dylib) + 1298
[0x7fff20311e49]
+ 264 __psynch_cvwait (in libsystem_kernel.dylib) + 10
[0x7fff202decde]
264 Thread_36766549
+ 264 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
+ 264 _pthread_start (in libsystem_pthread.dylib) + 224 [0x7fff203118fc]
+ 264 dummy_worker (in libapr-2.0.dylib) + 30 [0x1048b33ee]
thread.c:148
+ 264 slot_run (in mod_http2.so) + 189 [0x104f99a8d]
h2_workers.c:260
+ 264 _pthread_cond_wait (in libsystem_pthread.dylib) + 1298
[0x7fff20311e49]
+ 264 __psynch_cvwait (in libsystem_kernel.dylib) + 10
[0x7fff202decde]
264 Thread_36766550
+ 264 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
+ 264 _pthread_start (in libsystem_pthread.dylib) + 224 [0x7fff203118fc]
+ 264 dummy_worker (in libapr-2.0.dylib) + 30 [0x1048b33ee]
thread.c:148
+ 264 slot_run (in mod_http2.so) + 189 [0x104f99a8d]
h2_workers.c:260
+ 264 _pthread_cond_wait (in libsystem_pthread.dylib) + 1298
[0x7fff20311e49]
+ 264 __psynch_cvwait (in libsystem_kernel.dylib) + 10
[0x7fff202decde]
264 Thread_36766551
+ 264 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
+ 264 _pthread_start (in libsystem_pthread.dylib) + 224 [0x7fff203118fc]
+ 264 dummy_worker (in libapr-2.0.dylib) + 30 [0x1048b33ee]
thread.c:148
+ 264 slot_run (in mod_http2.so) + 189 [0x104f99a8d]
h2_workers.c:260
+ 264 _pthread_cond_wait (in libsystem_pthread.dylib) + 1298
[0x7fff20311e49]
+ 264 __psynch_cvwait (in libsystem_kernel.dylib) + 10
[0x7fff202decde]
264 Thread_36766565
+ 264 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
+ 264 _pthread_start (in libsystem_pthread.dylib) + 224 [0x7fff203118fc]
+ 264 dummy_worker (in libapr-2.0.dylib) + 30 [0x1048b33ee]
thread.c:148
+ 264 worker_thread (in mod_mpm_event.so) + 380 [0x10502cf2c]
event.c:2340
+ 264 ap_queue_pop_something (in httpd) + 142 [0x10473cf3e]
mpm_fdqueue.c:447
+ 264 _pthread_cond_wait (in libsystem_pthread.dylib) + 1298
[0x7fff20311e49]
+ 264 __psynch_cvwait (in libsystem_kernel.dylib) + 10
[0x7fff202decde]
264 Thread_36766579
264 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
264 _pthread_start (in libsystem_pthread.dylib) + 224 [0x7fff203118fc]
264 dummy_worker (in libapr-2.0.dylib) + 30 [0x1048b33ee]
thread.c:148
264 listener_thread (in mod_mpm_event.so) + 1129 [0x10502e069]
event.c:1940
264 impl_pollset_poll (in libapr-2.0.dylib) + 114 [0x1048aec82]
kqueue.c:272
264 kevent (in libsystem_kernel.dylib) + 10 [0x7fff202e0c4a]
>
> Regards
>
> Rüdiger