> Am 07.07.2021 um 13:51 schrieb Yann Ylavic <[email protected]>:
>
> On Wed, Jul 7, 2021 at 12:16 PM Stefan Eissing
> <[email protected]> wrote:
>>
>> I added a TRACE1 log in event.c before/after join_workers (line 2921) and
>> see:
>>
>> [Wed Jul 07 10:06:03.144044 2021] [mpm_event:trace1] [pid 72886:tid
>> 4493635072] event.c(2921): graceful termination received, join workers
>> [Wed Jul 07 10:06:03.144213 2021] [mpm_event:trace1] [pid 72886:tid
>> 123145435672576] event.c(1799): All workers are busy or dying, will close 1
>> keep-alive connections
>> [Wed Jul 07 10:06:08.079690 2021] [mpm_event:trace1] [pid 72886:tid
>> 123145435672576] event.c(1799): All workers are busy or dying, will close 0
>> keep-alive connections
>> [Wed Jul 07 10:06:10.787777 2021] [core:warn] [pid 72813:tid 4493635072]
>> AH00045: child process 72886 still did not exit, sending a SIGTERM
>> [Wed Jul 07 10:06:12.789335 2021] [core:warn] [pid 72813:tid 4493635072]
>> AH00045: child process 72886 still did not exit, sending a SIGTERM
>> [Wed Jul 07 10:06:14.791281 2021] [core:warn] [pid 72813:tid 4493635072]
>> AH00045: child process 72886 still did not exit, sending a SIGTERM
>> [Wed Jul 07 10:06:16.792983 2021] [core:error] [pid 72813:tid 4493635072]
>> AH00046: child process 72886 still did not exit, sending a SIGKILL
>>
>> So, I assume the keep-alive connection is the mod_proxy_http connection to
>> the sever itself. Since the join_workers() never returns, there seems to be
>> a thread not finishing.
>
> No, MPM event doesn't handle mod_proxy kept alive connections, those
> are client connections.
>
>>
>> On another run, I got a stacktrace of the child:
>>
>> Call graph:
>> 264 Thread_36766542 DispatchQueue_1: com.apple.main-thread (serial)
>> + 264 start (in libdyld.dylib) + 1 [0x7fff2032cf5d]
>> + 264 main (in httpd) + 2278 [0x104729b86] main.c:862
>> + 264 ap_run_mpm (in httpd) + 75 [0x10473917b] mpm_common.c:100
>> + 264 event_run (in mod_mpm_event.so) + 2994 [0x10502ae62]
>> event.c:3398
>> + 264 make_child (in mod_mpm_event.so) + 436 [0x10502bbc4]
>> event.c:2997
>> + 264 child_main (in mod_mpm_event.so) + 1734 [0x10502c2e6]
>> event.c:2924
>> + 264 join_workers (in mod_mpm_event.so) + 386
>> [0x10502cc72] event.c:2717
>> + 264 apr_thread_join (in libapr-2.0.dylib) + 44
>> [0x1048b347c] thread.c:256
>> + 264 _pthread_join (in libsystem_pthread.dylib) + 362
>> [0x7fff20312f60]
>> + 264 __ulock_wait (in libsystem_kernel.dylib) + 10
>> [0x7fff202dd9ee]
>
> OK, that's where the child waits for worker threads.
>
>> 264 Thread_36766548
>> + 264 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
>> + 264 _pthread_start (in libsystem_pthread.dylib) + 224
>> [0x7fff203118fc]
>> + 264 dummy_worker (in libapr-2.0.dylib) + 30 [0x1048b33ee]
>> thread.c:148
>> + 264 slot_run (in mod_http2.so) + 189 [0x104f99a8d]
>> h2_workers.c:260
>> + 264 _pthread_cond_wait (in libsystem_pthread.dylib) + 1298
>> [0x7fff20311e49]
>> + 264 __psynch_cvwait (in libsystem_kernel.dylib) + 10
>> [0x7fff202decde]
>> 264 Thread_36766549
>> + 264 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
>> + 264 _pthread_start (in libsystem_pthread.dylib) + 224
>> [0x7fff203118fc]
>> + 264 dummy_worker (in libapr-2.0.dylib) + 30 [0x1048b33ee]
>> thread.c:148
>> + 264 slot_run (in mod_http2.so) + 189 [0x104f99a8d]
>> h2_workers.c:260
>> + 264 _pthread_cond_wait (in libsystem_pthread.dylib) + 1298
>> [0x7fff20311e49]
>> + 264 __psynch_cvwait (in libsystem_kernel.dylib) + 10
>> [0x7fff202decde]
>> 264 Thread_36766550
>> + 264 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
>> + 264 _pthread_start (in libsystem_pthread.dylib) + 224
>> [0x7fff203118fc]
>> + 264 dummy_worker (in libapr-2.0.dylib) + 30 [0x1048b33ee]
>> thread.c:148
>> + 264 slot_run (in mod_http2.so) + 189 [0x104f99a8d]
>> h2_workers.c:260
>> + 264 _pthread_cond_wait (in libsystem_pthread.dylib) + 1298
>> [0x7fff20311e49]
>> + 264 __psynch_cvwait (in libsystem_kernel.dylib) + 10
>> [0x7fff202decde]
>> 264 Thread_36766551
>> + 264 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
>> + 264 _pthread_start (in libsystem_pthread.dylib) + 224
>> [0x7fff203118fc]
>> + 264 dummy_worker (in libapr-2.0.dylib) + 30 [0x1048b33ee]
>> thread.c:148
>> + 264 slot_run (in mod_http2.so) + 189 [0x104f99a8d]
>> h2_workers.c:260
>> + 264 _pthread_cond_wait (in libsystem_pthread.dylib) + 1298
>> [0x7fff20311e49]
>> + 264 __psynch_cvwait (in libsystem_kernel.dylib) + 10
>> [0x7fff202decde]
>
> The above mod_http2 threads are not accounted for by the MPM I suppose
> (they should be stopped by the future child_exit hook, but still the
> MPM will only care about its own worker threads, i.e. those running
> the process_connection hook).
Yes, the h2 worker thread currently get cleaned up on the pool pre cleanup.
Those are ok to see as that has not happened yet.
>
>> 264 Thread_36766565
>> + 264 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
>> + 264 _pthread_start (in libsystem_pthread.dylib) + 224
>> [0x7fff203118fc]
>> + 264 dummy_worker (in libapr-2.0.dylib) + 30 [0x1048b33ee]
>> thread.c:148
>> + 264 worker_thread (in mod_mpm_event.so) + 380 [0x10502cf2c]
>> event.c:2340
>> + 264 ap_queue_pop_something (in httpd) + 142 [0x10473cf3e]
>> mpm_fdqueue.c:447
>> + 264 _pthread_cond_wait (in libsystem_pthread.dylib) + 1298
>> [0x7fff20311e49]
>> + 264 __psynch_cvwait (in libsystem_kernel.dylib) + 10
>> [0x7fff202decde]
>
> This worker didn't get the signal to stop from ap_queue_term()..
>
>> 264 Thread_36766579
>> 264 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
>> 264 _pthread_start (in libsystem_pthread.dylib) + 224
>> [0x7fff203118fc]
>> 264 dummy_worker (in libapr-2.0.dylib) + 30 [0x1048b33ee]
>> thread.c:148
>> 264 listener_thread (in mod_mpm_event.so) + 1129 [0x10502e069]
>> event.c:1940
>> 264 impl_pollset_poll (in libapr-2.0.dylib) + 114
>> [0x1048aec82] kqueue.c:272
>> 264 kevent (in libsystem_kernel.dylib) + 10 [0x7fff202e0c4a]
>
> .. and that's because the listener didn't exit either, hmm.
>
> Your system sets listener_is_wakeable (AH02471), right?
once at process start:
[Wed Jul 07 10:11:32.368594 2021] [mpm_event:debug] [pid 73128:tid
123145437016064] event.c(2611): AH02471: start_threads: Using kqueue (wakeable)
> Does wakeup_listener() get called for the graceful restart that hangs?
>
> Cheers;
> Yann.
Inserted some debug logging into event.c (and of course, the more I insert, the
less the likelihood of it happening):
Also, I had apr trunk in srclib and switched to branches/1.7.x - same behaviour.
End of log for a hanging process:
[Wed Jul 07 14:35:12.300919 2021] [mpm_event:debug] [pid 80258:tid 4439064064]
event.c(599): AH: wakeup_listener: start
[Wed Jul 07 14:35:12.300938 2021] [mpm_event:debug] [pid 80258:tid 4439064064]
event.c(603): AH: wakeup_listener: apr_pollset_wakeup
[Wed Jul 07 14:35:12.300952 2021] [mpm_event:debug] [pid 80258:tid 4439064064]
event.c(610): AH: wakeup_listener: ap_queue_info_term
[Wed Jul 07 14:35:12.301082 2021] [mpm_event:trace1] [pid 80258:tid
123145536962560] event.c(1809): All workers are busy or dying, will close 1
keep-alive connections
[Wed Jul 07 14:35:12.301194 2021] [mpm_event:debug] [pid 80258:tid 4439064064]
event.c(629): AH: wakeup_listener: pthread_kill
[Wed Jul 07 14:35:12.301433 2021] [mpm_event:debug] [pid 80258:tid 4439064064]
event.c(635): AH: wakeup_listener: end
[Wed Jul 07 14:35:12.301444 2021] [mpm_event:trace1] [pid 80258:tid 4439064064]
event.c(2933): graceful termination received, join workers
[Wed Jul 07 14:35:16.501276 2021] [mpm_event:trace1] [pid 80258:tid
123145536962560] event.c(1809): All workers are busy or dying, will close 0
keep-alive connections
macOS process analysis:
Call graph:
2942 Thread_36817225 DispatchQueue_1: com.apple.main-thread (serial)
+ 2942 start (in libdyld.dylib) + 1 [0x7fff2032cf5d]
+ 2942 main (in httpd) + 2278 [0x107ec3b86] main.c:862
+ 2942 ap_run_mpm (in httpd) + 75 [0x107ed317b] mpm_common.c:100
+ 2942 event_run (in mod_mpm_event.so) + 2994 [0x1087c4b72]
event.c:3410
+ 2942 make_child (in mod_mpm_event.so) + 436 [0x1087c58d4]
event.c:3009
+ 2942 child_main (in mod_mpm_event.so) + 1734 [0x1087c5ff6]
event.c:2936
+ 2942 join_workers (in mod_mpm_event.so) + 197 [0x1087c67e5]
event.c:2729
+ 2942 apr_thread_join (in libapr-2.0.dylib) + 44
[0x10804d47c] thread.c:256
+ 2942 _pthread_join (in libsystem_pthread.dylib) + 362
[0x7fff20312f60]
+ 2942 __ulock_wait (in libsystem_kernel.dylib) + 10
[0x7fff202dd9ee]
2942 Thread_36817227
+ 2942 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
+ 2942 _pthread_start (in libsystem_pthread.dylib) + 224
[0x7fff203118fc]
+ 2942 dummy_worker (in libapr-2.0.dylib) + 30 [0x10804d3ee]
thread.c:148
+ 2942 slot_run (in mod_http2.so) + 189 [0x108733a8d]
h2_workers.c:260
+ 2942 _pthread_cond_wait (in libsystem_pthread.dylib) + 1298
[0x7fff20311e49]
+ 2942 __psynch_cvwait (in libsystem_kernel.dylib) + 10
[0x7fff202decde]
2942 Thread_36817228
+ 2942 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
+ 2942 _pthread_start (in libsystem_pthread.dylib) + 224
[0x7fff203118fc]
+ 2942 dummy_worker (in libapr-2.0.dylib) + 30 [0x10804d3ee]
thread.c:148
+ 2942 slot_run (in mod_http2.so) + 189 [0x108733a8d]
h2_workers.c:260
+ 2942 _pthread_cond_wait (in libsystem_pthread.dylib) + 1298
[0x7fff20311e49]
+ 2942 __psynch_cvwait (in libsystem_kernel.dylib) + 10
[0x7fff202decde]
2942 Thread_36817229
+ 2942 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
+ 2942 _pthread_start (in libsystem_pthread.dylib) + 224
[0x7fff203118fc]
+ 2942 dummy_worker (in libapr-2.0.dylib) + 30 [0x10804d3ee]
thread.c:148
+ 2942 slot_run (in mod_http2.so) + 189 [0x108733a8d]
h2_workers.c:260
+ 2942 _pthread_cond_wait (in libsystem_pthread.dylib) + 1298
[0x7fff20311e49]
+ 2942 __psynch_cvwait (in libsystem_kernel.dylib) + 10
[0x7fff202decde]
2942 Thread_36817230
+ 2942 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
+ 2942 _pthread_start (in libsystem_pthread.dylib) + 224
[0x7fff203118fc]
+ 2942 dummy_worker (in libapr-2.0.dylib) + 30 [0x10804d3ee]
thread.c:148
+ 2942 slot_run (in mod_http2.so) + 189 [0x108733a8d]
h2_workers.c:260
+ 2942 _pthread_cond_wait (in libsystem_pthread.dylib) + 1298
[0x7fff20311e49]
+ 2942 __psynch_cvwait (in libsystem_kernel.dylib) + 10
[0x7fff202decde]
2942 Thread_36817254
+ 2942 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
+ 2942 _pthread_start (in libsystem_pthread.dylib) + 224
[0x7fff203118fc]
+ 2942 dummy_worker (in libapr-2.0.dylib) + 30 [0x10804d3ee]
thread.c:148
+ 2942 worker_thread (in mod_mpm_event.so) + 300 [0x1087c6a4c]
event.c:2350
+ 2942 ap_queue_pop_something (in httpd) + 142 [0x107ed6f3e]
mpm_fdqueue.c:447
+ 2942 _pthread_cond_wait (in libsystem_pthread.dylib) + 1298
[0x7fff20311e49]
+ 2942 __psynch_cvwait (in libsystem_kernel.dylib) + 10
[0x7fff202decde]
2942 Thread_36817257
2942 thread_start (in libsystem_pthread.dylib) + 15 [0x7fff2030d443]
2942 _pthread_start (in libsystem_pthread.dylib) + 224
[0x7fff203118fc]
2942 dummy_worker (in libapr-2.0.dylib) + 30 [0x10804d3ee]
thread.c:148
2942 listener_thread (in mod_mpm_event.so) + 1148 [0x1087c7c2c]
event.c:1950
2942 impl_pollset_poll (in libapr-2.0.dylib) + 114
[0x108048c82] kqueue.c:272
2942 kevent (in libsystem_kernel.dylib) + 10 [0x7fff202e0c4a]
Total number in stack (recursive counted multiple, when >=5):
6 _pthread_start (in libsystem_pthread.dylib) + 224
[0x7fff203118fc]
6 dummy_worker (in libapr-2.0.dylib) + 30 [0x10804d3ee]
thread.c:148
6 thread_start (in libsystem_pthread.dylib) + 15
[0x7fff2030d443]
5 __psynch_cvwait (in libsystem_kernel.dylib) + 0
[0x7fff202decd4]
5 _pthread_cond_wait (in libsystem_pthread.dylib) + 1298
[0x7fff20311e49]
Sort by top of stack, same collapsed (when >= 5):
__psynch_cvwait (in libsystem_kernel.dylib) 14710
__ulock_wait (in libsystem_kernel.dylib) 2942
kevent (in libsystem_kernel.dylib) 2942