> On 07.07.2021 at 11:49, Ruediger Pluem <rpl...@apache.org> wrote:
> 
> 
> 
> On 7/7/21 11:45 AM, Stefan Eissing wrote:
>> In my h2 test suite, I do a setup where I use proxy configs against the 
>> server itself. We seem to have a problem performing a clean child exit with 
>> that. Tested in 2.4.48 and trunk:
>> - run tests with several graceful restarts
>> - no proxied request, clean exit
>> - with proxied requests
>>  AH00045: child process 53682 still did not exit, sending a SIGTERM
>>  AH00045: child process 53682 still did not exit, sending a SIGTERM
>>  [often stops here, sometimes]
>>  ...
>>  AH00046: child process 53682 still did not exit, sending a SIGKILL
>> 
>> Question:
>> - is such a test setup doomed to fail in general?
>> - are we sure that we cannot encounter such states in "normal" setups?
>> 
>> If someone wants a log at whatever LogLevel, let me know. It seems 
>> highly reproducible.
> 
> Do you have stack traces where these processes are hanging and a simple 
> config that causes this?

I added a TRACE1 log in event.c before/after join_workers (line 2921) and see:

[Wed Jul 07 10:06:03.144044 2021] [mpm_event:trace1] [pid 72886:tid 4493635072] event.c(2921): graceful termination received, join workers
[Wed Jul 07 10:06:03.144213 2021] [mpm_event:trace1] [pid 72886:tid 123145435672576] event.c(1799): All workers are busy or dying, will close 1 keep-alive connections
[Wed Jul 07 10:06:08.079690 2021] [mpm_event:trace1] [pid 72886:tid 123145435672576] event.c(1799): All workers are busy or dying, will close 0 keep-alive connections
[Wed Jul 07 10:06:10.787777 2021] [core:warn] [pid 72813:tid 4493635072] AH00045: child process 72886 still did not exit, sending a SIGTERM
[Wed Jul 07 10:06:12.789335 2021] [core:warn] [pid 72813:tid 4493635072] AH00045: child process 72886 still did not exit, sending a SIGTERM
[Wed Jul 07 10:06:14.791281 2021] [core:warn] [pid 72813:tid 4493635072] AH00045: child process 72886 still did not exit, sending a SIGTERM
[Wed Jul 07 10:06:16.792983 2021] [core:error] [pid 72813:tid 4493635072] AH00046: child process 72886 still did not exit, sending a SIGKILL

So I assume the keep-alive connection is the mod_proxy_http connection to the 
server itself. Since join_workers() never returns, there seems to be a 
thread that does not finish.

On another run, I got a stacktrace of the child:

Call graph:
    264 Thread_36766542   DispatchQueue_1: com.apple.main-thread  (serial)
    + 264 start  (in libdyld.dylib) + 1  [0x7fff2032cf5d]
    +   264 main  (in httpd) + 2278  [0x104729b86]  main.c:862
    +     264 ap_run_mpm  (in httpd) + 75  [0x10473917b]  mpm_common.c:100
    +       264 event_run  (in mod_mpm_event.so) + 2994  [0x10502ae62]  event.c:3398
    +         264 make_child  (in mod_mpm_event.so) + 436  [0x10502bbc4]  event.c:2997
    +           264 child_main  (in mod_mpm_event.so) + 1734  [0x10502c2e6]  event.c:2924
    +             264 join_workers  (in mod_mpm_event.so) + 386  [0x10502cc72]  event.c:2717
    +               264 apr_thread_join  (in libapr-2.0.dylib) + 44  [0x1048b347c]  thread.c:256
    +                 264 _pthread_join  (in libsystem_pthread.dylib) + 362  [0x7fff20312f60]
    +                   264 __ulock_wait  (in libsystem_kernel.dylib) + 10  [0x7fff202dd9ee]
    264 Thread_36766548
    + 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
    +   264 _pthread_start  (in libsystem_pthread.dylib) + 224  [0x7fff203118fc]
    +     264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  thread.c:148
    +       264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  h2_workers.c:260
    +         264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  [0x7fff20311e49]
    +           264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff202decde]
    264 Thread_36766549
    + 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
    +   264 _pthread_start  (in libsystem_pthread.dylib) + 224  [0x7fff203118fc]
    +     264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  thread.c:148
    +       264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  h2_workers.c:260
    +         264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  [0x7fff20311e49]
    +           264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff202decde]
    264 Thread_36766550
    + 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
    +   264 _pthread_start  (in libsystem_pthread.dylib) + 224  [0x7fff203118fc]
    +     264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  thread.c:148
    +       264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  h2_workers.c:260
    +         264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  [0x7fff20311e49]
    +           264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff202decde]
    264 Thread_36766551
    + 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
    +   264 _pthread_start  (in libsystem_pthread.dylib) + 224  [0x7fff203118fc]
    +     264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  thread.c:148
    +       264 slot_run  (in mod_http2.so) + 189  [0x104f99a8d]  h2_workers.c:260
    +         264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  [0x7fff20311e49]
    +           264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff202decde]
    264 Thread_36766565
    + 264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
    +   264 _pthread_start  (in libsystem_pthread.dylib) + 224  [0x7fff203118fc]
    +     264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  thread.c:148
    +       264 worker_thread  (in mod_mpm_event.so) + 380  [0x10502cf2c]  event.c:2340
    +         264 ap_queue_pop_something  (in httpd) + 142  [0x10473cf3e]  mpm_fdqueue.c:447
    +           264 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  [0x7fff20311e49]
    +             264 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff202decde]
    264 Thread_36766579
      264 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff2030d443]
        264 _pthread_start  (in libsystem_pthread.dylib) + 224  [0x7fff203118fc]
          264 dummy_worker  (in libapr-2.0.dylib) + 30  [0x1048b33ee]  thread.c:148
            264 listener_thread  (in mod_mpm_event.so) + 1129  [0x10502e069]  event.c:1940
              264 impl_pollset_poll  (in libapr-2.0.dylib) + 114  [0x1048aec82]  kqueue.c:272
                264 kevent  (in libsystem_kernel.dylib) + 10  [0x7fff202e0c4a]
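For readers less familiar with the pattern: the trace shows the main thread blocked in pthread_join() (via apr_thread_join()) while the other threads sit in pthread_cond_wait(). A join can only return once the joined thread leaves its wait, so something has to set a stop condition and wake it. Below is a minimal, generic pthreads sketch of that shutdown handshake, assuming a simple flag-plus-condition-variable pool. struct pool, worker() and run_shutdown_demo() are hypothetical names, not code from event.c or h2_workers.c, and the sketch does not claim that a missing wakeup is the actual cause of this hang.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_WORKERS 16

/* Hypothetical pool state; not the httpd or mod_http2 structures. */
struct pool {
    pthread_mutex_t lock;
    pthread_cond_t  wakeup;
    bool            shutting_down; /* written only while holding lock */
    int             exited;        /* workers that left their wait loop */
};

/* An idle worker parks in pthread_cond_wait() until told to stop. If
 * nobody ever sets shutting_down and wakes it, this wait never returns
 * and the pthread_join() in run_shutdown_demo() blocks forever. */
static void *worker(void *arg)
{
    struct pool *p = arg;

    pthread_mutex_lock(&p->lock);
    while (!p->shutting_down)          /* loop guards against spurious wakeups */
        pthread_cond_wait(&p->wakeup, &p->lock);
    p->exited++;
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/* Start nworkers idle threads, shut them down, and join them.
 * Returns how many workers exited, or -1 if thread creation failed. */
int run_shutdown_demo(int nworkers)
{
    pthread_t tids[MAX_WORKERS];
    struct pool p = { .shutting_down = false, .exited = 0 };
    int started = 0, result;

    assert(nworkers >= 0 && nworkers <= MAX_WORKERS);
    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.wakeup, NULL);

    while (started < nworkers
           && pthread_create(&tids[started], NULL, worker, &p) == 0)
        started++;

    /* Graceful stop: publish the flag under the lock, then wake ALL
     * waiters. pthread_cond_signal() would wake at most one of them. */
    pthread_mutex_lock(&p.lock);
    p.shutting_down = true;
    pthread_cond_broadcast(&p.wakeup);
    pthread_mutex_unlock(&p.lock);

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);   /* returns because every worker exits */

    result = (started == nworkers) ? p.exited : -1;
    pthread_cond_destroy(&p.wakeup);
    pthread_mutex_destroy(&p.lock);
    return result;
}
```

Removing the broadcast (or signalling only once with several idle workers) leaves the joining thread stuck in the same shape as the sample: one thread in _pthread_join, the rest in __psynch_cvwait.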



> 
> Regards
> 
> Rüdiger
