On 4/6/24 13:10, Yann Ylavic wrote:
On Sat, Apr 6, 2024 at 10:46 AM jean-frederic clere <jfcl...@gmail.com> wrote:

On 4/5/24 07:55, Ruediger Pluem wrote:

Are you able to provide a stacktrace of the hanging process (thread apply all 
bt full)?

It seems pthread_kill(t, 0) returns 0 even though the thread t has exited...
Older versions of Fedora return 3 (ESRCH); I tried fc28.
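
For anyone who wants to reproduce the observation outside httpd, here is a minimal standalone sketch. It is not code from worker.c: the names probe_thread_done, probe_thread_main and probe_exited_thread are made up for illustration, and the 100 ms pause is only an assumed "long enough" delay for the thread to finish exiting.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical flag: set by the thread just before it returns. */
static atomic_int probe_thread_done;

static void *probe_thread_main(void *arg)
{
    (void)arg;
    atomic_store(&probe_thread_done, 1);
    return NULL;
}

/* Returns what pthread_kill(t, 0) reports for a thread that has exited
 * but has not been joined yet, or -1 if the thread could not be created. */
static int probe_exited_thread(void)
{
    pthread_t t;
    struct timespec delay = { 0, 100 * 1000 * 1000 }; /* 100 ms (assumed) */
    int rc;

    atomic_store(&probe_thread_done, 0);
    if (pthread_create(&t, NULL, probe_thread_main, NULL) != 0) {
        return -1;
    }

    /* Wait for the flag, then a little longer so the thread can finish exiting. */
    while (!atomic_load(&probe_thread_done)) {
        nanosleep(&delay, NULL);
    }
    nanosleep(&delay, NULL);

    rc = pthread_kill(t, 0);   /* signal 0: error checking only, nothing is sent */
    pthread_join(t, NULL);     /* the thread ID stays valid until this join */
    return rc;
}
```

Going by the report above, the return value would be 0 on the newer system and ESRCH (3) on the older one, so the result cannot be used on its own to decide whether the thread is still running.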

If pthread_kill() does not work we probably should use the global
"dying" variable like in mpm_event.
But it's not clear from your earlier "bt full" whether there are other
threads, could you try "thread apply all bt full" instead to show all
the threads?
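
To make the flag idea concrete, here is a rough standalone sketch, assuming the exiting thread can set a shared flag as its last step. The names listener_exited, listener_main and wait_for_listener are hypothetical; they are not the actual variables of mpm_worker or mpm_event, and the 10 ms polling interval is arbitrary.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical flag, standing in for a "dying"-style global. */
static atomic_int listener_exited;

static void *listener_main(void *arg)
{
    (void)arg;
    /* ... the accept loop would run here ... */
    atomic_store(&listener_exited, 1);   /* last step before returning */
    return NULL;
}

/* Poll the flag up to max_iter times, 10 ms apart, then join.
 * Returns the number of polls that found the listener still running,
 * or -1 if it never reported its exit (no join is attempted then). */
static int wait_for_listener(pthread_t listener, int max_iter)
{
    struct timespec delay = { 0, 10 * 1000 * 1000 };
    int iter = 0;

    while (iter < max_iter && !atomic_load(&listener_exited)) {
        nanosleep(&delay, NULL);
        iter++;
    }
    if (!atomic_load(&listener_exited)) {
        return -1;
    }
    pthread_join(listener, NULL);
    return iter;
}

/* Small driver: start the thread and wait for it with the flag. */
static int demo_flag_based_join(void)
{
    pthread_t t;

    atomic_store(&listener_exited, 0);
    if (pthread_create(&t, NULL, listener_main, NULL) != 0) {
        return -1;
    }
    return wait_for_listener(t, 100);
}
```

The point of the sketch is only that the waiting side no longer depends on what pthread_kill() returns for an exited thread.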

(gdb) thread apply all bt full

Thread 1 (Thread 0x7ffbf3f5ad40 (LWP 2891875)):
#0 0x00007ffbf429b087 in __GI___select (nfds=nfds@entry=0, readfds=readfds@entry=0x0, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7fff56cbb0b0) at ../sysdeps/unix/sysv/linux/select.c:69
        sc_ret = -4
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
        s = <optimized out>
        us = <optimized out>
        ns = <optimized out>
        ts64 = {tv_sec = 0, tv_nsec = 155950744}
        pts64 = 0x7fff56cbb050
        r = <optimized out>
#1 0x00007ffbf43d9d92 in apr_sleep (t=t@entry=500000) at time/unix/time.c:249
        tv = {tv_sec = 0, tv_usec = 500000}
#2 0x0000000000440733 in join_workers (listener=0x87c170, threads=threads@entry=0x91e150, mode=mode@entry=2) at worker.c:1069
        iter = 7
        i = <optimized out>
        rv = <optimized out>
        thread_rv = 0
#3 0x00000000004412d9 in child_main (child_num_arg=child_num_arg@entry=0, child_bucket=child_bucket@entry=0) at worker.c:1310
        threads = 0x91e150
        rv = 1
        ts = 0x815a78
        thread_attr = 0x815a98
        start_thread_id = 0x815b08
        i = <optimized out>
#4 0x000000000044161a in make_child (s=0x818d00, slot=slot@entry=0, bucket=0) at worker.c:1376
        pid = 0
#5 0x00000000004416be in startup_children (number_to_start=3) at worker.c:1403
        i = 0
#6 0x00000000004428f9 in worker_run (_pconf=<optimized out>, plog=0x81b998, s=0x818d00) at worker.c:1928
        listen_buckets = 0x875480
        num_buckets = 1
        remaining_children_to_start = <optimized out>
        rv = <optimized out>
        id = "0\000\000\000\000\000\000\000\t\000\000\000\000\000\000"
        i = <optimized out>
#7 0x0000000000456930 in ap_run_mpm (pconf=pconf@entry=0x7ec3e8, plog=0x81b998, s=0x818d00) at mpm_common.c:102
        pHook = <optimized out>
        n = 0
        rv = -1
#8 0x000000000043350e in main (argc=<optimized out>, argv=<optimized out>) at main.c:882
        c = 102 'f'
        showcompile = <optimized out>
        showdirectives = <optimized out>
        confname = <optimized out>
        def_server_root = <optimized out>
        temp_error_log = <optimized out>
        error = <optimized out>
        process = 0x7ea4c8
        pconf = 0x7ec3e8
        plog = 0x81b998
        ptemp = 0x815678
        pcommands = <optimized out>
        opt = 0x810ef0
        rv = <optimized out>
        mod = <optimized out>
        opt_arg = 0x7fff56cbcb64 "/home/jfclere/httpd-trunk/test/pyhttpd/../gen/apache/conf/httpd.conf"
        signal_server = <optimized out>
        rc = <optimized out>
(gdb)

I have added a kill(pid, SIGABRT); call in server/mpm_unix.c, right after the ap_log_error(), because it is not easy to get a core otherwise.


It's clear from the main thread's backtrace that it's waiting for the
listener in the "iter" loop, but nothing tells whether the listener already
exited or not. The listener for instance could be waiting indefinitely
in apr_pollset_poll() at this point, and since there is no pollset wakeup
in mpm_worker I don't think that wakeup_listener() can help here.

According to my tests using assert(0) at different locations in join_workers(), the listener thread is stopped by wakeup_listener(), but pthread_kill() doesn't report that.


So maybe we need to add an apr_pollset_wakeup() in wakeup_listener()
too, like in mpm_event.
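
As an illustration of why a wakeup matters for a thread blocked in a poll call, here is a small standalone sketch. It deliberately uses plain POSIX poll() and pipe() rather than the APR pollset API, so it is an analogy and not the proposed change; struct pollwake_ctx, polling_thread_main and demo_poll_wakeup are hypothetical names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical context: wake_fds[0] is polled, wake_fds[1] is written. */
struct pollwake_ctx {
    int wake_fds[2];
    int woke_up;     /* set by the polling thread when the wakeup byte arrives */
};

static void *polling_thread_main(void *arg)
{
    struct pollwake_ctx *ctx = arg;
    struct pollfd pfd = { .fd = ctx->wake_fds[0], .events = POLLIN };

    /* Timeout -1: block indefinitely, as an idle listener would. */
    if (poll(&pfd, 1, -1) == 1 && (pfd.revents & POLLIN)) {
        char byte;
        if (read(ctx->wake_fds[0], &byte, 1) == 1) {
            ctx->woke_up = 1;
        }
    }
    return NULL;
}

/* Returns 1 if the blocked thread was woken by the byte and then joined,
 * 0 if it returned without seeing the byte, -1 on setup failure. */
static int demo_poll_wakeup(void)
{
    struct pollwake_ctx ctx = { { -1, -1 }, 0 };
    pthread_t t;
    char byte = 'x';

    if (pipe(ctx.wake_fds) != 0) {
        return -1;
    }
    if (pthread_create(&t, NULL, polling_thread_main, &ctx) != 0) {
        close(ctx.wake_fds[0]);
        close(ctx.wake_fds[1]);
        return -1;
    }

    /* Wakeup: one byte makes the read end readable, so poll() returns. */
    (void)write(ctx.wake_fds[1], &byte, 1);
    /* Closing the write end also unblocks poll() if the write failed. */
    close(ctx.wake_fds[1]);

    pthread_join(t, NULL);
    close(ctx.wake_fds[0]);
    return ctx.woke_up;
}
```

Without the write, the thread would sit in poll() forever and the join would never return, which is the kind of hang a missing wakeup can produce.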

Overall something like the attached patch?

Yes, the attached patch helps.



Regards;
Yann.

--
Cheers

Jean-Frederic
