Hi --

An older but essentially identical version of this patch is in Bugzilla PR #38737.

Using the worker MPM (but not the event MPM), if Keep-Alives are enabled and the timeout is reasonably long (e.g., 15 seconds), worker threads wait in poll() after handling a request, in case further requests arrive on the Keep-Alive connection.

On most Unix-like OSes (e.g., Solaris), when you perform a hard restart or shutdown (not a graceful one), the close_worker_sockets() function successfully alerts all worker threads that they should exit quickly. In particular, if a worker thread is polling on a socket, closing that socket in the main thread has the side effect of causing the worker thread to receive an EBADF immediately.

However, on Linux, this side effect doesn't occur. This is intentional; see:

http://bugme.osdl.org/show_bug.cgi?id=546

The consequence is that the child process's main thread waits in apr_thread_join() for each worker thread, and if the workers are just starting a reasonably long Keep-Alive timeout period, the parent process soon decides that the child process has stalled, and ap_reclaim_child_processes() sends SIGTERM and SIGKILL to the child process.

In turn, any modules that have registered pool cleanup functions on the pchild memory pool (the pool passed in at the child_init stage) won't see their cleanup functions run. For certain modules, e.g., mod_dbd, this has relatively severe consequences, such as failing to disconnect cleanly from a database.

This patch causes the main thread to signal each worker thread before attempting apr_thread_join(). Any workers waiting in poll() then receive EBADF immediately after the signal, which allows the restart or shutdown process to proceed as expected, with calls to all the registered pool cleanup functions.

Chris.
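The wake-up mechanism described above can be reproduced outside httpd. The following is only a minimal sketch under stated assumptions, not code from the patch: the names demo_signal_wakes_poll(), demo_worker(), demo_noop_handler() and DEMO_WORKER_SIGNAL are hypothetical, a pipe stands in for the Keep-Alive socket, and SIGUSR1 stands in for WORKER_SIGNAL. It shows a thread blocked in poll() returning early with EINTR once the main thread calls pthread_kill() on it. The EBADF mentioned above would presumably come afterwards, when the interrupted wait is retried on the socket that has already been closed; that part is my reading and is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the patch's WORKER_SIGNAL. */
#define DEMO_WORKER_SIGNAL SIGUSR1

static int demo_pipe[2];        /* read end is polled; nothing is ever written */
static atomic_int demo_done;    /* set by the worker once poll() has returned */
static atomic_int demo_errno;   /* errno the worker saw, or 0 on timeout */

/* No-op handler: its only purpose is to make the signal interrupt poll(). */
static void demo_noop_handler(int sig)
{
    (void)sig;
}

/* Worker: blocks in poll() the way a thread waiting on Keep-Alive would. */
static void *demo_worker(void *arg)
{
    struct pollfd pfd;
    int rc;

    (void)arg;
    pfd.fd = demo_pipe[0];
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, 15000);  /* stands in for a 15 second Keep-Alive timeout */
    atomic_store(&demo_errno, rc == -1 ? errno : 0);
    atomic_store(&demo_done, 1);
    return NULL;
}

/* Returns the errno with which the worker's poll() ended (EINTR expected). */
int demo_signal_wakes_poll(void)
{
    pthread_t tid;
    struct sigaction sa;
    struct timespec pause_10ms = { 0, 10 * 1000 * 1000 };
    int rc;

    atomic_store(&demo_done, 0);
    atomic_store(&demo_errno, 0);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_noop_handler;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(DEMO_WORKER_SIGNAL, &sa, NULL);
    assert(rc == 0);

    rc = pipe(demo_pipe);
    assert(rc == 0);

    rc = pthread_create(&tid, NULL, demo_worker, NULL);
    assert(rc == 0);

    /* Keep signalling until the worker reports that poll() returned; this
     * avoids a race where a single signal lands before poll() is entered. */
    while (!atomic_load(&demo_done)) {
        pthread_kill(tid, DEMO_WORKER_SIGNAL);
        nanosleep(&pause_10ms, NULL);
    }

    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    close(demo_pipe[0]);
    close(demo_pipe[1]);
    return atomic_load(&demo_errno);
}
```

Calling demo_signal_wakes_poll() should return EINTR well before the 15 second timeout expires. The handler does nothing on purpose: the signal is used purely to knock the thread out of its blocking call, which is the same role dummy_signal_handler plays for WORKER_SIGNAL in the patch.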
=====================================================================
--- server/mpm/worker/worker.c.orig  2006-05-03 15:04:28.429547123 -0400
+++ server/mpm/worker/worker.c  2006-05-03 15:07:04.659719568 -0400
@@ -213,6 +213,19 @@
  */
 #define LISTENER_SIGNAL     SIGHUP
 
+/* The WORKER_SIGNAL signal will be sent from the main thread to the
+ * worker threads during an ungraceful restart or shutdown.
+ * This ensures that on systems (i.e., Linux) where closing the worker
+ * socket doesn't awake the worker thread when it is polling on the socket
+ * (especially in apr_wait_for_io_or_timeout() when handling
+ * Keep-Alive connections), close_worker_sockets() and join_workers()
+ * still function in timely manner and allow ungraceful shutdowns to
+ * proceed to completion.  Otherwise join_workers() doesn't return
+ * before the main process decides the child process is non-responsive
+ * and sends a SIGKILL.
+ */
+#define WORKER_SIGNAL       AP_SIG_GRACEFUL
+
 /* An array of socket descriptors in use by each thread used to
  * perform a non-graceful (forced) shutdown of the server. */
 static apr_socket_t **worker_sockets;
@@ -822,6 +835,11 @@
     ap_scoreboard_image->servers[process_slot][thread_slot].generation = ap_my_generation;
     ap_update_child_status_from_indexes(process_slot, thread_slot, SERVER_STARTING, NULL);
 
+#ifdef HAVE_PTHREAD_KILL
+    unblock_signal(WORKER_SIGNAL);
+    apr_signal(WORKER_SIGNAL, dummy_signal_handler);
+#endif
+
     while (!workers_may_exit) {
         if (!is_idle) {
             rv = ap_queue_info_set_idle(worker_queue_info, last_ptrans);
@@ -1077,6 +1095,13 @@
 
     for (i = 0; i < ap_threads_per_child; i++) {
         if (threads[i]) { /* if we ever created this thread */
+#ifdef HAVE_PTHREAD_KILL
+            apr_os_thread_t *worker_os_thread;
+
+            apr_os_thread_get(&worker_os_thread, threads[i]);
+            pthread_kill(*worker_os_thread, WORKER_SIGNAL);
+#endif
+
             rv = apr_thread_join(&thread_rv, threads[i]);
             if (rv != APR_SUCCESS) {
                 ap_log_error(APLOG_MARK, APLOG_CRIT, rv, ap_server_conf,