[PATCH 5/6] hard restart on Linux #38737

Chris Darroch Thu, 04 May 2006 11:17:07 -0700

Hi --

   An older but essentially identical version of this patch is
in Bugzilla PR #38737.


   Using the worker MPM (but not the event MPM), if Keep-Alives
are enabled and the timeout is reasonably long (e.g., 15 seconds),
then worker threads wait in poll() after handling a request
for any further requests on a Keep-Alive connection.

   On most Unix-like OSes (e.g., Solaris), when you perform a hard
restart or shutdown (not a graceful one), the close_worker_sockets()
function successfully alerts all worker threads that they should
exit quickly.  In particular, if the worker threads are polling
on the socket, closing the socket in the main thread has the
side-effect of causing the worker thread to immediately receive
an EBADF.

   However, on Linux, this side-effect doesn't occur.  This is
intentional; see:

http://bugme.osdl.org/show_bug.cgi?id=546

   The consequence is that the child process's main thread
waits in apr_thread_join() for each worker thread.  If the workers
are just starting a reasonably long Keep-Alive timeout period,
the parent process soon decides that the child process
has stalled, and ap_reclaim_child_processes() sends SIGTERM
and then SIGKILL to the child process.

   In turn, any modules that have registered pool cleanup
functions on the pchild memory pool, as passed in the child_init
stage, won't see their cleanup functions run.  For certain
modules, e.g., mod_dbd, this has relatively severe consequences,
such as failing to cleanly disconnect from a database.

   This patch causes the main thread to signal each worker
thread before attempting apr_thread_join().  If any workers
are waiting in poll(), the signal interrupts the call and they
receive EBADF immediately afterwards.  This allows the restart
or shutdown process to proceed as expected, with calls to all
the registered pool cleanup functions.
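
   For readers who want to see the wake-up mechanism in isolation,
here is a minimal standalone sketch using plain POSIX calls rather
than APR.  All names in it (DEMO_WORKER_SIGNAL, demo_worker,
demo_wake_poller, and so on) are hypothetical and are not part of
the patch.  It assumes a pipe as a stand-in for the Keep-Alive socket,
and it only shows the first step: a thread blocked in poll() returns
with EINTR once another thread sends it a signal that has a do-nothing
handler installed.  It does not reproduce the closed-socket step that
produces EBADF in the worker MPM.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the patch's WORKER_SIGNAL. */
#define DEMO_WORKER_SIGNAL SIGUSR1

struct demo_state {
    int fd;          /* descriptor the worker polls on; never readable */
    int poll_errno;  /* errno seen by the worker, or 0 if poll() timed out */
    atomic_int done; /* set by the worker once poll() has returned */
};

/* The handler does nothing; it exists only so delivery interrupts poll(). */
static void demo_noop_handler(int sig)
{
    (void)sig;
}

static void *demo_worker(void *arg)
{
    struct demo_state *st = arg;
    struct pollfd pfd;
    int rc;

    assert(st != NULL);
    pfd.fd = st->fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Block for up to 5 seconds, like a worker in a Keep-Alive wait. */
    rc = poll(&pfd, 1, 5000);
    st->poll_errno = (rc == -1) ? errno : 0;
    atomic_store(&st->done, 1);
    return NULL;
}

/* Returns the errno the worker saw from poll(), or -1 on setup failure. */
int demo_wake_poller(void)
{
    struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    struct demo_state st;
    struct sigaction sa;
    pthread_t tid;
    int pipefd[2];

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_noop_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(DEMO_WORKER_SIGNAL, &sa, NULL) != 0)
        return -1;
    if (pipe(pipefd) != 0)
        return -1;

    st.fd = pipefd[0];
    st.poll_errno = 0;
    atomic_init(&st.done, 0);

    if (pthread_create(&tid, NULL, demo_worker, &st) != 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    /* Signal the worker until it reports that poll() returned.  Repeating
     * the signal covers the case where the first one arrives before the
     * worker has actually entered poll(). */
    while (!atomic_load(&st.done)) {
        nanosleep(&tick, NULL);
        pthread_kill(tid, DEMO_WORKER_SIGNAL);
    }

    pthread_join(tid, NULL);
    close(pipefd[0]);
    close(pipefd[1]);
    return st.poll_errno;
}
```

   Calling demo_wake_poller() from a small main() and comparing the
result with EINTR is enough to try this out.  The sketch targets a
single thread with pthread_kill(), as the patch does, so that only
the blocked worker is disturbed.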

Chris.

=====================================================================
--- server/mpm/worker/worker.c.orig     2006-05-03 15:04:28.429547123 -0400
+++ server/mpm/worker/worker.c  2006-05-03 15:07:04.659719568 -0400
@@ -213,6 +213,19 @@
  */
 #define LISTENER_SIGNAL     SIGHUP
 
+/* The WORKER_SIGNAL signal will be sent from the main thread to the
+ * worker threads during an ungraceful restart or shutdown.
+ * This ensures that on systems (i.e., Linux) where closing the worker
+ * socket doesn't awake the worker thread when it is polling on the socket
+ * (especially in apr_wait_for_io_or_timeout() when handling
+ * Keep-Alive connections), close_worker_sockets() and join_workers()
+ * still function in a timely manner and allow ungraceful shutdowns to
+ * proceed to completion.  Otherwise join_workers() doesn't return
+ * before the main process decides the child process is non-responsive
+ * and sends a SIGKILL.
+ */
+#define WORKER_SIGNAL       AP_SIG_GRACEFUL
+
 /* An array of socket descriptors in use by each thread used to
  * perform a non-graceful (forced) shutdown of the server. */
 static apr_socket_t **worker_sockets;
@@ -822,6 +835,11 @@
     ap_scoreboard_image->servers[process_slot][thread_slot].generation = ap_my_generation;
     ap_update_child_status_from_indexes(process_slot, thread_slot, SERVER_STARTING, NULL);
 
+#ifdef HAVE_PTHREAD_KILL
+    unblock_signal(WORKER_SIGNAL);
+    apr_signal(WORKER_SIGNAL, dummy_signal_handler);
+#endif
+
     while (!workers_may_exit) {
         if (!is_idle) {
             rv = ap_queue_info_set_idle(worker_queue_info, last_ptrans);
@@ -1077,6 +1095,13 @@
 
     for (i = 0; i < ap_threads_per_child; i++) {
         if (threads[i]) { /* if we ever created this thread */
+#ifdef HAVE_PTHREAD_KILL
+            apr_os_thread_t *worker_os_thread;
+
+            apr_os_thread_get(&worker_os_thread, threads[i]);
+            pthread_kill(*worker_os_thread, WORKER_SIGNAL);
+#endif
+
             rv = apr_thread_join(&thread_rv, threads[i]);
             if (rv != APR_SUCCESS) {
                 ap_log_error(APLOG_MARK, APLOG_CRIT, rv, ap_server_conf,
