On Fri, Aug 13, 2004 at 06:25:32PM -0400, Jeff Trawick wrote:
> this sounds a bit more sane to me for timing, as long as we can exit
> as soon as all children have exited:
> 
> shutdown + 0:
>   send SIGTERM
> shutdown + 4:
>   for each child still remaining, bitch to error log and send SIGTERM again
> shutdown + 8:
>   for each child still remaining, bitch to error log and send SIGTERM again
> shutdown + 12:
>   for each child still remaining, bitch to error log and send SIGKILL
> shutdown + 16:
>   for each child still remaining, bitch to error log, send SIGKILL again, 
>     and exit anyway

I dropped the ball on this one... your proposed timing sounds OK to me
except I don't see any point in sending SIGKILL more than once. SIGTERM
can be ignored so sending that twice is semi-justified, but SIGKILL is
always fatal.

So how does this this look, it's slightly more complicated than I'd have
liked, but that is to mainly to ensure that it will never wait more than
a second between waitpid() calls:

The logic in handling the 'terminate' argument really dates to 1.3 and
doesn't make much sense any more (it only ever sends SIGTERM not
SIGHUP), not sure what to do about that.

Index: server/mpm_common.c
===================================================================
RCS file: /home/cvs/httpd-2.0/server/mpm_common.c,v
retrieving revision 1.121
diff -u -r1.121 mpm_common.c
--- server/mpm_common.c 14 Aug 2004 10:49:43 -0000      1.121
+++ server/mpm_common.c 14 Sep 2004 11:08:48 -0000
@@ -62,21 +62,31 @@
 void ap_reclaim_child_processes(int terminate)
 {
     int i;
-    long int waittime = 1024 * 16;      /* in usecs */
     apr_status_t waitret;
     int tries;
     int not_dead_yet;
     int max_daemons;
+#define NUMWAITS (7)
+    static const int timing[NUMWAITS] = {
+        16, 64, 256, 768, 1000, 1000, 1000
+    };
 
     ap_mpm_query(AP_MPMQ_MAX_DAEMON_USED, &max_daemons);
 
-    for (tries = terminate ? 4 : 1; tries <= 9; ++tries) {
-        /* don't want to hold up progress any more than
-         * necessary, but we need to allow children a few moments to exit.
-         * Set delay with an exponential backoff.
-         */
-        apr_sleep(waittime);
-        waittime = waittime * 4;
+    /* Algorithm as follows: step 1 omitted if terminate is true:
+     * 1. spend ~4 seconds wait()ing for children, 
+     * 2. log an error and send a SIGTERM to remaining children
+     * 3. spend another ~4 seconds wait()ing for children.
+     * 4. log an error and send a SIGTERM to remaining children
+     * 5. spend another ~4 seconds wait()ing for children.
+     * 6. log an error and send a SIGKILL to remaining children
+     * 7. after 64ms wait() for remaining children
+     *    and log an error for each remaining live child
+     */
+    for (tries = terminate ? NUMWAITS : 0;
+         tries <= NUMWAITS * 3 + 1; ++tries) {
+
+        apr_sleep(timing[tries % NUMWAITS] * 1000);
 
         /* now see who is done */
         not_dead_yet = 0;
@@ -95,16 +105,8 @@
             }
 
             ++not_dead_yet;
-            switch (tries) {
-            case 1:     /*  16ms */
-            case 2:     /*  82ms */
-            case 3:     /* 344ms */
-            case 4:     /*  16ms */
-                break;
-
-            case 5:     /*  82ms */
-            case 6:     /* 344ms */
-            case 7:     /* 1.4sec */
+
+            if (tries == NUMWAITS || tries == NUMWAITS * 2) {
                 /* ok, now it's being annoying */
                 ap_log_error(APLOG_MARK, APLOG_WARNING,
                              0, ap_server_conf,
@@ -112,9 +114,7 @@
                              "sending a SIGTERM",
                              (long)pid);
                 kill(pid, SIGTERM);
-                break;
-
-            case 8:     /*  6 sec */
+            } else if (tries == NUMWAITS * 3) {
                 /* die child scum */
                 ap_log_error(APLOG_MARK, APLOG_ERR,
                              0, ap_server_conf,
@@ -132,9 +132,7 @@
                  */
                 kill_thread(pid);
 #endif
-                break;
-
-            case 9:     /* 14 sec */
+            } else if (tries == NUMWAITS * 3 + 1) {
                 /* gave it our best shot, but alas...  If this really
                  * is a child we are trying to kill and it really hasn't
                  * exited, we will likely fail to bind to the port
@@ -145,7 +143,6 @@
                              "could not make child process %ld exit, "
                              "attempting to continue anyway",
                              (long)pid);
-                break;
             }
         }
 

Reply via email to