When creating a subprocess (i.e. a CGI in this case), APR offers several
choices for how to clean it up when the parent process is exiting or
running a pool cleanup. With apr_pool_note_subprocess, the choices are:
APR_KILL_NEVER -- process is never sent any signals
APR_KILL_ALWAYS -- process is sent SIGKILL on apr_pool_t cleanup
APR_KILL_AFTER_TIMEOUT -- SIGTERM, wait 3 seconds, SIGKILL
APR_JUST_WAIT -- wait forever for the process to complete
APR_KILL_ONLY_ONCE -- send SIGTERM and then wait
Currently, mod_cgi{d} sets APR_KILL_AFTER_TIMEOUT. However, it appears
that on a server under high load (a load average of 100), it is possible
for only the initial SIGTERM to be sent, and the SIGKILL is never sent
to the child CGI processes.
I believe that the parent process, which is supposed to have a 7 second
gap between its own SIGTERM and SIGKILL, is receiving the SIGKILL before
it has slept for 3 seconds *and* sent the final SIGKILL to the child
CGIs. This has the potential to leave broken child processes behind.
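To make the suspected window concrete, here is a minimal sketch of the SIGTERM, wait, SIGKILL sequence in plain POSIX C. It is not APR's implementation: kill_after_timeout and spawn_child are hypothetical helpers written only for illustration, and the 50 ms polling step is an arbitrary choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not APR code): send SIGTERM, poll for up to
 * timeout_ms for the child to exit, then escalate to SIGKILL.
 * Returns 0 if SIGTERM was enough, 1 if SIGKILL was needed, -1 on error. */
static int kill_after_timeout(pid_t pid, int timeout_ms)
{
    const struct timespec step = { 0, 50 * 1000 * 1000 }; /* 50 ms */
    int status;
    int waited_ms = 0;

    assert(timeout_ms >= 0);
    if (pid <= 0)               /* never signal a process group or "all" */
        return -1;
    if (kill(pid, SIGTERM) != 0)
        return -1;

    while (waited_ms < timeout_ms) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return 0;           /* child exited after SIGTERM alone */
        if (r < 0)
            return -1;
        nanosleep(&step, NULL);
        waited_ms += 50;
    }

    /* If the caller is itself SIGKILLed anywhere above this line,
     * the SIGKILL below is never sent and the child is left running. */
    if (kill(pid, SIGKILL) != 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return 1;
}

/* Hypothetical helper: fork a child that idles forever, optionally
 * ignoring SIGTERM. SIG_IGN is set before fork() so that the child
 * inherits it and cannot be hit by SIGTERM before it is ready. */
static pid_t spawn_child(int ignore_sigterm)
{
    void (*old)(int) = SIG_DFL;
    pid_t pid;

    if (ignore_sigterm)
        old = signal(SIGTERM, SIG_IGN);
    pid = fork();
    if (pid == 0) {
        for (;;)
            pause();            /* stand-in for a CGI that never finishes */
    }
    if (ignore_sigterm)
        signal(SIGTERM, old);   /* restore the parent's own disposition */
    return pid;
}
```

A child that ignores SIGTERM survives until the escalation step, so anything that kills the caller during the polling loop leaves that child behind. Sending SIGKILL first removes the window, at the cost of giving the child no chance to clean up.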
In the real world, this was not limited to a single machine; it was
widespread across a cluster of machines. About 70% of them had this
problem after an `apachectl stop`.
Attached is a patch to mod_cgi and mod_cgid that switches them to
APR_KILL_ALWAYS, which appears to resolve this issue.
Is there anything seriously wrong with using SIGKILL first?
Thanks,
Paul
Index: modules/generators/mod_cgi.c
===
--- modules/generators/mod_cgi.c (revision 467373)
+++ modules/generators/mod_cgi.c (working copy)
@@ -462,7 +462,7 @@
apr_filepath_name_get(r->filename));
}
else {
-apr_pool_note_subprocess(p, procnew, APR_KILL_AFTER_TIMEOUT);
+apr_pool_note_subprocess(p, procnew, APR_KILL_ALWAYS);
*script_in = procnew->out;
if (!*script_in)
Index: modules/generators/mod_cgid.c
===
--- modules/generators/mod_cgid.c (revision 467373)
+++ modules/generators/mod_cgid.c (working copy)
@@ -802,7 +802,7 @@
}
procnew->pid = daemon_pid;
procnew->err = procnew->in = procnew->out = NULL;
-apr_pool_note_subprocess(p, procnew, APR_KILL_AFTER_TIMEOUT);
+apr_pool_note_subprocess(p, procnew, APR_KILL_ALWAYS);
#if APR_HAS_OTHER_CHILD
apr_proc_other_child_register(procnew, cgid_maint, procnew, NULL, p);
#endif