Re: cgi: KILL_AFTER_TIMEOUT vs KILL_ALWAYS

2006-10-25 Thread Joe Orton
On Wed, Oct 25, 2006 at 08:17:22AM +0200, Plüm, Rüdiger, VF EITO wrote:
> > > I believe that the parent process, which is supposed to have a 7 
> > > second space between its own SIGTERM and SIGKILL, is getting the 
> > > SIGKILL before it has slept for 3 seconds *and* sent the final 
> > > SIGKILL to the child CGIs.  This has the potential to leave broken 
> > > child processes behind.
> > 
> > apachectl stop never SIGKILLs the parent though; are you really using 
> > init scripts for this?  We had exactly that problem in the
> 
> But the parent httpd process sends SIGKILL to its child processes after a
> timeout, correct? And the CGIs are children of the httpd children.

Yes, but Paul was writing about the *parent* getting SIGKILLed before it 
gets a chance to itself SIGKILL any errant CGI scripts.  That won't 
happen with apachectl; but init scripts are indeed written to do a 
SIGTERM, wait, SIGKILL on the parent.

joe


cgi: KILL_AFTER_TIMEOUT vs KILL_ALWAYS

2006-10-24 Thread Paul Querna
When creating a subprocess (i.e. a CGI in this case), APR has several 
choices for how to clean it up when the parent process is exiting or 
running a pool cleanup.  With apr_pool_note_subprocess, the choices are:

 APR_KILL_NEVER         -- process is never sent any signals
 APR_KILL_ALWAYS        -- process is sent SIGKILL on apr_pool_t cleanup
 APR_KILL_AFTER_TIMEOUT -- SIGTERM, wait 3 seconds, SIGKILL
 APR_JUST_WAIT          -- wait forever for the process to complete
 APR_KILL_ONLY_ONCE     -- send SIGTERM and then wait
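
To make the APR_KILL_AFTER_TIMEOUT sequence concrete, here is a minimal 
plain-POSIX sketch of "SIGTERM, wait, SIGKILL".  It does not use APR and is 
not APR's implementation: term_then_kill and spawn_child are hypothetical 
names, and the 100 ms polling loop is only an assumption about how the wait 
might be done.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an APR function): send SIGTERM, give the child
 * up to grace_seconds to exit, then fall back to SIGKILL.
 * Returns 1 if the child exited during the grace period, 2 if SIGKILL was
 * needed, -1 on error.
 */
static int term_then_kill(pid_t pid, int grace_seconds)
{
    int i;

    assert(pid > 0);  /* pid <= 0 would address a process group instead */

    if (kill(pid, SIGTERM) != 0)
        return -1;

    /* Poll every 100 ms so a child that exits early is reaped promptly. */
    for (i = 0; i < grace_seconds * 10; i++) {
        struct timespec tick = { 0, 100000000L };
        pid_t r = waitpid(pid, NULL, WNOHANG);
        if (r == pid)
            return 1;
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }

    /* If the caller is itself SIGKILLed before reaching this point, the
     * child never receives SIGKILL and is left behind. */
    if (kill(pid, SIGKILL) != 0)
        return -1;
    return waitpid(pid, NULL, 0) == pid ? 2 : -1;
}

/*
 * Hypothetical test child: loops forever, optionally ignoring SIGTERM.
 * The disposition is set before fork() so the child inherits it without
 * a race, then restored in the parent.
 */
static pid_t spawn_child(int ignore_term)
{
    void (*old)(int) = signal(SIGTERM, ignore_term ? SIG_IGN : SIG_DFL);
    pid_t pid = fork();

    if (pid == 0) {
        for (;;)
            pause();
    }
    signal(SIGTERM, old);
    return pid;
}
```

Calling term_then_kill(spawn_child(1), 3) walks the whole sequence for a 
child that ignores SIGTERM.  The APR_KILL_ALWAYS behaviour corresponds to 
skipping the grace period and calling kill(pid, SIGKILL) right away, which 
leaves no window in which the caller can be killed first.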

Currently, mod_cgi{d} sets APR_KILL_AFTER_TIMEOUT.  However, on a server 
under high load (a load average of 100), it appears possible for only the 
initial SIGTERM to be sent: the SIGKILL never reaches the child CGI 
processes.


I believe that the parent process, which is supposed to have a 7 second 
space between its own SIGTERM and SIGKILL, is getting the SIGKILL before 
it has slept for 3 seconds *and* sent the final SIGKILL to the child 
CGIs.  This has the potential to leave broken child processes behind.


In the real world, this didn't happen on just a single machine... but 
was widespread over a cluster of machines.  About 70% of them had this 
problem after an `apachectl stop`.


Attached is a patch to mod_cgi and mod_cgid that switches both to 
APR_KILL_ALWAYS, which appears to resolve this issue.


Is there anything seriously wrong with using SIGKILL first?

Thanks,

Paul
Index: modules/generators/mod_cgi.c
===================================================================
--- modules/generators/mod_cgi.c	(revision 467373)
+++ modules/generators/mod_cgi.c	(working copy)
@@ -462,7 +462,7 @@
                       apr_filepath_name_get(r->filename));
     }
     else {
-        apr_pool_note_subprocess(p, procnew, APR_KILL_AFTER_TIMEOUT);
+        apr_pool_note_subprocess(p, procnew, APR_KILL_ALWAYS);
 
         *script_in = procnew->out;
         if (!*script_in)
Index: modules/generators/mod_cgid.c
===================================================================
--- modules/generators/mod_cgid.c	(revision 467373)
+++ modules/generators/mod_cgid.c	(working copy)
@@ -802,7 +802,7 @@
     }
     procnew->pid = daemon_pid;
     procnew->err = procnew->in = procnew->out = NULL;
-    apr_pool_note_subprocess(p, procnew, APR_KILL_AFTER_TIMEOUT);
+    apr_pool_note_subprocess(p, procnew, APR_KILL_ALWAYS);
 #if APR_HAS_OTHER_CHILD
     apr_proc_other_child_register(procnew, cgid_maint, procnew, NULL, p);
 #endif