[Bug 4703] process leak for helper applications (pyzor)

bugzilla-daemon Wed, 23 Nov 2005 04:35:29 -0800

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4703






------- Additional Comments From [EMAIL PROTECTED]  2005-11-23 13:35 -------
(In reply to comment #10)
> I don't know. Why aren't people seeing the zombies in 3.1 after the patch in
> 4518 was applied? I notice that the 3.1 code comments out the calls to
> cleanup_kids. How does it get away with that?
I just downloaded 3.1, and I see a lot has changed in how children are handled.
If you look just before the end of the eval block (in Pyzor.pm, where the
alarm signal is handled, around line 383), you can see a waitpid in place of the
cleanup_kids call. So, as far as I know, a zombie is left over only when the
alarm is delivered and waitpid is not called (the patch in bug 4518 only takes
care of closing file handles), which happens _very_ rarely on mail servers under
low or medium load.

Looking at the cleanup_kids code, I believe it was replaced with the simple
waitpid call because enter_helper_run_mode in Dns.pm always sets the CHLD
handler to DEFAULT. cleanup_kids thus becomes useless, since it only checks
if $SIG{'CHLD'} is unset or set to IGNORE.
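
The difference between the two $SIG{'CHLD'} settings can be seen in a small
standalone sketch. This is not SpamAssassin code: child_left_zombie is a
made-up name, the one-second sleep is a crude way to let the child exit, and
the IGNORE behaviour shown is what I'd expect on Linux and most modern Unix
systems, not something guaranteed everywhere.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";

# Hypothetical helper: fork a child that exits at once, give it time to
# die, then check whether its exit status is still waiting to be
# collected (meaning it sat there as a zombie until we asked).
sub child_left_zombie {
    my ($disposition) = @_;
    local $SIG{CHLD} = $disposition;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    POSIX::_exit(0) if $pid == 0;    # child: exit immediately

    sleep 1;                         # crude: let the child exit

    # waitpid returns $pid if a dead child was waiting to be reaped,
    # and -1 if the system has already discarded it.
    return waitpid($pid, WNOHANG) == $pid ? 1 : 0;
}

for my $disposition ('DEFAULT', 'IGNORE') {
    printf "%-7s -> %s\n", $disposition,
        child_left_zombie($disposition)
            ? "zombie was left until waitpid"
            : "nothing left to reap";
}
```

With DEFAULT, the dead child stays in the process table until somebody calls
waitpid on it, which is exactly the case where a skipped waitpid leaks a
process.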

In practice, to avoid zombies, every forked child must be waited for, or
$SIG{'CHLD'} must be set to 'IGNORE'. Currently, I believe $SIG{'CHLD'} is set
to DEFAULT, so when the alarm is delivered the eval block exits early and
the process is never waited for. I really don't know whether something
elsewhere in the spamassassin code periodically loops over all dead children
and waits for them (I can see a $SIG{'CHLD'} handler in the spamassassin
parent, but that only handles forking and other connection handlers), but
in:
+    if ($pid) {
+      if (kill('TERM',$pid)) { dbg("pyzor: killed stale helper [$pid]") }
+      else { dbg("pyzor: killing helper application [$pid] failed: $!") }
+    }

I'd add something like:
  # WNOHANG requires: use POSIX ":sys_wait_h";
  if ($pid) {
    if (kill('TERM',$pid)) {
      dbg("pyzor: successfully sent TERM signal to helper [$pid]");
      sleep(1);  # give it time to really exit
    } else {
      # Note that the process may have died by itself after the
      # alarm but before the kill, and note that a successful
      # kill call doesn't mean the process was killed...
      dbg("pyzor: sending TERM signal to helper application [$pid] " .
          "failed: $! [maybe it died in the meantime?]");
    }

    # Check whether the process has died, and collect its
    # status (cleaning up any zombie)
    if (waitpid($pid, WNOHANG) == 0) {
      # it is still alive
      kill('KILL', $pid);
      # Now the process will die for sure, so do a blocking wait
      waitpid($pid, 0);
      dbg("pyzor: [$pid] has been killed with SIGKILL");
    } else {
      dbg("pyzor: [$pid] died nicely with status $?" .
          " after SIGTERM was delivered");
    }
  }

To avoid the sleep, the first waitpid could be replaced by a waitpid($pid, 0)
with a timeout, implemented once again using alarms or SIGCHLD handlers.
If no timeout is implemented and the helper does not terminate when
SIGTERM is received, the spamd process would hang until the child really dies
(which could take forever).
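
A rough sketch of the blocking wait with an alarm-based timeout follows. It is
standalone and untested against the spamd code: reap_with_timeout and the
two-second limit are made up for illustration, and it assumes no other alarm
is pending (inside Pyzor.pm the existing alarm handling would have to be taken
into account).

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: send TERM, wait at most $timeout seconds for the
# child to exit, then fall back to KILL and a blocking wait.
# Returns the signal that was needed and the wait status ($?).
sub reap_with_timeout {
    my ($pid, $timeout) = @_;
    kill('TERM', $pid);

    my $reaped;
    eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($timeout);
        $reaped = waitpid($pid, 0);   # blocking; cut short by the alarm
        alarm(0);
    };
    alarm(0);                         # never leave an alarm pending

    return ('TERM', $?) if defined $reaped && $reaped == $pid;

    # Still alive after the timeout: KILL cannot be caught or ignored,
    # so this blocking wait will return.
    kill('KILL', $pid);
    waitpid($pid, 0);
    return ('KILL', $?);
}

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) { sleep 60; POSIX::_exit(0) }   # stand-in for a hung helper

my ($signal, $status) = reap_with_timeout($pid, 2);
print "helper [$pid] reaped after SIG$signal, wait status $status\n";
```

This returns as soon as the child exits instead of always sleeping for a full
second, and the child is reaped on both paths.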

Cheers,
Carlo



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
