http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5665

           Summary: spamd doesn't recognize when children have exited
           Product: Spamassassin
           Version: 3.2.3
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P4
         Component: spamc/spamd
        AssignedTo: [email protected]
        ReportedBy: [EMAIL PROTECTED]


We're running a cluster of 4 spamd servers on Debian etch, amd64.  Since a recent
upgrade to 3.2.3, spamd has started failing to notice that some exiting children
have in fact exited (as confirmed by ps and top), and it retains a ghost record
for each of them in the K state.

Over time, this fills up spamd's internal child tracking table, and eventually
all processing stalls out.

With the default values for --min-children, --min-spare, and
--max-conn-per-child, the first ghost entry shows up within about 15 minutes.
Raising one or more of these seems to make the problem less likely.

Each ghost entry coincides with a set of log entries like these:

prefork: cannot ping 25046, file handle not defined, child likely to still be processing SIGCHLD handler after killing itself
prefork: killing failed child 25046 fd=undefined at /opt/spamassassin-3.2.3/share/perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 171.
prefork: kill of failed child 25046 failed: No such process
prefork: killed child 25046
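
To correlate ghost entries with the log, the PIDs from the "No such process" lines can be collected mechanically. The following is only a rough sketch in Python, not part of SpamAssassin; the function name ghost_candidate_pids and the sample list are made up for illustration, and the pattern assumes the message wording shown above.

```python
import re

# Matches the "kill ... failed: No such process" message quoted above.
# Assumption: the wording is stable across log entries.
KILL_FAILED = re.compile(
    r"prefork: kill of failed child (\d+) failed: No such process"
)


def ghost_candidate_pids(log_lines):
    """Return PIDs (first-seen order) that spamd tried to kill after the
    OS had already forgotten them."""
    seen = []
    for line in log_lines:
        match = KILL_FAILED.search(line)
        if match:
            pid = int(match.group(1))
            if pid not in seen:
                seen.append(pid)
    return seen


# Hypothetical sample built from the log excerpt above.
sample = [
    "prefork: killing failed child 25046 fd=undefined at ... line 171.",
    "prefork: kill of failed child 25046 failed: No such process",
    "prefork: killed child 25046",
]

if __name__ == "__main__":
    print(ghost_candidate_pids(sample))
```

The resulting PIDs are the ones worth comparing against spamd's child table and against ps.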

This appears to be similar to bug 5313, but inverted;  the child processes *are*
killed successfully according to the OS, but spamd doesn't find out about it. 
Checking with ps or top shows that the PID in the log has in fact exited.
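
The ps/top check can also be scripted. Below is a minimal sketch in Python, assuming a Unix-like system; it is not how spamd tracks its children, and the names pid_exists and prune_ghosts, as well as the {pid: state} table layout, are hypothetical. It sends signal 0, which tests for a process without delivering anything to it.

```python
import os


def pid_exists(pid):
    """Return True if the OS still knows a process with this PID.

    Signal 0 performs the existence and permission checks only.
    Note: an unreaped zombie still counts as existing.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False   # ESRCH: no such process
    except PermissionError:
        return True    # process exists but belongs to another user
    return True


def prune_ghosts(table):
    """Return a copy of a hypothetical {pid: state} tracking table
    without the entries whose process no longer exists."""
    return {pid: state for pid, state in table.items() if pid_exists(pid)}
```

A PID that appears in the log and fails this check has exited as far as the OS is concerned, which matches what ps and top report here.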

Enabling --round-robin seems to be working around the problem for now, but the
overall system load is much higher.

SA is installed from source on all four machines by a script set up to keep the
installations as close as possible to identical.

The Bayes DB is in MySQL on one machine;  that system is slightly slower to lose
track of its spamd children than the other 3.



