http://bugzilla.spamassassin.org/show_bug.cgi?id=4316

           Summary: spamd loops spawning children that die immediately.
           Product: Spamassassin
           Version: 3.0.2
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: critical
          Priority: P1
         Component: spamc/spamd
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: [EMAIL PROTECTED]


I'm running a Sendmail/procmail/spamassassin combo on Debian Linux using
spamc/spamd and recently noticed the following errors in my sendmail log:

sm-mta[30568]: j412QfpS030544: timeout waiting for input from local during
Draining Input

Checking the running processes showed things like:

17759 ?        S      0:00 spamc
17812 ?        S      0:00 spamc
17831 ?        S      0:00 spamc
17980 ?        S      0:00 spamc
18614 ?        S      0:00 spamc
19870 ?        S      0:00 spamc
20297 ?        S      0:00 spamc
20517 ?        S      0:00 spamc
21465 ?        R     13:23 /usr/sbin/spamd --create-prefs --max-children 5
--helper-home-dir --max-conn-per-child 5 -d --pidfile=/var/run/spamd.pid
22280 ?        Z      0:00 [spamd <defunct>]

Telnetting to spamd brought no response. With logging turned on, everything
looked normal until the problem occurred, at which point there were NO log
entries from SA at all; the last entry showed the end of a completely normal
message being processed. strace turned out to be more informative and pointed me
to the real culprit (I'll attach a trace in a minute): communication between SA
and syslog had failed.

A few days earlier I had replaced sysklogd with syslog-ng (1.5.15-1.1). It
appears that after several hours (12-24 hours on average with
--max-conn-per-child set to 200) the communication between SA and syslog-ng
breaks, at which point spamd starts madly spawning children that die
immediately. It appears that spamd is completely dependent upon syslog and
cannot function correctly without it.

To test the theory, I stopped and restarted spamd, ran strace on it, and then
stopped the syslog service. Once syslog was dead, the failure began. The
children died slowly at first while the parent tried to keep up by spawning
replacements, but eventually more than one would die at a time and the parent
couldn't keep up. I was down to one parent madly spawning a single child that
would die before the parent had a chance to spawn a second. Naturally, no spam
processing could be accomplished during this time.

I've replaced syslog-ng with my old sysklogd in the hope that this will serve as
a fix for the time being. I should note that decreasing the value of
--max-conn-per-child also appears to decrease the time between spamd failures
(I have not done enough testing to be certain); in other words, setting it to 1
may bring about the failure in 2-4 hours instead of 12-24 hours. Stopping syslog
seems to bring about the failure immediately (it might not be obvious from ps,
but strace will tell the story in a hurry).
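
For anyone trying to confirm the same condition without strace, here is a
rough diagnostic sketch in Python. It is not spamd code and not how spamd talks
to syslog; it only checks whether something is accepting connections on the
local syslog socket. The function name syslog_reachable is made up for this
example, and /dev/log as the socket path is an assumption (it is the usual
location on Linux, but your syslog daemon may be configured differently).

```python
import socket


def syslog_reachable(path="/dev/log"):
    """Return True if a Unix-domain socket at `path` accepts a connection.

    Hypothetical helper for illustration only. Syslog daemons may listen on
    either a datagram or a stream socket, so both types are tried.
    """
    for kind in (socket.SOCK_DGRAM, socket.SOCK_STREAM):
        sock = socket.socket(socket.AF_UNIX, kind)
        try:
            sock.connect(path)
            return True
        except OSError:
            # Missing socket file, wrong socket type, or nobody listening.
            continue
        finally:
            sock.close()
    return False


if __name__ == "__main__":
    print("syslog reachable:", syslog_reachable())
```

Running this while the syslog service is stopped should report that the socket
is not reachable, which would line up with the point where the spamd children
start dying.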



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.