http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5413

           Summary: Child processes not being fully killed
           Product: Spamassassin
           Version: 3.1.8
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P5
         Component: spamc/spamd
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


We are running two servers on Fedora Core 4 with SpamAssassin 3.1.8.

We have noticed that child processes are being marked as killed, but the
processes actually still exist. Eventually all the child processes end up in
the killed state, and spamd can no longer do any actual work.

We had a similar problem last year, but that was due to SELinux; see
http://marc.info/?l=spamassassin-users&m=115763870219401&w=2
Both servers currently have SELinux disabled.

The problem only seems to occur when a server becomes busy, and even then it
does not always happen. For example, the log file shows:

  Apr 12 11:46:10 mary spamd[16566]: prefork: child states: KIIKKK

I currently have a process monitoring the log that warns me when three 'K's
appear, so neither server should actually stop processing spam. max-children is
set to 10, min-children to 4, and max-conn-per-child to 100.
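
For anyone wanting to set up a similar watch, here is a minimal sketch of such a
monitor in Python. It is not the script actually running on these servers; the
function names and the regular expression are illustrative assumptions, and the
only thing taken from the report is the "prefork: child states:" log line format
and the threshold of three 'K's.

```python
import re

# Assumed pattern, based only on the sample log line in this report:
#   ... spamd[16566]: prefork: child states: KIIKKK
STATE_RE = re.compile(r"prefork: child states: ([A-Z]+)")


def killed_count(line):
    """Return how many children are in state 'K', or None if the
    line is not a prefork child-states line."""
    match = STATE_RE.search(line)
    if match is None:
        return None
    return match.group(1).count("K")


def should_warn(line, threshold=3):
    """True when the line reports at least `threshold` killed children."""
    count = killed_count(line)
    return count is not None and count >= threshold


def scan(lines, threshold=3):
    """Yield every log line that crosses the warning threshold."""
    for line in lines:
        if should_warn(line, threshold):
            yield line.rstrip("\n")
```

To use it, pass scan() an open log file (the path is site-specific) and send
each line it yields to whatever alerting mechanism is in place.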

An strace of a 'stuck' child shows:

=========================================================
[EMAIL PROTECTED] ~]# strace -Ff -p 25131
Process 25131 attached - interrupt to quit
select(16, [10], NULL, NULL, {258, 808000}) = 1 (in [10], left {149, 268000})
read(10, "P....\n", 6)                  = 6
read(10, 0xb20ba00, 6)                  = -1 EAGAIN (Resource temporarily unavailable)
time(NULL)                              = 1176374820
select(16, [10], NULL, NULL, {300, 0})  = 1 (in [10], left {149, 628000})
read(10, "P....\n", 6)                  = 6
read(10, 0xb20ba00, 6)                  = -1 EAGAIN (Resource temporarily unavailable)
time(NULL)                              = 1176374971
=========================================================

This just seems to loop forever. Running 'kill' on the stuck child gets rid of it.

I have looked through some of the other bug reports, but can find none quite
like this one. The log file shows no error messages such as sysread/syswrite
errors or timeouts.


John.


