http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4590





------- Additional Comments From [EMAIL PROTECTED]  2005-12-01 02:25 -------
OK, I've been looking at
http://cyan.dostech.net/sa-bug4590/spamd-strace.syswrite7.trimmed . for what
it's worth, here's what I can tell so far.  It just doesn't make much sense.


PIDS:

  15882: the spamd parent
  19783: a child PID that continues working OK
  30625: a child PID that stops processing mail and gets hung

FDS:

  15882 fd 4 => syslog

  15882 fd 7 => control channel for 30625, attached to fd 8
                in that process

  15882 fd 8 => control channel for 19783, fd 9


ANNOTATED STRACE:

child, waiting for events:

30625 00:02:31.541351 select(16, [8], NULL, NULL, {300, 0} <unfinished ...>

everything goes well until (event: linenumber):
parent, last good write: 12070

15882 00:05:03.547505 write(7, "P....\n", 6 <unfinished ...>
15882 00:05:03.547564 <... write resumed> ) = 6

The "P....\n" data is a ping, to check to ensure that the child can
process mails.  The child doesn't need to respond to this, it's
a one-way operation.

child, last successful read: 12070 - 12094

After that write from the parent, the child wakes up from its select(), reads
the ping,
and goes back to another select():

30625 00:05:03.547950 <... select resumed> ) = 1 (in [8], left {148, 0})
15882 00:05:03.548020 rt_sigprocmask(SIG_BLOCK, ~[],  <unfinished ...>
30625 00:05:03.548132 read(8,  <unfinished ...>
15882 00:05:03.548168 <... rt_sigprocmask resumed> [33], 8) = 0
30625 00:05:03.548265 <... read resumed> "P....\n", 6) = 6
15882 00:05:03.548334 rt_sigprocmask(SIG_BLOCK, [ALRM],  <unfinished ...>
30625 00:05:03.548440 read(8,  <unfinished ...>
15882 00:05:03.548461 <... rt_sigprocmask resumed> ~[KILL STOP], 8) = 0
30625 00:05:03.548505 <... read resumed> 0x9f81dc0, 6) = -1 EAGAIN (Resource
temporarily unavailable)
15882 00:05:03.548529 rt_sigaction(SIGALRM, {0x80a7be8, [], SA_RESTORER,
0x420276f8},  <unfinished ...>
30625 00:05:03.548591 time( <unfinished ...>
15882 00:05:03.548609 <... rt_sigaction resumed> {0x80a6dcc, [], SA_RESTORER,
0x420276f8}, 8) = 0
30625 00:05:03.548700 <... time resumed> NULL) = 1132463103
15882 00:05:03.548720 rt_sigprocmask(SIG_SETMASK, ~[KILL STOP],  <unfinished 
...>
30625 00:05:03.548792 select(16, [8], NULL, NULL, {300, 0} <unfinished ...>

and this is where the problem happens, somewhere between line 12094 and 12854.
parent, first failed write: 12854

15882 00:07:35.737337 write(7, "P....\n", 6) = -1 EAGAIN (Resource temporarily
unavailable)

parent, second failed write: 12874

15882 00:07:41.834433 write(7, "P....\n", 6) = -1 EAGAIN (Resource temporarily
unavailable)

(many more attempts 5 seconds apart, all fail)
child, failed read: 13338 - 13351

30625 00:10:03.561013 <... select resumed> ) = 0 (Timeout)
30625 00:10:03.606427 read(8, 0x9f81dc0, 6) = -1 EAGAIN (Resource temporarily
unavailable)
30625 00:10:03.614481 time(NULL)        = 1132463403
30625 00:10:03.635766 select(16, [8], NULL, NULL, {1, 0}) = 0 (Timeout)
30625 00:10:04.631238 read(8, 0x9f81dc0, 6) = -1 EAGAIN (Resource temporarily
unavailable)
30625 00:10:04.631361 time(NULL)        = 1132463404
30625 00:10:04.818517 rt_sigprocmask(SIG_BLOCK, [PIPE], [33], 8) = 0
30625 00:10:04.822736 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
30625 00:10:04.822854 rt_sigprocmask(SIG_SETMASK, [33], NULL, 8) = 0
30625 00:10:04.822906 rt_sigprocmask(SIG_BLOCK, [PIPE], [33], 8) = 0
30625 00:10:04.822962 rt_sigaction(SIGPIPE, {0x80a7be8, [], SA_RESTORER,
0x420276f8}, {SIG_DFL}, 8) = 0
30625 00:10:04.823016 rt_sigprocmask(SIG_SETMASK, [33], NULL, 8) = 0
30625 00:10:04.945869 select(8, [4], NULL, [4], {0, 0}) = 0 (Timeout)
30625 00:10:04.953285 write(4, "<20>spamd[30625]: prefork: sysre"..., 149) = 149

and it carries on with both sides retrying and failing, until end of strace.

So the problem happens somewhere between line 12094 and 12854.  However,
there's absolutely nothing of note in those lines -- just the usual idle
signal-twiddling and 2-second sleeping.

If anyone can spot anything, I'd be much obliged!  I'm stumped ;)





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

Reply via email to