http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5422





------- Additional Comments From [EMAIL PROTECTED]  2007-04-24 06:52 -------
OK, I have a theory.  I think this is what's happening: at some point, the
{kids} hash becomes incoherent.  For example, these two immediately-contiguous
lines in the log demonstrate it:

Apr 24 13:56:51 mxin001 spamd[44308]: JMD bug5313 read_one_message_from_child_socket 69792=I at /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line 400.
Apr 24 13:56:51 mxin001 spamd[44308]: prefork: child states: BBKBBBBBBBBBBBBBBBBBBBBBBBBB

After the first line, $self->set_child_state ($pid, PFSTATE_IDLE); is called,
which sets the {kids} entry for pid 69792 to PFSTATE_IDLE unless the pid has no
entry (which only happens for servers which have exited).  However, there's no
"I" in the "child states" line!
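
To make the expectation concrete, here is a minimal sketch of the behaviour
described above.  It is not the actual SpamdForkScaling.pm code: the %kids
table, the child_states_string helper and the one-letter state values are
assumptions for illustration (the real constants may well be numeric and only
rendered as letters in the log).

```perl
use strict;
use warnings;

# Assumption: states are represented directly by their one-letter log codes.
use constant PFSTATE_IDLE => 'I';
use constant PFSTATE_BUSY => 'B';

# Hypothetical stand-in for the parent's {kids} table: pid => state.
my %kids = (69792 => PFSTATE_BUSY, 69793 => PFSTATE_BUSY);

# Update a child's state, but never re-create an entry for a pid that
# has no entry (i.e. a server which has already exited).
sub set_child_state {
    my ($pid, $state) = @_;
    return 0 unless exists $kids{$pid};
    $kids{$pid} = $state;
    return 1;
}

# Render the table in the style of the "child states" log line
# (hypothetical helper; pids are listed in numeric order).
sub child_states_string {
    return join '', map { $kids{$_} } sort { $a <=> $b } keys %kids;
}

set_child_state(69792, PFSTATE_IDLE);
print "prefork: child states: ", child_states_string(), "\n";
```

With a coherent table, the state string printed right after the update
contains an "I" for pid 69792, which is exactly what is missing from the log.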

Later, a kid notifies the parent that its state is "B" (PFSTATE_BUSY); the
parent notes this, but the update is "lost" somehow.  The parent then attempts
to assign a job to a child it still believes is PFSTATE_IDLE, causing the
error.

I think it's becoming incoherent because of an intermittent race condition
between the mainline code and the SIGCHLD signal handler.  The latter
performs write ops on the {kids} hash: it deletes entries from it.
Perhaps when this happens at a bad time, other entries get "lost" somehow,
which causes the incoherence.

I'll upload a new version of SpamdForkScaling.pm (in entirety, not a patch, too
many patches = getting messy ;).   This version moves all deletions from the
{kids} hash into the mainline, and adds (yet more) debugging info.



