[Bug 3828] [review] spamd parent stops accepting requests

bugzilla-daemon 17 Nov 2004 18:35:12 -0000

http://bugzilla.spamassassin.org/show_bug.cgi?id=3828






------- Additional Comments From [EMAIL PROTECTED]  2004-11-17 10:34 -------
i got a copy of the email that hung on child pid 22626 at 11:39 am...  thanks to
full debug and 1gig log files :)

[EMAIL PROTECTED] old]# ls -la *22626*
-rw-------    1 root     root         1385 Nov 17 11:39
spamassassin.22626.IQWf2F.tmp

but as you can see, it scans fine

[EMAIL PROTECTED] old]# cat spamassassin.22626.IQWf2F.tmp | spamc -x 2>&1 
>/dev/null
[EMAIL PROTECTED] old]# STATUS=$?; echo $STATUS
0

as does the orig-* email that was dumped to disk from the incoming smtp
connection with is identical to the spamassassin tmp file.

so, we can rule out a specially crafted message for causing this.  (progress?)  
:)

here is the debug log, with strace enabled, of that scan....

-------------------
child created..

2004-11-17 11:39:01.466228500 brk(0x8574000)                          = 
0x8574000
2004-11-17 11:39:01.466231500 logmsg: [20563] server successfully spawned child
process, pid 22626
2004-11-17 11:39:01.466289500 brk(0x8575000)                          = 
0x8575000

it took in 5 check requests..

[EMAIL PROTECTED] spamd]# grep 22626 outfile  | grep retrieving
2004-11-17 11:39:04.487069500 debug: [22626] config: retrieving prefs for
$global from SQL server
2004-11-17 11:39:08.696062500
stat64("/usr/lib/perl5/site_perl/5.6.1/i386-linux/auto/Digest/SHA1/SHA1.bs",
debug: [22626] config: retrieving prefs for $global from SQL server
2004-11-17 11:39:12.844853500 debug: [22626] config: retrieving prefs for
$global from SQL server
2004-11-17 11:39:16.594829500 debug: [22626] config: retrieving prefs for
$global from SQL server
2004-11-17 11:39:21.764975500 brk(0x82cd000debug: [22626] config: retrieving
prefs for $global from SQL server


it processed the first 4, and never finished on #5.

check # 5 started at  
 11:39:21.764975500

it waited on a bayes lock for 7 seconds... finally syncing at 
 2004-11-17 11:39:28.485283500 stat64("../lib/Pod/Text.pmc", debug: [22626]
bayes: synced databases from journal in 7 seconds: 15500 unique entries (19526
total entries)

pyzor timed out in 3 seconds, that takes us to 11:39:32
 2004-11-17 11:39:32.422429500 debug: [22626] pyzor: check timed out after 3 
secs.

uribldns lookups complete successfully 
 2004-11-17 11:39:32.479788500 debug: [22626] uridnsbl: queries completed: 4
started: 2

last log entry from check #5 at 
 2004-11-17 11:39:32.503748500 debug: [22626] auto-whitelist: sql-based
connected to DBI:mysql:logs:localhost:3306


then, a couple minutes later, child destroyed (by my kill -9)

2004-11-17 11:41:41.995309500 logmsg: [20563] received child_handler CHLD
2004-11-17 11:41:41.995360500 logmsg: [20563] server hit by SIGCHLD
2004-11-17 11:41:41.995423500 logmsg: [20563] handled cleanup of child pid 22626
2004-11-17 11:41:41.995444500 logmsg: [20563] set SIGCHLD
2004-11-17 11:41:41.995486500 logmsg: [20563] Need to Spawn a new CHILD!!!
2004-11-17 11:41:41.999028500 logmsg: [20563] server successfully spawned child
process, pid 29029
-------------------


so not sure what happended, but this one hung on AWL lookup...   i'll trace
another here shortly as they seem to be hanging quite frequently on this box.

fyi, AWL works fine on all other messages... and there is nothing wrong with the
mysql end of it.

mysql> select count(*) from awl;
+----------+
| count(*) |
+----------+
|   673013 |
+----------+
1 row in set (0.00 sec)





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 3828] [review] spamd parent stops accepting requests

Reply via email to