> 
> However, among the topics discussed were a large number of 
> developers claiming that SpamAssassin essentially crashed 
> their systems by using up way too much memory. jm, quinlan, I 
> suspect you may be interested in these reports. I've linked 
> them below:
> 

Ok... I'll chime in with my thoughts.  Feel free to flame me or tell me
I'm completely wrong; I'm okay with that.

I think the real-time auto-expiry on bayes is causing some major issues.
I've seen scan times pushed out to 60-180 seconds when a spamc child
has to do the auto-expiry on bayes... and in my case this causes
problems, because I run spamc -x -t60 and tempfail on exit codes > 0.
Like Theo mentioned somewhere, passing something back to the parent and
forking a new child for expiration would be a good thing.
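For reference, the spamc glue I'm describing is roughly this (a sketch,
not my actual wrapper; the function name and the exit-75/EX_TEMPFAIL
mapping are my own conventions):

```shell
#!/bin/sh
# Sketch of a delivery wrapper around spamc.  With -x, spamc exits
# nonzero on a spamd error instead of safely passing the message
# through unscanned; -t60 caps how long we wait on spamd.
scan() {
    spamc -x -t60 "$@"
    rc=$?
    if [ "$rc" -gt 0 ]; then
        # Tempfail so the MTA requeues instead of delivering unscanned.
        return 75   # EX_TEMPFAIL
    fi
    return 0
}
```

So when an expiry run pushes a scan past 60 seconds, the whole delivery
tempfails and the mail gets requeued, which is why the inline expiry
hurts so much here.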

Also, while expiration is happening, I'm seeing 90-99% CPU utilization
by that child, and it pushes my loads pretty hard for a couple of
minutes.  I think the expiry decision made by BayesStore.pm is too lax:
it makes expiries wait longer, and on high-traffic systems it lets
_toks grow pretty large before expiry ever commences.  And then, when
it does run, it has to work very hard to expire.  I know we can cron-job
an sa-learn --force-expire on high-traffic sites, but it shouldn't come
to that, should it?
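That cron workaround looks like this (the schedule and the --dbpath
value are from my setup, just an illustration):

```shell
# crontab entry: force bayes expiry off-peak so spamc children
# never have to do it inline.  03:15 daily is arbitrary, and the
# db path is an assumption about where your bayes files live.
15 3 * * * sa-learn --force-expire --dbpath /var/spamassassin/bayes >/dev/null 2>&1
```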

A 'GLOBAL' bayes _toks on systems seeing 10-40k msgs/hour grows to
15-20MB before expiry happens.  When expiry does occur, it takes about
300 seconds, bringing the system to its knees for several minutes.  I
had to change some code in BayesStore.pm to remove a lot of the expiry
restrictions so I could get it down to expiring every 1 to 2 hours and
keep _toks around 4-6MB.  I even went as far as raising the auto-learn
spam threshold and lowering the auto-learn non-spam threshold beyond
the defaults, to prevent more learning from occurring and reduce the
size of _toks.  Still too much learning going on.  A 15-20MB global
_toks causes a lot of .lock files to hang around, and a lot of spamd
children to sit spinning, waiting for the locks to go away.  This
pushes out scan times, backs up requests, and sometimes exits 74 when
the timeout is reached.
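When I want to see that pile-up as it happens, a quick check like this
shows it (the path is from my setup; bayes files live wherever your
bayes_path points):

```shell
# Count lingering bayes lock files and spamd processes at this instant.
ls /var/spamassassin/ 2>/dev/null | grep -c '\.lock'
ps ax | grep -c '[s]pamd'
```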

I guess I'm no longer experiencing the high memory utilization that a
lot of people report, because I softlimit my spamd calls.  In 2.6x I
was seeing a lot of "Out of memory!" messages (produced by daemontools,
not SA) in the message log with a softlimit of 75MB.  In 3.0.0, I'm
still seeing them.
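For context, the softlimit in question sits in the daemontools run
script, roughly like this (the 75MB figure and the particular spamd
flags are from my setup, not a recommendation):

```shell
#!/bin/sh
# daemontools run script for spamd.  softlimit -m caps per-process
# memory (75MB here), so a bloated child dies with "Out of memory!"
# instead of dragging the whole box down with it.
exec softlimit -m 78643200 \
    spamd -x -u spamd --max-children 10 2>&1
```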

[EMAIL PROTECTED] spamd]# grep 'Out of' spamdlog
2004-10-08 02:14:25.356978500 Out of memory!
2004-10-08 02:14:36.197282500 Out of memory!
2004-10-08 02:14:43.524008500 Out of memory!
2004-10-08 02:14:51.968360500 Out of memory!
2004-10-08 02:15:36.171130500 Out of memory!
2004-10-08 03:14:25.567605500 Out of memory!
2004-10-08 04:14:17.167221500 Out of memory!
2004-10-08 04:14:35.019993500 Out of memory!
2004-10-08 04:14:55.984522500 Out of memory!
2004-10-08 04:15:03.615092500 Out of memory!
2004-10-08 04:28:14.128027500 Out of memory!
2004-10-08 05:14:17.391842500 Out of memory!
2004-10-08 05:14:23.865585500 Out of memory!
2004-10-08 05:14:44.957526500 Out of memory!
2004-10-08 06:15:14.764137500 Out of memory!
2004-10-08 07:15:07.621800500 Out of memory!
2004-10-08 08:14:21.782449500 Out of memory!
2004-10-08 08:15:11.661760500 Out of memory!
2004-10-08 09:14:31.732801500 Out of memory!
2004-10-08 09:14:47.962657500 Out of memory!
2004-10-08 09:15:07.340121500 Out of memory!
2004-10-08 09:15:20.094086500 Out of memory!
2004-10-08 10:14:25.233403500 Out of memory!
2004-10-08 10:14:30.909341500 Out of memory!
2004-10-08 10:14:44.925797500 Out of memory!

This tells me I'd be seeing the big-memory-usage PIDs if I didn't have
them softlimited.

See the times?  I get out-of-memory errors at approximately the same
minute of every hour, 14 or 15 minutes past.  I have checked, and I
have no cron jobs running at those times according to /var/log/cron.

Oct  8 10:10:00 spamd2 CROND[32441]: (root) CMD (`which mrtg`
/etc/mrtg/mrtg.conf)
Oct  8 10:10:00 spamd2 CROND[32442]: (root) CMD (/usr/lib/sa/sa1 1 1)
Oct  8 10:15:00 spamd2 CROND[32703]: (root) CMD (`which mrtg`
/etc/mrtg/mrtg.conf)
Oct  8 10:20:00 spamd2 CROND[470]: (root) CMD (`which mrtg`
/etc/mrtg/mrtg.conf)

My only guess is that this is bayes expiry.

I am running a big debug right now on some fairly high-traffic servers.
I should have some good info on this soon... just waiting till 11:15 to
see if I can catch it happening!
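The main thing I'll be watching around :14-:15 past the hour is the
bayes "magic" data, which sa-learn can dump (the exact line labels
depend on the dump format, so the grep pattern here is a loose guess):

```shell
# Token count and expiry timestamps from the bayes db.
sa-learn --dump magic | grep -Ei 'ntoken|atime|expire'
```

If the token count drops and the expiry atime jumps right when the
out-of-memory messages fire, that pretty much confirms the expiry
theory.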

-- 
Dallas Engelken
NMGI
