>
> However, among the topics discussed include a large number of
> developers claiming that SpamAssassin essentially crashed
> their system by using up way too much memory. jm, quinlan, I
> suspect you may be interested in these reports. I've linked
> them below:
>

Ok.. I will chime in with my thoughts. Feel free to flame me or tell me I'm completely wrong, I'm okay with that.

I think the real-time auto expiry on bayes is causing some major issues. I've seen scan times get pushed out 60-180 seconds when a spamd child has to do the auto expiry on bayes. In my case this causes problems because I run spamc -x -t60 and tempfail on error codes > 0. Like Theo mentioned somewhere, passing something back to the parent and forking a new child for expiration would be a good thing. Also, when expiration is happening, I'm seeing 90-99% CPU utilization by that child, and it pushes my load pretty hard for a couple of minutes.

I think the expiry decision made by BayesStore.pm is too lax. It causes expiries to wait longer and lets _toks grow pretty large on high-traffic systems before expiry ever commences. And then when it does, it has to work very hard to expire. I know we can cron a sa-learn --force-expire on high-traffic sites, but it shouldn't come to this, should it?

A 'GLOBAL' bayes _toks on systems seeing 10-40k msg/hour grows to 15-20MB before expiry happens. When expiry does occur, it takes about 300 seconds, bringing the system to its knees for several minutes. I had to change some code in BayesStore.pm, removing a lot of the expiry restrictions, so I could get it down to expiring every 1 to 2 hours and keep _toks around 4-6MB. I even went as far as raising the auto-learn spam threshold and lowering the auto-learn non-spam threshold even further than the defaults, to cut down on learning and reduce the size of _toks. Still too much learning going on.

A 15-20MB global _toks causes a lot of .lock files to hang around, and a lot of spamd children to sit spinning, waiting on the locks to go away. This pushes out scan times, backs up requests, and sometimes exits 74 when the timeout is reached.

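To make the "hand expiry off to a separate process" idea above a bit more concrete, here is a minimal Python sketch. It is only an illustration of the pattern, not how spamd is implemented: ExpiryRunner and its methods are hypothetical names, and the only real command it refers to is the sa-learn --force-expire mentioned above. The point is that the caller never blocks on expiry, and at most one expiry runs at a time.

```python
import subprocess

# Assumption: expiry is done by an external command. The default below is the
# real sa-learn invocation from the post; everything else here is hypothetical.
DEFAULT_EXPIRY_COMMAND = ["sa-learn", "--force-expire"]


class ExpiryRunner:
    """Run an expiry command in its own process, at most one at a time."""

    def __init__(self, command=DEFAULT_EXPIRY_COMMAND):
        self.command = list(command)
        self._proc = None  # handle of the most recently started expiry process

    def running(self):
        """Return True while a previously started expiry has not exited."""
        return self._proc is not None and self._proc.poll() is None

    def request_expiry(self):
        """Start expiry in the background unless one is already running.

        Returns True if a new process was started, False if the request was
        skipped because an expiry is still in progress.
        """
        if self.running():
            return False
        # Popen returns immediately; the caller can go back to scanning mail.
        self._proc = subprocess.Popen(self.command)
        return True

    def wait(self, timeout=None):
        """Block until the current expiry exits and return its exit code."""
        if self._proc is None:
            return None
        return self._proc.wait(timeout=timeout)
```

A scanning loop would call request_expiry() whenever it decides expiry is due and then carry on with the next message. Whether this actually helps with the lock contention described above is untested here, since the expiry process would still hold the bayes lock while it runs.
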
I guess I'm no longer experiencing the high memory utilization that a lot of people report because I softlimit my spamd calls. In 2.6x I was seeing a lot of "Out of memory!" messages (produced by daemontools, not SA) in the message log with a softlimit of 75MB. In 3.0.0, I'm still seeing them.

[EMAIL PROTECTED] spamd]# grep 'Out of' spamdlog
2004-10-08 02:14:25.356978500 Out of memory!
2004-10-08 02:14:36.197282500 Out of memory!
2004-10-08 02:14:43.524008500 Out of memory!
2004-10-08 02:14:51.968360500 Out of memory!
2004-10-08 02:15:36.171130500 Out of memory!
2004-10-08 03:14:25.567605500 Out of memory!
2004-10-08 04:14:17.167221500 Out of memory!
2004-10-08 04:14:35.019993500 Out of memory!
2004-10-08 04:14:55.984522500 Out of memory!
2004-10-08 04:15:03.615092500 Out of memory!
2004-10-08 04:28:14.128027500 Out of memory!
2004-10-08 05:14:17.391842500 Out of memory!
2004-10-08 05:14:23.865585500 Out of memory!
2004-10-08 05:14:44.957526500 Out of memory!
2004-10-08 06:15:14.764137500 Out of memory!
2004-10-08 07:15:07.621800500 Out of memory!
2004-10-08 08:14:21.782449500 Out of memory!
2004-10-08 08:15:11.661760500 Out of memory!
2004-10-08 09:14:31.732801500 Out of memory!
2004-10-08 09:14:47.962657500 Out of memory!
2004-10-08 09:15:07.340121500 Out of memory!
2004-10-08 09:15:20.094086500 Out of memory!
2004-10-08 10:14:25.233403500 Out of memory!
2004-10-08 10:14:30.909341500 Out of memory!
2004-10-08 10:14:44.925797500 Out of memory!

This tells me I'd be seeing the big memory usage PIDs if I didn't have them softlimited. See the times? I get the out-of-memory errors at approximately the same minute of every hour: 14 or 15 minutes past the hour. I have checked, and I have no cron jobs running at those times according to /var/log/cron.

Oct 8 10:10:00 spamd2 CROND[32441]: (root) CMD (`which mrtg` /etc/mrtg/mrtg.conf)
Oct 8 10:10:00 spamd2 CROND[32442]: (root) CMD (/usr/lib/sa/sa1 1 1)
Oct 8 10:15:00 spamd2 CROND[32703]: (root) CMD (`which mrtg` /etc/mrtg/mrtg.conf)
Oct 8 10:20:00 spamd2 CROND[470]: (root) CMD (`which mrtg` /etc/mrtg/mrtg.conf)

My only guess is that this is bayes expiry. I am running a big debug right now of some fairly high traffic servers. I should have some good info on this soon... Just waiting till 11:15 to see if I can catch it happening!

--
Dallas Engelken
NMGI
