Re: Question about spamassassin using MySQL

Jason Stephenson Mon, 25 Apr 2005 20:59:22 -0700

Benjamin Scott wrote:

On Apr 25 at 3:13pm, Bruce Dawson wrote:
Steven: Thanks for the clarification. I was under the impression that the milter is called only after the message had been received.
Obviously, in order to do content analysis or other magic on a message, you have to receive the content. As I understand it, what these tools do is allow the SMTP "DATA" verb to be sent, and to receive some or all of the data from the sender. Then, before the SMTP result code 250 ("Message accepted for delivery") code is sent, the filter runs and makes a decision. If the message fails, an SMTP error status code is sent instead.

Yes, that is pretty much how spamass-milter and exim with exiscan-acls work.
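To make that sequence concrete, here is a minimal Python sketch of the decision taken once the body has arrived and before the final reply to DATA goes out. It is not how spamass-milter or exiscan is implemented; the function name, the 5.0 threshold, and the reply texts are assumptions for illustration only.

```python
def smtp_reply_after_data(spam_score, reject_threshold=5.0):
    """Pick the SMTP reply sent after the message body has been received.

    spam_score is whatever the content filter returned for the message.
    reject_threshold is an illustrative value, not a recommendation.
    """
    if spam_score >= reject_threshold:
        # Permanent failure: the message was never accepted, so the
        # sending MTA still owns it and has to deal with the bounce.
        return "550 5.7.1 Message rejected as spam"
    # Success: from this point on the message is our responsibility.
    return "250 2.0.0 Message accepted for delivery"
```

The point of rejecting here rather than after acceptance is that the filter's verdict reaches the sender inside the same SMTP session, so the receiving server never has to generate a bounce of its own.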

This is fine as long as your mail volume is reasonably low. As mail volumes increase, however, it becomes impractical to do this all in "real time" on your MX.

We had serious problems at my day job using spamass-milter. Dunno if the problem was with our version of sendmail being buggy or what. (There are known stability problems with spamass-milter and certain versions of sendmail.) Sendmail would lock up, spamassassin would die, and occasionally the swap manager would start thrashing. Sometimes a shutdown -r was the quickest way to fix the mess. This tended to happen when either we were at our busiest part of the day (between 9:00 and 10:00 a.m.) or when processing a message with a large attachment, large being a variable value depending on circumstances.

Switching to exim and adding RAM to the system really helped. The computer now has 1 GB of RAM (instead of 512 MB), and processes 500MB+ of mail per week, over 30,000 messages. I'd say that's a medium-sized installation.

(I should add that I switched to exim only after adding RAM and still having problems with spamass-milter. I'm not saying that it will do this in all installations, but it certainly did so in ours.)

So, if anyone is interested in exim ACLs to use spamassassin, as well as clamav, drop me a line. I've got a setup that works well for 600+ users.

Before I send this message, I'm going to pop back in here and add something that I think is appropriate.

I've now used spamassassin and exim in 3 environments, with several different setups. What I've found is that spamassassin really shines when each user on the system has their own bayes db, preferences, and auto-whitelist. This is almost always the case in the procmail environment. Spamassassin was, I believe, designed to be used this way, and it is very, very accurate when processing a single user's mail.

When spamassassin is used on a system level, as is more often the case when it is run from an ACL or milter, it maintains 1 spam db, 1 set of preferences, and 1 AWL for all the users on the system. This is because spamd normally runs as some user (nobody in my case) on the mail server, and the MTA communicates with spamassassin via spamd.

In the case of my current day job, I have to run it this way, because the customers of my MTA (600+ librarians, a few of whom hate computers as a distraction from their "real job") don't have the ability to maintain their own bayes db, preferences, and whitelist. If it's not as simple as point and click, they won't do it, and why should they? It takes time to process all that spam every day.

Don't forget, too, that you should run the false positives back through the system so that spamassassin can adjust itself to be more accurate.

That's why I have mixed feelings about sites that do unilateral blocking based on blacklists. Many of these systems find 75% of their mail volume is bogus (spam, worms, phishing, and backscatter). They get faced with the proposition of lowering the load on their systems by 90% at the cost of 5% of their legitimate mail. If you're an ISP trying to get by on paper thin margins, that might be considered "acceptable losses". Of course, that's cold comfort to those who (like me) *are* the acceptable losses. :(
    Spam sucks.


Yep, it does.

With the default spamassassin setup, the RBLs only add to the score of a message, so it's not as bad as rejecting outright. I know many sites also configure their MTA to reject any connections from IPs on certain RBLs.
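The difference between scoring and outright rejection can be sketched in a few lines of Python. This is only an illustration of the idea, not spamassassin's actual code; the function names and the per-list weights passed in are hypothetical.

```python
def score_message(content_score, rbl_hits, rbl_weights):
    """Add a weight for every RBL that lists the sender to the content score.

    rbl_hits is the list of RBL names that matched the sending IP;
    rbl_weights maps an RBL name to the points it contributes.
    """
    return content_score + sum(rbl_weights.get(name, 0.0) for name in rbl_hits)


def is_spam(total_score, threshold=5.0):
    """A message is tagged as spam only once the combined score reaches the threshold."""
    return total_score >= threshold
```

With weights such as 1.5 points for one list, a legitimate message scoring 1.0 on content from a listed IP ends up at 2.5 and is still delivered, whereas a connection-level block on the same list would have refused it.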

I don't use any third-party RBLs for outright blocking, but allow spamassassin to adjust a message's score. I do, however, maintain my own list of "known bad actor" IPs that I refuse connections from. Keeping track of this list is a bit tedious, but a couple of shell scripts help. Generally, you have to send us spam that makes it through the filter to end up on this list.

I know it isn't perfect because of collateral damage, and blocking by IP is practically pointless since IPs in dynamic blocks do actually change. I've considered removing the list for a week to see how it affects the system, and possibly removing it completely if the amount of spam getting through to my end users unfiltered doesn't appreciably change.

At home (sigio.com), I don't bother keeping a list.

Additionally, I highly recommend disconnecting the SMTP connection of any computer that uses your mail server's IP or host name in its HELO/EHLO that isn't on your allowed relay list. That alone cuts off a number of the spambots before they even get to say MAIL FROM.
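The HELO check boils down to a simple comparison. Here is a rough Python sketch of the logic, not an exim ACL; the function name, its parameters, and the example addresses in the note below are all made up for illustration.

```python
import ipaddress


def helo_is_forged(helo_name, client_ip, my_names, my_ips, relay_networks):
    """Return True if a non-relay client claims to be this server in HELO/EHLO."""
    addr = ipaddress.ip_address(client_ip)
    # Hosts on the allowed relay list may legitimately use our name.
    if any(addr in ipaddress.ip_network(net) for net in relay_networks):
        return False
    # Normalise the argument: drop the brackets of an address literal such
    # as [192.0.2.10], any trailing dot, and differences in case.
    claimed = helo_name.strip().strip("[]").rstrip(".").lower()
    return claimed in {name.lower() for name in my_names} or claimed in set(my_ips)
```

For a server known as mail.example.org at 192.0.2.10 with 10.0.0.0/8 as its relay network, a client at 203.0.113.5 announcing either "mail.example.org" or "[192.0.2.10]" would be flagged, while the same HELO from 10.1.2.3 would not.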

It's also a good idea to enable whatever option makes your MTA somewhat pedantic about the SMTP protocol. This cuts off anyone who starts spewing data before your server can respond to their HELO. So far, the only "valid" MTA I've seen that consistently does this is moveon.org's web mail or mailing list sender. Other than that, it's been all spammers dropped by being pedantic. I've tried mailing moveon.org to tell them that they need to fix their MTA, but they haven't responded. Just like a couple of sites that use Exchange and announce their host name with an _ in it.
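What "being pedantic" means in practice is checking whether the client has already sent more input while the server still owes it a reply. A minimal Python sketch of that check follows; it is an assumption about how such a test could look, not how any particular MTA implements it, and the function name and default wait time are hypothetical.

```python
import select
import socket


def client_talks_early(conn, wait_seconds=0.5):
    """Return True if the client sent data while we still owe it a reply.

    Call this just before writing the greeting or the HELO response.
    The socket also becomes readable when the peer has closed the
    connection, which is equally a reason to stop talking to it.
    """
    readable, _, _ = select.select([conn], [], [], wait_seconds)
    return bool(readable)
```

A well-behaved client waits for each reply before sending its next command, so the socket stays quiet and the function returns False after the wait; a spambot that blasts its whole script at once trips the check immediately.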

Battling spam can be time consuming, but I feel like I've made some progress. So far, I've not liked many of the proposals I've read about, whether alternatives to SMTP or extensions like SPF or DomainKeys. They all have drawbacks. SPF and DomainKeys could very easily be used by spammers to "legitimize" their spam, and then you're right back to having to block by IP address.

I've also been following the IM2000 mailing list discussions and I'm not sure that anything that I've heard about on there is the FUSSP. Not that I have one, myself. I thought I did, but I must have lost it. ;)

So anyway, just some pseudo-random thoughts on fighting spam from someone who has a bit of experience with spamassassin.

Cheers,
Jason

P.S.

That's it! I'm sending this message. The more I edit it, the more I think of to say. Well, I need to get to bed. I have work in the morning.

J.
_______________________________________________
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
