On Tue, May 21, 2013 at 12:55:13PM +0100, Ramana Kumar wrote:
> What is your email setup specifically? Like, what software do you use?

Postfix for SMTP, SpamAssassin, Pyzor, Razor, DSPAM, procmail, mutt, and some custom code that looks up the AS number, GeoIP, and RBL memberships of incoming IP addresses.

I use Postfix, procmail, and mutt more or less unmodified. By themselves, they'll easily handle hundreds of thousands of emails per day on a modern server machine, so they'll serve the needs of a few individual users many times over.

Spam filtering is a whole different matter. If you're not intentionally running a spam trap and you get fewer than 1000 spam messages per day, SpamAssassin and DSPAM on an entry-level dedicated server are fine, as long as you pay attention to what happens under worst-case load conditions and plan/configure for them (e.g. if your VPS has 1024MB of RAM it can reliably filter no more than two or three emails simultaneously, so the MTA default limit of 100 local delivery processes at once is probably too high).

My custom AS/GeoIP/RBL lookup code isn't really important. Only the RBL lookups are useful for spam filtering, and SpamAssassin already does those. The GeoIP and AS number lookups had to be debugged and working so I could get the data that shows they're not as relevant as RBL data, but I've had no reason to take them out since (and it's kind of nice to have a header in every message that says what ISP and what part of the world it came from). I also publish some graphs on how often the RBL memberships match the final message's spam status, and those graphs need data different from what SpamAssassin provides.

I run SpamAssassin configured to run only its static rules engine and network checks, and annotate messages with lists of matching rules in the headers; however, it is DSPAM that ultimately decides whether a message is spam or not.

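For anyone unfamiliar with how an RBL lookup works, here is a minimal Python sketch. It is not my actual lookup code, and the zone name dnsbl.example.org is a placeholder, not a real blacklist. The usual DNSBL convention is to reverse the IPv4 octets, append the list's zone, and do an ordinary A-record query; an answer means "listed", NXDOMAIN means "not listed".

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    # 192.0.2.99 is queried as 99.2.0.192.<zone>
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the DNSBL zone returns an A record for this IP.

    Caveat: any resolver failure (not only NXDOMAIN) is treated as
    "not listed" here, which is a simplification.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Placeholder zone; substitute a list you are allowed to query.
    print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

A real deployment would also want a caching resolver and a timeout, since these lookups happen for every incoming connection.
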
SpamAssassin's false positive rate is far too high for me (well over 0.1%, which would make lost mail a weekly event), its algorithm for combining rule results (adding scores) makes no mathematical sense, and its machine learning storage backends are too slow to keep up with my mail load; however, SA's static rules can identify pattern-based message features like "fake Outlook headers" or "looks like Nigerian 419 scam" and the network checks can add information like "listed in Pyzor" or "contains blacklisted URI" which help DSPAM make better filtering decisions for specific flavors of spam.

DSPAM is a Bayes classifier, so it automatically learns what my non-spam email looks like (and which SA rules and RBL listings are reliable indicators of non-spam or spam and which are noise). The price for configuration simplicity is that I have to provide timely feedback to the filter every single time spam gets through (or non-spam gets trapped), or DSPAM learns the wrong things. I have macros in mutt to retrain the filter on misclassified messages, so in practice this is a single keypress per message. Any Bayes classifier will work as long as it looks at mail headers (beware, some do not!).

What I like in particular about DSPAM is the API for remote storage, so I can split filtering between one machine that has access to the readable message text and another that has hashed token statistics. A well-trained spam filter can end up knowing who your friends are, where you shop, where you live, and what work you do, so the token stats DB has some privacy implications. DSPAM's architecture helps since the DSPAM data store is not as easy to read as a folder full of plaintext email messages (though it's still possible to extract some private information with a dictionary attack).

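To illustrate why feeding SA rule names and RBL results to a Bayes classifier works, here is a toy naive Bayes filter in Python. This is not DSPAM's algorithm or API, just a sketch of the general idea; the token names (SA:FAKE_OUTLOOK_HEADERS and so on) are made up for the example. Every header annotation is simply another token, and training decides how much each one counts.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive Bayes spam filter over sets of tokens (illustrative only)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, tokens, label):
        """Record one message; also used to retrain on a misclassification."""
        self.counts[label].update(set(tokens))
        self.totals[label] += 1

    def spam_probability(self, tokens):
        """Combine per-token evidence as log-odds, with add-one smoothing."""
        log_odds = math.log((self.totals["spam"] + 1) / (self.totals["ham"] + 1))
        for tok in set(tokens):
            p_spam = (self.counts["spam"][tok] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.totals["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    f = TinyBayes()
    # Hypothetical tokens: SA rule hits and RBL results sit beside body words.
    f.train(["SA:FAKE_OUTLOOK_HEADERS", "RBL:listed", "pills"], "spam")
    f.train(["From:friend@example.org", "lunch"], "ham")
    print(f.spam_probability(["SA:FAKE_OUTLOOK_HEADERS", "lunch"]))
```

A token that shows up equally often in spam and non-spam contributes a ratio of 1 and drops out, which is how unreliable rules and noisy RBLs end up ignored without any manual score tuning.
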
Presumably you are not visiting your VPS mail server in its data center, so if you have a significantly more capable machine to run your MUA you can do the spam filtering on that, and let your VPS server handle only unsophisticated MTA and storage tasks.

Having used both, if I had to build my mail server again, I'd use Exim instead of Postfix. Postfix has a lot of configuration restrictions designed to protect unsophisticated administrators from their own ignorance, like not being able to use back-references in regexp rules, or having to jump through obscure hoops to match multiple SMTP envelope fields in a single message (e.g. you want to redirect a message when client IP = x AND recipient address = y). In practice I want to do these things fairly often. Exim can easily express these kinds of rules in its configuration language, but Postfix requires external mail filter daemons, internal code changes, or non-default build options.

> And does anyone on list have any experience with migrating out of Gmail to
> a self-hosted setup for email?

If it's a personal Gmail account you can set it up to do SMTP forwarding (Google pushes to your server) or enable IMAP access (your server pulls from Google, e.g. with fetchmail) on the Gmail side. I use IMAP because IMAP lets me scrape my Gmail Spam folder (Google false-positives almost as badly as SpamAssassin, and the SMTP forwarding will not forward anything it believes is spam). This lets you transition gracefully to a new email address. In the future, when the only people sending mail to your Gmail address are spammers, you can turn the Gmail account off.

> I have a VPS, and I'm using exim to just forward messages sent to my
> domain to my Gmail account, but I would really like to be just keeping it
> all on my server and dropping the Gmail.

Pick a MUA (or set up an IMAP server) and a spam filter, set up local delivery, and you're mostly done already.

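If you'd rather script the Gmail IMAP pull than use fetchmail, a rough Python sketch with the standard imaplib module looks like this. The assumptions are mine: the folder name "[Gmail]/Spam" is what Gmail uses for English-language accounts, the function names are made up, and the account must accept password logins over IMAP (Google may require an app-specific password or OAuth instead). Handing the fetched messages to local delivery is left out.

```python
import imaplib

GMAIL_IMAP_HOST = "imap.gmail.com"
GMAIL_SPAM_FOLDER = "[Gmail]/Spam"  # assumption: English-locale folder name


def fetch_all(conn, folder):
    """Return the raw RFC 822 bytes of every message in an IMAP folder."""
    # Quote the mailbox name; open read-only so nothing gets marked as seen.
    status, _ = conn.select('"%s"' % folder, readonly=True)
    if status != "OK":
        raise RuntimeError("cannot select folder %r" % folder)
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed in folder %r" % folder)
    messages = []
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        # A successful fetch returns a (header, body) tuple as the first item.
        if status == "OK" and parts and isinstance(parts[0], tuple):
            messages.append(parts[0][1])
    return messages


def pull_gmail_spam(user, password):
    """Log in to Gmail over IMAP and return everything in the Spam folder."""
    conn = imaplib.IMAP4_SSL(GMAIL_IMAP_HOST)
    try:
        conn.login(user, password)
        return fetch_all(conn, GMAIL_SPAM_FOLDER)
    finally:
        conn.logout()
```

Calling fetch_all(conn, "INBOX") on the same connection would pick up the normal mail as well, so both folders can be fed to your own filter.
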
> If anyone else is in a similar situation, maybe we can work it out
> together.
>
> Cheers,
> Ramana
