I have installed dspam at this site, and it has been running quite
satisfactorily for a few months now. The installation veers off a bit
from the instructions, and this was driven by some particulars of the
site. Actually, the more I look at it, the more it looks like a Rube
Goldberg contraption; yet, it works.
In what follows, I describe the site and how dspam was installed,
along with the rationales. The description is high level; I refrained
from giving the details, in order to save you from terminal boredom.
All other info, scripts and so on are available to whoever is
interested (private or list).
My purpose in writing here is twofold:
a) Feedback. Am I missing something or making things inordinately
complicated? I would really appreciate comments and suggestions.
b) Maybe it is useful. I don't think my site is so special as far as
the users and mail system go. Perhaps some of the ideas here can
be applied elsewhere.
Site particulars:
* Users have real unix accounts, not just mailboxes.
* A large number of users are not computer savvy and are set in their ways,
with no time or interest in changing them.
* MTA is qmail.
* Several MUAs are in use. Some are IMAP based (this includes a webmail
interface), with the inbox mapped to a maildir directory in the user's home.
Others use mailboxes, which may be mbox or maildir format. Local tradition
had most users keep their mboxes in ~/folders and their IMAP tree inside
~/Maildir.
* Previous spam filter was spamassassin. A script provided an "installation"
of SA in the user account: edited .qmail to filter messages through
procmail, and installed a basic .procmailrc. That one filtered messages
through SA, and dropped messages that were classified as spam into a spam
folder compatible with the user's MUA.
Requirements for the dspam installation:
* Respect the variety of MUAs in use.
* Have a trivial upgrade path from SA to dspam for users that had adopted our
procmail configuration.
* Have a simple upgrade path from SA to dspam for users that manage their
mail filtering configuration knowledgeably.
* Make training easy and error proof.
The solution:
a) dspam
I used the mysql storage, dspam daemon, a single merged group whose global
user was corpus trained, and TUM as default training mode. Dspam was not
set to be a delivery agent, and was expected to be used as a filter only.
b) upgrade of users' configurations
The standard procmail use was to filter each message through
/usr/local/bin/spamc and check the resulting message for an
X-Spam-Flag: YES in the header. So, I made /usr/local/bin/dspamc a
(perl) wrapper that:
a) called dspam as a filter
b) checked for an X-DSPAM-Result: Spam header, and included the
corresponding X-Spam-Flag header.
The basic upgrade was just a matter of inserting a "d" in the spamc
call.
Those who like to tinker with their procmail config were given some
instruction on what to look for.
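
To make the wrapper idea concrete, here is a minimal sketch in Python. It is
not the actual perl wrapper. The function names are mine, the dspam filter
command is passed in by the caller rather than spelled out, and I assume the
filtered message uses plain LF line endings, as is usual on a unix host.

```python
import subprocess
import sys
from email.parser import BytesHeaderParser


def add_spam_flag(raw: bytes) -> bytes:
    """Return raw with X-Spam-Flag: YES added if dspam marked it as spam."""
    headers = BytesHeaderParser().parsebytes(raw)
    result = str(headers.get("X-DSPAM-Result", "")).strip().lower()
    if result != "spam":
        return raw  # not classified as spam: pass the message through untouched
    flag = b"X-Spam-Flag: YES\n"
    # The header block ends at the first blank line (LF line endings assumed).
    head, sep, body = raw.partition(b"\n\n")
    if not sep:
        # Message with headers only: append the flag to the end.
        return raw.rstrip(b"\n") + b"\n" + flag
    return head + b"\n" + flag + b"\n" + body


def main(filter_cmd):
    """Read a message on stdin, pipe it through filter_cmd, flag it, print it.

    filter_cmd is whatever command line runs dspam as a filter at your site;
    it is an assumption of this sketch, not something taken from my setup.
    """
    raw = sys.stdin.buffer.read()
    done = subprocess.run(filter_cmd, input=raw, stdout=subprocess.PIPE, check=True)
    sys.stdout.buffer.write(add_spam_flag(done.stdout))
```

A script entry point would call main() with the site's filter command. The
existing procmail recipes that test for X-Spam-Flag: YES then keep working
unchanged.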
c) training
Forwarding mail to a spam and a notspam address is boring and prone
to error. On most MUAs one either has to type the address (or an
alias), or point and click on an addressbook. On the other hand,
saving a message to a fixed folder is a uniformly simple process,
two clicks away or whatever. So, I defined the following protocol:
0) Users were told about TOE.
1) They save their misclassified messages into a single folder. The name
follows a well-defined pattern, allowing for some user choice, and
a folder is whatever the MUA thinks is a folder. Actually, if a user uses
more than one MUA, she can use different training folders. The main
point is that the user does not separate false positives from false
negatives; they are all errors.
2) Periodically, a script crawls the home directories, looking for such
folders in mbox and maildir formats in some standard directories. The
messages found are used for training on behalf of the user, and then
deleted from the folder.
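
The core of such a crawler can be sketched in Python as below. This is an
illustration, not the script in use here. In particular, I assume the
direction of each correction is read off the X-DSPAM-Result header that dspam
left on the message: a message dspam called spam must be a false positive,
anything else a false negative. The retrain callback is hypothetical and
stands for whatever runs dspam's retraining for that user.

```python
import mailbox
import os


def correction_class(msg):
    """Class to retrain a misclassified message as, or None if undecidable."""
    result = str(msg.get("X-DSPAM-Result", "")).strip().lower()
    if not result:
        return None  # never went through dspam: nothing to correct
    # Assumption: the folder holds only errors, so the right class is the
    # opposite of what dspam decided.
    return "innocent" if result == "spam" else "spam"


def open_folder(path):
    """Open a training folder as maildir (directory) or mbox (plain file)."""
    if os.path.isdir(path):
        return mailbox.Maildir(path, create=False)
    return mailbox.mbox(path, create=False)


def drain_folder(box, retrain):
    """Retrain on every decidable message in box, then delete it.

    retrain(cls, raw_bytes) is a caller-supplied (hypothetical) callback.
    Returns the number of messages processed.
    """
    count = 0
    box.lock()
    try:
        for key in list(box.keys()):
            msg = box.get_message(key)
            cls = correction_class(msg)
            if cls is None:
                continue  # leave messages without a dspam verdict in place
            retrain(cls, msg.as_bytes())
            box.remove(key)
            count += 1
        box.flush()
    finally:
        box.unlock()
    return count
```

Because the class is derived from the header, the user never has to say which
kind of error a message is; saving it to the folder is enough.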
d) stats
I wasn't too sanguine about installing a web server on the mail server
host, so the web interface was eschewed. A mail address was set up so that
a user who sends a message to it gets a reply with the respective
output of dspam_stats -H.
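
The reply side of that can be sketched in Python as follows. The function
name and the subject line are made up for the illustration. Running
dspam_stats -H and mapping the sender to a local account are left to the
caller, since those parts depend on the site.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.utils import parseaddr


def build_stats_reply(request: bytes, stats_text: str, from_addr: str) -> EmailMessage:
    """Build a reply to request whose body is the given statistics text.

    stats_text is assumed to be the output of dspam_stats -H for the
    requesting user, captured by the caller.
    """
    req = message_from_bytes(request)
    _, requester = parseaddr(str(req.get("From", "")))
    if not requester:
        raise ValueError("request has no usable From address")
    reply = EmailMessage()
    reply["From"] = from_addr
    reply["To"] = requester
    reply["Subject"] = "Your dspam statistics"  # hypothetical subject line
    if req.get("Message-ID"):
        # Lets threading MUAs attach the answer to the request.
        reply["In-Reply-To"] = str(req["Message-ID"])
    reply.set_content(stats_text)
    return reply
```

The resulting message can be handed to the local MTA for delivery in whatever
way is convenient.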
That's it, I believe.
Cheers,
am
--
Arnaldo Mandel
Departamento de Ciência da Computação - Computer Science Department
Universidade de São Paulo, Bra[sz]il
[EMAIL PROTECTED]
Talvez você seja um Bright http://the-brights.net Maybe you are a Bright.