I have installed dspam at this site, and it has been running quite
satisfactorily for a few months now.  The installation veers off a bit
from the instructions, and this was driven by some particulars of the
site.  Actually, the more I look at it, the more it looks like a Rube
Goldberg contraption; yet, it works.

In what follows, I describe the site and how dspam was installed,
along with the rationales.  The description is high level; I refrained
from giving the details, in order to save you from terminal boredom.
All other info, scripts and so on are available to whoever is
interested (private or list).

My purpose in writing here is twofold:

a) Feedback.  Am I missing something or making things inordinately
   complicated?  I would really appreciate comments and suggestions.

b) Maybe it is useful.  I don't think my site is so special as far as
   the users and mail system go.  Perhaps some of the ideas here can
   be applied elsewhere.



Site particulars:

* Users have real unix accounts, not just mailboxes.

* A large number of users are not computer savvy and are set in their ways,
  with no time or interest in changing them.

* MTA is qmail.

* Several MUAs are in use.  Some are IMAP based (this includes a webmail
  interface), with the inbox mapped to a maildir directory in the user's home.
  Others use mailboxes, which may be mbox or maildir format.  Local tradition
  had most users keep their mboxes in ~/folders and their IMAP tree inside
  ~/Maildir.
  
* Previous spam filter was spamassassin.  A script provided an "installation"
  of SA in the user account: it edited .qmail to filter messages through
  procmail, and installed a basic .procmailrc.  That .procmailrc filtered
  messages through SA, and dropped messages that were classified as spam into
  a spam folder compatible with the user's MUA.


Requirements for the dspam installation:

* Respect the variety of MUAs in use.

* Have a trivial upgrade path from SA to dspam for users that had adopted our
  procmail configuration.

* Have a simple upgrade path from SA to dspam for users that manage their
  mail filtering configuration knowledgeably.

* Make training easy and error proof.


The solution:

a) dspam 
   I used the mysql storage, dspam daemon, a single merged group whose global
   user was corpus trained, and TUM (train until mature) as default training
   mode.  Dspam was not set to be a delivery agent, and was expected to be
   used as a filter only.

b) upgrade of users' configurations
   The standard procmail use was to filter each message through
   /usr/local/bin/spamc and check the resulting message for an
   X-Spam-Flag: YES in the header.  So, I made /usr/local/bin/dspamc a
   (perl) wrapper that:
   1) calls dspam as a filter;
   2) checks for an X-DSPAM-Result: Spam header, and adds the
      corresponding X-Spam-Flag header.
   The basic upgrade was then just a matter of inserting a "d" in the spamc
   call.  Those who like to tinker with their procmail config were given
   some instruction on what to look for.
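
For illustration only, here is a minimal sketch of what such a wrapper
does, written in Python rather than perl.  It is not the actual script.
The dspam flags (--user, --deliver=innocent,spam, --stdout) are my
assumption of a typical filter-mode invocation, and the function names
are made up for this example.

```python
import email
import getpass
import subprocess
import sys


def add_spam_flag(raw: bytes) -> bytes:
    """Return raw with an X-Spam-Flag: YES header if dspam tagged it as spam."""
    msg = email.message_from_bytes(raw)
    verdict = str(msg.get("X-DSPAM-Result", "")).strip().lower()
    if verdict != "spam":
        return raw
    flag = b"X-Spam-Flag: YES\n"
    # procmail may hand over an mbox "From " envelope line; keep it first.
    if raw.startswith(b"From "):
        envelope, sep, rest = raw.partition(b"\n")
        return envelope + sep + flag + rest
    return flag + raw


def main() -> int:
    """Filter stdin through dspam and write the flagged message to stdout."""
    raw = sys.stdin.buffer.read()
    # Assumed invocation: dspam as a pure filter for the calling user.
    proc = subprocess.run(
        ["dspam", "--user", getpass.getuser(),
         "--deliver=innocent,spam", "--stdout"],
        input=raw,
        stdout=subprocess.PIPE,
    )
    # If dspam fails, pass the message through untouched rather than lose it.
    filtered = proc.stdout if proc.returncode == 0 and proc.stdout else raw
    sys.stdout.buffer.write(add_spam_flag(filtered))
    return 0
```

An installed script would end by calling sys.exit(main()).  Because the
output carries the same X-Spam-Flag header that SA produced, the
existing procmail recipe that tests for it should not need to change.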


c) training 
   Forwarding mail to a spam and a notspam address is boring and prone
   to error.  On most MUAs one either has to type the address (or an
   alias), or point and click on an addressbook.  On the other hand,
   saving a message to a fixed folder is a uniformly simple process,
   two clicks away or whatever.  So, I defined the following protocol:
   0) Users were told about TOE (train on error).
   1) They save their misclassified messages into a single folder.  The
      folder name follows a well-defined pattern, allowing for some user
      choice, and a folder is whatever the MUA thinks is a folder.  Actually,
      if a user uses more than one MUA, she can use different training
      folders.  The main point is that the user does not separate false
      positives from false negatives; they are all errors.
   2) Periodically, a script crawls the home directories, looking for such
      folders in mbox and maildir formats in some standard directories.  The
      messages found are used for training on behalf of the user, and then
      deleted from the folder.
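
Since users put both kinds of error in one folder, the script has to
work out the direction of each correction by itself.  The sketch below
shows one way to do that, in Python; it is not the actual script.  It
assumes that each message still carries the X-DSPAM-Result header added
at filtering time, and that dspam accepts --class and --source=error
for retraining.  The folder path is passed in by the caller, and all
function names are hypothetical.

```python
import mailbox
import subprocess
from pathlib import Path


def corrected_class(msg) -> str:
    """A message tagged Spam was a false positive; anything else was missed spam."""
    verdict = str(msg.get("X-DSPAM-Result", "")).strip().lower()
    return "innocent" if verdict == "spam" else "spam"


def retrain(user: str, raw: bytes, klass: str) -> None:
    """Feed one misclassified message back to dspam (assumed flags)."""
    subprocess.run(
        ["dspam", "--user", user, f"--class={klass}", "--source=error"],
        input=raw,
        check=True,
    )


def open_folder(path: Path) -> mailbox.Mailbox:
    """Open a training folder as maildir if it looks like one, else as mbox."""
    if (path / "cur").is_dir() and (path / "new").is_dir():
        return mailbox.Maildir(str(path), create=False)
    return mailbox.mbox(str(path), create=False)


def drain_folder(box: mailbox.Mailbox, user: str, train=retrain) -> int:
    """Train on every message in the folder, then delete it; return the count."""
    count = 0
    box.lock()
    try:
        for key in list(box.keys()):
            msg = box.get_message(key)
            train(user, box.get_bytes(key), corrected_class(msg))
            box.remove(key)  # only reached if training did not raise
            count += 1
        box.flush()
    finally:
        box.unlock()
    return count
```

A crawler would call drain_folder(open_folder(path), user) for each
training folder it finds in a home directory.  A message is removed only
after its training call returns, so a failure leaves it in place for the
next run.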

d) stats
   I wasn't too sanguine about installing a web server on the mail server
   host, so the web interface was eschewed.  Instead, a mail address was set
   up: a user who sends a message to it gets a reply with the output of
   dspam_stats -H for their own account.
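
The responder behind such an address can be quite small.  The following
Python sketch is not the actual script.  It assumes that the local part
of the sender's address is the unix account name, and that dspam_stats
takes the user name as its argument after -H.  The subject line and the
function names are made up.

```python
import re
import subprocess
from email import message_from_bytes
from email.message import EmailMessage
from email.utils import parseaddr
from typing import Optional

# Only plain account names are accepted, since the value goes on a command line.
USER_RE = re.compile(r"^[a-z_][a-z0-9_-]*$")


def sender_address(raw: bytes) -> str:
    """Extract the bare address from the From: header of the request."""
    msg = message_from_bytes(raw)
    return parseaddr(str(msg.get("From", "")))[1]


def requesting_user(raw: bytes) -> Optional[str]:
    """Map the sender to a local account name, or None if it looks unsafe."""
    local = sender_address(raw).partition("@")[0].lower()
    return local if USER_RE.match(local) else None


def stats_for(user: str) -> str:
    """Run dspam_stats -H for one user and return its output."""
    done = subprocess.run(
        ["dspam_stats", "-H", user],
        capture_output=True, text=True, check=True,
    )
    return done.stdout


def build_reply(raw: bytes, stats_text: str, reply_from: str) -> EmailMessage:
    """Compose the answer that goes back to whoever asked."""
    reply = EmailMessage()
    reply["From"] = reply_from
    reply["To"] = sender_address(raw)
    reply["Subject"] = "Your dspam statistics"
    reply.set_content(stats_text)
    return reply
```

Note that the From: header is easy to forge, so this only suits
information as harmless as per-user filter statistics.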



That's it, I believe.

Cheers,

am

-- 
Arnaldo Mandel                        
Departamento de Ciência da Computação - Computer Science Department
Universidade de São Paulo, Bra[sz]il      
[EMAIL PROTECTED]
Talvez você seja um Bright http://the-brights.net Maybe you are a Bright.
