Re: [dspam-users] Catchall training

Lars Stavholm Mon, 19 Mar 2007 07:03:27 -0800

Lars Stavholm wrote:
> David Reid wrote:
>> Sorry to have to ask again, but despite trying a lot of variations the
>> situation still isn't clear and isn't improving for the affected users.
>>
>> The situation is that some domains have a catch-all address, i.e.
>> <anything>@domain maps to a single email address. In this situation the
>> training applies to the address the mail was sent to, which is as
>> expected. My question is whether there is a way to have training on
>> any address in a domain apply to all addresses in that domain. Can some
>> form of groups setup be used?
> 
> Take a look in the 3.6.8 README, section 2.1 CONFIGURING GROUPS.


In addition, here's my working setup, thanks to Tony Earnshow:

Postfix -> DSPAM -> Cyrus IMAP

# dspam --version
DSPAM Anti-Spam Suite 3.6.8 (agent/library)
Copyright (c) 2002-2006 Jonathan A. Zdziarski
http://dspam.nuclearelephant.com
DSPAM may be copied only under the terms of the GNU General Public
License, a copy of which can be found with the DSPAM distribution kit.
Configuration parameters: --prefix=/usr --sysconfdir=/etc
--with-dspam-home=/var/lib/dspam --mandir=/usr/share/man --enable-daemon
--enable-debug --enable-clamav --enable-syslog --enable-homedir

# cat /var/lib/dspam/group
users:shared:[EMAIL PROTECTED]
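The group file holds one line per group. If you have many addresses to put in a shared group, a small helper can build the line for you. This is a minimal sketch that assumes the `name:type:member1,member2,...` layout covered in the README's CONFIGURING GROUPS section; the function name `make_group_line` and the example addresses are hypothetical:

```bash
# Hypothetical helper: print one DSPAM group line, assuming the
# layout name:type:member1,member2,... (check the README for details).
make_group_line() {
  local name="$1" type="$2"
  shift 2
  # With IFS set to a comma, "$*" joins the remaining arguments with commas.
  local IFS=,
  printf '%s:%s:%s\n' "$name" "$type" "$*"
}

# Usage (hypothetical addresses):
#   make_group_line users shared info@example.com sales@example.com \
#     >> /var/lib/dspam/group
```

Keeping the member list in a script like this makes it easier to regenerate the group file when addresses are added.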

# egrep -v '^#|^$' /etc/dspam.conf
Home /var/lib/dspam
TrustedDeliveryAgent "/usr/lib/cyrus/bin/deliver"
DeliveryHost        127.0.0.1
DeliveryPort        10026
DeliveryIdent       localhost
DeliveryProto       SMTP
OnFail error
Trust root
Trust mail
Trust dspam
Trust wwwrun
TrainingMode teft
TestConditionalTraining on
Feature noise
Feature chained
Feature whitelist
Algorithm graham burton
PValue graham
ImprobabilityDrive on
Preference "spamAction=deliver"
Preference "signatureLocation=headers"  # 'message' or 'headers'
Preference "showFactors=off"
AllowOverride trainingMode
AllowOverride spamAction
AllowOverride spamSubject
AllowOverride statisticalSedation
AllowOverride enableBNR
AllowOverride enableWhitelist
AllowOverride signatureLocation
AllowOverride showFactors
AllowOverride optIn optOut
AllowOverride whitelistThreshold
HashRecMax              98317
HashAutoExtend          on
HashMaxExtents          0
HashExtentSize          49157
HashMaxSeek             100
HashConnectionCache     10
Lookup  "rabl.nuclearelephant.com"
RBLInoculate on
Notifications   off
PurgeSignatures 14
PurgeNeutral    90
PurgeUnused     90
PurgeHapaxes    30
PurgeHits1S     15
PurgeHits1I     15
LocalMX 127.0.0.1
SystemLog on
UserLog   off
TrainPristine on
Opt out
Broken lineStripping
ClamAVPort      3310
ClamAVHost      127.0.0.1
ClamAVResponse  spam
ServerPID              /var/run/dspam.pid
ServerMode auto
ServerParameters        "--deliver=innocent,spam -d %u"
ServerIdent             "mail.domain.tld"
ServerDomainSocketPath  "/var/tmp/dspam.sock"
ClientHost      /var/tmp/dspam.sock
ProcessorBias on
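The listing above was produced by filtering out comments and blank lines. To check a single directive without reading the whole file, a lookup along the following lines could help. It is a sketch only: `conf_get` is a hypothetical name, and stripping a trailing "# ..." assumes values never contain " #" themselves. Directives that repeat, such as Trust or AllowOverride, produce one line per occurrence:

```bash
# Hypothetical helper: print every value of one directive in a
# dspam.conf-style file (default /etc/dspam.conf).
conf_get() {
  local key="$1" file="${2:-/etc/dspam.conf}"
  awk -v k="$key" '
    /^#/ || NF == 0 { next }          # skip comment and blank lines
    $1 == k {
      sub(/^[^ \t]+[ \t]+/, "")       # drop the directive name and following blanks
      sub(/[ \t]+#.*$/, "")           # drop a trailing comment, if any
      print
    }' "$file"
}

# Usage:
#   conf_get TrainingMode      # prints the TrainingMode value(s)
#   conf_get Trust             # one line per Trust entry
```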

With this setup, however, the web UI doesn't work,
except for the global statistics page.

As you can see, we use the hash storage driver and shared
groups, and it works like a charm.

For user mail training we use a simple script that collects
misclassified ham/spam on an hourly basis from dedicated
user IMAP folders like so:

#!/bin/bash
# $Id: dspam_learn.sh.in 1971 2007-03-16 22:18:02Z stava $
# @(#) Look for user/$user/Spam/train/{ham,spam} and, if all those directories
# @(#) exist and there's at least one mail message to learn from,
# @(#) perform the training and the subsequent cleanup (remove the mails).

[ "$(id -u)" = "0" ] || { echo >&2 "$0: must be root"; exit 1; }

# look here for cyrus imap users...
basedir="/var/spool/imap/user"

# establish working directory...
cd /var/tmp || exit 1

# loop through all users...
for u in "$basedir"/*; do
  user="$(basename "$u")"; ham=; spam=
  # if all user directories (folders) exist, and only then...
  [ -d "$u/Spam" ] && [ -d "$u/Spam/train" ] && \
  [ -d "$u/Spam/train/ham" ] && [ -d "$u/Spam/train/spam" ] && {
    # Cyrus stores each message as a file named "<number>."
    ls "$u"/Spam/train/ham/[0-9]*. &> /dev/null && {
      echo -n "ham: "
      for mail in "$u"/Spam/train/ham/[0-9]*.; do
        echo -n "$(basename "$mail") "
        # strip the X-DSPAM-* headers and retrain the shared group as innocent
        sed '/^X-DSPAM-/d' "$mail" | \
          dspam --user users --class=innocent --deliver=innocent --source=error
        [ $? = 0 ] && rm -f -- "$mail"
      done
      echo ""
      ham=.
    }
    ls "$u"/Spam/train/spam/[0-9]*. &> /dev/null && {
      echo -n "spam: "
      for mail in "$u"/Spam/train/spam/[0-9]*.; do
        echo -n "$(basename "$mail") "
        # strip the X-DSPAM-* headers and retrain the shared group as spam
        sed '/^X-DSPAM-/d' "$mail" | \
          dspam --user users --class=spam --deliver=spam --source=error
        [ $? = 0 ] && rm -f -- "$mail"
      done
      echo ""
      spam=.
    }
    # tell cyrus that we removed some mail messages...
    [ -n "$ham" ]  && su - cyrus -c "reconstruct -r user/$user/Spam/train/ham"
    [ -n "$spam" ] && su - cyrus -c "reconstruct -r user/$user/Spam/train/spam"
  }
done
exit 0
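The script only trains users whose spool already contains both training folders, and it only picks up files named like Cyrus message files (digits followed by a dot). Here is a minimal sketch of those two checks as standalone functions, which makes them easy to try against a scratch directory. The names `has_training_folders` and `pending_messages` are hypothetical and not part of the script above:

```bash
# Hypothetical helper: succeed only if both training folders exist.
# (Spam and Spam/train are implied by the deeper paths.)
has_training_folders() {
  local u="$1"
  [ -d "$u/Spam/train/ham" ] && [ -d "$u/Spam/train/spam" ]
}

# Hypothetical helper: list the message files ("<digits>.") in one folder.
pending_messages() {
  local dir="$1" f
  for f in "$dir"/[0-9]*.; do
    # an unmatched glob stays literal, so check that the file really exists
    [ -f "$f" ] && printf '%s\n' "$f"
  done
  return 0
}

# Usage:
#   has_training_folders /var/spool/imap/user/alice && \
#     pending_messages /var/spool/imap/user/alice/Spam/train/ham
```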

This all works beautifully now. After only a few days and
a few hundred messages on a low-volume site, we get:

# dspam_stats -H
users:
                TP True Positives:            136
                TN True Negatives:            392
                FP False Positives:             5
                FN False Negatives:            33
                SC Spam Corpusfed:              0
                NC Nonspam Corpusfed:           0
                TL Training Left:            2103
                SHR Spam Hit Rate          80.47%
                HSR Ham Strike Rate:        1.26%
                OCA Overall Accuracy:      93.29%

...where the Overall Accuracy is climbing rapidly.
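The three rates can be recomputed from the raw counts above. This is a sketch that assumes the following definitions, which reproduce the printed figures but should be checked against dspam_stats itself: SHR = TP/(TP+FN), HSR = FP/(FP+TN), and OCA = (TP+TN)/(TP+TN+FP+FN). The function name `dspam_rates` is hypothetical:

```bash
# Hypothetical helper: recompute SHR, HSR and OCA from TP, TN, FP, FN.
# The definitions are assumptions chosen to match the dspam_stats output above.
dspam_rates() {
  awk -v tp="$1" -v tn="$2" -v fp="$3" -v fn="$4" '
    function pct(a, b) { return b ? 100 * a / b : 0 }   # avoid division by zero
    BEGIN {
      printf "SHR %.2f%%\n", pct(tp, tp + fn)            # spam caught
      printf "HSR %.2f%%\n", pct(fp, fp + tn)            # ham misfiled as spam
      printf "OCA %.2f%%\n", pct(tp + tn, tp + tn + fp + fn)
    }'
}

# dspam_rates 136 392 5 33
# → SHR 80.47%
#   HSR 1.26%
#   OCA 93.29%
```

With these counts, 136 of 169 spam messages were caught (80.47%), 5 of 397 ham messages were misfiled (1.26%), and 528 of 566 messages were classified correctly (93.29%).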

Kudos to Tony, who helped me get this far.

In case it is of any use, our DSPAM is packaged as an RPM that
works out of the box on a SuSE Linux 10.1 platform:
<http://www.linadd.org/download/mail/dspam-3.6.8-1.i586.rpm>.

Hope this helps
/Lars
