Re: [dspam-users] Catchall training

David Reid Mon, 19 Mar 2007 07:17:09 -0800

Lars Stavholm wrote:
> Lars Stavholm wrote:
>> David Reid wrote:
>>> Sorry to have to ask again, but despite trying a lot of variations, the
>>> situation still isn't clear and isn't improving for the affected users.
>>>
>>> The situation is that some domains have a catch-all address, i.e.
>>> <anything>@domain maps to a single email address. In this situation the
>>> training is applied to the address the mail was sent to, which is as
>>> expected. My question is whether there is a way to have training done for
>>> any address in the domain apply to all addresses in the domain. Can some
>>> form of groups setup be used?
>> Take a look at section 2.1, CONFIGURING GROUPS, in the 3.6.8 README.
> 
> In addition, here's my working setup, thanks to Tony Earnshow:
> 
> Postfix -> DSPAM -> Cyrus IMAP
> 
> # dspam --version
> DSPAM Anti-Spam Suite 3.6.8 (agent/library)
> Copyright (c) 2002-2006 Jonathan A. Zdziarski
> http://dspam.nuclearelephant.com
> DSPAM may be copied only under the terms of the GNU General Public
> License, a copy of which can be found with the DSPAM distribution kit.
> Configuration parameters: --prefix=/usr --sysconfdir=/etc
> --with-dspam-home=/var/lib/dspam --mandir=/usr/share/man --enable-daemon
> --enable-debug --enable-clamav --enable-syslog --enable-homedir
> 
> # cat /var/lib/dspam/group
> users:shared:[EMAIL PROTECTED]
> 
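As I understand the 3.6.8 README, the group file holds one group per line in the form `name:type:member,member,...` (the member list in the line above was obfuscated by the list archive). For a catch-all domain, a shared group would simply list the recipient addresses DSPAM sees for that domain. The sketch below only builds such a line; the helper name `make_shared_group`, the group name `catchall` and the example addresses are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one DSPAM group-file line of type "shared"
# for the given group name and member addresses.
# Usage: make_shared_group GROUP MEMBER...
make_shared_group() {
  group="$1"
  shift
  # join the remaining arguments (the members) with commas
  members=$(printf '%s\n' "$@" | paste -s -d, -)
  printf '%s:shared:%s\n' "$group" "$members"
}

# Example with made-up catch-all recipients; prints
# catchall:shared:info@example.com,sales@example.com
make_shared_group catchall info@example.com sales@example.com
```

The output line would be appended to `/var/lib/dspam/group` (the path used in the setup above). Because a catch-all accepts any local part, presumably only addresses that are actually listed end up sharing the dictionary.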
> # egrep -v '^#|^$' /etc/dspam.conf
> Home /var/lib/dspam
> TrustedDeliveryAgent "/usr/lib/cyrus/bin/deliver"
> DeliveryHost        127.0.0.1
> DeliveryPort        10026
> DeliveryIdent       localhost
> DeliveryProto       SMTP
> OnFail error
> Trust root
> Trust mail
> Trust dspam
> Trust wwwrun
> TrainingMode teft
> TestConditionalTraining on
> Feature noise
> Feature chained
> Feature whitelist
> Algorithm graham burton
> PValue graham
> ImprobabilityDrive on
> Preference "spamAction=deliver"
> Preference "signatureLocation=headers"  # 'message' or 'headers'
> Preference "showFactors=off"
> AllowOverride trainingMode
> AllowOverride spamAction
> AllowOverride spamSubject
> AllowOverride statisticalSedation
> AllowOverride enableBNR
> AllowOverride enableWhitelist
> AllowOverride signatureLocation
> AllowOverride showFactors
> AllowOverride optIn optOut
> AllowOverride whitelistThreshold
> HashRecMax              98317
> HashAutoExtend          on
> HashMaxExtents          0
> HashExtentSize          49157
> HashMaxSeek             100
> HashConnectionCache     10
> Lookup  "rabl.nuclearelephant.com"
> RBLInoculate on
> Notifications   off
> PurgeSignatures 14
> PurgeNeutral    90
> PurgeUnused     90
> PurgeHapaxes    30
> PurgeHits1S     15
> PurgeHits1I     15
> LocalMX 127.0.0.1
> SystemLog on
> UserLog   off
> TrainPristine on
> Opt out
> Broken lineStripping
> ClamAVPort      3310
> ClamAVHost      127.0.0.1
> ClamAVResponse  spam
> ServerPID              /var/run/dspam.pid
> ServerMode auto
> ServerParameters        "--deliver=innocent,spam -d %u"
> ServerIdent             "mail.domain.tld"
> ServerDomainSocketPath  "/var/tmp/dspam.sock"
> ClientHost      /var/tmp/dspam.sock
> ProcessorBias on
> 
> With this setup, however, the webui doesn't work,
> except for the global statistics page.
> 
> As you can see, we use the hash storage driver and shared groups,
> and it works like a charm.
> 
> For user mail training we use a simple script that collects
> misclassified ham/spam on an hourly basis from dedicated
> user IMAP folders like so:
> 
> #!/bin/bash
> # $Id: dspam_learn.sh.in 1971 2007-03-16 22:18:02Z stava $
> # @(#) Look for user/$user/Spam/train/{ham,spam} and, if all those
> # @(#) directories exist and there's at least one mail message to learn from,
> # @(#) perform the training and the subsequent cleanup (remove the mails).
> 
> [ "$(id -u)" = "0" ] || { echo >&2 "$0: must be root"; exit 1; }
> 
> # look here for cyrus imap users...
> basedir="/var/spool/imap/user"
> 
> # establish working directory...
> cd /var/tmp || exit 1
> 
> # loop through all users...
> for u in "$basedir"/*; do
>   user="$(basename "$u")"; ham=; spam=
>   # if all user directories (folders) exist, and only then...
>   [ -d "$u/Spam" ] && [ -d "$u/Spam/train" ] && \
>   [ -d "$u/Spam/train/ham" ] && [ -d "$u/Spam/train/spam" ] && {
>     ls "$u"/Spam/train/ham/[0-9]*. &> /dev/null && {
>       echo -n "ham: "
>       for mail in "$u"/Spam/train/ham/[0-9]*.; do
>         echo -n "$(basename "$mail") "
>         # drop DSPAM's own headers, then retrain the shared group as innocent
>         sed '/^X-DSPAM-/d' "$mail" | \
>           dspam --user users --class=innocent --deliver=innocent --source=error
>         [ $? = 0 ] && rm "$mail"
>       done
>       echo ""
>       ham=.
>     }
>     ls "$u"/Spam/train/spam/[0-9]*. &> /dev/null && {
>       echo -n "spam: "
>       for mail in "$u"/Spam/train/spam/[0-9]*.; do
>         echo -n "$(basename "$mail") "
>         # drop DSPAM's own headers, then retrain the shared group as spam
>         sed '/^X-DSPAM-/d' "$mail" | \
>           dspam --user users --class=spam --deliver=spam --source=error
>         [ $? = 0 ] && rm "$mail"
>       done
>       echo ""
>       spam=.
>     }
>     # tell cyrus that we removed some mail messages...
>     [ -n "$ham" ]  && su - cyrus -c "reconstruct -r user/$user/Spam/train/ham"
>     [ -n "$spam" ] && su - cyrus -c "reconstruct -r user/$user/Spam/train/spam"
>   }
> done
> exit 0
> 
> This all works beautifully now. After only a few days and a few
> hundred messages on a low-volume site, we get:
> 
> # dspam_stats -H
> users:
>                 TP True Positives:            136
>                 TN True Negatives:            392
>                 FP False Positives:             5
>                 FN False Negatives:            33
>                 SC Spam Corpusfed:              0
>                 NC Nonspam Corpusfed:           0
>                 TL Training Left:            2103
>                 SHR Spam Hit Rate          80.47%
>                 HSR Ham Strike Rate:        1.26%
>                 OCA Overall Accuracy:      93.29%
> 
> ...where the overall accuracy is climbing rapidly.
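The three percentages can be recomputed from the four counters. The formulas below are my reading of the figures (they reproduce 80.47%, 1.26% and 93.29% from the counts above, with SC and NC both zero), not code taken from `dspam_stats`; the function name `dspam_rates` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: recompute the dspam_stats -H rates from TP TN FP FN.
#   SHR = TP / (TP + FN)              share of spam that was caught
#   HSR = FP / (FP + TN)              share of ham wrongly flagged as spam
#   OCA = (TP + TN) / (all messages)  overall accuracy
dspam_rates() {
  awk -v tp="$1" -v tn="$2" -v fp="$3" -v fn="$4" 'BEGIN {
    printf "SHR %.2f%%\n", 100 * tp / (tp + fn)
    printf "HSR %.2f%%\n", 100 * fp / (fp + tn)
    printf "OCA %.2f%%\n", 100 * (tp + tn) / (tp + tn + fp + fn)
  }'
}

# Counters from the output above; prints SHR 80.47%, HSR 1.26%, OCA 93.29%
dspam_rates 136 392 5 33
```

With only 566 messages classified, a handful of corrections moves these rates noticeably, which fits the observation that accuracy is still climbing.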
> 
> Kudos to Tony, who helped me get this far.


Many thanks! I'll try the shared group for the domain in question :-)


> 
> In case it is of any use, our DSPAM is packaged as an RPM that
> works out of the box on a SuSE Linux 10.1 platform:
> <http://www.linadd.org/download/mail/dspam-3.6.8-1.i586.rpm>.
> 
> Hope this helps
> /Lars
> 
