Tony Earnshaw wrote:
> Erland Nylend wrote, on 20. mar 2007 14:41:
>
> [...]
>
>> I'm using one shared group, with the "global" user as parent, like
>> this: global:shared:*
>>
>> I've set the learning mode in dspam.conf to toe, and these are the
>> preferences for the users:
>>
>> | mysql> select * from dspam_preferences;
>> | +-----+--------------+-------+
>> | | uid | preference   | value |
>> | +-----+--------------+-------+
>> | |  11 | optin        | on    |
>> | |  11 | trainingMode | toe   |
>> | |  13 | optin        | on    |
>> | +-----+--------------+-------+
>> | 3 rows in set (0.00 sec)
>>
>> (uid 11 is the global user, and the other one is the one I'm sending
>> ham/spam to)
>>
>> I've done some initial training of the global user, and dspam seems
>> to work as expected when sending mail to myself (uid 13). My problem
>> is that when dspam misses spam, and I want to notify dspam about the
>> errors, it does not work.
>>
>> This is the command I am using:
>> ~# dspam --user global --class=spam --source=error
>> --signature=11,45ffaeec41871548770753
>>
>> I see no change in dspam_stats, and I cannot see any improvement in
>> how dspam filters the spam messages, either.
>> Anyone on the list who could offer some tips?
>
> [...]
>
> Well, I might be able to.
>
> 1: Please revise the recent thread on this list between Lars Stavholm
> ([EMAIL PROTECTED]) and myself, which dealt with exactly the same thing.
>
> 2: Basically, if you use a *shared* group (which all my sites do), you
> can't initiate any other user than the user of the shared group itself,
> in your case user "global". So you can't expect any other user to be
> existent in the group, you can't have any other user in your setup than
> uid 11. If you do, it simply won't work, as you've found out for
> yourself. You can't submit data in the name of a user that doesn't exist.
>
> You send mail to uid 13, fair enough, the user receives the mail - this
> is a factor of your MTA and (if you have one) your MLA. But dspam will
> not retrain under any other uid than that of the shared group.
>
> If you can peruse the exchange between Lars and me on the same subject,
> then that would be the best. Otherwise, we can take it from step 1: again.

I posted my complete setup under another thread, but for your
convenience, here it is again:

Postfix -> DSPAM -> Cyrus IMAP

# dspam --version

DSPAM Anti-Spam Suite 3.6.8 (agent/library)

Copyright (c) 2002-2006 Jonathan A. Zdziarski
http://dspam.nuclearelephant.com

DSPAM may be copied only under the terms of the GNU General Public License,
a copy of which can be found with the DSPAM distribution kit.

Configuration parameters: --prefix=/usr --sysconfdir=/etc
  --with-dspam-home=/var/lib/dspam --mandir=/usr/share/man --enable-daemon
  --enable-debug --enable-clamav --enable-syslog --enable-homedir

# cat /var/lib/dspam/group
users:shared:[EMAIL PROTECTED]

# egrep -v '^#|^$' /etc/dspam.conf
Home /var/lib/dspam
TrustedDeliveryAgent "/usr/lib/cyrus/bin/deliver"
DeliveryHost 127.0.0.1
DeliveryPort 10026
DeliveryIdent localhost
DeliveryProto SMTP
OnFail error
Trust root
Trust mail
Trust dspam
Trust wwwrun
TrainingMode teft
TestConditionalTraining on
Feature noise
Feature chained
Feature whitelist
Algorithm graham burton
PValue graham
ImprobabilityDrive on
Preference "spamAction=deliver"
Preference "signatureLocation=headers" # 'message' or 'headers'
Preference "showFactors=off"
AllowOverride trainingMode
AllowOverride spamAction
AllowOverride spamSubject
AllowOverride statisticalSedation
AllowOverride enableBNR
AllowOverride enableWhitelist
AllowOverride signatureLocation
AllowOverride showFactors
AllowOverride optIn optOut
AllowOverride whitelistThreshold
HashRecMax 98317
HashAutoExtend on
HashMaxExtents 0
HashExtentSize 49157
HashMaxSeek 100
HashConnectionCache 10
Lookup "rabl.nuclearelephant.com"
RBLInoculate on
Notifications off
PurgeSignatures 14
PurgeNeutral 90
PurgeUnused 90
PurgeHapaxes 30
PurgeHits1S 15
PurgeHits1I 15
LocalMX 127.0.0.1
SystemLog on
UserLog off
TrainPristine on
Opt out
Broken lineStripping
ClamAVPort 3310
ClamAVHost 127.0.0.1
ClamAVResponse spam
ServerPID /var/run/dspam.pid
ServerMode auto
ServerParameters "--deliver=innocent,spam -d %u"
ServerIdent "mail.domain.tld"
ServerDomainSocketPath "/var/tmp/dspam.sock"
ClientHost /var/tmp/dspam.sock
ProcessorBias on

With this setup, however, the webui doesn't work, except for the
global statistics page.

As you can see, we use the hash driver and shared groups, and that
combination works like a charm.

For user mail training we use a simple script that collects
misclassified ham/spam on an hourly basis from dedicated
user IMAP folders, like so:

#!/bin/bash
# $Id: dspam_learn.sh.in 1971 2007-03-16 22:18:02Z stava $
# @(#) Look for user/$user/spam/{ham,train} and if all those directories exists,
# @(#) and there's at least one mail message to learn from,
# @(#) perform the training and the subsequent cleanup (remove the mails).

id="`id | cut -d= -f2 | cut -d\( -f1`"
[ "$id" = "0" ] || { echo >&2 "$0: must be root"; exit 1; }

# look here for cyrus imap users...
basedir="/var/spool/imap/user"

# establish working directory...
cd /var/tmp

# loop through all users...
for u in $basedir/*; do
    user="`basename $u`"; ham=; spam=
    # if all user directories (folders) exists, and only then...
    [ -d $u/Spam ] && [ -d $u/Spam/train ] && \
    [ -d $u/Spam/train/ham ] && [ -d $u/Spam/train/spam ] && {
        ls $u/Spam/train/ham/[0-9]*. &> /dev/null && {
            echo -n "ham: "
            for mail in $u/Spam/train/ham/[0-9]*.; do
                echo -n "`basename $mail`"
                sed '/^X-DSPAM-/d' $mail | \
                    dspam --user users --class=innocent --deliver=innocent --source=error
                [ $? = 0 ] && rm $mail
            done
            echo ""
            ham=.
        }
        ls $u/Spam/train/spam/[0-9]*. &> /dev/null && {
            echo -n "spam: "
            for mail in $u/Spam/train/spam/[0-9]*.; do
                echo -n "`basename $mail`"
                sed '/^X-DSPAM-/d' $mail | \
                    dspam --user users --class=spam --deliver=spam --source=error
                [ $? = 0 ] && rm $mail
            done
            echo ""
            spam=.
        }
        # tell cyrus that we removed some mail messages...
        [ $ham ] && su - cyrus -c "reconstruct -r user/$user/Spam/train/ham"
        [ $spam ] && su - cyrus -c "reconstruct -r user/$user/Spam/train/spam"
    }
done
exit 0

This all works beautifully now.
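
The header-stripping step of the training script can be tried on its
own, without dspam or Cyrus installed. The following is only a sketch:
the function name strip_dspam_headers and the sample message are made
up for illustration, and restricting the deletion to the header block
(everything up to the first blank line) is an assumption about what is
wanted. The script itself uses the simpler sed '/^X-DSPAM-/d', which
also removes matching lines from the message body.

```shell
#!/bin/sh
# Sketch: remove X-DSPAM-* header lines from a message read on stdin.
# Only lines in the header block (line 1 up to the first blank line)
# are considered; body lines are passed through untouched.
strip_dspam_headers() {
    sed '1,/^$/{/^X-DSPAM-/d;}'
}

# Hypothetical sample message, for illustration only.
printf '%s\n' \
    'From: someone@example.org' \
    'X-DSPAM-Result: Innocent' \
    'Subject: test' \
    '' \
    'X-DSPAM- at the start of a body line is left alone' |
    strip_dspam_headers
```

In the script, the output of such a filter is what gets piped into
dspam with --source=error. Folded (multi-line) X-DSPAM headers are not
handled by either variant.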
After only a few days and just a few hundred mails, on a low-volume
site, we get:

# dspam_stats -H
users:
        TP True Positives:          136
        TN True Negatives:          392
        FP False Positives:           5
        FN False Negatives:          33
        SC Spam Corpusfed:            0
        NC Nonspam Corpusfed:         0
        TL Training Left:          2103
        SHR Spam Hit Rate         80.47%
        HSR Ham Strike Rate:       1.26%
        OCA Overall Accuracy:     93.29%

...where the Overall Accuracy is climbing rapidly.

Kudos to Tony, who helped me get this far.

If it is of any use, our dspam is packaged as an rpm which works
right-out-of-the-box on a SuSE Linux 10.1 platform:
<http://www.linadd.org/download/mail/dspam-3.6.8-1.i586.rpm>.

Hope this helps
/Lars
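
For reference, the three percentages in the dspam_stats output above
can be reproduced from the four counters. The formulas below are an
assumption (they are not taken from the DSPAM source), but they do
match the reported figures: hit rate = TP / (TP + FN), strike rate =
FP / (FP + TN), accuracy = (TP + TN) / all classified mails. The
function name rates is made up for this sketch.

```shell
#!/bin/sh
# Sketch: recompute SHR, HSR and OCA from the dspam_stats counters.
# Arguments: TP TN FP FN. Assumed formulas, see the text above.
# No guard against empty counters (division by zero).
rates() {
    awk -v tp="$1" -v tn="$2" -v fp="$3" -v fn="$4" 'BEGIN {
        printf "SHR %.2f%%\n", 100 * tp / (tp + fn)
        printf "HSR %.2f%%\n", 100 * fp / (fp + tn)
        printf "OCA %.2f%%\n", 100 * (tp + tn) / (tp + tn + fp + fn)
    }'
}

# Counters copied from the output above.
rates 136 392 5 33
```

With the counters above this gives 80.47%, 1.26% and 93.29%, that is,
136 of 169 spam mails caught and 5 of 397 ham mails wrongly flagged.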
