On Apr 20, 2012, at 9:39 AM, Stevan Bajić wrote:

> On 20.04.2012 07:32, Steve Fatula wrote:
>> [...]
>
>> If you give me the SPAM corpus, I can just run dspam_train on it (and I'd
>> even add my 80). But it will be pretty unbalanced since I have few HAM
>> messages since I only keep a month (maybe a few thousand messages). I am not
>> sure that matters much? In the end, won't the detection still work, maybe
>> biased towards SPAM at first, but, surely, it wouldn't take too long to stop
>> false positives?
>>
> Let's say you want to make that merged global group. Then this is what you
> should do:
>
> 1) Create a new DSPAM user. If you can, create a flat user (no
> localp...@domain.tld), because a flat user name will be easier to recognize
> on your setup, where user names are usually full email addresses. Let's say
> that newly created user is called "SpamHitRate".
>
> 2) Change preferences for that user to:
> dspam_admin change preference "SpamHitRate" "dailyQuarantineSummary" "off"
> dspam_admin change preference "SpamHitRate" "enableBNR" "on"
> dspam_admin change preference "SpamHitRate" "enableWhitelist" "off"
> dspam_admin change preference "SpamHitRate" "fallbackDomain" "off"
> dspam_admin change preference "SpamHitRate" "ignoreGroups" "on"
> dspam_admin change preference "SpamHitRate" "ignoreRBLLookups" "on"
> dspam_admin change preference "SpamHitRate" "makeCorpus" "off"
> dspam_admin change preference "SpamHitRate" "optIn" "on"
> dspam_admin change preference "SpamHitRate" "optOut" "off"
> dspam_admin change preference "SpamHitRate" "optOutClamAV" "on"
> dspam_admin change preference "SpamHitRate" "processorBias" "off"
> dspam_admin change preference "SpamHitRate" "showFactors" "off"
> dspam_admin change preference "SpamHitRate" "signatureLocation" "headers"
> dspam_admin change preference "SpamHitRate" "spamAction" "deliver"
> dspam_admin change preference "SpamHitRate" "spamSubject" ""
> dspam_admin change preference "SpamHitRate" "statisticalSedation" "0"
> dspam_admin change preference "SpamHitRate" "storeFragments" "off"
> dspam_admin change preference "SpamHitRate" "tagNonspam" "off"
> dspam_admin change preference "SpamHitRate" "tagSpam" "off"
> dspam_admin change preference "SpamHitRate" "trainingMode" "TOE"
> dspam_admin change preference "SpamHitRate" "trainPristine" "off"
> dspam_admin change preference "SpamHitRate" "whitelistThreshold" "9999999"
>
> Basically you want that user to not use ClamAV, nor any groups, nor any RBL,
> nor do you want whitelisting or any other mumbo jumbo. Usually you would not
> turn off that many helper mechanisms on a normal user, but this is not a
> normal user. You want that user to be as hard as possible. You don't care
> about false positives or false negatives on that user. In fact this is
> exactly what you want. You want that user to generate as many false
> positives / negatives as needed, because the more FP/FN you have, the more
> DSPAM can learn. And this is mainly what you are going to do with that user:
> you are going to use dspam_train with Spam/Ham corpora.
>
> 3) Now go on and train with dspam_train:
> dspam_train SpamHitRate [spam_corpus maildir or mbox] [nonspam_corpus maildir or mbox]
>
> 4) After you are finished with dspam_train you should go on and run
> dspam_clean:
> dspam_clean -s0 -p0 -u0,0,0,0 SpamHitRate
>
> 5) Now you enable the merged global group by editing the DSPAM group file and
> there you add:
> SpamHitRate:merged:*
>
> 6) You are using MySQL, right? Now it is time to delete all users' tokens
> except those of SpamHitRate. To do that you just execute this (assuming the
> uid of SpamHitRate is 1000):
>
> delete from dspam_signature_data where uid!=1000;
> delete from dspam_stats where uid!=1000;
> delete from dspam_token_data where uid!=1000;
>
> analyze table dspam_signature_data;
> analyze table dspam_stats;
> analyze table dspam_token_data;
>
> optimize table dspam_signature_data;
> optimize table dspam_stats;
> optimize table dspam_token_data;
>
> After you have done that, all old tokens, signatures and statistics for
> each user should be removed. This will lead to problems if users try to
> retrain messages that they received in the last few days (since the
> signature data is purged). I don't think this will be a big issue on your
> setup since your users are using the dovecot anti-spam plugin and all DSPAM
> stuff is masked/hidden from them.
>
> 7) Change your dspam.conf to run in TOE instead of TEFT. Don't forget to
> check each user's preferences to make sure "trainingMode" has not been set
> by accident to anything other than "TOE". Actually you could delete
> "trainingMode" if the user has that preference (it will then fall back to
> what you have set in dspam.conf, which in your case should be TOE).
>
> 8) Restart the DSPAM daemon.
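
For anyone who wants to review steps 2 and 6 before touching a live system,
here is a minimal dry-run sketch in POSIX shell. It only prints the commands
and SQL; nothing is executed. The helper names print_pref_cmds and
print_purge_sql are hypothetical (they are not part of DSPAM), only three of
the preferences are shown as an illustration, and the uid 1000 is the same
assumption made in step 6, so look up the real uid first. The analyze and
optimize statements from step 6 are left out for brevity.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands from steps 2 and 6 for review.
# Nothing here calls dspam_admin or mysql; the output is plain text.

DSPAM_USER="SpamHitRate"   # user name chosen in step 1
DSPAM_UID="1000"           # ASSUMPTION: uid of that user, as in step 6

# Print one "dspam_admin change preference" line per name=value argument.
# Hypothetical helper, not part of DSPAM.
print_pref_cmds() {
    user=$1
    shift
    for pair in "$@"; do
        name=${pair%%=*}    # text before the first "="
        value=${pair#*=}    # text after the first "=" (may be empty)
        printf 'dspam_admin change preference "%s" "%s" "%s"\n' \
            "$user" "$name" "$value"
    done
}

# Print the delete statements that keep only the rows of the given uid.
# Refuses anything that is not a plain number, to avoid a broken WHERE clause.
print_purge_sql() {
    uid=$1
    case $uid in
        ''|*[!0-9]*)
            echo "uid must be numeric: $uid" >&2
            return 1
            ;;
    esac
    for table in dspam_signature_data dspam_stats dspam_token_data; do
        printf 'delete from %s where uid!=%s;\n' "$table" "$uid"
    done
}

# Example: a few of the preferences from step 2, then the purge SQL.
print_pref_cmds "$DSPAM_USER" trainingMode=TOE enableBNR=on spamSubject=
print_purge_sql "$DSPAM_UID"
```

Printing first and executing later makes it easy to compare the generated
lines against the list in the message above before anything is deleted.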

Thank you Stevan :)

Regards,
Bradley Giesbrecht