On 20.04.2012 19:41, Bradley Giesbrecht wrote:
> On Apr 20, 2012, at 9:39 AM, Stevan Bajić wrote:
>
>> On 20.04.2012 07:32, Steve Fatula wrote:
>>> [...]
>>> If you give me the SPAM corpus, I can just run dspam_train on it (and I'd
>>> even add my 80). But it will be pretty unbalanced since I have few HAM
>>> messages since I only keep a month (maybe a few thousand messages). I am
>>> not sure that matters much? In the end, won't the detection still work,
>>> maybe biased towards SPAM at first, but, surely, it wouldn't take too long
>>> to stop false positives?
>>>
>> Let's say you want to make that merged global group. Then this is what you
>> should do:
>>
>> 1) Create a new DSPAM user. If you can, create a flat user (not
>> localp...@domain.tld), because a flat user name will be easier to recognize
>> on your setup, where user names are usually full-blown email addresses.
>> Let's say the newly created user is called "SpamHitRate".
>>
>> 2) Change the preferences for that user to:
>> dspam_admin change preference "SpamHitRate" "dailyQuarantineSummary" "off"
>> dspam_admin change preference "SpamHitRate" "enableBNR" "on"
>> dspam_admin change preference "SpamHitRate" "enableWhitelist" "off"
>> dspam_admin change preference "SpamHitRate" "fallbackDomain" "off"
>> dspam_admin change preference "SpamHitRate" "ignoreGroups" "on"
>> dspam_admin change preference "SpamHitRate" "ignoreRBLLookups" "on"
>> dspam_admin change preference "SpamHitRate" "makeCorpus" "off"
>> dspam_admin change preference "SpamHitRate" "optIn" "on"
>> dspam_admin change preference "SpamHitRate" "optOut" "off"
>> dspam_admin change preference "SpamHitRate" "optOutClamAV" "on"
>> dspam_admin change preference "SpamHitRate" "processorBias" "off"
>> dspam_admin change preference "SpamHitRate" "showFactors" "off"
>> dspam_admin change preference "SpamHitRate" "signatureLocation" "headers"
>> dspam_admin change preference "SpamHitRate" "spamAction" "deliver"
>> dspam_admin change preference "SpamHitRate" "spamSubject" ""
>> dspam_admin change preference "SpamHitRate" "statisticalSedation" "0"
>> dspam_admin change preference "SpamHitRate" "storeFragments" "off"
>> dspam_admin change preference "SpamHitRate" "tagNonspam" "off"
>> dspam_admin change preference "SpamHitRate" "tagSpam" "off"
>> dspam_admin change preference "SpamHitRate" "trainingMode" "TOE"
>> dspam_admin change preference "SpamHitRate" "trainPristine" "off"
>> dspam_admin change preference "SpamHitRate" "whitelistThreshold" "9999999"
>>
>> Basically you want that user to not use ClamAV, groups, RBLs, whitelisting
>> or any other mumbo jumbo. Usually you would not turn off that many helper
>> mechanisms on a normal user, but this is not a normal user: you want it to
>> be as strict as possible. You don't care about false positives or false
>> negatives on that user. In fact, this is exactly what you want: that user
>> should generate as many false positives/negatives as needed, because the
>> more FPs/FNs you have, the more you can make DSPAM learn. And that is mainly
>> what you are going to do with that user: train it with dspam_train on
>> spam/ham corpora.
>>
>> 3) Now train with dspam_train:
>> dspam_train SpamHitRate [spam_corpus maildir or mbox] [nonspam_corpus maildir or mbox]
>>
>> 4) After dspam_train has finished, run dspam_clean:
>> dspam_clean -s0 -p0 -u0,0,0,0 SpamHitRate
>>
>> 5) Now enable the merged global group by editing the DSPAM group file and
>> adding:
>> SpamHitRate:merged:*
>>
>> 6) You are using MySQL, right? Now it is time to delete all users' tokens
>> except for SpamHitRate.
>> To do that, just execute this (assuming the uid of SpamHitRate is 1000):
>>
>> delete from dspam_signature_data where uid!=1000;
>> delete from dspam_stats where uid!=1000;
>> delete from dspam_token_data where uid!=1000;
>>
>> analyze table dspam_signature_data;
>> analyze table dspam_stats;
>> analyze table dspam_token_data;
>>
>> optimize table dspam_signature_data;
>> optimize table dspam_stats;
>> optimize table dspam_token_data;
>>
>> After you have done that, all old tokens, signatures and statistics for
>> every other user should be removed. This will cause problems if users try
>> to retrain messages they received in the last few days (since the signature
>> data is purged). I don't think this will be a big issue on your setup,
>> since your users are using the dovecot anti-spam plugin and all DSPAM stuff
>> is masked/hidden from them.
>>
>> 7) Change your dspam.conf to run in TOE instead of TEFT. Don't forget to
>> check each user's preferences to make sure nobody has accidentally set
>> "trainingMode" to anything other than "TOE". Actually, you could delete
>> "trainingMode" if the user has that preference (it will then fall back to
>> what you have set in dspam.conf, which in your case should be TOE).
>>
>> 8) Restart the DSPAM daemon.
> Thank you, Stevan :)
No problem. If anyone needs a bunch of spam corpora, have a look here:
http://untroubled.org/spam/
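The 22 preferences from step 2 can also be kept as a list instead of typing each command. This is a minimal POSIX sh sketch, not a DSPAM tool: the function name spamhitrate_prefs is hypothetical, and it only prints the dspam_admin commands so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Print the step-2 "dspam_admin change preference" commands for one user.
# Nothing is executed here; review the output, then pipe it to sh.
# spamhitrate_prefs is a hypothetical helper name, not part of DSPAM.
spamhitrate_prefs() {
    user="$1"
    # One "name value" pair per line; spamSubject has an empty value on purpose.
    while read -r name value; do
        printf 'dspam_admin change preference "%s" "%s" "%s"\n' \
            "$user" "$name" "$value"
    done <<'EOF'
dailyQuarantineSummary off
enableBNR on
enableWhitelist off
fallbackDomain off
ignoreGroups on
ignoreRBLLookups on
makeCorpus off
optIn on
optOut off
optOutClamAV on
processorBias off
showFactors off
signatureLocation headers
spamAction deliver
spamSubject
statisticalSedation 0
storeFragments off
tagNonspam off
tagSpam off
trainingMode TOE
trainPristine off
whitelistThreshold 9999999
EOF
}
```

For example, "spamhitrate_prefs SpamHitRate" prints the commands, and "spamhitrate_prefs SpamHitRate | sh" would run them (assuming dspam_admin is in PATH). Printing rather than executing keeps a dry run as the default.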
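Step 5 is a one-line edit, but if it ends up in a script it is worth making it idempotent, so rerunning the script does not add the SpamHitRate:merged:* entry twice. A minimal POSIX sh sketch; the function name add_group_line is hypothetical, and the group file path is passed in because its location depends on the installation (typically a file named "group" under DSPAM's Home directory, but verify that on your system).

```shell
#!/bin/sh
# Append a line to a DSPAM group file unless an identical line already exists.
# add_group_line is a hypothetical helper; the file path is supplied by the caller.
add_group_line() {
    group_file="$1"
    line="$2"
    # -x whole-line match, -F fixed string (no regex), -s quiet if file is missing
    if grep -qxFs -- "$line" "$group_file"; then
        return 0    # already present, nothing to do
    fi
    # If the file does not end in a newline, add one so the new entry starts
    # on its own line ($(...) strips a trailing newline, so the result is
    # empty exactly when the last byte is a newline).
    if [ -s "$group_file" ] && [ -n "$(tail -c 1 "$group_file")" ]; then
        printf '\n' >> "$group_file"
    fi
    printf '%s\n' "$line" >> "$group_file"
}
```

Running "add_group_line /path/to/group 'SpamHitRate:merged:*'" twice leaves exactly one copy of the entry in the file.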
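The step-6 statements hard-code uid 1000. If you script them, it helps to validate the uid first, so an empty or mistyped value cannot produce a malformed or overly broad DELETE. This is a minimal sketch with a hypothetical function name, purge_sql, that only prints the same nine statements; piping them to the mysql client (for example "purge_sql 1000 | mysql dspam", where the database name "dspam" is an assumption) would run them. How to find the real uid depends on the setup: with DSPAM's virtual-user table it is likely the uid column of dspam_virtual_uids, but check your own schema.

```shell
#!/bin/sh
# Print the step-6 cleanup SQL, keeping only the rows of the given uid.
# purge_sql is a hypothetical helper name; nothing is sent to MySQL here.
purge_sql() {
    keep_uid="$1"
    # Reject anything that is not a plain non-negative integer (empty strings,
    # spaces, "1000 OR 1=1", ...), so the WHERE clause stays what we expect.
    case "$keep_uid" in
        ''|*[!0-9]*) echo "purge_sql: uid must be numeric" >&2; return 1 ;;
    esac
    tables="dspam_signature_data dspam_stats dspam_token_data"
    for t in $tables; do
        printf 'DELETE FROM %s WHERE uid != %s;\n' "$t" "$keep_uid"
    done
    # Same order as in step 6: all ANALYZEs, then all OPTIMIZEs.
    for stmt in 'ANALYZE TABLE' 'OPTIMIZE TABLE'; do
        for t in $tables; do
            printf '%s %s;\n' "$stmt" "$t"
        done
    done
}
```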
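For step 7, the dspam.conf setting is the TrainingMode directive (for example "TrainingMode toe"). Clearing per-user overrides can be scripted in the same print-then-review style. This sketch rests on several assumptions: user names arrive one per line on stdin (producing that list is up to your setup), dspam_admin accepts "delete preference USER ATTRIBUTE" (check its usage output on your version), and the training user is skipped because it keeps its explicit TOE setting. The function name drop_training_mode is hypothetical.

```shell
#!/bin/sh
# Print commands that drop a per-user "trainingMode" preference, so each user
# falls back to TrainingMode from dspam.conf. Reads user names from stdin.
# drop_training_mode is a hypothetical helper name, not part of DSPAM.
drop_training_mode() {
    skip_user="$1"   # e.g. SpamHitRate, which keeps its explicit TOE setting
    # "|| [ -n "$u" ]" still handles a last line without a trailing newline
    while IFS= read -r u || [ -n "$u" ]; do
        [ -z "$u" ] && continue              # ignore blank lines
        [ "$u" = "$skip_user" ] && continue  # leave the training user alone
        printf 'dspam_admin delete preference "%s" "trainingMode"\n' "$u"
    done
}
```

For example, "drop_training_mode SpamHitRate < users.txt" (users.txt being a hypothetical list of DSPAM users) prints one command per user for review before piping it to sh.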
> Regards,
> Bradley Giesbrecht
>
> _______________________________________________
> Dspam-user mailing list
> Dspam-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspam-user

--
Kind Regards from Switzerland,

Stevan Bajić