On 20.04.2012 07:32, Steve Fatula wrote:
[...]

If you give me the SPAM corpus, I can just run dspam_train on it (and I'd even add my 80). But it will be pretty unbalanced, because I have few HAM messages: I only keep a month (maybe a few thousand messages). I am not sure that matters much? In the end, won't the detection still work, maybe biased towards SPAM at first, but surely it wouldn't take too long to stop false positives?

Let's say you want to make that merged global group. Then this is what you should do:

1) Create a new DSPAM user. If you can, create a flat user (no localp...@domain.tld), because a flat user name will be easier to recognize on your setup, where user names are usually full email addresses. Let's say the newly created user is called "SpamHitRate".

2) Change preferences for that user to:
dspam_admin change preference "SpamHitRate" "dailyQuarantineSummary" "off"
dspam_admin change preference "SpamHitRate" "enableBNR" "on"
dspam_admin change preference "SpamHitRate" "enableWhitelist" "off"
dspam_admin change preference "SpamHitRate" "fallbackDomain" "off"
dspam_admin change preference "SpamHitRate" "ignoreGroups" "on"
dspam_admin change preference "SpamHitRate" "ignoreRBLLookups" "on"
dspam_admin change preference "SpamHitRate" "makeCorpus" "off"
dspam_admin change preference "SpamHitRate" "optIn" "on"
dspam_admin change preference "SpamHitRate" "optOut" "off"
dspam_admin change preference "SpamHitRate" "optOutClamAV" "on"
dspam_admin change preference "SpamHitRate" "processorBias" "off"
dspam_admin change preference "SpamHitRate" "showFactors" "off"
dspam_admin change preference "SpamHitRate" "signatureLocation" "headers"
dspam_admin change preference "SpamHitRate" "spamAction" "deliver"
dspam_admin change preference "SpamHitRate" "spamSubject" ""
dspam_admin change preference "SpamHitRate" "statisticalSedation" "0"
dspam_admin change preference "SpamHitRate" "storeFragments" "off"
dspam_admin change preference "SpamHitRate" "tagNonspam" "off"
dspam_admin change preference "SpamHitRate" "tagSpam" "off"
dspam_admin change preference "SpamHitRate" "trainingMode" "TOE"
dspam_admin change preference "SpamHitRate" "trainPristine" "off"
dspam_admin change preference "SpamHitRate" "whitelistThreshold" "9999999"
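
If you prefer not to type every line by hand, the same calls can be driven from a list. This is only a minimal sketch: the function name apply_prefs and the DSPAM_ADMIN override are hypothetical helpers made up for illustration, and only the "dspam_admin change preference" call itself comes from the list above. Setting DSPAM_ADMIN=echo gives a dry run that prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: apply "name=value" preference pairs to one DSPAM user.
# DSPAM_ADMIN is a hypothetical override; set it to "echo" for a dry run.
DSPAM_ADMIN="${DSPAM_ADMIN:-dspam_admin}"

apply_prefs() {
    user="$1"
    # Read one "name=value" pair per line from stdin; skip blank lines.
    while IFS='=' read -r name value; do
        [ -n "$name" ] || continue
        $DSPAM_ADMIN change preference "$user" "$name" "$value"
    done
}

# Usage (dry run):
#   DSPAM_ADMIN=echo
#   printf 'enableBNR=on\ntrainingMode=TOE\n' | apply_prefs SpamHitRate
```

An empty value (as for "spamSubject") is passed through as an empty argument, the same as the "" in the list above.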

Basically you want that user to use neither ClamAV, nor groups, nor RBLs, nor whitelisting, nor any other mumbo jumbo. Usually you would not turn off that many helper mechanisms for a normal user, but this is not a normal user. You want that user to be as hard as possible. You don't care about false positives or false negatives on that user. In fact, they are exactly what you want: the more FP/FN you get, the more you can make DSPAM learn. And learning is mainly what you are going to do with that user, by running dspam_train with spam/ham corpora.

3) Now go on and train with dspam_train:
dspam_train SpamHitRate [spam_corpus maildir or mbox] [nonspam_corpus maildir or mbox]

4) After you are finished with dspam_train, go on and run dspam_clean:
dspam_clean -s0 -p0 -u0,0,0,0 SpamHitRate

5) Now enable the merged global group by editing the DSPAM group file and adding:
SpamHitRate:merged:*

6) You are using MySQL, right? Now it is time to delete all users' tokens except those of SpamHitRate. To do that, just execute this (assuming the uid of SpamHitRate is 1000):

delete from dspam_signature_data where uid!=1000;
delete from dspam_stats where uid!=1000;
delete from dspam_token_data where uid!=1000;

analyze table dspam_signature_data;
analyze table dspam_stats;
analyze table dspam_token_data;

optimize table dspam_signature_data;
optimize table dspam_stats;
optimize table dspam_token_data;
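
Since the uid 1000 is only an assumption, it may be safer to generate the delete statements from the real uid and review them before running anything. The sketch below does that; the function names uid_lookup_sql and purge_sql are hypothetical, and the lookup assumes your setup stores virtual users in the dspam_virtual_uids table (columns uid and username), which depends on how the MySQL driver is configured.

```shell
#!/bin/sh
# Sketch only: print SQL for review instead of executing it directly.

# Print a query that looks up the uid of a DSPAM user.
# Assumes the dspam_virtual_uids table is in use on this setup.
uid_lookup_sql() {
    printf "SELECT uid FROM dspam_virtual_uids WHERE username = '%s';\n" "$1"
}

# Print the three DELETE statements that keep only the given uid.
purge_sql() {
    keep_uid="$1"
    # Refuse anything that is not a plain number.
    case "$keep_uid" in
        ''|*[!0-9]*) echo "uid must be numeric" >&2; return 1 ;;
    esac
    for table in dspam_signature_data dspam_stats dspam_token_data; do
        printf 'DELETE FROM %s WHERE uid != %s;\n' "$table" "$keep_uid"
    done
}

# Usage (the database name "dspam" is an assumption):
#   uid_lookup_sql SpamHitRate | mysql dspam
#   purge_sql 1000            # review the output first
#   purge_sql 1000 | mysql dspam
```

Take a backup (for example with mysqldump) before piping the output into mysql, because the deletes cannot be undone.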

After you have done that, all old tokens, signatures and statistics of every other user should be removed. This will lead to problems if users try to retrain messages they received in the last few days (since the signature data is purged). I don't think this will be a big issue on your setup, since your users are using the dovecot anti-spam plugin and all DSPAM stuff is masked/hidden from them.

7) Change your dspam.conf to run in TOE instead of TEFT. Don't forget to check each user's preferences to make sure none of them has accidentally set "trainingMode" to anything other than "TOE". Actually, you could simply delete "trainingMode" wherever a user has that preference (it will then fall back to what you have set in dspam.conf, which in your case should be TOE).
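
If the preferences are kept in MySQL as well, the check can be done with one query. This is a sketch under that assumption: it presumes the dspam_preferences table (columns uid, preference, value) is where your preferences live, and the function names are hypothetical. On setups that keep preferences in files, this does not apply.

```shell
#!/bin/sh
# Sketch only: print SQL for review instead of executing it directly.

# Query listing every uid whose trainingMode preference is not TOE.
training_mode_check_sql() {
    cat <<'EOF'
SELECT uid, value FROM dspam_preferences
WHERE preference = 'trainingMode' AND value <> 'TOE';
EOF
}

# Statement removing all per-user trainingMode preferences, so that
# the value from dspam.conf applies.
training_mode_delete_sql() {
    cat <<'EOF'
DELETE FROM dspam_preferences WHERE preference = 'trainingMode';
EOF
}

# Usage (the database name "dspam" is an assumption):
#   training_mode_check_sql | mysql dspam
```

Run the check first. The delete is only needed if the check returns rows you do not want to fix one by one.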

8) Restart the DSPAM daemon.


Thanks in advance!



--
Kind Regards from Switzerland,

Stevan Bajić

_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user