On 26.04.2012 05:01, Chad M Stewart wrote:
> On Apr 20, 2012, at 11:39 AM, Stevan Bajić wrote:
>
>> On 20.04.2012 07:32, Steve Fatula wrote:
>>> [...]
>>> If you give me the SPAM corpus, I can just run dspam_train on it (and I'd
>>> even add my 80). But it will be pretty unbalanced since I have few HAM
>>> messages since I only keep a month (maybe a few thousand messages). I am
>>> not sure that matters much? In the end, won't the detection still work,
>>> maybe biased towards SPAM at first, but, surely, it wouldn't take too long
>>> to stop false positives?
>>>
>> Let's say you want to make that merged global group. Then this is what you
>> should do:
>>
>> 1) Create a new DSPAM user. If you can, create a flat user (no
>> localp...@domain.tld), because a flat user name will be easier to recognize
>> on your setup, where you usually have full-blown email addresses as user
>> names. Let's say that newly created user is called "SpamHitRate".
>>
>> 2) Change preferences for that user to:
>> dspam_admin change preference "SpamHitRate" "dailyQuarantineSummary" "off"
>> dspam_admin change preference "SpamHitRate" "enableBNR" "on"
>> dspam_admin change preference "SpamHitRate" "enableWhitelist" "off"
>> dspam_admin change preference "SpamHitRate" "fallbackDomain" "off"
>> dspam_admin change preference "SpamHitRate" "ignoreGroups" "on"
>> dspam_admin change preference "SpamHitRate" "ignoreRBLLookups" "on"
>> dspam_admin change preference "SpamHitRate" "makeCorpus" "off"
>> dspam_admin change preference "SpamHitRate" "optIn" "on"
>> dspam_admin change preference "SpamHitRate" "optOut" "off"
>> dspam_admin change preference "SpamHitRate" "optOutClamAV" "on"
>> dspam_admin change preference "SpamHitRate" "processorBias" "off"
>> dspam_admin change preference "SpamHitRate" "showFactors" "off"
>> dspam_admin change preference "SpamHitRate" "signatureLocation" "headers"
>> dspam_admin change preference "SpamHitRate" "spamAction" "deliver"
>> dspam_admin change preference "SpamHitRate" "spamSubject" ""
>> dspam_admin change preference "SpamHitRate" "statisticalSedation" "0"
>> dspam_admin change preference "SpamHitRate" "storeFragments" "off"
>> dspam_admin change preference "SpamHitRate" "tagNonspam" "off"
>> dspam_admin change preference "SpamHitRate" "tagSpam" "off"
>> dspam_admin change preference "SpamHitRate" "trainingMode" "TOE"
>> dspam_admin change preference "SpamHitRate" "trainPristine" "off"
>> dspam_admin change preference "SpamHitRate" "whitelistThreshold" "9999999"
>>
>> Basically you want that user to not use ClamAV, nor any groups, nor any RBL,
>> nor do you want whitelisting or any other mumbo jumbo. Usually you would not
>> turn off that many helper mechanisms on a normal user, but this is not a
>> normal user. You want that user to be as hard as possible. You don't care
>> about false positives or false negatives on that user. In fact, this is
>> exactly what you want: you want that user to generate as many false
>> positives / negatives as needed, because the more FP/FN you have, the more
>> you can make DSPAM learn. And this is what you are mainly going to do with
>> that user: you are going to use dspam_train with Spam/Ham corpora.
>>
>> 3) Now go on and train with dspam_train:
>> dspam_train SpamHitRate [spam_corpus maildir or mbox] [nonspam_corpus maildir or mbox]
>
> Would something like
>
> /usr/local/bin/dspam --client --mode=toe --source=inoculation --class=spam \
> --user SpamHitRate --deliver=summary < msgs
>
> work just as well?

It would work, but not 'just as well' as normal corpus-feeding.
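For reference, steps 2 and 3 of the quoted recipe (the preference changes followed by a dspam_train run) could be wrapped in one small shell script. This is only a minimal, hypothetical sketch: the function names set_prefs and train_user and the DSPAM_ADMIN / DSPAM_TRAIN overrides are made up for illustration. Only the "dspam_admin change preference" and "dspam_train USER SPAM NONSPAM" invocations come from the recipe itself.

```shell
#!/bin/sh
# Hypothetical wrapper for steps 2 and 3 of the recipe quoted above.
# DSPAM_ADMIN and DSPAM_TRAIN are illustrative overrides (set them to
# "echo" for a dry run); by default the real tools on $PATH are used.

# The step-2 preferences, one "name value" pair per line.
# spamSubject has no value on purpose: it is set to the empty string.
TRAINER_PREFS='dailyQuarantineSummary off
enableBNR on
enableWhitelist off
fallbackDomain off
ignoreGroups on
ignoreRBLLookups on
makeCorpus off
optIn on
optOut off
optOutClamAV on
processorBias off
showFactors off
signatureLocation headers
spamAction deliver
spamSubject
statisticalSedation 0
storeFragments off
tagNonspam off
tagSpam off
trainingMode TOE
trainPristine off
whitelistThreshold 9999999'

# Apply every preference to the given user; stop at the first failure.
set_prefs() {
  user=$1
  while read -r key value; do
    [ -n "$key" ] || continue
    # </dev/null keeps the tool from consuming the here-document below.
    "${DSPAM_ADMIN:-dspam_admin}" change preference "$user" "$key" "$value" \
      </dev/null || return 1
  done <<EOF
$TRAINER_PREFS
EOF
}

# Step 2 followed by step 3: dspam_train USER SPAM_CORPUS NONSPAM_CORPUS.
train_user() {
  user=$1 spam_corpus=$2 ham_corpus=$3
  set_prefs "$user" || return 1
  "${DSPAM_TRAIN:-dspam_train}" "$user" "$spam_corpus" "$ham_corpus"
}
```

Setting DSPAM_ADMIN=echo and DSPAM_TRAIN=echo before calling train_user prints the 22 preference commands and the final dspam_train call instead of running them, which is a cheap way to check the list before touching a real installation.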
> I'm curious to learn the difference between source=corpus and
> source=inoculation.

The difference is well documented.

> If I'm setting up a new dspam system and have messages that are nothing
> but 100% spam... would inoculation be better?

From my past experience the answer is NO! Inoculation is a very intensive way of training. While it might sound like a good fit for normal corpus training, it is not in real life. Try it. It should not take you much time to train once with inoculation and then look at how well it scores in production. If it is not okay, you can at any time drop the data, restart from the beginning and use normal corpus-feeding. Inoculation has a reason to be in DSPAM, but for day-to-day training it is not the right way.

But as usual... don't take my word for it. Try it yourself. Be curious and try different approaches. It might be that inoculation does indeed deliver better results for you. I can't say with 100% confidence that it will not.

>
> Thank you,
> Chad
>

-- 
Kind Regards from Switzerland,

Stevan Bajić

_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
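To follow the "try it yourself" advice above, the same Maildir can be fed once with --source=inoculation and, after dropping the data, once with --source=corpus. The sketch below is hypothetical: feed_maildir and the DSPAM override are made-up names, it assumes a Maildir layout (cur/ and new/), and it runs dspam locally (no --client) with one message per invocation, using the --user, --class, --source and --mode options that appear in Chad's command.

```shell
#!/bin/sh
# Hypothetical helper: feed each message of a Maildir to dspam, one message
# per run, with either --source=corpus or --source=inoculation.
# DSPAM is an illustrative override (e.g. DSPAM=echo for a dry run).

feed_maildir() {
  user=$1 class=$2 source=$3 dir=$4

  # Accept only the classes dspam knows and the two sources being compared.
  case $class in
    spam|innocent) ;;
    *) echo "class must be spam or innocent" >&2; return 2 ;;
  esac
  case $source in
    corpus|inoculation) ;;
    *) echo "source must be corpus or inoculation" >&2; return 2 ;;
  esac

  count=0
  for msg in "$dir"/cur/* "$dir"/new/*; do
    [ -f "$msg" ] || continue   # skip an unexpanded glob in an empty folder
    "${DSPAM:-dspam}" --user "$user" --class="$class" \
      --source="$source" --mode=toe <"$msg" || return 1
    count=$((count + 1))
  done
  # Summary goes to stderr so stdout only carries dspam's own output.
  echo "fed $count message(s) as $class with --source=$source" >&2
}
```

For example, feed_maildir SpamHitRate spam inoculation followed by a (hypothetical) spam folder path trains with inoculation; the same call with corpus gives the normal corpus-feeding run to compare against. The validation rejects misspellings such as "innoculation" instead of passing them on to dspam.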