On Mon, Apr 12, 2010 at 09:18:43PM +0200, Stevan Bajić wrote:
> On Sat, 10 Apr 2010 17:59:25 +0800
> Michael Alger <ds...@mm.quex.org> wrote:
> > On Fri, Apr 09, 2010 at 11:23:16PM -0700, Terry Barnum wrote:
> > >>> I've been running DSPAM for approximately 2 weeks and looking
> > >>> at the output of dspam_stats, I'm curious how long training
> > >>> normally takes.
> > >>>
> > >>> $ cat /usr/local/dspam.conf | grep -v ^# | grep -v ^$
> > >>>
> > >>> TrainingMode toe
> > >>> Preference "trainingMode=TOE"
> >
> > Your default settings are TOE mode. Are you overriding this for any
> > of the users in their preferences? If not, this would explain why
> > it's only learning from errors: because you told it to.
> >
> > Try switching this to TUM or TEFT.
> >
> I think most users here don't understand what training is in the
> context of Anti-Spam. So I am going to try to explain quickly what
> all those different training modes are.
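(Side note for the original poster: as I understand it, switching the
default is just a matter of changing the two lines quoted above in
dspam.conf, along these lines, assuming tum/TUM is accepted the same
way toe/TOE is:

    TrainingMode tum
    Preference "trainingMode=TUM"

with individual users still able to override trainingMode in their own
preferences, as mentioned above.)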
Thank you for this explanation. After a quick test I see that the TL
counter does decrement (and TN increments) when I process mail using
TOE. If I set it to NOTRAIN, then none of the statistics are updated
when the message is processed.

However, I don't understand why simply classifying a message using TOE
decrements the Training Left counter. My understanding is that token
statistics are only updated when retraining a misclassified message;
classifying a message shouldn't cause any changes here, and thus
logically shouldn't be construed as "training" the system. Is this
done purely so that statistical sedation is deactivated in TOE mode
after 2,500 messages have been processed, or are there other reasons?

> TUM is exactly like TEFT. He takes the test, and after the test he
> also buys a book (+/- 100 pages) about the tested topic and
> reads/learns it. But as soon as he has successfully passed 2'500
> tests he changes his strategy and stops buying books after he has
> passed a test. He only buys and reads/learns a book if he has
> failed a test.

Does TUM base its decision to learn purely on the value of the TL
counter (i.e. does it stop learning once that reaches 0), or is the TL
just a hint, with TUM actually using some heuristic based on the
number of tokens available to it and their scores? Is TL used by
anything other than the statistical sedation feature?

> TOE is totally different from the above 3. He takes a test, and if
> he fails it he goes on and buys a book (+/- 100 pages) about the
> tested topic and reads/learns it. He does that forever; every test
> he takes, he does the same. If he passes the test he does not buy
> the book and does not read those +/- 100 pages. He has passed the
> test and he knows it, so there is no need for him to invest time in
> reading 100 pages for nothing. He is already knowledgeable in the
> topic he was tested on (remember: he passed the test).

I think saying "TOE is totally different from {NOTRAIN, TEFT, TUM}" is
a little strong. It seems to me that TEFT and TOE are quite different,
while TUM is a combination of the two: TEFT until it has enough data,
and then TOE. Or have I misunderstood?
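To make the question concrete, here is roughly how I picture the
decision, written as a toy function. This is only my reading of
Stevan's analogy, not a claim about what libdspam actually does; the
function name, the arguments and the hard-coded 2500 threshold are my
own inventions for illustration:

    # A sketch of my mental model only, not DSPAM's actual code.  The
    # name, arguments and fixed 2500 threshold are made up for
    # illustration.
    def should_train(mode, classified_correctly, messages_learned):
        """Decide whether a just-classified message gets learned."""
        if mode == "NOTRAIN":
            return False                      # never touch token data
        if mode == "TEFT":
            return True                       # learn from every message
        if mode == "TOE":
            return not classified_correctly   # learn from errors only
        if mode == "TUM":
            if messages_learned < 2500:
                return True                   # immature: behave like TEFT
            return not classified_correctly   # mature: behave like TOE
        raise ValueError("unknown training mode: " + mode)

If that matches reality, then for example should_train("TUM", True,
3000) returning False is exactly the TEFT-until-mature, then TOE
behaviour I described above. If the real decision involves more than a
simple counter, that is what I am hoping someone can clarify.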