On Mon, Apr 12, 2010 at 09:18:43PM +0200, Stevan Bajić wrote:
> On Sat, 10 Apr 2010 17:59:25 +0800
> Michael Alger <ds...@mm.quex.org> wrote:
> > On Fri, Apr 09, 2010 at 11:23:16PM -0700, Terry Barnum wrote:
> > >>> I've been running DSPAM for approximately 2 weeks and looking
> > >>> at the output of dspam_stats, I'm curious how long training
> > >>> normally takes.
> > >>>
> > >>> $ cat /usr/local/dspam.conf | grep -v ^# | grep -v ^$
> > >>>
> > >>> TrainingMode toe
> > >>> Preference "trainingMode=TOE"
> >
> > Your default settings are TOE mode. Are you overriding this for any
> > of the users in their preferences? If not, this would explain why
> > it's only learning from errors: because you told it to.
> >
> > Try switching this to TUM or TEFT.
> >
> I think most users here don't understand what training is in the
> context of Anti-Spam. So I am going to try to explain quickly what
> all those different training modes are.
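(Side note for the original poster: as I understand it, switching the
default is just a matter of changing the two lines quoted above in
dspam.conf, along these lines, assuming tum/TUM is accepted the same
way toe/TOE is:

    TrainingMode tum
    Preference "trainingMode=TUM"

with individual users still able to override trainingMode in their own
preferences, as mentioned above.)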
Thank you for this explanation. After a quick test I see that the TL
counter does decrement (and TN increments) when I process mail using
TOE. If I set it to NOTRAIN, then none of the statistics are updated
when the message is processed.

However, I don't understand why simply classifying a message using TOE
decrements the Training Left counter. My understanding is that token
statistics are only updated when retraining a misclassified message;
classifying a message shouldn't cause any changes here, and thus
logically shouldn't be construed as "training" the system. Is this
done purely so that statistical sedation is deactivated in TOE mode
after 2,500 messages have been processed, or are there other reasons?

> TUM is exactly like TEFT. He takes the test, and after the test he
> also buys a book (+/- 100 pages) about the tested topic and
> reads/learns it. But as soon as he has successfully passed 2'500
> tests he changes his strategy and stops buying books after he has
> passed a test. He only buys and reads/learns a book if he has
> failed a test.

Does TUM base its decision to learn purely on the value of the TL
counter (i.e. does it stop learning once that reaches 0), or is the TL
just a hint, with TUM actually using some heuristic based on the
number of tokens available to it and their scores? Is TL used by
anything other than the statistical sedation feature?

> TOE is totally different from the above 3. He takes a test, and if
> he fails it he goes on and buys a book (+/- 100 pages) about the
> tested topic and reads/learns it. He does that forever; every test
> he takes, he does the same. If he passes the test he does not buy
> the book and does not read those +/- 100 pages. He has passed the
> test and he knows it, so there is no need for him to invest time in
> reading 100 pages for nothing. He is already knowledgeable in the
> topic he was tested on (remember: he passed the test).

I think saying "TOE is totally different from {NOTRAIN, TEFT, TUM}" is
a little strong. It seems to me that TEFT and TOE are quite different,
while TUM is a combination of the two: TEFT until it has enough data,
and then TOE. Or have I misunderstood?
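To make the question concrete, here is roughly how I picture the
decision, written as a toy function. This is only my reading of
Stevan's analogy, not a claim about what libdspam actually does; the
function name, the arguments and the hard-coded 2500 threshold are my
own inventions for illustration:

    # A sketch of my mental model only, not DSPAM's actual code.  The
    # name, arguments and fixed 2500 threshold are made up for
    # illustration.
    def should_train(mode, classified_correctly, messages_learned):
        """Decide whether a just-classified message gets learned."""
        if mode == "NOTRAIN":
            return False                      # never touch token data
        if mode == "TEFT":
            return True                       # learn from every message
        if mode == "TOE":
            return not classified_correctly   # learn from errors only
        if mode == "TUM":
            if messages_learned < 2500:
                return True                   # immature: behave like TEFT
            return not classified_correctly   # mature: behave like TOE
        raise ValueError("unknown training mode: " + mode)

If that matches reality, then for example should_train("TUM", True,
3000) returning False is exactly the TEFT-until-mature, then TOE
behaviour I described above. If the real decision involves more than a
simple counter, that is what I am hoping someone can clarify.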