On Tue, 05 Apr 2011 23:28:01 +0200
Elias Oltmanns <e...@nebensachen.de> wrote:

> Stevan Bajić <ste...@bajic.ch> wrote:
> >  On Tue, 05 Apr 2011 18:08:12 +0200, Elias Oltmanns wrote:
> >
> >> Kenneth Marshall wrote:
> [...]
> >>> Hi Elias, Stevan already sent you the correct query to look at the
> >>> whitelist tokens. The tokens are valuable for performance on
> >>> correspondence from "known" senders. Personally, I would not bother
> >>> with migrating them and just have them be reset as they get processed
> >>> in the new DB.
> >> Well, if I understand correctly, emails from "known senders" will still
> >> be trained as ham and thus ensure innocent hits on "the right tokens".
> >>
> >  Not if you use something like TOE, which does not learn automatically
> >  the way TEFT or TUM do.
> 
> Yes, I'm aware of that.
> 
> >
> >
> >> Since I have always used dspam as a low maintenance system in a rather
> >> strict sense (no corpus feeding and such like), I think I'll opt for
> >> keeping all the old tokens, switching back to teft for a while and
> >> letting the expiration mechanism do its job. Unless I have overlooked
> >> something, this should eventually produce pretty much the same result as
> >> if I had started with an empty database
> >>
> >  From a strict mathematical viewpoint the result will not be the same.
> 
> Right, you asked for it ;-). So, here I go again:
> What is the difference (from a mathematical viewpoint) then? As far as I
> can gather from what you said and from the documentation, none of the
> old CHAINED tokens will be matched when OSB probes the database during
> classification; the only exception being, of course, the whitelist
> tokens. So, if I switch back to teft mode, I expect all CHAINED tokens
> to disappear after two weeks, while the database fills up with OSB
> tokens (pardon my sloppy terminology). Some whitelist entries might
> disappear too if I don't get emails from the respective senders in that
> period of time, but if I had started with an empty database, those
> entries wouldn't be there either. So, the difference really is that some
> emails, that might have been classified as spam if I had started with an
> empty database, may now be classified and accordingly trained as ham
> because they come from a known sender, which, in all likelihood, will be
> desirable.
> 
> Have I missed something there? I'm far from being an expert on
> statistics but always appreciate a bit of mathematics, so, fire away.
> 
The major point you are missing is the TN, TP, FP and FN counters. Not
resetting them leads to a different result.

This is just one part. Of course the result is mathematically not the same
when you compare starting from scratch with continuing to use older tokens
(incl. whitelist tokens).
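
To make the point about the counters concrete, here is a minimal sketch,
not dspam's actual code. It assumes a Graham-style token probability that
normalizes each token's hit counts by the total number of spam and ham
messages learned so far. The function name and all numbers are made up
for illustration; how exactly dspam feeds its counters into the
calculation is an assumption here.

```python
def token_spam_probability(spam_hits, ham_hits, total_spam, total_ham):
    """Graham-style spam probability for a single token (illustrative only).

    spam_hits / ham_hits: how often the token was seen in spam / ham.
    total_spam / total_ham: user-wide message totals (assumed to come from
    the per-user counters that are not reset when old data is kept).
    """
    if total_spam == 0 or total_ham == 0:
        return 0.5  # no basis for an estimate yet, stay neutral
    spam_ratio = spam_hits / total_spam
    ham_ratio = ham_hits / total_ham
    if spam_ratio + ham_ratio == 0:
        return 0.5  # token never seen, stay neutral
    return spam_ratio / (spam_ratio + ham_ratio)


# The same new OSB token, seen in 3 spam and 1 ham messages, scored twice:
# once with totals that started from zero, once with totals carried over
# from the old database (hypothetical values).
fresh = token_spam_probability(3, 1, total_spam=10, total_ham=10)
carried = token_spam_probability(3, 1, total_spam=1000, total_ham=200)

print(f"fresh totals:   {fresh:.3f}")
print(f"carried totals: {carried:.3f}")
```

With identical per-token hits, the two calls give different probabilities
only because the totals differ. Under this assumed formula, that is the
sense in which keeping the old counters changes the result even after all
CHAINED tokens have expired.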


> Regards,
> 
> Elias
> 
> 

_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
