Stevan Bajić <ste...@bajic.ch> wrote:
> On Tue, 05 Apr 2011 18:08:12 +0200, Elias Oltmanns wrote:
>
>> Kenneth Marshall wrote:
[...]
>>> Hi Elias, Stevan already sent you the correct query to look at the
>>> whitelist tokens. The tokens are valuable for performance on
>>> correspondence from "known" senders. Personally, I would not bother
>>> with migrating them and just have them be reset as they get
>>> processed in the new DB.
>> Well, if I understand correctly, emails from "known senders" will
>> still be trained as ham and thus ensure innocent hits on "the right
>> tokens".
>>
> Not if you use something like TOE which does not automatically learn
> like TEFT or TUM.

Yes, I'm aware of that.

>
>
>> Since I have always used dspam as a low maintenance system in a
>> rather strict sense (no corpus feeding and such like), I think I'll
>> opt for keeping all the old tokens, switching back to teft for a
>> while and letting the expiration mechanism do its job. Unless I have
>> overlooked something, this should eventually produce pretty much the
>> same result as if I had started with an empty database
>>
> From a strict mathematical viewpoint the result will not be the same.

Right, you asked for it ;-). So, here I go again: What is the difference
(from a mathematical viewpoint) then?

As far as I can gather from what you said and from the documentation,
none of the old CHAINED tokens will be matched when OSB probes the
database during classification; the only exception being, of course,
the whitelist tokens. So, if I switch back to TEFT mode, I expect all
CHAINED tokens to disappear after two weeks, while the database fills
up with OSB tokens (pardon my sloppy terminology). Some whitelist
entries might disappear too if I don't get emails from the respective
senders in that period of time, but if I had started with an empty
database, those entries wouldn't be there either.

So, the only real difference is that some emails that might have been
classified as spam if I had started with an empty database may now be
classified, and accordingly trained, as ham because they come from a
known sender, which, in all likelihood, will be desirable.

Have I missed something there? I'm far from being an expert on
statistics but always appreciate a bit of mathematics, so, fire away.

Regards,

Elias
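
The expiry argument above can be made concrete with a small toy simulation. This is only a sketch under stated assumptions, not DSPAM's actual purge logic: it assumes a single rule that drops any token not hit for 14 days (the "two weeks" mentioned above), and the `Token` class, the `simulate` function and the token key formats are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass

EXPIRY_DAYS = 14  # assumption: the "two weeks" mentioned in the message


@dataclass
class Token:
    kind: str      # "chained", "osb" or "whitelist" (illustrative labels)
    last_hit: int  # day number on which the token was last matched


def simulate(db, daily_hits, days, expiry_days=EXPIRY_DAYS):
    """Return a copy of db after `days` days of hits and daily purging.

    daily_hits maps a day number to a list of (key, kind) pairs that
    were matched, or newly trained, on that day.
    """
    db = {key: Token(tok.kind, tok.last_hit) for key, tok in db.items()}
    for day in range(1, days + 1):
        for key, kind in daily_hits.get(day, []):
            if key in db:
                db[key].last_hit = day      # existing token was hit again
            else:
                db[key] = Token(kind, day)  # new token learned today
        # Hypothetical purge rule: drop tokens idle for expiry_days or more.
        db = {k: t for k, t in db.items() if day - t.last_hit < expiry_days}
    return db


# Old database at the moment of the switch (day 0). Key formats are made up.
old_db = {
    "cheap+pills": Token("chained", 0),
    "from*alice@example.org": Token("whitelist", 0),
    "from*bob@example.org": Token("whitelist", 0),
}

# After the switch only OSB-style keys and whitelist keys are ever probed.
hits = {
    3: [("cheap+#+pills", "osb")],
    10: [("from*alice@example.org", "whitelist"), ("hello+#+world", "osb")],
}

result = simulate(old_db, hits, days=15)
print(sorted(result))
# → ['cheap+#+pills', 'from*alice@example.org', 'hello+#+world']
```

Under these assumptions the old CHAINED token and the whitelist entry for the silent sender both expire, while the whitelist entry for the sender who wrote in the meantime survives. That surviving entry is the one point where the outcome differs from an empty start.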
_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user