Stevan Bajić <ste...@bajic.ch> wrote:
>  On Tue, 05 Apr 2011 18:08:12 +0200, Elias Oltmanns wrote:
>
>> Kenneth Marshall wrote:
[...]
>>> Hi Elias, Stevan already sent you the correct query to look at the
>>> whitelist tokens. The tokens are valuable for performance on
>>> correspondence from "known" senders. Personally, I would not bother
>>> with migrating them and just have them be reset as they get
>>> processed in the new DB.
>> Well, if I understand correctly, emails from "known senders" will
>> still be trained as ham and thus ensure innocent hits on "the right
>> tokens".
>>
>  Not if you use something like TOE, which does not learn automatically
>  the way TEFT or TUM do.

Yes, I'm aware of that.

>
>
>> Since I have always used dspam as a low maintenance system in a
>> rather strict sense (no corpus feeding and suchlike), I think I'll
>> opt for keeping all the old tokens, switching back to teft for a
>> while and letting the expiration mechanism do its job. Unless I have
>> overlooked something, this should eventually produce pretty much the
>> same result as if I had started with an empty database.
>>
>  From a strict mathematical viewpoint the result will not be the same.

Right, you asked for it ;-). So, here I go again:
What is the difference (from a mathematical viewpoint) then? As far as I
can gather from what you said and from the documentation, none of the
old CHAINED tokens will be matched when OSB probes the database during
classification; the only exception being, of course, the whitelist
tokens. So, if I switch back to teft mode, I expect all CHAINED tokens
to disappear after two weeks, while the database fills up with OSB
tokens (pardon my sloppy terminology). Some whitelist entries might
disappear too if I don't get emails from the respective senders in that
period of time, but if I had started with an empty database, those
entries wouldn't be there either. So, the difference really is that some
emails that might have been classified as spam if I had started with an
empty database may now be classified, and accordingly trained, as ham
because they come from a known sender, which, in all likelihood, will be
desirable.
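
To make the argument above concrete, here is a toy simulation of the
expiry reasoning. It is only a sketch under stated assumptions, not how
dspam is implemented: the 14-day window is the "two weeks" mentioned
above, the token names and the simulate() helper are made up for
illustration, and it simply assumes that old CHAINED tokens are never
hit again once OSB is in use.

```python
from dataclasses import dataclass

EXPIRY_DAYS = 14  # the "two weeks" assumed above, not a verified dspam default


@dataclass
class Token:
    kind: str      # "chained", "osb" or "whitelist" (labels for this sketch only)
    last_hit: int  # day on which the token was last matched or trained


def simulate(days, known_senders, active_senders):
    """Toy model of a migrated database under TEFT-style training.

    Assumption: CHAINED tokens are never matched by OSB lookups, so
    their last_hit stays at day 0 until they expire.
    """
    db = {}
    # Day 0: the migrated database holds old CHAINED and whitelist tokens.
    for i in range(100):
        db[f"chained-{i}"] = Token("chained", 0)
    for sender in known_senders:
        db[f"from:{sender}"] = Token("whitelist", 0)

    for day in range(1, days + 1):
        # Every processed message adds or refreshes some OSB tokens.
        for i in range(10):
            db[f"osb-{day % 5}-{i}"] = Token("osb", day)
        # Whitelist tokens are refreshed only when the sender writes again.
        for sender in active_senders:
            key = f"from:{sender}"
            if key in db:
                db[key].last_hit = day
        # Purge every token that was not hit within the expiry window.
        db = {k: t for k, t in db.items() if day - t.last_hit < EXPIRY_DAYS}
    return db


if __name__ == "__main__":
    db = simulate(14, ["alice@example.org", "bob@example.org"],
                  ["alice@example.org"])
    for kind in ("chained", "osb", "whitelist"):
        print(kind, sum(t.kind == kind for t in db.values()))
```

In this model the only thing that survives from the old database is the
whitelist entry of a sender who kept writing, which is exactly the
difference to an empty start described above.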

Have I missed something there? I'm far from being an expert on
statistics but always appreciate a bit of mathematics, so, fire away.

Regards,

Elias


_______________________________________________
Dspam-user mailing list
Dspam-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-user
