Nick Leverton said that the papers he has seen found that learn-on-error always works better than learn-everything. But I recall one that looked more carefully at longer-term results and found that learn-on-error degrades over time; its authors concluded it was best to retrain on fresh data every few months. (I don't have the reference handy.)
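
For clarity, here's a minimal sketch of the two regimes as I understand them, written in Python against a hypothetical filter object with train() and classify() methods (the names are illustrative, not any particular filter's API):

    def learn_everything(filter_, labelled_messages):
        # Train on every message, whether or not it was classified correctly.
        for msg, is_spam in labelled_messages:
            filter_.train(msg, is_spam)

    def learn_on_error(filter_, labelled_messages):
        # Train only when the filter gets a message wrong.
        for msg, is_spam in labelled_messages:
            if filter_.classify(msg) != is_spam:
                filter_.train(msg, is_spam)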

That makes sense if you consider that spam (and possibly ham) patterns change over time, all the more so because spammers actively adapt to try to beat spam filters.

BTW, at least one spam-learning filter I've seen reduces its memory requirements by using a small hash size (like 32 bits) for representing tokens. Such systems will show poorer results for learn-everything compared to learn-on-error simply because of collision effects once they learn too many tokens.
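
To put a rough number on the collision effect, here's a back-of-the-envelope birthday-problem estimate of how many tokens end up sharing a 32-bit hash slot as the database grows (the helper and the token counts are just an illustrative sketch, not measurements from any particular filter):

    def expected_collisions(n_tokens, hash_bits=32):
        # Expected number of tokens that land in a slot already occupied
        # by an earlier token: n - m * (1 - (1 - 1/m)**n).
        m = 2 ** hash_bits
        return n_tokens - m * (1 - (1 - 1 / m) ** n_tokens)

    for n in (100_000, 1_000_000, 10_000_000):
        print(f"{n:>10,} tokens -> ~{expected_collisions(n):,.0f} colliding tokens")

Since learn-everything grows the token count much faster than learn-on-error, and the collision count grows roughly with the square of the number of tokens learned, the small hash hurts learn-everything disproportionately.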

What I haven't seen discussed is the effect of token expiration as is done in SpamAssassin. Wouldn't that produce the same effect as periodic retraining, thereby allowing learn-everything to work well? Doesn't that prevent the problems of converging to a mean and slowing down the learning? How does the effect of token expiration compare to the use of back-propagation?
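
For concreteness, here is a generic sketch of age-based token expiry (my own toy illustration of the idea, not SpamAssassin's actual expiry algorithm; the class, field layout, and max_age_days default are all made up):

    import time

    class TokenStore:
        def __init__(self, max_age_days=120):
            self.max_age = max_age_days * 86400
            self.tokens = {}  # token -> (spam_count, ham_count, last_seen)

        def update(self, token, is_spam, now=None):
            now = now if now is not None else time.time()
            spam, ham, _ = self.tokens.get(token, (0, 0, now))
            self.tokens[token] = (spam + is_spam, ham + (not is_spam), now)

        def expire(self, now=None):
            # Drop tokens not seen within max_age, so stale patterns fade
            # out much as they would under periodic retraining.
            now = now if now is not None else time.time()
            self.tokens = {t: v for t, v in self.tokens.items()
                           if now - v[2] <= self.max_age}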

-- sidney
