Sidney Markowitz wrote:

Nick Leverton said that papers he has seen found that learn on error
always works better than learn on everything. But I recall one that
looked more carefully at longer term results and found that learn on
error degrades over time. They found it best to retrain on fresh data
every few months. (I don't have the reference handy).

For this research I'd like to set aside the comparisons between train on error and train on everything. I'm proposing a method for updating the probability tables, not a rule for deciding which entries to train on. Adding different training regimes would only complicate things and confound the results.

That makes sense if you consider that spam (and possibly ham) patterns
change over time, even more so to the degree that spam patterns are
actively adapting to try to beat spam filters.

If the filter can't respond to changes in the input, then it's lacking plasticity.

What I haven't seen discussed is the effect of token expiration as is
done in SpamAssassin. Wouldn't that produce the same effect as periodic
retraining, thereby allowing learn on everything to work well? Doesn't
that prevent the problems of converging to a mean and slowing down the
learning? How does the effect of token expiration compare to the use of
back-propagation?

Periodic retraining requires you to save and maintain your corpus. Most users don't do that.

Token expiration can't cope with changes in word usage over time. Suppose spammers change the phrase "click here" to "press here." But you once wrote a lot of e-mails looking for parts for your broken "drill press," so "press" is still in active use and never expires, and your filter has already converged on a mean for it. Moving it to the new mean would require a very large number of inputs.
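To make the converging-to-a-mean point concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a token's spam probability is the plain ratio of spam occurrences to total occurrences (real filters use more elaborate formulas), and `TokenEstimate` and `spam_needed_to_reach` are hypothetical names. After a token has been seen N times in ham, the number of spam occurrences needed to push it past a fixed threshold grows in proportion to N.

```python
class TokenEstimate:
    """Hypothetical per-token estimate: P(spam | token) as a simple frequency ratio."""

    def __init__(self) -> None:
        self.spam = 0
        self.ham = 0

    def observe(self, is_spam: bool) -> None:
        # Learn on everything: every occurrence updates the counts.
        if is_spam:
            self.spam += 1
        else:
            self.ham += 1

    def probability(self) -> float:
        total = self.spam + self.ham
        # Unseen tokens are treated as neutral (0.5), an assumption for this sketch.
        return 0.5 if total == 0 else self.spam / total


def spam_needed_to_reach(target: float, ham_seen: int) -> int:
    """Count the spam occurrences needed to move a token, already seen
    `ham_seen` times in ham, up to the `target` spam probability."""
    est = TokenEstimate()
    for _ in range(ham_seen):
        est.observe(False)
    needed = 0
    while est.probability() < target:
        est.observe(True)
        needed += 1
    return needed
```

With 100 prior ham sightings of a token like "press", reaching 0.9 takes 900 spam sightings; with 10 prior sightings it takes 90. The more history the estimate has accumulated, the more slowly it moves.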

Something similar to expiration can be done with back-propagation by adding a second, weight-decay term to the error function. Since we'd want unused terms to converge to a probability of 0.5, we would make the terms decay toward 0.5 rather than toward zero.
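A minimal Python sketch of that idea, assuming a squared-error loss per token and plain gradient descent; `train_step`, `lr` and `decay` are hypothetical names, and this is not necessarily the update rule the proposed method uses. The assumed error function is E = 1/2 * sum over tokens t in the message of (y - p_t)^2 + decay/2 * sum over all tokens of (p_t - 0.5)^2, so every step nudges all tokens toward 0.5 while tokens in the current message are also pulled toward the label y.

```python
from typing import Dict, Iterable


def train_step(probs: Dict[str, float], tokens: Iterable[str], is_spam: bool,
               lr: float = 0.1, decay: float = 0.01) -> None:
    """One gradient-descent step (hypothetical) on
    E = 1/2 * sum_{t in message} (y - p_t)^2 + decay/2 * sum_{all t} (p_t - 0.5)^2."""
    y = 1.0 if is_spam else 0.0
    present = set(tokens)
    # New tokens start at the neutral probability 0.5.
    for t in present:
        probs.setdefault(t, 0.5)
    for t in list(probs):
        # Weight-decay term: pulls every token toward 0.5, not toward zero.
        grad = decay * (probs[t] - 0.5)
        if t in present:
            # Squared-error term, only for tokens that appear in this message.
            grad += probs[t] - y
        # Clamp so the entry stays a valid probability.
        probs[t] = min(1.0, max(0.0, probs[t] - lr * grad))
```

Because the decay term acts on every token at every step, a token that stops appearing drifts back toward 0.5 (here by a factor of 1 - lr * decay = 0.999 per step). This is similar in spirit to expiration, which simply deletes the token. Tokens still in use, like "press" above, keep being pulled toward whatever their recent labels say, because the fixed learning rate never shrinks.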

I don't want to distract myself too much from my thesis (it's almost
done!), so this conversation will have to wait a little while.  Keep
thinking about it, though!

Henry
