Sidney Markowitz wrote:
Nick Leverton said that papers he has seen found that learn on error always works better than learn everything. But I recall one that looked more carefully at longer term results and found that learn on error degrades over time. They found it best to retrain on fresh data every few months. (I don't have the reference handy).
I'd like to ignore those train on error and train on everything comparisons for this research. I'm proposing a method of updating the probability tables, not deciding on which entries to train. Adding in the different training patterns would only serve to complicate things and confuse the results.
That makes sense if you consider that spam (and possibly ham) patterns change over time, even more so to the degree that spam patterns are actively adapting to try to beat spam filters.
If the filter can't respond to changes in the input, then it's lacking plasticity.
What I haven't seen discussed is the effect of token expiration as is done in SpamAssassin. Wouldn't that produce the same effect as periodic retraining, thereby allowing learn on everything to work
This method isn't able to cope with changes in word usage over time. Say spammers change the phrase "click here" to "press here," but you once wrote a lot of e-mails looking for parts for your broken "drill press." Your filter has already converged on a mean for "press," and moving it to the new mean would require a very large number of inputs.
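To put rough numbers on that, here is a minimal Python sketch. It assumes a simplified filter that estimates a token's spam probability from raw counts alone; the function names and the counts are made up for illustration and are not taken from any real filter.

```python
def spam_probability(spam_count, ham_count):
    """Raw-count estimate of P(spam | token); a simplification of what real filters compute."""
    return spam_count / (spam_count + ham_count)


def spam_updates_needed(spam_count, ham_count, target=0.5):
    """Count how many new spam sightings are needed before the estimate reaches target."""
    extra = 0
    while spam_probability(spam_count + extra, ham_count) < target:
        extra += 1
    return extra


# Hypothetical histories for the token "press", mostly ham from the drill press e-mails.
print(spam_updates_needed(spam_count=1, ham_count=10))     # short history
print(spam_updates_needed(spam_count=10, ham_count=1000))  # long history
```

With these invented counts, the short history needs 9 new spam sightings to reach 0.5 and the long one needs 990. The longer the history behind the old mean, the more inputs it takes to move it.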
well? Doesn't that prevent the problems of converging to a mean and slowing down the learning? How does the effect of token
Periodic retraining requires you to save and maintain your corpus. Most users don't do that.
expiration compare to the use of back-propagation?
Something similar to expiration can be done with back-propagation by adding a second term to the error function for weight decay. We'd want unused terms to converge to a 0.5 probability, so we would make the terms decay toward that value.
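The decay formula itself does not appear in this copy of the message. As a stand-in, the following Python sketch shows one common way such a term could look. It assumes each table entry is a probability p updated by gradient descent, and that the extra error term is (decay / 2) * (p - 0.5)^2 summed over all tokens. The function name, the table layout, and the parameter values are hypothetical and not necessarily what Henry had in mind.

```python
def update_with_decay(probabilities, data_gradient, learning_rate=0.1, decay=0.5):
    """One gradient step on (data error) + (decay / 2) * sum((p - 0.5) ** 2).

    probabilities: token -> current probability (hypothetical table layout).
    data_gradient: token -> gradient of the data error; tokens absent from
    the current message are simply missing and get only the decay term.
    """
    updated = {}
    for token, p in probabilities.items():
        gradient = data_gradient.get(token, 0.0) + decay * (p - 0.5)
        updated[token] = p - learning_rate * gradient
    return updated


table = {"press": 0.9, "drill": 0.1}
for _ in range(200):
    table = update_with_decay(table, data_gradient={})  # neither token seen again
print(table)  # both entries end up very close to 0.5
```

Tokens that keep appearing still receive their data gradient, so the decay only dominates for entries that have gone unused. The sketch does no clipping, so a real implementation would also need to keep the values inside [0, 1].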
I don't want to distract myself too much from my thesis (it's almost done!), so this conversation will have to wait a little while. Keep thinking about it, though!
Henry
