After comparing autolearn to Hebbian learning in my thesis, I got to thinking: the "count them up" method of learning probabilities is very good for building an initial classifier, but the probabilities tend to converge to a mean, since each new message carries less and less relative weight as the counts grow. This means that it becomes geometrically slower over time for a personalised text classifier to adapt to a major change.
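To make that concrete, here is a tiny illustrative sketch (not code from any existing filter; the function name is mine) showing how a counted estimate of P(S|w) for a single word moves less and less as the counts grow:

# Illustrative only: count-based estimation of P(S|w) for one word.
# Once the counts are large, one additional message barely moves it.

def p_spam_given_word(spam_count: int, ham_count: int) -> float:
    """Plain 'count them up' estimate of P(S|w)."""
    return spam_count / (spam_count + ham_count)

# One new ham message containing the word:
print(p_spam_given_word(9, 1) - p_spam_given_word(9, 2))          # early: big shift
print(p_spam_given_word(900, 100) - p_spam_given_word(900, 101))  # later: tiny shift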
The probability table for the Bayes classifier can be trained using error backpropagation, the same way that I train the perceptron. The only difference is that it requires a different delta rule. I have derived it for the basic naive Bayes classifier:
Let a = P(S|w1), b = P(S|w2), c = P(S). Differentiating the combined score with respect to a:

d/da [a*b*c / (a*b*c + (1-a)*(1-b)*(1-c))]
  = b*c*(1-b)*(1-c) / (a*b*c + (1-a)*(1-b)*(1-c))^2
  = b*c*(b-1)*(c-1) / ((b-1)*(c-1) + a*(b+c-1))^2
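Here is a small Python sketch (names are mine, purely illustrative) of the two-word combination, the derivative above, and one delta-rule style update, with a finite-difference check of the gradient:

def combine(a: float, b: float, c: float) -> float:
    """Combined score a*b*c / (a*b*c + (1-a)*(1-b)*(1-c)) from the derivation above."""
    num = a * b * c
    den = num + (1 - a) * (1 - b) * (1 - c)
    return num / den

def d_combine_da(a: float, b: float, c: float) -> float:
    """d/da of combine(): b*c*(1-b)*(1-c) / (a*b*c + (1-a)*(1-b)*(1-c))^2."""
    den = a * b * c + (1 - a) * (1 - b) * (1 - c)
    return b * c * (1 - b) * (1 - c) / den ** 2

def delta_rule_step(a, b, c, target, lr=0.1):
    """One backprop step on squared error, updating only a (b is symmetric)."""
    error = target - combine(a, b, c)
    return a + lr * error * d_combine_da(a, b, c)

# Quick finite-difference check of the derivative:
a, b, c = 0.7, 0.6, 0.5
eps = 1e-6
numeric = (combine(a + eps, b, c) - combine(a - eps, b, c)) / (2 * eps)
print(abs(numeric - d_combine_da(a, b, c)) < 1e-6)  # True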
So, there are two ways that the training process could progress:
1. Bootstrap the probabilities with the old counting method. Iteratively train the classifier with new instances as they arrive.
2. Start with random values. Train the classifier with gradient descent using a bootstrap data set. Iteratively train the classifier with new instances as they arrive.
I prefer #2. Gradient descent with #1 will probably get stuck in local optima near the initialisation parameters.
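A rough sketch of what #2 might look like for a whole table of per-word probabilities (structure and names are mine, the prior P(S) is held fixed for simplicity, and the per-word gradient is the many-word generalisation of the derivative above):

import random

def sgd_pass(probs, messages, prior=0.5, lr=0.5, clip=1e-3):
    """One pass of delta-rule updates; label 1 = spam, 0 = ham."""
    for words, label in messages:
        ws = {w for w in words if w in probs}
        num, den = prior, 1.0 - prior
        for w in ws:
            num *= probs[w]
            den *= 1.0 - probs[w]
        out = num / (num + den)       # combined score for the message
        err = label - out
        for w in ws:
            p = probs[w]
            # partial derivative of 'out' w.r.t. p, generalised from the
            # two-word formula above
            grad = num * den / (p * (1.0 - p) * (num + den) ** 2)
            probs[w] = min(1.0 - clip, max(clip, p + lr * err * grad))
    return probs

vocab = ["free", "offer", "meeting", "lunch"]
bootstrap = [(["free", "offer"], 1), (["meeting", "lunch"], 0)]

probs = {w: random.uniform(0.3, 0.7) for w in vocab}   # start with random values
for _ in range(50):                                    # gradient descent on bootstrap set
    sgd_pass(probs, bootstrap)
sgd_pass(probs, [(["free", "lunch"], 0)])              # then train on new instances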
Any comments? Interest in co-authoring a research paper (*poke*, Vaishnavi and friends at UWash)? Has this been done before?
Henry