After comparing autolearn to Hebbian learning in my thesis, I got to
thinking:  The "count them up" method of learning probabilities is very
good for building an initial classifier, but the estimates settle toward
a stable mean as the counts accumulate.  Each new example then carries
less and less weight, so a personalised text classifier becomes
geometrically slower over time at adapting to a major change.
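
For concreteness, a tiny sketch (the counts are invented) of why each
new message matters less once the totals behind an estimate are large:

# The same single counter-example moves a counted probability less and
# less as the totals behind it grow.
for spam, ham in [(9, 1), (90, 10), (900, 100)]:
    before = spam / (spam + ham)
    after = spam / (spam + ham + 1)   # one new ham message with this word
    print(f"{spam + ham:4d} counts: {before:.3f} -> {after:.3f}")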

The probability table for the Bayes classifier can be trained using
error backpropagation, the same way that I train the perceptron.  The
only difference is that it requires a different delta rule.  I have
derived it for the basic naive Bayes classifier:

Let a = P(S|w1), b = P(S|w2), c = P(S), and

P = a*b*c / (a*b*c + (1-a)*(1-b)*(1-c))

Then, by the quotient rule (the a*b*c terms cancel in the numerator),

dP/da = b*c*(1-b)*(1-c) / (a*b*c + (1-a)*(1-b)*(1-c))^2

and dP/db, dP/dc follow by symmetry.
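
For concreteness, here is a minimal sketch of what that delta rule could
look like in code, assuming a squared-error loss on the combined
probability and a fixed learning rate; those choices, and all the names,
are mine rather than part of the derivation:

def combine(a, b, c):
    """P(S|w1,w2) for a = P(S|w1), b = P(S|w2), c = P(S)."""
    num = a * b * c
    return num / (num + (1 - a) * (1 - b) * (1 - c))

def grad(a, b, c):
    """dP/da = b*c*(1-b)*(1-c) / denominator^2; P is symmetric in its
    arguments, so rotating them gives dP/db and dP/dc as well."""
    den = a * b * c + (1 - a) * (1 - b) * (1 - c)
    return b * c * (1 - b) * (1 - c) / den ** 2

def delta_step(a, b, c, target, lr=0.1, eps=1e-3):
    """One gradient-descent step on the squared error 0.5*(P - target)^2."""
    err = combine(a, b, c) - target        # dL/dP
    a_new = a - lr * err * grad(a, b, c)   # dL/da = dL/dP * dP/da
    b_new = b - lr * err * grad(b, a, c)   # same form, roles swapped
    c_new = c - lr * err * grad(c, a, b)
    clamp = lambda x: min(max(x, eps), 1 - eps)
    return clamp(a_new), clamp(b_new), clamp(c_new)

# e.g. delta_step(0.7, 0.4, 0.5, target=1.0) nudges all three towards spam

The clamp just keeps the table entries away from 0 and 1 so that later
updates can still move them.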

So, there are two ways that the training process could progress:

1.  Bootstrap the probabilities with the old counting method.
Iteratively train the classifier with new instances as they arrive.

2.  Start with random values.  Train the classifier with gradient
descent using a bootstrap data set.  Iteratively train the classifier
with new instances as they arrive.

I prefer #2.  With #1, gradient descent will probably get stuck in a
local optimum close to the counting-based initialisation.
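
For #2, here is a sketch of what the whole pipeline could look like,
assuming the derivative above generalised to messages of any length
(dP/da_j = N*M / (a_j*(1-a_j)*D^2), with N = c*prod(a_i),
M = (1-c)*prod(1-a_i), D = N + M); the data format, learning rate and
epoch count are placeholders, not part of the proposal:

import random

EPS = 1e-3

def clamp(x):
    return min(max(x, EPS), 1.0 - EPS)

def predict(probs, prior, words):
    """Return N, M and the combined P(S|words); unseen words start at 0.5."""
    n, m = prior, 1.0 - prior
    for w in set(words):
        a = probs.setdefault(w, 0.5)
        n *= a
        m *= 1.0 - a
    return n, m, n / (n + m)

def update(probs, prior, words, label, lr=0.5):
    """One online gradient-descent step (label: 1 = spam, 0 = ham).
    The same call handles new instances as they arrive later."""
    n, m, p = predict(probs, prior, words)
    err = p - label                              # dL/dP for squared error
    d_sq = (n + m) ** 2
    for w in set(words):
        a = probs[w]
        probs[w] = clamp(a - lr * err * n * m / (a * (1.0 - a) * d_sq))
    prior = clamp(prior - lr * err * n * m / (prior * (1.0 - prior) * d_sq))
    return probs, prior

def bootstrap(messages, epochs=50, seed=0):
    """Option #2: random start, then gradient descent over a bootstrap set.
    messages is a list of (list_of_words, label) pairs."""
    rng = random.Random(seed)
    vocab = {w for words, _ in messages for w in words}
    probs = {w: rng.uniform(0.2, 0.8) for w in vocab}
    prior = rng.uniform(0.2, 0.8)
    for _ in range(epochs):
        for words, label in messages:
            probs, prior = update(probs, prior, words, label)
    return probs, prior

A real implementation would want to work in log space to avoid underflow
on long messages, but that would obscure the update rule here.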

Any comments?  Interest in co-authoring a research paper (*poke*,
Vaishnavi and friends at UWash)?  Has this been done before?

Henry
