Problems with the Bayesian classifiers.

Philippe Lamarche Sat, 19 Jul 2008 18:13:57 -0700

 Hi,

I have been working for a little while with Mahout and the Bayesian
classifier for a school project.


I am using the Enron email corpus and the UC Berkeley classified
emails (http://www.cs.cmu.edu/~enron/). I did a few tests and I can't
seem to make it work. I wonder if I am doing something wrong.

For example, I am getting fewer than 10% correct predictions with Bayes
and around 1% with CBayes. The problem seems to be that all instances
of a class are predicted as another class, or that they are all
predicted as the class containing the most features.

I also tested with the 20News corpus and I get similar results, where
all instances of a class are predicted as another class (e.g. all
421 "rec.motorcycles" get predicted as "talk.politics.mideast").
Attached are two confusion matrices showing the results for Bayes and
CBayes. Both used the same split into training and testing sets.
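
For reference, the symptom described above (every instance of a class
landing in one other class) can be checked with a few lines of Python.
This is only a minimal sketch on made-up toy labels, not the attached
results and not Mahout's own code; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def confusion_matrix(actual, predicted):
    """Count how often each actual label received each predicted label."""
    matrix = defaultdict(Counter)
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def collapsed_classes(matrix):
    """Return {actual: predicted} for classes whose instances were all
    assigned to one single, different class."""
    collapsed = {}
    for actual, row in matrix.items():
        if len(row) == 1:
            predicted = next(iter(row))
            if predicted != actual:
                collapsed[actual] = predicted
    return collapsed


def accuracy(matrix):
    """Fraction of instances on the diagonal of the confusion matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(row[label] for label, row in matrix.items())
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only (not real experiment output).
    actual = ["rec.motorcycles"] * 3 + ["talk.politics.mideast"] * 2
    predicted = ["talk.politics.mideast"] * 5
    m = confusion_matrix(actual, predicted)
    print(collapsed_classes(m))
    print(accuracy(m))
```

A class that shows up in collapsed_classes has a row with a single
non-zero cell off the diagonal, which is the pattern seen for
"rec.motorcycles".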

Am I doing something wrong?

Thanks,

Philippe Lamarche.
