Can you upload your split somewhere?

On Sun, Jul 20, 2008 at 6:46 AM, Philippe Lamarche <[EMAIL PROTECTED]> wrote:

> Now, with the attachment.
> Sorry.
>
> On Sat, Jul 19, 2008 at 9:13 PM, Philippe Lamarche
> <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I have been working for a little while with Mahout and the Bayesian
> > classifier for a school project.
> >
> > I am using the Enron email corpus and the UC Berkeley classified
> > emails (http://www.cs.cmu.edu/~enron/). I did a few tests and I can't
> > seem to make it work. I wonder if I am doing something wrong.
> >
> > For example, I am getting under 10% correct predictions with Bayes and
> > around 1% with CBayes. The problem seems to be that all instances of
> > a class get predicted as another class, or that they all get predicted
> > as the class containing the most features.
> >
> > I also tested with the 20News corpus and got similar results, where
> > all instances of a class are predicted as another class (e.g. all
> > 421 "rec.motorcycles" documents get predicted as "talk.politics.mideast").
> > Attached are two confusion matrices showing the results for Bayes and
> > CBayes. Both used the same split into training and test sets.
> >
> > Am I doing something wrong?
> >
> > Thanks,
> >
> > Philippe Lamarche.
>

Thanks
Robin
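For reading confusion matrices like the ones described in the thread, here is a minimal Python sketch. It assumes rows are actual classes and columns are predicted classes; the function names, labels, and counts are hypothetical and are not taken from Philippe's results. It computes overall accuracy and flags classes whose instances all land in one single other class, the pattern reported for "rec.motorcycles".

```python
# Minimal sketch: summarize a confusion matrix.
# Assumption: matrix[i][j] = number of instances of actual class i
# that were predicted as class j (rows = actual, columns = predicted).
from typing import List, Sequence, Tuple


def accuracy(matrix: Sequence[Sequence[int]]) -> float:
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def collapsed_classes(
    labels: Sequence[str], matrix: Sequence[Sequence[int]]
) -> List[Tuple[str, str]]:
    """Return (actual, predicted) pairs where every instance of an actual
    class was assigned to one single, different class."""
    result = []
    for i, row in enumerate(matrix):
        total = sum(row)
        if total == 0:
            continue  # no test instances for this class
        # Column that received the most predictions for this actual class.
        j = max(range(len(row)), key=lambda k: row[k])
        if j != i and row[j] == total:
            result.append((labels[i], labels[j]))
    return result


# Hypothetical 3-class example (not real results).
labels = ["class_a", "class_b", "class_c"]
matrix = [
    [0, 5, 0],  # every class_a instance predicted as class_b
    [0, 7, 1],
    [2, 0, 3],
]
print(accuracy(matrix))                   # 10 correct out of 18
print(collapsed_classes(labels, matrix))  # [('class_a', 'class_b')]
```

A class that collapses entirely onto another class, as in the first row, is the symptom worth checking first, since it often points at the training/test split or at how the per-class models were trained rather than at the classifier itself.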
