I haven't done a lot of testing w/ M-9 yet, so it is more than likely
there are bugs ;-)
-Grant
On Jul 20, 2008, at 6:21 AM, Miles Osborne wrote:
I think it would also be useful to cross-check your results against a
text classification system which is known to work. Look at rainbow:
http://www.cs.cmu.edu/~mccallum/bow/rainbow/
If you get the correct results there, then either you have somehow
made a mistake in how you use Mahout, or else there really is a bug.
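
If setting up a second toolkit is not practical, a small hand-written baseline can serve the same cross-checking purpose. The following is only a sketch of a plain multinomial naive Bayes with add-one smoothing, not rainbow and not Mahout's implementation. The function names (train, classify) and the toy documents are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (label, tokens). Returns the counts the classifier needs."""
    class_docs = Counter()               # number of training documents per class
    word_counts = defaultdict(Counter)   # per-class token counts
    vocab = set()
    for label, tokens in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def classify(model, tokens):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        total = sum(word_counts[label].values())
        score = math.log(class_docs[label] / n_docs)   # log prior
        for tok in tokens:
            if tok in vocab:                           # skip unseen tokens
                count = word_counts[label][tok]
                score += math.log((count + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set, only to show the calling convention.
toy_docs = [
    ("moto", ["bike", "engine", "ride"]),
    ("moto", ["helmet", "ride"]),
    ("politics", ["treaty", "border"]),
    ("politics", ["election", "border", "talks"]),
]
toy_model = train(toy_docs)
print(classify(toy_model, ["ride", "bike"]))
print(classify(toy_model, ["border", "talks"]))
```

If a baseline like this separates the classes on the same split and Mahout does not, the data preparation is probably fine and the problem is more likely in the Mahout setup or code.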
Miles
2008/7/20 Robin Anil <[EMAIL PROTECTED]>:
Can you upload your split somewhere?
On Sun, Jul 20, 2008 at 6:46 AM, Philippe Lamarche <[EMAIL PROTECTED]> wrote:
Now, with the attachment.
Sorry.
On Sat, Jul 19, 2008 at 9:13 PM, Philippe Lamarche <[EMAIL PROTECTED]> wrote:
Hi,
I have been working for a little while with Mahout and the Bayesian
classifier for a school project.
I am using the Enron email corpus and the UC Berkeley classified
emails (http://www.cs.cmu.edu/~enron/).
I did a few tests and I can't seem to make it work. I wonder if I am
doing something wrong.
For example, I am getting under 10% correct predictions with Bayes
and around 1% with CBayes. The problem seems to be that all instances
of a class are predicted as another class, or that they are all
predicted as the class containing the most features.
I also tested with the 20News corpus and I get a similar result,
where all instances of a class are predicted as another class
(e.g. all 421 "rec.motorcycles" get predicted as
"talk.politics.mideast").
Attached are two confusion matrices showing the results for Bayes and
CBayes. Both used the same split into training and test sets.
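
For anyone wanting to spot the same symptom in their own runs, here is a minimal sketch that builds a confusion matrix from actual and predicted labels and reports, per class, the accuracy and the class it is most often predicted as. The helper names (confusion_matrix, summarize) and the toy labels are hypothetical and are not taken from the attachment or from Mahout.

```python
from collections import Counter, defaultdict

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs: matrix[actual][predicted]."""
    matrix = defaultdict(Counter)
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def summarize(matrix):
    """Map each actual class to (accuracy, most frequent predicted class)."""
    summary = {}
    for label, row in matrix.items():
        total = sum(row.values())
        top_label, _ = row.most_common(1)[0]
        summary[label] = (row[label] / total, top_label)
    return summary

# Hypothetical toy labels imitating the symptom described above:
# every document ends up in a single class.
actual = ["rec.motorcycles"] * 3 + ["talk.politics.mideast"] * 2
predicted = ["talk.politics.mideast"] * 5

report = summarize(confusion_matrix(actual, predicted))
for label, (accuracy, top) in sorted(report.items()):
    print(f"{label}: accuracy={accuracy:.2f}, most often predicted as {top}")
```

A class whose accuracy is near zero while nearly all of its documents land in one other class matches the pattern described above, which would point at label handling or the model rather than at noisy data.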
Am I doing something wrong?
Thanks,
Philippe Lamarche.
Thanks
Robin
--------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ