[ https://issues.apache.org/jira/browse/MAHOUT-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Kalnichevski resolved MAHOUT-562.
--------------------------------------

    Resolution: Invalid

Apparently I used the wrong model, one produced with the 'bayes' algorithm 
type. My bad. Apologies for the noise.

Oleg

> Results produced by Complementary Bayes Classifier seem odd
> -----------------------------------------------------------
>
>                 Key: MAHOUT-562
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-562
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.4
>            Reporter: Oleg Kalnichevski
>
> The 20newsgroups example produces the expected results (95% of documents 
> classified correctly) when using the Naive Bayes algorithm. When I switch 
> the algorithm to Complementary Bayes and leave all other parameters 
> unchanged, the rate of correctly classified documents drops to 5%. This 
> seems odd to me. I admit I know next to nothing about Bayes' theorem, so 
> my expectations may be totally off. 
> ---
> Dec 11, 2010 8:47:47 PM org.apache.mahout.classifier.bayes.TestClassifier 
> classifySequential
> INFO: Loading model from: 
> {basePath=/home/oleg/data/mahout/20news-bayes-model, classifierType=cbayes, 
> alpha_i=1, dataSource=hdfs, gramSize=1, verbose=false, encoding=UTF-8, 
> defaultCat=unknown, 
> testDirPath=/home/oleg/data/mahout/20news-bayes-train-input}
> Dec 11, 2010 8:47:47 PM org.apache.mahout.classifier.bayes.TestClassifier 
> classifySequential
> INFO: Testing Complementary Bayes Classifier
> ...
> INFO: =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :        578      5.1087%
> Incorrectly Classified Instances        :      10736     94.8913%
> Total Classified Instances              :      11314
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    q    r    s    t    <--Classified as
> 0    0    0    0    0    0    0    0    0    0    0    0    0    597  0    0    0    0    0    0    |  597  a = rec.sport.baseball
> 0    0    0    0    0    0    0    0    0    0    0    0    0    595  0    0    0    0    0    0    |  595  b = sci.crypt
> 0    0    0    0    0    0    0    0    0    0    0    0    0    600  0    0    0    0    0    0    |  600  c = rec.sport.hockey
> 0    0    0    0    0    0    0    0    0    0    0    0    0    546  0    0    0    0    0    0    |  546  d = talk.politics.guns
> 0    0    0    0    0    0    0    0    0    0    0    0    0    599  0    0    0    0    0    0    |  599  e = soc.religion.christian
> 0    0    0    0    0    0    0    0    0    0    0    0    0    591  0    0    0    0    0    0    |  591  f = sci.electronics
> 0    0    0    0    0    0    0    0    0    0    0    0    0    591  0    0    0    0    0    0    |  591  g = comp.os.ms-windows.misc
> 0    0    0    0    0    0    0    0    0    0    0    0    0    585  0    0    0    0    0    0    |  585  h = misc.forsale
> 0    0    0    0    0    0    0    0    0    0    0    0    0    377  0    0    0    0    0    0    |  377  i = talk.religion.misc
> 0    0    0    0    0    0    0    0    0    0    0    0    0    480  0    0    0    0    0    0    |  480  j = alt.atheism
> 0    0    0    0    0    0    0    0    0    0    0    0    0    593  0    0    0    0    0    0    |  593  k = comp.windows.x
> 0    0    0    0    0    0    0    0    0    0    0    0    0    564  0    0    0    0    0    0    |  564  l = talk.politics.mideast
> 0    0    0    0    0    0    0    0    0    0    0    0    0    590  0    0    0    0    0    0    |  590  m = comp.sys.ibm.pc.hardware
> 0    0    0    0    0    0    0    0    0    0    0    0    0    578  0    0    0    0    0    0    |  578  n = comp.sys.mac.hardware
> 0    0    0    0    0    0    0    0    0    0    0    0    0    593  0    0    0    0    0    0    |  593  o = sci.space
> 0    0    0    0    0    0    0    0    0    0    0    0    0    598  0    0    0    0    0    0    |  598  p = rec.motorcycles
> 0    0    0    0    0    0    0    0    0    0    0    0    0    594  0    0    0    0    0    0    |  594  q = rec.autos
> 0    0    0    0    0    0    0    0    0    0    0    0    0    584  0    0    0    0    0    0    |  584  r = comp.graphics
> 0    0    0    0    0    0    0    0    0    0    0    0    0    465  0    0    0    0    0    0    |  465  s = talk.politics.misc
> 0    0    0    0    0    0    0    0    0    0    0    0    0    594  0    0    0    0    0    0    |  594  t = sci.med
> Default Category: unknown: 20
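
The summary figures in the quoted report can be cross-checked from the 
confusion matrix alone. Every document was assigned to column n 
(comp.sys.mac.hardware), so the only correct predictions are the documents 
that truly belong to that class. The following is a minimal Python sketch of 
that arithmetic. The names row_totals and accuracy_when_one_class_wins are 
illustrative assumptions and are not part of Mahout.

```python
# Row totals copied from the confusion matrix in the report,
# keyed by the true newsgroup label.
row_totals = {
    "rec.sport.baseball": 597,
    "sci.crypt": 595,
    "rec.sport.hockey": 600,
    "talk.politics.guns": 546,
    "soc.religion.christian": 599,
    "sci.electronics": 591,
    "comp.os.ms-windows.misc": 591,
    "misc.forsale": 585,
    "talk.religion.misc": 377,
    "alt.atheism": 480,
    "comp.windows.x": 593,
    "talk.politics.mideast": 564,
    "comp.sys.ibm.pc.hardware": 590,
    "comp.sys.mac.hardware": 578,
    "sci.space": 593,
    "rec.motorcycles": 598,
    "rec.autos": 594,
    "comp.graphics": 584,
    "talk.politics.misc": 465,
    "sci.med": 594,
}


def accuracy_when_one_class_wins(totals, winner):
    """Accuracy when every document is classified as `winner`.

    Only the documents whose true label is `winner` are counted as correct.
    (Hypothetical helper for illustration only.)
    """
    total = sum(totals.values())
    correct = totals[winner]
    return correct, total, correct / total


correct, total, acc = accuracy_when_one_class_wins(
    row_totals, "comp.sys.mac.hardware"
)
print(f"{correct} / {total} = {acc:.4%}")  # 578 / 11314 = 5.1087%
```

The result matches the "Correctly Classified Instances" line in the summary. 
With 20 classes of similar size, a rate this close to 1/20 is what a 
classifier that always picks a single label would produce, which is 
consistent with a mismatch between the model and the classifier type.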

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
