Now, with the attachment.
Sorry.

On Sat, Jul 19, 2008 at 9:13 PM, Philippe Lamarche
<[EMAIL PROTECTED]> wrote:
>  Hi,
>
> I have been working for a little while with Mahout and the Bayesian
> classifier for a school project.
>
> I am using the Enron email corpus and the UC Berkeley classified
> emails (http://www.cs.cmu.edu/~enron/). I did a few tests and I can't
> seem to make it work. I wonder if I am doing something wrong.
>
> For example, I am getting under 10% correct predictions with Bayes and
> around 1% with CBayes. The problem seems to be that all instances of a
> class are predicted as another class, or that they are all predicted as
> the class containing the most features.
>
> I also tested with the 20News corpus and I get similar results, where
> all instances of a class are predicted as another class (e.g. all
> 421 "rec.motorcycles" get predicted as "talk.politics.mideast").
> Attached are two confusion matrices displaying results for Bayes and
> CBayes. Both used the same division into training and testing sets.
>
> Am I doing something wrong?
>
> Thanks,
>
> Philippe Lamarche.
>
=-=-=-=-=-=-=-=-=-=This is for 20News CBayes=-=-=-=-=-=-=-=-=-=
   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 214   0   0  | a = alt.atheism
   0 421   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  | b = comp.graphics
   0   0   0   0   0   0   0   0   0   0   0 421   0   0   0   0   0   0   0   0  | c = comp.os.ms-windows.misc
   0   0   0   0   0   0   0   0   0   0   0 421   0   0   0   0   0   0   0   0  | d = comp.sys.ibm.pc.hardware
   0   0   0   0 421   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  | e = comp.sys.mac.hardware
   0   0   0   0   0 421   0   0   0   0   0   0   0   0   0   0   0   0   0   0  | f = comp.windows.x
   0   0   0   0   0   0   0   0   0   0   0   0   0   0 121   0   0   0   0   0  | g = misc.forsale
   0   0   0   0   0   0   0 421   0   0   0   0   0   0   0   0   0   0   0   0  | h = rec.autos
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 421   0   0  | i = rec.motorcycles
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  85   0   0  | j = rec.sport.baseball
   0   0   0   0   0   0   0   0   0   0  57   0   0   0   0   0   0   0   0   0  | k = rec.sport.hockey
   0   0   0   0   0   0   0   0   0   0   0 421   0   0   0   0   0   0   0   0  | l = sci.crypt
   0   0   0   0   0   0   0   0   0   0   0   0 192   0   0   0   0   0   0   0  | m = sci.electronics
   0   0   0   0   0   0   0   0   0   0   0   0   0 421   0   0   0   0   0   0  | n = sci.med
   0   0   0   0   0   0   0   0   0   0   0   0   0   0 158   0   0   0   0   0  | o = sci.space
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 421   0   0   0   0  | p = soc.religion.christian
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 421   0  | q = talk.politics.guns
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 421   0   0  | r = talk.politics.mideast
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 354   0  | s = talk.politics.misc
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 207   0  | t = talk.religion.misc
 Correctly classified 0.6411490683229814
 4129/6440
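
As a sanity check on the figure above, here is a minimal Python sketch that
recomputes the accuracy from the non-zero cells of the matrix. The (actual,
predicted, count) triples are transcribed by hand from the matrix, and the
variable names are only illustrative; this is not part of Mahout.

```python
# Non-zero cells of the matrix above as (actual, predicted, count).
# Transcribed by hand; every row has exactly one non-zero cell.
cells = [
    ("alt.atheism", "talk.politics.mideast", 214),
    ("comp.graphics", "comp.graphics", 421),
    ("comp.os.ms-windows.misc", "sci.crypt", 421),
    ("comp.sys.ibm.pc.hardware", "sci.crypt", 421),
    ("comp.sys.mac.hardware", "comp.sys.mac.hardware", 421),
    ("comp.windows.x", "comp.windows.x", 421),
    ("misc.forsale", "sci.space", 121),
    ("rec.autos", "rec.autos", 421),
    ("rec.motorcycles", "talk.politics.mideast", 421),
    ("rec.sport.baseball", "talk.politics.mideast", 85),
    ("rec.sport.hockey", "rec.sport.hockey", 57),
    ("sci.crypt", "sci.crypt", 421),
    ("sci.electronics", "sci.electronics", 192),
    ("sci.med", "sci.med", 421),
    ("sci.space", "sci.space", 158),
    ("soc.religion.christian", "soc.religion.christian", 421),
    ("talk.politics.guns", "talk.politics.misc", 421),
    ("talk.politics.mideast", "talk.politics.mideast", 421),
    ("talk.politics.misc", "talk.politics.misc", 354),
    ("talk.religion.misc", "talk.politics.misc", 207),
]

# Accuracy is the diagonal mass (actual == predicted) over the grand total.
correct = sum(n for actual, predicted, n in cells if actual == predicted)
total = sum(n for _, _, n in cells)
accuracy = correct / total

print(f"{correct}/{total} = {accuracy:.4f}")  # 4129/6440 = 0.6411
```

The eight rows where actual and predicted differ are exactly the classes that
are moved wholesale into another class.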


=-=-=-=-=-=-=-=-=-=This is for 20News CBayes=-=-=-=-=-=-=-=-=-=

=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :         57        0.8851%
Incorrectly Classified Instances        :       6383       99.1149%
Total Classified Instances              :       6440

=======================================================
Confusion Matrix
-------------------------------------------------------
   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
   0   0   0   0   0   0   0 421   0   0   0   0   0   0   0   0   0   0   0   0  |  421         a     = rec.motorcycles
   0   0   0   0   0   0   0 421   0   0   0   0   0   0   0   0   0   0   0   0  |  421         b     = comp.windows.x
   0   0   0   0   0   0   0 421   0   0   0   0   0   0   0   0   0   0   0   0  |  421         c     = talk.politics.mideast
   0   0   0   0   0   0   0 421   0   0   0   0   0   0   0   0   0   0   0   0  |  421         d     = talk.politics.guns
   0   0   0   0   0   0   0 207   0   0   0   0   0   0   0   0   0   0   0   0  |  207         e     = talk.religion.misc
   0   0   0   0   0   0   0 421   0   0   0   0   0   0   0   0   0   0   0   0  |  421         f     = rec.autos
   0   0   0   0   0   0   0  85   0   0   0   0   0   0   0   0   0   0   0   0  |  85          g     = rec.sport.baseball
   0   0   0   0   0   0   0  57   0   0   0   0   0   0   0   0   0   0   0   0  |  57          h     = rec.sport.hockey
   0   0   0   0   0   0   0 421   0   0   0   0   0   0   0   0   0   0   0   0  |  421         i     = comp.sys.mac.hardware
   0   0   0   0   0   0   0 158   0   0   0   0   0   0   0   0   0   0   0   0  |  158         j     = sci.space
   0   0   0   0   0   0   0 421   0   0   0   0   0   0   0   0   0   0   0   0  |  421         k     = comp.sys.ibm.pc.hardware
   0   0   0   0   0   0   0 354   0   0   0   0   0   0   0   0   0   0   0   0  |  354         l     = talk.politics.misc
   0   0   0   0   0   0   0 421   0   0   0   0   0   0   0   0   0   0   0   0  |  421         m     = comp.graphics
   0   0   0   0   0   0   0 192   0   0   0   0   0   0   0   0   0   0   0   0  |  192         n     = sci.electronics
   0   0   0   0   0   0   0 421   0   0   0   0   0   0   0   0   0   0   0   0  |  421         o     = soc.religion.christian
   0   0   0   0   0   0   0 421   0   0   0   0   0   0   0   0   0   0   0   0  |  421         p     = sci.med
   0   0   0   0   0   0   0 421   0   0   0   0   0   0   0   0   0   0   0   0  |  421         q     = sci.crypt
   0   0   0   0   0   0   0 214   0   0   0   0   0   0   0   0   0   0   0   0  |  214         r     = alt.atheism
   0   0   0   0   0   0   0 121   0   0   0   0   0   0   0   0   0   0   0   0  |  121         s     = misc.forsale
   0   0   0   0   0   0   0 421   0   0   0   0   0   0   0   0   0   0   0   0  |  421         t     = comp.os.ms-windows.misc
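
In this second matrix every instance lands in column h (rec.sport.hockey), so
the accuracy is simply that class's share of the test set. A small Python
sketch of that arithmetic, with the row totals copied by hand from the legend
above (the names are illustrative, not Mahout API):

```python
# Row totals copied from the "Classified as" legend of the matrix above.
row_totals = {
    "rec.motorcycles": 421, "comp.windows.x": 421,
    "talk.politics.mideast": 421, "talk.politics.guns": 421,
    "talk.religion.misc": 207, "rec.autos": 421,
    "rec.sport.baseball": 85, "rec.sport.hockey": 57,
    "comp.sys.mac.hardware": 421, "sci.space": 158,
    "comp.sys.ibm.pc.hardware": 421, "talk.politics.misc": 354,
    "comp.graphics": 421, "sci.electronics": 192,
    "soc.religion.christian": 421, "sci.med": 421,
    "sci.crypt": 421, "alt.atheism": 214,
    "misc.forsale": 121, "comp.os.ms-windows.misc": 421,
}

# Column h is the only non-zero column, i.e. the single predicted class.
predicted_class = "rec.sport.hockey"

total = sum(row_totals.values())
# Only instances that really belong to the predicted class count as correct.
correct = row_totals[predicted_class]
percent_correct = 100 * correct / total

print(f"{correct}/{total} = {percent_correct:.4f}%")
```

This reproduces the 57 correct out of 6440 reported in the summary, which is
what a classifier that always answers with one class would score here.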
