[ 
https://issues.apache.org/jira/browse/MAHOUT-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291658#comment-13291658
 ] 

Lance Norskog commented on MAHOUT-941:
--------------------------------------

This is the output of classify-20newsgroups.sh. "Accuracy" is 90.4 percent. 
"Reliability" is 85.8%, and the standard deviation of "reliability" is 0.22. 
"Kappa" is 0.88; it measures how much better "accuracy" is than "random 
classification". I do not know if Kappa includes "unclassified" in its formula, 
or assumes all are classified to known labels. Or perhaps it should be 
calculated both ways?
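
To make the "both ways" question concrete, here is a minimal sketch of the textbook Cohen's kappa, computed once with an extra "unclassified" column and once with that column dropped. It is not taken from the patch and may not match what ConfusionMatrix actually does; the function name, the 3-label matrix and the trailing "unclassified" column are all made up for illustration.

```python
def cohen_kappa(matrix):
    """Textbook Cohen's kappa for a confusion matrix.

    matrix[i][j] is the number of instances with true label i that were
    classified as j. Columns beyond the number of rows (here a
    hypothetical "unclassified" column) count toward the totals but can
    never count as agreement.
    """
    n_labels = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(n_labels)) / total
    # Chance agreement: product of row and column marginals per label.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(row[j] for row in matrix) for j in range(n_labels)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical data: 3 labels, 4th column holds unclassified instances.
with_unclassified = [
    [40, 5, 3, 2],
    [4, 35, 6, 5],
    [2, 3, 45, 0],
]
# Assumption: "all are classified" means the unclassified column is dropped.
classified_only = [row[:3] for row in with_unclassified]

kappa_all = cohen_kappa(with_unclassified)
kappa_classified = cohen_kappa(classified_only)
print(kappa_all, kappa_classified)
```

On this toy data, dropping the unclassified instances gives the higher kappa, because they otherwise count as disagreements.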

{quote}

Summary
-------------------------------------------------------
Correctly Classified Instances          :       6788       90.4102%
Incorrectly Classified Instances        :        720        9.5898%
Total Classified Instances              :       7508

=======================================================
Confusion Matrix
-------------------------------------------------------
a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    q    r    s    t    <--Classified as
296  0    0    0    0    0    0    0    0    0    0    0    0    0    1    8    0    2    7    3    |  317  a = alt.atheism
1    327  4    20   6    14   2    1    0    0    0    1    5    3    1    0    1    0    0    0    |  386  b = comp.graphics
0    27   217  76   21   17   5    0    0    0    0    4    8    1    1    0    0    0    1    3    |  381  c = comp.os.ms-windows.misc
0    10   1    315  23   3    9    2    0    0    0    0    8    0    0    0    0    0    0    0    |  371  d = comp.sys.ibm.pc.hardware
0    5    1    9    348  0    5    1    0    0    0    0    4    0    0    0    0    0    1    1    |  375  e = comp.sys.mac.hardware
0    23   2    7    1    328  1    0    0    0    0    1    0    1    1    0    0    0    0    0    |  365  f = comp.windows.x
0    5    0    19   11   0    337  8    2    1    4    4    5    0    3    0    0    0    0    1    |  400  g = misc.forsale
0    0    0    3    3    1    8    402  2    1    0    0    3    1    0    0    0    0    0    3    |  427  h = rec.autos
0    0    0    0    0    1    7    5    368  0    0    0    0    1    0    0    0    1    0    1    |  384  i = rec.motorcycles
1    0    0    0    0    0    1    1    0    379  7    0    0    1    0    0    0    0    0    0    |  390  j = rec.sport.baseball
0    0    0    1    2    0    0    1    0    4    387  0    0    0    0    1    0    0    0    2    |  398  k = rec.sport.hockey
0    3    0    1    3    2    0    0    0    0    0    393  2    0    0    0    1    3    1    2    |  411  l = sci.crypt
0    5    0    12   10   0    5    1    1    0    0    1    328  0    2    0    0    2    1    0    |  368  m = sci.electronics
1    5    1    3    1    1    1    0    0    0    0    0    2    377  4    0    0    0    1    4    |  401  n = sci.med
0    5    0    0    1    1    1    0    0    1    0    2    0    1    389  0    0    0    2    2    |  405  o = sci.space
4    2    0    1    2    0    0    1    0    1    1    0    0    1    0    397  2    2    5    1    |  420  p = soc.religion.christian
1    1    0    0    0    0    1    0    0    0    0    0    0    0    0    4    359  0    0    1    |  367  q = talk.politics.mideast
0    0    0    0    0    0    0    0    1    1    0    0    1    0    0    0    0    360  0    8    |  371  r = talk.politics.guns
26   1    0    1    0    0    1    1    1    1    0    0    1    0    2    18   1    4    197  7    |  262  s = talk.religion.misc
0    0    0    0    1    0    0    1    0    2    0    2    0    0    3    0    3    10   3    284  |  309  t = talk.politics.misc

=======================================================
Statistics
-------------------------------------------------------
Kappa                                       0.8759
Accuracy                                   90.4102%
Reliability                                85.8359%
Reliability (standard deviation)            0.2183
{quote}
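
For readers unfamiliar with the "Reliability" figures, the following is a rough sketch of one common definition of user's accuracy: for each label, the diagonal count divided by the column total, followed by the mean and standard deviation across labels. This is an assumption about what the statistic means, not the code from the patch. The function name and the 3-label matrix are invented, and the choice of population (rather than sample) standard deviation and of 0.0 for an empty column are also assumptions.

```python
from statistics import mean, pstdev


def reliability_stats(matrix):
    """Per-label user's accuracy plus its mean and standard deviation.

    matrix[i][j] is the number of instances with true label i that were
    classified as j. User's accuracy for label j is the share of
    instances classified as j that really are j.
    """
    n = len(matrix)
    per_label = []
    for j in range(n):
        col_total = sum(matrix[i][j] for i in range(n))
        # Assumption: a label that is never predicted scores 0.0.
        per_label.append(matrix[j][j] / col_total if col_total else 0.0)
    # Assumption: population standard deviation over the labels.
    return mean(per_label), pstdev(per_label), per_label


# Hypothetical 3-label confusion matrix.
example = [
    [40, 5, 3],
    [4, 35, 6],
    [2, 3, 45],
]
avg, spread, per_label = reliability_stats(example)
print(avg, spread, per_label)
```

A large standard deviation relative to the mean would point at a few labels that attract many wrong assignments, which is the use case named in the issue description.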

                
> Improve ConfusionMatrix statistics
> ----------------------------------
>
>                 Key: MAHOUT-941
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-941
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>            Reporter: Lance Norskog
>            Assignee: Robin Anil
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: Bayes.zip, MAHOUT-941.patch, MAHOUT-941.patch, 
> MAHOUT-941.patch, SGD.zip
>
>
> This patch adds more statistics to the ConfusionMatrix and RequestAnalyzer.
> # Add Kappa measure - a standard measure comparing a sample vs. random 
> assignment.
> # Add mean & standard deviation of "Reliability" (User Accuracy) - helps 
> identify consistent misassignment against "good" and "bad" labels.


        
