[ https://issues.apache.org/jira/browse/MAHOUT-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291658#comment-13291658 ]
Lance Norskog commented on MAHOUT-941:
--------------------------------------
This is the output of classify-20newsgroups.sh. "Accuracy" is 90.4%.
"Reliability" is 85.8%, and the standard deviation of "Reliability" is 0.22.
"Kappa" is 0.876; it relates "Accuracy" to the accuracy expected from random
classification. I do not know whether Kappa should count "unclassified"
instances in its formula or assume that every instance is classified to a known
label. Perhaps it should be calculated both ways?
{quote}
Summary
-------------------------------------------------------
Correctly Classified Instances : 6788 90.4102%
Incorrectly Classified Instances : 720 9.5898%
Total Classified Instances : 7508
=======================================================
Confusion Matrix
-------------------------------------------------------
   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   <--Classified as
 296   0   0   0   0   0   0   0   0   0   0   0   0   0   1   8   0   2   7   3   |  317  a = alt.atheism
   1 327   4  20   6  14   2   1   0   0   0   1   5   3   1   0   1   0   0   0   |  386  b = comp.graphics
   0  27 217  76  21  17   5   0   0   0   0   4   8   1   1   0   0   0   1   3   |  381  c = comp.os.ms-windows.misc
   0  10   1 315  23   3   9   2   0   0   0   0   8   0   0   0   0   0   0   0   |  371  d = comp.sys.ibm.pc.hardware
   0   5   1   9 348   0   5   1   0   0   0   0   4   0   0   0   0   0   1   1   |  375  e = comp.sys.mac.hardware
   0  23   2   7   1 328   1   0   0   0   0   1   0   1   1   0   0   0   0   0   |  365  f = comp.windows.x
   0   5   0  19  11   0 337   8   2   1   4   4   5   0   3   0   0   0   0   1   |  400  g = misc.forsale
   0   0   0   3   3   1   8 402   2   1   0   0   3   1   0   0   0   0   0   3   |  427  h = rec.autos
   0   0   0   0   0   1   7   5 368   0   0   0   0   1   0   0   0   1   0   1   |  384  i = rec.motorcycles
   1   0   0   0   0   0   1   1   0 379   7   0   0   1   0   0   0   0   0   0   |  390  j = rec.sport.baseball
   0   0   0   1   2   0   0   1   0   4 387   0   0   0   0   1   0   0   0   2   |  398  k = rec.sport.hockey
   0   3   0   1   3   2   0   0   0   0   0 393   2   0   0   0   1   3   1   2   |  411  l = sci.crypt
   0   5   0  12  10   0   5   1   1   0   0   1 328   0   2   0   0   2   1   0   |  368  m = sci.electronics
   1   5   1   3   1   1   1   0   0   0   0   0   2 377   4   0   0   0   1   4   |  401  n = sci.med
   0   5   0   0   1   1   1   0   0   1   0   2   0   1 389   0   0   0   2   2   |  405  o = sci.space
   4   2   0   1   2   0   0   1   0   1   1   0   0   1   0 397   2   2   5   1   |  420  p = soc.religion.christian
   1   1   0   0   0   0   1   0   0   0   0   0   0   0   0   4 359   0   0   1   |  367  q = talk.politics.mideast
   0   0   0   0   0   0   0   0   1   1   0   0   1   0   0   0   0 360   0   8   |  371  r = talk.politics.guns
  26   1   0   1   0   0   1   1   1   1   0   0   1   0   2  18   1   4 197   7   |  262  s = talk.religion.misc
   0   0   0   0   1   0   0   1   0   2   0   2   0   0   3   0   3  10   3 284   |  309  t = talk.politics.misc
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.8759
Accuracy 90.4102%
Reliability 85.8359%
Reliability (standard deviation) 0.2183
{quote}
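As a cross-check on the numbers above, here is a minimal Python sketch that recomputes Accuracy and the textbook Cohen's kappa, (p_o - p_e) / (1 - p_e), from the printed confusion matrix (rows are the actual newsgroup, columns the assigned one). The helper names (accuracy, cohen_kappa, with_empty_label) are hypothetical, and this is not the code in the patch. It suggests two things: under this formula an extra "unknown" label that receives no samples (an all-zero row and column) leaves kappa unchanged, and over the 20 printed labels the textbook formula comes out near 0.90 rather than the 0.8759 printed, so the patch may compute Kappa differently. That has not been checked against the patch itself.

```python
# Sketch only: hypothetical helpers, not Mahout's ConfusionMatrix API.
# Rows = actual label, columns = assigned label, both in the order a..t.
MATRIX = [
    [296, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 8, 0, 2, 7, 3],    # a
    [1, 327, 4, 20, 6, 14, 2, 1, 0, 0, 0, 1, 5, 3, 1, 0, 1, 0, 0, 0],  # b
    [0, 27, 217, 76, 21, 17, 5, 0, 0, 0, 0, 4, 8, 1, 1, 0, 0, 0, 1, 3],  # c
    [0, 10, 1, 315, 23, 3, 9, 2, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0],  # d
    [0, 5, 1, 9, 348, 0, 5, 1, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 1, 1],    # e
    [0, 23, 2, 7, 1, 328, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0],   # f
    [0, 5, 0, 19, 11, 0, 337, 8, 2, 1, 4, 4, 5, 0, 3, 0, 0, 0, 0, 1],  # g
    [0, 0, 0, 3, 3, 1, 8, 402, 2, 1, 0, 0, 3, 1, 0, 0, 0, 0, 0, 3],    # h
    [0, 0, 0, 0, 0, 1, 7, 5, 368, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1],    # i
    [1, 0, 0, 0, 0, 0, 1, 1, 0, 379, 7, 0, 0, 1, 0, 0, 0, 0, 0, 0],    # j
    [0, 0, 0, 1, 2, 0, 0, 1, 0, 4, 387, 0, 0, 0, 0, 1, 0, 0, 0, 2],    # k
    [0, 3, 0, 1, 3, 2, 0, 0, 0, 0, 0, 393, 2, 0, 0, 0, 1, 3, 1, 2],    # l
    [0, 5, 0, 12, 10, 0, 5, 1, 1, 0, 0, 1, 328, 0, 2, 0, 0, 2, 1, 0],  # m
    [1, 5, 1, 3, 1, 1, 1, 0, 0, 0, 0, 0, 2, 377, 4, 0, 0, 0, 1, 4],    # n
    [0, 5, 0, 0, 1, 1, 1, 0, 0, 1, 0, 2, 0, 1, 389, 0, 0, 0, 2, 2],    # o
    [4, 2, 0, 1, 2, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 397, 2, 2, 5, 1],    # p
    [1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 4, 359, 0, 0, 1],    # q
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 360, 0, 8],    # r
    [26, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 2, 18, 1, 4, 197, 7],  # s
    [0, 0, 0, 0, 1, 0, 0, 1, 0, 2, 0, 2, 0, 0, 3, 0, 3, 10, 3, 284],   # t
]


def accuracy(m):
    """Fraction of all samples that lie on the diagonal."""
    n = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / n


def cohen_kappa(m):
    """Textbook Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    k = len(m)
    n = sum(sum(row) for row in m)
    p_o = sum(m[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in m]
    col_totals = [sum(m[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: actual and assigned labels drawn independently
    # from their observed marginal distributions.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def with_empty_label(m):
    """Append an all-zero row and column, e.g. for an unused 'unknown' label."""
    return [row + [0] for row in m] + [[0] * (len(m) + 1)]


print(f"Accuracy {accuracy(MATRIX):.4%}")
print(f"Kappa    {cohen_kappa(MATRIX):.4f}")
print(f"Kappa with empty 'unknown' label {cohen_kappa(with_empty_label(MATRIX)):.4f}")
```

Unclassified instances that actually land in an "unknown" column would change both the row totals and the chance term, so in that case the two ways of counting would give different values.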
> Improve ConfusionMatrix statistics
> ----------------------------------
>
> Key: MAHOUT-941
> URL: https://issues.apache.org/jira/browse/MAHOUT-941
> Project: Mahout
> Issue Type: Improvement
> Components: Classification
> Reporter: Lance Norskog
> Assignee: Robin Anil
> Priority: Minor
> Fix For: 0.8
>
> Attachments: Bayes.zip, MAHOUT-941.patch, MAHOUT-941.patch,
> MAHOUT-941.patch, SGD.zip
>
>
> This patch adds more statistics to the ConfusionMatrix and ResultAnalyzer.
> # Add a Kappa measure: a standard measure comparing the observed agreement
> with random assignment.
> # Add the mean and standard deviation of "Reliability" (User Accuracy) to help
> identify consistent misassignment across "good" and "bad" labels.
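The description does not spell out how per-label "Reliability" is defined. One reading that reproduces the run in the comment above is sketched below in Python: for each label, the diagonal count divided by the row total (the share of that newsgroup's documents assigned to it), then the mean and sample (n - 1) standard deviation across labels. This definition and the helper names are assumptions, not taken from the patch. Over the 20 newsgroups alone it gives a mean of about 90.1%; with a 21st label that has no samples and scores 0 (presumably an "unknown" label), it gives the 85.8359% and 0.2183 printed above.

```python
from statistics import mean, stdev

# Diagonal counts and row totals copied from the confusion matrix in the
# comment above, labels a..t in order. Helper names are hypothetical.
DIAGONAL = [296, 327, 217, 315, 348, 328, 337, 402, 368, 379,
            387, 393, 328, 377, 389, 397, 359, 360, 197, 284]
ROW_TOTALS = [317, 386, 381, 371, 375, 365, 400, 427, 384, 390,
              398, 411, 368, 401, 405, 420, 367, 371, 262, 309]


def label_reliability(diagonal, row_totals):
    """Per label: share of that label's samples that were assigned to it."""
    # A label with no samples scores 0 instead of raising ZeroDivisionError.
    return [d / t if t else 0.0 for d, t in zip(diagonal, row_totals)]


def reliability_stats(diagonal, row_totals):
    """Mean and sample (n - 1) standard deviation of per-label reliability."""
    scores = label_reliability(diagonal, row_totals)
    return mean(scores), stdev(scores)


avg20, sd20 = reliability_stats(DIAGONAL, ROW_TOTALS)
# Same statistics with a 21st label that received no samples (e.g. "unknown").
avg21, sd21 = reliability_stats(DIAGONAL + [0], ROW_TOTALS + [0])
print(f"20 labels: Reliability {avg20:.4%}, std dev {sd20:.4f}")
print(f"21 labels: Reliability {avg21:.4%}, std dev {sd21:.4f}")
```

If that reading is right, an empty label pulls the mean down and inflates the standard deviation, which bears directly on whether "unclassified" should be part of these statistics.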