[ 
https://issues.apache.org/jira/browse/MAHOUT-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289928#comment-13289928
 ] 

Lance Norskog commented on MAHOUT-941:
--------------------------------------

1) Grrrr... "correct" is supposed to be a running sum, but this line assigns instead of adding:
{code}
 correct = confusionMatrix[labelId][labelId];
{code}
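
If the intent is for "correct" to accumulate over all labels, the loop might look like the sketch below. It assumes a square int[][] matrix indexed as [actual][predicted]; the class and method names are hypothetical and are not taken from the patch.

```java
// Hypothetical helper for illustration; not Mahout's actual ConfusionMatrix code.
final class CorrectCountSketch {

  // Sums the diagonal of a square matrix assumed to be laid out as [actual][predicted].
  static int totalCorrect(int[][] confusionMatrix) {
    int correct = 0;
    for (int labelId = 0; labelId < confusionMatrix.length; labelId++) {
      // Accumulate with += so earlier labels are not overwritten.
      correct += confusionMatrix[labelId][labelId];
    }
    return correct;
  }

  public static void main(String[] args) {
    int[][] matrix = {{5, 1}, {2, 7}};
    System.out.println("correct = " + totalCorrect(matrix));
  }
}
```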

2) The printout is wrong. The "accuracy" up above is actually "producer's 
accuracy". This code calculates both that and "user's accuracy", also called 
"reliability". The two are different, so the printout should show both 
accuracies, and possibly also the mean of the two. 

Imagine classification as the code throwing balls of different sizes to robot 
arms, each programmed to grab one size. If no arm grabs the ball, that's 
'unclassified'. Producer's accuracy is from the thrower's point of view; user's 
accuracy is from the robot arms' points of view. The counts differ because 
'unclassified' is part of the producer's 'wrong' count, while it is ignored by 
the user's counts.

[http://spatial-analyst.net/ILWIS/htm/ilwismen/confusion_matrix.htm]
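
To make the difference concrete, here is a minimal sketch of the two accuracies. It assumes counts[actual][predicted] with one extra trailing column for 'unclassified'; that layout and all names are assumptions for illustration, not the ConfusionMatrix API.

```java
// Hypothetical sketch; the matrix layout and names are assumptions, not Mahout's API.
final class AccuracySketch {

  // Producer's accuracy: correct / everything thrown for this label.
  // The row total includes the trailing 'unclassified' column.
  static double producersAccuracy(int[][] counts, int label) {
    long rowTotal = 0;
    for (int count : counts[label]) {
      rowTotal += count;
    }
    return rowTotal == 0 ? 0.0 : (double) counts[label][label] / rowTotal;
  }

  // User's accuracy (reliability): correct / everything this arm grabbed.
  // Only this label's column is summed, so 'unclassified' never enters the count.
  static double usersAccuracy(int[][] counts, int label) {
    long columnTotal = 0;
    for (int[] row : counts) {
      columnTotal += row[label];
    }
    return columnTotal == 0 ? 0.0 : (double) counts[label][label] / columnTotal;
  }

  public static void main(String[] args) {
    // Two labels; the third column holds 'unclassified' counts.
    int[][] counts = {{8, 1, 1}, {3, 5, 2}};
    for (int label = 0; label < counts.length; label++) {
      System.out.println("label " + label
          + " producer's=" + producersAccuracy(counts, label)
          + " user's=" + usersAccuracy(counts, label));
    }
  }
}
```

With these made-up counts, label 0 has 10 actual instances but was predicted 11 times, so the two ratios share a numerator (8) and differ only in the denominator.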


                
> Improve ConfusionMatrix statistics
> ----------------------------------
>
>                 Key: MAHOUT-941
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-941
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>            Reporter: Lance Norskog
>            Assignee: Robin Anil
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: Bayes.zip, MAHOUT-941.patch, MAHOUT-941.patch, SGD.zip
>
>
> This patch adds more statistics to the ConfusionMatrix and RequestAnalyzer.
> # Add Kappa measure - a standard measure comparing a sample vs. random 
> assignment.
> # Add mean & standard deviation of individual labels - assist in identifying 
> consistent mal-assignment vs. high and low quality labels.
> Also, the SGD solver saves its model periodically to /tmp/news-groups-number. 
> This patch moves those captures to the model/ output directory. (These 
> intermediate models are interesting for tracking SGD incremental development.)
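
For reference, a minimal sketch of the Kappa statistic, assuming the measure meant in the description is Cohen's kappa computed over a square [actual][predicted] matrix with no 'unclassified' column. The class and method names are hypothetical and not taken from the patch.

```java
// Hypothetical sketch of Cohen's kappa; not the code from MAHOUT-941.patch.
final class KappaSketch {

  // kappa = (observed - expected) / (1 - expected), where "observed" is the
  // fraction on the diagonal and "expected" is the agreement that row and
  // column totals alone would produce under random assignment.
  // Returns NaN for an empty matrix or when expected agreement is exactly 1.
  static double kappa(int[][] matrix) {
    int n = matrix.length;
    double total = 0.0;
    double diagonal = 0.0;
    double[] rowSum = new double[n];
    double[] columnSum = new double[n];
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        rowSum[i] += matrix[i][j];
        columnSum[j] += matrix[i][j];
        total += matrix[i][j];
      }
      diagonal += matrix[i][i];
    }
    double observed = diagonal / total;
    double expected = 0.0;
    for (int i = 0; i < n; i++) {
      expected += rowSum[i] * columnSum[i];
    }
    expected /= total * total;
    return (observed - expected) / (1.0 - expected);
  }

  public static void main(String[] args) {
    int[][] matrix = {{20, 5}, {10, 15}};
    System.out.println("kappa = " + kappa(matrix));
  }
}
```

A value near 0 means the classifier agrees with the true labels no more often than random assignment with the same marginals would; a value of 1 means perfect agreement.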

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
