[
https://issues.apache.org/jira/browse/MAHOUT-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039573#comment-13039573
]
Oleg Levchenko edited comment on MAHOUT-713 at 5/26/11 7:54 AM:
----------------------------------------------------------------
OK, effectively I am suggesting that we augment the callbacks package
(org.apache.mahout.df.callback) with two additional callbacks: one for
collating the proximity matrix between cases (called "instances" in
org.apache.mahout.df.callback.ForestPredictions), and a second one for
extracting prototypes based on that proximity matrix.
Should I amend the ticket description, or is this comment sufficient?
was (Author: u35tpus):
ok, effectively I am suggesting augment package of callbacks
(org.apache.mahout.df.callback) with a couple of additional callbacks - one for
collating inter case (aka "instances" in
org.apache.mahout.df.callback.ForestPredictions) proximities matrix, and the
second one for extraction of prototypes based on proximities matrix.
Should I amend Description of ticket or this comment is just fine?
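
To make the first proposed callback concrete, here is a minimal sketch of how
a proximity matrix could be collated. It follows Breiman's definition: two
cases gain one unit of proximity for every tree in which they reach the same
terminal node, and the totals are divided by the number of trees. The class
ProximitySketch, the method proximities and the leafIds layout are
hypothetical names chosen for illustration; this is not the existing
org.apache.mahout.df.callback API.

```java
final class ProximitySketch {
    private ProximitySketch() {}

    /**
     * Builds a symmetric proximity matrix.
     *
     * @param leafIds hypothetical input: leafIds[t][i] is the id of the
     *                terminal node that case i reaches in tree t
     * @return prox[i][j] = fraction of trees in which cases i and j share a leaf
     */
    public static double[][] proximities(int[][] leafIds) {
        int numTrees = leafIds.length;
        int numCases = numTrees == 0 ? 0 : leafIds[0].length;
        double[][] prox = new double[numCases][numCases];

        // Count, tree by tree, the pairs of cases that end in the same leaf.
        for (int[] leaves : leafIds) {
            for (int i = 0; i < numCases; i++) {
                for (int j = i; j < numCases; j++) {
                    if (leaves[i] == leaves[j]) {
                        prox[i][j] += 1.0;
                        if (i != j) {
                            prox[j][i] += 1.0;
                        }
                    }
                }
            }
        }

        // Normalize the counts by the number of trees.
        for (int i = 0; i < numCases; i++) {
            for (int j = 0; j < numCases; j++) {
                prox[i][j] /= numTrees;
            }
        }
        return prox;
    }
}
```

The matrix is quadratic in the number of cases, so a real callback would
probably need a sparse or blocked representation for large data sets.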
> Random Forest Prototypes
> ------------------------
>
> Key: MAHOUT-713
> URL: https://issues.apache.org/jira/browse/MAHOUT-713
> Project: Mahout
> Issue Type: New Feature
> Components: Classification
> Reporter: Oleg Levchenko
> Priority: Minor
>
> Below is an explanation by Breiman
> (http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#prototype):
> Prototypes are a way of getting a picture of how the variables relate to the
> classification.
> For the jth class, we find the case that has the largest number of class j
> cases among its k nearest neighbors, determined using the proximities. Among
> these k cases we find the median, 25th percentile, and 75th percentile for
> each variable.
> The medians are the prototype for class j and the quartiles give an estimate
> of its stability.
> For the second prototype, we repeat the procedure but only consider cases
> that are not among the original k, and so on.
> Prototypes for continuous variables are standardized by subtracting the 5th
> percentile and dividing by the difference between the 95th and 5th
> percentiles.
> For categorical variables, the prototype is the most frequent value.
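
The procedure quoted above can be sketched for the first prototype of one
class, restricted to continuous variables. Everything here is an assumption
made for illustration: the names PrototypeSketch, nearestNeighbors, quantile
and prototype are hypothetical, the k nearest neighbors are taken to be the k
cases with the highest proximity excluding the center case itself, and
percentiles use linear interpolation between order statistics. The sketch
requires k >= 1 and omits standardization, categorical variables and the
second and later prototypes.

```java
import java.util.Arrays;

final class PrototypeSketch {
    private PrototypeSketch() {}

    /** Indices of the k cases with the highest proximity to center, excluding center. */
    public static int[] nearestNeighbors(double[][] prox, int center, int k) {
        Integer[] others = new Integer[prox.length - 1];
        int n = 0;
        for (int i = 0; i < prox.length; i++) {
            if (i != center) {
                others[n++] = i;
            }
        }
        // Descending proximity; the object sort is stable, so ties keep index order.
        Arrays.sort(others, (a, b) -> Double.compare(prox[center][b], prox[center][a]));
        int size = Math.min(k, others.length);
        int[] result = new int[size];
        for (int i = 0; i < size; i++) {
            result[i] = others[i];
        }
        return result;
    }

    /** Quantile q in [0, 1], linearly interpolated (an assumed convention). */
    public static double quantile(double[] values, double q) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double pos = q * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    /**
     * Returns a 3 x numVariables array for targetClass:
     * row 0 = 25th percentile, row 1 = median (the prototype), row 2 = 75th percentile.
     */
    public static double[][] prototype(double[][] prox, int[] labels,
                                       double[][] features, int targetClass, int k) {
        // Find the case whose neighborhood holds the most cases of targetClass.
        int bestCount = -1;
        int[] bestNeighbors = new int[0];
        for (int c = 0; c < prox.length; c++) {
            int[] neighbors = nearestNeighbors(prox, c, k);
            int count = 0;
            for (int i : neighbors) {
                if (labels[i] == targetClass) {
                    count++;
                }
            }
            if (count > bestCount) {
                bestCount = count;
                bestNeighbors = neighbors;
            }
        }

        // Summarize every variable over the chosen neighborhood.
        int numVars = features[0].length;
        double[][] summary = new double[3][numVars];
        double[] column = new double[bestNeighbors.length];
        for (int v = 0; v < numVars; v++) {
            for (int i = 0; i < bestNeighbors.length; i++) {
                column[i] = features[bestNeighbors[i]][v];
            }
            summary[0][v] = quantile(column, 0.25);
            summary[1][v] = quantile(column, 0.50);
            summary[2][v] = quantile(column, 0.75);
        }
        return summary;
    }
}
```

A second prototype could reuse the same method after masking out the cases
already used, and the standardization step would be applied to each
continuous column of the result using the 5th and 95th percentiles of that
variable.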