[ https://issues.apache.org/jira/browse/MAHOUT-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039569#comment-13039569 ]
Sean Owen commented on MAHOUT-713:
----------------------------------
Right, of course, but this does not describe a particular change to Mahout. I
understand it to mean "implement this", but it would be good to specify more
concretely how and where this idea might live.
> Random Forest Prototypes
> ------------------------
>
> Key: MAHOUT-713
> URL: https://issues.apache.org/jira/browse/MAHOUT-713
> Project: Mahout
> Issue Type: New Feature
> Components: Classification
> Reporter: Oleg Levchenko
> Priority: Minor
>
> Below is an explanation by Breiman
> (http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#prototype):
> Prototypes are a way of getting a picture of how the variables relate to the
> classification.
> For the jth class, we find the case that has the largest number of class j
> cases among its k nearest neighbors, determined using the proximities. Among
> these k cases we find the median, 25th percentile, and 75th percentile for
> each variable.
> The medians are the prototype for class j and the quartiles give an estimate
> of its stability.
> For the second prototype, we repeat the procedure but only consider cases
> that are not among the original k, and so on.
> Prototypes for continuous variables are standardized by subtracting the 5th
> percentile and dividing by the difference between the 95th and 5th
> percentiles.
> For categorical variables, the prototype is the most frequent value.
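To make the quoted procedure more concrete, here is a minimal, standalone Java sketch. It is not Mahout code and uses no Mahout API; the class `PrototypeSketch` and its methods are hypothetical. It assumes the forest's proximities are already available as an n-by-n matrix where a larger value means closer, that a case is not counted among its own k nearest neighbors, that percentiles use linear interpolation between order statistics, that "cases not among the original k" are dropped both as candidate centers and as neighbors, that the 5th/95th percentile constants are computed over all cases, and that categorical values are encoded as doubles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, in-memory sketch of Breiman-style class prototypes; not a Mahout API. */
final class PrototypeSketch {

  /** Percentile by linear interpolation between order statistics (one common convention). */
  static double percentile(double[] values, double p) {
    double[] s = values.clone();
    Arrays.sort(s);
    double pos = p * (s.length - 1);
    int lo = (int) Math.floor(pos);
    int hi = (int) Math.ceil(pos);
    return s[lo] + (pos - lo) * (s[hi] - s[lo]);
  }

  /** Most frequent value; ties go to the value that first reaches the highest count. */
  static double mode(double[] vals) {
    Map<Double, Integer> freq = new HashMap<>();
    double best = vals[0];
    for (double v : vals) {
      int f = freq.merge(v, 1, Integer::sum);
      if (f > freq.get(best)) best = v;
    }
    return best;
  }

  /** The k cases in 'allowed' with the highest proximity to case i (assumption: i itself excluded). */
  static int[] nearest(double[][] prox, int i, Set<Integer> allowed, int k) {
    List<Integer> cands = new ArrayList<>();
    for (int c : allowed) if (c != i) cands.add(c);
    cands.sort((a, b) -> Double.compare(prox[i][b], prox[i][a]));  // descending proximity
    return cands.subList(0, Math.min(k, cands.size())).stream().mapToInt(Integer::intValue).toArray();
  }

  /**
   * Up to 'count' prototypes for class j (assumes k >= 1). proto[v] = {25th pct, median, 75th pct}
   * of variable v over the chosen k cases, standardized; for categorical v all three hold the mode.
   */
  static List<double[][]> prototypes(double[][] x, int[] labels, double[][] prox,
                                     boolean[] categorical, int j, int k, int count) {
    int n = x.length, m = x[0].length;
    // Standardization constants over all cases: subtract the 5th pct, divide by (95th - 5th).
    double[] p5 = new double[m], range = new double[m];
    for (int v = 0; v < m; v++) {
      double[] col = new double[n];
      for (int r = 0; r < n; r++) col[r] = x[r][v];
      p5[v] = percentile(col, 0.05);
      double d = percentile(col, 0.95) - p5[v];
      range[v] = d == 0 ? 1 : d;  // assumption: leave constant variables unscaled
    }
    Set<Integer> remaining = new HashSet<>();
    for (int r = 0; r < n; r++) remaining.add(r);
    List<double[][]> result = new ArrayList<>();
    while (result.size() < count && remaining.size() > k) {  // need a center plus k neighbors
      // Find the case whose k nearest neighbors contain the most class-j cases.
      int[] best = null;
      int bestCount = -1;
      for (int i : remaining) {
        int[] nb = nearest(prox, i, remaining, k);
        int c = 0;
        for (int t : nb) if (labels[t] == j) c++;
        if (c > bestCount) { bestCount = c; best = nb; }
      }
      double[][] proto = new double[m][3];
      for (int v = 0; v < m; v++) {
        double[] vals = new double[best.length];
        for (int t = 0; t < best.length; t++) vals[t] = x[best[t]][v];
        if (categorical[v]) {
          Arrays.fill(proto[v], mode(vals));
        } else {
          // Standardizing is increasing and affine, so it commutes with taking percentiles.
          for (int q = 0; q < 3; q++) {
            proto[v][q] = (percentile(vals, 0.25 * (q + 1)) - p5[v]) / range[v];
          }
        }
      }
      result.add(proto);
      for (int t : best) remaining.remove(t);  // later prototypes ignore these k cases
    }
    return result;
  }
}
```

For example, `PrototypeSketch.prototypes(x, labels, prox, categorical, j, k, 3)` would return up to three prototypes for class `j`, the first built from the k cases around the best center and each later one from cases not yet used. Scaling this to large data sets would need a different approach, since the sketch holds the full n-by-n proximity matrix in memory.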
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira