Random Forest Prototypes
------------------------
Key: MAHOUT-713
URL: https://issues.apache.org/jira/browse/MAHOUT-713
Project: Mahout
Issue Type: New Feature
Components: Classification
Reporter: Oleg Levchenko
Priority: Minor
Below is an explanation by Breiman
(http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#prototype):
Prototypes are a way of getting a picture of how the variables relate to the
classification.
For the jth class, we find the case that has the largest number of class j
cases among its k nearest neighbors, determined using the proximities. Among
these k cases we find the median, 25th percentile, and 75th percentile for each
variable.
The medians are the prototype for class j, and the quartiles give an estimate of
its stability.
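A minimal Python sketch of the first prototype for one class, assuming a precomputed n-by-n proximity matrix `prox` (for example, the fraction of trees in which two cases land in the same terminal node), a data matrix `X` of continuous variables, and class labels `y`. The function names `k_nearest` and `prototype` are hypothetical, not Mahout API. It also assumes that a case does not count as its own neighbour and that ties are broken by lowest index. Requires Python 3.8+ for `statistics.quantiles`.

```python
from statistics import median, quantiles


def k_nearest(prox, i, k):
    """Indices of the k cases with the highest proximity to case i.

    Assumption: case i is not counted as its own neighbour; ties in
    proximity keep index order because list.sort is stable.
    """
    others = [m for m in range(len(prox)) if m != i]
    others.sort(key=lambda m: prox[i][m], reverse=True)
    return others[:k]


def prototype(X, y, prox, cls, k):
    """First prototype for class `cls`: (centre, neighbours, medians, q1s, q3s)."""
    n = len(X)
    # Case whose k nearest neighbours contain the most class-`cls` cases;
    # max() keeps the lowest index on ties.
    centre = max(range(n), key=lambda i: sum(y[m] == cls for m in k_nearest(prox, i, k)))
    neighbours = k_nearest(prox, centre, k)
    medians, q1s, q3s = [], [], []
    for v in range(len(X[0])):
        values = [X[m][v] for m in neighbours]
        # Quartiles by linear interpolation (method="inclusive"); use k >= 2.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        medians.append(median(values))
        q1s.append(q1)
        q3s.append(q3)
    return centre, neighbours, medians, q1s, q3s
```

Rebuilding every case's neighbour list costs O(n log n), so the search is O(n² log n) overall: adequate for a sketch, not for large data sets.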
For the second prototype, we repeat the procedure but only consider cases that
are not among the original k, and so on.
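The repeated search can be sketched in Python as below, again assuming a precomputed proximity matrix `prox` and labels `y`. Here "only consider cases that are not among the original k" is read as excluding earlier neighbours both as candidate centres and as neighbours. That reading, the stopping rule, and the names `k_nearest` and `prototype_neighbourhoods` are assumptions for illustration. The sketch returns only the (centre, neighbours) pairs; each neighbourhood's variables would then be summarised by the median and quartiles, as described above.

```python
def k_nearest(prox, i, k, used):
    """The k cases closest to case i by proximity, skipping i and used cases."""
    pool = [m for m in range(len(prox)) if m != i and m not in used]
    pool.sort(key=lambda m: prox[i][m], reverse=True)
    return pool[:k]


def prototype_neighbourhoods(y, prox, cls, k, count):
    """Up to `count` (centre, neighbours) pairs for class `cls`.

    Cases already in an earlier neighbourhood are excluded both as
    candidate centres and as neighbours (an assumed reading of the text).
    """
    used = set()
    result = []
    for _ in range(count):
        centres = [i for i in range(len(prox)) if i not in used]
        if len(centres) <= k:  # too few unused cases left for a full k
            break
        # Lowest index wins ties, as max() returns the first maximum.
        best = max(
            centres,
            key=lambda i: sum(y[m] == cls for m in k_nearest(prox, i, k, used)),
        )
        neighbours = k_nearest(prox, best, k, used)
        result.append((best, neighbours))
        used.update(neighbours)
    return result
```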
Prototypes for continuous variables are standardized by subtracting the 5th
percentile and dividing by the difference between the 95th and 5th percentiles.
For categorical variables, the prototype is the most frequent value.
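One variable of a prototype might be computed as in this sketch. It assumes that the 5th and 95th percentiles are taken over the variable's values in the whole training set (`column`) using linear interpolation, and that the most frequent categorical value is taken over the k neighbours. The percentile definition in Breiman's own code may differ. The function name `prototype_value` is hypothetical. Requires Python 3.8+.

```python
from statistics import median, mode, quantiles


def prototype_value(neighbour_values, column, categorical):
    """One variable of a prototype.

    neighbour_values: the variable's values over the k neighbours.
    column: the variable's values over the whole training set
            (assumed reference set for the percentiles).
    """
    if categorical:
        # Most frequent category; on ties statistics.mode returns the
        # first one encountered (Python 3.8+).
        return mode(neighbour_values)
    # 19 cut points at 5%, 10%, ..., 95%, linearly interpolated
    # (an assumed percentile definition).
    cuts = quantiles(column, n=20, method="inclusive")
    p5, p95 = cuts[0], cuts[-1]
    if p95 == p5:
        raise ValueError("5th and 95th percentiles coincide; cannot standardize")
    return (median(neighbour_values) - p5) / (p95 - p5)
```

For a column 0, 1, ..., 20 the 5th and 95th percentiles are 1 and 19, so a neighbourhood median of 10 standardizes to (10 - 1) / 18 = 0.5.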
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira