Random Forest Prototypes
------------------------

                 Key: MAHOUT-713
                 URL: https://issues.apache.org/jira/browse/MAHOUT-713
             Project: Mahout
          Issue Type: New Feature
          Components: Classification
            Reporter: Oleg Levchenko
            Priority: Minor


Below is an explanation by Breiman 
(http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#prototype):

Prototypes are a way of getting a picture of how the variables relate to the 
classification. 

For the jth class, we find the case that has the largest number of class j 
cases among its k nearest neighbors, determined using the proximities. Among 
these k cases we find the median, 25th percentile, and 75th percentile for each 
variable. 

The medians are the prototype for class j and the quartiles give an estimate of 
its stability. 
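
The selection step described above can be sketched in plain Python. This is
only an illustration, not Mahout code: the names class_prototype, k_nearest
and percentile are hypothetical, prox is assumed to be a square proximity
matrix where larger values mean "closer", a case counts as its own neighbor,
and the percentiles use linear interpolation (other conventions exist).

```python
def percentile(values, q):
    """Linear-interpolation percentile of a non-empty sequence, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def k_nearest(prox_row, k):
    """Indices of the k cases with the highest proximity (ties: lower index first)."""
    order = sorted(range(len(prox_row)), key=lambda i: (-prox_row[i], i))
    return order[:k]


def class_prototype(X, labels, prox, target, k):
    """Median and quartiles of the k neighbors around the best case for `target`.

    Assumption: the statistics are taken over all k neighbors of the chosen
    case, which is one literal reading of the description.
    """
    best_count, best_neighbors = -1, None
    for i in range(len(X)):
        neighbors = k_nearest(prox[i], k)
        count = sum(1 for n in neighbors if labels[n] == target)
        if count > best_count:  # first case wins on ties
            best_count, best_neighbors = count, neighbors
    columns = list(zip(*(X[n] for n in best_neighbors)))
    return {
        "median": [percentile(c, 50) for c in columns],
        "q25": [percentile(c, 25) for c in columns],
        "q75": [percentile(c, 75) for c in columns],
    }
```

A narrow gap between q25 and q75 for a variable suggests the prototype value
for that variable is stable across the neighborhood.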

For the second prototype, we repeat the procedure but only consider cases that 
are not among the original k, and so on. 
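
One way to read "and so on" is as a loop that removes the k cases used by
each prototype before searching again. The sketch below makes that reading
explicit. The name successive_prototypes is hypothetical, prox is assumed to
be a proximity matrix where larger means closer, used cases are excluded
both as candidates and as neighbors (an assumption), and only the medians
are returned to keep the example short.

```python
from statistics import median


def successive_prototypes(X, labels, prox, target, k, n_prototypes):
    """Return up to n_prototypes median vectors for class `target`."""
    used = set()
    prototypes = []
    for _ in range(n_prototypes):
        candidates = [i for i in range(len(X)) if i not in used]
        if len(candidates) < k:
            break  # not enough unused cases left for another prototype
        best_count, best_group = -1, None
        for i in candidates:
            # k nearest unused cases to case i (highest proximity first)
            group = sorted(candidates, key=lambda j: (-prox[i][j], j))[:k]
            count = sum(1 for j in group if labels[j] == target)
            if count > best_count:
                best_count, best_group = count, group
        prototypes.append(
            [median(col) for col in zip(*(X[j] for j in best_group))]
        )
        used.update(best_group)  # these cases cannot be reused
    return prototypes
```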

Prototypes for continuous variables are standardized by subtracting the 5th 
percentile and dividing by the difference between the 95th and 5th percentiles. 
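
As a formula, a prototype value x for a variable becomes
(x - p5) / (p95 - p5). A minimal sketch follows. The names
standardize_prototype and percentile are hypothetical, the percentiles are
assumed to be taken over the whole data set X with linear interpolation, and
returning 0.0 for a variable whose 5th and 95th percentiles coincide is a
choice made here, not something the description specifies.

```python
def percentile(values, q):
    """Linear-interpolation percentile of a non-empty sequence, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def standardize_prototype(prototype, X):
    """Rescale each prototype value by the 5th-95th percentile range of its column."""
    result = []
    for value, column in zip(prototype, zip(*X)):
        p5 = percentile(column, 5)
        p95 = percentile(column, 95)
        spread = p95 - p5
        # Assumption: a constant column maps to 0.0 instead of dividing by zero.
        result.append((value - p5) / spread if spread else 0.0)
    return result
```

With this scaling, values between the 5th and 95th percentiles fall in
[0, 1], which makes prototypes for variables on different scales comparable.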

For categorical variables, the prototype is the most frequent value.
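
For a categorical variable, the same neighborhood can be summarized with a
simple frequency count. The function name categorical_prototype is
hypothetical, and the tie-breaking rule (the value seen first wins) is an
assumption, since the description does not say how ties are resolved.

```python
from collections import Counter


def categorical_prototype(values):
    """Most frequent value among the neighbors' values for one variable."""
    # most_common orders equal counts by first appearance in `values`.
    return Counter(values).most_common(1)[0][0]
```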

