Dear help list,

I think I found a bug in the R randomForest package. Hopefully, you are able to
reproduce it.
I use R version 2.7.2 and randomForest version 4.5-27.
This is minimal code that reproduces the problem:

library(randomForest)
tries <- 20
dimension <- 20
n <- 200
outlyingness <- rep(NaN, tries)
for (o_number in 1:tries) {
        # Generate features: n uncorrelated, normally distributed points
        features <- matrix(rnorm(n*dimension, 0, 1), n, dimension)
        # Compute the random forest, including the proximity matrix
        outlier.rf <- randomForest(features, ntree=100, proximity=TRUE)
        # Compute the mean proximity for each of the n points
        outlyingness_all <- apply(outlier.rf$proximity, 2, mean)
        # Compute the rank of a certain point according to the outlyingness
        better <- sum(outlyingness_all[1] < outlyingness_all)
        outlyingness[o_number] <- 1 + better
}
outlyingness


Point number 1 plays a special role in this code fragment: in almost every
run, all or nearly all of the other points have a larger mean proximity.
A typical value for "outlyingness" is
200 200 200 200 196 200 200 200 200 200 200 200 200 200 200 200 199 200
200 200
whereas any other point gives what one would expect, namely ranks spread
over the whole range. For example, if
better <- sum(outlyingness_all[1]<outlyingness_all)
is replaced by
better <- sum(outlyingness_all[17]<outlyingness_all)
one gets
194   7 184  76  25  40 175 174 137  75  49 146 175 150 148 118 100  88
121 14
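
To check whether only point 1 is affected, the rank of every point can be
computed in one pass instead of one index at a time. The following is only a
minimal sketch, assuming the same definition of rank as in the loop above
(1 + the number of points with a strictly larger mean proximity). The helper
name rank_by_mean_proximity and the small matrix are hypothetical and are not
part of randomForest.

```r
# Hypothetical helper (not part of randomForest): rank of each point by its
# mean proximity, using the same definition as "1 + better" in the loop above.
rank_by_mean_proximity <- function(prox) {
  m <- colMeans(prox)  # same values as apply(prox, 2, mean)
  # For each point i, count the points with a strictly larger mean proximity
  vapply(seq_along(m), function(i) 1L + sum(m[i] < m), integer(1))
}

# Made-up symmetric 3 x 3 proximity matrix, for illustration only
prox <- matrix(c(1.0, 0.2, 0.1,
                 0.2, 1.0, 0.6,
                 0.1, 0.6, 1.0), nrow = 3, byrow = TRUE)
rank_by_mean_proximity(prox)
```

With a fitted forest, rank_by_mean_proximity(outlier.rf$proximity)[1] should
give the same number that the loop stores in outlyingness[o_number], and the
remaining entries show the ranks of all other points in the same run.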

Is this a bug, or am I confused?
Can anybody help me? Has anybody seen this problem before?

Best regards

Jens Roeder




______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
