Dear R Helpers,

I need help with a slightly unusual situation: I am trying to select some columns from a data frame. I know how to use subset() with column names, as in:
x=as.data.frame(matrix(c(1,2,3, 1,2,3, 1,2,2, 1,2,2, 1,1,1),ncol=3,byrow=TRUE))
all.cols<-colnames(x)
to.keep<-all.cols[1:2]
Kept<-subset(x,select=to.keep)
Kept

However, when I try to select columns based on the most important variables from a random forest, I get stuck. The example below demonstrates the problem.

library(randomForest)
data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data=mtcars,importance=TRUE)
Importance<-data.frame(mtcars.rf$importance)
Importance
MSEImportance<-head(Importance[order(Importance$X.IncMSE, decreasing=TRUE),],3)
MSEVars<-row.names(MSEImportance)
MSEVars<-data.frame(MSEVars,stringsAsFactors = FALSE)
colnames(MSEVars)<-"Vars"
NodeImportance<-head(Importance[order(Importance$IncNodePurity,decreasing=TRUE),], 3)
NodeVars<-row.names(NodeImportance)
NodeVars<-data.frame(NodeVars,stringsAsFactors = FALSE)
colnames(NodeVars)<-"Vars"
ImportantVars<-rbind(MSEVars,NodeVars)
ImportantVars<-unique(ImportantVars)
nrow(ImportantVars)
ImportantVars<-as.character(ImportantVars)
ImportantVars
CarsVarsKept<-subset(mtcars,select=ImportantVars)
Error in `[.data.frame`(x, r, vars, drop = drop) :
  undefined columns selected

Any help on how to select these columns from the data frame would be most appreciated.

--
John J. Sparks, Ph.D.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
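A likely explanation, sketched here without randomForest so it runs on its own: as.character() applied to a data frame treats it as a list of columns, so the single Vars column is deparsed into one string of the form c("wt", "disp", "hp") instead of being returned as separate names. subset() then looks for a column with that literal name and reports "undefined columns selected". The names wt, disp and hp below are placeholders, not output from the forest in the post.

```r
# Placeholder names standing in for the forest's top variables
ImportantVars <- data.frame(Vars = c("wt", "disp", "hp"),
                            stringsAsFactors = FALSE)

# as.character() on a data frame works per column, so the whole
# Vars column collapses into a single deparsed string
bad <- as.character(ImportantVars)
length(bad)   # 1

# Pulling the column out instead gives a plain character vector
good <- ImportantVars$Vars
length(good)  # 3

# subset() accepts a character vector of column names for 'select'
kept <- subset(mtcars, select = good)
colnames(kept)
```

In the original code, replacing ImportantVars<-as.character(ImportantVars) with ImportantVars<-ImportantVars$Vars should then let the subset() call find the columns.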
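A shorter route, offered as a sketch rather than a definitive recipe, is to keep the variable names as character vectors throughout and merge them with union(), which drops duplicates without the rbind()/unique() round trip through a data frame. top_vars() is a hypothetical helper written only for this illustration, and the importance table is a toy matrix with made-up numbers so the example runs without fitting a forest. It assumes the row names hold the variable names, as they do in mtcars.rf$importance.

```r
# Hypothetical helper: names of the top n variables under each importance
# measure, combined in first-seen order with duplicates removed.
# 'imp' is assumed to be a matrix or data frame whose row names are the
# variable names and whose columns are the importance measures.
top_vars <- function(imp, measures, n = 3) {
  picks <- lapply(measures, function(m) {
    ord <- order(imp[, m], decreasing = TRUE)
    rownames(imp)[head(ord, n)]
  })
  # union() keeps first occurrences and drops repeats
  Reduce(union, picks)
}

# Toy importance table with invented values, only to exercise the helper
imp <- matrix(c(10,   8,   3,  1,     # %IncMSE column
                200, 50, 300, 10),    # IncNodePurity column
              ncol = 2,
              dimnames = list(c("wt", "cyl", "disp", "qsec"),
                              c("%IncMSE", "IncNodePurity")))

vars <- top_vars(imp, c("%IncMSE", "IncNodePurity"), n = 2)
vars  # "wt" "cyl" "disp"

# Either form selects the columns by name
kept <- subset(mtcars, select = vars)
kept2 <- mtcars[, vars, drop = FALSE]
```

With the fitted model from the post, the call would presumably be top_vars(mtcars.rf$importance, c("%IncMSE", "IncNodePurity")): the raw importance matrix keeps the original %IncMSE column name, and the X.IncMSE spelling only appears after data.frame() rewrites it into a syntactic name.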