You might look at the 'bag' function in the caret package. It will not subsample the variables at each split, but you can bag a tree and down-sample the data at each iteration. The help page has an example of bagging ctree (although you might want to play with the tree depth a little).
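As a rough sketch of that suggestion (not tested on the poster's data): the code below bags conditional inference trees with caret's 'bag', using `downSample = TRUE` in `bagControl` so that each bootstrap sample is balanced down to the size of the smaller class. The simulated data, the helper name `ctree_depth_fit`, and the choices `maxdepth = 3` and `B = 25` are assumptions made for illustration only. It also assumes the party package is installed, since caret's `ctreeBag` helpers rely on it.

```r
# Sketch only: simulated data; all object names here are hypothetical.
library(caret)  # bag(), bagControl(), ctreeBag (ctreeBag needs 'party' installed)

set.seed(1)
n <- 2000
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

# Rare positive class (a few percent), more likely when x1 is large
p <- plogis(-4 + 1.5 * x$x1)
y <- factor(ifelse(runif(n) < p, "pos", "neg"), levels = c("neg", "pos"))

# Custom fit function so the tree depth can be tuned;
# maxdepth = 3 is an arbitrary starting point, not a recommendation.
ctree_depth_fit <- function(x, y, ...) {
  dat <- as.data.frame(x)
  dat$y <- y
  party::ctree(y ~ ., data = dat,
               controls = party::ctree_control(maxdepth = 3))
}

# downSample = TRUE balances the classes within each bagging iteration
ctrl <- bagControl(fit = ctree_depth_fit,
                   predict = ctreeBag$pred,
                   aggregate = ctreeBag$aggregate,
                   downSample = TRUE)

bagged <- bag(x, y, B = 25, bagControl = ctrl)

# Class predictions aggregated over the bagged trees
pred <- predict(bagged, x)
table(observed = y, predicted = pred)
```

Swapping `fit = ctreeBag$fit` back in gives the default ctree settings used in the help page example. Note that this does not reproduce cforest's random selection of variables at each split.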
Max

On Wed, Mar 19, 2014 at 3:32 PM, Maggie Makar <maggieyma...@gmail.com> wrote:
> Hi all,
>
> I've been using the randomForest package and I'm trying to make the switch
> over to party. My problem is that I have an extremely unbalanced outcome
> (only 1% of the data has a positive outcome), which makes resampling methods
> necessary.
>
> randomForest has a very useful argument, sampsize, which allows me to
> use a balanced subsample to build each tree in my forest. Let's say the
> number of positive cases is 100; my forest would look something like this:
>
> rf <- randomForest(y ~ ., data = train, ntree = 800, replace = TRUE, sampsize = c(100, 100))
>
> So I use 100 cases and 100 controls to build each individual tree. Can I do
> the same for cforests? I know I can always upsample but I'd rather not.
>
> I've tried playing around with the weights argument but I'm either not
> getting it right or it's just the wrong thing to use.
>
> Any advice on how to adapt cforests to datasets with imbalanced outcomes is
> greatly appreciated...
>
> Thanks!
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.