Have you considered the R interface to a public domain version of Random
Forests?
http://cran.r-project.org/src/contrib/Descriptions/randomForest.html
You do not need to reduce the number of covariates.
It thrives on correlated covariates.
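The two sources of randomness in Breiman's random forest algorithm can be sketched in a few lines of Python. This illustrates only the sampling steps, not the randomForest package's own code. Each tree is grown on a bootstrap sample of the rows, and each split considers only a random subset of `mtry` predictors. The package's classification default is floor(sqrt(p)), i.e. 30 of 900. Rows left out of a tree's bootstrap sample ("out of bag", roughly 37% of them) act as a built-in test set for that tree. The helper names and the seed are hypothetical.

```python
import math
import random

def default_mtry(p):
    """Predictors tried per split; randomForest's classification default is floor(sqrt(p))."""
    return max(1, math.floor(math.sqrt(p)))

def bootstrap_and_oob(n, rng):
    """Draw n row indices with replacement; rows never drawn are 'out of bag'."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob

def candidate_features(p, mtry, rng):
    """Random subset of predictor indices considered at a single split."""
    return rng.sample(range(p), mtry)

rng = random.Random(1)      # hypothetical seed
n, p = 1000, 900            # sizes from the original question
mtry = default_mtry(p)
in_bag, oob = bootstrap_and_oob(n, rng)
feats = candidate_features(p, mtry, rng)
print(mtry, len(in_bag), round(len(oob) / n, 3))
```

In R itself, `randomForest(x, y, importance = TRUE)` grows the whole forest. Its variable-importance output is one way to see which of the 900 covariates the forest actually relies on.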
Regards,
- Kari Torkkola
William Shannon wrote:
Hi Peter,
I am unaware of SPINA and am downloading party now to look into that
software. I generally have used rpart (because Salford is so expensive)
but have never dealt with this many variables with rpart.
Do you have any way to reduce the number of covariates before
partitioning? I would be concerned about the curse of dimensionality
with 900 variables and 1,000 data points: it would be very easy to find
excellent classifiers based on noise. Some suggest that a split data
set (train on a subset randomly selected from the 1,000 data points
and test on the remaining points) overcomes this. However, if X
discriminates well by chance, because of the curse of dimensionality,
then it will discriminate well in both the training and test data sets.
Can you reduce the 900 covariates by PCA, or perhaps use an upfront
stepwise linear discriminant analysis with a high p-value threshold for
retaining a covariate (say p = .2)? We have a paper where we proposed
and tested a genetic algorithm to reduce the number of variables in
microarray data that I can send you in a couple of weeks when I get back
to St. Louis. It is being published in Sept. in the Interface Proceedings.
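The PCA suggestion can be sketched without any libraries. The steps are to standardize, form the correlation matrix, and extract the leading eigenvectors by power iteration with deflation. The stepwise LDA and the genetic algorithm from the paper are not shown. The toy data (three variables driven by one shared factor, plus three independent ones), the seed, and the iteration count are assumptions chosen for illustration. With 900 real covariates you would use a proper eigen-solver, e.g. R's `prcomp()`, rather than this loop.

```python
import math
import random

def standardize(X):
    """Center and scale each column (X is a list of rows)."""
    n, p = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in X) / (n - 1)) for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in X]

def correlation_matrix(Z):
    """Correlation matrix of already-standardized rows."""
    n, p = len(Z), len(Z[0])
    return [[sum(r[a] * r[b] for r in Z) / (n - 1) for b in range(p)] for a in range(p)]

def top_components(C, k, iters=500, seed=0):
    """Leading k (eigenvalue, eigenvector) pairs of symmetric C by power iteration."""
    rng = random.Random(seed)
    p = len(C)
    C = [row[:] for row in C]
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(p)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))
        pairs.append((lam, v))
        # deflate: subtract this component so the next pass finds the next one
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs

def scores(Z, v):
    """Project each standardized row onto component v."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in Z]

# Toy data (assumption): variables 0-2 share one latent factor, 3-5 are noise.
rng = random.Random(42)
X = []
for _ in range(200):
    z = rng.gauss(0, 1)
    X.append([z + 0.3 * rng.gauss(0, 1) for _ in range(3)]
             + [rng.gauss(0, 1) for _ in range(3)])

Z = standardize(X)
C = correlation_matrix(Z)
(lam1, v1), (lam2, v2) = top_components(C, 2)
share = lam1 / len(C)  # trace of a correlation matrix = number of variables
pc1 = scores(Z, v1)
print(f"PC1 eigenvalue {lam1:.2f}, {share:.0%} of total variance")
```

The component scores (here `pc1`) would replace the original correlated covariates as tree inputs. PCA ignores the outcome, though, so the leading components are not guaranteed to be the ones that separate the classes.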
Good luck.
Bill Shannon
Washington Univ. School of Medicine, St. Louis
314-704-8725
Peter Flom <[EMAIL PROTECTED]> wrote:
I have been getting involved with classification trees, and have
some questions regarding software. My data consist of the following:
- about 1,000 subjects (likely to increase, but not dramatically)
- about 900 independent or predictor variables: all continuous, some
  highly correlated, all standardized and approximately normally
  distributed
- outcome which can be dichotomous or categorical, with up to 10 or so
  categories
I have been using software from R (both Torsten Hothorn's party
package and Therneau and Atkinson's rpart), but these bog down when
the tree is not dichotomous.
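For a sense of the work involved, here is a Python sketch of a CART-style exhaustive split search with Gini impurity, which is rpart's default splitting index for classification. This is an assumption-laden illustration, not rpart's or party's code (party's conditional inference trees select splits differently). At each node every threshold of every continuous predictor is scored, and each score sums over all K outcome classes. With 900 predictors and 1,000 distinct values each, that is 900 × 999 = 899,100 candidate splits at the root alone. The function names and toy data are hypothetical.

```python
from collections import Counter

def gini(counts, total):
    """Gini impurity of a node given its class counts."""
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Exhaustive search over thresholds of one continuous predictor.

    Returns (threshold, weighted Gini of the two children) for an outcome
    with any number of classes.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    left, right = Counter(), Counter(y for _, y in pairs)
    best = (None, float("inf"))
    for i in range(n - 1):
        x, y = pairs[i]
        left[y] += 1          # move one observation from right child to left
        right[y] -= 1
        if x == pairs[i + 1][0]:
            continue          # no threshold fits between tied values
        n_left, n_right = i + 1, n - i - 1
        score = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / n
        if score < best[1]:
            best = ((x + pairs[i + 1][0]) / 2, score)
    return best

threshold, score = best_split([1, 2, 3, 4, 5, 6], ["a", "a", "b", "b", "c", "c"])
candidates = 900 * (1000 - 1)   # root-node candidate splits for the data described
print(threshold, round(score, 3), candidates)
```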
I have investigated Salford Systems' software, which is very
impressive, but expensive, and may be beyond our budget.
I've looked briefly at SPINA.
I'd appreciate any advice or references to recent reviews.
Thanks
Peter L. Flom, PhD
Brainscope, Inc.
212 263 7863 (MTW)
212 845 4485 (Th)
917 488 7176 (F)
---------------------------------------------- CLASS-L list.
Instructions:
http://www.classification-society.org/csna/lists.html#class-l