Have you considered the R interface to a public domain version of Random 
Forests?
http://cran.r-project.org/src/contrib/Descriptions/randomForest.html
You do not need to reduce the number of covariates.
It thrives on correlated covariates.

Regards,

- Kari Torkkola


William Shannon wrote:
Hi Peter,

I am not familiar with SPINA, and I am downloading party now to take a look at it. I have generally used rpart (because Salford is so expensive), but I have never dealt with this many variables in rpart.

Do you have any way to reduce the number of covariates before partitioning? I would be concerned about the curse of dimensionality with 900 variables and 1,000 data points. It would be very easy to find excellent classifiers based on noise. Some suggest that a split data set (train on one subset randomly selected from the 1,000 data points and test on the remaining points) overcomes this. However, if a variable X discriminates well purely by chance, because of the curse of dimensionality, then it will discriminate well in both the training and test data sets.
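To make the concern above concrete, here is a minimal simulation sketch in Python (standard library only). It assumes a purely random binary outcome and 900 standard-normal predictors that carry no signal at all; the function name, the seed, and the fixed cut at zero are illustrative choices, not anything taken from this thread. It reports the best apparent accuracy achieved by any single noise variable on the full data.

```python
import random


def best_noise_accuracy(n_subjects=1000, n_vars=900, seed=0):
    """Best apparent accuracy of any single pure-noise predictor."""
    rng = random.Random(seed)
    # Binary outcome assigned at random, unrelated to every predictor.
    labels = [rng.randint(0, 1) for _ in range(n_subjects)]
    best = 0.0
    for _ in range(n_vars):
        # Standard-normal noise, mimicking standardized continuous predictors.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
        # Rule: predict class 1 when x > 0. Count agreements with the labels.
        hits = sum((xi > 0) == (yi == 1) for xi, yi in zip(x, labels))
        # The flipped rule is equally available, so take the better of the two.
        acc = max(hits, n_subjects - hits) / n_subjects
        best = max(best, acc)
    return best


best = best_noise_accuracy()
print(f"best apparent accuracy from pure noise: {best:.3f}")
```

Every variable here has a true accuracy of exactly 0.5, yet the best of the 900 should land somewhat above that simply because so many candidates were searched. A tree that combines several such splits has even more room to fit noise.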

Can you reduce the 900 covariates by PCA, or perhaps use an upfront stepwise linear discriminant analysis with a liberal p-value threshold for retaining a covariate (say p = .2)? We have a paper where we proposed and tested a genetic algorithm to reduce the number of variables in microarray data that I can send you in a couple of weeks when I get back to St. Louis. It is being published in Sept. in the Interface Proceedings.
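As a rough illustration of screening with a liberal threshold, the sketch below keeps any covariate whose two-group difference in means has a two-sided p-value under 0.2. Note that this is a simple univariate screen, not the stepwise linear discriminant analysis suggested above, and it assumes a dichotomous outcome and a normal approximation to the p-value. The helper names and the toy data are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def welch_z(group0, group1):
    """Difference in group means divided by its Welch standard error."""
    se = (variance(group0) / len(group0) + variance(group1) / len(group1)) ** 0.5
    return (mean(group1) - mean(group0)) / se


def screen(columns, labels, p_threshold=0.2):
    """Return indices of columns whose two-sided p-value is below p_threshold."""
    # Normal approximation: |z| above this cutoff means p < p_threshold.
    cutoff = NormalDist().inv_cdf(1 - p_threshold / 2)
    kept = []
    for j, col in enumerate(columns):
        g0 = [x for x, y in zip(col, labels) if y == 0]
        g1 = [x for x, y in zip(col, labels) if y == 1]
        if abs(welch_z(g0, g1)) > cutoff:
            kept.append(j)
    return kept


# Toy data: one covariate shifted by the outcome, followed by 20 noise covariates.
rng = random.Random(1)
labels = [i % 2 for i in range(200)]
informative = [rng.gauss(1.0 * y, 1.0) for y in labels]
noise = [[rng.gauss(0.0, 1.0) for _ in labels] for _ in range(20)]
kept = screen([informative] + noise, labels)
print("retained column indices:", kept)
```

With a threshold of 0.2, roughly one noise covariate in five is expected to pass the screen, so this step trims the candidate set rather than cleaning it.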

Good luck.
Bill Shannon
Washington Univ. School of Medicine, St. Louis
314-704-8725

Peter Flom <[EMAIL PROTECTED]> wrote:

    I have been getting involved with classification trees, and have
    some questions regarding software.  My data consist of the following:

    about 1,000 subjects - likely to increase but not dramatically

    about 900 independent or predictor variables - all continuous, some
    highly correlated, all standardized and approximately normally
    distributed

    outcome which can be dichotomous or categorical, with up to 10 or so
    categories.

    I have been using R software - both Torsten Hothorn's party
    package and Therneau and Atkinson's rpart - but these bog down when
    the tree is not dichotomous.

    I have investigated Salford Systems' software, which is very
    impressive, but expensive, and may be beyond our budget.

    I've looked briefly at SPINA.


    I'd appreciate any advice or references to recent reviews.

    Thanks

    Peter L. Flom, PhD
    Brainscope, Inc.
    212 263 7863 (MTW)
    212 845 4485 (Th)
    917 488 7176 (F)


    ---------------------------------------------- CLASS-L list.
    Instructions:
    http://www.classification-society.org/csna/lists.html#class-l

