Re: Tree software

Kari Torkkola Mon, 02 Jul 2007 11:15:12 -0700

Have you considered the R interface to a public domain version of Random 
Forests?
http://cran.r-project.org/src/contrib/Descriptions/randomForest.html
You do not need to reduce the number of covariates.
It thrives on correlated covariates.


Regards,

- Kari Torkkola


William Shannon wrote:

Hi Peter,
I am unaware of SPINA and am downloading party now to look into that software. I generally have used rpart (because Salford is so expensive) but have never dealt with this many variables with rpart.
Do you have any way to reduce the number of covariates before partitioning? I would be concerned about the curse of dimensionality with 900 variables and 1,000 data points. It would be very easy to find excellent classifiers based on noise. Some suggest that a split data set (train on one subset randomly selected from the 1,000 data points and test on the remaining ones) overcomes this. However, if X by chance, due to the curse of dimensionality, discriminates well, then it will discriminate well in both the training and test data sets.
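
The worry about noise can be illustrated with a small simulation. The sketch below is not from the thread. It assumes random binary labels, 900 independent standard-normal predictors that carry no signal, a single-split rule (predict class 1 when x > 0, or its mirror image), and screening on all 1,000 subjects before the data are split in half. The function names `stump_accuracy` and `screen_noise` are made up for illustration.

```python
import random


def stump_accuracy(column, labels):
    """Return the accuracy of 'predict 1 when x > 0' and of the mirrored rule."""
    hits = sum((x > 0) == bool(y) for x, y in zip(column, labels))
    acc = hits / len(labels)
    return acc, 1.0 - acc


def screen_noise(n_subjects=1000, n_vars=900, seed=0):
    """Pick the best-looking pure-noise predictor using ALL subjects,
    then report its accuracy on the first and second half of the data."""
    rng = random.Random(seed)
    # Labels are coin flips, so no predictor can truly discriminate.
    labels = [rng.randint(0, 1) for _ in range(n_subjects)]

    best = None  # (accuracy, use_mirrored_rule, column)
    for _ in range(n_vars):
        column = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
        acc, mirrored = stump_accuracy(column, labels)
        score = max(acc, mirrored)
        if best is None or score > best[0]:
            best = (score, mirrored > acc, column)

    score, use_mirrored, column = best
    half = n_subjects // 2

    def accuracy_on(lo, hi):
        acc, mirrored = stump_accuracy(column[lo:hi], labels[lo:hi])
        return mirrored if use_mirrored else acc

    return score, accuracy_on(0, half), accuracy_on(half, n_subjects)


full_acc, train_acc, test_acc = screen_noise()
print(f"best noise variable, all subjects: {full_acc:.3f}")
print(f"same variable, first half:         {train_acc:.3f}")
print(f"same variable, second half:        {test_acc:.3f}")
```

Because the variable is chosen using every subject, its accuracies on the two equal halves average exactly to the full-data figure, so a split made after screening does not expose the chance finding. A one-split rule is the simplest case; a deeper tree has more room to fit noise.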
Can you reduce the 900 covariates by PCA, or perhaps use an upfront stepwise linear discriminant analysis with a high p-value threshold to retain a covariate (say p = .2)? We have a paper where we proposed and tested a genetic algorithm to reduce the number of variables in microarray data that I can send you in a couple of weeks when I get back to St. Louis. It is being published in Sept. in the Interface Proceedings.
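
As a rough illustration of why PCA can shrink a set of highly correlated covariates, here is a minimal sketch. It is not from the thread and makes several assumptions: six toy variables driven by one shared latent factor plus independent noise, and plain power iteration standing in for a full eigendecomposition. The helper names are hypothetical. In R, `prcomp` would be the usual tool.

```python
import random


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]


def leading_eigenvalue(matrix, iterations=100):
    """Largest eigenvalue by power iteration.

    Starts from a uniform vector, which is adequate for this toy example
    but would fail if that vector were orthogonal to the leading eigenvector.
    """
    p = len(matrix)
    vec = [1.0 / p ** 0.5] * p
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = sum(v * v for v in nxt) ** 0.5
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the (unit-length) converged vector.
    return sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))


rng = random.Random(1)
rows = []
for _ in range(500):
    latent = rng.gauss(0.0, 1.0)  # shared factor behind all six variables
    rows.append([latent + 0.3 * rng.gauss(0.0, 1.0) for _ in range(6)])

cov = covariance_matrix(rows)
total_variance = sum(cov[j][j] for j in range(6))
share = leading_eigenvalue(cov) / total_variance
print(f"first component explains {share:.0%} of the total variance")
```

With these settings most of the variance lands on a single component, which is the sense in which a block of highly correlated predictors can be replaced by far fewer derived variables before partitioning.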
Good luck.
Bill Shannon
Washington Univ. School of Medicine, St. Louis
314-704-8725

Peter Flom <[EMAIL PROTECTED]> wrote:

    I have been getting involved with classification trees, and have
    some questions regarding software.  My data consist of the following:

    about 1,000 subjects - likely to increase but not dramatically

    about 900 independent or predictor variables - all continuous, some
    highly correlated, all standardized and approximately normally
    distributed

    outcome which can be dichotomous or categorical, with up to 10 or so
    categories.

    I have been using software from R - both Torsten Hothorn's party
    package and Therneau and Atkinson's rpart - but these bog down when
    the tree is not dichotomous

    I have investigated Salford Systems' software, which is very
    impressive, but expensive, and may be beyond our budget.

    I've looked briefly at SPINA


    I'd appreciate any advice or references to recent reviews.

    Thanks

    Peter L. Flom, PhD
    Brainscope, Inc.
    212 263 7863 (MTW)
    212 845 4485 (Th)
    917 488 7176 (F)


    ---------------------------------------------- CLASS-L list.
    Instructions:
    http://www.classification-society.org/csna/lists.html#class-l
