Hello

(Note: this is the same Peter Flom, now at a different address, with a new 
e-mail and a new job.)

I have a data set with about 800 people and about 1000 variables.  The 
variables are all 'features' of EEG data that have been extracted by subject 
matter experts in neurology as being potentially useful. All variables have 
been standardized to mean 0, sd 1. There are many high correlations among them.

We are interested in many aspects of this data; one primary aim is to use the 
EEG data to better classify people who have neurological problems.  Two methods 
that seem particularly relevant to this list are clustering and decision trees.  
I've done a bit of both, but always on data sets with FAR fewer variables 
(e.g. about 10).  For clustering in particular, I was thinking of running a 
principal components analysis before the cluster analysis (perhaps with SAS 
PRINCOMP, FACTOR, or VARCLUS).
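
To make the "principal components first, then cluster" idea concrete, here is a 
rough sketch of the workflow I have in mind.  It is written in Python with NumPy 
rather than SAS, it runs on simulated data (not the EEG data), and the function 
names, the 90% variance cutoff, the choice of two clusters, and the use of plain 
k-means are all assumptions made only for illustration.

```python
import numpy as np


def simulate(n=200, p=50, n_latent=3, seed=0):
    """Simulated stand-in data: p correlated features driven by a few latent factors."""
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 2, size=n)            # hypothetical two-group label
    latent = rng.normal(size=(n, n_latent))
    latent[:, 0] += 4.0 * group                   # groups differ on the first factor
    loadings = rng.normal(size=(n_latent, p))
    X = latent @ loadings + 0.5 * rng.normal(size=(n, p))
    X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize to mean 0, sd 1
    return X, group


def pca_scores(X, var_explained=0.90):
    """Scores on the fewest leading components reaching the variance cutoff."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the component directions.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    share = s**2 / np.sum(s**2)                   # variance share per component
    k = int(np.searchsorted(np.cumsum(share), var_explained)) + 1
    k = min(k, len(s))
    return Xc @ Vt[:k].T, share[:k]


def nearest_center(Z, centers):
    """Index of the closest center (squared Euclidean distance) for each row."""
    d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(Z, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd's k-means, started from randomly chosen rows of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = nearest_center(Z, centers)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return nearest_center(Z, centers), centers


X, group = simulate()
Z, share = pca_scores(X, var_explained=0.90)
labels, centers = kmeans(Z, n_clusters=2)

print("components kept:", Z.shape[1])
# Cross-tabulation of cluster label (rows) against simulated group (columns).
print(np.array([[np.sum((labels == a) & (group == b)) for b in (0, 1)]
                for a in (0, 1)]))
```

Clustering on the component scores rather than on all of the original variables 
is the point of the sketch; how many components to keep, and whether k-means is 
even the right clustering method here, are exactly the things I am unsure about.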

With regard to trees, I've done some 'basic' analysis of other data sets using 
R's 'party' and 'rpart' packages.  With those data sets, however, the main goal 
was explanation, so I did not explore bagging, boosting, and related ensemble 
methods.  Any pointers or introductions to that literature would be most welcome 
(preferably at a not TOO high mathematical level: I had some calculus many years 
ago, but am much more interested in applications than in 'theorem-proof' 
material).

I will be exploring this data set for quite some time, so am willing to invest 
some effort to learn best practices, and am also willing to try a variety of 
methods.

Finally, as to why I am looking at both trees and clusters: we know the 
diagnoses of the people (hence trees are useful), but we also know that 
there are difficulties with the diagnoses, and that these difficulties may be 
amenable to exploration with sophisticated methods.


Thanks in advance

Peter Flom
Brainscope, Inc.


----------------------------------------------
CLASS-L list.
Instructions: http://www.classification-society.org/csna/lists.html#class-l
