If you wouldn't mind sparing a few minutes, I'd like to ask for help with a statistics problem I've been working on.
The problem is similar to the following:

- 100 X variables covering the gamut of 'youth' characteristics, collected mostly in elementary schools. These include scores on various standardized tests, involvement in extracurricular activities, number of books read per year, various physical characteristics, estimates of neighborhood income levels, distance from home to school, and so on. Outliers have been trimmed. Some of the X variables appear to follow a normal distribution, while others are clearly bimodal.
- Roughly 6 Y variables measuring 'success' later in life, such as annual income, net worth, size of residence, and subjective happiness with one's status in life. Also included are some 'failure' measures, such as number of days spent in jail, number of divorces, and days spent in hospitals. Some Y variables follow a normal distribution, while others are bimodal or considerably skewed.
- Total N is approximately 30,000.

I've compiled all the X's and Y's into a single large worksheet of columns and rows. I've also constructed a second worksheet in which all the continuous variables are converted to categorical variables, so that statistical approaches suited to categorical data can be used. A number of cells throughout the worksheet are blank where data were not available.

My objective is to determine which ***severable-variable constellation(s)*** of the 100 predictor variables are statistically most important (in this data set) for predicting 'success,' as well as for avoiding 'failure.' Although I certainly hope various 'constellations' emerge, it's okay if in the end no model turns out to have much predictive value. Perhaps making such predictions is far more difficult than it first appears.

Minitab is the statistics package I'm most accustomed to using (from my college days and since). However, I am also familiar with SAS and other statistics programs.
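The post does not say how the continuous variables were turned into categories for the second worksheet. Purely as an illustration, one common choice is quantile binning, sketched below under these assumptions: four bins (quartiles), cut points taken from the observed sample, and blank cells represented as `None` and left blank rather than assigned a category. The function name `to_quantile_categories` is hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import List, Optional, Sequence


def to_quantile_categories(
    values: Sequence[Optional[float]], n_bins: int = 4
) -> List[Optional[int]]:
    """Hypothetical helper: map each non-missing value to a category
    0..n_bins-1 using sample quantiles as cut points.

    Missing values (None, i.e. blank cells) stay None so that they are
    not silently lumped into a real category.
    """
    observed = [v for v in values if v is not None]
    # statistics.quantiles returns n_bins - 1 cut points
    # (it needs at least two observed values).
    cuts = quantiles(observed, n=n_bins)
    # bisect_right counts how many cut points lie at or below each value,
    # which is that value's bin index.
    return [None if v is None else bisect_right(cuts, v) for v in values]


if __name__ == "__main__":
    scores = [1, 2, 3, 4, 5, 6, 7, 8, None]
    print(to_quantile_categories(scores))
```

With strongly bimodal or heavily tied variables, quantile cut points can produce uneven categories, so cut points chosen from the shape of each distribution may be preferable; the sketch does not attempt that.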
Please help me by suggesting which statistical approach(es) **you** would most likely use to squeeze the relevant relationships out of this data set. I very much appreciate any help you can provide, and I look forward to hearing back from you.

Sincerely,
Nicholas Kormanik
Salt Lake City, Utah
