If you wouldn't mind sparing a few minutes, I'd like to ask for help with a
statistics problem I've been working on.

The problem is similar to the following:

100 X variables, covering the gamut of 'youth' characteristics, collected
mostly in elementary schools.  These values include scores on various
standardized tests; extra-curricular activities involvement; number of books
read per year; various physical characteristics; estimates of neighborhood
income levels; distance from home to school; and so on.  Outliers
have been trimmed.  Some of the X variables appear to follow a normal
distribution, while others are clearly bimodal.

6 or so Y variables, assessing measures of 'success' later in life, such as
annual income, net worth, size of residence home, subjective measures of
happiness with status in life, etc.  Also included are some 'failure'
measures, such as number of days spent in jail, number of divorces, and
days spent in hospitals.  Some Y variables follow a normal distribution,
while others are bimodal or considerably skewed.

Total N is approximately 30,000.

I've compiled all the X's and Y's into a single large worksheet of columns
and rows.  Additionally, I've constructed a second worksheet by converting
all the continuous variables to categorical variables, so that statistical
methods designed for categorical data can also be used.
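
To illustrate the kind of conversion meant here, the following is a minimal
sketch in Python rather than in Minitab or SAS.  The three-way split at
terciles, the function name to_terciles, and the sample values are all
assumptions made for illustration; they are not the rule or the data
actually used.

```python
from statistics import quantiles


def to_terciles(values):
    """Map each number to 'low', 'mid', or 'high'; blanks (None) stay None."""
    observed = [v for v in values if v is not None]
    # quantiles(..., n=3) returns the two cut points separating three groups.
    lower_cut, upper_cut = quantiles(observed, n=3)
    labels = []
    for v in values:
        if v is None:
            labels.append(None)      # keep blank cells blank
        elif v <= lower_cut:
            labels.append("low")
        elif v <= upper_cut:
            labels.append("mid")
        else:
            labels.append("high")
    return labels


# Hypothetical 'books read per year' column with one blank cell.
books_per_year = [2, 5, None, 12, 30, 7, 1, 18, 9]
book_levels = to_terciles(books_per_year)
print(book_levels)
```

Any such cut throws away information within each category, which is one
reason both worksheets are kept.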

A number of cells throughout the worksheet are blank where data were not
available.
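
As a rough picture of what the blanks look like, here is a small sketch in
Python that counts blank cells per column.  The column names, the sample
rows, and the helper blank_counts are hypothetical, and blanks are assumed
to be stored as None.

```python
def blank_counts(rows):
    """Count blank (None) cells per column in a list of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (1 if value is None else 0)
    return counts


# Three made-up rows; the real worksheet has about 30,000.
sample_rows = [
    {"test_score": 81, "books_per_year": 12, "annual_income": 42000},
    {"test_score": None, "books_per_year": 3, "annual_income": None},
    {"test_score": 67, "books_per_year": None, "annual_income": None},
]
missing = blank_counts(sample_rows)
print(missing)
```

A tally like this shows whether the blanks are concentrated in a few
variables or scattered across the whole worksheet.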

My objective is to ascertain which ***several-variable
constellation(s)*** of the 100 predictor variables are statistically most
important (in this data set) for predicting 'success,' as well as for
avoiding 'failure.'

Although I definitely hope various 'constellations' emerge, if in the end no
model turns out to have much predictive value, that's okay.  Perhaps
making such predictions is far more difficult than it might first
appear.

Minitab is the statistics package I'm most accustomed to using (from my old
college days, and since).  However, I am also familiar with SAS and other
statistics programs.

Please help me by suggesting which statistical approach(es) **you** would
most likely use to squeeze the relevant relationships from this data set.

I very much appreciate any help you can provide, and look forward to hearing
back from you.

Sincerely,
Nicholas Kormanik
[EMAIL PROTECTED]

Salt Lake City, Utah





