I have some data to analyze for my dissertation, but I'm not sure what
method to use to answer the question I am investigating. This may not be the
correct newsgroup for it, but there are rumors that some statisticians read
it.
First, the data set comes from the Third International Mathematics and Science
Study (TIMSS). Among other things, it contains each student's imputed
mathematics ability score, the region of the US where the student lives, and
the student's response to each question on the test. About 15 forms of the
test were given, so no student answered every question.
In other words, I have something like this:
MathScore  Region     Question 1  Question 2  ...  Question n
100        NorthEast  correct     incorrect   ...  correct
130        Central    incorrect   incorrect   ...  correct
115        NorthEast  incorrect   correct     ...  incorrect
...        ...        ...         ...         ...  ...
112        South      correct     incorrect   ...  correct
The data set contains about 6000 students, but a fair amount of the data is
missing.
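Because each of the roughly 15 test forms contains a different subset of items, most of the missing values are missing by design: a student simply never saw those questions. Below is a minimal sketch of one way to store such data. It assumes a hypothetical record layout in which each student carries only the items on his or her form, so an unseen item is absent rather than coded as incorrect. The field names and values are illustrative and do not come from the TIMSS files.

```python
from collections import defaultdict

# Hypothetical records (illustrative values only): each student carries only
# the items on his or her test form; items never administered are absent.
students = [
    {"score": 100, "region": "Northeast", "responses": {"Q1": True, "Q2": False}},
    {"score": 130, "region": "Central", "responses": {"Q2": False, "Q3": True}},
    {"score": 115, "region": "Northeast", "responses": {"Q1": False, "Q3": False}},
    {"score": 112, "region": "South", "responses": {"Q1": True, "Q2": False}},
]


def item_counts(students):
    """Return {item: number of students who were given that item}."""
    counts = defaultdict(int)
    for s in students:
        for item in s["responses"]:
            counts[item] += 1
    return dict(counts)


print(item_counts(students))
```

Counting how many students saw each item is a quick way to check how much data each per-question comparison will actually rest on.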
Using a simple ANOVA, I have found a significant difference in scores between
regions of the country. What I'd like to do now is investigate WHICH
questions differentiate among regions. For example, suppose question 1 really
differentiates between the Northeast and the rest of the country, i.e.
students in the Northeast tended to get the question correct while students
elsewhere didn't. This might suggest that something in those states'
curricula teaches this particular topic more effectively than in other parts
of the country.
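For reference, a one-way ANOVA compares between-region variation in mean score with within-region variation: F = (SS_between / (k - 1)) / (SS_within / (n - k)) for k regions and n students. The pure-Python sketch below uses hypothetical scores. It treats the imputed score as ordinary data and ignores the survey's sampling weights and school-level clustering, which a real TIMSS analysis would need to handle.

```python
# Hypothetical scores grouped by region (illustrative values only).
scores_by_region = {
    "Northeast": [100, 115, 120],
    "Central": [130, 125, 128],
    "South": [112, 105, 110],
}


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    n, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


f, df_b, df_w = one_way_anova_f(list(scores_by_region.values()))
print(f"F({df_b}, {df_w}) = {f:.2f}")
```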
There are 350+ questions and, as I said, lots of missing values. No single
question was given to every student.
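One simple way to frame the per-question comparison is a separate region-by-correctness contingency table for each item, built only from the students who were given that item, with a Pearson chi-square test of independence. This is a swapped-in technique, not necessarily the right one for this design. The sketch assumes a hypothetical record layout with one dict of responses per student, where unseen items are absent.

```python
from collections import defaultdict

# Hypothetical data (illustrative only). List multiplication repeats references
# to the same dict, which is fine here because the records are only read.
students = (
    [{"region": "Northeast", "responses": {"Q1": True}}] * 8
    + [{"region": "Northeast", "responses": {"Q1": False}}] * 2
    + [{"region": "South", "responses": {"Q1": True}}] * 3
    + [{"region": "South", "responses": {"Q1": False}}] * 7
    + [{"region": "South", "responses": {"Q2": True}}] * 5  # never saw Q1
)


def item_region_chi_square(students, item):
    """Pearson chi-square for region x (correct/incorrect) on one item,
    using only the students who were given that item."""
    table = defaultdict(lambda: [0, 0])  # region -> [n_correct, n_incorrect]
    for s in students:
        if item in s["responses"]:
            table[s["region"]][0 if s["responses"][item] else 1] += 1
    rows = list(table.values())
    n = sum(sum(r) for r in rows)
    col_totals = [sum(r[j] for r in rows) for j in (0, 1)]
    chi2 = 0.0
    for r in rows:
        row_total = sum(r)
        for j in (0, 1):
            expected = row_total * col_totals[j] / n
            if expected > 0:  # skip empty columns (e.g. everyone correct)
                chi2 += (r[j] - expected) ** 2 / expected
    df = (len(rows) - 1) * (2 - 1)
    return chi2, df


print(item_region_chi_square(students, "Q1"))
```

`scipy.stats.chi2.sf(chi2, df)` would turn the statistic into a p-value. There are two caveats. First, with 350+ items some will look significant by chance, so a multiple-comparison correction would be needed. Second, regions already differ in overall ability, so a raw item difference partly reflects that. Methods for differential item functioning (DIF), such as the Mantel-Haenszel procedure stratified on ability score, are designed to separate the two.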
So my question is: how do I determine which questions are good at this
discrimination? I've been trying to cast it as a regression problem, but I
don't think that's the right approach. Is this a case for something like
Fisher's discriminant analysis? I have only passing familiarity with the
technique.
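Fisher's two-group linear discriminant finds weights w = S_w^-1 (m_1 - m_0), where m_1 and m_0 are the group mean vectors and S_w is the pooled within-group scatter matrix. Items with large weights contribute most to separating the groups. The method needs complete rows, so with this design one option would be to fit it separately within a single test form, where every student answered the same items. Below is a minimal sketch under those assumptions, using hypothetical 0/1 item data for the Northeast (label 1) versus elsewhere (label 0). The small ridge added to S_w is an assumption, included to keep the solve stable when binary items make the scatter matrix singular.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fisher_direction(X, y, ridge=1e-6):
    """Two-group Fisher direction w = S_w^-1 (mean_1 - mean_0).

    X: complete-case rows, one 0/1 column per item (1 = correct).
    y: 0/1 group labels; ridge is a hypothetical stabilizing term on S_w."""
    p = len(X[0])
    g0 = [row for row, label in zip(X, y) if label == 0]
    g1 = [row for row, label in zip(X, y) if label == 1]

    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(p)]

    m0, m1 = mean(g0), mean(g1)
    # Pooled within-group scatter matrix, starting from the ridge diagonal.
    sw = [[ridge if i == j else 0.0 for j in range(p)] for i in range(p)]
    for rows, m in ((g0, m0), (g1, m1)):
        for r in rows:
            d = [r[j] - m[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    sw[i][j] += d[i] * d[j]
    return solve(sw, [m1[j] - m0[j] for j in range(p)])


# Hypothetical responses to two items on one form.
X = [[1, 1], [1, 0], [1, 1], [0, 0],   # Northeast (label 1)
     [0, 1], [0, 0], [1, 1], [0, 0]]   # elsewhere (label 0)
y = [1, 1, 1, 1, 0, 0, 0, 0]
w = fisher_direction(X, y)
print(w)
```

On this toy data the weights come out at roughly 0.5 for item 1 and -0.25 for item 2. Item 2 has the same proportion correct in both groups but still receives a weight, because within each group it is correlated with item 1. The weights therefore describe joint discrimination rather than item-by-item differences.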
--
Lee Creighton
SAS Statistical Instruments
[EMAIL PROTECTED]
5275R SAS Campus Drive
(919) 531-3755
Statistical Discovery Software
=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=================================================================