Betty,

Dumping _any_ data simply because it "doesn't fit" is bogus.  Please don't.

When you examined those bothersome outliers, did you check that the data was
entered correctly?  Is it possible that a student was temporarily blinded
during the school year, and thus could not do as well on the test the second
time?  (I once burned my writing hand badly and could not write legibly for a
few weeks.  Does that count as a reason to dismiss my grades from the
analysis?)

If you have a sound reason, one that has nothing to do with the analysis
itself, for dropping a set of data, then you may do so.  If the data simply do
not fit your notions of what they "should" do, you _must_ keep them in.  Think
about the logic of what you are trying to say with the data for a minute, and
you will see why: if you discard only the points that contradict your
expectations, whatever remains can do nothing but confirm them.

Next, is it reasonable for some students to do less well on the second test
than on the first?  Of course it is possible.  Maybe the kid had a bad day the
second time, or was coming down with a touch of flu.  These would be reasons why
the "true" score was lower the second time.

Also, the score on the test only imperfectly reflects the capabilities of the
student (let's avoid the long tangent argument here, OK?); there is a
"measurement error" involved.  Who knows, maybe they cribbed answers from the
proctor on the first test!  The measurement error can be positive or negative.
If it happens to be larger than the "true" amount of gain by a student, we wind
up with a negative measured gain.  How big is the measurement error?  Good
question.  One study estimated sigma at about 4% on a standardized test; since a
gain score is the difference of two such scores, its sigma is roughly
sqrt(2) x 4% = 5.7%, so a difference of 10% between the two tests is entirely
plausible.  In Maryland, they estimate that about 10% of the students who took
the myriad standardized tests should rightly be on the other side of the
critical line on one of the tests.  Those 10% were no doubt concentrated among
the students who barely passed or barely failed.

Then there is the question of the tests.  Are they truly "identical"?  If they
are, would we not expect students to learn the answers, at least a little?  If
they are not truly identical, then how do we know that they measure "learning"
with equal accuracy?  That is, have the tests been validated (if I am using that
word correctly)?  Equal-precision issues would come under the previous
paragraph.  If one test 'hit' student minds differently than the other, but hit
every student's mind equally, then we would see a shift in the average 'gain'
score.  If one test 'hit' some students' minds differently than it hit others,
we would see an increase in the variance of the gain.  That would fall under the
previous paragraph as well.
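
Just to illustrate that distinction, a rough Python sketch (the numbers -- a
3-point uniform offset and a 3-point student-to-student spread between forms --
are assumptions of mine, not anything known about the TerraNova forms):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1000
    sigma_meas = 4.0
    true_gain = 5.0

    base = rng.normal(50, 10, n)
    fall = base + rng.normal(0, sigma_meas, n)

    # Case 1: the spring form is uniformly 3 points "easier" for everyone.
    spring_uniform = base + true_gain + 3.0 + rng.normal(0, sigma_meas, n)

    # Case 2: the spring form hits each student differently (sd 3, mean 0).
    spring_varied = base + true_gain + rng.normal(0, 3.0, n) + rng.normal(0, sigma_meas, n)

    for label, spring in [("uniform offset", spring_uniform),
                          ("student-specific offset", spring_varied)]:
        gain = spring - fall
        print(label, "mean gain:", round(gain.mean(), 2),
              "sd of gain:", round(gain.std(), 2))

The first case moves the average gain; the second leaves the average roughly
alone but widens the spread of gains, which is exactly where some of those
apparently "negative" gains can come from.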

My suspicion is that folks like Dennis could go on for a couple of hours about
this set of tests and how they could produce whatever results you found.
Dennis no doubt has a lot more data and experience with the details than I do.
I ran across an aphorism yesterday that might apply here:

Do not put your faith in what statistics say until you have carefully
considered what they do not say.  - William W. Watt

Your tests could well measure what _average_ gain your students achieved.  With
enough data (numbers of kids) they could tell you the average gain with some
accuracy (a small confidence interval for the estimate of the mean gain).  But I
question the precision of your measurement of each _individual_ gain, inasmuch
as the confidence interval for that value may be on the same order of magnitude
as the gain itself.
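
Rough numbers, under the same assumed 4-point measurement sigma and a cohort of
about 120 kids:

    import math

    sigma_meas = 4.0                          # assumed measurement error (sd) per test
    sigma_gain = math.sqrt(2) * sigma_meas    # sd of one student's measured gain, ~5.7
    n = 120                                   # roughly one grade cohort

    se_mean_gain = sigma_gain / math.sqrt(n)  # shrinks as 1/sqrt(n), ~0.5

    print("uncertainty in an individual gain: +/- %.1f NCE points" % sigma_gain)
    print("standard error of the mean gain:   +/- %.1f NCE points" % se_mean_gain)

So the cohort's average gain can be pinned down to a fraction of a point, while
any one student's measured gain carries an uncertainty of five or six points --
often as large as the gain itself.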

Cheers and best of luck on your analysis,

Jay

"Harris, Betty A" wrote:

> Hi all,
>
> While we're talking about outliers I had my first dose of looking at
> TerraNova data collected near the beginning of the school year and then
> again from the same kids near the end of the school year.   We received NCE
> scores back on:
> Reading Composite
> Reading Subtest
> Vocabulary Subtest
> Word Analysis Subtest
>
> I calculated gain scores (Spring - Fall) for each student.
> I was shocked to find that overall, 23% of second graders (n=113) and 33% of
> third graders (n=123) had at least one gain score that was below zero.
>
> Some lost ground on all three subscales and the composite, however most
> students who lost ground between fall and spring testing--65% of second
> graders and 62.8% of third graders--only did so on one of the three subtest
> scores.  Only 5% of second graders and 11.6% of third graders lost ground on
> all three subtests.
>
> To deal with this issue, students with gain scores more than two standard
> deviations below the mean gain score for a given subtest were considered
> outliers and were removed from the analysis for that subtest.
>
> Does that seem like a legitimate strategy to you?
>
> What about gain scores of more than 2 standard deviations above the mean?
>
> Out for now,
> Betty
>

--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
4444 North Green Bay Road
Racine, WI 53404-1216
USA

Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com

The A2Q Method (tm) -- What do you want to improve today?




.
.
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================
