Betty,

Dumping _any_ data simply because it "doesn't fit" is bogus. Please don't.
When you examined those bothersome outliers, did you check that the data was entered correctly? Is it possible that a student was temporarily blinded during the school year, and thus could not do so well on the test the second time? (I once burned my writing hand badly, so could not write legibly for a few weeks. Does that count as a reason to dismiss my grades from the analysis?) If you have a sound reason for dropping a set of data, a reason which has nothing to do with the analysis, then you may do so. If the data simply does not fit your notions of what it "should" do, you _must_ keep it in. Think about the logic of what you are trying to say with the data for a minute, and you will see why.

Next, is it reasonable for some students to do less well on the second test than the first? Of course it is possible. Maybe the kid had a bad day the second time, or was coming down with a touch of the flu. These would be reasons why the "true" score was lower the second time.

Also, the score on the test reflects the capabilities of the student only imperfectly (let's avoid the long tangent argument here, OK?); there is "measurement error" involved. Who knows, maybe they cribbed answers from the proctor on the first test! The measurement error can be positive or negative. If it happens to be negative and larger in magnitude than the "true" amount of gain by a student, we wind up with a negative measured gain. How big is the measurement error? Good question. One study estimated a sigma of about 4% on a standardized test; the difference between two such tests then carries a sigma of about 4% x sqrt(2), or nearly 6%, which means a difference between two tests of 10% is quite reasonable. In Maryland, they estimate that about 10% of the students who took the myriad standardized tests should rightly be on the other side of the critical line of one of the tests. These 10% were no doubt concentrated among those who barely passed or barely failed.

Then there is the question of the tests themselves. Are they truly "identical"? If they are, would we not expect students to learn the answers, at least a little? If they are not truly identical, then how do we know that they measure "learning" with equal accuracy? I.e., have the tests been validated (if I use this word correctly)? Equal precision issues would come under the previous paragraph. If one test 'hit' student minds differently than the other, but hit all students equally, then we would see a shift in the average 'gain' score. If one test 'hit' some students' minds differently than it hit others', we would see an increase in the variance of the gain. That would fall under the previous paragraph, also.

My suspicion is that folks like Dennis could go on for a couple of hours about this set of tests, and about how they could come up with whatever results you find. Dennis no doubt has a lot more data and experience with the details than I.

I ran across an aphorism (word?) yesterday that might apply here:

    Do not put your faith in what statistics say until you have
    carefully considered what they do not say.
        - William W. Watt

Your tests could well measure what _average_ gain your students achieved. With enough data (numbers of kids) they could say what the average gain was with some accuracy (a small confidence interval for the estimate of the mean gain). But I question the precision of your measurement of the _individual_ gain, inasmuch as the confidence interval for that value may be on the same order of magnitude as the individual gain itself.
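To put some rough numbers on that last point, here is a small simulation sketch in Python (numpy assumed available). The sigma of 4 NCE points per test, the true mean gain of 5 points, and the group size of 120 are illustrative assumptions of mine, not figures from your data; the sketch only shows how test noise alone produces negative measured gains, and how much wider the uncertainty on one student's gain is than on the group average.

    import numpy as np

    rng = np.random.default_rng(0)

    n_students = 120      # roughly the size of one grade in the data below (assumed)
    true_gain = 5.0       # assumed average "true" gain, in NCE points
    sigma_test = 4.0      # assumed measurement error (sd) of a single test score

    # Measured gain = true gain + (spring error - fall error); the error on a
    # difference of two independent scores has sd = sigma_test * sqrt(2).
    fall_err = rng.normal(0.0, sigma_test, n_students)
    spring_err = rng.normal(0.0, sigma_test, n_students)
    measured_gain = true_gain + (spring_err - fall_err)

    sigma_gain = sigma_test * np.sqrt(2)   # about 5.7 NCE points here

    print("share of students with a negative measured gain:",
          np.mean(measured_gain < 0))
    print("95% interval for one student's gain: +/-",
          round(1.96 * sigma_gain, 1))
    print("95% interval for the mean gain of the group: +/-",
          round(1.96 * sigma_gain / np.sqrt(n_students), 1))

With those assumed numbers, roughly a fifth of the students show a negative measured gain even though every true gain is positive, and the interval for a single student's gain (about +/- 11 NCE points) is an order of magnitude wider than the interval for the group mean (about +/- 1 point). Different assumptions will move the percentages, but not the basic picture.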
Cheers and best of luck on your analysis,
Jay

"Harris, Betty A" wrote:

> Hi all,
>
> While we're talking about outliers, I had my first dose of looking at
> TerraNova data collected near the beginning of the school year and then
> again from the same kids near the end of the school year. We received NCE
> scores back on:
> Reading Composite
> Reading Subtest
> Vocabulary Subtest
> Word Analysis Subtest
>
> I calculated gain scores (Spring - Fall) for each student.
> I was shocked to find that, overall, 23% of second graders (n=113) and 33% of
> third graders (n=123) had at least one gain score that was below zero.
>
> Some lost ground on all three subscales and the composite; however, most
> students who lost ground between fall and spring testing--65% of second
> graders and 62.8% of third graders--only did so on one of the three subtest
> scores. Only 5% of second graders and 11.6% of third graders lost ground on
> all three subtests.
>
> To deal with this issue, students with a gain score more than two standard
> deviations below the mean gain score for a subtest were considered outliers
> and were removed from the analysis for that subtest.
>
> Does that seem like a legitimate strategy to you?
>
> What about gain scores of more than 2 standard deviations above the mean?
>
> Out for now,
> Betty

--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
4444 North Green Bay Road
Racine, WI 53404-1216
USA

Ph: (262) 634-9100
FAX: (262) 681-1133

email: [EMAIL PROTECTED]
web: http://www.a2q.com

The A2Q Method (tm) -- What do you want to improve today?
