On Wed, 10 Jan 2001 21:32:43 GMT, Gene Gallagher
<[EMAIL PROTECTED]> wrote:
> The Massachusetts Dept. of Education committed what appears to be a
> howling statistical blunder yesterday. It would be funny if not for the
> millions of dollars, thousands of hours of work, and thousands of
> students' lives that could be affected.
>
< snip, much detail >
> I find this really disturbing. I am not a big fan of standardized
> testing, but if the state is going to spend millions of dollars
> implementing a state-wide testing program, then the evaluation process
> must be statistically valid. This evaluation plan, falling prey to the
> regression fallacy, could not have been reviewed by a competent
> statistician.
>
> I hate to be completely negative about this. I'm assuming that
> psychologists and others involved in repeated testing must have
> solutions to this test-retest problem.
The proper starting point for a comparison is the estimate of each
school's "true score": the regression-adjusted prediction under a
no-change assumption. "No-change" at the bottom would be satisfied
by scoring only a little better; "no-change" at the top would be
met by scoring only a little worse. If you are pouring money into
the whole system, then you might hope to (expect to?) shift the
changes in a positive direction.
I thought it was curious that the
"schools in the highest two categories were expected to increase
their average MCAS scores by 1 to 2 points, while schools in the
lowest two categories were expected to improve their scores by 4-7
points."
That sounds rational in form: their model may have the right shape,
but the numbers surprised them. That is, it looks as if someone
allowed for a couple of points of regression, then hoped for a gain
of 4 or 5 points. That (probably) under-estimated the regression
toward the mean, and over-estimated how much a school could achieve
by freshly swearing to good intentions.
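To see how large the pure regression effect can be, here is a short
simulation sketch. The numbers are invented for illustration (true
school means N(230, 8), yearly noise N(0, 6); these are not the
actual MCAS scale or reliabilities): schools are ranked by an
observed year-1 score, and year 2 is drawn with NO true change.

```python
import random

random.seed(1)

# Hypothetical scale: true school means ~ N(230, 8);
# year-to-year measurement/cohort noise ~ N(0, 6).
true_means = [random.gauss(230, 8) for _ in range(2000)]
year1 = [t + random.gauss(0, 6) for t in true_means]
year2 = [t + random.gauss(0, 6) for t in true_means]  # no true change

# Rank schools by observed year-1 score; take top and bottom quartiles.
order = sorted(range(len(true_means)), key=lambda i: year1[i])
n = len(order) // 4
bottom, top = order[:n], order[-n:]

def mean_change(idx):
    """Average observed change (year 2 minus year 1) for a group."""
    return sum(year2[i] - year1[i] for i in idx) / len(idx)

print(f"bottom quartile mean change: {mean_change(bottom):+.1f}")
print(f"top quartile mean change:    {mean_change(top):+.1f}")
```

With these assumed variances the bottom quartile "improves" by
several points and the top quartile "declines" by a similar amount,
purely from regression toward the mean -- about the size of the
gains the state was treating as real improvement.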
What is needed -- in addition to self-serving excuses -- is an
external source of validation. And it should confirm gains in cases
that are not already predicted by regression to the mean.
--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=================================================================