The Massachusetts Dept. of Education committed what appears to be a
howling statistical blunder yesterday.  It would be funny if not for the
millions of dollars, thousands of hours of work, and thousands of
students' lives that could be affected.

Massachusetts has implemented a state-wide mandatory student testing
program, called the MCAS.  Students in the 4th, 8th and 10th grades are
being tested and next year 12th grade students must pass the MCAS to

The effectiveness of school districts is being assessed using average
student MCAS scores.  Based on the 1998 MCAS scores, districts were
placed in one of 6 categories: very high, high, moderate, low, very low,
or critically low.  Schools were given improvement targets based on the
1998 scores: schools in the highest two categories were expected to
increase their average MCAS scores by 1 to 2 points, while schools in
the lowest two categories were expected to improve their scores by 4 to 7
points (

Based on the average of 1999 and 2000 scores, each district was
evaluated yesterday on whether it had met its goals.  The report was
posted on the MA Dept. of Education web site:

Those familiar with "regression to the mean" know what's coming next.
The poor schools, many in urban centers like Boston, met their
improvement "targets," while most of the state's top school districts
failed to meet their improvement targets.

The Boston Globe carried the report card and the response as a
front-page story today:

The Globe article describes how superintendents of high-performing
school districts were outraged by their failing grades, while the
superintendent of the Boston school district was all too pleased with
the finding that many of his low-performing schools had improved:

[Brookline High School, for example, with 18 National Merit Scholarship
finalists and the highest SAT scores in  years, missed its test-score
target - a characterization blasted by Brookline Schools Superintendent
James F. Walsh, who dismissed the report.

"This is not only not helpful, it's bizarre," Walsh said.  "To call
Brookline, Newton, Medfield, Weston, Wayland, Wellesley as failing to
improve means so little, it's not helpful. It becomes absurd when you're
using this formula the way they're using it."

Boston School Superintendent Thomas W. Payzant, whose district had 52 of
113 schools meet or exceed expectations, was more blunt: "For the
high-flying schools, I say they have a responsibility to not be smug
about the level they have reached and continue to aspire to do better."]

Freedman, Pisani & Purvis (1998, Statistics 3rd edition) describe the
fallacy involved:
"In virtually all test-retest situations, the bottom group on the first
test will on average show some improvement on the second test and the
top group will on average fall back.  This is the regression effect.
Thinking that the regression effect must be due to something important,
..., is the regression fallacy."
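
For anyone who wants to see the regression effect for themselves, here
is a minimal sketch in Python.  Everything in it is my own assumption
for illustration, not MCAS data: the number of schools, the score scale,
and the two standard deviations are made up.  Each simulated school has
a fixed underlying level that does not change at all between the two
test years; only the year-to-year noise differs.

```python
import random
import statistics


def simulate_test_retest(n_schools=1000, true_sd=10.0, noise_sd=5.0, seed=0):
    """Simulate two test years in which NO school truly changes.

    All parameters are hypothetical; they are not estimated from MCAS data.
    Returns the mean score change (year 2 minus year 1) for the bottom
    fifth and the top fifth of schools, ranked by year-1 score.
    """
    rng = random.Random(seed)

    # Stable underlying level of each school (assumed scale centered at 230).
    true_level = [rng.gauss(230.0, true_sd) for _ in range(n_schools)]

    # Observed score = stable level + independent noise in each year.
    year1 = [t + rng.gauss(0.0, noise_sd) for t in true_level]
    year2 = [t + rng.gauss(0.0, noise_sd) for t in true_level]

    # Group schools using the first-year score only, as a baseline would.
    order = sorted(range(n_schools), key=lambda i: year1[i])
    fifth = n_schools // 5
    bottom, top = order[:fifth], order[-fifth:]

    def mean_change(group):
        return statistics.mean(year2[i] - year1[i] for i in group)

    return mean_change(bottom), mean_change(top)


bottom_change, top_change = simulate_test_retest()
print(f"bottom fifth mean change: {bottom_change:+.2f}")
print(f"top fifth mean change:    {top_change:+.2f}")
```

Under these assumptions the bottom group "improves" on average and the
top group "falls back," even though no school's underlying level moved.
The size of the effect depends on how large the noise is relative to the
real spread between schools, which I have simply guessed here.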

I find this really disturbing.  I am not a big fan of standardized
testing, but if the state is going to spend millions of dollars
implementing a state-wide testing program, then the evaluation process
must be statistically valid.  This evaluation plan, falling prey to the
regression fallacy, could not have been reviewed by a competent

I hate to be completely negative about this.  I'm assuming that
psychologists and others involved in repeated testing must have
solutions to this test-retest problem.

If I'm missing the boat on this test-retest error, I'd also
appreciate others pointing it out.

Dr. Eugene D. Gallagher

