Re: Evaluation of skating

Rich Ulrich Fri, 22 Feb 2002 09:42:44 -0800

On 19 Feb 2002 15:14:01 -0800, [EMAIL PROTECTED] (Trevor Bond)
wrote:
[ snip, much ]
> affected who won the gold medal.  In fact, Looney (1994, p. 156) 
> concluded:
>   "all of the judges with an Eastern block or communistic background 
> not only ranked Baiul better than expected, but ranked Kerrigan 
> worse.  The same trend was seen for Western Block judges.  They 
> ranked Baiul worse and Kerrigan better than expected.  ... "


Finding a difference is one thing.  
Drawing invidious conclusions is a gratuitous step for
a statistician, isn't it?

Hypothesize.
Group A holds a bunch of its own, intra-community skating
competitions.  So does group B.  This happens for many years.

I find it wholly reasonable -- if not expected -- that 
'community standards'  might exist with some 
divergence.  That's especially so when there were 
never any joint standards in the first place, and when
one country has the outstanding professional dance 
(ballet) of the world, which is accorded much local 
respect.  

>                                                       When the 
> median of the expected ranks is determined, Kerrigan would be 
> declared the winner.  Before the free skate began, all the judges 
> knew the rank order of the skaters from the technical program and the 
> importance of the free skate performance in determining the gold 
> medal winner.  This may be why some judging bias was more prevalent 
> in the free skate than in the technical program."

Or, it could be (as the name suggests) that  'free skate'  offers 
more individual choices, more choices that will please or offend
personal tastes.

>       Looney's investigation of the effect of judge's ratings on 
> the final placement of skaters objectively validates what a chorus of 
> disbelieving armchair judges had suspected.  The median rank system 

Hooey.  You can't 'objectively'  validate one set of value-judgments.
You can't show that one set of scores arises 'by merit' while another,
with exactly the same salient features, does not.

================
The NY Times published the rankings in the pairs free program,
the one that ended up with ratings of the French judge being dropped.
There were 9 judges, labeled by nationality, and 20 teams.  
I don't know how the teams were 'qualified'  to appear here:  
there were  3 each, from Canada, Russia, and China.  In some
sense, anyway, these are the best in the world.  I have 
reproduced the data, below.

What astounds me is the uniformity of the rankings.  The *worst*
Pearson correlation between two judges (also, Spearman, 
since the scores are ranks) is 0.973, between judges from 
Japan and Russia.  Correlations with the total were above 0.98.
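
For anyone without SPSS, here is a minimal Python sketch that recomputes the judge-by-judge correlations from the data listing at the end of this post, so the quoted figures can be checked against the data as transcribed. It assumes the columns are in order of final placement, so 1..20 stands in for the 'total' (the same assumption the SPSS job makes with $casenum). The names RANKS, pearson, pairs, and with_final are mine, for illustration only. With no tied ranks, Pearson on the ranks is the Spearman coefficient.

```python
from itertools import combinations

# Rankings copied from the data listing in this post: one row per judge,
# columns assumed to be in order of final placement (1st to 20th).
RANKS = {
    "Russia":  [1, 2, 3, 4, 6, 5, 7, 9, 8, 10, 12, 11, 13, 14, 15, 16, 18, 17, 19, 20],
    "China":   [1, 2, 3, 5, 4, 7, 6, 8, 9, 10, 11, 13, 12, 15, 14, 16, 17, 18, 19, 20],
    "U.S.":    [2, 1, 3, 5, 4, 7, 6, 8, 9, 12, 10, 13, 11, 14, 15, 16, 17, 18, 19, 20],
    "France":  [1, 2, 3, 4, 5, 6, 7, 9, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    "Poland":  [1, 2, 3, 4, 5, 6, 7, 8, 10, 9, 11, 12, 14, 13, 15, 16, 17, 18, 19, 20],
    "Canada":  [2, 1, 3, 7, 4, 5, 6, 8, 10, 9, 11, 12, 13, 14, 15, 16, 18, 17, 19, 20],
    "Ukraine": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 14, 13, 16, 17, 19, 18, 20],
    "Germany": [2, 1, 3, 5, 4, 6, 7, 8, 9, 11, 10, 12, 13, 14, 15, 16, 18, 17, 19, 20],
    "Japan":   [2, 1, 3, 4, 5, 7, 6, 8, 9, 12, 11, 13, 10, 15, 14, 16, 17, 19, 18, 20],
}


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Correlation for every pair of judges (36 pairs for 9 judges).
pairs = {(a, b): pearson(RANKS[a], RANKS[b]) for a, b in combinations(RANKS, 2)}

# Correlation of each judge with the assumed final placement 1..20.
final = list(range(1, 21))
with_final = {judge: pearson(ranks, final) for judge, ranks in RANKS.items()}

# Print the pairs from lowest to highest agreement, then judge vs. final.
for (a, b), r in sorted(pairs.items(), key=lambda item: item[1]):
    print(f"{a:8s} {b:8s} {r:.3f}")
for judge, r in with_final.items():
    print(f"{judge:8s} vs final {r:.3f}")
```
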

The NY Times highlighted the 'discrepancies' between each
judge and the Final ranking.  Of those 180 rankings, there were
two that were off by 3 (Japan rating the U.S. #13  as 10, for 
instance), 5 that were off by 2, and only 58 others  off by 1.

The most consistent rankings were by the French judge
(the scores that were thrown out).
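
The discrepancy counts can be re-derived the same way. This is a rough sketch, again assuming that the column order in the data listing at the end of this post is the Final ranking. RANKS, tally, per_judge, and closest are illustrative names, not anything from the Times table.

```python
from collections import Counter

# Rankings copied from the data listing in this post: one row per judge,
# columns assumed to be in order of final placement (1st to 20th).
RANKS = {
    "Russia":  [1, 2, 3, 4, 6, 5, 7, 9, 8, 10, 12, 11, 13, 14, 15, 16, 18, 17, 19, 20],
    "China":   [1, 2, 3, 5, 4, 7, 6, 8, 9, 10, 11, 13, 12, 15, 14, 16, 17, 18, 19, 20],
    "U.S.":    [2, 1, 3, 5, 4, 7, 6, 8, 9, 12, 10, 13, 11, 14, 15, 16, 17, 18, 19, 20],
    "France":  [1, 2, 3, 4, 5, 6, 7, 9, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    "Poland":  [1, 2, 3, 4, 5, 6, 7, 8, 10, 9, 11, 12, 14, 13, 15, 16, 17, 18, 19, 20],
    "Canada":  [2, 1, 3, 7, 4, 5, 6, 8, 10, 9, 11, 12, 13, 14, 15, 16, 18, 17, 19, 20],
    "Ukraine": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 14, 13, 16, 17, 19, 18, 20],
    "Germany": [2, 1, 3, 5, 4, 6, 7, 8, 9, 11, 10, 12, 13, 14, 15, 16, 18, 17, 19, 20],
    "Japan":   [2, 1, 3, 4, 5, 7, 6, 8, 9, 12, 11, 13, 10, 15, 14, 16, 17, 19, 18, 20],
}

tally = Counter()   # size of discrepancy -> how many of the 180 rankings
per_judge = {}      # judge -> number of rankings that differ from the final

for judge, ranks in RANKS.items():
    # Absolute gap between the judge's rank and the final placement.
    diffs = [abs(rank - place) for place, rank in enumerate(ranks, start=1)]
    tally.update(d for d in diffs if d)
    per_judge[judge] = sum(1 for d in diffs if d)

# Judge whose rankings depart from the final placement least often.
closest = min(per_judge, key=per_judge.get)

print(dict(sorted(tally.items())))
print(per_judge)
print(closest)
```
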

Anyway, one consequence of that 'reliability'  is that there is 
relatively great 'statistical power'  for looking at blocs of votes,
if such exist.  I know some other rankings have been less
consistent than this; I don't know how (a)typical this level
of agreement might be for this skating event, or others.

Personally, I now suspect that there is 'collusion'  to the 
extent that judges agree, before the skate-off, about who
will be competing for 1-3 (say), 4-7, ...,  16-20.  
That might be decided on gross technical competence
(again, not invidious).
Concerns about great or small errors, difficulty, and originality
play a role within these strata, as do biases about tastes in
presentation.

*===== data: entered (for convenience) by judge.
*             set up for SPSS to read; transpose; list; correlate.

Title   Skating Pairs, rankings by judge.
data list list / rank1 to rank20 judge(20F3.0,1x,A8).
begin data
  1  2  3  4  6  5  7  9  8 10 12 11 13 14 15 16 18 17 19 20 Russia
  1  2  3  5  4  7  6  8  9 10 11 13 12 15 14 16 17 18 19 20 China
  2  1  3  5  4  7  6  8  9 12 10 13 11 14 15 16 17 18 19 20 U.S.
  1  2  3  4  5  6  7  9  8 10 11 12 13 14 15 16 17 18 19 20 France
  1  2  3  4  5  6  7  8 10  9 11 12 14 13 15 16 17 18 19 20 Poland
  2  1  3  7  4  5  6  8 10  9 11 12 13 14 15 16 18 17 19 20 Canada
  1  2  3  4  5  6  7  8  9 10 11 12 15 14 13 16 17 19 18 20 Ukraine
  2  1  3  5  4  6  7  8  9 11 10 12 13 14 15 16 18 17 19 20 Germany
  2  1  3  4  5  7  6  8  9 12 11 13 10 15 14 16 17 19 18 20 Japan
end data.
execute.
flip    newnames= judge.
formats russia to japan(F2.0).
list    all.
subtitle        "'Spearman' is the Pearson corr.".
compute ranked= $casenum.
nonpar corr     vars= russia to japan ranked /print=both.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


