Re: Evaluation of skating

Trevor Bond Tue, 19 Feb 2002 14:29:45 -0800

At 3:49 PM -0500 19/2/02, Dennis Roberts wrote:

One list I am on, we were having a discussion about how it would be
possible to make changes to the methods used in the judging of Olympic
Figure Skating, so as to make it less possible for collusion in the judging
to occur.

You might want to consider this from Chapter 10 (pp. 150-152) of Bond, T. G. & Fox, C.M. (2001) Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, N.J.: Erlbaum.

JUDGED SPORTING PERFORMANCES
Some Olympic Games events provide the quintessential example of how we have come to routinely, and even passively, accept subjectivity in judgments of human performance. While, performance-enhancing drugs aside, there is rarely any dispute about who wins gold in, say, the 1500m freestyle, the 100m track or the team bob-sled, a few of us can be drawn into the occasional argument about the medal winners in the platform diving, or on beam. Better still, let's take the Winter Olympics women's figure skating as an example of a judged event. For the first skater about to go out on the ice, the announcer dramatically whispers something to the effect, "She must be upset because being the first skater on this program puts her at a disadvantage." Surely, we have all wondered why, in attempts to at least create the appearance of objectivity in judging, we could openly admit and accept that the order in which one skates actually influences the judges' ratings! Even if you haven't noticed that particular example of lack of objectivity in the rating of performers, you would really be hard pressed not to admit what appear to be nationalistic or political alliance biases among the judges, where judges tend to favor skaters from their own countries (e.g., Eastern Bloc judges rate Western Bloc skaters less favorably and vice versa). In spite of these phenomena having been well documented in the literature (Bring & Carling, 1994; Campbell & Galbraith, 1996; Guttery & Sfridis, 1996; Seltzer & Glass, 1991; Whissell, Lyons, Wilkinson, & Whissell, 1993), the judgement by median rank approach has been maintained as the best method for minimizing this bias (Looney, 1997) because it is held to minimize the effect of extreme rankings from any one judge in determining any skater's final score.

The median rank approach has two problems, however (Looney, 1997). First, the judges are required to give different ratings to each skater, i.e., no two skaters may receive the same score from the same judge. This violates the principle of independence of irrelevant alternatives (Bassett & Persky, 1994; Bring & Carling, 1994), meaning that each skater, rather than being rated independently, is directly compared with others who skated before her. This can result in a situation where Skater A is placed in front of Skater B, but can then be placed behind Skater B once Skater C has performed (Looney, 1997; see Bring & Carling, 1994, for an example). It is then clear why it is unfortunate to be the first skater - the judges tend to "reserve" their "better" scores in case they need them for a later performer! Second, the subjective meanings of the scores may differ from judge to judge, that is, "a 5.8 may represent the best skater for Judge A, but the third best skater for Judge B" (Looney, 1997, p. 145). This variation in meaning is what we refer to in Chapter 8 when discussing how some judges are routinely more severe or lenient than others - a judge effect that certainly cannot be corrected simply by calculating the median score.
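
The reversal described above can be reproduced with a toy calculation. This is only a sketch under stated assumptions, not the actual Olympic scoring rules: each skater's standing is taken to be the plain median of the places the judges give her, with no tie-breakers. The three judges, their rankings, and the function name median_ordinals are all invented for the illustration. Under this simplified rule a single later skater can only produce a tie, so the example uses two later skaters, C and D, to get a strict reversal.

```python
from statistics import median

def median_ordinals(rankings):
    """Map each skater to the median of the places the judges gave her.

    rankings: one list per judge, best skater first (hypothetical data).
    """
    skaters = rankings[0]
    return {s: median(r.index(s) + 1 for r in rankings) for s in skaters}

# After only A and B have skated: two of three judges prefer A.
before = [["A", "B"], ["A", "B"], ["B", "A"]]

# After C and D have skated. No judge has changed her mind about A versus B;
# the new skaters have only been slotted in around them.
after = [["A", "B", "C", "D"], ["C", "D", "A", "B"], ["B", "C", "D", "A"]]

m_before = median_ordinals(before)
m_after = median_ordinals(after)

print("before:", m_before)  # A has the better (lower) median place
print("after: ", m_after)   # now B has the better median place
```

A lower median place is better, so A leads B before C and D skate and trails her afterwards, even though every judge's opinion of A relative to B is unchanged.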

In an attempt to illustrate how one could create a set of objective, interval-level measures from such ordinal-level rankings, Looney (1997) ran a many-facets Rasch analysis of the scores from the women's figure skating event at the 1994 Winter Olympics. Many will recall this controversial event, in which Oksana Baiul won the gold medal over Nancy Kerrigan, who won silver.

Looney obtained scores from the nine judges' ratings of 27 skaters on both components: 1) Technical Program (composed of required elements and presentation); and 2) Free Skate (composed of technical merit and artistic impression). Rasch analysis allowed her to calibrate these scores on an interval scale, showing not only the ability ordering of the skaters, but also the distances between the skaters' ability estimates. With many-facets Rasch analysis Looney was also able to estimate judge severity and component difficulty (the component elements nested within each of the two items) in the same measurement frame of reference.
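
To make the idea of a shared frame of reference concrete, here is a minimal sketch of the rating-scale form of the many-facets Rasch model, in which the log-odds of receiving category k rather than k-1 is (skater ability) - (component difficulty) - (judge severity) - (step threshold k), all in logits. It is not Looney's analysis: the function names, the threshold values, and the example measures are hypothetical, and estimating the parameters from real ratings would require dedicated software.

```python
import math

def category_probabilities(ability, difficulty, severity, thresholds):
    """Return P(rating = k) for k = 0..len(thresholds) under a
    rating-scale many-facets Rasch model (all arguments in logits)."""
    # Cumulative sums of (ability - difficulty - severity - threshold);
    # category 0 corresponds to the empty sum, i.e. 0.
    logits = [0.0]
    for f in thresholds:
        logits.append(logits[-1] + ability - difficulty - severity - f)
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

def expected_rating(ability, difficulty, severity, thresholds):
    """Model-expected rating for one skater, component, and judge."""
    probs = category_probabilities(ability, difficulty, severity, thresholds)
    return sum(k * p for k, p in enumerate(probs))

thresholds = [-1.0, 0.0, 1.0]  # hypothetical step calibrations (4 categories)

# Same skater and component, rated by a lenient and by a severe judge.
lenient = expected_rating(1.0, 0.0, -0.5, thresholds)
severe = expected_rating(1.0, 0.0, 0.5, thresholds)
print(f"lenient judge: {lenient:.2f}, severe judge: {severe:.2f}")
```

Because judge severity enters the model as its own term, the same performance is expected to earn a lower rating from the more severe judge, and that difference can be separated from the skater's ability.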

Although in most of the examples throughout this book we placed more interest in the ordering and estimation of items, i.e., to examine how well our survey/examination was working, here the researcher was far more interested in estimations based on the ordering of the skaters and the severity of the judges. Of course, valid component ordering is a prerequisite to the interpretation of the other facets, but the emphasis here is more on the placement of persons (given the pre-set required components and their rating scales) and the impact of the judges on those placements.

The Rasch estimates showed remarkably good fit to the model for all facets of the measurement problem: the four skating components, the judge ratings (with the exception of the judge from Great Britain), and skater ability (with the exception of Zemanova, the lowest ranked skater). Consequently, Looney would have been justified in feeling confident of her interpretation of the Rasch-based placements. By estimating all of these facets in an objective frame of measurement, summing these judge ratings, and weighting each component by its appropriate item weight, Looney found the top four skaters, in order, to be Kerrigan, Baiul, Bonaly, and Chen (Looney, 1997, p. 154). (The Olympic medals went to Baiul (Ukraine), Kerrigan (USA), and Chen (China), with Bonaly fourth.)

Upon closer examination of the fit statistics for the judges, Looney discovered that judge idiosyncrasies did not affect the results of the Technical Program, but they did affect the results of the Free Skate. Since the Free Skate holds more weight in determining the final placement of skaters, these judge idiosyncrasies subsequently affected who won the gold medal. In fact, Looney (1997, p. 156) concluded:
"all of the judges with an Eastern block or communistic background not only ranked Baiul better than expected, but ranked Kerrigan worse. The same trend was seen for Western Block judges. They ranked Baiul worse and Kerrigan better than expected. When the median of the expected ranks is determined, Kerrigan would be declared the winner. Before the free skate began, all the judges knew the rank order of the skaters from the technical program and the importance of the free skate performance in determining the gold medal winner. This may be why some judging bias was more prevalent in the free skate than in the technical program."

Looney's investigation of the effect of judges' ratings on the final placement of skaters objectively validates what a chorus of disbelieving armchair judges had suspected. The median rank system cannot remove the effect of judge bias in close competitions because it focuses on between-judge agreement. The many-facets Rasch model, however, shifts that focus to within-judge consistency (Linacre, 1994, p. 142) so that individual judge effects, including bias, can be detected and subsequently accounted for in the final placement decisions.
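
One common way to look for ratings that are "better" or "worse than expected" is to compare each observed rating with its model expectation and scale the difference by the model standard deviation. The sketch below does this for a single rating; it assumes a rating-scale Rasch model, and the thresholds, the measure, and the function names are hypothetical rather than taken from Looney's analysis.

```python
import math

def expected_and_variance(measure, thresholds):
    """Model mean and variance of a rating, where measure is
    ability - difficulty - severity in logits."""
    logits = [0.0]
    for f in thresholds:
        logits.append(logits[-1] + measure - f)
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

def standardized_residual(observed, measure, thresholds):
    """(observed - expected) / model standard deviation for one rating."""
    mean, var = expected_and_variance(measure, thresholds)
    return (observed - mean) / math.sqrt(var)

thresholds = [-1.0, 0.0, 1.0]  # hypothetical step calibrations

# The same skater-judge-component measure with a top and a bottom rating.
z_high = standardized_residual(3, 0.0, thresholds)
z_low = standardized_residual(0, 0.0, thresholds)
print(f"top rating: {z_high:+.2f}, bottom rating: {z_low:+.2f}")
```

A judge whose residuals are consistently positive for one group of skaters and consistently negative for another shows the kind of pattern the quoted passage describes; a single large residual on its own would not be evidence of bias.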

--

Assoc. Prof. Trevor G Bond

School of Education

James Cook University Q 4811

AUSTRALIA

http://www.soe.jcu.edu.au/staff/bond/

The Book: http://www.jcu.edu.au/~edtgb

IOMW: http://www.soe.jcu.edu.au/iomw/

Voice: (07) 47 814637

Fax: (07) 47 251690

Int'l: use (61 7)

Bomblets from NATO cluster bombs are

still killing people in Kosovo.
