Dennis,

Not sure how this is (or could be) done in education, but in production I'd say there are at least two sources of variation available: the (presumed) ability of the student, which results in different performance on the test, and variation due to the grader. One could make box plots for each subgroup (defined by grader). Now, if any subgroup sits well away from the others, we can conclude that either (a) the grader was an 'outlier,' or (b) that subgroup of students was not similar to the others.
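For what it's worth, here is a rough sketch of that per-grader comparison in Python. The scores are made up purely for illustration, and the names `scores`, `five_number_summary` and `grader_report` are my own hypothetical choices, not anything standard. It only tabulates the numbers a box plot would draw for each grader, plus how far each grader's mean sits from the overall mean; it does not decide whether a gap is too big.

```python
import statistics as st

# Hypothetical scores, invented purely for illustration (not data from this
# thread): papers from one class dealt out at random to four graders.
scores = {
    "A": [72, 75, 78, 80, 69, 74],
    "B": [70, 77, 73, 79, 71, 76],
    "C": [55, 60, 58, 62, 54, 59],
    "D": [74, 71, 78, 76, 73, 72],
}


def five_number_summary(xs):
    """Return (min, Q1, median, Q3, max), the numbers a box plot draws."""
    q1, med, q3 = st.quantiles(xs, n=4, method="inclusive")
    return min(xs), q1, med, q3, max(xs)


def grader_report(scores):
    """Box-plot numbers per grader, plus grader mean minus overall mean."""
    all_scores = [x for xs in scores.values() for x in xs]
    grand_mean = st.mean(all_scores)
    report = {}
    for grader, xs in scores.items():
        grader_mean = st.mean(xs)
        report[grader] = {
            "summary": five_number_summary(xs),
            "mean": grader_mean,
            "shift": grader_mean - grand_mean,  # negative = marks lower overall
        }
    return report


for grader, row in grader_report(scores).items():
    print(grader, row["summary"], f"shift from overall mean: {row['shift']:+.2f}")
```

With these invented numbers, grader C's box sits well below the other three. Whether that is the grader or just the papers C happened to draw is exactly the question.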
How far is 'well away'? More than 2 pooled stdevs could be a good indicator. A one-way AoV test might do the trick, too. The chance of getting a subgroup 2 stdevs away from the total mean, by sheer luck of the draw, is pretty darn small: about 5%, 1 in 20. Not to argue the ability of students to inadvertently sort themselves into weirdly deviant groups, but if I saw one subgroup 2 stdevs away from the overall mean, I would go exploring how the grader did their thing.

Jay

Dennis Roberts wrote:

> At 08:57 PM 4/7/02 +0000, Tristan Miller wrote:
> >Greetings.
> >
> >On Sun, 7 Apr 2002, Glen Barnett wrote:
> > > Assuming you *can* take average student abilities across classes as equal
> >
> >Who said that we are sampling across classes? I was thinking of the case
> >where the assignments from a single large class are randomly divided among
> >several graders for marking, and one of the graders is an outlier.
>
> say ... you have (just as an example) 50 examinees ... each turning in an
> assignment ... and, randomly assigning them to 5 graders ... 10 assignments
> each ... right?
>
> how will you know for sure IF a grader is aberrant? ... an outlier? ...
> surely, across the graders, there will be mean differences in their
> gradings ... so, how much is now "defined" as too much?
>
> if we make some assumption that IF this person is aberrant ...that is a
> random aberration ... then some linear adjustment might be called for or
> justified but, if that is not the case ... some peculiar way in which this
> grader rates things ... either very high or low ... if he/she does it in
> some strange way DEPENDING on the specific content said by the examinees
> ... then i don't see that such an across the board adjustment can be justified
>
> > > there are a variety of ways you might match mean and s.d.,
>
> matching by mean and sd ... does not solve the potential problem that the
> ORDERings of the examinees may be different FOR that set of examinee papers
> COMPARED to how other graders might have rated these assignments ...

--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
4444 North Green Bay Road
Racine, WI 53404-1216
USA

Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com

The A2Q Method (tm) -- What do you want to improve today?

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
http://jse.stat.ncsu.edu/
=================================================================
