Larson addresses the idea of using career or season-long datasets to compute z-scores, proving it to be a Wizard of Oz solution with reduced, not improved, validity: "Additionally, the whole premise of variance is that the observed is held constant and we are measuring and correcting variability among the observers. Rather than improving reliability by increasing the sample size, we decrease reliability by radically decreasing the comparability of the samples of debates seen by each of the observers."
Furthermore, it's not as if this idea was really publicly vetted, or perceived as imminent. I am as "in the loop" as just about anyone on tab procedures and did not know GSU and Southern Cal were really about to embark on a variance experiment and that the Wake 50 point scale would be getting in the way. There is no such thing as alternate actor fiat. Wake should act on the basis of what we perceive others are likely to do, not on the basis of what we wish they would do. Otherwise, I could just wave my magic Hoe fiat wand and argue the 50 point scale should be rejected because judges "should" just use the 20 point scale better. I can and did consider persuading them to do so, but have witnessed the inefficacy of that approach over the years.
Adoption of a different scale is a persuasive technique in and of itself, which brings me to the final little point of this particular post . . .
My description of how one might conceive the 50 point scale was meant to be the start of a discussion, not the end of it. For instance, Will said he would likely give "zero or next-to-zero 49's or 50's. More likely the former. It's fairly easy to 'imagine a better performance'." My word choice was, perhaps, unfortunate. To me, there are a decent, if small, number of "hard to imagine better" performances. But that's because I mean something more like "hard to imagine a real live mortal college student doing *better*" or "*unlikely* to be more than a handful of performances that good." But that's just my personal view that has a lot to do with my own opinion that "perfect" is a stupid category. For instance, I have seen some speeches (all in elims) that I would assign a 31 or 32 to on a 30 point scale because they really were that much better than the 29 point speeches. No speech is "perfect" but some might be enough better than the speeches that are a half point below the top of the scale as to warrant giving maximum points. Again, that's just me.
I kind of preferred my grades analogy, as a way of describing how to use the 50 point scale, to the subjective/impressionistic list that I plunked at the bottom of that original post. I hope people offer amendments and other ways of interpreting the scale that make sense. I will then try to reformulate the "suggested use" guidelines to reflect what people think makes sense. I especially appreciate Will's engagement and emphasis on the idea that it is worse than unhelpful to just reject the attempt at a shared standard. I also appreciate people's nervousness with change and the overwhelmingly conservative nature of our community (culture is conservative, by definition). I would not "shake things up" just for the hell of it. Were the SQ not so broken, we would not just "experiment" on you.
--
Ross K. Smith
Director of Debate
Wake Forest University
336-251-2076 (c)
336-758-5268 (o)
http://groups.wfu.edu/debate/
http://www.DebateScoop.org
