On 15 Oct 2003 10:36:01 -0700, [EMAIL PROTECTED] (Richard Hake) wrote:

> In a recent post Dennis Roberts (2003a) wrote (slightly edited):
[ snip, citations]
> I note that many education papers [even those in physics-education
> research (PER)!] continue to employ null-hypothesis testing with its
> "p" values, while eschewing the more widely accepted "effect size" (d),
"... more widely accepted" -- You wish. When the CI for the effect
size *almost* includes zero, you don't have much to say about the
effect size. (1) It is mostly determined by the adequacy of the
design. (2) It is confused, at times, by the distinction between
effect sizes that are 'within' and sizes that are 'between'. (3) The
users are too shy about saying that, in fact, the upper limit (or
sometimes, the point estimate) is not a reasonable estimate. For
instance, I have seen the CI for an odds ratio that ran from 1.04 to
25. The strongest partisans of the effect size are willing to rely on
it entirely, even when the CI *does* include zero. That position, I
think, does not have much support.

> and (would you believe?) even ignoring the half-century-old
> "average normalized gain" <g> [Hovland et al. (1949), Gery (1972),
> Hake (1998a,b; 2002a,b)].
[ snip, various]
> Regarding the half-century-old average normalized gain <g>, in Hake
> (2003b) I wrote [see that article for the references, bracketed by
> lines "HHHHHHH. . . ."]:
>
> HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
> The normalized gain "g" for a treatment is defined [Hovland et al.
> (1949), Gery (1972), Hake (1998a)] as g = Gain/[Gain(maximum
> possible)]. Thus, e.g., if a class averaged 40% on the pretest and
> 60% on the posttest, then the class-average normalized gain
> <g> = (60% - 40%)/(100% - 40%) = 20%/60% = 0.33. Ever since the
> work of Hovland et al. (1949) it's been known by the pre/post
> cognoscenti (up until about 1998, probably fewer than 100 people
> worldwide) that <g> IS A MUCH BETTER INDICATOR OF THE EXTENT TO
> WHICH A TREATMENT IS EFFECTIVE THAN IS EITHER GAIN OR POSTTEST. For
> example, if the treatment yields <g> > 0.3 for a mechanics course,
> then the course could be considered as in the
> "interactive-engagement zone" (Hake 1998a, Meltzer 2002b).
>
> Regrettably, the psychology/education/psychometric PEP community
> [see e.g., Pelligrino et al. (2001); Shavelson & Towne (2001);
> Fox & Hackermann (2002); Feuer et al. (2002)] remains largely
> oblivious of PER and the normalized gain. Paraphrasing Lee
> Schulman, as quoted by
[ snip, rest]

Some of us consider that there are serious questions of scaling, and
that the challenges are not met by our data. I tried your sort of
scaling on my own, 20-odd years ago, on symptom data (the Hamilton
Rating Scale for Depression, especially). I sort of figured, going
into my exploration, that the data would be too noisy; so I was
impressed by the fact that I could judge, eventually, that *points*
were a more consistent criterion than "percent improvement." One
patient might have twice the symptoms of another, but the rates of
improvement were comparable in *points*.

Are there areas where that will work better? Probably. Are there
areas where it will not work as well? I am sure. There is a huge
number of problems where there is no absolute maximum or minimum, or
where the arbitrary rating scale being used does not extend to that
limit.

Another set of data that impressed me *negatively* about fractional
scoring involved symptoms collected on the IMPS (inpatient, serious
symptoms). In the midst of big variances which I had not yet
explained, I could see some big 'fractional' differences, one group
twice the other. That sort of difference does concern me. Then I
checked the standardization of the test and discovered that the
rating scale was 'bottoming out' -- there was almost no patient in
the sample who scored in the range of pathology. *I* don't want the
groups to test 'different' when the comparison comes down to
two-patients-here versus one-patient-there.

For justification of p-values and testing, I will again recommend
Robert P. Abelson, "Statistics as Principled Argument."

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization."
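[Editor's note: the quoted Hake definition, and the scaling concern raised in
this reply, can both be made concrete with a short sketch. The function name
and the ceiling check below are illustrative additions, not from the post;
only the 40% -> 60% example and the formula itself come from the quoted text.]

```python
# Class-average normalized gain <g> as defined in the quoted Hake passage:
#   <g> = Gain / Gain(maximum possible) = (post% - pre%) / (100% - pre%).

def normalized_gain(pre_pct: float, post_pct: float) -> float:
    """Return <g> for class-average pre/post scores given in percent."""
    if pre_pct >= 100.0:
        # Illustrative guard: at ceiling, the maximum possible gain is zero.
        raise ValueError("pretest at ceiling: maximum possible gain is zero")
    return (post_pct - pre_pct) / (100.0 - pre_pct)

# The 40% -> 60% example from the quoted passage:
print(round(normalized_gain(40.0, 60.0), 2))  # 0.33

# The scaling concern in miniature: a 5-point change near the ceiling
# yields the same <g> as a 50-point change from the floor.
print(normalized_gain(90.0, 95.0))  # 0.5
print(normalized_gain(0.0, 50.0))   # 0.5
```

Note how the denominator shrinks as the pretest score rises, which is one
way the "fractional" differences discussed above can be magnified when a
scale is near its limit.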
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
http://jse.stat.ncsu.edu/
=================================================================
