On 21 Mar 2003 16:51:15 -0800, [EMAIL PROTECTED] (Dennis Roberts) wrote:

> At 09:55 PM 3/21/03 +0000, Jerry Dallal wrote:
> > dennis roberts wrote:
> >
> > > could someone give an example or two ... of how p values have really
> > > advanced our knowledge and understanding of some particular phenomenon?
> >
> > Pick up any issue of JAMA or NEJM.
>
> no ... that is not sufficient ... just to look at a journal ... where p
> values are used ... does not answer the question above ... that's circular
> and just shows HOW they are used ... not what benefit is derived FROM their use
>
> i would like an example or two where ... one can make a cogent argument
> that p ... in its own right ... helps us understand the SIZE of an effect
> ... the IMPORTANCE of an effect ... the PRACTICAL benefit of an effect
>
> maybe you could select one or two instances from an issue of the journal
> ... and lay them out in a post?
Here's my selection of instances. Whenever I read of striking epidemiology results (usually, first in the newspaper), I *always* calculate the p-level for whatever the gain was. It is *very* often barely beyond 5%. Also, it is *very* often based on testing, say, 20 or 50 dietary items, or on some intervention with little experimental history. So I say to myself, "As far as being a notable 'finding' goes, this does not survive adjustment for multiple testing. I will doubt it."

Then I am not at all surprised when the followup studies, five or ten years later, show that eggs are not so bad after all, real butter is not so bad, or Hormone Replacement Therapy is not necessarily so good. Serious students of those fields were not at all surprised, basically for the same reasons that I'm not.

Quite often, as with HRT, there is the additional problem of non-randomization. For that, I consider the effect size, too, and ask, "Can I imagine particular biases that could cause this result?" My baseline here is sometimes the 'healthy-worker effect': compared with the non-working population, workers have an age-adjusted SMR (Standardized Mortality Ratio) for heart disease of about 0.60. That's mostly blamed on selection, since sick people quit work.

Something that is highly "significant" can still be unbelievable, but I hardly waste any time before I disbelieve something that is 5% "significant" and probably biased. If you pay no attention to the p-value, you toss out what is arguably the single most useful screen we have.

Of course, Herman gives the example where the purpose is different: we may want to explore potential new leads, instead of setting our standards to screen out trivial results. It seems to me that the "decision theorist" is doing a poor job of discriminating between tasks here. I certainly never suggested that there is only one way to regard results; I have periodically told posters that their new studies with small N and many variables seemed to set them up for "exploratory" paradigms rather than "testable" hypotheses.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
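
To make the screening arithmetic concrete, here is a minimal sketch in Python. All numbers are hypothetical, and Bonferroni/Holm are assumed as the multiple-testing adjustments since the post above does not name a particular method; the SMR counts are likewise illustrative only.

    # Minimal sketch of the two calculations discussed above.
    # All numbers are hypothetical; Bonferroni and Holm are assumed
    # adjustments, not methods named in the original post.

    def bonferroni(p, m):
        """Bonferroni-adjusted p-value: the raw p inflated by the number of tests m."""
        return min(1.0, p * m)

    def holm(p_values):
        """Holm step-down adjusted p-values (a less conservative screen)."""
        m = len(p_values)
        order = sorted(range(m), key=lambda i: p_values[i])
        adjusted = [0.0] * m
        running_max = 0.0
        for rank, i in enumerate(order):
            running_max = max(running_max, (m - rank) * p_values[i])
            adjusted[i] = min(1.0, running_max)
        return adjusted

    # One "barely beyond 5%" finding among 20 dietary items tested in the same study:
    raw_p = 0.04
    print(bonferroni(raw_p, 20))              # 0.80: does not survive the screen

    p_list = [0.04] + [0.30] * 19             # the other 19 items, hypothetically unremarkable
    print(min(holm(p_list)))                  # 0.80: same verdict under Holm

    # Standardized Mortality Ratio: observed deaths divided by the deaths expected
    # from age-specific rates in a reference population (hypothetical counts).
    observed_deaths = 120
    expected_deaths = 200.0
    print(observed_deaths / expected_deaths)  # 0.60: the "healthy worker" figure

The point of the sketch is only that a single result at p = 0.04, drawn from 20 comparisons, adjusts to roughly 0.80 under either method, which is the sense in which such a finding "does not survive" the multiple-testing screen.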
