On 29 Jan 2001 10:29:36 -0800, [EMAIL PROTECTED] (dennis roberts) wrote:

> in an article ... that some might be able to access ...
> 
> http://bmj.com/cgi/content/full/322/7280/226
> 
> by
> 
> Jonathan A C Sterne, senior lecturer in medical statistics, George Davey 
> Smith, professor of clinical epidemiology.
> Department of Social Medicine, University of Bristol, Bristol BS8 2PR
> 
> one of the summary points made is the following:
> 
> "P values, or significance levels, measure the strength of the evidence 
> against the null hypothesis; the smaller the P value, the stronger the 
> evidence against the null hypothesis"
> 
> my main questions of this are:
> 
> 1. does the general statistical community accept this as being correct?
> 
 
I think that the problem is, in short, "N".

For a given design or study, there is effectively one "N" in
use, so the evidence against the null can be indexed by p, if we
want to.  Or there is effect size; but effect size is hard to
balance against p when the Ns vary.

Rubin consistently recommends, "use decision theory."  

I sometimes base my "decision" on the notion of "How much noise
is in the process?"  Two variables that correlate at r = 0.2 are
barely beating "response bias," if there is that sort of yes/no
tendency affecting those items.  So I am not impressed by
r = 0.2, even though the N might make it p = .001.
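
A back-of-envelope check, since the arithmetic may not be
obvious: the usual t-test for a correlation puts r = 0.2 at
about p = .001 once N reaches the mid-200s.  In Python (my
sketch; the N here is chosen just for illustration):

    from scipy import stats

    def p_for_r(r, n):
        # two-tailed p for H0: rho = 0, given sample r and size n
        t = r * (n - 2) ** 0.5 / (1 - r * r) ** 0.5
        return 2 * stats.t.sf(abs(t), df=n - 2)

    print(p_for_r(0.2, 265))   # ~ .001: weak, but "significant" on N alone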

For much clinical work, d = 0.5 SD is a frequent standard,
needing a total N = 64 (32 per group) to meet the two-tailed 5%
test.  When a result just barely meets that standard, there is
about a 95% chance that the underlying mean is above 0.0, right?
That is the meaning of the test.

However, there is also an 88% chance (94%, one-tailed) that the
underlying mean is above 0.10 SD.  And the "50% most likely"
range, which some Bayesians have liked to tout, has a cutoff
that is even larger.
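
Those percentages are easy to check with a normal approximation,
taking the SE of d as sqrt(4/N) for two equal groups (my sketch,
reading the one-tailed chances in the usual flat-prior way):

    from scipy import stats

    n_total, d_obs = 64, 0.5
    se = (4.0 / n_total) ** 0.5               # SE of d, two groups of 32

    print(stats.norm.sf((0.0 - d_obs) / se))  # P(true d > 0.0)  ~ .977
    print(stats.norm.sf((0.1 - d_obs) / se))  # P(true d > 0.10) ~ .945

The two-sided version of the second figure, 1 - 2*(1 - .945), is
about .89 -- the "88%" above.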

If I see a result on a scaled score that needs N = 6400 to be
"statistically significant at 5%," I am pretty sure that the
result is worthless, because it is (or should be) accounted for
FAR too easily by artifacts of one sort or another, even if I
haven't figured out what those artifacts might be.  Now, that is
a SCALED score of some patient rating -- the sort that I have
(well-founded) doubts about.
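
For what it's worth, here is the arithmetic behind that hunch
(same normal approximation as above): the just-barely-significant
effect shrinks as 1/sqrt(N), so N = 6400 corresponds to roughly
0.05 SD.

    def just_significant_d(n_total, z=1.96):
        # smallest d reaching two-tailed 5%, balanced two-group design
        return z * (4.0 / n_total) ** 0.5

    for n in (64, 400, 6400):
        print(n, round(just_significant_d(n), 3))
    # 64   -> ~0.49 SD
    # 400  -> ~0.20 SD
    # 6400 -> ~0.05 SD -- small enough for artifacts to supply all of it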

Other results do not use SD units, and may imply standard Ns
that are much larger.

For instance, I start out dubious about medical results that
publish odds ratios of 1.2 or 1.5, even though the p-level might
be small because N = 20,000.  I have in mind, especially, those
uncontrolled "observational studies" that are rife with
self-selection.  I may end up believing the result, but I do
want to know that the role of artifact was ruled out.
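
To see how small the p can get: Woolf's SE for a log odds ratio,
with hypothetical counts that I made up to split N = 20,000
evenly and put the event rate near 10%:

    import math

    # hypothetical 2x2 table: 10,000 per group, ~10% events, OR ~ 1.2
    a, b = 1180, 8820    # exposed:   events, non-events
    c, d = 1000, 9000    # unexposed: events, non-events

    or_est = (a * d) / (b * c)
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)  # Woolf's formula
    z = math.log(or_est) / se_log_or
    print(round(or_est, 2), round(z, 2))          # OR ~ 1.20, z ~ 4.1, p < .0001

None of that touches self-selection, of course -- which is the
point.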

As before, it should be possible to show either a one-tailed CI
or a 50% range that lies *above* the range of trivial and
artifactual results.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

