- I have comments on the question of corrections for multiple
testing, and I'm asking folks for feedback on Benjamini
and Hochberg's FDR as an alternative.


On 26 Nov 2003 10:01:12 -0800, [EMAIL PROTECTED] (Keaser,
Michael L) wrote:


>  
> I have two data sets: A and B
>  
> For data set A, I conduct 25 significance tests. Moreover, I conduct 30
> tests on data set B. I then create a third data set, called C, by
> subtracting A from B. Thus:
>  
> Data set C = (data set B scores) - (data set A raw scores). I then
> conduct another 15 tests on this data set C. My question is, when doing
> the Bonferroni correction, would the significance level at p = 0.05 be:
>  
> Data set A 0.05/25 = 0.002
>  
> Data set B 0.05/30 = 0.0017
>  
> Data set C 0.05/15 = 0.0033


Yes, if that is how you want to divide them --
and if your readership will let you.

>  
> On the other hand, since data set C was derived from the subtraction of
> data set A from data set B, the number of tests for data set A would
> increase (25 + 15 = 40), and the number of tests on data set B would
> increase (30 + 15 = 45). The total for data set C would be
> (15 + 25 (from A and B) + 5 (from B) = 45). Thus, would the Bonferroni
> correction at significance level p = 0.05 be:
>  
> Data set A 0.05/40 = 0.00125
>  
> Data set B 0.05/45 = 0.0011
>  
> Data set C 0.05/45 = 0.0011
>  

Yes, that's another way to figure the denominators, but I don't follow
the logic behind it.

I do see why the alternative of testing *everything* as one family
would use 70. If the whole set were planned from the start, and you
did not have especially high hopes for any given test, or (even) for
any given set of tests, then the total testing under
consideration would use (25 + 15 + 30) = 70 as the denominator.
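
To lay that arithmetic out in one place, here is a minimal sketch in
Python (my choice of language, purely for illustration); the family
sizes are the ones from the question above:

  # Bonferroni: with m tests in a family, judge each test at alpha / m.
  alpha = 0.05

  families = {
      "A alone (25 tests)": 25,
      "B alone (30 tests)": 30,
      "C alone (15 tests)": 15,
      "A plus C (40 tests)": 40,
      "B plus C (45 tests)": 45,
      "everything planned together (70 tests)": 70,
  }

  for label, m in families.items():
      print(f"{label}: per-test criterion = {alpha / m:.5f}")

The printout makes the point that the real decision is the choice of
family, not the arithmetic.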

Frankly, I figure that for my purposes, any Bonferroni
testing has to quit at about 15 tests;  our power does not
extend a lot beyond  0.05/15.   Astronomers, on the other 
hand, might divide by a huge number.  

One problem with asking for a Bonferroni correction is that we
have to keep making really GROSS adjustments for the area
where the data arise. P < 0.05 works for social science, with
some amount of Bonferroni adjustment, or with .01 or .001 used
for multiple tests.

You need to look at what is used by the folks who publish
in your area. There is a lot to learn from the literature: not only
can you read about the p-values and how they are used, but you
can also read about the hypotheses, in order to reduce your own
set (perhaps) from dozens of tests to just two or three questions.

See what is published; then look at your own results. Do you have
results with a tiny nominal P? (Don't squeeze out all your *results*
if you think you have some.) At the other end, do you have
too many extra results? It is not just that you should not
bother mentioning that *all* of everything was "significant";
you should definitely be able to draw distinctions between
*big* differences and *small* differences.

I've had data where there were dozens of items with p-values
under .001, which meant that the items with p-values
of 0.05 were (in comparison) hardly worth mentioning --
NOT ALL EFFECTS are EQUAL.


Back to the question of corrections.
I mean, if you have ridiculously tiny p-values, you can mention
to your audience that they meet the stiffest test that anyone
would consider applying to them. On the other hand, if
most of the tests are null, then you might remind yourself and
your audience that (for instance) there were only two or
three questions that were major enough to justify the study
in the first place; and that those few tests form the first tier
of testing, sharing the 0.05 alpha for the main study -- whereas
everything else was always considered to be exploratory.
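
To make that two-tier idea concrete, here is a hedged sketch in
Python; the count of three primary tests and the p-values in it are
hypothetical numbers, not a recommendation:

  # Two-tier testing: a few planned primary tests share the study's
  # alpha; everything else is reported as exploratory.
  alpha = 0.05
  n_primary = 3                          # the questions that justified the study
  primary_criterion = alpha / n_primary  # Bonferroni within the first tier

  primary_p = [0.004, 0.012, 0.30]       # hypothetical p-values
  for i, p in enumerate(primary_p, start=1):
      verdict = "significant" if p <= primary_criterion else "not significant"
      print(f"primary test {i}: p = {p}, {verdict} at {primary_criterion:.4f}")

  # Remaining tests: report nominal p-values, labeled as exploratory.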




- I have lately discovered that I had overlooked a *different* basis
of test correction which might have some merit. That is, it might
be easier to generalize across areas when you start with
the FDR, or "False Discovery Rate," of Benjamini and
Hochberg (1995, 2000). This is *not* the same as controlling
the alpha level; it provides a much laxer criterion than
is possible by using Bonferroni or its trivial tweakings, and it is
laxer than using any of the post-hoc tests (SNK, etc.) from a
generation ago.
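
For anyone who wants to experiment with it, here is a bare-bones
sketch of the Benjamini-Hochberg step-up procedure in Python. This
is my own reading of the 1995 paper, not code from the authors, and
it assumes independent (or positively dependent) tests:

  def benjamini_hochberg(pvalues, q=0.05):
      """Benjamini-Hochberg step-up procedure.

      Returns a list of booleans, True where the corresponding
      hypothesis is rejected with the false discovery rate
      controlled at level q.
      """
      m = len(pvalues)
      # Rank the p-values from smallest to largest.
      order = sorted(range(m), key=lambda i: pvalues[i])
      # Find the largest rank k with p_(k) <= (k / m) * q ...
      k_max = 0
      for rank, i in enumerate(order, start=1):
          if pvalues[i] <= (rank / m) * q:
              k_max = rank
      # ... and reject the hypotheses with the k_max smallest p-values.
      reject = [False] * m
      for rank, i in enumerate(order, start=1):
          if rank <= k_max:
              reject[i] = True
      return reject

  # Hypothetical p-values: Bonferroni at 0.05/5 = 0.01 passes only the
  # first two; B-H passes the first four.
  print(benjamini_hochberg([0.001, 0.008, 0.019, 0.039, 0.30]))

In that hypothetical example, Bonferroni keeps only the two smallest
p-values while B-H keeps four, which is the sense in which the
criterion is laxer.
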
- Right now, I am offering only this limited endorsement of it,
because it *seems* good from the talk that I heard and the little
that I have read. Google does not show me a whole lot of use
so far, so I am asking for other experience and opinions -- if
anyone can say anything.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization." 