- I have comments on the question of corrections for multiple testing, and I'm asking folks for feedback on Benjamini and Hochberg's FDR as an alternative.
On 26 Nov 2003 10:01:12 -0800, [EMAIL PROTECTED] (Keaser, Michael L) wrote:

> I have two data sets: A and B.
>
> For data set A, I conduct 25 significance tests. Moreover, I conduct
> 30 tests on data set B. I then create a third data set, called C, by
> subtracting A from B. Thus:
>
> Data set C = (Data Set B scores) - (Data Set A raw scores).
>
> I then conduct another 15 tests on this data set C. My question is,
> when doing the Bonferroni correction, would the significance level at
> p = 0.05 be:
>
> Data set A: 0.05/25 = 0.002
> Data set B: 0.05/30 = 0.0017
> Data set C: 0.05/15 = 0.0033

Yes. If that is how you want to divide them. If your readership will let you.

> On the other hand, since data set C was derived from the subtraction
> of data set A from data set B, the number of tests for data set A
> would increase (25 + 15 = 40), and the number of tests on data set B
> would increase (30 + 15 = 45). The total for data set C would be
> (15 + 25 (from A and B) + 5 (from B) = 45). Thus, would the Bonferroni
> correction at significance level p = 0.05 be:
>
> Data set A: 0.05/40 = 0.00125
> Data set B: 0.05/45 = 0.0011
> Data set C: 0.05/45 = 0.0011

Yes, that's another way to figure something, though I don't entirely follow it. I do see why the alternative of testing *everything* would use 70: if the whole set were planned from the start, and you did not have hopes that were particularly high for any given test, or (even) any given set of tests, then the total testing under consideration would use (25 + 30 + 15) = 70 as the denominator.

Frankly, I figure that for my purposes, any Bonferroni testing has to quit at about 15 tests; our power does not extend a lot beyond 0.05/15. Astronomers, on the other hand, might divide by a huge number. One problem with asking for Bonferroni correction is that we have to keep making really GROSS adjustments for the area where the data arise. P < 0.05 works for social science, with some amount of adjustment for Bonferroni, or with .01 or .001 for multiple tests. You need to look at what is used by the folks who publish in your area.

There is a lot to learn from publication: not only can you read about the p-values and how they are used, you can read about the hypotheses, in order to reduce that set (perhaps) from dozens to just two or three questions. See what is published; then look at your own results. Do you have results with a tiny nominal p? (Don't squeeze out all your *results* if you think you have some.) At the other end: do you have too many extra results? It is not just that you should not bother mentioning that *all* of everything was "significant"; you should definitely be able to draw distinctions between *big* differences and *small* differences. I've had data where there were dozens of items with p-values under .001, which meant that the items with p-values of 0.05 were (in comparison) hardly worth mentioning. NOT ALL EFFECTS are EQUAL.

Back to the question of corrections. If you have ridiculously tiny p-values, you can mention to your audience that they meet the stiffest test that anyone would consider applying to them. On the other hand, if most of the tests are null, then you might remind yourself and your audience that (for instance) there were only two or three questions that were major enough to justify the study in the first place, and that those three tests form the first tier of testing, sharing the 0.05 alpha for the main study, whereas everything else was always considered to be exploratory.
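Since the bookkeeping above is easy to tangle, here is a minimal sketch (in Python) of the two Bonferroni accountings being compared. The family sizes (25, 30, 15) come straight from the question; the variable names and the printout are just my illustration.

    # Bonferroni thresholds under the two accounting schemes discussed above.
    alpha = 0.05
    families = {"A": 25, "B": 30, "C": 15}

    # Scheme 1: each data set is treated as its own family of tests.
    for name, m in families.items():
        print(f"Data set {name}: {alpha}/{m} = {alpha / m:.5f}")

    # Scheme 2: all tests, planned together, form one family of 25+30+15 = 70.
    m_total = sum(families.values())
    print(f"All pooled: {alpha}/{m_total} = {alpha / m_total:.5f}")

Either way, the arithmetic is just alpha divided by however many tests you count into the family; the argument is over the counting, not the division.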
- I lately discovered that I had overlooked a *different* basis of test correction which might have some merit. That is, it might be easier to generalize across areas when you start with the FDR, or "False Discovery Rate," of Benjamini and Hochberg (1995, 2000). This is *not* the same as controlling the alpha level; it provides a much laxer criterion than is possible by using Bonferroni or its trivial tweakings, and it is laxer than any of the post-hoc tests (SNK, etc.) from a generation ago.

- Right now, I am offering only this limited endorsement of it, because it *seems* good from the talk that I heard, and from the little that I have read. Google does not show me a whole lot of use so far, so I am asking for other experience and opinions, if anyone can say anything. (A small sketch of the procedure is appended below my sig.)

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization."
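Here is the promised minimal sketch of the Benjamini-Hochberg (1995) step-up procedure, written from the published definition; the function name and the demo p-values are mine, purely for illustration. Note how it rejects two of the eight hypotheses where a plain Bonferroni cut at 0.05/8 = 0.00625 would reject only the first; that is the "laxer criterion" at work.

    def benjamini_hochberg(pvals, q=0.05):
        """Indices of hypotheses rejected while controlling the FDR at level q."""
        m = len(pvals)
        order = sorted(range(m), key=lambda i: pvals[i])  # ranks, smallest p first
        # Step-up rule: find the largest 1-based rank k with p_(k) <= (k/m) * q.
        k_max = 0
        for rank, i in enumerate(order, start=1):
            if pvals[i] <= (rank / m) * q:
                k_max = rank
        # Reject the hypotheses carrying the k_max smallest p-values.
        return sorted(order[:k_max])

    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
    print(benjamini_hochberg(pvals))                                   # -> [0, 1]
    print([i for i, p in enumerate(pvals) if p <= 0.05 / len(pvals)])  # Bonferroni -> [0]

The laxness comes from the step-up search: only the smallest p-value faces the full Bonferroni threshold q/m, and each successive p-value faces a proportionally larger one, up to q itself for the largest.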
