William Chambers writes:

> The people on this newslist have recently attempted to
> disprove CR by violating assumptions, ignoring the probable
> existence of confounding variables and by highly evasive
> subsampling strategies. Some continue to work for the
> truth, Gus and Steve among them. I applaud them both.
I'm not sure you should be applauding me. I am the one who tested CR on five data sets with "probable existence of confounding variables," and these data sets provide the strongest evidence to date against CR. This doesn't mean that I've given up, but we can't sugarcoat the bad news that these five data sets tell us.

Blaming the failure of CR on confounding is an easy thing to do, but with that claim comes the responsibility to try to identify the potential confounders. If you can't identify the confounder that caused problems for CR with this data set, you won't be able to rule out an equal but opposite confounder in a data set where CR performs well. That would leave you with the unhappy option of conceding that CR does poorly on any data set with the potential for confounding.

For one data set that I tested CR on, the unstated confounding variables not only masked the obvious causal direction between smoking and cancer, but totally reversed it, to the point where CR provided conclusive evidence that cancer causes smoking. The link between smoking and cancer is very strong, with odds ratios on the order of 10 or 20 in many data sets. The sort of confounder that could cause problems with such a strong association would have to be made of kryptonite. So either there is a super duper confounder that half a century of research has failed to identify, or CR is extremely sensitive to weak confounding.

It's a shame if CR turns out to be so sensitive to confounding, because traditional statistical methods can handle confounding very well. Paul Rosenbaum has a delightful book that lists a wide range of strategies for handling confounding, and he also has a very good article in The American Statistician. Mitch Gail provides a wonderful overview of why we know that smoking causes cancer, in spite of the spurious claims about confounders that were raised in the 1950s and 1960s. And Ahluwalia et al. discuss a very instructive case.
A simple analysis showed a protective effect of Environmental Tobacco Smoke, a completely counter-intuitive finding. They demonstrate, however, that differences in maternal age can explain these unusual results. Chen et al. is another instructive example, where a protective effect of maternal smoking against Down syndrome is likewise explained by proper adjustment for maternal age. I can probably find a dozen more examples where traditional statistical methods have overcome problems with confounding.

References:

Rosenbaum PR (1995). Observational Studies. New York: Springer-Verlag.
Rosenbaum PR (2001). Replicating effects and biases. The American Statistician 55(3):223-227.
Gail MH (1996). Statistics in Action. Journal of the American Statistical Association 91(433):1-13.
Ahluwalia IB, Grummer-Strawn L, Scanlon KS (1997). Exposure to environmental tobacco smoke and birth outcome: increased effects on pregnant women aged 30 years or older. Am J Epidemiol 146(1):42-7.
Chen CL, Gilbert TJ, Daling JR (1999). Maternal smoking and Down syndrome: the confounding effect of maternal age. Am J Epidemiol 149(5):442-6.

I would respectfully disagree with the words "probable existence of confounding variables," because a confounder that could cause a total reversal in the link between smoking and cancer is very IMPROBABLE. At the very least, there aren't any confounders out there strong enough to cause traditional statistical methods to conclude that cancer causes smoking.

The book isn't closed yet. We'll see how CR performs on additional data sets, especially data sets from non-medical areas. When I find some good data sets from these other areas, I will present the results here for all to see. But I think it is fair to say that CR performed very badly on the first five real data sets that I tried. Whether there are ANY real data sets out there where CR performs well remains an open question.
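The kind of reversal-and-adjustment described above can be sketched with made-up counts (a hypothetical illustration, not the data from any study cited here): within each age stratum the exposure roughly triples the odds of disease, yet the crude pooled table points the other way, and a standard Mantel-Haenszel stratified odds ratio recovers the stratum-level answer.

```python
# Hypothetical 2x2 counts, invented for illustration. Age confounds the
# association: younger people smoke more here, and the disease is far more
# common among the old -- the classic recipe for a Simpson-style reversal.

def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a=exposed cases, b=exposed non-cases,
    c=unexposed cases, d=unexposed non-cases."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary OR across 2x2 tables (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

young = (30, 970, 1, 99)    # many smokers, low disease risk
old   = (60, 40, 300, 700)  # few smokers, high disease risk

# Pool the two strata cell by cell to get the crude table.
crude = tuple(y + o for y, o in zip(young, old))

print(f"OR, young stratum:  {odds_ratio(*young):.2f}")   # 3.06
print(f"OR, old stratum:    {odds_ratio(*old):.2f}")     # 3.50
print(f"Crude (pooled) OR:  {odds_ratio(*crude):.2f}")   # 0.24 -- reversed!
print(f"Age-adjusted MH OR: {mantel_haenszel_or([young, old]):.2f}")  # 3.47
```

A naive method that only sees the pooled table would conclude the exposure is protective; simple stratification on the confounder is enough to undo the reversal.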
And until we can find a large group of real data sets where CR performs well, I believe that simulations are a waste of time.

Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.
The STATS web page has moved to http://www.childrens-mercy.org/stats.

P.S. Dr. Chambers has also suggested that if we changed how we collect data, we might be able to better utilize CR to demonstrate causes. This is worth exploring, perhaps, but I would want to look at some designed experiments first. If CR performs poorly on a designed experiment with nice balanced data, that would be very bad news indeed.

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
http://jse.stat.ncsu.edu/
=================================================================
