William Chambers writes:

> The people on this newslist have recently attempted to 
> disprove CR by violating assumptions, ignoring the probable 
> existence of confounding variables and by highly evasive 
> subsampling strategies. Some continue to work for the 
> truth, Gus and Steve among them. I applaud them both.

I'm not sure you should be applauding me. I am the one who tested CR on five
data sets with "probable existence of confounding variables," and those data
sets provide the strongest evidence to date against CR. This doesn't mean
that I've given up, but we can't sugarcoat the bad news that these five data
sets tell us.

Blaming the failure of CR on confounding is an easy thing to do, but with
that claim comes the responsibility to try to identify the potential
confounders. If you can't identify the confounder that caused problems for
CR with these data sets, you won't be able to rule out an equal but opposite
confounder in a data set where CR performs well. That would leave you with
the unhappy option of conceding that CR does poorly with any data set that
has the potential for confounding.

For one of the data sets I tested, the unidentified confounding variables
not only masked the obvious causal direction between smoking and cancer,
they reversed it entirely, to the point where CR provided conclusive
evidence that cancer causes smoking.

The link between smoking and cancer is very strong, with odds ratios on the
order of 10 or 20 in many data sets. The sort of confounder that could cause
problems with such a strong association would have to be made of kryptonite.
So either there is a super duper confounder that half a century of research
has failed to identify, or CR is extremely sensitive to weak confounding.
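
To put a rough number on that, here is a small Python sketch, following
Cornfield's classic inequality and using numbers I made up for illustration
(nothing here comes from the five data sets). It computes the crude risk
ratio that a confounder alone can produce when smoking has no true effect
on cancer; for a rare disease the odds ratio is about the same.

    # Illustrative sketch with invented numbers: how large a crude risk
    # ratio can pure confounding produce when smoking has NO true effect
    # on cancer? By Cornfield's argument, a binary confounder U must be
    # associated with both smoking and cancer at least as strongly as
    # the observed ratio in order to explain it away.

    def crude_risk_ratio(p_u_smoker, p_u_nonsmoker, rr_u):
        """Crude risk ratio induced by confounder U alone (null true effect).

        p_u_smoker    -- prevalence of U among smokers
        p_u_nonsmoker -- prevalence of U among nonsmokers
        rr_u          -- risk ratio of cancer for U=1 versus U=0
        """
        risk_smokers = p_u_smoker * rr_u + (1 - p_u_smoker)
        risk_nonsmokers = p_u_nonsmoker * rr_u + (1 - p_u_nonsmoker)
        return risk_smokers / risk_nonsmokers

    # A very strong confounder: five times the cancer risk, four times
    # as common among smokers. The crude ratio is still only about 2.3,
    # nowhere near the 10 or 20 we see for smoking and cancer.
    print(crude_risk_ratio(0.8, 0.2, 5.0))   # -> 2.33

Even that kryptonite-grade confounder falls far short of the observed
association.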
 
It's a shame if CR turns out to be so sensitive to confounding, because
traditional statistical methods handle confounding very well. Paul
Rosenbaum has a delightful book that lists a wide range of strategies for
handling confounding, and a very good article in The American Statistician.
Mitch Gail provides a wonderful overview of why we know that smoking causes
cancer, in spite of the spurious claims about confounders that were raised
in the 1950s and 1960s.

Ahluwalia et al. discuss a very instructive case. A simple analysis showed
a protective effect of environmental tobacco smoke, a completely
counterintuitive finding. They demonstrate, however, that differences in
maternal age explain these unusual results. Chen et al. provide another
instructive example, in which an apparent protective effect of maternal
smoking against Down syndrome is also explained by proper adjustment for
maternal age. I could probably find a dozen more examples where traditional
statistical methods have overcome problems with confounding.
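
To make the maternal age story concrete, here is a Python sketch of the
same phenomenon (Simpson's paradox), with counts I invented for
demonstration; these are not the numbers from either paper.

    # Invented counts: within each maternal age stratum, smoking has no
    # effect on Down syndrome risk, yet the crude ratio looks protective
    # because younger mothers smoke more and have a lower baseline risk.

    def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
        return (cases_exp / n_exp) / (cases_unexp / n_unexp)

    # Younger mothers: 0.1% risk for smokers and nonsmokers alike.
    print(risk_ratio(4, 4000, 6, 6000))      # -> 1.0
    # Older mothers: 1% risk for smokers and nonsmokers alike.
    print(risk_ratio(10, 1000, 40, 4000))    # -> 1.0
    # Crude analysis, ignoring age, suggests smoking is "protective."
    print(risk_ratio(4 + 10, 4000 + 1000, 6 + 40, 6000 + 4000))  # -> 0.61

Adjusting for maternal age, as Chen et al. did, removes the spurious
protection entirely.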

Rosenbaum PR (1995). Observational Studies. New York: Springer-Verlag.

Rosenbaum PR (2001). Replicating Effects and Biases. The American
Statistician 55(3): 223-227.

Gail MH (1996). Statistics in Action. Journal of the American Statistical
Association 91(433): 1-13.

Ahluwalia IB, Grummer-Strawn L, Scanlon KS (1997). Exposure to
environmental tobacco smoke and birth outcome: increased effects on
pregnant women aged 30 years or older. Am J Epidemiol 146(1): 42-47.

Chen CL, Gilbert TJ, Daling JR (1999). Maternal smoking and Down syndrome:
the confounding effect of maternal age. Am J Epidemiol 149(5): 442-446.

I would respectfully disagree with the words "probable existence of
confounding variables," because a confounder that could cause a total
reversal in the link between smoking and cancer is very IMPROBABLE. At the
very least, there are no confounders out there strong enough to make
traditional statistical methods conclude that cancer causes smoking.

The book isn't closed yet. We'll see how CR performs on additional data
sets, especially data sets from non-medical areas. When I find some good
data sets from these other areas, I will present the results here for all to
see.

But I think it is fair to say that CR performed very badly on the first five
real data sets that I tried. Whether there are ANY real data sets out there
where CR performs well remains an open question. And until we can find a
large group of real data sets where CR performs well, I believe that
simulations are a waste of time.

Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.
The STATS web page has moved to
http://www.childrens-mercy.org/stats.

P.S. Dr. Chambers has also suggested that if we changed how we collect data,
we might be able to make better use of CR to demonstrate causes. That may be
worth exploring, but I would want to look at some designed experiments
first. If CR performs poorly on a designed experiment with nice balanced
data, that would be very bad news indeed.