Steve,

Actually what you have done is shown how little you understand mathematics
and how quickly you are willing to turn from an interesting issue to
sarcasm. You never bothered to read my publications, and now you act as
though I am sticking qualifications on at the last minute. This is
fraudulent rhetoric on your side. I am suggesting that much of the data in
correlational research is junk, and I am referencing the finest traditions
in statistics to make my case. You are saying this is impractical and using
logical inconsistencies to cast doubt on my logic. For example, the problem
with convenience samples is that they tend to be normally distributed. The
convenience in itself is not an issue at all; the normal sampling of causes
is the issue. You are mischaracterizing my arguments. As for the issue of
confounding: all statistics have to face this problem. ALL. Traditional
statistics tell us so little about the data that the absurdities confounding
produces are not clear to people like you. The confounding is in the data,
not in CR. CR just underscores the craziness of the data.


"Simon, Steve, PhD" <[EMAIL PROTECTED]> wrote in message
news:E7AC96207335D411B1E7009027FC284902A9B2D4@EXCHANGE2...
> We are making quite a bit of progress. We are finding more and more
> evidence that CR does not work with real world data sets. The following
> types of data, according to Dr. Chambers, are not good for CR.
>
> 1. Convenience samples. That pretty much eliminates the ability of CR to
> work in any study that requires informed consent, because restricting your
> sample to only those who give informed consent makes your sample a
> convenience sample.

This is a downright stupid comment. You can get permission from people
with the right characteristics to fill out a factorial design.
Experimentalists do it all the time. I have done it. The reason you are not
doing it is that you do not have enough subjects in your hospital to collect
factorial data. You are trying to revolutionize medicine in a teapot, the
cheap and easy way. If you want to be a real scientist, get the data you
need to make legitimate inferences. Your problem is that you are thinking
for profit and ego instead of trying to solve real problems. You have to
experience reality to master it. Grow up.


> It eliminates data sets where the study is restricted to
> a single hospital rather than a representative sample of hospitals. It
> eliminates data sets where we need financial inducements to get people to
> participate.

This is total nonsense. You can pay the people who participate in a
factorial design.


>
> 2. Data sets with confounders. That pretty much eliminates any
> Epidemiology data set. And we'll never be able to use CR to understand
> the environmental and hereditary causes of cancer, because cancer data
> has too many confounders.


All statistical studies have confounds. The challenge that real scientists
embrace is to do sufficient research to render improbable whatever
confounds are there. You found a strange pattern with bladder cancer and,
rather than investigate confounds or Type II errors, you blow up, get
sarcastic, overgeneralize, and act childish.


> 3. Data sets where the correlation is too high. This eliminates a lot of
> the physical sciences. I know in Chemistry that they get disappointed if
> the correlation in their calibration experiments is not at least 0.98.


If you had done your homework you would have known about this high
correlation limitation a long time ago. It is extremely clear in my 1991
paper. Did you read it? I have also argued this point in this thread in
recent weeks. Perfect correlations almost never happen in the behavioral
sciences. We are looking to those drug studies as experiments to take us a
small step beyond simulations. Find drug studies that use two-way ANOVAs
without interactions. We would not ordinarily be using CR with drug studies
because we have experiments.


>
> Now I wonder how you would design an experiment to keep the correlation
> from being too high? I suppose you could deliberately be sloppy and hope
> that this introduces extra error into the process.

To lower the correlation between any particular cause and the effect, simply
cross two or more causes as in a factorial design.  They will share the
determination of the effect and the correlation will go down.
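The point above can be checked with a small simulation. This is my own
numpy sketch, not the CR code; the variable names and the equal weighting of
the two causes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two independent causes, crossed as in a factorial design
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)

# One cause alone determines the effect: correlation is 1
y_single = x1
r_single = np.corrcoef(x1, y_single)[0, 1]

# Two crossed causes share the determination of the effect,
# so each cause's correlation with the effect drops
y_crossed = x1 + x2
r_crossed = np.corrcoef(x1, y_crossed)[0, 1]

print(round(r_single, 2), round(r_crossed, 2))
```

With two equally weighted independent causes, theory puts each cause-effect
correlation near 1/sqrt(2), roughly 0.71, down from 1.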


>
> In medical applications, we are totally without hope. Birth weight and
> gestational age are highly correlated, and we can do nothing to remove
> this correlation. We can't command mothers to have 4000 gram/26 week
> babies or 500 gram/38 week babies. It just isn't going to happen.


Stop whining. Correlations between .30 and .90 work. Cross gestational age
and mother's weight. Everything in life is not as simple as you want it to
be.


>
> Most statisticians are delighted to get a very high correlation. The
> strength of the correlation is one of the nine conditions that Hill set
> out in 1965 to establish a cause and effect relationship.

I do not care in the least what delights most statisticians, since most
that I have met are pretty much close to stupid. Your language is pure
sophistry. I have made it very clear that perfect correlation is
confounding. You hear what you want to hear. That is not very mature.



>
> 4. You have to have enough data at the extremes. We might be able to fix
> this if we trim the data, but this has been shown to work only in
> simulations.

Yes, you must sample the causes so that the pattern is factorial. We are
interested in the combinations of the causes across their ranges. If the
data is not there, only a sophist would pretend that it is.
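The "all corners filled" requirement quoted above is easy to enumerate. A
minimal sketch (the cause names and low/high levels are illustrative, not
from the original post) of a 2x2x2 factorial sampling plan:

```python
from itertools import product

# Illustrative low/high levels for three hypothetical causes
levels = {"cause_a": (0, 1), "cause_b": (0, 1), "cause_c": (0, 1)}

# All eight corners of the cube: every combination of cause levels
corners = list(product(*levels.values()))
for corner in corners:
    print(dict(zip(levels, corner)))

print(len(corners))  # 8 cells to fill with sampled cases
```

Two causes would give the four corners of the square; each added binary
cause doubles the number of cells that must contain data.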




>
> And you have to be careful what you remove. If you remove the data by
> trimming the edges, that works, according to Dr. Chambers.

Try it yourself. I spent days coaching you on how to do this five-minute
procedure. It works.


> But if you remove the data by creating evenly spaced bins on a
> rectangular grid and then selecting the first observation to fall in
> each bin, then that makes CR worse, according to Dr. Chambers.


Steve, stop whining. Evenly spaced subsamples only replicate the
distribution you start with. Even some of your buddies on this newslist
have admitted that. I am not making this up. You are talking like a kid in
high school who gets annoyed because algebra is difficult. Stop it, and do
not waste people's time trying to look clever, only to back out when you
get in over your head.


>
> I have very little faith in the trimming approach. Selectively removing
> data values based on their extremities is asking for trouble. It will
> create all sorts of artefactual problems. And it will do nothing to fix
> all the other problems listed in this email.


What problems? I have already indicated to you that it causes truncation
effects, but these could be fixed with range restriction corrections. Tell
me, if you cut a line, do the remaining parts cease being a line? That is
all trimming does. It makes a clean cut across a line. You are calling
forth all sorts of superstitions, saying the airplane can never fly. Stop
arguing by innuendo. Be explicit. Where did you get your degree? If you
want to be a big scientist, work for it.
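The truncation effect and its repair can be demonstrated in a few lines.
This is my own sketch, not the CR routine: it simulates a linear relation,
trims the tails of x, and then applies the standard range-restriction
correction (Thorndike's Case II formula) to recover the full-range
correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

x = rng.standard_normal(n)
y = x + rng.standard_normal(n)   # population correlation is 1/sqrt(2)

r_full = np.corrcoef(x, y)[0, 1]

# Trim: a clean cut across the line, keeping only the middle of x's range
keep = np.abs(x) < 1.0
r_trim = np.corrcoef(x[keep], y[keep])[0, 1]   # attenuated by truncation

# Range-restriction correction (Thorndike Case II):
# u is the ratio of unrestricted to restricted SD of the selection variable
u = x.std() / x[keep].std()
r_fixed = r_trim * u / np.sqrt(1 - r_trim**2 + (r_trim * u) ** 2)

print(round(r_full, 2), round(r_trim, 2), round(r_fixed, 2))
```

With this setup the trimmed correlation drops to roughly 0.47, and the
corrected value comes back to roughly the full-range 0.71, as the argument
above predicts.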


>
> 5. If you have two possible causes, you need to sample so that all four
> corners of the square are filled. If you have three possible causes, you
> need to sample so that all eight corners of the cube are filled.


Yes Steven, science is hard stuff, and if you want to know about
complicated things you have to do your homework and consider all the
possibilities. That's hard work, too much hard work for the sort of people
who buy their degrees these days.


>
> 6. You have to have a linear relationship. Transforming the data so that
> the relationship becomes linear may or may not work.



Yes.  So what?  All correlations assume linearity.

>
> There are other restrictions that I believe should be added to the list.

What are they?


>
> 7. Heteroscedasticity will almost certainly ruin CR.

Show us why. Could it be because heteroscedasticity is a product of sloppy
measures? Yes.


>
> 8. CR does not work well with a discrete cause variable. It fails
> miserably when the cause variable is binary. I suspect it will perform
> poorly when the cause variable only has three or four levels.


Show us. It certainly fails when the effect is dichotomous, but that is
because dichotomous effects represent a degradation of information.

>
> 9. We can also rule out any data set where the effect is binary. We can
> never use CR to establish causes of mortality, for example, because
> there is no middle category for mortality.



Have I mentioned the minigroup method? I have described it many times
before. Create sampling units composed of individuals. If you have 1000
cases, create 100 units with 10 people each. Count the mortalities. Do CR
on this count.
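The aggregation step of the minigroup method can be sketched as below. This
is my reading of the description above, not the original implementation:
the simulated cause, the mortality model, and the choice to sort cases by
the cause before forming units are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n_cases, unit_size = 1000, 10

# Illustrative cause and a binary effect whose probability depends on it
x = rng.uniform(0, 1, n_cases)
death = rng.random(n_cases) < 0.2 + 0.6 * x   # mortality indicator

# Individual level: point-biserial correlation with a dichotomous effect
r_individual = np.corrcoef(x, death)[0, 1]

# Minigroup: sort by the cause (assumed), form 100 units of 10 people,
# and count the mortalities in each unit to get a graded effect variable
order = np.argsort(x)
x_units = x[order].reshape(-1, unit_size).mean(axis=1)
deaths_per_unit = death[order].reshape(-1, unit_size).sum(axis=1)

r_minigroup = np.corrcoef(x_units, deaths_per_unit)[0, 1]
print(round(r_individual, 2), round(r_minigroup, 2))
```

Counting deaths per unit replaces the binary effect with a count that has
middle categories, which is the stated motivation for the method.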


>
> As additional real data sets fail, I'm sure we will see additional reasons
> added to this list. It may come down to this. CR only works for a data set
> that is carefully and meticulously designed from scratch to meet the
> extremely rigorous demands of the method. For simplicity, I'll call this a
> C-sample.


Such data would be worth gathering if it allowed us to infer causation from
correlation. Such research would be better than the thousands of junk
studies that now clutter the literature and waste research resources.
Apparently, research usually gets done so that people like you can say you
have published. Such incestuous behavior does not help the poor people of
the world who suffer while lazy scientists go to vacation conferences and
act superior to the little people who work.


>
> To prove or disprove that CR works with a C-sample, we would have to
> collect some data from scratch. That's too expensive and time consuming
> for me to do.

So we get to the point. You have a job where you are supposed to whip up
great results from trash. Your career hangs in the balance. You are too
lazy to collect data across years. You need something fast, and you are
not willing to cooperate with others to get it. Now imagine Darwin on HMS
Beagle. Imagine all those scientists who labored for years to solve
important problems. Those kinds of people see you for what you are.


> But by sharply limiting the number of real world data sets that we can
> apply CR to, I will be performing some service.


You have not done anything but seek cheap glory. You are not trying to
solve the problem of causal inference so that you can go out and help
people. You want to be the gunslinger who puts crazy Bill Chambers in his
place. The limitations of CR were acknowledged long ago. You have not even
caught up to 1991 yet. Try harder.

>
> And while I am still skeptical of simulations, Dr. Chamber's comments that
> CR works better for moderate rather than strong correlations is indeed
> supported by a simple simulation.


So I guess I did not lie about that then. Imagine what you would discover if
you checked out the other things I have said.


>
> Take the existing data set, and estimate the residual and predicted
> values. Recombine the residual and the predicted value by reweighting
> the residual by a factor of 10, 30, or 100. We get a data set that is
> similar to the original data set, but with much more error in the data.
> It appears that a correlation of 0.22 works better with CR than a
> correlation around 0.56 or 0.07.


Let's see, are you saying that creating y=x1+(3*x2) creates strange
results?
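The residual-reweighting construction from the R session quoted below can
be replayed in numpy. This is a sketch on simulated data (the dose/resp
values are made up here, not Steve's data set), showing only the
correlation side of the story; `corr.reg` itself is not reproduced:

```python
import numpy as np

rng = np.random.default_rng(3)
dose = np.linspace(0, 10, 50)
resp = 2 * dose + rng.standard_normal(50)   # near-perfect linear relation

# Least-squares fit, then split the response into predicted + residual
slope, intercept = np.polyfit(dose, resp, 1)
pred = slope * dose + intercept
resid = resp - pred

# Reweight the residual by 1, 10, 30, 100 as in the quoted session
r = {k: np.corrcoef(dose, pred + k * resid)[0, 1] for k in (1, 10, 30, 100)}
print({k: round(v, 2) for k, v in r.items()})
```

Inflating the residual leaves the fitted line untouched while multiplying
the error, so the correlation falls monotonically with the weight, just as
the quoted run of 0.99, 0.56, 0.22, 0.07 shows.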



>
> > pred.resp <- predict(lm(resp~dose))
> > resid.resp <- resid(lm(resp~dose))
> > resp1 <- pred.resp+10*resid.resp
> > resp2 <- pred.resp+30*resid.resp
> > resp3 <- pred.resp+100*resid.resp
> > cor(dose,cbind(resp,resp1,resp2,resp3))
>      resp     resp1     resp2      resp3
>  0.989066 0.5570023 0.2181726 0.06691709
> >
> > corr.reg(dose,resp)
>
>       D = 0.16
>  rde(y) = 0.06
>  rde(x) = -0.09
>    cc.y = 0.78
>    cc.x = 0.52
>
> > corr.reg(dose,resp1)
>
>       D = -0.05
>  rde(y) = -0.14
>  rde(x) = -0.09
>    cc.y = 0
>    cc.x = -0.02
>
> > corr.reg(dose,resp2)
>
>       D = -0.23
>  rde(y) = -0.32
>  rde(x) = -0.09
>    cc.y = 0.14
>    cc.x = 0.25
>
> > corr.reg(dose,resp3)
>
>       D = -0.08
>  rde(y) = -0.17
>  rde(x) = -0.09
>    cc.y = 0.21
>    cc.x = 0.35
>
> So where do we stand? I've pretty much decided that CR is useless for
> the data sets that I encounter in my job. There may be some real data
> sets out there where CR works, but I'm starting to lose hope. Trimming
> is a waste of time, in my opinion. If it works at all, it only overcomes
> one of the many limitations of CR. Trimming a data set won't remove any
> hidden confounders, for example.


Steve, you are whining again. Trimming works. The only way to remove
confounded variables is to create better measures. That takes work, whether
you use CR or any other statistic.



>
> Testing CR on a C-sample might be worthwhile, but we'd have to think of an
> experiment we could design that wouldn't cost a lot of money or take a lot
> of time.


Did they teach you in science to only pursue cheap and easy challenges?

Bill


>
> Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.
> The STATS web page has moved to
> http://www.childrens-mercy.org/stats
> .
> .
> =================================================================
> Instructions for joining and leaving this list, remarks about the
> problem of INAPPROPRIATE MESSAGES, and archives are available at:
> .                  http://jse.stat.ncsu.edu/                    .
> =================================================================


