Steve,

Actually, what you have done is shown how little you understand mathematics and how quickly you are willing to turn from an interesting issue to sarcasm. You never bothered to read my publications, and now you act as though I am sticking qualifications on at the last minute. That is fraudulent rhetoric on your side.

I am suggesting that much of the data in correlational research is junk, and I am referencing the finest traditions in statistics to make my case. You are saying this is impractical and using logical inconsistencies to cast doubt on my logic. For example, the problem with convenience samples is that they tend to be normally distributed. The convenience in itself is not an issue at all; the normal sampling of causes is the issue. You are mischaracterizing my arguments.

As for confounding: all statistics have to face this problem. ALL. Traditional statistics tell us so little about the data that the absurdities of confounding are not clear to people like you. The confounding is in the data, not in CR. CR just underscores the craziness of the data.
"Simon, Steve, PhD" <[EMAIL PROTECTED]> wrote in message E7AC96207335D411B1E7009027FC284902A9B2D4@EXCHANGE2">news:E7AC96207335D411B1E7009027FC284902A9B2D4@EXCHANGE2... > We are making quite a bit of progress. We are finding more and more evidence > that CR does not work with real world data sets. The following types of > data, according to Dr. Chambers, are not good for CR. > > 1. Convenience samples. That pretty much eliminates the ability of CR to > work in any study that requires informed consent, because restricting your > sample to only those who give informed consent makes your sample a > convenience sample. This is a down right stupid comment. You can get permissiom from people with sufficient characteristics to fill out a factorial design. Experimentalists do it all the time. I have done it. The reason you are not doing it because you do not have enough subjects in your hospital to collect factorial data. You are trying to revolutionize medicine in a tea pot, the cheap and easy way. If you want to be a real scientist, get the data you need to make legitimate inferences. Your problem is that you are thinking for profit and ego instead of trying to solve real problems. You have to experience reality to master it. Grow up. >It eliminates data sets where the study is restricted to > a single hospital rather than a representative sample of hospitals. It > eliminates data sets where we need financial inducements to get people to > participate. This is total nonsense. You can pay the people who participate in a factorial design. > > 2. Data sets with confounders. That pretty much eliminates any Epidemiology > data set. And we'll never be able to use CR to understand the environmental > and hereditary causes of cancer, because cancer data has too many > confounders. All statistical studies have confounds. The challenge that real scientists embrace is to do sufficient research to render improbable what ever confounds are there. 
You found a strange pattern with bladder cancer, and rather than investigate confounds or Type II errors, you blow yourself up, get sarcastic, overgeneralize, and act childish.

> 3. Data sets where the correlation is too high. This eliminates a lot of the
> physical sciences. I know in Chemistry, that they get disappointed if the
> correlation in their calibration experiments is not at least 0.98.

If you had done your homework you would have known about this high-correlation limitation a long time ago. It is extremely clear in my 1991 paper. Did you read it? I have also argued this point in this thread in recent weeks. Perfect correlations almost never happen in the behavioral sciences. We are looking to those drug studies as experiments to take us a small step beyond simulations. Find drug studies that use two-way ANOVAs without interactions. We would not be using CR with drug studies ordinarily, because we have experiments.

> Now I wonder how you would design an experiment to keep the correlation from
> being too high? I suppose you could deliberately be sloppy and hope that
> this introduces extra error into the process.

To lower the correlation between any particular cause and the effect, simply cross two or more causes as in a factorial design. They will share the determination of the effect, and the correlation will go down.

> In medical applications, we are totally without hope. Birth weight and
> gestational age are highly correlated, and we can do nothing to remove this
> correlation. We can't command mothers to have 4000 gram/26 week babies or
> 500 gram/38 week babies. It just isn't going to happen.

Stop whining. Correlations between .30 and .90 work. Cross gestational age and mother's weight. Everything in life is not as simple as you want it to be.

> Most statisticians are delighted to get a very high correlation. The
> strength of the correlation is one of the nine conditions that Hill set out
> in 1965 to establish a cause and effect relationship.
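The crossing point Chambers makes above is easy to check numerically. This is a generic Python/NumPy sketch, not his CR software: when two independent causes jointly determine the effect, each cause's correlation with the effect drops from 1 to roughly 1/sqrt(2).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

x1 = rng.normal(size=n)        # cause 1
x2 = rng.normal(size=n)        # cause 2, crossed (independent of x1)

y_single = x1                  # effect determined by one cause alone
y_crossed = x1 + x2            # effect shared between the two causes

r_single = np.corrcoef(x1, y_single)[0, 1]
r_crossed = np.corrcoef(x1, y_crossed)[0, 1]

print(round(r_single, 2))      # perfect correlation with a lone cause
print(round(r_crossed, 2))     # about 0.71, i.e. 1/sqrt(2)
```

With k equally weighted independent causes the per-cause correlation falls to about 1/sqrt(k), which is the mechanism that pulls the correlation down into the moderate range.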
I do not care in the least what delights most statisticians, since most that I have met are pretty close to stupid. Your language is pure sophistry. I have made it very clear that perfect correlation is confounding. You hear what you want to hear. That is not very mature.

> 4. You have to have enough data at the extremes. We might be able to fix
> this if we trim the data, but this has been shown to work only in
> simulations.

Yes, you must sample the causes so that the pattern is factorial. We are interested in the combinations of the causes across their ranges. If the data is not there, only a sophist would pretend that it is.

> And you have to be careful what you remove. If you remove the data by
> trimming the edges, that works, according to Dr. Chambers.

Try it yourself. I spent days coaching you on how to do this five-minute procedure. It works.

> But if you remove
> the data by creating evenly spaced bins on a rectangular grid and then
> selecting the first observation to fall in each bin, then that makes CR
> worse, according to Dr. Chambers.

Steve, stop whining. Evenly spaced subsamples only replicate the distribution you start with. Even some of your buddies on this newslist have admitted that. I am not making this up. You are talking like a kid in high school who gets annoyed because algebra is difficult. Stop it, and do not waste people's time trying to look clever, only to back out when you get in over your head.

> I have very little faith in the trimming approach. Selectively removing data
> values based on their extremities is asking for trouble. It will create all
> sorts of artefactual problems. And it will do nothing to fix all the other
> problems listed in this email.

What problems? I have already indicated to you that it causes truncation effects, but these could be fixed with range-restriction corrections. Tell me, if you cut a line, do the remaining parts cease being a line? That is all trimming does. It makes a clean cut across a line.
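The "cut a line" claim about trimming can be illustrated with a standard range-restriction demonstration (again Python/NumPy for illustration; this is not the CR trimming procedure itself): cutting the edges of a linear relationship leaves the regression slope essentially intact while attenuating the correlation, which is exactly the truncation effect that range-restriction corrections address.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

x = rng.normal(size=n)
y = x + rng.normal(scale=0.5, size=n)   # a noisy linear relationship

# "Cut the line": keep only the middle of the x range.
keep = np.abs(x) < 1.0
xt, yt = x[keep], y[keep]

slope_full = np.polyfit(x, y, 1)[0]
slope_trim = np.polyfit(xt, yt, 1)[0]
r_full = np.corrcoef(x, y)[0, 1]
r_trim = np.corrcoef(xt, yt)[0, 1]

# The slope survives trimming; the correlation shrinks (range restriction).
print(round(slope_full, 2), round(slope_trim, 2))
print(round(r_full, 2), round(r_trim, 2))
```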
You are calling forth all sorts of superstitions, saying the airplane can never fly. Stop arguing by innuendo. Be explicit. Where did you get your degree? You want to be a big scientist, work for it.

> 5. If you have two possible causes, you need to sample so that all four
> corners of the square are filled. If you have three possible causes, you
> need to sample so that all eight corners of the cube are filled.

Yes, Steven, science is hard stuff, and if you want to know about complicated things you have to do your homework and consider all the possibilities. That's hard work, too much hard work for the sort of people who buy their degrees these days.

> 6. You have to have a linear relationship. Transforming the data so that the
> relationship becomes linear may or may not work.

Yes. So what? All correlations assume linearity.

> There are other restrictions that I believe should be added to the list.

What are they?

> 7. Heteroscedasticity will almost certainly ruin CR.

Show us why. Could it be because heteroscedasticity is a product of sloppy measures? Yes.

> 8. CR does not work well with a discrete cause variable. It fails miserably
> when the cause variable is binary. I suspect it will perform poorly when the
> cause variable only has three or four levels.

Show us. It certainly fails when the effect is dichotomous, but this is because dichotomous effects represent a degradation of information.

> 9. We can also rule out any data set where the effect is binary. We can
> never use CR to establish causes of mortality for example, because there is
> no middle category for mortality.

Have I mentioned the minigroup method? I have described it many times before. Create sampling units composed of individuals. If you have 1000 cases, create 100 units with 10 people each. Count the mortalities. Do CR on this count.

> As additional real data sets fail, I'm sure we will see additional reasons
> added to this list. It may come down to this.
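The minigroup aggregation Chambers describes (1000 cases into 100 units of 10, then count deaths per unit) can be sketched as follows. Only the aggregation step is shown, in Python/NumPy; the post does not say how the groups are formed, so grouping by similar values of the cause is an assumption here, and the CR analysis itself is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)

# 1000 individual binary outcomes (1 = death), with mortality risk
# increasing in some cause variable.
n, group_size = 1000, 10
cause = rng.uniform(size=n)
died = (rng.uniform(size=n) < 0.1 + 0.5 * cause).astype(int)

# Assumption: form each minigroup from cases with similar cause values,
# then count the deaths within each group of 10.
order = np.argsort(cause)
cause_groups = cause[order].reshape(-1, group_size).mean(axis=1)
death_counts = died[order].reshape(-1, group_size).sum(axis=1)

# The binary effect is now a count on an 11-point scale (0-10 per unit),
# so it is no longer dichotomous.
print(death_counts.shape)
print(round(np.corrcoef(cause_groups, death_counts)[0, 1], 2))
```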
> CR only works for a data set
> that is carefully and meticulously designed from scratch to meet the
> extremely rigorous demands of the method. For simplicity, I'll call this a
> C-sample.

Such data would be worth gathering if it allowed us to infer causation from correlation. Such research would be better than the thousands of junk studies that now clutter the literature and waste research resources. Research apparently usually gets done so people like you can say you have published. Such incestuous behavior does not help the poor people in the world who suffer while lazy scientists go to vacation conferences and act superior to the little people who work.

> To prove or disprove that CR works with a C-sample, we would have to collect
> some data from scratch. That's too expensive and time consuming for me to
> do.

We get to the point. You have a job where you are supposed to whip up great results from trash. Your career hangs in the balance. You are too lazy to collect data across years. You need something fast, and you are not willing to cooperate with others to get it. Now imagine Darwin on HMS Beagle. Imagine all those scientists who labored for years to solve important problems. Those kinds of people see you for what you are.

> But by showing how sharply limited the number of real world data sets that
> we can apply CR to is, I will be performing some service.

You have not done anything but seek cheap glory. You are not trying to solve the problem of causal inference so that you can go out and help people. You want to be the gunslinger who put crazy Bill Chambers in his place. The limitations of CR were acknowledged long ago. You have not even gotten caught up to 1991 yet. Try harder.

> And while I am still skeptical of simulations, Dr. Chambers' comments that
> CR works better for moderate rather than strong correlations are indeed
> supported by a simple simulation.

So I guess I did not lie about that, then.
Imagine what you would discover if you checked out the other things I have said.

> Take the existing data set, and estimate the residual and predicted values.
> Recombine the residual and the predicted value by reweighting the residual
> by a factor of 10, 30, or 100. We get a data set that is similar to the
> original data set, but with much more error in the data. It appears that a
> correlation of 0.22 works better with CR than a correlation around 0.56 or
> 0.07.

Let's see, are you saying that creating y = x1 + (3*x2) creates strange results?

> > pred.resp <- predict(lm(resp~dose))
> > resid.resp <- resid(lm(resp~dose))
> > resp1 <- pred.resp+10*resid.resp
> > resp2 <- pred.resp+30*resid.resp
> > resp3 <- pred.resp+100*resid.resp
> > cor(dose,cbind(resp,resp1,resp2,resp3))
>      resp     resp1     resp2      resp3
>  0.989066 0.5570023 0.2181726 0.06691709
>
> > corr.reg(dose,resp)
> D = 0.16
> rde(y) = 0.06
> rde(x) = -0.09
> cc.y = 0.78
> cc.x = 0.52
>
> > corr.reg(dose,resp1)
> D = -0.05
> rde(y) = -0.14
> rde(x) = -0.09
> cc.y = 0
> cc.x = -0.02
>
> > corr.reg(dose,resp2)
> D = -0.23
> rde(y) = -0.32
> rde(x) = -0.09
> cc.y = 0.14
> cc.x = 0.25
>
> > corr.reg(dose,resp3)
> D = -0.08
> rde(y) = -0.17
> rde(x) = -0.09
> cc.y = 0.21
> cc.x = 0.35
>
> So where do we stand? I've pretty much decided that CR is useless for the
> data sets that I encounter in my job. There may be some real data sets out
> there where CR works, but I'm starting to lose hope. Trimming is a waste of
> time, in my opinion. If it works at all, it only overcomes one of the many
> limitations of CR. Trimming a data set won't remove any hidden confounders,
> for example.

Steve, you are whining again. Trimming works. The only way to remove confounded variables is to create better measures. That takes work, whether you use CR or any other statistic.
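Steve's residual-reweighting trick (rescale the OLS residuals by 10, 30, or 100 to manufacture progressively noisier versions of the same data) can be reproduced generically. This Python/NumPy sketch uses simulated dose/response data, not his actual data set, and does not reproduce the corr.reg diagnostics; it only shows how the correlation decays as the residuals are inflated.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

dose = np.linspace(0, 10, n)
resp = 2.0 * dose + rng.normal(scale=0.5, size=n)   # nearly deterministic response

# Ordinary least squares fit: predicted values and residuals.
slope, intercept = np.polyfit(dose, resp, 1)
pred = intercept + slope * dose
resid = resp - pred

# Reweight the residuals to inject more error, as in the R session above.
rs = {}
for k in (1, 10, 30, 100):
    noisy = pred + k * resid
    rs[k] = np.corrcoef(dose, noisy)[0, 1]
    print(k, round(rs[k], 2))
# The correlation falls off qualitatively like the quoted
# 0.99 -> 0.56 -> 0.22 -> 0.07 sequence.
```

Because OLS residuals are orthogonal to the regressor, the correlation here is exactly sd(pred)/sqrt(sd(pred)^2 + k^2 sd(resid)^2), so it is strictly decreasing in k.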
> Testing CR on a C-sample might be worthwhile, but we'd have to think of an
> experiment we could design that wouldn't cost a lot of money or take a lot
> of time.

Did they teach you in science to only pursue cheap and easy challenges?

Bill

> Steve Simon, [EMAIL PROTECTED], Standard Disclaimer.
> The STATS web page has moved to
> http://www.childrens-mercy.org/stats
>
> =================================================================
> Instructions for joining and leaving this list, remarks about the
> problem of INAPPROPRIATE MESSAGES, and archives are available at:
> http://jse.stat.ncsu.edu/
> =================================================================
