To understand correlations, dependence and causality better, I suggest going to the source -- "Causality" by Judea Pearl. Good book. He goes into necessary and sufficient conditions, etc.
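For anyone who wants to poke at the distinctions argued over in the thread below, here is a minimal Python sketch. Everything in it is made up for illustration: the `pearson` and `town` helpers, the hidden `size` variable, and all the numbers. It is not the Box, Hunter & Hunter data. The first part builds y = x^2, so y depends completely on x yet should show almost no correlation. The second part lets a hidden town size drive both storks and people, then halves the storks to see whether the people change.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so a rerun gives the same numbers
N = 5000

# Part 1: dependence without correlation.
# y is completely determined by x, but the relationship is not linear.
x = [rng.uniform(-1.0, 1.0) for _ in range(N)]
y = [v * v for v in x]
r_nonlinear = pearson(x, y)


# Part 2: a hidden common cause (all numbers are invented).
def town(rng, cull=False):
    size = rng.uniform(1.0, 10.0)                   # hidden variable
    storks = 10.0 * size + rng.gauss(0.0, 3.0)
    if cull:
        storks /= 2.0                               # intervention on storks only
    people = 1000.0 * size + rng.gauss(0.0, 500.0)  # never reads `storks`
    return storks, people


observed = [town(rng) for _ in range(N)]
culled = [town(rng, cull=True) for _ in range(N)]

r_storks = pearson([s for s, _ in observed], [p for _, p in observed])
mean_people_observed = sum(p for _, p in observed) / N
mean_people_culled = sum(p for _, p in culled) / N

print(f"r(x, x^2)               = {r_nonlinear:+.3f}")
print(f"r(storks, people)       = {r_storks:+.3f}")
print(f"mean people, observed   = {mean_people_observed:.0f}")
print(f"mean people, storks cut = {mean_people_culled:.0f}")
```

Under these assumptions I would expect the first correlation to come out near zero, the second strongly positive, and the two population means about the same, since nothing in `town` lets the stork count feed into the people count. That is only a toy model of the hidden-variable argument, not evidence about real storks.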
P
_____________________________________
Pradyumna Sribharga Upadrashta, PhD Student
Scientific Computation, UofMN

>-----Original Message-----
>From: [EMAIL PROTECTED]
>[mailto:[EMAIL PROTECTED] On Behalf Of Jay Warner
>Sent: Monday, September 29, 2003 5:15 PM
>Cc: [EMAIL PROTECTED]
>Subject: Re: [edstat] Unbaisedness and Variance - Regression
>
>
>Now we're getting into it - seriously!
>
>Eric Bohlman wrote:
>
>> [EMAIL PROTECTED] (Jay Warner) wrote in news:[EMAIL PROTECTED]:
>>
>> > "correlation" says that when we observe a change in one variable,
>> > we see a consistent change in the other. If the correlation is
>> > positive, then both go up together and down together. If the
>> > correlation is negative, then as one goes up, the other goes down,
>> > and vice versa.
>>
>> Not quite. Correlation specifically implies that the average of one
>> variable is proportional to the value of the other. There are
>> relationships that meet your definition but show little correlation.
>
>"Average"?
>
>Suppose I have a set of paired data, say 20 pairs, x(i) and
>y(i). I plot them on a 2-D field, a standard scatter plot.
>
>If the 'cloud' of points forms a lens tipped upward toward the
>right, I have a positive correlation, true? And if the cloud
>is a lens tipped downward to the right, I have a negative
>correlation. True? And if the cloud is a horizontal lens, or
>a circular shape, the correlation will, no doubt, be near 0, true?
>
>(Sorry I can't draw the pictures here.)
>
>I see nothing in here that says the average of either set of
>data. In fact, I believe 'correlation' doesn't care about the
>average of either set.
>
>True?
>
>Nonetheless, for the case of a positive correlation, a higher
>x(i) is associated with a higher y(i).
>
>where is the logic off here?
>
>> >
>> > "dependence" indicates that variable B is controlled, or is caused
>> > by, variable A. the question of what is 'causality' takes up more
>> > space than the internet has available.
>>
>> Specifically, dependence means that if you know the value of B, you
>> can make a better guess at the value of A than if you didn't.
>
>This definition of 'dependence' is less stringent than what I
>was thinking of as 'cause.' I stand corrected. Yes, if I go
>to Oldenburg, in the years in question, and count the number
>of storks, I can predict the number of people. True. If I
>now go to Oldenburg in this year, and the relationships of the
>variables are still valid, then I could still predict the
>human population.
>
>But if I go around the city and shoot half of the storks
>(which I assure you, I don't intend to do), the human
>population of the city would not go down accordingly.
>
>In the case of storks and people, and other such 'dependence'
>cases, I believe the reason is that a third (or more)
>un-displayed variable in fact builds the causal link between
>the two observed variables. If the population of Oldenburg
>was reduced by warfare in W.W.II, I'm sure the stork
>population declined with it. When we exercise the
>un-displayed (often called 'hidden') variables, then we see the
>effects that result from 'causes.'
>
>> > In Box, Hunter & Hunter is a plot of the human population of
>> > Oldenburg, Germany, against the number of nesting storks for a
>> > certain time period. It "proves" that storks bring human babies,
>> > since more storks means more people. Does the human population
>> > "depend" on the stork population? I don't think so. Is the human
>> > population correlated with the stork population? Yup.
>>
>> But technically, that *is* a relationship of dependence. Knowing the
>> stork population helps you estimate the human population. That in no
>> way implies any sort of causal relationship; all it implies is that
>> the joint distribution of human population and stork population isn't
>> the same for different marginal values of stork populations.
>> Dependence != causality. Dependence is not an inherently asymmetric
>> relationship.
>
>got it! this looks suspiciously like a statement using proper
>terminology that I was attempting to get at above.
>
>So can we say that
>
>'correlation' indicates a mutual dependence between two
>variables, and neither correlation nor dependence indicates
>a causal relationship.
>
>?
>
>In order to show a causal relationship, we would have to go to
>a DoE, where we consciously set factor levels, and watch for
>changes in response. OR, do something like Shainin's "Big Red
>X" scene, where you make a change, see the problem go away,
>then make the change back and see if it reappears. If it goes
>away and comes back 'under control,' then we say there is a
>causal relationship, and we can fix the original problem.
>
>Make sense?
>
>Cheers,
>Jay
>--
>Jay Warner
>Principal Scientist
>Warner Consulting, Inc.
>4444 North Green Bay Road
>Racine, WI 53404-1216
>USA
>
>Ph: (262) 634-9100
>FAX: (262) 681-1133
>email: [EMAIL PROTECTED]
>web: http://www.a2q.com
>
>The A2Q Method (tm) -- What do you want to improve today?
>
>.
>.
>=================================================================
>Instructions for joining and leaving this list, remarks about
>the problem of INAPPROPRIATE MESSAGES, and archives are available at:
>.                  http://jse.stat.ncsu.edu/                    .
>=================================================================
