Re: Unbaisedness and Variance - Regression

Jay Warner Mon, 29 Sep 2003 19:03:11 -0700

Now we're getting into it - seriously!

Eric Bohlman wrote:


> [EMAIL PROTECTED] (Jay Warner) wrote in news:[EMAIL PROTECTED]:
>
> > "correlation" says that when we observe a change in one variable, we
> > see a consistent change in the other.  If the correlation is positive,
> > then both go up together and down together.  If the correlation is
> > negative, then as one goes up, the other goes down, and vice versa.
>
> Not quite.  Correlation specifically implies that the average of one
> variable is proportional to the value of the other.  There are
> relationships that meet your definition but show little correlation.

"Average"?

Suppose I have a set of paired data, say 20 pairs, x(i) and y(i).  I plot
them on a 2-D field, a standard scatter plot.

If the 'cloud' of points forms a lens tipped upward toward the right, I have a
positive correlation, true?
And if the cloud is a lens tipped downward to the right, I have a negative
correlation.  True?
And if the cloud is a horizontal lens, or a circular shape, the correlation
will, no doubt, be near 0, true?

(Sorry I can't draw the pictures here.)

I see nothing in here that involves the average of either set of data.  In
fact, I believe 'correlation' doesn't care about the average of either set.

True?
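
A quick sketch to check the narrower claim, that shifting the overall average
of either data set leaves the correlation unchanged.  The data values and the
helper name pearson_r are made up for illustration.  Note this only concerns
the overall means of x and y; the average of y at a given value of x is a
different quantity and is not tested here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up paired data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 4.2, 3.8, 5.5, 6.1, 6.8, 8.3]

r_original = pearson_r(x, y)

# Shift both variables: the overall averages move, the shape of the cloud
# does not, so the correlation should agree up to rounding error.
x_shifted = [v + 100.0 for v in x]
y_shifted = [v - 50.0 for v in y]
r_shifted = pearson_r(x_shifted, y_shifted)

print(r_original, r_shifted)
```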

Nonetheless, for the case of a positive correlation, a higher x(i) is
associated with a higher y(i).

Where is the logic off here?
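
One possible illustration of Eric's remark, as a sketch with made-up data: a
relationship in which y always rises when x rises, yet the (Pearson)
correlation coefficient comes out well below 1 because the relationship is far
from a straight line.  The exponential curve is chosen purely as an example,
and pearson_r is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# 20 pairs: y goes up every time x goes up, but along a steep curve.
x = [float(i) for i in range(20)]
y = [math.exp(v) for v in x]

# Confirm "both go up together" holds for every consecutive pair.
always_increasing = all(b > a for a, b in zip(y, y[1:]))

r = pearson_r(x, y)
print(always_increasing, r)
```

The coefficient measures how close the cloud is to a straight line, so a
consistent but strongly curved relationship can score much lower than a noisy
linear one.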

> >
> > "dependence"  indicates that variable B is controlled, or is caused
> > by, variable A.  the question of what is 'causality' takes up more
> > space than the internet has available.
>
> Specifically, dependence means that if you know the value of B, you can
> make a better guess at the value of A than if you didn't.

This definition of 'dependence' is less stringent than what I was thinking of
as 'cause.'  I stand corrected.  Yes, if I go to Oldenburg, in the years in
question, and count the number of storks, I can predict the number of people.
True.  If I now go to Oldenburg in this year, and the relationships of the
variables are still valid, then I could still predict the human population.

But if I go around the city and shoot half of the storks (which I assure you,
I don't intend to do), the human population of the city would not go down
accordingly.

In the case of storks and people, and other such 'dependence' cases, I believe
the reason is that one or more undisplayed variables in fact build the
causal link between the two observed variables.  If the population of
Oldenburg was reduced by warfare in World War II, I'm sure the stork
population declined with it.  When we exercise the undisplayed (often called
'hidden') variables, then we see the effects that result from 'causes.'
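
A minimal simulation of the hidden-variable idea, under loudly stated
assumptions: a made-up hidden variable ("size" of the town) drives both the
stork count and the human population, and people do not depend on storks at
all.  All coefficients, noise levels, and names (simulate, stork_factor) are
invented for illustration; none come from Box, Hunter & Hunter.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(rng, n, stork_factor=1.0):
    """Assumed model: a hidden 'size' drives storks and people separately.

    stork_factor < 1 mimics an intervention that removes storks directly.
    """
    storks, people = [], []
    for _ in range(n):
        size = rng.uniform(0.0, 1.0)                       # hidden variable
        storks.append(stork_factor * (100.0 * size + rng.gauss(0.0, 5.0)))
        people.append(50000.0 * size + rng.gauss(0.0, 2000.0))
    return storks, people

# Observational data: storks and people move together through 'size'.
storks, people = simulate(random.Random(0), 200)
r_observed = pearson_r(storks, people)

# Intervention: same random draws, but half the storks are removed.
storks_shot, people_after = simulate(random.Random(0), 200, stork_factor=0.5)

print(r_observed)
print(people_after == people)   # people are untouched by the intervention
```

Under this assumed model, knowing the stork count helps predict the population
(dependence), while changing the stork count changes nothing (no causation).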

> > In Box, Hunter & Hunter is a plot of the human population of
> > Oldenburg, Germany, against the number of nesting storks for a certain
> > time period.  It "proves" that storks bring human babies, since more
> > storks means more people. Does the human population "depend" on the
> > stork population?  I don't think so.  Is the human population
> > correlated with the stork population?  Yup.
>
> But technically, that *is* a relationship of dependence.  Knowing the stork
> population helps you estimate the human population.  That in no way implies
> any sort of causal relationship; all it implies is that the joint
> distribution of human population and stork population isn't the same for
> different marginal values of stork populations.  Dependence != causality.
> Dependence is not an inherently asymmetric relationship.

Got it!  This looks suspiciously like a statement, in proper terminology, of
what I was attempting to get at above.

So can we say that

'correlation' indicates a mutual dependence between two variables, and
neither correlation nor dependence indicates a causal relationship.

?

In order to show a causal relationship, we would have to go to a DoE, where we
consciously set factor levels, and watch for changes in response.  OR, do
something like Shainin's "Big Red X" scene, where you make a change, see the
problem go away, then make the change back and see if it reappears.  If it
goes away and comes back 'under control,' then we say there is a causal
relationship, and we can fix the original problem.
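
The change / change-back check can be sketched as a small simulation.
Everything here is hypothetical: the process function run_process, the defect
levels, the size of the effect, and the noise are invented to show the logic,
not taken from any real process or from Shainin's published method.

```python
import random

def run_process(change_applied, rng):
    """Hypothetical process: defect level for one batch.

    Assumed for illustration: the change truly lowers defects by 4 units,
    with random batch-to-batch noise on top.
    """
    base_level = 10.0
    effect = -4.0 if change_applied else 0.0
    return base_level + effect + rng.gauss(0.0, 1.0)

def mean(values):
    return sum(values) / len(values)

rng = random.Random(1)
runs_per_phase = 30

# Phase 1: original condition.  Phase 2: make the change.
# Phase 3: make the change back and see whether the problem reappears.
baseline = [run_process(False, rng) for _ in range(runs_per_phase)]
changed = [run_process(True, rng) for _ in range(runs_per_phase)]
reverted = [run_process(False, rng) for _ in range(runs_per_phase)]

drop_when_changed = mean(baseline) - mean(changed)
rise_when_reverted = mean(reverted) - mean(changed)

print(drop_when_changed, rise_when_reverted)
```

Because we set the factor ourselves, a response that follows it down and back
up is hard to attribute to a hidden variable that merely happened to move at
the same time.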

Make sense?

Cheers,
Jay
--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
4444 North Green Bay Road
Racine, WI 53404-1216
USA

Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com

The A2Q Method (tm) -- What do you want to improve today?




