Hi Don,
There are times when I realise the rust that has accumulated, and this is one
of them.
Changing the order of things a little: you (and D&S) are of course quite
correct that X variables are typically correlated, and that if they are
not, the coefficients are the same as if a set of simple regressions had
been carried out. Coincidentally, I was pointing this out to a class a
couple of days ago - but the class is 'not mathematically able', like most
these days, so the explanation was, of course, not at all technical.
Rust......
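(To make that concrete for myself, here is a quick sketch in Python/numpy
- my own illustration, not anything from your message or from D&S. With
centred, mutually orthogonal predictors, the multiple regression
coefficients coincide with the separate simple regression slopes:)

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    # Centre both, then orthogonalise x2 against x1 so corr(x1, x2) = 0.
    x1 -= x1.mean()
    x2 -= x2.mean()
    x2 -= (x1 @ x2 / (x1 @ x1)) * x1
    y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)
    y -= y.mean()

    # Multiple regression of y on (x1, x2) jointly...
    X = np.column_stack([x1, x2])
    b_multiple, *_ = np.linalg.lstsq(X, y, rcond=None)
    # ...versus the two simple regression slopes x_j'y / x_j'x_j.
    b_simple = np.array([x1 @ y / (x1 @ x1), x2 @ y / (x2 @ x2)])
    print(b_multiple)   # approximately [ 2, -1 ]
    print(b_simple)     # the same numbers, to rounding error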
With regard to correlation and collinearity: I have become used to
'explaining' collinearity to my classes only in terms of pairs of
explanatory variables, forgetting that a collinearity could involve a set
of three or more variables. This 'pair-wise no collinearity' is, as I
understand it, equivalent to 'no linear correlation'. That suggests,
incidentally, that 'not collinear' is stronger than 'uncorrelated' (not
*linearly* correlated), which doesn't agree with your statement - is this
so? It also suggests that 'collinearity' means more than just
'correlated'.
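(A toy sketch, again in Python/numpy and again my own illustration: three
variables whose pairwise correlations are all well short of 1, yet which
are exactly collinear as a set - so pair-wise checks alone can miss it:)

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    x3 = x1 + x2                       # exact linear dependency

    X = np.column_stack([x1, x2, x3])
    # Pairwise correlations: roughly 0, 0.71, 0.71 - none anywhere near 1.
    print(np.corrcoef(X, rowvar=False).round(2))
    # But the rank is 2, not 3: the set of three IS collinear.
    print(np.linalg.matrix_rank(X))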
A useful way of picturing the situation is that each variable corresponds
to an axis, with the angles between the axes determined by the correlation
coefficients. (For centred variables, I believe, the correlation
coefficient is exactly the cosine of the angle.) If two variables are
uncorrelated, their axes are orthogonal; if they are perfectly correlated,
the axes are identical. If one variable is an exact linear combination of
the others, the corresponding dimensions collapse onto a 'plane'. (This is
all happening in k dimensions.) This corresponds to the matrix X'X having
rank less than k (for k variables), and so leads (as I understand it) to
the collinearity problem.
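(The cosine fact and the rank collapse are easy to check numerically -
another sketch of my own in Python/numpy:)

    import numpy as np

    rng = np.random.default_rng(2)
    n = 100
    x = rng.normal(size=n)
    z = 0.6 * x + rng.normal(size=n)
    xc, zc = x - x.mean(), z - z.mean()

    # For centred vectors the sample correlation is the cosine of the
    # angle between them.
    cos_angle = xc @ zc / (np.linalg.norm(xc) * np.linalg.norm(zc))
    print(cos_angle, np.corrcoef(x, z)[0, 1])   # identical

    # An exact linear dependency drops the rank of X'X below k = 3.
    X = np.column_stack([xc, zc, xc + zc])
    print(np.linalg.matrix_rank(X.T @ X))       # 2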
In terms of the data, there is unlikely to be total collapse (just as a
sample correlation of exactly zero is highly unlikely), but you might get
near collapse. With only two variables, highly correlated, the two axes
are nearly indistinguishable; with three variables the data cloud is
squashed nearly flat - a very low 'hill' (this is difficult to describe!).
The problem then is to decide whether or not to exclude variables - is the
hill high enough to count as three variables, or so low that one variable
should be excluded?
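('How low is the hill' can be put in numbers. A sketch of my own, in
Python/numpy: the smallest eigenvalue of the correlation matrix measures
the height of the hill, and its ratio to the largest - the condition
number - flags near-collapse:)

    import numpy as np

    rng = np.random.default_rng(3)
    n = 200
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    # Nearly, but not exactly, linearly dependent on x1 and x2.
    x3 = x1 + x2 + 0.05 * rng.normal(size=n)

    R = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
    eigvals = np.linalg.eigvalsh(R)      # ascending order
    print(eigvals)                       # smallest one close to zero
    print(eigvals[-1] / eigvals[0])      # condition number: large = trouble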
I think I stand by my original observation, that *in the data* there is
always some evidence of collinearity/correlation; if this evidence is strong
enough you have to reduce it by reselecting the variables.
In your third paragraph you seem to be identifying collinearity with
correlation - more precisely, that the problems with collinearity are those
of correlation - and to a large extent identifying 'the trouble' that I spoke
of.
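(Your X versus X^4 example is easy to reproduce - one more sketch of my
own in Python/numpy, using the range 101 to 110 that you quoted:)

    import numpy as np

    x = np.linspace(101, 110, 50)
    x4 = x ** 4                  # not collinear with x, but nearly so here

    r = np.corrcoef(x, x4)[0, 1]
    print(r)                     # very close to 1
    vif = 1.0 / (1.0 - r**2)     # variance inflation factor
    print(vif, 1.0 / vif)        # huge VIF; tolerance = 1/VIF near zero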
Thanks for helping to chip off some of the rust. I know there is a lot
more.....
Regards,
Alan
"Donald F. Burrill" wrote:
> On Tue, 2 May 2000, Alan McLean wrote:
>
> > 'No collinearity' *means* the X variables are uncorrelated!
>
> This is not my understanding. "Uncorrelated" means that the correlation
> between two variables is zero, or that the intercorrelations among
> several variables are all zero. "Not collinear" means that there is not
> a linear dependency lurking among the variables (or some subset of them).
> "Uncorrelated" is a much stronger condition than "not collinear".
>
> > The basic OLS method assumes the variables are uncorrelated
> > (as you say).
>
> Not as presented in, e.g., Draper & Smith; who go to some trouble to
> show how one can produce from a set of correlated variables a set of
> orthogonal (= mutually uncorrelated) variables, and remark on the
> advantages that accrue if the X-matrix is orthogonal. But it is clear
> that they expect predictors to be correlated as a general rule.
>
> > In practice there is usually some correlation, but the estimates are
> > reasonably robust to this. If there is *substantial* collinearity you
> > are in trouble.
>
> If there is collinearity _at_all_ you are in trouble; further, if the
> correlations among some of the predictors are high enough (= close enough
> to unity), a computing system with finite precision may be unable to
> detect the difference between a set of variables that are technically not
> collinear but are highly correlated, and a set of variables that _are_
> collinear. (E.g., X and X^4 are not collinear; but if the range of X
> in the data is, say, 101 to 110, a plot of X^4 vs X will look very much
> like a straight line.) For this reason various safety features are
> usually built in to regression programs: variables whose tolerance value
> with respect to the other predictors is lower than a certain threshold
> (or whose variance inflation factor -- the reciprocal of tolerance -- is
> above a corresponding threshold) are usually excluded from an analysis;
> although it is often possible to override the system defaults if one
> thinks it necessary. The existence of such defaults is clear evidence
> that at least the persons responsible for system packages expected that
> variables would often have substantial intercorrelations.
>
> And if it were a requirement (= assumption) that predictors be
> uncorrelated, it would not be necessary to worry about inverting a pxp
> submatrix of predictors: the simple linear regression coefficient for
> predicting Y from X_j alone would be unaffected by the presence of other
> predictors in the model.
> -- Don.
> ------------------------------------------------------------------------
> Donald F. Burrill [EMAIL PROTECTED]
> 348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
> MSC #29, Plymouth, NH 03264 603-535-2597
> 184 Nashua Road, Bedford, NH 03110 603-471-7128
--
Alan McLean ([EMAIL PROTECTED])
Department of Econometrics and Business Statistics
Monash University, Caulfield Campus, Melbourne
Tel: +61 03 9903 2102 Fax: +61 03 9903 2007