On 24 Feb 2000, Victor Aina wrote:
> I was wondering if anyone might have an opinion
> about the impact of subtracting a constant e.g.
> the mean of a variable from the regressor that
> happens to be collinear with another one.
Depends on the variables and their interrelationships. If (say) X1 is a
variable ranging between 50 and 90, and X2 = X1**2, X2 will appear to be
collinear with X1 because the values are so far out on the parabola that
the curve is virtually indistinguishable from a straight line. (With
only finite precision in the data, you can omit the "virtually" if the
range is sufficiently far from zero: 550 to 590, or 7050 to 7090, or
... .) If you subtract the mean from X1, and then construct X2 as
(X1 - mean)**2, the variables no longer appear collinear; indeed, if X1
is symmetrical, the correlation is zero between X1 and this X2.
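
A small numerical sketch of this point (my own illustration, not part of
the original post; it assumes Python with numpy and made-up data uniform
on 50 to 90):

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(50, 90, size=200)   # range far from zero, as in the example
x2 = x1 ** 2                         # quadratic term built from the raw X1

x1c = x1 - x1.mean()                 # center X1 first ...
x2c = x1c ** 2                       # ... then square

print(np.corrcoef(x1, x2)[0, 1])     # ~ 0.999: the "spurious" collinearity
print(np.corrcoef(x1, x2c)[0, 1])    # near zero when X1 is roughly symmetric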
More generally, one can always derive from X2 the part of it that
is orthogonal to X1; and the part of X3 that is orthogonal to X1 and
X2; and so on. For these variables, the variance inflation factors are
all unity (equivalently, the tolerances, which are the reciprocals of the
variance inflation factors, are all unity).
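
Here is a sketch of that residualization, again my own Python/numpy
illustration with invented data: the part of X2 orthogonal to X1 is
simply the residual from regressing X2 on X1 and a constant.

import numpy as np

rng = np.random.default_rng(1)
x1 = rng.uniform(50, 90, size=200)
x2 = x1 ** 2

# Regress X2 on a constant and X1; the residual is the part of X2
# that is orthogonal to X1 (and to the constant).
A = np.column_stack([np.ones_like(x1), x1])
coef, *_ = np.linalg.lstsq(A, x2, rcond=None)
x2_orth = x2 - A @ coef

r = np.corrcoef(x1, x2_orth)[0, 1]   # essentially zero
print(r, 1.0 / (1.0 - r ** 2))       # VIF for X1 against X2_orth: 1 (tolerance also 1)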
> A little algebra demonstrates that for least
> squares regression, only the intercept term
> changes when a constant is subtracted from a variable.
> The other slope coefficients remain unchanged.
The precision of the coefficients may be changed, however.
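
The quoted algebra is easy to check numerically; a hedged sketch in
Python/numpy, with invented data and a made-up true model, follows:

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(50, 90, size=200)
y = 3.0 + 0.5 * x + rng.normal(0.0, 1.0, size=200)

def ols(pred, resp):
    # Least-squares fit of resp on a constant and pred; returns [intercept, slope].
    X = np.column_stack([np.ones_like(pred), pred])
    return np.linalg.lstsq(X, resp, rcond=None)[0]

b_raw = ols(x, y)
b_ctr = ols(x - x.mean(), y)
print(b_raw[1], b_ctr[1])                         # slopes are identical
print(b_raw[0] + b_raw[1] * x.mean(), b_ctr[0])   # intercept absorbs slope * mean(X)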
> It will be nice to know how prediction/forecasting
> is affected. Surely the condition index falls.
> Is collinearity masked in some way?
Depends on whether it's "real" or "spurious" collinearity. (The apparent
collinearity between X and X**2 when the range of X is far from zero is
what I call "spurious collinearity", arising not from the relationship
implied by squaring but from the restriction of range.)
> Are the coefficients more efficient (in terms of variance) after
> subtracting the mean?
Again, it depends on what else is going on. If you first constructed X2
as the square of an X1 suffering from the kind of restricted range
described above, and then subtracted the mean of X1 from X1 without
modifying X2 (or even if you also subtracted the mean of X2 from X2), the (X1, X2)
correlation will not be changed, the apparent collinearity will still be
present, the coefficients will still have the same precision (or
imprecision) as for the unmodified variables. Only subtracting the mean
from X1 (or performing some other linear transformation on X1) before
constructing other predictors that are logically dependent on X1 will
have any useful effect.
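
To make the contrast concrete, here is a sketch (my illustration,
Python/numpy, invented data) of centering only after X2 has been built
from the raw X1; the correlation, and hence the apparent collinearity,
is untouched:

import numpy as np

rng = np.random.default_rng(3)
x1 = rng.uniform(50, 90, size=200)
x2 = x1 ** 2                          # built from the raw, uncentered X1

r_before = np.corrcoef(x1, x2)[0, 1]
r_after = np.corrcoef(x1 - x1.mean(), x2 - x2.mean())[0, 1]
print(r_before, r_after)              # identical: centering afterwards changes nothing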
> What happens if we have a nonlinear regression model
> such as logistic regression etc.
Well, in logistic regression (so far as I understand it -- I've not been
a practitioner of it) the nonlinearity lies in the link between the
predictors and the response, not among the predictors themselves; so I
should think the above comments, which have only to do with (apparent)
collinearity among the predictors, would apply in this case as well.
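
One rough way to see this, assuming the question is about the predictor
(design) matrix only: its conditioning is the same whatever the link
function, so centering before squaring helps a logistic design just as
it helps a linear one. (Sketch in Python/numpy with invented data.)

import numpy as np

rng = np.random.default_rng(4)
x1 = rng.uniform(50, 90, size=200)

raw = np.column_stack([np.ones_like(x1), x1, x1 ** 2])
ctr = np.column_stack([np.ones_like(x1), x1 - x1.mean(), (x1 - x1.mean()) ** 2])

# Condition numbers of the two design matrices: the same matrices would be
# handed to a logistic (or any other GLM) fitter, so the collinearity
# diagnosis does not depend on the model's nonlinearity in the response.
print(np.linalg.cond(raw), np.linalg.cond(ctr))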
-- DFB.
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128