In article <[EMAIL PROTECTED]>,
EAKIN MARK E <[EMAIL PROTECTED]> wrote:
>I started a multiple regression using a dependent variable whose mean was
>zero and four independent variables. I created four more dependent variables
>by adding 10, 100, 1000, and 10000 to the first dependent variable. I
>expected the r-square of the no-intercept model to always increase, since
>the model is explaining why y differs from zero, but after initially
>increasing the r-square started to decrease again.
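The numbers are easy to reproduce. Here is a minimal sketch in
Python (assuming numpy; the data are simulated, so the exact
r-squares will differ from yours, but it shows how the no-intercept
r-square behaves as the shift grows):

  import numpy as np

  rng = np.random.default_rng(0)
  n = 100
  X = rng.normal(loc=0.5, size=(n, 4))     # four predictors, nonzero means
  z = X @ np.ones(4) + rng.normal(size=n)  # d.v. related to the predictors
  z -= z.mean()                            # center it: mean zero, like yours

  for c in [0, 10, 100, 1000, 10000]:
      y = z + c                                  # shifted dependent variable
      b, *_ = np.linalg.lstsq(X, y, rcond=None)  # no-intercept least squares
      yhat = X @ b
      r2 = (yhat @ yhat) / (y @ y)               # uncentered r-square
      print(c, r2)

Whether and where the r-square turns over depends on the draw, as
the analysis below makes precise.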
Consider a simplified model with only one predictor.
Let x be the (column) n-vector of scores on the predictor.
Wlog, let x'x = 1.
Let the dependent vector be y = z + c*u,
where c is an arbitrary scalar constant,
u is a vector whose elements are all sqrt(1/n),
and z is the vector of deviations from the mean on the "real" d.v.
Wlog, let z'z = 1.
Note that u'u = 1, z'u = 0, and the mean of y is c/sqrt(n).
The squared uncentered correlation of x with y is

  r^2 = (x'y)^2 / [(x'x)(y'y)] = (x'z + c*x'u)^2 / (1 + c^2),

since x'x = 1 and y'y = z'z + 2c*z'u + c^2*u'u = 1 + c^2.
For c near zero, r^2 approximates (x'z)^2,
the squared uncentered correlation of x with z.
As c gets very large, r^2 approaches (x'u)^2,
the squared uncentered correlation of x with u.
In between, when x'z and x'u have the same sign, r^2 rises to its
maximum of (x'z)^2 + (x'u)^2 at c = x'u/x'z and then falls back
toward (x'u)^2; this is exactly the rise-then-fall you observed.
Similar relations hold when there are multiple predictors.
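A quick numerical check of this identity and its two limits (again a
sketch assuming numpy; the vectors are constructed to satisfy the
normalizations above):

  import numpy as np

  rng = np.random.default_rng(1)
  n = 50
  x = rng.normal(size=n)
  x /= np.linalg.norm(x)            # x'x = 1
  u = np.full(n, np.sqrt(1.0 / n))  # u'u = 1
  z = rng.normal(size=n)
  z -= z.mean()                     # z'u = 0
  z /= np.linalg.norm(z)            # z'z = 1

  a, b = x @ z, x @ u               # a = x'z, b = x'u
  for c in [0.0, 0.5, b / a, 10.0, 1e4]:  # b/a is where r^2 peaks
      y = z + c * u
      lhs = (x @ y) ** 2 / ((x @ x) * (y @ y))
      rhs = (a + c * b) ** 2 / (1 + c ** 2)
      print(c, lhs, rhs)            # the two columns agree

At c = b/a the printed r^2 equals a^2 + b^2, the maximum noted above.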