(Posted to Mark Eakin and to edstat.)
Found this while cleaning out a too-long-neglected INBOX. Did you ever
get a useful response? The only response I've retained was the one by
Radford Neal you subsequently quoted in a follow-up posting, and it
looked as though Radford hadn't understood what you were about.
Possibly I don't either, but here's a sort of heuristic try:
Step 1. Draw pictures. The 5-dimensional space you describe is hard to
depict in e-mail, so let me simplify the problem to a single predictor,
whose values are all positive numbers, and a single response variable,
whose mean is zero. I assume that the correlation between the two is
positive. (Use a monospaced font like Courier to view these diagrams.)
Case I is the original data, which look sort of like this:
      -                                             x
      -                                        x
      -                                   x
    0 +---------+---------+---------+x--------+---------+
      0         1         2     x   3         4         5   predictor
      -                    x
      -               x
where the x's trace the regression line fitted through the data without
forcing the intercept. Now with the intercept forced through (0,0), the
R-sq won't be as large as for the non-forced analysis, because the
forced regression line is essentially horizontal.
Case II. Now you lift all those x's by 10 points, so the dependent mean
is 10 instead of zero. The forced line now has a positive slope, the
agreement with the non-forced line is closer, and R-sq has increased.
      -                                             x
      -                                        x
      -                                   x
   10 +                              x
      -                         x
      -                    x
      -               x
      -
    0 +---------+---------+---------+---------+---------+
      0         1         2         3         4         5   predictor
Cases III, IV, etc. This continues, as larger values are added to the
dependent variable, until the non-forced line, extended, passes through
the origin; at that point R-sq is at (or at least near) its maximum.
Beyond this point, as the "data" are raised ever higher, the forced line
must grow ever steeper to reach them, its slope departing ever further
from the data's own slope, and R-sq decreases.
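Eakin's experiment is easy to replicate on toy numbers. Here is a sketch
in plain Python (the data are hypothetical, chosen so that y has mean
zero and rises with x, as in Case I); it computes the no-intercept R-sq
the way SAS and NCSS do for forced-through-origin models, against the
UNcentered total sum of squares:

```python
# Replicating the shifted-dependent-variable experiment on toy data.
def r2_through_origin(x, y):
    """R-sq for regression forced through (0,0): 1 - SSE / sum(y^2)."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    sse = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    return 1.0 - sse / sum(yi * yi for yi in y)

x = [1, 2, 3, 4, 5]
y = [-2, -1, 0, 1, 2]               # mean zero, positive correlation

for shift in (0, 1, 3, 10, 100, 1000):
    lifted = [yi + shift for yi in y]
    print(shift, round(r2_through_origin(x, lifted), 4))
```

On these numbers R-sq climbs from about 0.18 to exactly 1.0 at a shift
of 3 (where the lifted data fall exactly on the line y = x through the
origin), then falls back toward 9/11, about 0.82, as the shift grows:
the rise-then-fall Eakin observed.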
Were the predictor ALSO centered to start with, so that the unforced
line ran through the origin, the initial R-sq would have been the
maximum observed, and the subsequent modifications would have shown
ever smaller values of R-sq. Or so I think; with the predictor
centered, the forced slope Sum(xy)/Sum(x^2) cannot change when a
constant is added to y (since Sum(x) = 0), so the forced line simply
sits ever further below the lifted data, and R-sq should decline
steadily toward a limiting value of zero.
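A quick numeric check of the centered-predictor case, again on
hypothetical data (with the predictor centered, Sum(x) = 0, so the
forced slope Sum(x*y)/Sum(x*x) cannot change when a constant is added
to y):

```python
# No-intercept fit with a centered predictor: the slope is frozen,
# so lifting the data can only make the fit worse.
def forced_fit(x, y):
    """Forced-through-origin slope and R-sq (uncentered total SS)."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    sse = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, 1.0 - sse / sum(yi * yi for yi in y)

xc = [-2, -1, 0, 1, 2]              # centered predictor
y  = [-2, -1, 0, 1, 2]              # unforced line already hits (0,0)

for shift in (0, 10, 100, 1000):
    b, r2 = forced_fit(xc, [yi + shift for yi in y])
    print(shift, b, round(r2, 6))
```

The slope stays at 1 throughout, and R-sq falls from 1 toward zero; the
limiting value in this case is 0.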
Were the initial correlation negative, I _think_ the initial R-sq would
have declined as the "data" were lifted, until the forced line through
the origin went essentially flat, after which R-sq would increase
again, perhaps by quite a bit.
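That last speculation can also be checked numerically, on hypothetical
data: x positive, y with mean zero and NEGATIVE correlation with x.

```python
# The negative-correlation case: R-sq first falls, then recovers.
def r2_origin(x, y):
    """R-sq for regression forced through (0,0): 1 - SSE / sum(y^2)."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    sse = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    return 1.0 - sse / sum(yi * yi for yi in y)

x = [1, 2, 3, 4, 5]
y = [2, 1, 0, -1, -2]               # mean zero, negative correlation

for shift in (0, 2 / 3, 10, 100, 1000):
    print(shift, round(r2_origin(x, [yi + shift for yi in y]), 4))
```

On these numbers R-sq first falls from about 0.18 essentially to zero
(near a shift of 2/3, where the forced slope passes through zero), then
climbs back, approaching 9/11, about 0.82; so the eventual increase can
in fact be substantial.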
OTOH, I may be wholly out to lunch on this...
On Fri, 22 Aug 2003, EAKIN MARK E wrote:
> I started a multiple regression using a dependent variable whose mean
> was zero and four independent variables. I created four more dependent
> variables by adding 10, 100, 1000, and 10000 to the first dependent. I
> expected the r-square of the no-intercept to always increase since the
> model is explaining why y differs from zero but after initially
> increasing, the r-square started to decrease again. I used both SAS
> and NCSS to double check my results. I must be forgetting something
> (notice how I avoided saying that I haven't the foggiest idea why).
> Any explanations?
-----------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
56 Sebbins Pond Drive, Bedford, NH 03110 (603) 626-0816
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
. http://jse.stat.ncsu.edu/ .
=================================================================