----- Original Message -----
From: "Robert J. MacG. Dawson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, June 20, 2002 8:48 AM
Subject: Re: equation for constrained linear regression

> Presumably if one were stranded on a desert island with only Microsoft
> Office, one could get the through-the-origin regression line correctly
> by putting (-x,-y) into the data set for each (x,y). Some regression
> diagnostics would be all fouled up but it is my understanding that you
> don't get good regression diagnostics from Excel anyway.
>
> However, I do wonder why this would be done; the through-the-origin
> constraint seems in many cases to imply that data with near-0 x
> coordinates ought to have not only small y values but also small
> variance. In most cases (not all) one would probably do better to
> log-transform and fit a slope-constrained-to-1 OLS model; this is
> equivalent to taking the geometric mean of all the ratios (or, indeed,
> the ratio of the geometric means)
>
> -Robert Dawson
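
Both claims in the quoted message can be checked numerically. The following is only a minimal sketch, assuming Python with NumPy; the data are made up, and the helper names (ols_with_intercept, slope_through_origin) are illustrative and do not come from the original thread.

```python
import numpy as np

def ols_with_intercept(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

def slope_through_origin(x, y):
    """Closed-form least-squares slope for y = b*x (no intercept)."""
    return np.sum(x * y) / np.sum(x * x)

# Made-up positive data with multiplicative noise (an assumption for the demo).
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=30)
y = 2.5 * x * np.exp(rng.normal(0.0, 0.1, size=30))

# Claim 1: add the mirrored point (-x, -y) for each (x, y). Both means become
# zero, so an intercept-fitting routine returns intercept 0 and the
# through-the-origin slope.
x_aug = np.concatenate([x, -x])
y_aug = np.concatenate([y, -y])
a_aug, b_aug = ols_with_intercept(x_aug, y_aug)
b_origin = slope_through_origin(x, y)

# Claim 2: fit log(y) = c + 1*log(x) by least squares. With the slope fixed
# at 1, the least-squares c is the mean of log(y) - log(x), so exp(c) is the
# geometric mean of the ratios y/x.
c = np.mean(np.log(y) - np.log(x))
k_geo_mean_of_ratios = np.exp(c)
k_ratio_of_geo_means = np.exp(np.mean(np.log(y))) / np.exp(np.mean(np.log(x)))

print("intercept on mirrored data:", a_aug)
print("slope on mirrored data:    ", b_aug)
print("through-the-origin slope:  ", b_origin)
print("geometric mean of ratios:  ", k_geo_mean_of_ratios)
print("ratio of geometric means:  ", k_ratio_of_geo_means)
```

The two slopes should agree to rounding error, and so should the two geometric-mean quantities. The sketch only compares coefficients; as the quoted message notes, diagnostics computed from the doubled data set would not be trustworthy.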

Hi, Robert --

As you appropriately ask: "However, I do wonder why this would be done"

MY PRIMARY USE OF THE "THROUGH-THE-ORIGIN" (OR NO-INTERCEPT) OPTION IS AS AN
INSTRUCTIONAL STRATEGY FOR "UNDERSTANDING" REGRESSION / LINEAR MODELS.

It is easy to teach students about "cell means" using the "no-int" option
(available in MOST REGRESSION COMPUTER PROGRAMS), since the "no-int" option
produces "averages" from the least-squares solution when you have a set of
mutually exclusive groups coded as 'dummy' variables, and averages are
numerical values that students can understand for prediction in situations
that involve "mutually exclusive categories".

Then later, students can learn how to use the "default" when appropriate.
But for "understanding" the difference between the "default" and the
"no-int" option, I feel that it is best to START WITH THE NO-INT OPTION FROM
AN INSTRUCTIONAL POINT-OF-VIEW.

The lack of understanding of the difference between the two situations is
probably why the WRONG TOTAL SUM OF SQUARES was used in the Excel "no-int"
situation.
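
The "cell means" point can be illustrated with a few lines of code. This is a minimal sketch, assuming Python with NumPy; the group labels and scores are made up, and the variable names are illustrative only. It fits the "no-int" model on one 0/1 dummy column per group, then the "default" model (a column of ones plus dummies for all but the first group) for comparison.

```python
import numpy as np

# Made-up scores for three mutually exclusive groups (assumed data).
groups = np.array(["a", "a", "a", "b", "b", "c", "c", "c", "c"])
y = np.array([4.0, 6.0, 5.0, 10.0, 12.0, 1.0, 3.0, 2.0, 2.0])
labels = np.unique(groups)  # sorted group labels: a, b, c

# "No-int" model: one 0/1 dummy column per group, no column of ones.
D = (groups[:, None] == labels[None, :]).astype(float)
b_noint, *_ = np.linalg.lstsq(D, y, rcond=None)

# Plain group averages, computed directly for comparison.
cell_means = np.array([y[groups == g].mean() for g in labels])

# "Default" model: intercept plus dummies for every group except the first.
X_default = np.column_stack([np.ones(len(y)), D[:, 1:]])
b_default, *_ = np.linalg.lstsq(X_default, y, rcond=None)

print("no-int coefficients: ", b_noint)
print("cell means:          ", cell_means)
print("default coefficients:", b_default)
```

With these data the "no-int" coefficients equal the group averages (5, 11, 2), while the "default" coefficients are the first group's average followed by the differences of the other averages from it (5, 6, -3). Both models give the same predictions; only the meaning of the coefficients changes.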

Many comments on various lists show that many folks don't understand the
difference between the "default" and "through-the-origin (no-int)" models.
If they had first been taught about regression via the "no-int" model, they
might better understand what the "default" is doing.

-- Joe