Regression is the act of obtaining least squares (best fit) estimates for B
in the linear model y = XB + E, where y is a vector of observed dependent
values, X is a matrix of independent values, and E is a vector of random
errors (usually normal with mean = 0 and variance = sigma^2).
For simple linear regression, X consists of a column of 1's, associated
with the intercept, and a column of the usual x values.
For multiple regression we simply add more columns of x values.
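For example, here is a quick sketch (made-up numbers, Python with numpy
assumed):

import numpy as np

# Hypothetical data: n = 5 observations of one predictor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Simple linear regression: a column of 1's (intercept) plus the x values.
# For multiple regression you would just stack on more predictor columns.
X = np.column_stack([np.ones_like(x), x])

# Least squares estimate of B: lstsq minimizes ||y - XB||^2.
B_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(B_hat)  # [intercept, slope]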
For k-way fixed, mixed, and random effects ANOVAs, X consists of sets of
dummy variables (0's and 1's) that correspond to the respective groups or
factor levels.
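A one-way ANOVA with three groups might be coded like this (again just a
sketch with made-up data):

import numpy as np

# Hypothetical one-way layout: 2 observations in each of 3 groups.
group = np.array([0, 0, 1, 1, 2, 2])
y = np.array([4.0, 5.0, 7.0, 8.0, 6.0, 6.5])

# Dummy variables: column j is 1 where the observation falls in group j.
X = np.zeros((len(group), 3))
X[np.arange(len(group)), group] = 1.0
print(X)

# With no intercept column this is the cell-means coding, and the least
# squares coefficients come out as the group means.
B_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(B_hat)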
ANOVA is a methodology for splitting up the variation associated with a
linear model. Some limit it to fixed, mixed, and random effects models, but
the more classic regression models can certainly make use of ANOVA as well.
Another thing to consider is that the classic ANOVA models (fixed, mixed, and
random effects) are typically over-parameterized. That is, the X matrix is
not of full rank, so you have to impose a constraint on the coefficients
before they can be estimated. Minitab and (I think) SAS make the last
coefficient equal to the negative sum of the rest (a sum-to-zero
constraint). You could also just make the last coefficient equal to zero (a
reference-level constraint).
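To illustrate the two constraints (my own sketch, not any particular
package's internals):

import numpy as np

# Hypothetical 3-group effects model: y = mu + a_j + error. The full
# coding (intercept plus 3 dummy columns) is over-parameterized, so a
# constraint is needed before the coefficients can be estimated.
group = np.array([0, 0, 1, 1, 2, 2])
y = np.array([4.0, 5.0, 7.0, 8.0, 6.0, 6.5])

# Sum-to-zero: code the last group as -1 in every effect column, so
# a_3 = -(a_1 + a_2) is implied rather than estimated.
X_sum = np.ones((len(group), 3))                  # column 0: intercept
X_sum[:, 1] = np.where(group == 0, 1.0, np.where(group == 2, -1.0, 0.0))
X_sum[:, 2] = np.where(group == 1, 1.0, np.where(group == 2, -1.0, 0.0))
B_sum, *_ = np.linalg.lstsq(X_sum, y, rcond=None)
print(B_sum, -(B_sum[1] + B_sum[2]))              # last effect, recovered

# Set-to-zero: drop the last group's column instead; its effect is 0 by
# definition and the others are measured relative to it.
X_ref = np.column_stack([np.ones(len(group)),
                         (group == 0).astype(float),
                         (group == 1).astype(float)])
B_ref, *_ = np.linalg.lstsq(X_ref, y, rcond=None)
print(B_ref)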
They both involve linear models.
"George W. Cobb" <[EMAIL PROTECTED]> wrote in message
[EMAIL PROTECTED]">news:[EMAIL PROTECTED]...
>
> I, too, think of ANOVA and regression as variations on a common
> theme. Here's an additional way in which they differ: For balanced
> ANOVA, the decomposition of the data into sums of squares and degrees of
> freedom is determined by a group of symmetries. For example, consider a
> one-way randomized complete block design with R rows as blocks and
> C columns as treatments. The analysis is invariant under all row
> permutations, and all column permutations, i.e., interchanging any two
> rows of the data, or any two columns of the data, won't change the
> analysis. If you now think of the data as a vector in RxC-dimensional
> space, the symmetries (row permutations, column permutations) determine
> invariant subspaces; these are precisely the subspaces you project the
> data vector onto to get the SSs and dfs. In regression, the subspaces
> you project onto are determined directly by a spanning set of carrier
> variables; in balanced ANOVA, the subspaces are uniquely determined by the
> symmetries, and the spanning sets are somewhat arbitrary. (I claim
> no credit for this lovely way of looking at things; I learned it from
> Peter Fortini and Persi Diaconis. It's written up in Fortini's
> dissertation from the 1970s, and Diaconis's IMS lecture notes on group
> theory and statistics.)
>
> Of course you only have such clean sets of symmetries for balanced
> designs, and the approach via symmetries doesn't address such things as
> the difference between fixed and random effects, which Bob
> Wheeler raises. Nevertheless, to the extent that I think of ANOVA
> as distinct from regression, I find the role of symmetries worth
> keeping in mind.
>
> George
>
> George W. Cobb
> Mount Holyoke College
> South Hadley, MA 01075
> 413-538-2401
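To make George's projection picture concrete, here is a small sketch (mine,
not from Fortini or Diaconis) for a balanced R x C layout: project the data
vector onto the grand-mean, row, and column subspaces and read the SSs off
as squared lengths.

import numpy as np

# Hypothetical balanced 3 x 4 layout (rows = blocks, columns = treatments).
rng = np.random.default_rng(0)
Y = rng.normal(size=(3, 4))
R, C = Y.shape

grand = Y.mean()                                   # constant subspace
row_eff = Y.mean(axis=1, keepdims=True) - grand    # row (block) subspace
col_eff = Y.mean(axis=0, keepdims=True) - grand    # column (treatment) subspace
resid = Y - grand - row_eff - col_eff              # residual subspace

# Sums of squares are the squared lengths of the orthogonal projections.
SS_rows = C * np.sum(row_eff**2)                   # df = R - 1
SS_cols = R * np.sum(col_eff**2)                   # df = C - 1
SS_resid = np.sum(resid**2)                        # df = (R - 1) * (C - 1)

# Orthogonality means the pieces add back up to the total SS.
SS_total = np.sum((Y - grand)**2)
print(SS_rows + SS_cols + SS_resid, SS_total)

Interchanging any two rows or any two columns of Y leaves every one of those
sums of squares unchanged, which is exactly the invariance George describes.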
=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=================================================================