Hi

On 10 Nov 2003, Robert J. MacG. Dawson wrote:
> Robert Lundqvist wrote:
> > I found in one of the textbooks we use that calculating correlation
> > coefficients is not meaningful when you have categorical data. However,
> > using dummy variables should be possible, shouldn't it? Either when you
> > have one ordinary numerical variable and one dummy, or even when you have
> > two dummy variables. If not, could someone please point me in the right
> > direction so I can stop being so hesitant in class... Comments are welcome,
> > even if it turns out that I should have understood this.
> 
>       Oh, it's _possible_, all right. It's just not *meaningful*, because
> there are many ways to assign dummy variables to the levels of the
> categorical variable 
> and typically each will give a different result.  Is "banana" between
> "apple" and "orange" or not?
> 
>       There is a sort of exception when there are two levels, in which case
> all ways of labelling are equivalent up to linear transformation; but
> there are better ways to deal with this special case.
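As a quick numeric check of the two-level point above (a sketch assuming
Python with numpy and made-up data, not anything from the original posts):
any two-value labelling of a dichotomy gives the same correlation with y,
up to sign.

    import numpy as np

    rng = np.random.default_rng(2)
    y = rng.normal(size=10)
    d = np.array([0, 1] * 5)      # 0/1 dummy for a two-level factor
    d2 = 3.0 - 7.0 * d            # any other two-value labelling (linear recode)

    r1 = np.corrcoef(y, d)[0, 1]
    r2 = np.corrcoef(y, d2)[0, 1]
    print(r1, r2)                 # equal in magnitude; only the sign can flip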

I have to disagree here.  Any ANOVA, and any set of contrasts, can be
analyzed by regression/correlation methods, so the regression analysis
is just as meaningful as the corresponding ANOVA would be.  For a simple
illustration, consider a study involving 2 control groups and 2
treatment groups.  Three contrasts could be generated, ideally
based on a priori expectations (e.g., -1 -1 +1 +1, -1 +1 0 0, 0 0
-1 +1).  
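
A minimal numeric sketch of that idea (Python with numpy and made-up data,
both assumed here rather than taken from the original post): each contrast
code becomes an ordinary regression predictor, and each slope corresponds
to one of the planned comparisons.

    import numpy as np

    # Hypothetical data: two control and two treatment groups, n = 5 each
    rng = np.random.default_rng(0)
    y = np.concatenate([rng.normal(m, 1.0, 5) for m in (10, 10, 12, 13)])
    group = np.repeat([0, 1, 2, 3], 5)

    # The three a priori contrasts: controls vs treatments,
    # control1 vs control2, treatment1 vs treatment2
    contrasts = np.array([[-1, -1, +1, +1],
                          [-1, +1,  0,  0],
                          [ 0,  0, -1, +1]])

    # Turn each contrast into a predictor by assigning its code to each case
    X = np.column_stack([np.ones_like(y)] + [c[group] for c in contrasts])

    # Ordinary least squares; each slope corresponds to one planned comparison
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(b[1:])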

To take the bananas, apples, and oranges as a very hypothetical
example, the researcher might examine the possibility that
participants responded differently to round than to elongated fruits,
which leads to the contrasts -2 +1 +1 and 0 -1 +1.
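
To make that coding concrete, a small sketch with hypothetical group means
(numpy assumed; the numbers are invented for illustration): the two
contrasts are orthogonal, and the first isolates the round-versus-elongated
comparison.

    import numpy as np

    # Hypothetical mean responses to banana, apple, orange
    means = np.array([4.0, 6.0, 7.0])

    c_shape = np.array([-2, +1, +1])   # elongated vs round
    c_round = np.array([ 0, -1, +1])   # apple vs orange

    print(np.dot(c_shape, c_round))    # 0: the two contrasts are orthogonal
    print(np.dot(c_shape, means))      # round-vs-elongated comparison
    print(np.dot(c_round, means))      # apple-vs-orange comparison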

In fact many texts now teach that ANOVA is simply a special case
of the general linear model (i.e., regression and correlation).


In reply to this post, Herman Rubin offered the following:

From: Herman Rubin <[EMAIL PROTECTED]>
Correlations are rarely appropriate, but regressions are.
It means something that the effect of the dichotomous variable
has a particular size; it does not mean anything that it has some
particular correlation with the variable being explained.

Not only is normality not the rule, but it is not at all
common.  Standardization and transformations complicate the
theory greatly.
-----------------------end Herman-------------------------

Again I would disagree.  r^2 is the proportion of variability (as
measured by SS) attributable to differences among the categories
(i.e., SStreatment/SStotal).  This is eta^2, and its square root is
interpretable as a regular correlation coefficient.
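
A short check of this equivalence (again Python with numpy and made-up
data, purely illustrative): eta^2 computed from the ANOVA sums of squares
matches the squared multiple correlation from regressing the outcome on
dummy codes for the groups.

    import numpy as np

    rng = np.random.default_rng(1)
    y = np.concatenate([rng.normal(m, 1.0, 8) for m in (5, 6, 8)])
    group = np.repeat([0, 1, 2], 8)

    # eta^2 from the ANOVA decomposition: SStreatment / SStotal
    grand = y.mean()
    ss_total = ((y - grand) ** 2).sum()
    ss_treat = sum(len(y[group == g]) * (y[group == g].mean() - grand) ** 2
                   for g in np.unique(group))
    eta_sq = ss_treat / ss_total

    # R^2 from regressing y on dummy codes for the groups
    X = np.column_stack([np.ones_like(y),
                         (group == 1).astype(float),
                         (group == 2).astype(float)])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    r_sq = 1 - (resid ** 2).sum() / ss_total

    print(eta_sq, r_sq)   # the two agree; sqrt(eta_sq) is the correlation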


Best wishes
Jim

============================================================================
James M. Clark                          (204) 786-9757
Department of Psychology                (204) 774-4134 Fax
University of Winnipeg                  4L05D
Winnipeg, Manitoba  R3B 2E9             [EMAIL PROTECTED]
CANADA                                  http://www.uwinnipeg.ca/~clark
============================================================================
