Hi

On 10 Nov 2003, Robert J. MacG. Dawson wrote:

> Robert Lundqvist wrote:
> > I found in one of the textbooks we use that calculating correlation
> > coefficients is not meaningful when you have categorical data. However,
> > using dummy variables should be possible, shouldn't it? Either when you
> > have one ordinary numerical variable and one dummy, or even when you have
> > two dummy variables. If not, could someone please point me in the right
> > direction so I can stop being so hesitant in class... Comments are
> > welcome, even if it turns out that I should have understood this.
>
> Oh, it's _possible_, all right. It's just not *meaningful*, because
> there are many ways to assign dummy variables to the levels of the
> categorical variable, and typically each will give a different result.
> Is "banana" between "apple" and "orange" or not?
>
> There is a sort of exception when there are two levels, in which case
> all ways of labelling are equivalent up to linear transformation; but
> there are better ways to deal with this special case.
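[Editor's note: Dawson's two-level exception is easy to check numerically. The sketch below uses made-up data and assumes only NumPy; it shows that any two-value coding of a dichotomous variable yields the same correlation magnitude with a numeric response, since the codings differ only by a linear transformation.]

```python
import numpy as np

# Made-up numeric response for six subjects, three per group.
y = np.array([4.1, 3.8, 5.0, 6.2, 5.9, 6.5])

g1 = np.array([0, 0, 0, 1, 1, 1])   # dummy (0/1) coding of the two levels
g2 = 10 - 3 * g1                    # any other two-value coding is a linear
                                    # transformation of the first

r1 = np.corrcoef(g1, y)[0, 1]
r2 = np.corrcoef(g2, y)[0, 1]

# The sign may flip (here the second coding reverses direction),
# but the magnitude of the correlation is identical.
print(abs(r1), abs(r2))
```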
I have to disagree here. Any ANOVA, and any set of contrasts, can be
analyzed by regression/correlation methods, so the regression analysis is
just as meaningful as the corresponding ANOVA would be. For a simple
illustration, consider a study involving 2 control groups and 2 treatment
groups. Three contrasts could be generated, ideally based on a priori
expectations (e.g., -1 -1 +1 +1, -1 +1 0 0, 0 0 -1 +1). To take the
bananas, apples, and oranges as a very hypothetical example, the
researcher might examine the possibility that participants responded
differently to round than to elongated fruits, which leads to the
contrasts -2 +1 +1 and 0 -1 +1. In fact, many texts now teach that ANOVA
is simply a special case of the general linear model (i.e., regression
and correlation).

In reply to this post, Herman Rubin offered the following:

From: Herman Rubin <[EMAIL PROTECTED]>

Correlations are rarely appropriate, but regressions are. It means
something that the effect of the dichotomous variable is something; it
does not mean anything that it has a correlation of whatever with the
variable being explained. Not only is normality not the rule, but it is
not at all common. Standardization and transformations complicate the
theory greatly.

-----------------------end Herman-------------------------

Again I would disagree. r^2 is the proportion of variability (as measured
by sums of squares) attributed to differences among the categories (i.e.,
SStreatment/SStotal). This is eta^2, and its square root is interpretable
as a regular correlation coefficient.

Best wishes
Jim

============================================================================
James M. Clark                          (204) 786-9757
Department of Psychology                (204) 774-4134 Fax
University of Winnipeg                  4L05D
Winnipeg, Manitoba  R3B 2E9             [EMAIL PROTECTED]
CANADA                                  http://www.uwinnipeg.ca/~clark
============================================================================
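[Editor's note: Jim's point that eta^2 equals the R^2 from a regression on contrast codes can be demonstrated directly. The sketch below uses invented scores for the 2-control / 2-treatment design and the three contrasts given in the post, assuming equal group sizes so the contrasts are orthogonal; only NumPy is required.]

```python
import numpy as np

# Invented scores for 4 groups (2 control, 2 treatment), n = 3 each.
groups = [np.array([3.0, 4.0, 5.0]),
          np.array([4.0, 5.0, 6.0]),
          np.array([6.0, 7.0, 8.0]),
          np.array([7.0, 8.0, 9.0])]
y = np.concatenate(groups)
grand = y.mean()

# eta^2 from the ANOVA decomposition: SStreatment / SStotal.
ss_total = ((y - grand) ** 2).sum()
ss_treat = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
eta2 = ss_treat / ss_total

# R^2 from regressing y on the post's three contrast codes,
# (-1 -1 +1 +1), (-1 +1 0 0), (0 0 -1 +1), expanded per subject.
c1 = np.repeat([-1, -1, 1, 1], 3)
c2 = np.repeat([-1, 1, 0, 0], 3)
c3 = np.repeat([0, 0, -1, 1], 3)
X = np.column_stack([np.ones_like(y), c1, c2, c3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r2 = 1 - ((y - X @ beta) ** 2).sum() / ss_total

print(eta2, r2)   # the two quantities coincide
```

With a full set of contrasts the regression reproduces the group means exactly, so its explained sum of squares is precisely SStreatment, which is why eta^2 and R^2 agree.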
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:

http://jse.stat.ncsu.edu/

=================================================================
