(Reply to the original poster and to the edstat list.) I assume that by "dummy variable" you mean a variable with two values, 0 and 1. (The actual coding doesn't matter, so long as there be only two values. I sometimes use 6 and 13 for a dummy variable encoding sex of respondent, so as to get "F" and "M" respectively as plotting symbols when using MINITAB's letter plot.)
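To illustrate the point about coding, here is a minimal sketch in Python. The data and the names `score`, `sex_01`, `sex_6_13` and `sex_flipped` are made up for illustration, and `pearson` is a hand-rolled helper rather than any particular package's routine. Any two distinct codes that keep the same ordering of the two groups give the same product-moment correlation; swapping which group gets the larger code changes only the sign.

```python
from math import sqrt

def pearson(x, y):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a quasi-continuous score and a two-valued variable.
score = [52, 61, 58, 70, 66, 49, 75, 63]
sex_01 = [0, 1, 0, 1, 1, 0, 1, 0]                  # conventional 0/1 coding
sex_6_13 = [6 if s == 0 else 13 for s in sex_01]   # same groups coded 6 and 13
sex_flipped = [1 - s for s in sex_01]              # same groups, direction reversed

r_01 = pearson(sex_01, score)
r_6_13 = pearson(sex_6_13, score)
r_flipped = pearson(sex_flipped, score)

print(r_01, r_6_13, r_flipped)
```

The first two values agree (up to rounding error) and the third is their negative, which is why the direction of the coding matters when interpreting the sign.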
The categorical (!) assertion that correlation coefficients are not meaningful for categorical data, currently found in a number of textbooks, has bothered me for some time, particularly when the author applies it (as is frequently the case) to the variable "sex". Earlier textbooks (e.g. Glass & Stanley 1971) made it quite clear that correlations with dichotomies were not only meaningful (although one had to be careful about the *direction* of the variable, as affecting the sign of the coefficient, in interpreting results of a regression analysis, for example), but had once been assigned especial names:

 + point-biserial correlation coefficient for the correlation between a dichotomy and a quasi-continuous variable;
 + phi coefficient for the correlation between two dichotomies;
 + biserial correlation coefficient for the correlation between an artificial dichotomy (made by imposing a cut-point on a "continuous" variable) and a "continuous" variable;
 + tetrachoric correlation coefficient for the correlation between two such artificial dichotomies.

The first two are simple consequences of applying the usual product-moment arithmetic to data when one or both variables are dichotomous; and as another respondent pointed out, their squares are perfectly legitimate representations of the proportion of variance in one variable "explained by" (or shared with) the other variable. The last two represent attempts to estimate, under assumptions that may or may not be reasonable in context, what the product-moment correlation would have been if one had had the original data (prior to imposing a cut-point on it) instead of the dichotomy.

One may of course agree to the assertion without qualification when the categorical variable in question involves more than two categories.
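For anyone who wants to check the claim that the point-biserial and phi coefficients are just the product-moment correlation applied to dichotomies, a small Python sketch follows. The data are invented, and `pearson`, `point_biserial` and `phi` are illustrative helpers written out from the usual textbook formulas (point-biserial with the divide-by-n standard deviation; phi from the cell counts of the 2x2 table), not any package's functions.

```python
from math import sqrt

def pearson(x, y):
    """Ordinary product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def point_biserial(d, y):
    """(M1 - M0) / s_y * sqrt(p * q), with s_y computed dividing by n."""
    n = len(y)
    g1 = [v for k, v in zip(d, y) if k == 1]
    g0 = [v for k, v in zip(d, y) if k == 0]
    p, q = len(g1) / n, len(g0) / n
    mean_y = sum(y) / n
    sd_y = sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    return (sum(g1) / len(g1) - sum(g0) / len(g0)) / sd_y * sqrt(p * q)

def phi(d1, d2):
    """(ad - bc) / sqrt of the product of the four marginal totals."""
    pairs = list(zip(d1, d2))
    a = pairs.count((1, 1))
    b = pairs.count((1, 0))
    c = pairs.count((0, 1))
    d = pairs.count((0, 0))
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical data: one quasi-continuous variable and two 0/1 dichotomies.
score = [52, 61, 58, 70, 66, 49, 75, 63]
sex = [0, 1, 0, 1, 1, 0, 1, 0]
passed = [0, 1, 0, 1, 1, 0, 1, 1]

r_pb_formula = point_biserial(sex, score)
r_pb_pearson = pearson(sex, score)
phi_formula = phi(sex, passed)
phi_pearson = pearson(sex, passed)

print(r_pb_formula, r_pb_pearson)
print(phi_formula, phi_pearson)
```

Each pair of printed values agrees up to rounding error. The biserial and tetrachoric coefficients are not shown, since they depend on distributional assumptions about the variable behind the cut-point rather than on product-moment arithmetic alone.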
As others have pointed out, a system of one variable with k categories may be converted to a system of (k-1) dichotomies; and it may then be reasonable to analyze them via a series of what in the ANOVA context would be called contrasts. The various correlation coefficients thus generated (phi and biserial, say) may be somewhat less easy to interpret than in the case of a single, rather obvious, dichotomy. But of course, just because something ain't easy is no reason to avoid trying it.
 -- DFB.

On Mon, 10 Nov 2003, Robert Lundqvist wrote:

> I found in one of the textbooks we use that calculating correlation
> coefficients is not meaningful when you have categorical data.
> However, using dummy variables should be possible, shouldn't it?
> Either when you have one ordinary numerical variable and one dummy, or
> even when you have two dummy variables. If not, could someone please
> put me in the right direction so I can stop be so hesitating in class.
> ... Comments are welcome, even if it turns out that I should have
> understood this.

-----------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
56 Sebbins Pond Drive, Bedford, NH 03110 (603) 626-0816
