Robert Lundqvist <[EMAIL PROTECTED]> wrote in news:[EMAIL PROTECTED]:
> I found in one of the textbooks we use that calculating correlation
> coefficients is not meaningful when you have categorical data. However,
> using dummy variables should be possible, shouldn't it? Either when you
> have one ordinary numerical variable and one dummy, or even when you have
> two dummy variables. If not, could someone please point me in the right
> direction so I can stop being so hesitant in class... Comments are welcome,
> even if it turns out that I should have understood this.

It's certainly possible and the coefficients have meaning, but they can be
hard to interpret. Some simple algebra shows that the correlation between a
dichotomous variable X (coded 0/1) and a continuous variable Y works out to

    R = sqrt(p*q)*(M1-M0)/S

where p is the proportion of X's that are 1, q = 1-p is the proportion of
X's that are 0, M1 is the mean of the Y's corresponding to X=1, M0 is the
mean of the Y's corresponding to X=0, and S is the standard deviation of all
the Y's (computed with divisor n, so that the identity is exact).

So it's actually a scaled version of a standardized mean difference, closely
related to the commonly used Cohen's D (which divides M1-M0 by the pooled
within-group standard deviation rather than by S), with the scaling
depending on the X margins. Thus talking about the correlation between an
indicator and a continuous variable is really talking about the difference
in means of two groups. The scaling factor means that, for a given
distribution of Y, the correlation coefficient may not be able to reach
+/-1 for some X margins, i.e. proportions of the indicator.

*Testing* such a correlation is exactly equivalent to testing for a mean
difference between two groups with the ordinary pooled-variance t-test and
will give the same results, but it would be rather strange to report the
results as a test of correlation rather than as a t-test for mean
difference. And it would make more sense to report D as the effect-size
measure rather than R, since its interpretation doesn't depend on the X
marginals.
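For anyone who wants to check the algebra numerically, here is a minimal sketch in Python with NumPy. The data are made up, and the variable names, sample size, group proportion and effect size are all assumptions for illustration only. It compares the formula for R with the ordinary Pearson correlation, and the t statistic obtained from R with the pooled-variance two-sample t statistic.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; the data are purely illustrative

# Hypothetical data: x is a 0/1 indicator, y is a continuous variable.
n = 200
x = (rng.random(n) < 0.3).astype(float)
y = 1.0 + 0.8 * x + rng.normal(size=n)


def point_biserial(x, y):
    """R = sqrt(p*q) * (M1 - M0) / S, with S the SD of all y (divisor n)."""
    p = x.mean()
    q = 1.0 - p
    m1 = y[x == 1].mean()
    m0 = y[x == 0].mean()
    s = y.std()  # NumPy's default ddof=0, i.e. divisor n
    return np.sqrt(p * q) * (m1 - m0) / s


def pooled_sd(x, y):
    """Pooled within-group standard deviation of y."""
    y1, y0 = y[x == 1], y[x == 0]
    n1, n0 = len(y1), len(y0)
    ss_within = (n1 - 1) * y1.var(ddof=1) + (n0 - 1) * y0.var(ddof=1)
    return np.sqrt(ss_within / (n1 + n0 - 2))


def pooled_t(x, y):
    """Two-sample t statistic with pooled variance (group 1 minus group 0)."""
    y1, y0 = y[x == 1], y[x == 0]
    se = pooled_sd(x, y) * np.sqrt(1.0 / len(y1) + 1.0 / len(y0))
    return (y1.mean() - y0.mean()) / se


r_formula = point_biserial(x, y)
r_pearson = np.corrcoef(x, y)[0, 1]

# t statistic for testing a correlation of zero, computed from r alone
t_from_r = r_formula * np.sqrt((n - 2) / (1.0 - r_formula**2))
t_two_sample = pooled_t(x, y)

# Cohen's d with the pooled within-group SD
cohen_d = (y[x == 1].mean() - y[x == 0].mean()) / pooled_sd(x, y)

print("R from formula:   ", r_formula)
print("Pearson r:        ", r_pearson)
print("t from R:         ", t_from_r)
print("pooled two-sample:", t_two_sample)
print("Cohen's d:        ", cohen_d)
```

The first two printed values should agree up to floating-point error, and so should the two t statistics. Cohen's d is printed alongside so that it can be compared with R when the group proportion is changed.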
