(Reply to original poster and to the edstat list:)

I assume that by "dummy variable" you mean a variable with two values, 0
and 1.  (Although the actual coding doesn't matter, so long as there be
only two values.  I sometimes use 6 and 13 for a dummy variable encoding
sex of respondent, so as to get "F" and "M" respectively as plotting
symbols when using MINITAB's letter plot.)
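
A minimal Python sketch of the point about coding, using made-up data.
The names pearson_r, sex_01, sex_6_13 and score are illustrative only
and do not come from any package.  It suggests that recoding 0/1 as
6/13 leaves the product-moment correlation unchanged, while swapping
which category gets the larger code changes only the sign:

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up illustrative data: a two-valued variable and a numeric score.
sex_01 = [0, 0, 0, 1, 1, 1, 0, 1]
score = [12.0, 15.0, 11.0, 18.0, 20.0, 17.0, 14.0, 16.0]

# Recode 0 -> 6 and 1 -> 13 (the letter-plot coding mentioned above).
sex_6_13 = [6 if s == 0 else 13 for s in sex_01]

r_01 = pearson_r(sex_01, score)
r_6_13 = pearson_r(sex_6_13, score)

# Reversing which category gets the larger code flips only the sign.
r_flipped = pearson_r([1 - s for s in sex_01], score)

print(r_01, r_6_13, r_flipped)
```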

The categorical (!) assertion that correlation coefficients are not
meaningful for categorical data, currently found in a number of
textbooks, has bothered me for some time, particularly when the author
applies it (as is frequently the case) to the variable "sex".
 Earlier textbooks (e.g. Glass & Stanley 1971) made it quite clear that
correlations with dichotomies were not only meaningful (although one had
to be careful about the *direction* of the variable, as affecting the
sign of the coefficient, in interpreting results of a regression
analysis, for example), but had once been assigned especial names:
 + point-biserial correlation coefficient for the correlation between a
dichotomy and a quasi-continuous variable;
 + phi coefficient for the correlation between two dichotomies;
 + biserial correlation coefficient for the correlation between an
artificial dichotomy (made by imposing a cut-point on a "continuous"
variable) and a "continuous" variable;
 + tetrachoric correlation coefficient for the correlation between two
such artificial dichotomies.

The first two are simple consequences of applying the usual
product-moment arithmetic to data when one or both variables are
dichotomous;  and as another respondent pointed out, their squares are
perfectly legitimate representations of the proportion of variance in
one variable "explained by" (or shared with) the other variable.
 The last two represent attempts to estimate, under assumptions that may
or may not be reasonable in context, what the product-moment correlation
would have been if one had had the original data (prior to imposing a
cut-point on it) instead of the dichotomy.
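
The claim about the first two coefficients can be checked numerically.
The sketch below uses made-up data and hypothetical variable names; the
"shortcut" formulas are the usual textbook ones, with the standard
deviation computed with divisor n.  Under those assumptions both
shortcuts should agree with the ordinary product-moment arithmetic:

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# --- Point-biserial: one dichotomy, one quasi-continuous variable ---
group = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]          # made-up dichotomy
score = [10.0, 12.0, 11.0, 13.0, 15.0,
         14.0, 17.0, 16.0, 13.0, 18.0]           # made-up scores

n = len(score)
ones = [s for g, s in zip(group, score) if g == 1]
zeros = [s for g, s in zip(group, score) if g == 0]
mean_all = sum(score) / n
sd_n = sqrt(sum((s - mean_all) ** 2 for s in score) / n)  # divisor n

# Textbook shortcut: (M1 - M0) / s_n * sqrt(p * q)
r_pb = ((sum(ones) / len(ones) - sum(zeros) / len(zeros)) / sd_n
        * sqrt(len(ones) * len(zeros) / n ** 2))
r_pb_pearson = pearson_r(group, score)

# --- Phi: two dichotomies, computed from the 2x2 table of counts ---
x = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)

phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
phi_pearson = pearson_r(x, y)

# Squares, read as proportion of variance shared.
print(r_pb, r_pb_pearson, r_pb ** 2)
print(phi, phi_pearson, phi ** 2)
```

The biserial and tetrachoric coefficients are not shown, since they
rest on an assumed underlying distribution rather than on this
arithmetic alone.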

One may of course agree to the assertion without qualification, when the
categorical variable in question involves more than two categories.

As others have pointed out, a system of one variable with k categories
may be converted to a system of (k-1) dichotomies;  and it may then be
reasonable to analyze them via a series of what in the ANOVA context
would be called contrasts.  The various correlation coefficients thus
generated (phi and biserial, say) may be somewhat less easy to interpret
than in the case of a single, rather obvious, dichotomy.  But of course,
just because something ain't easy is no reason to avoid trying it.
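
As a rough illustration of the conversion to (k-1) dichotomies, here is
a sketch using simple indicator coding against a reference category.
The function dummy_code, the variable names and the data are all made
up for the example, and indicator coding is only one possible choice;
a set of ANOVA-style contrasts would use different codes.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def dummy_code(values, reference):
    """Return {category: 0/1 list} for every category except `reference`."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}

# Made-up data: one variable with k = 3 categories, one numeric score.
region = ["north", "south", "east", "south", "north",
          "east", "east", "north", "south"]
score = [10.0, 14.0, 9.0, 15.0, 11.0, 8.0, 10.0, 12.0, 13.0]

# k - 1 = 2 dichotomies; "north" is the (arbitrary) reference category.
dummies = dummy_code(region, reference="north")

# One point-biserial correlation per dichotomy.
r_by_category = {c: pearson_r(d, score) for c, d in dummies.items()}
for category, r in r_by_category.items():
    print(category, round(r, 3))
```

Each coefficient here compares one category with everything else
(including the reference category), which is part of why such
correlations take more care to interpret than a single dichotomy.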

  -- DFB.

On Mon, 10 Nov 2003, Robert Lundqvist wrote:

> I found in one of the textbooks we use that calculating correlation
> coefficients is not meaningful when you have categorical data.
> However, using dummy variables should be possible, shouldn't it?
> Either when you have one ordinary numerical variable and one dummy, or
> even when you have two dummy variables. If not, could someone please
> put me in the right direction so I can stop be so hesitating in class.
> ... Comments are welcome, even if it turns out that I should have
> understood this.

 -----------------------------------------------------------------------
 Donald F. Burrill                                         [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110                 (603) 626-0816
