On 16 Dec 2002 19:04:07 -0600, Brian Sandle
<[EMAIL PROTECTED]> wrote in part:
>
> > Perhaps you can point me to the derivation of how the long Pearson
> > recipe reduces to the much shorter Spearman when ranks are entered
> > rather than scores.

C'mon, Brian, the work isn't THAT hard.  You said that you're using
Bruning & Kintz, which is a cookbook par excellence.  (I'll sing its
praises elsewhere, if anyone is interested.)  While it will not contain
the derivation you seek, it does contain references.  Fairly complete
references, as I recall.  Consult them.  You will find the gory details
of what is outlined below.

Although others have tried to convey this to you, apparently without a
lot of success, I'll have a go at it also.

If the variable X has no tied values, the ranks of X comprise the
natural numbers from 1 to N.  (Ditto, of course, for Y.)  You can
readily observe that the sum of the ranks (= 1 + 2 + 3 + ... + N) is a
function only of N;  and therefore so is the mean of the ranks.  Also,
the sum of squares of the ranks (= 1 + 4 + 9 + ... + N^2), and therefore
the sum of squared deviations of the ranks from their mean, is a
function only of N.  And the sum of the products of (the deviations of
the ranks of X from their mean) by (the deviations of the ranks of Y
from their mean) is a function only of N and of the sum of squared
differences between the paired ranks.  Applying the various
simplifications thus available to the standard
formula for a product-moment correlation coefficient yields the usual
formula for the Spearman rank correlation.  As another respondent has
pointed out, this formula strictly applies only when there are no tied
ranks.  If there are tied values in X (or Y, or both), and the mean of
the tied ranks is assigned to the values in question, the sum of the
ranks remains unchanged, but the sum of squares differs.  Hence the
preference expressed by several respondents:  to produce the ranks of
the variables in question, and apply the standard ("Pearson") method of
calculation to those ranks to obtain the equivalent rank correlation.
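
For anyone who would rather check this numerically than chase the
references, here is a rough sketch in Python.  In outline: writing SS for
the sum of squared deviations of the ranks 1..N from their mean, one has
SS = N(N^2 - 1)/12, and the sum of squared rank differences equals
2*SS - 2*(sum of products of deviations), so that the Pearson r computed
on the ranks becomes 1 - 6*(sum of d^2)/(N(N^2 - 1)).  The data values
and the helper names (pearson, midranks, spearman_shortcut) are made up
for illustration;  they come from no particular package.

```python
from math import sqrt

def pearson(x, y):
    """Standard product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def midranks(values):
    """Ranks 1..N; tied values get the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_shortcut(rx, ry):
    """The short formula: 1 - 6*sum(d^2) / (N*(N^2 - 1))."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Illustrative data with no ties (values invented for this sketch).
x = [10.2, 3.1, 7.7, 15.0, 1.4, 8.8]
y = [4.0, 2.5, 6.1, 9.3, 3.0, 5.2]
rx, ry = midranks(x), midranks(y)
n = len(x)

print("sum of ranks:", sum(rx), "vs N(N+1)/2 =", n * (n + 1) / 2)
print("sum of squares:", sum(r * r for r in rx),
      "vs N(N+1)(2N+1)/6 =", n * (n + 1) * (2 * n + 1) / 6)
print("Pearson on ranks:", pearson(rx, ry))
print("short formula:   ", spearman_shortcut(rx, ry))

# Illustrative data with ties: the sum of ranks is unchanged, the sum
# of squares is not, and the two calculations no longer agree.
x_t = [1, 2, 2, 3, 4, 5]
y_t = [2, 1, 3, 3, 5, 4]
rx_t, ry_t = midranks(x_t), midranks(y_t)
print("tied sum of ranks:", sum(rx_t))
print("tied sum of squares:", sum(r * r for r in rx_t))
print("Pearson on ranks:", pearson(rx_t, ry_t))
print("short formula:   ", spearman_shortcut(rx_t, ry_t))
```

With no ties the two results agree to rounding error;  with ties they
part company, which is the reason for preferring the Pearson calculation
on the ranks.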

If one suspects that X and Y are non-linearly related, and has some idea
of what kind of non-linearity is involved, it is usually preferable to
apply that specific non-linear transformation, instead of ranking the
values.  From the ranks one cannot recover the original values;  whereas
for more specific transformations there often is an inverse that will
recover them.  (Thus, if you substitute the reciprocal 1/X for X, you
can get X back again by taking the reciprocal of 1/X;  if you use the
logarithm of X, you can take the antilogarithm of (log X);  etc.)
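
A small sketch of the point, again in Python and again with invented
data:  Y is constructed as an exact exponential function of X (an
assumption made purely for illustration), so taking log Y straightens
the relationship, and the antilogarithm gets Y back.  The ranks, by
contrast, are the same for quite different sets of values.

```python
from math import exp, log, sqrt

def pearson(x, y):
    """Standard product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def simple_ranks(values):
    """Ranks 1..N for values with no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Hypothetical data: Y is an exact exponential function of X.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0 * exp(0.5 * v) for v in x]

log_y = [log(v) for v in y]
r_raw = pearson(x, y)        # below 1: the relationship is curved
r_log = pearson(x, log_y)    # essentially 1: log Y is linear in X

# The transformation has an inverse, so the original values come back.
recovered = [exp(v) for v in log_y]

# Ranks have no inverse: these two different sets share the same ranks.
same_ranks = simple_ranks(y) == simple_ranks([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

print("r on raw Y:", r_raw)
print("r on log Y:", r_log)
print("max recovery error:", max(abs(a - b) for a, b in zip(y, recovered)))
print("same ranks for different values:", same_ranks)
```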

Of course, the sensible thing to do is first to plot Y vs. X and see
whether the relationship is approximately linear (in which case there's
little point in bothering with non-linear approaches of ANY kind);  and
if it's not, whether it is at least monotonic (if it is not, there's
very little point in bothering with ranks, although some other kind of
non-linearity may be possible to model).

   --  DFB.
 -----------------------------------------------------------------------
 Donald F. Burrill                                            [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110                 (603) 626-0816
 [was:  184 Nashua Road, Bedford, NH 03110               (603) 471-7128]
