Gronkfeatures <[EMAIL PROTECTED]> wrote in message 
news:<[EMAIL PROTECTED]>...

> ... I have found a macro for use with 
> SAS that supposedly works out the kappa values that I'm after. It is called 
> the magree.sas. 

Thanks for pointing out magree.sas.  I wasn't aware of it before.

This macro appears to calculate Fleiss' kappa for multiple raters.
Fleiss' kappa is slightly different from Cohen's kappa.  Fleiss'
formula is appropriate when you don't know the identity of each rater,
or when a different group of raters rates each subject.
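
For reference, and if I recall the standard formulation correctly
(this is not taken from the macro): with N subjects, n raters per
subject, k categories, and n_ij the number of raters placing subject i
in category j, Fleiss' kappa is

    \bar{P} = \frac{1}{N n (n-1)} \sum_{i} \Big( \sum_{j} n_{ij}^{2} - n \Big),
    \qquad
    \bar{P}_{e} = \sum_{j} \Big( \frac{1}{N n} \sum_{i} n_{ij} \Big)^{2},
    \qquad
    \kappa = \frac{\bar{P} - \bar{P}_{e}}{1 - \bar{P}_{e}}

Note that chance agreement is based on the category proportions pooled
across all raters, which is exactly why rater identities do not enter
the calculation.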

In your case, I suspect you know the identity of each rater, and that
the same 3 raters rated all cases--so Fleiss' kappa would be somewhat
biased.  The bias is explained in a 1980 Psychological Bulletin
article by A. J. Conger.

Ideally what you would want is a multi-rater generalization of Cohen's
kappa.  Such generalizations exist, but not in SAS (or SPSS).  But
when all raters rate every case, the result will not be much different
from calculating Cohen's kappa for each rater pair and taking the
average kappa.  Conger's article suggests this approach is more
accurate than Fleiss' kappa for designs such as yours.

For ordinal measures, Kendall's W is not a bad idea.  I believe it is
basically an intraclass correlation based on ranks.  Note that you can
calculate Kendall's W without running the whole magree.sas macro:
just run the procs and datasteps in the macro code that follow this
comment:

/*********** Compute Kendall's Coefficient of Concordance, W ***********/
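
As a reminder of what that code computes (this formula is mine, not
quoted from the macro): for m raters ranking n subjects, with R_i the
rank sum for subject i, Kendall's W (ignoring any tie correction the
macro may apply) is

    W = \frac{12 \sum_{i} (R_i - \bar{R})^{2}}{m^{2} (n^{3} - n)},
    \qquad \bar{R} = \frac{m (n + 1)}{2}

W runs from 0 (no agreement) to 1 (perfect agreement in the rankings).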

So in summary:

1.  For the ordered category ratings:  

    a.  While it is not ideal, weighted kappa isn't too much 
        different from an intraclass correlation.  So in the 
        interests of expedience, you could calculate weighted 
        kappa using proc freq for each pair of raters, then 
        take the average pairwise kappa (a proc freq sketch 
        follows this list).

    b.  Supplement the above with Kendall's W.

2.  For the non-ordered category ratings, calculate the average 
    pairwise UNweighted kappa--again with proc freq (see the sketch 
    below), and again only in the interest of expedience.  Pay more 
    attention to the significance tests (p values) here than to the 
    actual magnitudes of kappa.  The idea is to get significant p 
    values, letting one reject the null hypothesis of rater 
    independence.
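
Here is a minimal sketch of the pairwise proc freq approach.  The
data set name (RATINGS) and the rater variables (RATER1-RATER3) are
placeholders for whatever your data actually use; the AGREE option
prints both the simple and the weighted kappa for each pair, and the
TEST statement adds their asymptotic tests:

    proc freq data=ratings;                       /* one row per case   */
       tables rater1*rater2 rater1*rater3
              rater2*rater3 / agree;              /* kappa for each pair */
       test kappa wtkap;        /* tests for simple and weighted kappa  */
    run;

Use the weighted kappas for the ordered categories and the simple
kappas for the non-ordered ones, then average across the three pairs.
One caveat: proc freq reports kappa only when the two-way table is
square, so if one rater never uses a category you may need to add
zero-weight dummy records (the ZEROS option of the WEIGHT statement)
so that both raters show the same set of categories.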

As described on my webpage, one should not lose sight of the raw
levels of agreement (i.e., the proportion of times raters of the same
case make the same rating).

--------------------------------------------------------------------------
John Uebersax, PhD             (858) 597-5571 
La Jolla, California           (858) 625-0155 (fax)
email:  [EMAIL PROTECTED]

Statistics:  http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm
Psychology:  http://members.aol.com/spiritualpsych
---------------------------------------------------------------------------