Gronkfeatures <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
> ... I have found a macro for use with SAS that supposedly works out
> the kappa values that I'm after. It is called the magree.sas.

Thanks for pointing out magree.sas. I wasn't aware of it before. This
macro appears to calculate Fleiss' kappa for multiple raters.

Fleiss' kappa is slightly different from Cohen's kappa. Fleiss' formula
is appropriate when you don't know the identity of each rater, or when a
different group of raters rates each subject. In your case, I suspect you
know the identity of each rater, and that the same 3 raters rated all
cases--so Fleiss' kappa would be somewhat biased. The bias is explained in
a 1980 Psychological Bulletin article by A. J. Conger.

Ideally what you would want is a multi-rater generalization of Cohen's
kappa. Such generalizations exist, but not in SAS (or SPSS). But when all
raters rate every case, the result will not be much different from
calculating Cohen's kappa for each rater pair and taking the average
kappa. Conger's article suggests this approach is more accurate than
Fleiss' kappa for designs such as yours.

For ordinal measures, Kendall's W is not a bad idea. I believe it is
basically an intraclass correlation based on ranks. Note you can
calculate Kendall's W without the magree.sas macro. Basically just run
the procs and DATA steps in the macro code after:

/*********** Compute Kendall's Coefficient of Concordance, W ***********/

So in summary:

1. For the ordered category ratings:
   a. While it is not ideal, weighted kappa isn't too much different
      from an intraclass correlation. So in the interests of expedience,
      you could calculate weighted kappa using PROC FREQ for each pair
      of raters, then take the average pairwise kappa (a rough PROC FREQ
      sketch appears at the end of this message).
   b. Supplement the above with Kendall's W.

2. For the non-ordered category ratings, calculate the average pairwise
   UNweighted kappa.

Again, I suggest this only in the interest of expedience. Pay more
attention to the significance tests (p values) here than to the actual
magnitudes of kappa. The idea is to get significant p values, letting one
reject the null hypothesis of rater independence. As described on my
webpage, one should not lose sight of the raw levels of agreement (i.e.,
the proportion of times raters of the same case make the same rating).

--------------------------------------------------------------------------
John Uebersax, PhD              (858) 597-5571
La Jolla, California            (858) 625-0155 (fax)
email: [EMAIL PROTECTED]
Statistics:  http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm
Psychology:  http://members.aol.com/spiritualpsych
---------------------------------------------------------------------------
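
P.S. In case a concrete starting point helps, here is a rough, untested
sketch of the pairwise PROC FREQ approach. The data set and variable
names (RATINGS, RATER1-RATER3) are placeholders of my own, not anything
from magree.sas; it assumes one row per case, with each rater's rating
in its own variable.

   /* Pairwise agreement for 3 raters. The AGREE option requests     */
   /* simple (unweighted) kappa and, for ordered categories,         */
   /* weighted kappa; the TEST statement adds asymptotic tests of    */
   /* the null hypothesis kappa = 0.                                 */
   proc freq data=ratings;
      tables rater1*rater2  rater1*rater3  rater2*rater3 / agree;
      test kappa wtkap;
   run;

   /* Average the three pairwise kappas by hand: weighted kappas for */
   /* the ordered ratings, unweighted kappas for the non-ordered     */
   /* ratings.                                                       */

One caution: PROC FREQ computes kappa only when the two-way table is
square, so if one rater never uses a category you will need to make sure
all rating levels appear for both raters in each table.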