[I'm top-posting a couple of comments, and deleting most of my own post that was cited.]
You seem to make one of my points: that the popular ICCs will cover up mean differences, which might or might not be interesting. They may also cover up a single poorly correlated rater among multiple raters. That can be good for planning for new raters, but it is not-so-good for training raters or for reporting results in full.

On 13 May 2004 07:18:47 -0700, [EMAIL PROTECTED] (Paul R Swank) wrote:

> First, let's consider the 2-observation case. I have 2 assessments of a
> behavior rating taken 20 minutes apart; I wish to know how reliable the
> assessments are. There are two potential sources of error: the relative
> error over time, in which the order of scores for subject a and subject b
> on the two assessments may be the same or different, and the absolute
> error, in which all subjects may be lower on the second assessment. If I do
> a Pearson correlation between the two, I find a correlation of .78097
> (n = 313, p < .0001). I do an analysis of variance with repeated measures
> on time (the equivalent of the paired t-test) and find a significant
> difference between the means (time 1: mean = 3.377, sd = 1.10; time 2:
> mean = 3.291, sd = 1.16; F(1, 312) = 4.16; p = .0422). Now, I do a
> generalizability analysis. I find the following variance components:
>
>     Subjects            .99269
>     Time                .00300
>     Subjects by Time    .27842
>
> The generalizability coefficient (or ICC) considering only the relative
> error (interaction) is
>
>     .99269 / (.99269 + .003) = .99269/1.27111 = .78096
>
> which is the Pearson

[oops! for that first denominator: it should be (.99269 + .27842), the Subjects by Time component, which is what actually sums to 1.27111.]

> correlation within rounding. I then figure the coefficient taking into
> account the mean difference as well:
>
>     .99269 / (.99269 + .003 + .27842) = .99269 / 1.27411 = .779
>
> The mean difference has had a minimal effect on the reliability, as should
> be obvious from the variance component for time, which is very small
> relative to the other variance components.
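A minimal Python sketch of the variance-component arithmetic, for anyone who wants to rerun it. It assumes a fully crossed subjects x occasions design with one score per cell and the usual random-effects ANOVA (expected mean squares) estimators; the names Components, estimate_components and g_coefficients are hypothetical illustrations, not from any package, and truncating negative estimates at zero is an assumed convention. For a single occasion, the relative coefficient works out to the consistency-type single-measure ICC and the absolute (phi) coefficient to the agreement-type one; the n_prime argument applies the standard D-study step of averaging over several occasions, so the same code handles more than two occasions.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Components:
    subjects: float   # sigma^2_p: true differences among subjects
    occasions: float  # sigma^2_t: shift in the mean from occasion to occasion
    residual: float   # sigma^2_pt,e: subject-by-occasion interaction plus error


def estimate_components(scores):
    """Random-effects ANOVA (expected mean squares) estimates for a fully
    crossed subjects x occasions design, one score per cell.
    scores[i][j] is subject i on occasion j; any number of occasions >= 2.
    Negative estimates are truncated to zero (an assumed convention)."""
    n, k = len(scores), len(scores[0])
    grand = fmean(x for row in scores for x in row)
    subj_means = [fmean(row) for row in scores]
    occ_means = [fmean(row[j] for row in scores) for j in range(k)]

    ss_p = k * sum((m - grand) ** 2 for m in subj_means)
    ss_t = n * sum((m - grand) ** 2 for m in occ_means)
    ss_res = sum(
        (scores[i][j] - subj_means[i] - occ_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_p = ss_p / (n - 1)
    ms_t = ss_t / (k - 1)
    ms_res = ss_res / ((n - 1) * (k - 1))

    return Components(
        subjects=max((ms_p - ms_res) / k, 0.0),
        occasions=max((ms_t - ms_res) / n, 0.0),
        residual=ms_res,
    )


def g_coefficients(c, n_prime=1):
    """Relative and absolute (phi) coefficients for the mean of n_prime
    occasions. The relative form leaves the occasion component out of
    the denominator, so a shift in means cannot lower it."""
    relative = c.subjects / (c.subjects + c.residual / n_prime)
    absolute = c.subjects / (c.subjects + (c.occasions + c.residual) / n_prime)
    return relative, absolute


# The variance components reported in the quoted post.
rel, phi = g_coefficients(Components(0.99269, 0.00300, 0.27842))
print(round(rel, 5), round(phi, 3))  # 0.78096 0.779
```

Feeding estimate_components two columns where every subject moves up by exactly one point gives a relative coefficient of 1.0 while phi drops below 1, which is the mean-difference blindness in miniature.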
>
> Thus, even though the difference between time 1 and 2 is significant (due
> in part to the large sample and the strong correlation between two
> observations taken 20 minutes apart), the effect on the reliability is
> small. Of course, I could observe that in the means as well, since they're
> very close, but when you see two means, many people want to know whether
> they are statistically different.
>
> Add to this result the fact that, because in reality I have 5 assessments
> of the observed variable over an hour's time, the generalizability result
> is much easier to deal with than 10 unique Pearson correlations and an
> ANOVA (hopefully not 10 paired t-tests), and it becomes clear that the
> generalizability analysis is cleaner than breaking the analysis into two
> parts.
> [snip sig.]
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Richard Ulrich
> Sent: Wednesday, May 12, 2004 2:52 PM

[snip, his and mine]

RU:
> > Yes, it is the overall impact, and that can be useful for the
> > *final* statement, especially when a very precise statement of
> > overall impact is warranted: for instance, when power analyses are
> > being based on the exact value of the exact form of ICC that is
> > needed (same versus different raters; single versus multiple scorers).
> >
> > And I think it is an over-generalization to prefer an ICC when the
> > issue is the cruder one of apparent adequacy. The ICC is less
> > informative (about means) and less transparent (multiple versions
> > available to select, all of them burying the means).
> >
> > [snip, rest]

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
http://jse.stat.ncsu.edu/
=================================================================