On Tue, 7 May 2002 11:08:26 +0200, "Johannes Hartig" <[EMAIL PROTECTED]> wrote:
> Hi all,
> I have the following problem, for which I don't find a really
> satisfying statistical method:
> The item discrimination index for questionnaire items (i.e.,
> the corrected item-total correlation) is supposed to increase
> with the number of similar items answered (when controlling for
> the item content). In several papers these "item context effects"
> are tested by correlating (Fisher-z-transformed) item discrimination
> with item position in the questionnaire (e.g. Knowles, 1988*).
> What leaves me somehow unsatisfied about this technique is that
> the "sample size" used for this analysis equals the number of
> questionnaire items, not the number of respondents.

The subject of the analysis is a certain set of items. If you test each
item in only one position, then you have each item just once. That is an
awfully weak design, which I would be tempted to call an AWFUL design. If
the item positions were not assigned randomly, *then* I would give in to
temptation: Awful.

> What I'm looking for is a method to test the _trend_ in
> correlations of a series of variables x(1), x(2), ... x(k) with
> another variable y. I know how to test the null hypothesis that
> all k correlations are equal, but this is not exactly the question
> I'm trying to answer. Is there any way to test the trend in a series
> of correlations based on the raw data, i.e., one that uses the power
> of the original sample size? Or is the correlation between k and
> r(x(k),y) an adequate procedure, even if this means a "sample"
> size of, e.g., 36 items that were originally answered by 1200
> respondents?

Well, as I mentioned, the correlation between r and k is not adequate,
either, if the positions were not assigned by chance. Yes, you can test
the correlations for homogeneity, but that is an odd test, in a way --
for many purposes, including what you are tackling today, the reasonable
assumption is that those correlations *are* different, with no argument
needed.

But I am a little bit ambivalent. I don't think I have explained - to
myself - what makes one test based on N=1200 "okay" and another one
"not-okay". There is something here that is like using the
'pooled-within' variance, across all the polynomial effects, to analyse
the linear trend in a repeated measures design: I think that it is
mostly wrong, but not 100% of the time.

I know it is better to have
 - a larger N, and
 - a larger difference in correlations, and
 - a large number of correlations.

Here is a counter-example, and one reason that you do need to look at
the individual correlations, which deserve to have a large enough N to
have some reliability: If you had just 5 correlations, ranging merely
from .51 to .55, with N=25, I would believe that *every* outcome -- any
comparison based on those r's -- was chance. However, if this precise
example had such a good correlation with k that the p-level was .0001 or
smaller, I would believe that someone was lying about the design, or
somehow cheating, or hitting the RARE chance: there was not enough
'information' with N=25 and r's from .51 to .55 to demonstrate any
effect, except by accident.

So: you need a large enough reliability in the measures to justify the
accuracy claimed - or assumed - by the test that is being performed.
When results come out *too good*, you do report them; but it is
appropriate to warn that the effect was, somehow, in your opinion,
outside of chance. [I know that this reminds me of 'over-interpreting'
in some other contexts, and I would be pleased if someone would cite
parallels.]
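For concreteness, here is a minimal sketch of the analysis in question --
corrected item-total correlations, Fisher-z transformed, then correlated
with item position. It is only an illustration in Python (numpy/scipy);
the 1200 x 36 response matrix and the variable names are made up for the
example, not anything taken from Knowles (1988).

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Fake Likert-type data: 1200 respondents x 36 items (shapes illustrative only)
responses = rng.integers(1, 6, size=(1200, 36)).astype(float)

n_items = responses.shape[1]
disc = np.empty(n_items)
for j in range(n_items):
    rest = responses.sum(axis=1) - responses[:, j]      # total score minus the item itself
    disc[j] = stats.pearsonr(responses[:, j], rest)[0]  # corrected item-total correlation

z = np.arctanh(disc)                  # Fisher z: z = 0.5 * ln((1 + r) / (1 - r))
position = np.arange(1, n_items + 1)  # item position in the questionnaire

r_trend, p_trend = stats.pearsonr(position, z)
print(f"r(position, z) = {r_trend:.3f}, p = {p_trend:.3f}  (n = {n_items} items)")

Note that the trend test at the end sees only n = 36 "cases", however
many respondents produced each r -- which is exactly the complaint above;
and if the positions were not randomized, even that n = 36 test is
suspect.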
--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
