Steve) Thanks very much for your response.
One might inquire, if one were pursuing this matter in a little more
depth, why one would not prefer a continuous approximating distribution
(e.g., normal, if that be appropriate, as is often the case), on the
basis either that the empirical CFs at hand represent an instance drawn
from such an idealized population, or that the continuous function is an
adequate approximation to the true population distribution; since the
purpose you describe clearly is to apply the CF information to some
(hypothetical?) set of students whose scores are not in fact
represented in the data in hand.
Steve) Yeah, a normal approximation might be a good idea. Our data are
typically close to being normally distributed, though since Rasch
measurement is used no assumptions of normality are made (and given
this, I don't know how the suggestion would go down, but still...). Is
it about (hypothetical) students not represented in the data? Well,
yes and no. Not literally, but in essence, yes -- we want to make an
interpolation so as to more closely approximate how many
students 'might have actually' scored below a relatively more precise
score point than our test provides for. For example, we may have 210
students with an ability (logit) of -1.32, 330 students at -0.81, and
wish to more closely approximate how many students would hypothetically
score below a value of, say, -1.01. See, the percentages are reported
publicly from year to year, and large fluctuations may cause a stir!
We're in essence trying to anticipate what may happen with a different
test but the same 'cut score' on the same scale, in future years
(assuming the ability distribution stays fairly stable). Of course,
the only proper way to do this is to measure more accurately (not an
option), but obviously I'm after the best way to approximate given what
we have.
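For what it's worth, the interpolation described here can be sketched as straight-line interpolation on the cumulative counts. The logit values and the 210/330 counts come from the example above; the baseline count of students already below -1.32 is invented purely for illustration:

```python
def interp_cum_freq(x, x_lo, x_hi, cf_lo, cf_hi):
    """Estimate the cumulative frequency at score x by assuming the
    frequency mass is spread uniformly between two adjacent observed
    score points (x_lo, x_hi) with cumulative frequencies cf_lo, cf_hi."""
    frac = (x - x_lo) / (x_hi - x_lo)
    return cf_lo + frac * (cf_hi - cf_lo)

# From the example: 210 students at logit -1.32, 330 at -0.81.
# Assume (for illustration only) that 1000 students score below -1.32.
cf_lo = 1000 + 210      # cumulative count at -1.32
cf_hi = cf_lo + 330     # cumulative count at -0.81

est = interp_cum_freq(-1.01, -1.32, -0.81, cf_lo, cf_hi)
print(round(est, 1))    # about 1410.6 students below -1.01
```

Dividing such counts by the total N then gives the cumulative percentages that get reported.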
(Of course, the problem you describe below still arises, in terms of
how one converts from the discrete empirical CF function to the
(idealized?) continuous function; this is much less a problem if the
continuous function is obtained from information other than the CFs
themselves -- e.g., an approximating normal distribution would be
derived from the empirical mean and standard deviation, not from the
empirical CFs.)
Steve) I can only see us doing this if the normal (or other
distribution) is a close approximation at all, or at least most, points
along the scale. But thanks, this is well worth exploring.
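If the normal route were explored, the cumulative percentage below a cut score falls straight out of the normal CDF fitted to the empirical mean and SD, as suggested above. A minimal sketch (the mean and SD here are invented; in practice they would come from the data):

```python
import math

def normal_cum_pct(x, mean, sd):
    """Proportion of a Normal(mean, sd) distribution falling below x,
    using the standard normal CDF expressed through math.erf."""
    z = (x - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Invented moments for illustration; use the empirical mean/SD in practice.
mean, sd = 0.0, 1.5
pct_below = 100 * normal_cum_pct(-1.01, mean, sd)
print(round(pct_below, 1))
```

Comparing this curve against the empirical C%s at each observed score point would show directly whether the approximation is close at all, or at least most, points along the scale.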
If by "cumulative frequency" ("CF" above) you mean "observed frequency
of responses less than or equal to this score value", and especially if
these CFs have been cumulated over a grouped empirical frequency
distribution, your logic is impeccable. If you've been cumulating at
the level of individual score values, there may be room for SOME
quibbling.
Steve) No, I mean <, though I don't see that it makes a great deal of
difference for interpolation given that we may be talking about any
point up to a couple of decimal places on a scale of range about -5 to
+5.
First, make sure you're all on the same wavelength. You clearly are
thinking in terms of "<=" CFs; plotting at the lower limit would be
appropriate for "strictly <" CFs (or equivalently ">=" CFs). Plotting
at the midpoint would be reasonable if one took for one's CF the
midpoint between a "strictly <" CF and a "<=" CF. If upon examination
it turns out that your colleagues (?) really think they're dealing
with "<=" CFs:
Steve) I'm not sure I explained in sufficient detail. We want to make
interpolations, potentially at any point on the continuum (to a couple
of decimals). Nonetheless, this is something that needs to be
explicitly clarified, you're right. It hasn't been to date, so far as
I'm aware (I've assumed everyone means 'percentage below the score').
You might ask them how they view the two intervals at the extreme ends
of the CFs. In terms of relative cumulative percents (C%s), what scores
then apply to the upper and lower limits of (1) the lowest non-empty
score interval; (2) the highest score interval? And in particular, what
C% applies to the upper limit of the highest interval? Either of the
two alternatives you report implies a C% > 100% here, which ought to be
absurd enough for anyone with a decent grasp of reality.
Steve) That's it! Perfect way to make the point, I think. I did think
of that some time ago, but I must admit it has slipped my mind since.
A reductio ad absurdum should hit the spot! Thanks again.
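That reductio lends itself to a toy demonstration. With "<=" C%s anchored at exact upper limits, a straight line through the top two plotted points hits exactly 100% at the upper limit of the highest interval; anchored at midpoints, it overshoots. The grouped distribution below is invented for illustration:

```python
# Invented grouped frequency distribution; intervals are [lo, hi),
# counts sum to 100 so cumulative counts double as cumulative percents.
intervals = [(-2.0, -1.0, 20), (-1.0, 0.0, 50), (0.0, 1.0, 30)]

cum = 0
upper_anchor, mid_anchor = [], []
for lo, hi, n in intervals:
    cum += n
    upper_anchor.append((hi, cum))           # "<=" C% at the exact upper limit
    mid_anchor.append(((lo + hi) / 2, cum))  # same C% plotted at the midpoint

def line_through_last_two(points, x):
    """Straight line through the last two plotted points, evaluated at x."""
    (x0, y0), (x1, y1) = points[-2], points[-1]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

top = 1.0  # upper limit of the highest interval
print(line_through_last_two(upper_anchor, top))  # 100.0 -- consistent
print(line_through_last_two(mid_anchor, top))    # 115.0 -- a C% > 100%
```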
Another approach is to inquire how one would arrange a CF
downward -- i.e., where the C%s range from 0 at the maximum value to
100% at the minimum, and the CFs represent the frequency of responses
greater than or equal to this score value.
Steve) Yes, I�ve raised this.
As for references, well the logic is all that concerns me, I can assure
you. However, that done, anything else to make the case would be
good. I've consulted a couple of texts already, and they
recommend plotting at the 'exact upper limit', as I'd expect. The
problem is that someone (quite highly regarded) has suggested that
different texts
recommend different methods (either plotting CFs at upper limit or mid-
point). This seems OK for a basic visual representation, but not for
making interpolations.
If possible, would you mind reconsidering in light of the fact that we
are making interpolations as described above? I'd be interested to
hear your thoughts.
Thanks again!
Steve.
===========================================================================
For information about this list, including information about the
problem of inappropriate messages and information about how to
unsubscribe, please see the web page at
http://jse.stat.ncsu.edu/
===========================================================================