I'm still trying to perfect my answer, so I will take another
shot here. I don't know whether J. Williams saw what I posted before,
but I am happy that DMR is calling it a bad test item.
On Thu, 01 Feb 2001 14:22:21 GMT, [EMAIL PROTECTED] (J. Williams)
wrote:
> On 30 Jan 2001 20:04:52 -0800, [EMAIL PROTECTED] (dennis roberts) wrote:
[ ... snip, much ]
JW >
> C is indeed the best choice. It is the ONLY correct answer. What is
> so awful about the correct choice? I don't get it!
DMR >
> >but as it stands ... it surely is a poor item that fails to keep straight
> >... appropriateness of the item GIVEN some objective
JW >
> The question yields a subtle view of a theoretical confidence
> interval. Maybe, I'm missing something salient here, but I think it
> is a fair question. Of course, I was not an English major either :-)
I think three different approaches can be delineated for saying
what makes "a good item."
(1) There is (something like) "Is the right answer the one given
by someone with a good IQ?" I think we are all agreed that (C)
meets that requirement. Further, I imagine that the item was
validated *statistically* by this standard -- marking "C" goes along
with higher scores on the other test items.
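(For concreteness, the usual check of that kind is an item-total
correlation. Here is a minimal sketch in Python, with invented
response data -- the numbers are mine, not from any real
administration of the item:)

    import numpy as np

    # 1 = pupil marked "C" (the keyed answer), 0 = anything else
    item = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])
    # the same ten pupils' scores on the rest of the test
    rest = np.array([38, 41, 22, 35, 25, 40, 37, 19, 33, 28])

    # point-biserial correlation of item score with rest-of-test score;
    # a clearly positive r is roughly what "validated statistically" means
    r = np.corrcoef(item, rest)[0, 1]
    print("item-total r = %.2f" % r)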
(2) There is a narrower approach -- which, indeed, was the question
specified when this item was posted. "Does the item show whether the
student understands rounding?" Will it be answered correctly by
everyone who does, or could naive respondents be led astray?
Since a "broad jump measured accurately to the nearest foot"
is not something that anyone in the Western world has ever heard
of, is it really fair to ask an 8th grader to interpret what it might
mean? (I assume the 8th grader is supposed to translate this,
immediately, into "This is a ROUNDING problem"; the rest of us
statisticians know what the item's answer is because we have
overlearned exactly that same response.)
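(The translation, once made, is bare interval arithmetic. A sketch,
using 21 feet as my stand-in for whatever figure the item used:)

    # a jump "measured to the nearest foot" as 21 ft means the true
    # length lies in [20.5, 21.5)
    reported = 21.0
    low, high = reported - 0.5, reported + 0.5
    # two jumps both reported as 21 ft: their true lengths can differ
    # by anything up to, but not including, one full foot
    print("difference < %.1f foot" % (high - low))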
You demonstrate possible difficulties, perhaps, by debriefing
students who missed the item; or by comparing it to other, related
items; or by noting unexpected item loadings in a large-scale
factor analysis. But you will usually discover them by careful
face-inspection, which is what I provided (I hope) in earlier posts.
"If you can imagine a way that someone would misread the item,
then someone will." This is a mild version of Murphy's law. It is
practically a truism when you are designing items or forms -- the hard
part of your judgement is, figuring how much "problem" is too-much
problem. In the recent Florida election, we learned that "punched
cards" have an inherent error rate of over 1%. And a "butterfly
ballot" has a rate over 5%. How much does it matter that most of
these errors should befall that 15% of the voters in Florida who were
voting for the first time? - well, it means that our subjective
account should not assume that every voter is cool and experienced.
"Professionally speaking," the butterfly punch-ballot has to be
regarded as awful, no matter how much Jay Leno, etc., make fun
of the Florida voters instead.
Similarly for the test item. If you are making assumptions about the
pupil's experience, vocabulary, acculturation, IQ, and attitude, then
you may forget to rate the item by how well it measures "rounding."
Here is a minor question or observation. In the real world, does
anyone ever perform rounding and blandly expect it to be recognized
as such? Or don't we *explicitly* state, "this is rounding, and not
truncation or estimation"?
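(The point in two lines of Python, if it helps: a bare report of 21
does not tell you which operation produced it.)

    import math
    print(round(20.6))       # 21, by rounding
    print(math.trunc(21.9))  # 21, by truncation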
(3) The third approach is, "Is the answer technically correct?"
It is still embarrassing, and something to be corrected, when the
keyed answer violates physics or careful logic, or when, on close
inspection, the question does not make good sense. This is more
important than slightly misleading some students. Bad logic will
likely be reflected in errors of the previous type, but errors of
type (3) need to be corrected, where those of type (2) do not.
It is harder to show test-makers that *they* are "wrong."
I have not had many people agree with me that, instead of being
purely logical, this item relies on well-understood jargon or idiom.
I'm trying one more time.
It says, "measured accurately to the nearest foot." People keep
claiming that "accurate" must mean "it's 100% accurate" - so that
this conflation of accuracy (of measurement) and precision (of
reporting) is entirely expected and natural.
What if another item said that a blimp at 1000 feet saw the two jumps,
and "estimated each at 21 feet, accurate only to the nearest foot."
What does this imply about the maximum difference between the two
jumps?
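(My reading of the ambiguity, in arithmetic -- the bounds below are
my own glosses of the two phrasings, not anything the item defines:)

    # "rounded to the nearest foot": a 21-ft report puts the true
    # length in [20.5, 21.5); "in error by up to a foot": in [20, 22]
    readings = {"rounded": (20.5, 21.5), "off by up to a foot": (20.0, 22.0)}
    for name, (lo, hi) in readings.items():
        # two jumps, both reported as 21 ft
        print("%s: max difference = %.1f ft" % (name, hi - lo))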
What if it said, "estimated each at 21 feet and 6 inches, measuring
accurately only to the nearest foot"?
- actually, the occasional use of half-units (like 6 inches) is
probably a giveaway that someone thinks their *accuracy* is about
one unit; they are promising not to err by more than 1/2, so they
refuse to round off between, say, .40 and .60.
- I think that I have just presented, in those last two examples,
situations that are much more "real-life" than the statement in the
original test item. And "accurately" is ambiguous to the 14-year-old.
Finally, we round *numbers* if we don't want to fret about
measurement error. And we keep that language clear.
- this is still not perfect, but I hope I am improving it.
--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html