So far, so good. We agree that if we are just comparing the facts of each
case, then there is no generalization, no inference and no statistical test.
We agree that one can hypothesize anything, presume a null hypothesis, do a
re-randomization and see whether what actually happened is statistically
significant -- without generalizing from a sample of subjects to a larger
population of subjects.
The real question, as you pointed out, is RELEVANCE. How is the result of a
statistical test involving a hypothetical counterfactual situation relevant
to these very limited factual cases?
I agree that statistical testing need not be relevant to every claim.
Claiming that the MIT female professors had fewer citations than their male
counterparts -- in the years studied -- is factual -- not inferential.
Claiming that the black basketball players had more points per person than
their white counterparts -- in the games studied -- is factual -- not
inferential. In your case of committee assignments, claiming that the women
had poorer assignments than the men -- in the time period studied -- is
factual -- not inferential.
But in most of your examples MORE is being claimed. In most cases, the
claim includes an inference. Once the claim involves an inference, then a
statistical test may be relevant.
In one case, the claim was discrimination (causal explanation of observed
differences); in another the claim was greater scoring ability (causal
explanation of observed differences). Sometimes it is difficult to tell
from the words whether one is talking about the outcome (greater scoring) or
the causal process (greater scoring ability). The outcomes are observable
and factual; the causes are typically non-observable and inferential.
Neither discrimination (motives of others) nor ability (internal
potentiality) is directly observable. Both are "internal" and must be
inferred from their effects. Presumably given an infinite amount of time in
a static situation, with more and more controls, one could be contextually
certain one way or the other. But in the short run, without such controls,
uncertainty abounds.
In both cases, the inference involves generalizing from a small "sample" of
time to a larger "population" of time. Thus, the strength of the argument
is influenced by the time-span of the data. In the case of MIT, had the
data been based on only one month, the case would be much weaker in support
of discrimination than if we had data for 12 years. In the case of
basketball scoring of white and black players, the case would be much weaker
if we included only one quarter of one game than if we had included many
games.
How can we measure the influence of the time span involved in the data?
Here is where IMHO one can make a case for statistical tests being RELEVANT.
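
One rough way to put a number on that: hold the pattern fixed (every
observation in one group beats every observation in the other) and ask how
the exact randomization p-value shrinks as the number of observations per
group grows. This is only a sketch under strong assumptions of my own: the
values are all distinct, and each added time period contributes one
exchangeable observation per group, which real salary or scoring data may
well not satisfy. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def most_extreme_split_p(n_per_group):
    """Exact one-sided randomization p-value when all n_per_group values in
    one group exceed all n_per_group values in the other (no ties).

    Only one of the comb(2n, n) equally likely splits is that extreme.
    """
    return Fraction(1, comb(2 * n_per_group, n_per_group))


# Assumption: one observation per group per period, so n stands in for
# the time span of the data.
for n in (1, 2, 4, 12):
    print(n, most_extreme_split_p(n))
```

With one period per group the most lopsided result possible has p = 1/2 and
settles nothing; with twelve it has p = 1/2704156. Same observed pattern,
very different strength of evidence.
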
PS. Just because MIT can "attribute" an outcome (difference in pay/status)
to a particular cause (discrimination) does not mean their argument is
strong. A claim involving the existence/influence of an unobservable
(discrimination) requires evidence. In this case, I think statistical
inference may provide some of that evidence.
If we thought about these matters in the context of process control
(where we sample in time) then the importance of time might be more obvious.
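
For anyone who wants to check the arithmetic in the committee example
quoted below, here is a minimal sketch that enumerates every way of handing
the 8 assignments to 4 men and 4 women. The scores are the ones Jim posted;
the function name and the choice of test statistic (the sum of the first
group's scores, which ranks splits the same way as the difference in means
when group sizes are fixed) are my own.

```python
from fractions import Fraction
from itertools import combinations

# Assignment-quality scores from the committee example in this thread.
males = [6, 7, 8, 9]
females = [1, 3, 2, 4]


def exact_randomization_p(group_a, group_b):
    """One-sided exact p-value: the share of equally likely splits in which
    group_a's total is at least as large as the total actually observed."""
    pooled = group_a + group_b
    observed = sum(group_a)
    hits = 0
    splits = 0
    # Every way of choosing which pooled scores go to group_a.
    for chosen in combinations(range(len(pooled)), len(group_a)):
        splits += 1
        if sum(pooled[i] for i in chosen) >= observed:
            hits += 1
    return Fraction(hits, splits), splits


p_value, n_splits = exact_randomization_p(males, females)
print(n_splits, p_value)  # 70 splits, p = 1/70
```

Nothing here assumes the 8 people were sampled from anywhere; the only
randomness is in the hypothesized assignment.
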
---------------------------------------------
"Irving Scheffe" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
> Milo:
>
> Sure, although I don't see how that is relevant
> to the MIT situation, which attributed
> the current status of women there to
> discrimination, based on an undisclosed
> methodology.
>
> More generally, one CAN indeed do randomization tests
> on similar data even though there is no inference toward
> a larger population, and no intention of inferring
> later performance, and no random sampling from a
> larger population.
>
> For example, suppose a department head insists that committee
> assignments are made by completely random selection, but
> the 4 women discover that they have (by some objective criterion)
> the 4 worst assignments, while the 4 men faculty have the 4 best
> assignments. One can test the hypothesis that the assignment
> is random with respect to quality, and reject it at commonly used
> significance levels, by asking, "What is the probability
> of observing an imbalance this large if these 8 assignments
> had been randomly assigned to these 8 people"?
>
> Quality of Assignment (10 point scale)
>
>   Males   Females
>     6        1
>     7        3
>     8        2
>     9        4
>
>
> Since there are 70 possible assignments,
> and this one achieves the absolute worst
> imbalance, things look bad for the
> department head. The probability is
> 1/70 of obtaining an imbalance greater
> than or equal to the one observed, given
> random assignment.
>
> Note, we need not assume that the men
> and women have been sampled randomly
> from a larger population to perform
> this combinatorial calculation.
>
> On the other hand, one would not need a significance test to
> evaluate the statement that "The women, at this time, have much
> worse committee assignments than the men," so long as
> it can be assumed that the measurement scale is reasonable.
>
> --Jim
>
> --------------------
> James H. Steiger, Professor
> Department of Psychology
> University of British Columbia
> Vancouver, B.C., Canada V6T 1Z4
> ----------------------
>
> Note: I urge all members of this list to read
> the following and inform themselves carefully
> of the truth about the MIT Report on the Status
> of Women Faculty.
>
> Patricia Hausman and James Steiger Article,
> "Confession Without Guilt?" :
> http://www.iwf.org/news/mitfinal.pdf
>
> Judith Kleinfeld's Article Critiquing the MIT Report:
> http://www.uaf.edu/northern/mitstudy/#note9back
>
> Original MIT Report on the Status of Women Faculty:
> http://mindit.netmind.com/proxy/http://web.mit.edu/fnl/
>
>
> On Sun, 18 Feb 2001 23:06:44 GMT, "Milo Schield" <[EMAIL PROTECTED]>
> wrote:
>
> >Jim has consistently maintained three claims/arguments:
> >1. we are not trying to generalize from a small group of people to a
> >larger population.
> >2. No inference is involved (if we are not generalizing)
> >3. Using statistical tests is meaningless (since no inference is
> >involved).
> >
> >I agree with his 1st and 3rd points -- but not his 2nd.
> >Within the second, I disagree with the truth of his premise.
> >
> >The generalization mentioned in #1 is NOT the only possible
> >generalization. Another generalization may be involved: that involving
> >time. We sample things (basketball goals) for two groups of players in a
> >few games and then want to make an inference about whether these
> >particular scores were unlikely given that THESE particular players
> >involved had the same average scores in the long run -- for all games.
> >
> >Given a time-based generalization, we now have an inference. Given this
> >inference, the applicability of statistical tests seems quite relevant.
> >Milo
> >PS. This may be a quasi-Bayesian reinterpretation of these problems --
> >but if it fits.....
> >-------------------------------------------------
> >"Irving Scheffe" <[EMAIL PROTECTED]> wrote in message
> >news:[EMAIL PROTECTED]...
> >> There are a wide variety of probabilities that may be calculated
> >> in this situation, depending on the assumptions you want to
> >> make, and precisely what you mean by "this result." However, if you
> >> ask, "How likely is it that the for side won", the answer is that the
> >> for side won. If you ask, "How likely is it that the percentage of
> >> for votes is higher for women than for men in this sample," the answer
> >> is that it is perfectly likely, because it happened.
> >>
> >> In an example perhaps more relevant to previous examples here, suppose
> >> this was an actual departmental vote, and the result was 5-3. The
> >> motion passed.
> >>
> >> If one of the men was a statistician wanting to overthrow
> >> the result and said, "wait a minute,
> >> let's perform a statistical test and see the probability of
> >> obtaining this gender split, given that the 5 yes votes were actually
> >> randomly assigned with respect to gender," I suspect people
> >> would think him odd. He can compute this probability, but
> >> it is irrelevant.
> >>
> >> On 15 Feb 2001 13:49:51 -0800, [EMAIL PROTECTED] (Paul R
> >> Swank) wrote:
> >>
> >> >I remember a question from some stat book about a situation where there
> >> >were 8 members of a group, three men and five women (or the reverse, I
> >> >can't remember which) and on some issue the vote was five to three with
> >> >all five women voting for. The question was "How likely was this event
> >> >to occur by chance"? Can we not ask that question?
> >> >
> >> >At 05:34 PM 2/15/01 GMT, you wrote:
> >> >>Rich:
> >> >>
> >> >>To be blunt, although
> >> >>your comments in this forum are often
> >> >>valuable, you fell far short of two
> >> >>cents worth this time.
> >> >>
> >> >>This is not a popularity contest, it is a statistical
> >> >>argument. You offered an unsupported
> >> >>opinion with only one content-related
> >> >>comment. Let's cut to the chase.
> >> >>
> >> >>Please define precisely what you meant in
> >> >>the phrase
> >> >>
> >> >>> - and if you want to know something about how unlikely it was to
> >> >>>get means that extreme, you can randomize. Do the test.
> >> >>
> >> >>a. You do *have* means "that extreme."
> >> >>
> >> >>b. There is no "likelihood" to be considered, because
> >> >>the entire population is available. We were assessing the
> >> >>original MIT conjecture that to imply there were important
> >> >>performance differences between male and female biologists
> >> >>AT MIT would be "the last refuge of the bigot."
> >> >>
> >> >>So, my countercomments to you are:
> >> >>
> >> >>1. Rather than snipping the Gork example, deal with it. Explain,
> >> >>in detail, why the Gork women shouldn't be paid more than the men.
> >> >>My prediction: you can't, and you won't.
> >> >>
> >> >>2. You talk about "how unlikely it was." Unlikely when?
> >> >>Unlikely under what conditions?
> >> >>
> >> >>3. (if you choose to answer question 2) Why would the Gork society be
> >> >>interested in assessing any such likelihood, if they have a
> >> >>meritocracy, and their only interest lies in assessing whether male
> >> >>and female Gorks have shown productivity differences?
> >> >>
> >> >>If you can actually answer such questions, rather than rendering
> >> >>an unsupported opinion, you might have two cents worth to add.
> >> >>
> >> >>All the best,
> >> >>
> >> >>Jim
> >> >>
> >> >>---------
> >> >>James H. Steiger, Professor
> >> >>Dept. of Psychology
> >> >>University of British Columbia
> >> >>Vancouver, B.C., Canada V6T 1Z4
> >> >>
> >> >>Comments reflect my opinion only,
> >> >>-------------
> >> >>
> >> >>On Thu, 15 Feb 2001 10:39:45 -0500, Rich Ulrich <<[EMAIL PROTECTED]>
> >> >>wrote:
> >> >>
> >> >>>I am just tossing in my two cents worth ...
> >> >>>
> >> >>>On Thu, 15 Feb 2001 07:53:13 GMT, Jim Steiger, posting as
> >> >>>[EMAIL PROTECTED] (Irving Scheffe) wrote:
> >> >>>
> >> >>><< snip, name comment >
> >> >>>
> >> >>>> 2. I tried to make the Detroit Pistons example as obvious as I could.
> >> >>>> The point is, if you want to know whether one population performed
> >> >>>> better than another, and you have the performance information, [under
> >> >>>> the simplifying assumption, stated in the example and obviously not
> >> >>>> literally true in basketball, that you have an acceptable
> >> >>>> unidimensional index of performance], you don't do a statistical test,
> >> >>>> you simply compare the groups.
> >> >>>
> >> >>>>
> >> >>>> Your question about the randomization test seems
> >> >>>> to reflect a rather common confusion, probably
> >> >>>> deriving from some overly enthusiastic comments
> >> >>>> about randomization tests in some
> >> >>>> elementary book.
> >> >>>
> >> >>> - If you are willing, perhaps we could discuss the textbook
> >> >>>examples. I don't remember seeing what I would call
> >> >>>"overly enthusiastic comments about randomization."
> >> >>>When I looked a few years ago, I did see one book with an
> >> >>>opposite fault, exemplified in a problem about planets.
> >> >>>I thought the authors were pedantic or silly, when they refused
> >> >>>to admit randomization as a first step of assessing whether there
> >> >>>*might* be something interesting going on.
> >> >>>
> >> >>>> Some people seem to
> >> >>>> emerge with vague notions that two-sample randomization tests make
> >> >>>> statistical testing appropriate in any situation in which you have
> >> >>>> two stacks of numbers. That obviously isn't true.
> >> >>>> Your final question asks if "statistical tests" be appropriate
> >> >>>> even when not sampling from a population. In some sense, sure. But not
> >> >>>> in this case.
> >> >>>
> >> >>>I can't say that I have absorbed everything that has been argued.
> >> >>>But as of now, I think Gene has the better of it. To me, it is not
> >> >>>very appropriate to be highly impressed at the mean-differences,
> >> >>>when TESTS that are attempted can't show anything. The samples
> >> >>>are small-ish, but the means must be wrecked a bit by outliers.
> >> >>>
> >> >>>>
> >> >>>> Maybe the following example will help make
> >> >>>> it clearer:
> >> >>> << snip rest, including example that brings in "power" but not
> >> >>>convincingly. >
> >> >>
> >> >>=================================================================
> >> >>Instructions for joining and leaving this list and remarks about
> >> >>the problem of INAPPROPRIATE MESSAGES are available at
> >> >> http://jse.stat.ncsu.edu/
> >> >>=================================================================
> >> >>
> >> >------------------------------------
> >> >Paul R. Swank, PhD.
> >> >Professor & Advanced Quantitative Methodologist
> >> >UT-Houston School of Nursing
> >> >Center for Nursing Research
> >> >Phone (713)500-2031
> >> >Fax (713) 500-2033