On Fri, 09 Mar 2001 15:53:12 +0000, Thom Baguley
<[EMAIL PROTECTED]> wrote:

>Irving Scheffe wrote:
>> Imagine it is 1961. Our question is, which outfield has better
>> home run hitters, the Yankees or Detroit? Here are the numbers
>> for the Yankee and Tiger starting Outfields.
>> 
>>         Yanks   Tigers
>>         -----   ------
>>          61       45
>>          54       19
>>          22       17
>>         --------------
>> 
>> Now, the t-test isn't significant, nor is the permutation test.
>> But is either relevant to the question? If you have a reasonable
>> understanding of the notion of "home run," the answer is no.
>> 
><snip>
>> It was, by definition, the population of interest, so it appears that
>> you are flat wrong. The question we were asking was, "if we take the
>> large identifiable cluster of senior MIT women who graduated between
>> 1970 and 1976, and compare them with their natural cohort, the men who
>> graduated in the same time frame, do we see performance differences?"
>> 
>> The answer is, as shown by the data above: yes. We see huge
>> performance differences. Just like with the Yankees and Tigers in
>> 1961.
>
>It seems to me that you are unnecessarily restricting the questions that can be
>asked by others.

I was presenting a counterexample to an erroneous assertion by
Mr. Ulrich. That in no way "restricts" the discussion.
Indeed, if you read my preceding posts carefully enough, 
you'll find an explicit disclaimer to the contrary. I recognize
that the "utility function" relating citations and publications
to quality is complex, and that there are questions of
natural variability to be addressed. 

>You are not even restricting them to the interesting
>questions. 

Again, please do not engage in straw man mischaracterization.
I'm not "restricting" anybody to anything. Indeed, it is the rigid
and improper insistence on a useless significance test
that is "restrictive," misleading, and lacking a rationale. 

I've simply presented an example of how a t-test not only fails to
add useful information, but provides a misleading conclusion.
If you think otherwise, please provide an example, with a rationale.
But please read on, because I think I'm going to help answer your
questions for you.
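
For concreteness, here is a minimal sketch of both tests on the 1961
totals (Python, with scipy assumed; the player labels below are only
illustrative):

    from itertools import combinations
    from scipy import stats

    yanks  = [61, 54, 22]    # Maris, Mantle, Berra
    tigers = [45, 19, 17]    # Kaline, Colavito, Bruton

    # Two-sample t-test on the 1961 home run totals
    print(stats.ttest_ind(yanks, tigers))    # t ~ 1.24, p ~ 0.28

    # Exact two-sided permutation test: all 20 ways to split the six
    # totals into two groups of three
    pooled = yanks + tigers
    obs = abs(sum(yanks) - sum(tigers))
    diffs = [2 * sum(c) - sum(pooled) for c in combinations(pooled, 3)]
    print(sum(abs(d) >= obs for d in diffs) / len(diffs))    # p = 0.20

Neither test comes anywhere near significance, which is exactly the
point: both are silent on a difference any baseball fan would call
enormous.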

>For example, asking who scored more in 1961 is different to asking
>which players were better.

I cannot imagine anyone, least of all myself, disagreeing. Why you
think it is relevant to my critique of a randomization test is
a mystery. As someone with a lifelong fascination with baseball
statistics, I'd freely admit that virtually any measure of anything in
baseball is impure. 

The key structural point in my argument is this. If you accept
the assumption that the players' performance in the previous
season is the thing being evaluated, reference to what might
have happened under some fictitious random sampling process
is irrelevant.

The Yanks outhomered the heck out of the Tigers in 1961. Whether
this indicates they are "better hitters," "more Christian," 
"superior human beings," or even "better home run hitters
in the long run" etc. is, of course, another matter,
and possibly very interesting. But you're not going to address
any of those issues with a t-test or randomization test. If you think
you can, please present a rationale.

Imagine the Tigers approached the
media in late 1961 and said, "Actually, Dr. Randomo isn't
sure that Maris, Mantle and Berra outhomered us in any 
meaningful sense, because, if you think about it, this
difference might be produced by 6 players of equal ability
influenced by a large number of random factors." 

If they were ordinary sportswriters, they'd simply 
say "are you nuts?"

But, if they were statisticians, they'd say (a) you 
are asking the wrong question, and (b) you have the 
wrong model. The question is not whether Mantle, Maris,
and Berra are better collective home run hitters over some
hypothetical long run than Kaline, Colavito, and Bruton. [Actually,
virtually anyone familiar with baseball would agree that they were, as
a group, better players, but that is another matter. All 6 were
outstanding players.]

In a similar vein, the question in the MIT case was not
whether the MIT male senior biologists are better people
than their female counterparts. It is simply this: how true is the
implied assertion in the MIT report that there were no
performance differences that might account for [undocumented]
differences in salary and resources between senior
men and women? MIT stated that to assert that differences in resource
allocation might be due to performance differences is "the last refuge
of the bigot." Hausman and I were documenting major performance
differences.


>Why not think of it in terms of "Could this difference be
>produced by 6 players of equal ability influenced by a large number of random
>factors". In that case a significance test might have some value in evaluating
>the hypothesis that one group was better.

Again, you're slipping in an alternative question to
the one that was asked. 

First, you're addressing the wrong question. 
We are not interested, in the example, in the "ability" of the
players. We are interested in whether, over the course of the
preceding 162 games, the Yanks outhomered the Tigers by a substantial
amount. They did. [This is not to say that "ability" isn't an
interesting question. But your proposed randomization test doesn't
address that issue well at all.]

Regarding MIT, a better question might
be, "How often can you go into a senior department, take two
groups of 5 or 6 people closely equal in seniority, have Group A
produce twice as many publications, roughly 4.5 times as many
citations, and substantially more grant money than the other group,
and NOT find that Group A is paid more than Group B?"  Of course,
thanks to reverse discrimination, you'd better stick to faculty of one
sex. 

Feel free to analyze this question and report back your findings to
sci.stat.edu.  [MIT, of course, refuses to divulge any salary data.
But, for example, the University of Virginia does.]

Second, you have chosen a suboptimal unit of analysis, if
you are really interested in assessing "ability."

You'll have to pay closer attention to the Yankees-Tigers example.

At the end of that example, I pointed out that the t-test would be
identical if the totals were multiplied by 10 and presented as
10-year totals. Yet any intelligent statistician (or baseball fan)
would agree that the latter data set is vastly more informative
than the data based on one year.
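
That claim is easy to check; a short sketch (again assuming scipy),
relying on the fact that the t statistic is scale invariant:

    from scipy import stats

    one_season = stats.ttest_ind([61, 54, 22], [45, 19, 17])
    ten_seasons = stats.ttest_ind([610, 540, 220], [450, 190, 170])
    # Both give t ~ 1.24: the test cannot distinguish one season's
    # totals from ten seasons' worth
    print(one_season.statistic, ten_seasons.statistic)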

So, if you were to try to analyze this question in the style you
prefer, you would have to pay very, very close attention to an
implicit fallacy in Gallagher's argument, which is this:

1. We chose a "unit of analysis" that made our presentation as
simple as possible, i.e., academic performance data collapsed over a
12 year span by simply totalling. We did this because we recognized
that the general public would not be that interested in examining
time-series data on publications and citations for these scientists,
and we thought (correctly) that most people would realize that
the performance differences are huge.

2. Ulrich and Gallagher were, therefore, not only asking the
wrong question, but using the wrong unit of analysis.

Let me be more specific:

Suppose the data for the two groups of baseball players is
extended to the following. 


         A1  A2  A3      B1  B2  B3
-----------------------------------
 1961    61  54  22      45  19  17
 1962    62  55  23      46  20  18
 1963    63  56  24      47  21  19
 1964    64  57  25      48  22  20
 1965    65  58  26      49  23  21
 1966    61  54  22      45  19  17
 1967    60  53  21      44  18  16
 1968    59  52  20      43  17  15
 1969    58  51  19      42  16  14
 1970    57  50  18      41  15  13
-----------------------------------
 Total  610 540 220     450 190 170
-----------------------------------

Suppose a statistician were purportedly trying to analyze which group
of players, A or B, is a better "long run" collective group of home
run hitters, on the basis of these data. Do you think it would be wise
to simply perform a randomization test on the numbers 610, 540, etc.?
Or should the statistician, in some way, take into account the fact
that these data are based on a finer unit of analysis, garnered over
10 seasons? 

To illustrate the weakness of your proposed randomization test,
suppose we compare two alternative, albeit questionable
[because of assumption violations] parametric analyses. The first
analysis would be a t-test, done on the 10-year totals. [This is
similar to what Gallagher and Ulrich propose.] The second is
a Groups by Trials repeated measures design. The t-test is
non-significant, t=1.24.  The F test is astronomical.
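
To see where the two analyses part company, here is a sketch
(numpy/scipy assumed) of the season-level structure that the test on
the totals throws away:

    import numpy as np
    from scipy import stats

    # 10 seasons x 3 players per group, from the table above
    A = np.array([[61, 54, 22], [62, 55, 23], [63, 56, 24],
                  [64, 57, 25], [65, 58, 26], [61, 54, 22],
                  [60, 53, 21], [59, 52, 20], [58, 51, 19],
                  [57, 50, 18]])
    B = np.array([[45, 19, 17], [46, 20, 18], [47, 21, 19],
                  [48, 22, 20], [49, 23, 21], [45, 19, 17],
                  [44, 18, 16], [43, 17, 15], [42, 16, 14],
                  [41, 15, 13]])

    # t-test on the 10-season totals: t ~ 1.24, same as one season
    print(stats.ttest_ind(A.sum(axis=0), B.sum(axis=0)))

    # Season-by-season group difference: Group A wins every season by
    # exactly 56 home runs, with no variability at all -- information
    # the repeated measures analysis uses and the totals discard
    print(A.sum(axis=1) - B.sum(axis=1))    # [56 56 ... 56]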

But notice, neither test is answering the question, "Were the
group A players more productive than the group B players over
the last 10 years?" They are answering a rather different
question.

The simple fact is, Group A was a better group of home run hitters.

>
>The second case is even stronger. Take any two groups and you'll almost
>certainly find a difference on most measures (citation count, salary, hat size
>or whatever).

Yes, you will. If you think that has any bearing on the meaning of
a 4.5-fold difference in citation rates for two groups of
people in the same department over a 12 year span, please elucidate.

>
>Finally, what allows you to infer that any difference you observe is "huge"?
>This is a relative judgement. In statistics we typically reference it to some
>indication of (population) variability. In real world contexts we often use
>other benchmarks.

Indeed, and if you can elucidate how performing a randomization test
on nonrandomly sampled groups provides any information on that
question [this was Dr. Gallagher's assertion], please do.

>
>For example, think about runs scored in the first innings of a test match by
>three top order batsmen from two cricket teams
>
>        England   Sri Lanka
>        -------   ---------
>          61         45
>          54         19
>          22         17
>        -------------------
>
>Is this a huge difference? I think not. Does it provide strong evidence that
>the England top order batsmen are better than the Sri Lankans? No. What allows
>you to infer a huge difference in the baseball case is your knowledge of
>baseball (frequency of runs and so on). So at best, I think it is a misleading 
>example.

I'm sorry, but I think you mistake [or perhaps overlook] the notion of
"counterexample." Both Dr. Gallagher and Mr. Ulrich asserted
that a randomization test provided useful information. I asserted it
did not necessarily provide such information, and presented a simple
counterexample. [Several, actually.] It is obviously [if you
understand the notion of a counterexample] not necessary for all sets
of numbers in all circumstances to support the notion of a major
practical difference for my counterexample to be valid.

The key reason for using home runs was to use a commonly understood
measure, and focus on the fallacy in Mr. Ulrich's reasoning. 

The fallacy is to overlook that senior full professors are, in
general, paid according to their performance over their careers, which
are within a few years of actually being over! By ignoring
the appropriate "unit of analysis" for your randomization test,
you are trivializing the key point. The numbers in the Hausman-Steiger
report represent the results over the last half-career of those
scientists. Most undergraduates, I think, could recognize the
fallacy of analyzing a sum of publications or citations, taken over
half a career, as though it were runs scored in the first innings of a
test match.

If you are stating, in a roundabout way, that it would be helpful,
in the context of evaluating the performance of the MIT biologists,
to know more about the citation counts of biologists in general,
and their relationship to salary across a broad range of departments,
I agree. However, I submit that a randomization test on the
12 year totals within the MIT Biology department tells you nothing
about that. If you are disagreeing, please elucidate.

Finally, a plea for objectivity: Please impose the same standards on
the MIT administration that you are implicitly applying here. Take the
time and trouble to read the three documents cited at the end of this
post. I suspect that, doing so with an open mind, you will arrive at
the conclusion that the MIT report was junk science -- a political
manifesto masquerading as a scientific report.

Mr. Ulrich now seems to think that the MIT administration thought
the data "irrelevant" in their report. He's wrong. Robert Birgeneau,
Dean of Science at MIT at the time, praised the report as being
"very data-driven," and declared that is an "MIT thing." And numerous
feminist commentators have praised the report as finally providing
concrete evidence of discrimination. The Report contains no
evidence of discrimination at all. Read it.



