On Fri, 02 Mar 2001 16:28:53 -0500, Rich Ulrich <[EMAIL PROTECTED]>
wrote:
>On Tue, 27 Feb 2001 07:49:23 GMT, [EMAIL PROTECTED] (Irving
>Scheffe) wrote:
>
>My comments are written as responses to the technical
>comments to Jim Steiger's last post. This is shorter than his post,
>since I omit redundancy and mostly ignore his 'venting.'
>I think I offer a little different perspective on my previous posts.
>
>[ snip, intro. ]
Mr. Ulrich's latest post is a thinly veiled ad hominem, and
I'd urge him to rethink this strategy, as it does not
present him in a favorable light.
Any objective reader would notice how the post
is riddled with emotional attributions and loaded
language like
"venting"
"exquisite sensitivity" (a claim attributed to me that I never made)
"hammering your own gavel"
"ferocity"
"angry"
"shouted down"
"blundering around"
"browbeat them"
"crude"
At the same time that Mr. Ulrich makes these disparaging
but completely inaccurate attributions, he characterizes
the posts of another discussant as "polite." Considering
that this "polite" poster (Gene Gallagher) used terms like "Rush
Limbaugh dittohead," it is clear that Mr. Ulrich's perceptions and
attributions are badly biased.
While he invests an extraordinary amount of effort in
such irrelevant ad hominems, Mr. Ulrich seems unable
to answer the simplest statistical questions regarding
his point of view. And, in his latest post, he reveals
in more detail how he insists on remaining as uninformed
as possible while rendering such judgments.
Most disturbingly, he contradicts himself
and mischaracterizes previous discussions.
For example,
>
>JS>
>> You are the one who examined nonrandom data, representing citation
>> counts over a 12 year period for senior male and female MIT biologists
>> matched for year of Ph.D. You look at these data, which
>> show a HUGE difference in performance between the men and women,
>> and declare that a significance test is necessary. But you
>> cannot provide any mathematical justification for the test.
>
>> I gave several examples to try to jar you into realizing that
>> a statistical test on the data cannot answer the question you
>> want answered.
>
>To start with, I never examined any *data*. I kept away from
>the papers because I knew so little about the data and it looked
>so messy; I made some comments about how difficult it could be.
Yet he made what appeared to be comments about data. For example:
<quote from earlier Ulrich post>
>I can't say that I have absorbed everything that has been argued.
>But as of now, I think Gene has the better of it. To me, it is not
>very appropriate to be highly impressed at the mean-differences,
>when TESTS that are attempted can't show anything. The samples
>are small-ish, but the means must be wrecked a bit by outliers.
This raises the question: If he never examined the data, how could he
make a statement about "outliers" in the data?
>
>I tossed in a couple of comments to encourage Gene G., who made
>some good sense, as did Dennis.
They made no sense, let alone "good sense." I gave numerous
examples demonstrating this. Mr. Ulrich professes that he doesn't
see the point of them.
>As I read it, you proceeded to
>browbeat them, while failing to respond to their substance.
Not true. First of all, there was virtually no substance in
their arguments. Dr. Gallagher wants to do a randomization test
because he is concerned [I'm interpreting and paraphrasing a bit]
about the scale and variability questions that
naturally surround citation data. These concerns are worthwhile,
but he failed (ever) to explain how a randomization test or a t-test
could answer such questions.
Indeed, later, he presented a "logarithmic transform" of the
citation data which made the differences look less severe,
but never provided any rationale for that, either. [See my
Yankees-Tigers example later.]
>I have tried to make sense of that early part of *your* argument,
>where you want to leap over their critiques.
Actually, they offered no critique. Dr. Gallagher offered mainly
name-calling and ad hominem in his early posts, using terms
like "Rush Limbaugh dittohead." Seems like Mr. Ulrich's criticism
is misplaced.
>
>You claim a HUGE difference. You say you assert this because of
>exquisite sensitivity to numbers. Dennis challenged this on the
>basis of "lousy standards" -- either by their metric or content --
>and Gene challenged this as misleading, because it was "not
>(nominally) significant."
To declare that something is "not significant" requires a rationale.
>They disagreed with you on the inference
>that you drew from two means.
>
>I agree that a huge difference may be useful. I agree that t-tests
>don't offer any final resolution. (As I posted before,) with
>nonrandom data, we have to argue contingencies, explore options,
>and make what inferences that we can. You seem to cut that short,
>chop! -- pronouncing your own verdict as final -- but I don't see how
>hammering your own gavel can convince people who have the
>choice of looking elsewhere.
"hammering your own gavel"--> more ad hominem
Hausman and I presented the data which Mr. Ulrich
(oddly, I must say) refused to look at. (Imagine: a "biostatistician"
who refuses to look at the data that are the object of discussion.)
Our assertion was *not* that these data *necessarily* prove that
resource allocations were strictly merit based.
In fact, we explicitly disclaimed that interpretation, a fact
which Mr. Ulrich would not know, having refused to read the
relevant document. But we asserted these data certainly offer an
alternative explanation for charges of "discrimination" in the MIT
biology department. Here they are again:
12 year citation counts:
Males Females
----------------------
12830 2719
11313 1690
10628 1301
4396 1051
2133 935
893
-----------------------
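As an aside for readers who want to check the arithmetic themselves, here is
a minimal sketch of the pooled two-sample t-test that this thread keeps
debating, run on the citation counts above. One assumption: the lone 893 is
assigned to the male column, per the counts of 6 males and 5 females stated
elsewhere in this post.

```python
# Sketch: pooled-variance two-sample t-test on the 12-year citation counts.
# Assumption: the lone 893 belongs to the male column (6 males, 5 females,
# as stated in the data description quoted later in this post).
from scipy import stats

males = [12830, 11313, 10628, 4396, 2133, 893]
females = [2719, 1690, 1301, 1051, 935]

# Pooled t-test with df = 6 + 5 - 2 = 9
t, p = stats.ttest_ind(males, females)
print(f"t = {t:.2f}, two-sided p = {p:.3f}")
```

On these numbers, t comes out a bit above 2.3 with a two-sided p just under
.05. Whether that test means anything here is, of course, the very point in
dispute.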
>
>You may think that you speaking from unimpeachable epiphany;
>to the rest of us, it looks like you are jumping to a conclusion.
Straw man. I never stated that I was speaking from "unimpeachable
epiphany."
Do "biostatisticians" claim psychic powers? Mr. Ulrich makes
pronouncements about "outliers" without ever examining the
data; now he makes pronouncements about "the rest of us"
on the basis of, apparently, nothing at all. "The
rest of us" apparently includes Mr. Ulrich and Dr. Gallagher,
and perhaps Dr. Roberts. The fact that two or three
individuals feel compelled to crank out a statistical
test proves nothing.
>
>You offer your *inference* that a huge citation difference explains
>the outcome.
No. I offered the *fact* that there is a huge citation difference,
and that it *could* well explain the outcome. These data were
presented to counter the claim in the original MIT report that
there were no differences between female complainants and their
counterparts that might explain differences in resource
allocation.
Of course, if Mr. Ulrich refuses to read the material under
discussion, these important distinctions are bound to elude him.
>
>If those measurements are on a reasonably useful metric, then a
>t-test should show it. It is my own experience, and part of my
>own learned, "exquisite" sensitivity to numbers, that
> (1) a mean difference as large as you illustrated should result
>in a t-test that is significant, unless there is something screwy
>with the numbers.
I can think of reasons why a "mean difference"
might not result in a "t-test that is significant,"
other than the fact that there is "something
screwy with the numbers." But the t-test is irrelevant
in this situation.
> (2) And if there is something so screwy with the numbers, then
>it is usually misleading and wrong to present the MEANS as if
>their contrast was meaningful ("huge").
Actually, the t-test *is* significant! By whatever logic
Dr. Gallagher professes to be endorsing, it should be powerful
evidence of a difference.
But, I submit, it is irrelevant. And Mr. Ulrich is simply unable
to delineate (a) what rationale he is
using to conduct this test, (b) what hypothesis he is testing,
and (c) what sampling distribution is relevant.
This is not surprising, but I repeat the question I've
now asked numerous times. What is the rationale?
>
>Now, there is not a "mathematical necessity" for a test statistic.
>It is a request that you respect the conventions of statisticians,
>even when we ask for a test on non-random data, for what we might
>learn from it.
This is circular logic.
a. I've asked (*repeatedly*) what rationale Mr. Ulrich has for doing
the test.
b. He states that his rationale is that I "respect
the conventions of statisticians." What conventions? Stated where?
[Throughout history, there have been numerous inane "conventions"
in various areas of statistical practice. ]
c. He offers as a premise that "we might
learn" something from such a test. But "what might be learned," "how," and
"why" are exactly what the original question asks!
So, at the risk of Mr. Ulrich firing more ad hominems my way
[I assure everyone I'm not venting] I repeat the question:
Please tell us precisely what you think we are going to learn, and
why.
In order to help Mr. Ulrich focus on the issues, suppose we examine
a similar question regarding human performance in another
area, baseball.
Imagine it is 1961. Our question is, which outfield has better
home run hitters, the Yankees or Detroit? Here are the numbers
for the Yankee and Tiger starting outfields.
Yanks Tigers
----- ------
61 45
54 19
22 17
--------------
Now, the t-test isn't significant, nor is the permutation test.
But is either relevant to the question? If you have a reasonable
understanding of the notion of "home run," the answer is no.
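For anyone who wants to verify both claims, here is a minimal sketch. With
3 + 3 observations there are only C(6,3) = 20 ways to split the six numbers
into two groups, so the permutation test can be done exactly; the t-test is
the usual pooled version.

```python
# Sketch: exact one-sided permutation test and pooled t-test on the
# 1961 outfield home-run totals quoted above.
from itertools import combinations
from scipy import stats

yanks = [61, 54, 22]
tigers = [45, 19, 17]
pooled = yanks + tigers

obs = sum(yanks)  # observed Yankee group sum: 137
# Count the 3-element splits whose group sum is at least as extreme.
count = sum(1 for c in combinations(pooled, 3) if sum(c) >= obs)
perm_p = count / 20            # one-sided exact permutation p-value
t, t_p = stats.ttest_ind(yanks, tigers)  # pooled t-test, df = 4

print(f"permutation p = {perm_p:.2f}, t = {t:.2f}, two-sided p = {t_p:.2f}")
```

Neither test reaches the conventional .05 level, exactly as stated above:
the exact permutation p is 2/20 = 0.10, and the t-test p is larger still.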
> Non-significant tests, which I had thought the
>data were producing, really undermine your adjective "huge".
Most people would say the Yankees had a "huge advantage" over
the Tigers in outfield home run production. Yet Mr. Ulrich
would apparently say otherwise. The season is over. And we
are asking only about what has already happened. And it is
clear that the 1961 Yankee outfield out-homered the heck out of the
1961 Tiger outfield. But somehow, Mr. Ulrich isn't convinced. Why?
Because a significance test that he has no rationale for
didn't come out significant.
>A "significant" test, which you now report, lends some credibility.
>Gene's permutation test says that those sets are not disjoint,
>however, so there is some basis for direct comparison.
What, precisely, does Mr. Ulrich mean by saying "Gene's permutation
test says that these sets are not disjoint"? One does not
need any significance test to determine whether two sets
are disjoint. Does Mr. Ulrich know what it means to say that
"two sets are disjoint"? It means they have no
elements in common.
>The
>most extreme permutation would have undercut *one* form of
>comparison, and the most obvious part of one argument about
>discrimination (though, I expect, not everything). It *looks*
>like you didn't want to consider a comparison because you
>figured you could win the argument by repetition and ferocity.
Again, simple English would be nice. "Ferocity" is, once again,
recourse to ad hominem. One wonders how, precisely, Mr. Ulrich
would characterize Dr. Gallagher's use of the
term "Rush Limbaugh dittohead"? Would he call it "super-ferocious"?
>
>Your first excuse for not computing a t was that this was a
>"population" but that was flat wrong.
It was, by definition, the population of interest, so it appears that
you are flat wrong. The question we were asking was, "if we take the
large identifiable cluster of senior MIT women who graduated between
1970 and 1976, and compare them with their natural cohort, the men who
graduated in the same time frame, do we see performance differences?"
The answer is, as shown by the data above: yes. We see huge
performance differences. Just like with the Yankees and Tigers in
1961.
>I asked for your textbook
>references, and finally offered my own, in order to figure out
>your context.
Mr. Ulrich offered two quotes, each expressing an opinion. Only one
advised computing a significance test on nonrandom data, but
gave no trace of a rationale. He provided no reason why
the opinion of either cited person should be considered
any more valuable than mine.
>I offered "nonrandom", which you used, above.
>However, the old arguments about not computing "test statistics"
>on nonrandom samples have hardly any force these days -- I
>offer epidemiology as the pervasive (and persuasive) influence.
>
>Epidemiologists need to be reminded about the limits to their
>inference, - they tend to forget it entirely - but I think you
>are standing alone if you refuse to compute, claiming that old
>principle. I don't know if your role is such that someone will
>*have* to answer you, or if you are fated to wind up
>ignored (as non-responsive and therefore irrelevant);
>and angry.
More thinly disguised ad hominem.
>
>[snip, my comment to which JS wrote:]
>> Not so. If you were following the logic of the many examples I've
>> presented, you could see that you can construct a reductio ad
>> absurdum for any of the types of significance tests you are
>> proposing. If I believed strictly in hypothesis testing
>
>Ah, yes, the examples. I'm sorry, but these still seem apart
>from *my* points. These seemed designed to avoid discussion
>of the meaning of the numbers. If there is "significance," or
>not, the burden is shifted to one argument or the other. The
>first example featured 3+3 non-overlapping numbers near 90,
>as "productivity" of some sort. I said, those seemed like
>screwy numbers, justifying a closer look at the numbers.
>Another had two disjoint sets of numbers, from which one might
>conclude, if anything, "These are disjoint sets, whatever
>they mean.... It probably justifies a closer look at the
>numbers." You, on the other hand, want to pronounce a
>difference as "huge" and shut down any discussion, contrary
>to what I conclude from the examples.
Here, Mr. Ulrich seems determined to ignore the lesson of the
examples, and restate a completely incorrect insistence on a
significance test as though it were a premise. Sorry, it is not a
premise. It is a position with which many, if not most, statisticians
would disagree. He is unable to provide a rationale for it.
The lesson of the "Gork" example I presented in an earlier
post was simple. The rules of the imaginary "Gork" society
are clear -- blurk production determines worth. Under the supposition
that more productive people should be paid more, it is clear that
the female Gorks should be paid more. They have been more productive.
We don't need a significance test to tell us that.
We would need much more information, of course, to evaluate
"HOW MUCH MORE?" the female Gorks should be paid. This raises
important questions about metric and utility. But there is no
way that the "significance" test can necessarily tell us
anything about either metric or utility. If Mr. Ulrich thinks
it can, he should present a rationale. But he won't, because
he cannot.
[Sociological thought experiment for all readers
who have ventured this far: Imagine the genders were reversed.
Imagine this huge performance difference favored the
MIT women biologists. Do you think we would be having this argument?
My suspicion is that we would not. In fact, we would
have seen these data blared in every available forum,
with nary a peep from the sci.stat.edu regulars. But
that is just a conjecture, based on the fact that
feminists have used far weaker citation data to
make far stronger arguments.]
>
>I have seen the MIT numbers, now, in your post. The top three
>men are an impressive set. I have also read the pages at the URLs
>posted by Dennis, and I have grasped that 10,000 cites are
>*conceivably* the result of a single, joint authorship of some
>basic methodology that everyone must cite. Furthermore, I wonder
>if HERE, that might be the case. Those top three numbers leave
>a big gap to the rest. Like the gap in the Example, the irregularity
>causes me to wonder what at the numbers -- curiosity you seem to
>discourage. I'll just add: If "number of citations" *always*
>rules with promotions committees/ deans, then those folks would be
>incompetent asses.
Straw man argument. [Again, it would have helped had Mr. Ulrich
read the Hausman-Steiger paper prior to jumping in.] And, while
Mr. Ulrich notes that the "top 3 numbers" on the male side
leave a "big gap" to the rest, he fails to note [perhaps
indicating a bias?] that, if you move the 4th ranking male
to the female side, and simply concentrate on the resulting
numbers, you see the following:
M 4396
F 2719
F 1690
F 1301
F 1051
F 935
Now, imagine that the genders of the above individuals were reversed,
and the data were
F 4396
M 2719
M 1690
M 1301
M 1051
M 935
What do you see? You see a lone female, with a "big gap to the rest."
If "she" isn't making substantially more than the average male, I'd
say there is a real possibility of gender discrimination, wouldn't
you? Notice how, because of the intense societal conditioning
from feminist propaganda, we tend to view the data rather differently
when the genders are reversed.
Mr. Ulrich's selectivity in analyzing the data, and his failure
to notice this alternative "big gap," suggest either a lack of
care or a severe bias.
Nobody, least of all I, asserted that "number of citations should
always rule." There are many other considerations. However, these
data (along with the presence of Nobel Prize winners who just missed
being in our male group) suggest that the assertion in the MIT
report that the female complainants were not outperformed by their
male counterparts is incorrect. Read the original MIT report,
and then analyze their methodology. It is fertile ground for
criticism.
At least Mr. Ulrich is starting to ask intelligent questions. Yes,
of course there are many more questions one could ask about the
data. However, all the above mitigating hunches Mr. Ulrich proposes
are wrong. And, moreover, close by our chosen "populations," there are
other men with 10,000+ citations who are Nobel Prize winners. We did
not include them in our comparison group because we preferred to
remain with our predetermined strategy.
>
>
>[ snip, about CI; redundancy. venting ]
>
More cheap ad hominem trying to disguise what is now
becoming painfully obvious -- a lack of statistical
rationale.
>JS>
>> What "outlier" are you referring to? What statistical rule did you
>> use to determine the "outlier"?
>[ snip; MIT paper had data; t-test *is* significant at .05 ]
>
> - Sorry, my mistake. As I said several times before, the
>"huge" nominal difference in means combined with a failed test
>confirmed a bad distribution. I expected an outlier or two.
>Now, you give me three - that's a different oddity, and calls
>for a different sort of explanation.
Again, Mr. Ulrich evades the obvious question. How, pray tell, did
he determine that the 3 men at the top are "outliers" in
this population of scientists? They constitute half
the population! So I'll ask again. The question is simple
enough. I'll break it down.
In what (objective) sense are you able
to determine they are "outliers"?
By what rule?
>
>You go on to say, you did not state you were doing inference
>on means. I say, No, you just went ahead and did it, claiming NOT.
>And then you shouted down the demurrals. If you could point to a
>rule that fixed "conditions" to "citations"-according-to-that
>measurement, then you'd start with an administrative problem.
>But that would be converted to "inference" as soon as someone
>demanded justifications.
"shouted down" --> more emotional attribution
>
>
>I include the data, just so it will be reproduced here.
>
>> Here are the raw data for the citation counts for the 5 senior
>> MIT female biologists and 6 males who graduated from 1970-76.
>>
>> Males Females
>> ----------------------
>> 12830 2719
>> 11313 1690
>> 10628 1301
>> 4396 1051
>> 2133 935
>> 893
>> -----------------------
>>
>> These data are based on 12 years worth of records, from 1989-2000.
>> The above could be broken down in numerous other ways. For example, we
>> could produce citation counts per year, try to perform some kind of
>> correction for the highly specific areas the individuals publish in,
>> etc. Time series could be examined.
>
>Ahem. These alternatives seem stunningly dull, compared to
>what you should have heard about if you were paying attention.
>First, "First authorship" vs "Other."
Actually, we looked at that, but you were too busy writing ad hominems
to read the actual paper we are discussing.
>Factor out: contribution of a
>single paper. Time series may be useful, if you mean, data for
>each published piece.
>
>> However, these data are anything but a random sample. MIT is one of
>> the most selective universities in the world in terms of whom it
>> hires.
>
> - another possible bias: What was written before hire by MIT?
Again, Mr. Ulrich needs to examine the data. Notice that the
data gathering period was the last 12 years. They graduated
long before that, and whether they were or were not at MIT
would have little bearing on anything.
> - another consideration: These all look pretty good, so,
>What is "sufficient"? What level of treatment is *owed*, at a
>minimum to faculty who are the most select in the world?
>How much does it change if they are wooed by other universities,
>in the way that Nobel winners are, or that women-scientists
>have been lately?
These are important considerations, and MIT chose to obscure them
in its "Report." There are 2 Nobel prize winners in the MIT
Biology department who got their degrees within 3 years of Nancy
Hopkins, the chief complainant. A third man is on the Nobel short
list. This man had about 10 times the grant money and 10 times
the citations that Hopkins had. Whether that should be "sufficient"
to get him a slightly larger lab is an interesting question.
[Actually, although Dr. Hopkins emphasizes her impromptu lab
measurements in her personal accounts of this case, the MIT report
does not present any data on lab measurements, and actual participants
in the deliberations commented, off the record, that such measurements
proved inconclusive.] Perhaps Hopkins would argue that her market
value as a female scientist should overwhelm the huge performance
difference. But that would seemingly counter any claim that women are
victims of discrimination! If women with half the publications and
one fourth the citations are worth as much as men, simply because of
their gender, doesn't that amount to some serious anti-male
discrimination?
MIT deliberately tried to obscure the huge performance differences
between the senior male and female biologists there by disclaiming the
possibility that there might be such a difference. Camille Paglia, by
the way, spotted this right away. And she's not even a consulting
biostatistician!
>
>Can a school like MIT keep some of its faculty on board at
>low wages, etc., because of prestige? (probably).
>Do they? (I have no idea). Should they? (not often).
No evidence was ever presented in the MIT report to suggest
that these women had low wages. Reading the report would be
a worthwhile use of 20 minutes of Mr. Ulrich's time.
>
>
>[ snip, details about superior records. ]
>> Remembering that these data were gathered over a 12 year span,
>> and that they are designed ONLY to answer the question, "Is
>> it possible that the higher departmental perks and status
>> of these males is due to performance," I think your commentary
>> was ludicrous, Rich.
>
>Remembering that I never saw these data before and that I was
>pointedly picking the frayed holes of your *argument*, that
>judgment is impertinent. If these data were carefully designed,
>I should expect more qualification and justification to them;
>aren't they a crude number? - Perhaps I miss something by not
>reading the papers, but, if so, you should have pointed Gene and
>Dennis politely to the details, instead of blundering around and
>making it appear that "this one is huge" is your whole basis.
>My commentary is devoted to your presentation, here.
Here, Mr. Ulrich characterizes me as "crude" and "blundering," with
"frayed holes" in my argument.
He characterizes Gene Gallagher, who used the term "Rush Limbaugh
dittohead," as "polite."
I think that reveals much about his objectivity. [That, and his
"outlier analysis" capabilities.]
He also evinces a very fundamental (and quite scary) confusion between
"simple" and "crude."
>
>[ snip, "importance of issue" and more redundancy.]
>
>Hope that helps.
Only a little. Very little.
Now, returning to the Yankees and Tigers.
Yanks Tigers
----- ------
61 45
54 19
22 17
--------------
a. Did the Yankee starting outfield outhomer the Tigers in 1961?
(The tests, both t and permutation, aren't significant.)
I would say that they outhomered them by a huge margin.
b. [Thought experiment:] Suppose I had similar data accumulated over
10 years, and these trends held up; i.e., after 10 years, you
saw data like these.
Yanks Tigers
----- ------
610 450
540 190
220 170
--------------
Does your confidence in the relative homer capability of
the Yanks and Tigers change (from where it was in
1961)? [Most people, I believe, would
suggest that it should.] Note that the significance test result
doesn't change at all. Why not? Do you see anything paradoxical
in this? [The MIT data were gathered over 12 years.]
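The "paradox" in (b) can be checked directly: the t statistic is invariant
under multiplying every observation by a constant, so ten seasons of the
same pattern leave the test exactly where it was after one. A quick sketch:

```python
# Sketch: the t statistic is scale-invariant, so multiplying every value
# by 10 (ten seasons of the same home-run pattern) changes essentially
# nothing about the test.
from scipy import stats

one_season = ([61, 54, 22], [45, 19, 17])
ten_seasons = ([610, 540, 220], [450, 190, 170])

t1, p1 = stats.ttest_ind(*one_season)
t10, p10 = stats.ttest_ind(*ten_seasons)

# Identical up to floating-point rounding
assert abs(t1 - t10) < 1e-9 and abs(p1 - p10) < 1e-9
print(f"t = {t1:.3f} in both cases, two-sided p = {p1:.3f}")
```

The test stays non-significant no matter how many seasons of the identical
pattern accumulate, which is precisely the point of the thought experiment.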
c. [Thought experiment:] A member of the Tigers, Al Kaline, declares
that
"The home run differences are overrated. I log transformed the 10 year
data and got:
Yanks Tigers
----------------
2.78 2.65
2.73 2.28
2.34 2.23
------------------
As you can see, there really isn't much difference between the
Yanks and the Tigers. I find these data much easier to look
at than the untransformed data."
What do you think about Kaline's logic? Did he actually
have any logic? Did his analysis actually reveal anything
new about these data, or did it simply evidence his
powerful motivation to reduce a huge difference?
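For the record, "Kaline's" figures appear to be base-10 logs of the ten-year
totals (they match the quoted table to the precision shown). A quick check
also shows what the transform does and does not accomplish: it compresses the
gap visually, but it is monotone, so the rankings are untouched, and on these
particular numbers even the exact permutation p-value is identical either way.

```python
# Sketch: reproduce the log-transformed table as base-10 logs of the
# ten-year totals, and check that the exact permutation p-value is the
# same on the raw and on the logged data.
from itertools import combinations
import math

yanks = [610, 540, 220]
tigers = [450, 190, 170]
logs_y = [math.log10(h) for h in yanks]   # approx. 2.79, 2.73, 2.34
logs_t = [math.log10(h) for h in tigers]  # approx. 2.65, 2.28, 2.23

def perm_p(a, b):
    """One-sided exact permutation p-value for the group-a sum."""
    pooled, k, obs = a + b, len(a), sum(a)
    combos = list(combinations(pooled, k))
    return sum(1 for c in combos if sum(c) >= obs) / len(combos)

print([round(v, 2) for v in logs_y], [round(v, 2) for v in logs_t])
print(perm_p(yanks, tigers), perm_p(logs_y, logs_t))  # same p both ways
```

In other words, the transform changed how the numbers look, not what they say.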
Best regards,
Jim Steiger
--------------
James H. Steiger, Professor
Dept. of Psychology
University of British Columbia
Vancouver, B.C., V6T 1Z4
-------------
Note: I urge all members of this list to read
the following and inform themselves carefully
of the truth about the MIT Report on the Status
of Women Faculty.
Patricia Hausman and James Steiger Article,
"Confession Without Guilt?" :
http://www.iwf.org/news/mitfinal.pdf
Judith Kleinfeld's Article Critiquing the MIT Report:
http://www.uaf.edu/northern/mitstudy/#note9back
Original MIT Report on the Status of Women Faculty:
http://mindit.netmind.com/proxy/http://web.mit.edu/fnl/
=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=================================================================