Neal,
I did intend to respond to this post -- you seem serious about this,
more so than "Irving."
On 13 Mar 2001 22:36:03 GMT, [EMAIL PROTECTED] (Radford Neal)
wrote:
[ snip, .... previous posts on what might be tested ]
>
> None of you said it explicitly, because none of you made any coherent
> exposition of what should be done. I had to infer a procedure which
> would make sense of the argument that a significance test should have
> been done.
>
> NOW, however, you proceed to explicitly say exactly what you claim not
> to be saying:
>
RU> >
> >I know that I was explicit in saying otherwise. I said something
> >like, If your data aren't good enough so you can quantify this mean
> >difference with a t-test, you probably should not be offering means as
> >evidence.
- This is the point that you are still missing.
I am considering the data... and then rejecting the *data* as lousy.
I'm *not* drawing substantive conclusions (about the original
hypotheses) from the computed t, or going ahead with further tests.
RN>
> In other words, if you can't reject the null hypothesis that the
> performance of male and female faculty does not differ in some
> population from which the actual faculty were supposedly drawn, then
> you should ignore the difference in performance seen with the actual
> faculty, even though this difference would - by standard statistical
> methodology explained in any elementary statistics book - result in a
> higher standard error for the estimate of the gender effect, possibly
> undermining the claim of discrimination.
- Hey, I'm willing to use the honest standard error when I have
decent numbers to compare. But when the numbers are not *worthy*
of having a mean computed, I resist comparing means.
RU> >
> > And many of us statisticians find tests to be useful,
> >even when they are not wholly valid.
>
RN>
> It is NOT standard statistical methodology to test the significance of
> correlations between predictors in a regression setting, and to then
> pretend that these correlations are zero if you can't reject the null.
- Again, I don't know where you get this.
Besides, on these data we *do* reject the null, once JS finally did
a t-test -- but only barely, at the 5% level.
And now I complain that there is a huge gap in the counts. It is hard
to pretend that these numbers were generated by many small, independent
effects adding up to a distribution that is approximately normal.
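In case that "gap" complaint reads as hand-waving, here is a toy
illustration in Python. The counts below are made up -- the real ones
are not reproduced in this thread -- so only the shape of the argument
carries over: sorted values like these fall into two clusters with one
chasm between them, which is not what sums of many small independent
effects look like.

  # Made-up counts, only to illustrate the "huge gap" complaint.
  counts = sorted([12500, 11800, 10900, 3100, 2400, 1500])
  gaps = [b - a for a, b in zip(counts, counts[1:])]
  print("sorted counts:", counts)
  # One gap between neighbours dwarfs all the others.
  print("gaps between neighbours:", gaps)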
[ snip, some ]
RN>
> So the bigger the performance differences, the less attention should
> be paid to them? Strange...
>
Yep, strange but true.
The differences would be more convincing if the gap were not there.
The t-tests (Student's / Satterthwaite) give p-values of .044 and .048
for the comparison of raw average values, 7032 versus 1529.
If we subtract 5000 from each of the 3 large counts (over 10,000),
the t-tests have p-values of .037 and .036, comparing 4532 versus
1529.
Subtract 7000 from the three, and the p-values are hardly different,
at .043 and .040, comparing means of 3532 versus 1529.
In my opinion, this final difference rates (perhaps) higher on the
scale of "huge differences" than the first one: the t-tests are
about equal, but the actual numbers (in the adjusted sets) don't
confirm any suspicions about a bad distribution. The first set is
bad enough that "averages" are not very meaningful.
http://www.pitt.edu/~wpilib/index.html