Neal, 
I did intend to respond to this post -- you seem serious about this,
more so than "Irving."

On 13 Mar 2001 22:36:03 GMT, [EMAIL PROTECTED] (Radford Neal)
wrote:

[ snip, .... previous posts on what might be tested ]
> 
> None of you said it explicitly, because none of you made any coherent
> exposition of what should be done.  I had to infer a procedure which
> would make sense of the argument that a significance test should have
> been done.
> 
> NOW, however, you proceed to explicitly say exactly what you claim not
> to be saying:
> 
RU> >
> >I know that I was explicit in saying otherwise.  I said something
> >like,  If your data aren't good enough so you can quantify this mean
> >difference with a t-test, you probably should not be offering means as
> >evidence. 

 - This is a point that you are still missing.  
I am considering the data...  then rejecting the *data*  as lousy.

I'm *not*  drawing substantial conclusions (about the original
hypotheses) from the computed t, or going ahead with further tests.
 
RN>
> In other words, if you can't reject the null hypothesis that the
> performance of male and female faculty does not differ in some
> population from which the actual faculty were supposedly drawn, then
> you should ignore the difference in performance seen with the actual
> faculty, even though this difference would - by standard statistical
> methodology explained in any elementary statistics book - result in a
> higher standard error for the estimate of the gender effect, possibly
> undermining the claim of discrimination.

 - Hey, I'm willing to use the honest standard error -- when I have
decent numbers to compare.  But when the numbers are not *worthy*
of computing a mean, then I resist comparing means.
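
An aside, since "honest standard error" may sound vague: what I have
in mind is something like the usual Satterthwaite-style SE for a
difference of two means, sqrt(s1^2/n1 + s2^2/n2).  Here is a rough
sketch in Python -- the group values are made up for illustration,
not the actual faculty counts.

  import numpy as np

  # Made-up groups -- NOT the faculty counts from this thread, just
  # placeholders to show the arithmetic.
  men   = np.array([12000., 11000., 10500., 2500., 3000., 3192.])
  women = np.array([ 1800.,  1200.,   900., 2100., 1400., 1774.])

  def welch_se(a, b):
      """Satterthwaite/Welch SE for a difference of two sample means."""
      return np.sqrt(a.var(ddof=1)/len(a) + b.var(ddof=1)/len(b))

  diff = men.mean() - women.mean()
  se = welch_se(men, women)
  print("difference = %.0f, SE = %.0f, t = %.2f" % (diff, se, diff/se))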
 
RU> >
> > And,  Many of us statisticians find tests to be useful,
> >even when they are not wholly valid.  
>
RN> 
> It is NOT standard statistical methodology to test the significance of
> correlations between predictors in a regression setting, and to then
> pretend that these correlations are zero if you can't reject the null.

 - Again, I don't know where you get this.

Besides, on these data, we do get "We reject the null..." once JS
finally did a t-test.  But only barely, at the 5% level.

And now I complain that there is a huge gap.  It is hard to pretend
that these numbers were generated as small, independent effects that
are added up to give a distribution that is approximately normal.  

[ snip, some ] 
RN>
> So the bigger the performance differences, the less attention should
> be paid to them?  Strange...
> 
Yep, strange but true.

They would be more convincing if the gap were not there.

The t-tests (Student's/Satterthwaite) give p-values of .044 and .048
for the comparison of raw average values, 7032 versus 1529.
If we subtract off 5000 from each of the 3 large counts (over 10,000),
the t-tests have p-values of .037 and .036, comparing 4532 versus
1529.

Subtract 7000 from the three, and the p-values are hardly different,
at .043 and .040, comparing counts of 3532 versus 1529.

In my opinion, this final difference rates (perhaps) higher on the
scale of "huge differences"  than the first one:  the t-tests are
about equal, but the actual numbers (in the second set) don't confirm 
any suspicions about a bad distribution.   The first set is bad enough
that "averages"  are not very meaningful.



http://www.pitt.edu/~wpilib/index.html

