Paige,

The person also sent me the message below and gave me permission to
forward it:


"The table below summarizes a
simple example I prepared for one of my classes. Assume two normally
distributed populations, each with a standard deviation of 10 and a
difference in means of 10. With a sample size of 10 from each
population, you have 56.2 per cent power to reject the null hypothesis
of equality of the means using an alpha level of .05, and 29.4 per cent
power using an alpha level of .01. If, instead, you use a sample size of
11 from the first population and a sample size of 9 from the second, the
power values drop only slightly, to 55.8 and 29.0 per cent. Keeping the
total sample size constant at 20, notice how the increasing inequality
of the sample sizes affects the power. In the worst case shown, sample
sizes of 19 and 1, power drops to 15.2 and 4.6 per cent. The same
phenomenon occurs in more complicated designs with more groups.

       N1   N2   POWER05   POWER01

       10   10     56.2      29.4
       11    9     55.8      29.0
       13    7     52.3      26.2
       15    5     45.0      20.8
       16    4     39.5      17.2
       17    3     32.7      13.1
       18    2     24.6       8.8
       19    1     15.2       4.6"


Bill here:
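
For reference, here is a rough sketch in Python (SciPy) that should
reproduce those power figures, assuming they come from the usual
noncentral-t power calculation for a pooled two-sample t-test. The quoted
numbers may have been produced by different software, so small rounding
differences are possible, and the function name power_two_sample is just
mine:

import numpy as np
from scipy import stats

def power_two_sample(n1, n2, delta=10.0, sd=10.0, alpha=0.05):
    """Power of the two-sided pooled-variance t-test for a true mean
    difference `delta` when both populations have standard deviation `sd`."""
    df = n1 + n2 - 2
    ncp = delta / (sd * np.sqrt(1.0 / n1 + 1.0 / n2))   # noncentrality parameter
    tcrit = stats.t.ppf(1.0 - alpha / 2.0, df)          # two-sided critical value
    # probability that the noncentral t statistic lands in the rejection region
    return stats.nct.sf(tcrit, df, ncp) + stats.nct.cdf(-tcrit, df, ncp)

for n1, n2 in [(10, 10), (11, 9), (13, 7), (15, 5),
               (16, 4), (17, 3), (18, 2), (19, 1)]:
    print(n1, n2,
          round(100 * power_two_sample(n1, n2, alpha=0.05), 1),
          round(100 * power_two_sample(n1, n2, alpha=0.01), 1))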

The data above present one half of a roughly bell-shaped frequency
distribution. It is abundantly clear that shrinking one of the cells
reduces the power of the test, even when the total sample size is held
constant. This is also supported by those regression plots that show the
standard error of the fitted line increasing as the values of the
predictor become more extreme.
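
To make that last point concrete: in simple regression the standard error
of the fitted mean at a new value x0 is s * sqrt(1/n + (x0 - xbar)^2 / Sxx),
so it is smallest near the mean of x and widens toward the extremes. A
minimal sketch with simulated data (the numbers are made up purely for
illustration):

import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(0.0, 1.0, n)                 # normally sampled predictor
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, n)

# ordinary least squares fit of y on x (intercept plus slope)
X = np.column_stack([np.ones(n), x])
beta, ssr, *_ = np.linalg.lstsq(X, y, rcond=None)
s2 = ssr[0] / (n - 2)                       # residual variance estimate
sxx = np.sum((x - x.mean()) ** 2)

# SE of the fitted mean widens as x0 moves away from the mean of x
for x0 in (0.0, 1.0, 2.0, 3.0):
    se = np.sqrt(s2 * (1.0 / n + (x0 - x.mean()) ** 2 / sxx))
    print(f"x0 = {x0:3.1f}   SE of fitted mean = {se:.3f}")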

All of this suggests to me that whenever there is a serious desire to
infer causation from correlational data, it is reasonable to seek out
uniformly sampled putative causes. The problem with running the
corresponding regressions on normally distributed causes is that there
is not enough information in the extremes to reveal the polarization
effect. We see that the same data degradation occurs in the simplest
ANOVA designs when the factors are sampled normally, which is consistent
with the unity of the general linear model.
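
A rough simulation sketch of that point (the numbers are simulated, and
only the direction of the comparison matters): over the same nominal
range of the putative cause, a uniform design places more observations in
the extremes than a normal one and therefore estimates the slope more
precisely.

import numpy as np

rng = np.random.default_rng(1)
n, sigma, reps, true_slope = 40, 1.0, 5000, 1.0

def slope_se(draw_x):
    """Monte Carlo standard deviation of the OLS slope estimate."""
    estimates = []
    for _ in range(reps):
        x = draw_x()
        y = true_slope * x + rng.normal(0.0, sigma, n)
        estimates.append(np.polyfit(x, y, 1)[0])   # fitted slope
    return np.std(estimates)

# same nominal range for the cause (about -3 to 3), two sampling schemes
uniform_x = lambda: rng.uniform(-3.0, 3.0, n)
normal_x = lambda: np.clip(rng.normal(0.0, 1.0, n), -3.0, 3.0)

print("SE of slope, uniformly sampled x:", round(slope_se(uniform_x), 4))
print("SE of slope, normally sampled x: ", round(slope_se(normal_x), 4))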

I understand your point that the normality assumption applies to the
dependent variable, at least when F or t statistics are being calculated.
But if the y values at the extremes of x have a wider dispersion, and
hence greater error, when the cell sizes follow a normal distribution, it
would seem that uniformity in the x factor would be the ideal. When we
calculate differences between y means across the levels of x, if the
underlying variances are not identical, then a different standard error
should be assumed for each mean. This complicates the ANOVA design and
the pooling of error variances. Think about unequal variances in the
t-test.
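
On that last point, Welch's unequal-variance t-test is the standard way
to give each mean its own standard error rather than pooling. A small
sketch with made-up numbers (the group sizes and spreads here are
arbitrary):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
g1 = rng.normal(0.0, 10.0, 15)    # larger cell, smaller spread
g2 = rng.normal(10.0, 20.0, 5)    # smaller cell, larger spread

# pooled-variance ("Student") t-test assumes one common error variance
print(stats.ttest_ind(g1, g2, equal_var=True))
# Welch's t-test uses separate variances and adjusts the degrees of freedom
print(stats.ttest_ind(g1, g2, equal_var=False))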

It may be true that the linear slope calculated for y on x can
legitimately be extrapolated across the range of the data. But the
pattern of deviations about that slope is not uniform, and thus the
inferences about points along y are not based on uniform parameters. I
believe this is a well-established fact. Statistics that require more
than theoretically extrapolated slopes are thus compromised by unequal
cell sizes.

My conclusion from all of this is that where SEM users have hypotheses,
they would do best to spend the extra time and money sampling their
putative causes uniformly, so as to better represent the causal model
empirically.

Do you agree?

Bill
