Hi Paige,

Comments below:

"> > The data above present one half of a roughly bell shaped frequency
> > distribution. It is abundantly clear that the reduction of cell sizes
> > reduces the power of the statistics.  This fact is also supported by
those
> > graphs from regression analysis that show the standard error increases
as
> > the values of the predictor are more extreme.
>
> I didn't follow this last sentence. What graphs? What standard error?

This is pretty standard stuff. For example, on page 57 of Kleinbaum and
Kupper's book Applied Regression Analysis..., confidence bands are
graphically displayed for a regression model. The bands get wider towards
the ends of the regression slope, thus illustrating wider variation in the
extremes. The width of the confidence band is a function of the standard
error of the estimated y value at each level of the predictor variable. I
remember as a student asking Jamie Algina why this occurred but did not get
an answer, and have not heard one since. Perhaps the band gets wider when
the predictor (x) is normally distributed but not when x is uniformly
distributed?
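For what it is worth, the widening falls directly out of the textbook formula
for the standard error of the estimated mean of y at a given x0:
s * sqrt(1/n + (x0 - xbar)^2 / Sxx). The (x0 - xbar)^2 term grows as x0 moves
away from the mean of x. Here is a minimal sketch in Python (the data are made
up purely for illustration, not taken from any text):

```python
import numpy as np

# Standard error of the fitted mean response at x0 in simple linear
# regression: se(yhat0) = s * sqrt(1/n + (x0 - xbar)^2 / Sxx).
# The (x0 - xbar)^2 term is what makes the confidence band widen
# toward the extremes of x.
def se_fit(x, y, x0):
    n = len(x)
    xbar = x.mean()
    Sxx = ((x - xbar) ** 2).sum()
    b1 = ((x - xbar) * (y - y.mean())).sum() / Sxx  # slope
    b0 = y.mean() - b1 * xbar                        # intercept
    resid = y - (b0 + b1 * x)
    s = np.sqrt((resid ** 2).sum() / (n - 2))        # residual std. error
    return s * np.sqrt(1.0 / n + (x0 - xbar) ** 2 / Sxx)

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)                # a made-up predictor sample
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)  # made-up linear response
print(se_fit(x, y, 5.0))   # near the mean of x: narrower
print(se_fit(x, y, 10.0))  # at the extreme of x: wider
```

Whatever the sampling scheme for x, the band width is driven by how far x0
sits from xbar.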

>
> > All of this suggests to me that whenever there is a serious desire to infer
> > causation from correlational data, it is reasonable to seek out uniformly
> > sampled putative causes.
>
> This would be ideal, and can be done in designed studies, however many
> studies are not really "designed", the data is collected and you have to
> live with whatever sample sizes occur.

But should you get paid to infer causation from samples that are not
sufficient to warrant that inference? Again we come back to practicality
versus integrity.


>
> There was a recent discussion in one of these stat newsgroups about
> inferring causation from correlation data. I note that people fall on
> both sides of the argument, however, my position is that without subject
> matter knowledge, you cannot get to causation, you only have correlation.


You are speaking as an authority figure expressing an opinion. I have been
working on and publishing inference of causation from correlations
since 1985, and I am afraid mere opinion does not get very far. Why do you
think it is impossible, other than that you have been told by your teachers
(who apparently also stress practicality) that it is impossible?

>
>   The problem with using corresponding regressions
> > with normally distributed causes is that there is not enough information in
> > the extremes to reveal the polarization effect. We see that data degradation
> > also occurs in the simplest ANOVA designs when the factors are sampled
> > normally. This confirms the unity of the general linear model.
>
> I have no idea what polarization means, nor do I understand the term
> "factors are sampled normally". I do not understand "unity of the
> general linear model".

Forgive me, I thought perhaps you had been following the arguments on
corresponding regressions. The general linear model is a model in statistics
that integrates both the correlational and ANOVA traditions into a unified
set of calculations. I mention it because if we subscribe to the general
linear model, then the assumptions we hold for ANOVA should apply for
correlation as well. By "factors are sampled normally", I mean some idiot
goes out and purposefully collects smaller numbers of observations towards
the ends of an ANOVA factor and many towards the middle ranges of the
factor. Thus, the cell sizes of the factor will be approximately normally
distributed. We would ordinarily frown upon someone doing such a thing in a
designed experiment but think nothing of the same sort of sampling occurring
in correlational studies.
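To make that concrete, here is a toy sketch (the numbers are fabricated for
illustration, not from any study): draw a predictor from a normal
distribution, cut it into five "factor levels", and look at the resulting
cell sizes.

```python
import numpy as np

# Draw a predictor from a normal distribution and bin it into five
# "factor levels". The counts per level play the role of ANOVA cell sizes.
rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=500)
levels = np.digitize(x, bins=[-1.5, -0.5, 0.5, 1.5])  # levels 0..4
counts = np.bincount(levels, minlength=5)
print(counts)  # middle cells are full, extreme cells are sparse
```

An experimenter who deliberately laid out a design with cell sizes like these
would be questioned; the same pattern arises automatically whenever the
levels of x are obtained by sampling from a roughly normal population.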


>
> > I understand your point that the normality assumption applies to the
> > dependent variable, at least when F or t are being calculated.
>
> The normality assumption applies to the errors in the dependent
> variable, not the dependent variable itself.

Interesting.  So why do so many people prefer normally distributed
variables?

>
> > But if y
> > values in the extremes of x, have a wider dispersion and hence greater error
> > when the cell sizes are normally distributed, it would seem that uniformity
> > in the x factor would be the ideal. When we calculate the difference between
> > y means, across the levels of x, if the underlying variances are not
> > identical, then different standard errors should be assumed per mean. This
> > complicates the ANOVA design and the pooling of error variances. Think about
> > unequal variances in the t-test.
> >
> > It may be true that the linear slope calculated on y from x is legitimately
> > extrapolated across the ranges of y.  But the pattern of deviations about
> > that slope is not uniform and thus the inferences of the points along y are
> > not based on uniform parameters. I believe this is a well established fact.
> > Statistics that require more than theoretically extrapolated slopes, are
> > thus compromised by unequal cell sizes.
>
> Your argument seems to rely on assumptions you make that are not
> universally true. "The pattern of deviations about that slope is not
> uniform ..." I have many industrial examples where the pattern of
> deviations is uniform, regardless of the value of X.

So you do know what I mean above when I talk about standard errors etc!  Ok,
look at the data you mention. Then look at the data in Kleinbaum and Kupper.
Are the uniform confidence bands you see in your data derived from designs
in which the factors/predictors are sampled uniformly across the levels?


>
> > My conclusion from all of this is that where SEM users have hypotheses, they
> > would best spend the extra time and money uniformly sampling their putative
> > causes, so as to better represent the causal model empirically.
>
> Well, now you drag in SEM ... you are really stretching to make a point,
> aren't you? SEM is often done on data that is collected based upon
> historical studies, where uniform sampling simply isn't possible. What
> is your point?

You seem to believe you have a superior intellect and training.  What has
really happened is that I am using your insatiable need to show off your
knowledge of statistics to illustrate a point. That point is that unequal
cell sizes create differences in the precision of estimates. Using unequal
cell sizes, as is the habit of correlational and SEM people of practical
inclination, builds serious problems into the statistics. Because of the
near mystical attachment of people to normal distributions, not only have
many statistical studies been compromised, but future developments in causal
inference are being obstructed. So my wonderful colleague, I will drag the
truth into any conversation I enter, without hesitation or apology.
Dragging is what one must do when dealing with reticent "professionals" who
collude in incompetence and ignorance.

>
> > Do you agree?
>
> I don't agree, I don't disagree; to put it simply, I don't follow
> your argument.


Would you create an ANOVA design with cell sizes that are normally
distributed across the levels of your factors? If so, why?

Best,

Bill


>
> --
> Paige Miller
> [EMAIL PROTECTED]
> http://www.kodak.com
>
> "It's nothing until I call it!" -- Bill Klem, NL Umpire
> "When you get the choice to sit it out or dance, I hope you dance" --
> Lee Ann Womack
>



=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================
