On Sat, 21 Dec 2002, Ronny Richardson wrote in part:

> First, in experimenting around with a 3x3 ANOVA without replication,
> I filled the cells with =RAND() to generate random numbers and I
> happened to get the following:
>
>                    SS    df                MS              F
> Rows     -2.105562167     2      -1.052781083    -1.336268078
> Columns  -0.226230909     2      -0.113115455    -0.143574551
> Error     3.151406819     4       0.787851705
> Total     0.819613743     8

Your intuition is correct:  negative results for SS are not possible.
I have heard (some folks have documentation, I gather) that EXCEL's
statistical algorithms sometimes leave something to be desired;  perhpas
this is an instance.  OTOH, it is imaginably a rounding problem, in
subtracting two numbers whose difference is near zero, so that one is
losing the most significant digits in each value in the subtraction, and
is left with an unreliable difference.  Hard to tell without the actual
numerical data being used, as you point out below.

> Unfortunately, I did not convert the random number functions to
> numbers so the specific data I used to generate this table was lost
> when the worksheet recalculated.
>
> I've repeated this experiment several times and sometimes the sums
> of squares are positive and sometimes one or more of them are
> negative. Is this a bug in Excel? I don't see how a sum of *square*
> can be negative even though I am using random numbers so all of the
> SS should be in error.

Using random values as you are, without building in systematic
differences attributable to the two factors, you rather expect the SS to
be zero.  If the SS were calculated by subtracting means from
"observed" values, squaring these, and adding them up, you'd invariably
have positive (or at least non-negative) SS.  But EXCEL was programed
blindly from a stats text, using "computational" formulae that entail
squaring the raw data, adding them up, and subtracting a value computed
as the squared sum of the raw values, divided by the number of cases.
Any computer does arithmetic in finite precision, often amounting to
only six hexadecimal digits in single precision.  If it takes more
precision than that to distinguish between the two values being
subtracted, the difference may have such poor precision that you can't
even tell its sign.  I suspect that's what's happening here.

> Second, Excel offers a two-factor ANOVA with and without
> replication. None of the examples of two-way ANOVA in any of the
> textbooks that I own shows a two-way ANOVA without replication. It
> seems counter intuitive to me to perform ANOVA with a sample size of
> one in each cell. Am I missing something here?

VERY seldom done with so few as 2 or 3 levels of the factors.  But might
be done for, say, a 4x10 design, where the 10 are randomly chosen levels
of a nuisance factor, one is really interested in whether there are
subtle differences among the 4, and one really believes there is no
interaction between the factors:  this case is often called a
"randomized blocks" (RB) design.  I used to assign such an example to
students in intermediate statistics.  The design is particularly useful
if you really believe the "no interaction" part and observations are
expensive (or otherwise difficult) to obtain.

The same logical design is often treated in textbooks for statistics in
psychology as a "repeated measures" design;  frankly, I prefer to
reserve that usage for a whole class of designs, not just a one-way
design with repeated measures on the cases observed.

> The output from with and without replication is significantly
> different as well.

That was a pun, I take it.  ("significantly")

> Using made up data, without replication, I get the following:
>
>                   SS     df      MS     F     P-value       F crit
> Rows           56.25      1   56.25    25      0.1257     161.4462
> Columns       110.25      1  110.25    49      0.0903     161.4462
> Error           2.25      1    2.25
> Total         168.75      3
>
> Using made up data, with replication, I get the following
>
>                   SS     df      MS         F    P-value     F crit
> Sample        234.38      1  234.38  130.8140     0.0000     4.3513
> Columns       693.38      1  693.38  387.0000     0.0000     4.3513
> Interaction     0.38      1    0.38    0.2093     0.6522     4.3513
> Within         35.83     20    1.79
> Total         963.96     23
>
> It makes no sense to me to call them "rows and columns" without
> replication and "sample and columns" with replication.

Yes.  I wondered about that "Sample" label when I first saw it in your
earlier post.  Evidently EXCEL does not permit one to label the sources
sensibly (that is, according to what the data actually refer to).

> It sort of makes sense not to get interaction without replication. I
> am assuming that with a sample size of one in each cell that there
> is not enough information to compute interactions.

Nope.  With n=1 per cell, what you cannot estimate is the pooled
within-cell variance (the deviation of each observation from the cell
mean is zero, since each observation IS the cell mean).  What you get is
the interaction SS, labelled "Error", because while you believe there's
no interaction, it follows (details in a standard text) that the
expected value of the interaction MS is the random error variance
(usually called sigma-squared) which you cannot estimate directly from a
pooled within-cell variance (as explained just above).

HTH.     -- DFB.
 -----------------------------------------------------------------------
 Donald F. Burrill                                            [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110                 (603) 626-0816
 [was:  184 Nashua Road, Bedford, NH 03110               (603) 471-7128]

.
.
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================

Reply via email to