Sorry for the Stat 101 lecture, but it seemed apropos... -- Don.

On Fri, 27 Sep 2002, seferiad wrote:

> I'm not sure how to do a T test for binomial distributions.

Short (and not so short) answers are embedded in the original post.

> Let's say I pull 2 sets of samples (100 each). I want to compare to
> see if they came from the same parent distribution. I take 100 and do
> some process to them, I take the other 100 and do something else to
> those.  Some of these parts in both sets fail.  So the mean
> probability of failure is u1 (for set 1) and u2 (for set 2).  u1 = n x
> p1 , u2 = n x p2, where n= 100. The variances are automatically
> different, since variance = n x p x q (and p1 and p2 are different).

Only the two sample variances are different.  The null hypothesis being
that P1 = P2 (for the population probabilities P1 and P2), there is only
one variance _under_H0_.

> Here is where I get confused. If I do the conventional t-test, what do
> I assume for the standard deviation?  To do the T-test, I can use the
> stdev from Set #1 or Set #2, or I can pool the stdev.

Pool the variances:  the pooled proportion p. = (u1 + u2)/(n1 + n2),
its variance = p. x (1-p.)/(n1 + n2).
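
A quick numerical sketch of the pooling step, using made-up counts (the
12 and 20 failures below are purely illustrative, not from the original
post):

```python
# Hypothetical failure counts for the two sets of 100 parts each
n1, n2 = 100, 100
u1, u2 = 12, 20          # observed failures in set 1 and set 2

# Pooled proportion and its variance, as defined above
p_pooled = (u1 + u2) / (n1 + n2)                  # 0.16
var_pooled = p_pooled * (1 - p_pooled) / (n1 + n2)

print(p_pooled, var_pooled)
```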

> Since I don't know which is the correct stdev to use, does it make
> sense to do Ttest for all 3 stdev's, and conclude the following:
>
> Pooled stdev will give the most likely probability of being correct,
> but that using stdev1 and stdev2 (in the denominator) will effectively
> give 2 confidence intervals that "bound" the correct answer?

No.  Only the pooled value is consistent with the hypothesis being tested.

> What is the accepted approach?

Read on...

> Also, when we do a T-test for a normal distribution, the stdev is
> divided by the square root of n (which makes sense to me).  But for the
> binomial population, the stdev we calculate is sqrt (n x p x q), which
> is already associated to a stdev taken n-samples at a time.

This std. dev. is of the number of successes (or failures, as the case may
be) observed, not of the proportion of successes (or failures).  It is
equivalent to the standard deviation of the observed variable, when the
variable of interest is a measured variable instead of the count of
successes (or failures).

> As such, it seems to me that when doing a T-test for a binomial
> distribution, we shouldn't divide the stdev by square root of n, since
> it has already been included.  Otherwise, we would be double counting.

Non sequitur.  You might equally well claim that for a measured variable
one ought not to divide by n, because the number of cases has already
contributed to the computation of mean and variance.

One way of looking at it is this:  a value of t is, like a value of z,
just a standard score:  the deviation of an observed value from its mean,
divided by the standard deviation of the sampling distribution of the
observed value:

               z (or t) = (X - mu)/sigma.

 The mean in question is the (hypothesized) population mean for the
(sampling) distribution of X, and sigma is the population standard
deviation for the same distribution.  The only difference between z and t,
in this context, is whether one has prior knowledge of the population
variance, or must of necessity estimate that from the data.

 When X is the mean of n observations of a variable, mu is the same as for
the variable itself, but the variance of X is (1/n) times the variance for
a single observation:  hence its standard error (== the standard deviation
of its sampling distribution) is sigma/(square root of n).
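
 In code, that 1/n shrinkage of the variance looks like this (sigma and
n here are arbitrary illustrative values):

```python
import math

sigma = 2.0              # hypothetical population sd of one observation
n = 25                   # sample size

var_mean = sigma**2 / n                # variance of the sample mean
se_mean = sigma / math.sqrt(n)         # its standard error: 2/5 = 0.4

print(var_mean, se_mean)
```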

 In your case, you want to test an hypothesis about the difference between
P1 and P2, which are the (population) means of u1 and u2 (think of them as
the expected number of successes (or failures) per observed case).  The
null hypothesis is  P1 = P2,  equivalent to  P1 - P2 = 0.  The population
mean of this difference (under H0) is zero;  the (population) variance of
the difference is

       P. x Q. x (1/n1 + 1/n2)

 where P. is the true proportion (equal for both groups, under H0),
estimated by the pooled proportion p. defined above, and Q. = 1-P.

Summarizing, the test statistic (with (n1 + n2 - 2) degrees of freedom)
is
    t = ((p1 - p2) - 0) / SQRT( P. x Q. x (1/n1 + 1/n2) ).

[(p1 - p2) is your estimate of (P1 - P2);
 0 is the population value of (P1 - P2) when (as under H0) P1 = P2;
 the denominator is the standard error of (p1 - p2).]
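
Putting the whole test together in a short script (again with the same
hypothetical failure counts, 12 and 20, which are not from the original
post):

```python
import math

# Hypothetical data: failures out of 100 parts in each set
n1, n2 = 100, 100
u1, u2 = 12, 20

# Sample proportions, the estimates of P1 and P2
p1, p2 = u1 / n1, u2 / n2

# Pooled proportion under H0 (P1 = P2) and its complement
p_pool = (u1 + u2) / (n1 + n2)
q_pool = 1 - p_pool

# Standard error of (p1 - p2) under H0
se_diff = math.sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))

# Test statistic with (n1 + n2 - 2) degrees of freedom
t = ((p1 - p2) - 0) / se_diff
print(t)        # about -1.54 for these counts
```

With large n1 and n2, as here, t is effectively a z score, so values
beyond about +/- 1.96 would be significant at the 5% level (two-tailed).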

> Just wanted to check, since I'm confused about this.  Is my
> interpretation correct?

Not altogether, as you will have gathered by now.  But your questions are
sensible ones, and entirely appropriate for the state of uncertainty you
describe.  (After all, from one point of view uncertainty is what this
enterprise is all about...)

 -----------------------------------------------------------------------
 Donald F. Burrill                                            [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110                 (603) 626-0816
 [Old address:  184 Nashua Road, Bedford, NH 03110       (603) 471-7128]
