On Fri, 28 Sep 2001, John Jackson wrote in part:
> My formula is a rearrangement of the confidence interval formula shown
> below for ascertaining the maximum error.
> E = Z(a/2) x SD/SQRT(N)
> The issue is you want to solve for N, but you have no standard
> deviation value.
Oh, but you do. In the problem you formulated, unless I
misunderstood egregiously, you are seeking to estimate the proportion of
defective (or pirated, or whatever) CDs in a universe of 10,000 CDs.
There is then a maximum value for the SD of a proportion:
SD = SQRT[p(1-p)/n]
where p is the proportion in question, n is the sample size.
This value is maximized for p = 0.5 (and it doesn't change much
between p = 0.3 and p = 0.7). If you have a guess as to the value
of p, you can get a smaller value of SD, but using p = 0.5 will
give you a conservative estimate.
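To make the arithmetic concrete, here is a minimal sketch of solving the margin-of-error formula for n, taking E as an absolute margin on p. The function name, the 95% confidence level and the 0.05 margin are illustrative assumptions, and the sketch ignores any finite-population correction for the 10,000-CD universe.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Smallest n with z * sqrt(p(1-p)/n) <= margin (normal approximation).

    `margin` is an absolute error on the proportion; p = 0.5 is the
    conservative default because it maximizes p(1-p).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Assumed inputs: +/- 0.05 on p at 95% confidence.
n_conservative = sample_size_for_proportion(0.05)      # worst case, p = 0.5
n_guess = sample_size_for_proportion(0.05, p=0.3)      # with a prior guess for p

print(n_conservative, n_guess)
```

The guess p = 0.3 lowers the required n only modestly, which is the sense in which the SD "doesn't change much" over the middle range of p.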
You then have to figure out what that "5% error" means: it might
mean "+/- 0.05 on the estimated proportion p" (but this is probably not a
useful error bound if, say, p = 0.03), or it might mean "5% of the
estimated proportion" (which would mean +/- 0.0015 if p = 0.03).
(In the latter case, E is a function of p, so the formula for n
can be solved without using a guesstimated value for p until the last
step.)
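For the relative-error reading, substituting E = r*p into z*SQRT[p(1-p)/n] = E gives n = z^2 (1-p) / (r^2 p), so p enters only at the final step. A rough sketch, in which the function name and the values r = 0.05 and p = 0.03 are assumptions chosen to match the example above:

```python
import math
from statistics import NormalDist

def sample_size_relative_error(rel_error, p, confidence=0.95):
    """n such that z * sqrt(p(1-p)/n) <= rel_error * p.

    Derived by setting E = rel_error * p in the margin-of-error formula:
    n = z^2 * (1 - p) / (rel_error^2 * p).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * (1 - p) / (rel_error**2 * p))

# Assumed inputs: error of 5% of the estimated proportion.
n_rare = sample_size_relative_error(0.05, p=0.03)
n_half = sample_size_relative_error(0.05, p=0.5)

print(n_rare, n_half)
```

Note that under this reading the required n grows as p shrinks, so p = 0.5 is no longer the conservative choice.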
Notice that throughout this analysis, you're using the normal
distribution as an approximation to the binomial b(n,p;k) distribution
that presumably "really" applies. That's probably reasonable; but the
approximation may be quite lousy if p is very close to 0 (or 1).
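One way to see how the approximation degrades is to compute, from the exact binomial probabilities, how often the usual normal-approximation interval actually contains p. This is only an illustrative check; the function name and the choices n = 100, p = 0.5 and p = 0.01 are assumptions.

```python
import math
from statistics import NormalDist

def wald_coverage(n, p, confidence=0.95):
    """Exact probability that p_hat +/- z*sqrt(p_hat(1-p_hat)/n) contains p,
    summing binomial(n, p) probabilities over every possible count k."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    covered = 0.0
    for k in range(n + 1):
        p_hat = k / n
        half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width <= p <= p_hat + half_width:
            covered += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return covered

coverage_mid = wald_coverage(100, 0.5)    # p in the middle of the range
coverage_rare = wald_coverage(100, 0.01)  # p very close to 0

print(round(coverage_mid, 3), round(coverage_rare, 3))
```

With p near 0 the nominal 95% interval covers p far less often than advertised, largely because a sample with no defectives yields an interval of zero width.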
The thing is, of course, that if there is NO pirating of the CDs, p=0,
and this is a desirable state of affairs from your clients' perspective.
So you might want to be in the business of expressing the minimum p
that you could expect to detect with, say, 80% probability, using the
sample size eventually chosen: that is, to report a power analysis.
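A very simple version of such a calculation, assuming "detect" means finding at least one pirated CD in the sample and treating the draws as independent (no finite-population correction), might look like this. The function names and the sample size n = 385 are illustrative assumptions only.

```python
def detection_probability(n, p):
    """P(at least one pirated CD among n independent draws)."""
    return 1 - (1 - p) ** n

def min_detectable_proportion(n, power=0.80):
    """Smallest p for which detection_probability(n, p) reaches `power`.

    Solves 1 - (1 - p)^n = power for p.
    """
    return 1 - (1 - power) ** (1 / n)

# Assumed sample size for illustration.
p_min = min_detectable_proportion(385)

print(f"minimum detectable p at 80% power: {p_min:.4f}")
```

A fuller power analysis would test a specific null value of p rather than merely ask for one positive finding, so this should be read as a lower bound on what the sample can reveal.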
> The formula then translates into n = ((Z(a/2)*SD)/E)^2
> Note: ^2 stands for squared.
>
> You have only the confidence interval, let's say 95% and E of 1%.
> Let's say that you want to find out how many people in the US have
> fake driver's licenses using these numbers. How large (N) must your
> sample be?
Again, you're essentially trying to estimate a proportion. (If it is
the number of instances that is of interest, the distribution is still
inherently binomial, but instead of p you're estimating np, with
SD = SQRT[np(1-p)]
and you still have to decide whether that 1% means "+/- 0.01 on the
proportion p" or "1% of the value of np".)
-- DFB.
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
184 Nashua Road, Bedford, NH 03110 603-471-7128