Here is my solution using figures which are self-explanatory:
Sample Size Determination

pi               50%          central area   0.99
confid level     99%          2 tail area    0.01
sampling error   2%           1 tail area    0.005
z                2.58
n1               4,146.82
n                4,147

Excel function for determining z for the central interval:
NORMSINV($B$10+(1-$B$10)/2)
The algebraic formula for n was: n = π(1-π)*(z/e)^2
If you can't read the above:
n = pi(1-pi)*(z/e)^2
Let me know if this makes sense.
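For anyone who would rather check the figures above outside a spreadsheet, here is a minimal Python sketch of the same calculation. The function name sample_size and its parameter names are my own, and statistics.NormalDist().inv_cdf stands in for Excel's NORMSINV. It assumes the normal approximation and a population large enough that no finite-population correction is needed.

```python
import math
from statistics import NormalDist


def sample_size(p, confidence, error):
    """Unrounded n = p(1-p)*(z/e)^2 for estimating a proportion."""
    # z cuts off the central `confidence` area of the standard normal,
    # i.e. the same argument as NORMSINV(c + (1-c)/2).
    z = NormalDist().inv_cdf(confidence + (1 - confidence) / 2)
    return p * (1 - p) * (z / error) ** 2


n1 = sample_size(p=0.5, confidence=0.99, error=0.02)
n = math.ceil(n1)  # round up so the error bound is still met
print(f"n1 = {n1:,.2f}, n = {n:,}")
```

The unrounded value may differ from the spreadsheet in the last decimal place, depending on how precisely z is computed; the rounded-up n is the same.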
It is simply amazing to me that you can do a random sample of 4,147 people
out of 50 million and get a valid answer. What is the reason for taking
multiple samples of the same n - to achieve more accuracy? Is there a rule
of thumb on how many repetitions of the same sample you would take?
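One way to see why the size of the population hardly matters is the standard finite population correction, n = n0 / (1 + (n0 - 1)/N), where n0 is the sample size computed as if the population were infinite. The sketch below is only an illustration: the function name fpc_sample_size is made up here, n0 is recomputed from p = 0.5, 99% confidence and 2% error, and the population sizes are the ones mentioned in this thread.

```python
import math
from statistics import NormalDist


def fpc_sample_size(n0, population):
    """Adjust an infinite-population sample size n0 for a finite population."""
    return n0 / (1 + (n0 - 1) / population)


# n0 for p = 0.5, 99% confidence, 2% sampling error (no correction).
z = NormalDist().inv_cdf(0.995)
n0 = 0.5 * (1 - 0.5) * (z / 0.02) ** 2

for population in (10_000, 50_000_000, 100_000_000):
    n = math.ceil(fpc_sample_size(n0, population))
    print(f"N = {population:>11,}: n = {n:,}")
```

For a population in the tens of millions the correction is negligible, so the required n is driven by the confidence level and the error bound, not by N. Only for a small universe such as 10,000 does it shrink noticeably.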
"John Jackson" <[EMAIL PROTECTED]> wrote in message
news:s1ot7.61225$[EMAIL PROTECTED]...
> Donald - Thank you for your cogent explanation of a concept that is a bit
> hard to grasp.
> After researching it more, I determined that there is a gaping hole in my
> knowledge relating to the area of inferences on a population proportion, so
> I am admittedly somewhat in the dark and have to study up a bit.
>
> Having said that, here are some answers to the questions you posed and some
> additional comments.
>
> Instead of a warehouse full of CDs, let's work with a much larger population.
>
> Revised fact pattern:
>
> Suppose you want to estimate the % of voters who actually voted in the
> 2000 U.S. Presidential election who failed to make a choice for any
> candidate (blank ballot). Assume (forgetting about politics) that this was
> simply a matter of inadvertence, error on the part of the voter, that all
> voting machines worked properly, and that the problem manifested itself the
> same way all over the country. You want to estimate how many ballots were
> blank and be 98% confident that the error of estimate is 2% or less. So you
> have a universe of 50m voters or however many went to the polls. Assume you
> don't really know if it is 50m or 75m or 100m. You just know it's in the
> tens of millions.
>
> So you want to estimate the proportion of blank ballots, knowing that a
> huge number of people went to the polls. You mention, and I see it stated
> in some books, that when you don't know the SD and don't know the exact
> population size, other than that it is in the millions, the safest choice
> is p = .5 - that apparently is a sort of worst-case scenario, it seems. I
> have to reread my material and also revisit the binomial distribution area,
> which I have studied extensively. However, that knowledge has been pushed
> out of the way by this complex area of sampling.
>
> Anyway, if you have some further thoughts given my clarification, I would
> welcome your insights.
>
>
> "Donald Burrill" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]...
> > On Fri, 28 Sep 2001, John Jackson wrote in part:
> >
> > > My formula is a rearrangement of the confidence interval formula shown
> > > below for ascertaining the maximum error.
> > >     E = Z(a/2) x SD/SQRT N
> > > The issue is you want to solve for N, but you have no standard
> > > deviation value.
> > Oh, but you do. In the problem you formulated, unless I
> > misunderstood egregiously, you are seeking to estimate the proportion of
> > defective (or pirated, or whatever) CDs in a universe of 10,000 CDs.
> > There is then a maximum value for the SD of a proportion:
> > SD = SQRT[p(1-p)/n]
> > where p is the proportion in question, n is the sample size.
> > This value is maximized for p = 0.5 (and it doesn't change much
> > between p = 0.3 and p = 0.7 ). If you have a guess as to the value
> > of p, you can get a smaller value of SD, but using p = 0.5 will
> > give you a conservative estimate.
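A quick numerical check of the point above, as a minimal sketch: it tabulates SQRT[p(1-p)], the part of the SD that depends on p, for a few values of p. The list of p values is arbitrary and chosen only for illustration.

```python
import math

# sqrt(p(1-p)) is the p-dependent factor of SD = SQRT[p(1-p)/n].
p_values = [0.03, 0.1, 0.3, 0.5, 0.7, 0.9]
factors = {p: math.sqrt(p * (1 - p)) for p in p_values}

for p, f in factors.items():
    # Show each factor relative to its value at p = 0.5.
    ratio = f / factors[0.5]
    print(f"p = {p:4.2f}  sqrt(p(1-p)) = {f:.4f}  ratio to p=0.5: {ratio:.2f}")
```

The factor peaks at p = 0.5 and stays within roughly 10% of that peak between p = 0.3 and p = 0.7, which is why p = 0.5 serves as a conservative default.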
> > You then have to figure out what that "5% error" means: it might
> > mean "+/- 0.05 on the estimated proportion p" (but this is probably not
> > a useful error bound if, say, p = 0.03), or it might mean "5% of the
> > estimated proportion" (which would mean +/- 0.0015 if p = 0.03).
> > (In the latter case, E is a function of p, so the formula for n
> > can be solved without using a guesstimated value for p until the last
> > step.)
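To make the difference between the two readings concrete, here is a hedged sketch that computes n under each one for p = 0.03. The 95% confidence level is an assumption (the text above does not fix one), and the helper name n_for_error is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_error(p, error, confidence=0.95):
    """n = p(1-p)*(z/E)^2 for an absolute error bound E on the proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return p * (1 - p) * (z / error) ** 2


p = 0.03
n_absolute = n_for_error(p, error=0.05)      # "+/- 0.05 on p"
n_relative = n_for_error(p, error=0.05 * p)  # "5% of p", i.e. +/- 0.0015

print(math.ceil(n_absolute), math.ceil(n_relative))
```

Under the second reading E = 0.05p, so n = ((1-p)/p)*(z/0.05)^2: p enters only at the last step, and n grows rapidly as p gets small.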
> > Notice that throughout this analysis, you're using the normal
> > distribution as an approximation to the binomial b(n,p;k) distribution
> > that presumably "really" applies. That's probably reasonable; but the
> > approximation may be quite lousy if p is very close to 0 (or 1).
> > The thing is, of course, that if there is NO pirating of the CDs, p=0,
> > and this is a desirable state of affairs from your clients' perspective.
> > So you might want to be in the business of expressing the minimum p
> > that you could expect to detect with, say, 80% probability, using the
> > sample size eventually chosen: that is, to report a power analysis.
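As a rough illustration of such a power statement, the sketch below assumes the simplest possible reading of "detect": at least one pirated CD turns up in a simple random sample of n, drawn from a universe large enough that draws can be treated as independent. Both that reading and the sample size n = 100 are assumptions made here, not something stated above.

```python
def min_detectable_p(n, power=0.80):
    """Smallest p for which P(at least one hit in n draws) >= power.

    Solves 1 - (1 - p)**n = power for p, assuming independent draws.
    """
    return 1 - (1 - power) ** (1 / n)


n = 100  # hypothetical sample size, for illustration only
p_min = min_detectable_p(n)
print(f"n = {n}: minimum detectable p = {p_min:.4f}")
```

This uses the exact binomial probability of zero hits rather than the normal approximation, which matters here because p is close to 0.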
> >
> > > The formula then translates into n = ((Z(a/2)*SD)/E)^2
> > > Note: ^2 stands for squared.
> > >
> > > You have only the confidence interval, let's say 95% and E of 1%.
> > > Let's say that you want to find out how many people in the US have
> > > fake driver's licenses using these numbers. How large (N) must your
> > > sample be?
> >
> > Again, you're essentially trying to estimate a proportion. (If it is
> > the number of instances that is of interest, the distribution is still
> > inherently binomial, but instead of p you're estimating np, with
> > SD = SQRT[np(1-p)]
> > and you still have to decide whether that 1% means "+/- 0.01 on the
> > proportion p" or "1% of the value of np".)
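For the numbers in the question (95% confidence, E = 1%), here is a minimal sketch of the calculation under the first reading, "+/- 0.01 on the proportion p", using the conservative p = 0.5. It also shows the SD of the count for that n. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

p, confidence, error = 0.5, 0.95, 0.01

z = NormalDist().inv_cdf(0.5 + confidence / 2)
n = math.ceil(p * (1 - p) * (z / error) ** 2)  # sample size, rounded up

sd_proportion = math.sqrt(p * (1 - p) / n)  # SD of the sample proportion
sd_count = math.sqrt(n * p * (1 - p))       # SD of the count, SQRT[np(1-p)]

print(n, sd_proportion, sd_count)
```

The two SDs differ only by the factor n, so the choice between the two readings of "1%" still has to be made before n can be pinned down.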
> > -- DFB.
> >
> > ------------------------------------------------------------------------
> > Donald F. Burrill                                      [EMAIL PROTECTED]
> > 184 Nashua Road, Bedford, NH 03110                          603-471-7128
> >
> >
> >
> > =================================================================
> > Instructions for joining and leaving this list and remarks about
> > the problem of INAPPROPRIATE MESSAGES are available at
> > http://jse.stat.ncsu.edu/
> > =================================================================