One tech issue, one thinking issue, I believe.
1) Tech: if np _and_ n(1-p) are both > 5, the distribution of binomial
observations is considered 'close enough' to Normal. So 'large n' is
OK, but the approximation fails when p, the p(event), gets very small.
Most examples you see in the books use p = .1 or .25 or so. Modern
industrial situations usually have p(flaw) around 0.01 and less. Good
production will run under 0.001. To reach the 'Normal approximation'
level with p = 0.001, you have to have n = 5000. Not particularly
reasonable, in most cases.
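
A minimal sketch of that arithmetic, for anyone who wants to try other values of p. The function name and the choice to read the rule as 'np and n(1-p) at least 5' are my assumptions, not anything out of a textbook.

```python
import math
from fractions import Fraction

def min_n_for_normal_approx(p, threshold=5):
    """Smallest n with n*p >= threshold and n*(1-p) >= threshold.

    Assumption: the rule of thumb is read as 'at least 5', which is how
    p = 0.001 leads to n = 5000.
    """
    # Work in exact fractions so 5 / 0.001 cannot pick up float round-off.
    p_exact = Fraction(str(p))
    rarer = min(p_exact, 1 - p_exact)   # the rarer outcome sets the limit
    return math.ceil(Fraction(threshold) / rarer)

for p in (0.25, 0.1, 0.01, 0.001):
    print(f"p = {p:<6} needs n >= {min_n_for_normal_approx(p)}")
```

The textbook values of p get away with a few dozen pieces; the industrial ones do not.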
If you generate the distribution for the situation with np = 5 and n =
20 or more, you will see that it is still rather 'pushed' (tech term) up
against the left side - your eye will balk at calling it normal. But
that's the 'rule of thumb.' I have worked with cases, pushing it down
to np = 4, and even 3. However, I wouldn't want to put 3 decimal
precision on the calculations at that point.
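
If you want to see the 'pushed' shape for yourself, here is a rough sketch for n = 20, p = 0.25 (so np = 5). It prints the exact binomial probabilities as a crude text histogram next to the Normal approximation of the cumulative probability. The half-unit continuity correction and the helper names are my choices for illustration.

```python
import math

def binom_pmf(k, n, p):
    """Exact P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_cdf(x, mu, sigma):
    """P(Z <= x) for a Normal with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

n, p = 20, 0.25                      # np = 5, right at the rule of thumb
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
skew = (1 - 2 * p) / sigma           # binomial skewness; 0 would be symmetric

print(f"mean = {mu:.2f}, sd = {sigma:.3f}, skewness = {skew:.3f}")
cumulative = 0.0
for k in range(13):
    pk = binom_pmf(k, n, p)
    cumulative += pk
    # Normal approximation to P(X <= k), with a continuity correction of 0.5
    approx = normal_cdf(k + 0.5, mu, sigma)
    bar = "#" * round(pk * 100)
    print(f"{k:2d} exact={cumulative:.4f} normal={approx:.4f} {bar}")
```

The positive skewness is the numerical version of what your eye sees: a short left side and a longer tail to the right.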
My personal suggestion is that if you believe you have a binomial
distribution, and you need the confidence intervals or other
applications of the distribution, then why not simply compute them out
with the binomial equations? Unless n is quite large, you will have to
adjust the limits to suit the potential observations, anyway. For
example, if n = 10, there is no sense in computing a 3 sigma limit of
3.678 on the count - you will never observe anything finer than 3, and
then 4. But that's the application level speaking here.
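
One way to 'compute them out' directly is sketched below. It uses the Clopper-Pearson construction, which is a common exact interval but only one of several, and it is my pick for illustration. The interval ends are found by plain bisection on the binomial cumulative probability; the function names are made up.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_decreasing(f, target, lo=0.0, hi=1.0, iters=100):
    """Bisection: find p in [lo, hi] with f(p) = target, f decreasing in p."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid        # f is still too large, so p must be larger
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(x, n, alpha=0.05):
    """Clopper-Pearson interval for p, given x flawed pieces out of n."""
    # Lower end: the p at which P(X >= x) = alpha/2, i.e. P(X <= x-1) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper end: the p at which P(X <= x) = alpha/2.
    upper = 1.0 if x == n else solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

for x in (0, 1, 3):
    lo, hi = exact_interval(x, 10)
    print(f"x = {x} of n = 10: {lo:.4f} <= p <= {hi:.4f}")
```

With n = 10 the intervals come out wide and lopsided, which is the honest answer for that little data.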
2) I think your books are saying that, when n is very large (or I
would say, when np > 5), the binomial measurement will fit a Normal dist.
It will be discrete, of course, so it will look like a histogram, not a
continuous density curve. But you knew that. I think your book is
calling the binomial rv a single measurement, and it is the collection
of repeated measurements that forms the distribution, no? I explain a
binomial measurement as: n pieces touched/inspected, x contain the
'flaw' in question, so p = x/n. p is now a single measurement in
subsequent calculations. To get a distribution of 100 proportion
values, I would have to 'touch' 100*n pieces. I guess that's OK, if you
are paying the inspector.
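
Here is a small simulation of that idea, in case it helps to watch it happen. The values n = 50 and p = 0.1 are hypothetical, picked only so that np = 5, and the seed is fixed so the run repeats.

```python
import random
from collections import Counter

def inspect(n, p, rng):
    """One binomial measurement: touch n pieces, return the count of flawed ones."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(1)            # fixed seed, for a repeatable run
n, p, repeats = 50, 0.1, 100      # hypothetical values; np = 5

flaw_counts = [inspect(n, p, rng) for _ in range(repeats)]
p_hats = [x / n for x in flaw_counts]      # each x/n is ONE measurement

pieces_touched = repeats * n
mean_p = sum(p_hats) / repeats
print(f"pieces touched: {pieces_touched}, mean of the p values: {mean_p:.4f}")

# Crude text histogram of the 100 measurements
tally = Counter(flaw_counts)
for x in range(min(tally), max(tally) + 1):
    print(f"p = {x / n:.2f} {'#' * tally[x]}")
```

Each row of the histogram is a value of x/n that a single inspection of n pieces can produce; the shape only shows up after all 100 inspections.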
Clearly, one of the drawbacks of a dichotomous measurement (either OK
or not-OK) is that we have to measure a heck of a lot of them to start
getting decent results. The better the product (fewer flaws), the worse
it gets. See the situation for p = 0.001 above. Eventually we don't
bother inspecting, or we automate and do 100% inspection. So the next
paragraph had better explain the improved information you get with a
continuous measure...
Sorry, I got up on my soap box by mistake.
Is this enough explanation?
Jay
James Ankeny wrote:
> Hello,
> I have a question regarding the so-called normal approx. to the binomial
> distribution. According to most textbooks I have looked at (these are
> undergraduate stats books), there is some talk of how a binomial random
> variable is approximately normal for large n, and may be approximated by the
> normal distribution. My question is, are they saying that the sampling
> distribution of a binomial rv is approximately normal for large n?
> Typically, a binomial rv is not thought of as a statistic, at least in these
> books, but this is the only way that the approximation makes sense to me.
> Perhaps, the sampling distribution of a binomial rv may be normal, kind of
> like the sampling distribution of x-bar may be normal? This way, one could
> calculate a statistic from a sample, like the number of successes, and form
> a confidence interval. Please tell me if this is way off, but when they say
> that a binomial rv may be normal for large n, it seems like this would only
> be true if they were talking about a sampling distribution where repeated
> samples are selected and the number of successes calculated.
> =================================================================
> Instructions for joining and leaving this list and remarks about
> the problem of INAPPROPRIATE MESSAGES are available at
> http://jse.stat.ncsu.edu/
> =================================================================
--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
4444 North Green Bay Road
Racine, WI 53404-1216
USA
Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com
The A2Q Method (tm) -- What do you want to improve today?