One tech issue and one thinking issue, I believe.

1)   Tech:   If np _and_ n(1-p) are both > 5, the distribution of
binomial observations is considered 'close enough' to Normal.  So
'large n' is OK as a guide, but it fails when p, the probability of the
event, gets very small.

Most examples you see in the books use p = .1 or .25 or so.  Modern
industrial situations usually have p(flaw) around 0.01 or less.  Good
production will run under 0.001.  To reach the 'Normal approximation'
level with p = 0.001, you have to have n = 5000 (so that np = 5).  Not
particularly reasonable, in most cases.
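
A quick sketch of that arithmetic, in case it helps.  The function name
min_n_for_rule is my own invention, and I am assuming the rule counts
as met when np and n(1-p) reach exactly 5, as in the n = 5000 figure
above.

```python
import math
from fractions import Fraction


def min_n_for_rule(p, threshold=5):
    """Smallest n with n*p >= threshold and n*(1-p) >= threshold."""
    # Go through str() so that 0.001 becomes exactly 1/1000 rather than
    # the nearest binary float; that keeps the ceiling honest.
    p_exact = Fraction(str(p))
    if not 0 < p_exact < 1:
        raise ValueError("p must be strictly between 0 and 1")
    # The rarer of the two outcomes is the one that drives n.
    rarer = min(p_exact, 1 - p_exact)
    return math.ceil(Fraction(threshold) / rarer)


for p in (0.25, 0.1, 0.01, 0.001):
    print(f"p = {p:<6} needs n >= {min_n_for_rule(p)}")
```

The required n grows as 1/p, which is why the textbook values of p look
so much friendlier than the industrial ones.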

If you generate the distribution for the situation with np = 5 and n = 
20 or more, you will see that it is still rather 'pushed' (tech term) up 
against the left side - your eye will balk at calling it normal.  But 
that's the 'rule of thumb.'  I have worked with cases, pushing it down 
to np = 4, and even 3.  However, I wouldn't want to put 3 decimal 
precision on the calculations at that point.
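
If you want to see the 'pushed' shape for yourself, here is a minimal
sketch that builds the exact binomial probabilities for two cases with
np = 5 and prints a rough text histogram.  The helper names, the choice
of n = 20 with p = 0.25 and n = 5000 with p = 0.001, and the use of
skewness as a one-number summary of the lopsidedness are all my own
assumptions for illustration.

```python
import math


def binom_pmf(n, p):
    """Exact binomial probabilities P(X = k) for k = 0..n, with 0 < p < 1."""
    log_p, log_q = math.log(p), math.log1p(-p)
    # Work on the log scale so large n does not overflow.
    return [
        math.exp(
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * log_p + (n - k) * log_q
        )
        for k in range(n + 1)
    ]


def skewness(pmf):
    """Third standardized moment; 0 for a symmetric (e.g. Normal) shape."""
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    third = sum((k - mean) ** 3 * w for k, w in enumerate(pmf))
    return third / var ** 1.5


def text_histogram(pmf, k_max, width=50):
    """One row of '#' per count 0..k_max, scaled to the tallest bar."""
    tallest = max(pmf)
    rows = []
    for k in range(k_max + 1):
        bar = "#" * round(width * pmf[k] / tallest)
        rows.append(f"{k:3d} {pmf[k]:.4f} {bar}")
    return "\n".join(rows)


for n, p in ((20, 0.25), (5000, 0.001)):   # both have np = 5
    pmf = binom_pmf(n, p)
    print(f"n = {n}, p = {p}, skewness = {skewness(pmf):.3f}")
    print(text_histogram(pmf, k_max=15))
```

A Normal curve has skewness 0; both of these come out positive, with
the long tail on the right, which is what your eye is objecting to.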

My personal suggestion is that if you believe you have a binomial
distribution, and you need confidence intervals or other
applications of the distribution, then why not simply compute them
directly from the binomial equations?  Unless n is quite large, you
will have to adjust the limits to suit the possible observations,
anyway.  For example, if n = 10, there is no sense in computing a
3 sigma limit of 3.678 on the count (the np scale) - you can only ever
observe 3, and then 4, never anything in between.  But that's the
application level speaking here.
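
Here is a rough sketch of what I mean by computing the limit directly.
The function names are hypothetical, p = 0.1 is a made-up value for the
n = 10 case, and alpha = 0.00135 is my assumption for the one-sided tail
area a Normal '3 sigma' limit is meant to leave.

```python
import math


def binom_cdf_table(n, p):
    """Cumulative probabilities P(X <= k) for k = 0..n, with 0 < p < 1."""
    pmf = (1.0 - p) ** n              # P(X = 0)
    odds = p / (1.0 - p)
    total, table = 0.0, []
    for k in range(n + 1):
        total += pmf
        table.append(min(total, 1.0))
        pmf *= (n - k) / (k + 1) * odds   # step from P(X = k) to P(X = k + 1)
    return table


def exact_upper_limit(n, p, alpha=0.00135):
    """Smallest whole count c with P(X > c) <= alpha."""
    for c, cum in enumerate(binom_cdf_table(n, p)):
        if 1.0 - cum <= alpha:
            return c
    return n


n, p = 10, 0.1                        # hypothetical values
normal_limit = n * p + 3 * math.sqrt(n * p * (1 - p))
print(f"Normal 3 sigma limit: {normal_limit:.3f}")
print(f"exact whole-count limit: {exact_upper_limit(n, p)}")
```

The exact limit is always a count you could actually observe, so there
is nothing left to round or argue about afterwards.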

2)    I think your books are saying that, when n is very large (or I
would say, when np > 5), the distribution of a binomial measurement
will be close to Normal.  It will be discrete, of course, so it will
look like a histogram, not a continuous density curve.  But you knew
that.  I think your book is calling the binomial rv a single
measurement, and it is the collection of repeated measurements that
forms the distribution, no?  I explain a binomial measurement as: n
pieces touched/inspected, x contain the 'flaw' in question, so the
observed proportion is p = x/n.  That p is now a single measurement in
subsequent calculations.  To get a distribution of 100 proportion
values, I would have to 'touch' 100*n pieces.  I guess that's OK, if
you are paying the inspector.
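
A small simulation may make the 'single measurement' idea concrete.
This is only a sketch: the function names, the seed, and the values
n = 50 and p = 0.1 are all made up for illustration.

```python
import random


def inspect_lot(n, p, rng):
    """Touch n pieces; return x, the number found with the flaw."""
    return sum(rng.random() < p for _ in range(n))


def simulate_proportions(n, p, repeats, seed=0):
    """Repeat the n-piece inspection; each x/n is one measurement."""
    rng = random.Random(seed)
    return [inspect_lot(n, p, rng) / n for _ in range(repeats)]


n, p, repeats = 50, 0.1, 100          # hypothetical values, np = 5
proportions = simulate_proportions(n, p, repeats)
print(f"pieces touched: {repeats * n}")
print(f"first few measurements: {proportions[:5]}")
print(f"mean of the {repeats} proportions: {sum(proportions) / repeats:.3f}")
```

Each entry in the list is one binomial measurement; it is the whole
list, built from 100*n inspected pieces, that has a distribution you
can compare with the Normal.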

Clearly, one of the drawbacks of a dichotomous measurement (either OK
or not-OK) is that we have to measure a heck of a lot of them to start
getting decent results.  The better the product (fewer flaws), the
worse it gets.  See the situation for p = 0.001 above.  Eventually we
don't bother inspecting, or we automate and do 100% inspection.  So the
next paragraph had better explain the improved information you get
with a continuous measure...

Sorry, I got up on my soap box by mistake.

Is this enough explanation?

Jay

James Ankeny wrote:

>   Hello,
>     I have a question regarding the so-called normal approx. to the binomial
> distribution. According to most textbooks I have looked at (these are
> undergraduate stats books), there is some talk of how a binomial random
> variable is approximately normal for large n, and may be approximated by the
> normal distribution. My question is, are they saying that the sampling
> distribution of a binomial rv is approximately normal for large n?
> Typically, a binomial rv is not thought of as a statistic, at least in these
> books, but this is the only way that the approximation makes sense to me.
> Perhaps, the sampling distribution of a binomial rv may be normal, kind of
> like the sampling distribution of x-bar may be normal? This way, one could
> calculate a statistic from a sample, like the number of successes, and form
> a confidence interval. Please tell me if this is way off, but when they say
> that a binomial rv may be normal for large n, it seems like this would only
> be true if they were talking about a sampling distribution where repeated
> samples are selected and the number of successes calculated.
> 
> =================================================================
> Instructions for joining and leaving this list and remarks about
> the problem of INAPPROPRIATE MESSAGES are available at
>                   http://jse.stat.ncsu.edu/
> =================================================================
> 
> 
> 

-- 
Jay Warner
Principal Scientist
Warner Consulting, Inc.
4444 North Green Bay Road
Racine, WI 53404-1216
USA

Ph:     (262) 634-9100
FAX:    (262) 681-1133
email:  [EMAIL PROTECTED]
web:    http://www.a2q.com

The A2Q Method (tm) -- What do you want to improve today?



