Re: stats question: binomial CI, finite population

Michael Dietze Mon, 17 Dec 2007 11:05:53 -0800

My 2 cents,

The data being modeled here is neither normally distributed, nor
binomially distributed, but rather is hypergeometric.  If you use normal
or binomial CI then for some specified level of confidence you will
always be able to generate non-sensible CI's (e.g. the probability of
observing < 20 cases in the full population of 100 is > 0 despite the
fact that you have already observed 20 cases in your sample of 50).
You can instead work with the hypergeometric directly to calculate the
full posterior density of the number of cases.  In this case it gave me
a 95% CI that the full population has 31-50 cases.  It is reassuring
that the FPC gives a very similar answer (30-50%).


Example R code:

### hypergeometric posterior
N <- 100  ## total population size
k <- 50   ## sample size (number of individuals sampled)
x <- 20   ## number of observed cases
m <- seq(0,100)  ## possible number of cases
n <- N-m         
p <- dhyper(x,m,n,k)*1/N  ## likelihood * flat prior
p <- p/sum(p)             ## normalize posterior
plot(m,p)                 ## plot posterior distribution
findInterval(c(0.025,0.5,0.975),cumsum(p))  ## calculate median & 95% CI
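
For comparison, the FPC-corrected normal approximation described in the quoted message below can be sketched in a few lines of R. This is only an illustration of that calculation, not a replacement for the hypergeometric approach; popsize, ssize and prop follow the names used in the quoted message, while cases, se_fpc, z and ci are names chosen here for illustration, and qnorm(0.975) is used in place of the rounded Z = 1.96.

```r
### normal-approximation CI with finite population correction (FPC)
popsize <- 100   ## total population size
ssize   <- 50    ## sample size
cases   <- 20    ## number of observed cases in the sample

prop   <- cases / ssize                          ## estimated proportion
se     <- sqrt(prop * (1 - prop) / ssize)        ## uncorrected standard error
fpc    <- sqrt((popsize - ssize) / (popsize - 1))  ## finite population correction
se_fpc <- se * fpc                               ## corrected standard error

z  <- qnorm(0.975)                               ## two-sided 95% normal quantile
ci <- prop + c(-1, 1) * z * se_fpc               ## approximate 95% interval

print(round(c(se = se, fpc = fpc, se_fpc = se_fpc), 3))
print(round(ci, 2))
```

The rounded values should agree with the figures quoted below (SE, FPC, corrected SE and the 95% interval), which makes it easy to rerun the comparison for other sample or population sizes.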

  -- Mike



On Mon, 2007-12-17 at 11:29 -0500, Dave Hewitt wrote:
> Sue, Phil, et al.,
> 
> The essential part of Sue's concern is that, given that she observed 50% of 
> the total population, the typical standard error for the estimated 
> proportion is too large. If you observe that much of the population, you 
> should be more "confident" in your estimate of the proportion. Indeed, 
> you're half way to not needing an estimate at all. The continuity 
> correction that Phil mentioned is relevant to confidence intervals for any 
> estimated proportion, but does not address Sue's primary concern.
> 
> The standard error (SE) for the proportion is -- 
> sqrt((prop*(1-prop))/ssize) -- where ssize is sample size (here, 50) and 
> prop is the estimated proportion (here, 0.4). SE = 0.069
> 
> The correction to SE that Sue is looking for is accomplished by multiplying 
> the standard error by the finite population correction (FPC), which reduces 
> the standard error:
> 
> FPC is -- sqrt((popsize-ssize)/(popsize-1)) -- where popsize is the 
> population size (here, 100). FPC = 0.71
> 
> The corrected standard error is SE*FPC = 0.049.
> 
> You can then get an approximate confidence interval for whatever 
> "confidence" you want by multiplying the corrected standard error by the 
> associated Z value. For a typical 95% interval, Z=1.96 and the interval is 
> (0.30, 0.50).
> 
> Using a simple continuity correction (adding -- 0.5/ssize -- to the upper 
> limit and subtracting -- 0.5/ssize -- from the lower limit) widens it a 
> touch: (0.29, 0.51). [This isn't exact, but the continuity correction does 
> little here.]
> 
> Using prop.test(20, 50, conf.level=0.95) from R in its "raw" form, which 
> includes the continuity correction but is uncorrected for the FPC, gives 
> (0.27, 0.55). I didn't notice a quick way to adjust for the FPC within 
> prop.test(), but I suspect someone has done it.
> 
> Bottom line: you gain a bit with the FPC, as you'd expect.
> 
> At 08:55 AM 12/17/2007 -0500, Phil Novack-Gottshall wrote:
> >Dear Suzanne and interested others,
> >
> >The latest best practice I've run across is the
> >Wilson method using Yates' continuity
> >correction.  It is formally described and advocated in the following 
> >articles:
> >
> >Newcombe R.G. (1998) Two-Sided Confidence
> >Intervals for the Single Proportion: Comparison
> >of Seven Methods. Statistics in Medicine 17, 857-872.
> >
> >Newcombe R.G. (1998) Interval Estimation for the
> >Difference Between Independent Proportions:
> >Comparison of Eleven Methods. Statistics in Medicine 17, 873-890.
> >
> >If you use the stats language R, it's implemented
> >using prop.test (which also allows two-sample
> >testing of equal proportions).  There's also a
> >web interface at: http://faculty.vassar.edu/lowry/prop1.html
> >
> >Good luck,
> >Phil
> >
> >At 10:37 PM 12/16/2007, Suzanne Griffin wrote:
> > >Can anyone tell me how to compute CI's for a
> > >proportion when the sample is from a finite
> > >population? For example, the population size is
> > >100, I sample 50 individuals, and the event of
> > >interest occurs in 20 cases. I want to put
> > >confidence intervals around that 0.40.
> > >
> > >I would appreciate any guidance.
> > >
> > >Sue
> > >
> > >Suzanne Griffin
> > >Wildlife Biology Program
> > >College of Forestry and Conservation
> > >University of Montana
> > >Missoula, MT 59812
> > >
> >
> >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >    Phil
> >Novack-Gottshall
> >[EMAIL PROTECTED]
> >
> >    Assistant Professor
> >    Department of Geosciences
> >    University of West Georgia
> >    Carrollton, GA 30118-3100
> >    Phone: 678-839-4061
> >    Fax: 678-839-4071
> >    http://www.westga.edu/~pnovackg
> >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Dave Hewitt
> VIMS, Gloucester Point, VA
-- 
____________________________________________________________
[EMAIL PROTECTED]              22 Divinity Ave
http://www.esm.harvard.edu           Cambridge, MA 02138
