On 28 Oct 2003 17:55:10 -0800, [EMAIL PROTECTED] (Scott
Edwards  Tallahassee,FL) wrote:

> Does anyone know of a sampling formula to use for a proportion when
> you have 'clusters' of dependent cases in the population?
> I have to calculate a proportion of certain criteria that are met upon
> inspection of records in an agency.  The problem is that a single
> employee may complete 10-20 records, therefore the assumption of
> independence of cases is blown.  Therefore, I can't use the old
> tried-and-true sampling formula:
> 
>           2 
>          t PQ                   n=sample size
>          ----                   t=t-value for desired confidence level
>            2                      (e.g. 1.96 for 95%)
>           d                     P=proportion with trait in population
> n=  --------------------         (if unknown, 50% most conservative)
>                 2               Q=1-P
>         1      t PQ             d=desired precision
>     1 + -- * ( -----  -  1 )     (confidence int = proportion +-d)
>         N        2              N=population size
>                 d
> 
> which is used for, like, political polls where every case is an
> independent person.
> What do I do?

You are asking for n, in order to plan a survey of a population of N,
and your formula uses the finite population correction (FPC).
You can check groups.google and see how often
I have told people that the FPC is *usually* a bad idea.
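
For reference, a minimal Python sketch of the quoted formula, written
so that the FPC step sits on its own line. The function name and the
example values (N = 2000, d = 0.05) are hypothetical placeholders, not
figures from the original question.

```python
import math

def sample_size_fpc(P, d, N, t=1.96):
    """Sample size for estimating a proportion, per the quoted formula.

    P: assumed population proportion (0.5 is the most conservative)
    d: desired precision (half-width of the confidence interval)
    N: population size
    t: critical value for the confidence level (1.96 for 95%)
    """
    Q = 1.0 - P
    n0 = t**2 * P * Q / d**2            # sample size ignoring population size
    n = n0 / (1.0 + (n0 - 1.0) / N)     # finite population correction
    return n0, n

# Hypothetical example: 2000 records, +-5 points at 95% confidence.
n0, n = sample_size_fpc(P=0.5, d=0.05, N=2000)
print(math.ceil(n0), math.ceil(n))
```

Both numbers still rest on the assumption that cases are independent,
which is exactly what is in doubt here.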

However, from what you say, I can see that you *might*  
have an application that calls for it.  On the other hand,
if you are trying to meet the requirement of state or
federal regulations, the procedures are probably 
spelled out in detail.  If you are trying to create methods
for a regulatory system, then you need more consultation
than you can get  for free by e-mail.


The first thing I would do is check whether there 
actually is *dependence*.  
You can't assume independence,
but you may be able to demonstrate it.
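
One common way to look for dependence (a suggestion added here, not
something spelled out in this thread) is to estimate the intraclass
correlation of the outcome within employees from a pilot sample. The
sketch below uses the one-way ANOVA estimator for a binary outcome and
then the usual design-effect approximation 1 + (m - 1) * rho, with m
the average number of records per employee. The pilot counts are made
up for illustration.

```python
def anova_icc(clusters):
    """One-way ANOVA estimate of the intraclass correlation.

    clusters: list of (records_inspected, records_meeting_criterion),
              one pair per employee.
    """
    k = len(clusters)
    total = sum(n for n, _ in clusters)
    p = sum(y for _, y in clusters) / total
    # Between-employee and within-employee sums of squares (0/1 outcome).
    ssb = sum(n * (y / n - p) ** 2 for n, y in clusters)
    ssw = sum(y * (1 - y / n) for n, y in clusters)
    msb = ssb / (k - 1)
    msw = ssw / (total - k)
    # Adjusted average cluster size for unequal cluster sizes.
    n_adj = (total - sum(n * n for n, _ in clusters) / total) / (k - 1)
    return (msb - msw) / (msb + (n_adj - 1) * msw)

# Hypothetical pilot data: (records, records meeting the criterion).
pilot = [(10, 9), (12, 3), (15, 14), (20, 6), (10, 10), (18, 5)]
rho = anova_icc(pilot)
m_bar = sum(n for n, _ in pilot) / len(pilot)
design_effect = 1 + (m_bar - 1) * rho   # inflation factor for n
print(round(rho, 2), round(design_effect, 1))
```

An estimate near zero is consistent with independence. A clearly
positive one suggests multiplying the simple-random-sample n by the
design effect, or sampling employees rather than records.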

If the people's responses are not independent, that
changes the *sort* of statement that you should make.
 - Is there a correlation of 'p' with N?
 - Is this <something> bad, or neutral -- that is, do you *have* to
place a limit on it, or should you seek a neutral statement
of what exists, which gives the best feel for the distribution?
(I am asserting that the mean and its CI are apt to be a thin
statement, if you are looking for understanding.)
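
A rough Python sketch of what a "neutral statement of what exists"
could look like: the spread of per-employee proportions, plus the
correlation between an employee's proportion and that employee's
number of records. Reading 'N' in the first question as records per
employee is an assumption, and the data are invented.

```python
import statistics

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: (records, records meeting the criterion) per employee.
data = [(10, 9), (12, 3), (15, 14), (20, 6), (10, 10), (18, 5)]
sizes = [n for n, _ in data]
props = [y / n for n, y in data]

r = pearson(sizes, props)
summary = (min(props), statistics.median(props), max(props))
print("correlation of p with records per employee:", round(r, 2))
print("min / median / max of per-employee p:", summary)
```

If the per-employee proportions are spread widely, as in this made-up
example, a single pooled proportion with a CI hides most of what is
going on.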

Hope this helps.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization." 
