Rich,
Thanks for your response.  See comments/questions interspersed below.
Scott


Rich Ulrich <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
> On 28 Oct 2003 17:55:10 -0800, [EMAIL PROTECTED] (Scott
> Edwards  Tallahassee,FL) wrote:
> 
> > Does anyone know of a sampling formula to use for a proportion when
> > you have 'clusters' of dependent cases in the population?
> > I have to calculate a proportion of certain criteria that are met upon
> > inspection of records in an agency.  The problem is that a single
> > employee may complete 10-20 records, therefore the assumption of
> > independence of cases is blown.  Therefore, I can't use the old
> > tried-and-true sampling formula:
> > 
> >     n = (t^2 PQ / d^2) / (1 + (1/N) * (t^2 PQ / d^2 - 1))
> > 
> >   where
> >     n = sample size
> >     t = t-value for desired confidence level (e.g. 1.96 for 95%)
> >     P = proportion with trait in population
> >         (if unknown, 50% is most conservative)
> >     Q = 1 - P
> >     d = desired precision (confidence int = proportion +- d)
> >     N = population size
> > 
> > which is used for, like, political polls where every case is an
> > independent person.
> > What do I do?
> 
> You are asking for n, for the planning of a survey among N, 
> and your formula is using Finite population correction.
> You can check with groups.google and see how often 
> I have told  people that FPC  is *usually*  a bad idea.   

  I was unaware of this. I will check your former messages.
This appeared to be the 'standard' formula for sample size calculation
when you are interested in a proportion of items that pass/fail, *and
you have independence* (e.g. political polls), so I'm afraid that many
of us are making this error. However, I am definitely not tied to this
formula and am just looking for a method to get the job done as
accurately as possible.
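
For reference, the formula quoted above can be written out as a short
Python sketch. The function name and the example values (P = 0.5,
d = 0.05, N = 2000) are illustrative assumptions only, not figures from
the agency in question, and the calculation still assumes independent
cases:

```python
import math


def sample_size_fpc(t, p, d, N):
    """Sample size for a proportion, with finite population correction.

    t: critical value for the confidence level (e.g. 1.96 for 95%)
    p: anticipated proportion with the trait (0.5 is most conservative)
    d: desired precision (half-width of the confidence interval)
    N: population size
    Returns (n0, n): the uncorrected and the FPC-corrected sample size.
    """
    q = 1.0 - p
    n0 = (t ** 2) * p * q / (d ** 2)   # t^2 PQ / d^2
    n = n0 / (1.0 + (n0 - 1.0) / N)    # apply the finite population correction
    return n0, n


# Hypothetical inputs, chosen only to illustrate the arithmetic.
n0, n = sample_size_fpc(t=1.96, p=0.5, d=0.05, N=2000)
print(f"uncorrected n0 = {n0:.2f}, corrected n = {n:.2f}, "
      f"rounded up = {math.ceil(n)}")
```

As N grows, the corrected n approaches n0, so the correction only
matters when the sample is a sizeable fraction of the population.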
  
> However, from what you say, I can see that you *might*  
> have an application that calls for it.  On the other hand,
> if you are trying to meet the requirement of state or
> federal regulations, the procedures are probably 
> spelled out in detail.  If you are trying to create methods
> for a regulatory system, then you need more consultation
> than you can get  for free by e-mail.

  Actually, I was trying to explain why I *couldn't* use this
formula: it assumes independence, which I have good reason to doubt
here.
Regarding your comment on regulation, this is simply a data analysis
problem - I'm not clear why the issue of regulation is relevant.  If
the methodology had been laid out in a regulation then I definitely
wouldn't be wasting you guys' time asking for help in formulating one. 
The problem from a research design/analysis standpoint doesn't strike
me as *that* unusual.  I've read many times of how analyses must be
adjusted due to 'clusters' of data points that are not independent
(e.g., the effect of temperature on the performance of athletes
measured multiple times); I just haven't seen how to approach it from
a sample size determination perspective.  Actually, it occurs to me that
perhaps I should be conceptualizing the design as one of
repeated-measures (each employee being a subject, with each record
being evaluated being a point of measurement) - would that perhaps
clarify how to determine the sample size?  The problem of course is
that I must end up with a point estimate and confidence interval for
the entire _agency_, rather than looking at the difference between
subjects, as in a typical repeated measures experiment.  Still seems
like it could be done.
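
One common way to fold clustering into a sample size calculation,
offered here only as a sketch and not as something proposed in this
thread, is to inflate the simple-random-sample size by a design effect,
deff = 1 + (m - 1) * rho, where m is the number of records per employee
and rho is the intraclass correlation among one employee's records.
The values below (m = 15, rho = 0.1) are assumed purely for
illustration; rho in particular is unknown here and would have to come
from pilot data or prior experience. The sketch also assumes
equal-sized clusters and ignores the finite population correction.

```python
import math


def design_effect(m, rho):
    """Design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (m - 1.0) * rho


def clustered_sample_size(t, p, d, m, rho):
    """Inflate the independent-case sample size by the design effect.

    t, p, d: as in the usual proportion formula (critical value,
             anticipated proportion, desired precision)
    m:       records sampled per employee (assumed equal for everyone)
    rho:     assumed intraclass correlation within an employee
    Returns (n_srs, n_records, n_employees).
    """
    n_srs = (t ** 2) * p * (1.0 - p) / (d ** 2)  # size if cases were independent
    n_records = n_srs * design_effect(m, rho)    # records needed under clustering
    n_employees = math.ceil(n_records / m)       # employees to sample, m records each
    return n_srs, n_records, n_employees


# Hypothetical inputs: 15 records per employee, rho = 0.1 (both assumed).
n_srs, n_records, n_employees = clustered_sample_size(
    t=1.96, p=0.5, d=0.05, m=15, rho=0.1
)
print(f"independent cases: {n_srs:.1f} records")
print(f"with clustering:   {n_records:.1f} records, "
      f"about {n_employees} employees")
```

With rho = 0 the design effect is 1 and the calculation reduces to the
independent-case formula; the larger rho is, the less each extra record
from the same employee contributes.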

  With regard to free vs. paid help, I certainly have no objection to
people being paid for their time, and would have no objection to
pursuing that path if I had sufficient resources to pay someone
$100/hour.  However, I posted my question here for two additional
important reasons.

    1.  I wouldn't know whom to pay - I've tried all my
stat friends and colleagues to no avail - and by posting it here I
thought that I would reach the largest possible audience.

    2.  I was under the impression that this group was for the purpose
of discussing interesting statistical issues/problems that have
applicability beyond the specific problem.  Perhaps I'm missing
something, but I'm unable to see why this problem
would only come up in the context of 'regulation'.  To the contrary,
it seems to me it would come up in many instances of evaluating
organizations as a whole, with many individuals performing multiple
tasks (e.g., a factory with many employees, each making many widgets,
where you want to estimate the *factory-wide* proportion of defective
widgets - this is the *exact* same problem that I have).

 
> The first thing I would do is check whether there 
> actually is *dependence*.  
> You can't assume independence,
> but you may be able to demonstrate it.

Unfortunately, I don't have a sample of the data to estimate the
degree of dependence.  Plus, if I wanted to go out and get such a
sample, I'd have to bug you guys for help on how to determine *that*
sample size.  :)

> If the people's responses are not independent, that 
> changes the *sort*  of statement that you should make/
>  - Is there a correlation of 'p'  with N?  

Do you mean lower case 'p' (i.e. p-value) or the P in the formula
above?

>  - Is this <something> bad, or neutral -- that is, do you *have*  to
> place a limit on it, or should you seek a neutral statement 
> of what exists, which gives the best feel for the distribution?
> (I am asserting that the mean and CI  is apt to be a thin 
> statement, if you are looking for understanding.)

Not sure if I understand this part completely, but I definitely am
looking for a neutral, unbiased statement of my best guess of what
*exists* (with a confidence level and interval, of course).

> Hope this helps.

Thanks for your time,

Scott Edwards
