Thanks Kjetil,
I hadn't thought of using simulation. I've done a bit of discrete
event simulation for operations research problems, but this looks
quite a bit different. To be honest, I haven't used R before, and I
simply can't follow your code. What assumptions does a simulation like
this make (similar to the assumptions that statistical models carry,
such as independence, normality, homoscedasticity, etc.)?
Scott
[EMAIL PROTECTED] wrote in message news:<[EMAIL PROTECTED]>...
> On 29 Oct 2003 at 19:49, Scott Edwards Tallahassee,FL wrote:
>
> I would attack this kind of problem with simulation; that is much
> faster than trying to get at a formula. Your problem could be phrased
> as a multilevel problem, so you could find more help on the multilevel
> list:
> [EMAIL PROTECTED]
>
> I would approach the simulation using a binomial mixed model, with the
> individuals as the groups, using R:
>
> library(nlme)
> library(MASS)
> 
> m <- 20                  # records per person
> n <- 100                 # number of persons
> a <- log(0.1/0.9)        # logit-scale intercept giving p = 0.1
> sdepsilon <- 0.2         # between-person standard deviation on the logit scale
> Y <- numeric(n*m)        # binary observation vector
> id <- factor(rep(1:n, rep(m, n)))   # person (group) identifier
> for (i in 1:n) {         # simulate m records for each person
>   epsilon <- rnorm(1, sd = sdepsilon)
>   p <- exp(a + epsilon)/(1 + exp(a + epsilon))
>   for (j in 1:m) {
>     Y[(i-1)*m + j] <- rbinom(1, 1, p)
>   }
> }
> dat <- data.frame(Y = Y, id = id)
> mod1 <- glmmPQL(fixed = Y ~ 1, random = ~ 1 | id, family = binomial,
>                 data = dat)
> intervals(mod1)
> Approximate 95% confidence intervals
>
> Fixed effects:
> lower est. upper
> (Intercept) -2.326024 -2.180668 -2.035312   # this is the interval
>                                             # for "a" which you are
>                                             # interested in
> attr(,"label")
> [1] "Fixed effects:"
>
> Random Effects:
> Level: id
> lower est. upper
> sd((Intercept)) 1.030459e-05 0.03553273 122.5254
>
> Within-group standard error:
> lower est. upper
> 0.9693968 0.9998783 1.0313182
>
>
> Note that the interval is on the logit scale; you must transform it to
> get it onto the probability scale.
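> 
> For example, a quick sketch of the back-transformation, using the
> fixed-effect interval printed above (plogis() is the inverse logit in
> base R):
> 
> ci.logit <- c(lower = -2.326024, est = -2.180668, upper = -2.035312)
> plogis(ci.logit)   # interval endpoints on the probability scale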
>
> Hope this helps,
>
> Kjetil Halvorsen
>
>
> > Rich,
> > Thanks for your response. See comments/questions interspersed below.
> > Scott
> >
> >
> > Rich Ulrich <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
> > > On 28 Oct 2003 17:55:10 -0800, [EMAIL PROTECTED] (Scott
> > > Edwards Tallahassee,FL) wrote:
> > >
> > > > Does anyone know of a sampling formula to use for a proportion when
> > > > you have 'clusters' of dependent cases in the population?
> > > > I have to estimate the proportion of records in an agency that meet
> > > > certain criteria upon inspection. The problem is that a single
> > > > employee may complete 10-20 records, so the assumption of
> > > > independence of cases is blown. Therefore, I can't use the old
> > > > tried-and-true sampling formula:
> > > >
> > > >     n = (t^2 * P * Q / d^2) / ( 1 + (1/N) * (t^2 * P * Q / d^2 - 1) )
> > > > 
> > > >     where  n = sample size
> > > >            t = t-value for the desired confidence level (e.g. 1.96 for 95%)
> > > >            P = proportion with the trait in the population
> > > >                (if unknown, 50% is most conservative)
> > > >            Q = 1 - P
> > > >            d = desired precision (confidence interval = proportion +- d)
> > > >            N = population size
> > > >
> > > > which is used for things like political polls, where every case is
> > > > an independent person.
> > > > What do I do?
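> > > > 
> > > > (Just to make the formula concrete with illustrative numbers: taking
> > > > t = 1.96, P = Q = 0.5, d = 0.05 and N = 2000 gives
> > > > t^2*P*Q/d^2 = 384.16, so n = 384.16 / (1 + 383.16/2000)
> > > > = 384.16 / 1.192 = roughly 322, or 323 after rounding up.)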
> > >
> > > You are asking for n, for the planning of a survey among N,
> > > and your formula is using Finite population correction.
> > > You can check with groups.google and see how often
> > > I have told people that FPC is *usually* a bad idea.
> >
> > I was unaware of this. I will check your former messages.
> > This appeared to be the 'standard' formula for sample size calculation
> > when you are interested in a proportion of items that pass/fail, *and
> > you have independence* (e.g. political polls), so I'm afraid that many
> > of us are making this error. However, I am definitely not tied to this
> > formula and am just looking for a method to get the job done as
> > accurately as possible.
> >
> > > However, from what you say, I can see that you *might*
> > > have an application that calls for it. On the other hand,
> > > if you are trying to meet the requirement of state or
> > > federal regulations, the procedures are probably
> > > spelled out in detail. If you are trying to create methods
> > > for a regulatory system, then you need more consultation
> > > than you can get for free by e-mail.
> >
> > Actually, I was attempting to state why I *couldn't* use this
> > formula, since it assumes independence, of which I must be very
> > suspicious.
> > Regarding your comment on regulation, this is simply a data analysis
> > problem - I'm not clear why the issue of regulation is relevant. If
> > the methodology had been laid out in a regulation, then I definitely
> > wouldn't be wasting you guys' time asking for help in formulating one.
> > The problem from a research design/analysis standpoint doesn't strike
> > me as *that* unusual. I've read many times about how analyses must be
> > adjusted for 'clusters' of data points that are not independent
> > (e.g., the effect of temperature on the performance of athletes
> > measured multiple times); I just haven't seen how to approach it from
> > a sample size determination perspective. Actually, it occurs to me
> > that perhaps I should be conceptualizing the design as a
> > repeated-measures one (each employee being a subject, with each
> > record being evaluated as a point of measurement) - would that
> > perhaps clarify how to determine the sample size? The problem, of
> > course, is
> > that I must end up with a point estimate and confidence interval for
> > the entire _agency_, rather than looking at the difference between
> > subjects, as in a typical repeated measures experiment. Still seems
> > like it could be done.
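> > 
> > (One idea I have run across, though I am not sure how cleanly it
> > applies here, is the 'design effect' from survey sampling: the sample
> > size from the simple-random-sampling formula gets inflated by roughly
> > 1 + (m - 1)*rho, where m is the number of records per employee and
> > rho is the intraclass correlation among records completed by the same
> > employee. The catch, of course, is that I have no estimate of rho.)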
> >
> > With regard to free vs. paid help, I certainly have no objection to
> > people being paid for their time, and I would have no objection to
> > pursuing that path if I had sufficient resources to pay someone
> > $100/hour. However, I posted here for two additional important
> > reasons.
> >
> > 1. I wouldn't be sure who to approach to pay - I've tried all my
> > stat friends and colleagues to no avail - and by posting it here I
> > thought that I would approach the largest possible audience.
> >
> > 2. I was under the impression that this group was for the purpose
> > of discussing interesting statistical issues/problems that have
> > applicability beyond the specific problem at hand. Perhaps I'm missing
> > something, but I'm unable to see your perspective that this problem
> > would only come up in the context of 'regulation'. On the contrary,
> > it seems to me it would come up in many instances of evaluating an
> > organization as a whole, with many individuals performing multiple
> > tasks (e.g., a factory with many employees, each making many widgets,
> > where you want to estimate the *factory-wide* proportion of defective
> > widgets -- this is the *exact* same problem that I have).
> >
> >
> > > The first thing I would do is check whether there
> > > actually is *dependence*.
> > > You can't assume independence,
> > > but you may be able to demonstrate it.
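> > > For instance (one simple check among several; the counts below are
> > > made-up placeholders), tabulate each employee's pass rate from
> > > whatever records you can get and test homogeneity, e.g. in R:
> > > 
> > > passes <- c(8, 9, 4, 10, 7)      # hypothetical passes per employee
> > > totals <- c(10, 10, 10, 10, 10)  # hypothetical records per employee
> > > prop.test(passes, totals)        # chi-squared test of equal proportions
> > > 
> > > Strong heterogeneity across employees would be evidence of
> > > dependence within employees.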
> >
> > Unfortunately, I don't have a sample of the data to estimate the
> > degree of dependence. Plus, if I wanted to go out and get such a
> > sample, I'd have to bug you guys for help on how to determine *that*
> > sample size. :)
> >
> > > If the people's responses are not independent, that
> > > changes the *sort* of statement that you should make.
> > > - Is there a correlation of 'p' with N?
> >
> > Do you mean lower case 'p' (i.e. p-value) or the P in the formula
> > above?
> >
> > > - Is this <something> bad, or neutral -- that is, do you *have* to
> > > place a limit on it, or should you seek a neutral statement
> > > of what exists, which gives the best feel for the distribution?
> > > (I am asserting that the mean and CI are apt to be a thin
> > > statement, if you are looking for understanding.)
> >
> > Not sure if I understand this part completely, but I definitely am
> > looking for a neutral, unbiased statement of my best guess of what
> > *exists* (with a confidence level and interval of course)
> >
> > > Hope this helps.
> >
> > Thanks for your time,
> >
> > Scott Edwards
.
.
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
. http://jse.stat.ncsu.edu/ .
=================================================================