Jeremy Bauer wrote:
>
> Good Day,
>
> Out of a population of 220 children, I need to randomly select 10 for use in
> a pilot experiment. The selection of 10 subjects is based on financial
> reasons, not on any power calculations. My question is, how do I make sure
> that the subsample of 10 subjects represents my population of 220?
>
> The question seems basic at face value, but I'm just not comfortable with
> any solutions. It is important that the subsample have similar age, height
> & weight. Do I just run one t-test for each of the 3 dependent variables?
> While the sample sizes are unequal, am I correct in saying that unequal
> sample sizes are "ok" as long as the variances are equal?
(1) The t test in question is a one-sample test: the other set is your
population, whose mean is known.
(2) However, the logic of using it as you are doing is wrong. Firstly,
your null hypothesis is that the group is a random sample from the
population; if you know that it is, there is no point testing this.
You hope, I think, to use a t test to test whether you have, by
mischance, drawn an "unlucky" sample. The trouble here is that the
criterion for a poor fit is based not on how poor the fit is, but on the
probability of getting such a poor fit by chance. As your sample size
increases, your chance of detecting a certain difference increases. Any
resemblance to a "sensible" cutoff point is a pure coincidence.
Finally, testing at p=.05, you will only reject the worst 5% of samples
in each category.
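For what it's worth, the mechanics of that one-sample test are simple. Here is a minimal sketch in Python; the ages are made up purely for illustration, and since the standard library has no t distribution, it compares the statistic to a tabled critical value instead of computing a p-value:

```python
import math
import statistics

def one_sample_t(sample, pop_mean):
    """t statistic for testing whether `sample` is consistent with
    a population whose mean is known to be `pop_mean`."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)            # sample standard deviation
    return (xbar - pop_mean) / (s / math.sqrt(n))

# Hypothetical ages (years) for a drawn sample of 10 children,
# tested against a known population mean age of 9.0:
ages = [8.1, 9.4, 10.2, 8.7, 9.9, 7.8, 9.1, 10.5, 8.9, 9.6]
t = one_sample_t(ages, 9.0)
# Compare |t| against the two-sided critical value for df = 9 at
# alpha = .05 (about 2.262); |t| well below that means this sample's
# ages are unremarkable relative to the population mean.
```

The same function applies unchanged to height and weight, which is exactly the "one t test per variable" plan criticized above.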
(3) The usual solution here would just be to go ahead, draw your
sample, and assume that randomization will work its usual magic. Better
than 85% of the time (that's .95 cubed, about .857) that sample would
pass your three tests anyway. Better than that, in fact, because the
three criteria are correlated, so a sample with representative weight
will probably have representative height and age.
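A toy version of that arithmetic, with a synthetic population (the uniform age range 6-12 is pure invention, just to make it concrete):

```python
import random
import statistics

random.seed(1)
# Synthetic population of 220 children's ages -- an assumption for
# illustration; the real distribution of ages is unknown.
population = [random.uniform(6, 12) for _ in range(220)]
pop_mean = statistics.mean(population)

# One simple random sample of 10; its mean is usually close:
sample = random.sample(population, 10)
error = statistics.mean(sample) - pop_mean

# If the three criteria were independent, the chance of passing all
# three tests at alpha = .05 would be
pass_rate = 0.95 ** 3          # about .857
```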
There *is* the possibility of using a non-random "representative
sample" or "stratified sample". This is usually a dangerous and
difficult procedure. Here, however, with a well-defined set of
properties to emulate in the sample, and a well-defined sampling frame,
something might be done. One possibility that springs to mind would be
drawing (say) 10 or 100 simple random samples, then choosing the one
which best fits the population. However, standard inference techniques
might well be invalidated by such a nonrandom (and nonstandard) sampling
technique and you would need expert advice throughout.
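A sketch of that best-of-k idea follows. The standardized-distance criterion used to score samples is one arbitrary choice among many, and the warning above stands: standard inference on the chosen sample may no longer be valid.

```python
import random
import statistics

def best_of_k(records, k=100, seed=0):
    """Draw k simple random samples of 10 from `records` (a list of
    (age, height, weight) tuples) and return the one whose three
    means lie closest, in standardized units, to the population
    means.  A nonrandom, nonstandard sampling scheme."""
    rng = random.Random(seed)
    cols = list(zip(*records))
    pop_means = [statistics.mean(c) for c in cols]
    pop_sds = [statistics.stdev(c) for c in cols]

    def discrepancy(sample):
        # Sum over the three variables of |sample mean - pop mean| / SD
        return sum(abs(statistics.mean(c) - m) / s
                   for c, m, s in zip(zip(*sample), pop_means, pop_sds))

    candidates = [rng.sample(records, 10) for _ in range(k)]
    return min(candidates, key=discrepancy)
```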
(4) A better solution might be to analyze whatever it is (_why_ does
everybody who posts to EDSTAT-L asking for advice have to be so
secretive about their goals? Is this like going to the doctor with a
story about "a friend who thinks he might have VD"?) in a way that takes
account of dependence on height, age, and weight. (But see below.)
(5) Finally, you'd better do those power calculations... 10 is a
dangerously low sample size. (You're concerned that a sample of size 10
will have unrepresentative heights, ages, and weights - why should you
trust it to be any more representative for whatever you're studying?)
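To put a number on that warning, here is a rough power calculation using the normal approximation (an exact noncentral-t calculation would come out a little lower still at n = 10):

```python
from statistics import NormalDist

def approx_power(effect_size, n, alpha=0.05):
    """Approximate two-sided power of a one-sample test with n
    subjects and standardized effect size `effect_size`, via the
    normal approximation."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)          # two-sided critical value
    shift = effect_size * n ** 0.5
    return nd.cdf(shift - z) + nd.cdf(-shift - z)

# Even a large standardized effect (0.8 SD) gives only modest
# power with 10 subjects:
power_10 = round(approx_power(0.8, 10), 2)   # 0.72
```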
-Robert Dawson
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
http://jse.stat.ncsu.edu/
=================================================================