On Thu, 16 Jan 2003 20:35:02 GMT, Jerry Dallal
<[EMAIL PROTECTED]> wrote:
> Dennis Roberts wrote:
>
> > Comments appreciated.
>
> I think that what you and Rich are struggling with is that there is
> a difference between an expected length and a given probability of
> not exceeding some length. If it's not, it's still an issue.
After trying to figure what to post next, I have finally
concluded that nobody else knows how to figure that, either.
(a) I don't have a textbook reference on how to figure
the CI of a standard deviation;
(b) I don't remember where and when I learned it; and
(c) most of you folks don't know what I was talking about,
because you have never walked through the basic problem.
Here is an elementary lesson.
==STATEMENTS CONCERNING A POPULATION VARIANCE.
Given: Population assumed to be (more or less) 'normal',
with sample variance V estimated with n=16, 15 d.f.
Someone is interested in computing the related
quantities in a future sample: V;
S= standard deviation; SE= standard error.
From the tabled distribution for chisquared, 15 d.f.:
  upper-tail prob.     X^2
       0.975           6.27
       0.50           14.34
       0.20           19.31
       0.10           22.31
       0.05           25.00
       0.025          27.49
Recall: The sample variance, taken as a ratio to the
population variance, is distributed as
chisquared/d.f. Therefore, one can estimate a
95% confidence interval for the population variance,
by preserving the respective chisquared relationships
(as proportions).
LowerLimit < Expected value < Upper Limit
6.27/15 < 15.0/15 < 27.49/15
or 0.42 < 1 < 1.83
When we set the observed variance to the Expected value,
we observe that the sample of n=16 yields a fairly
wide relative range for the population estimate of
the variance, (0.42, 1.83)- times- the observed.
If you want the one-tailed CI like I do, you can use
the one-tailed limit, which is a relative 25/15
or 1.67 (still 95%).
Note that if we started with a known (or assumed)
population variance, that same variability would
also belong to any new, small sample of 16.
That is, the chisquared could give us a probability
statement to extrapolate from the known to a sample.
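
For anyone who would rather compute than look up tables, here is a
minimal sketch in Python (standard library only) that should reproduce
the chisquared ratios above. The function names chi2_cdf and
chi2_quantile are my own, made up for this illustration; the quantile
is found by plain bisection on a series for the incomplete gamma
function, which is an assumption of convenience and not the way any
particular package does it.

```python
import math

def chi2_cdf(x, df):
    """Lower-tail chi-squared probability, from the power series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    # All terms are positive, so the sum has no cancellation problems.
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_quantile(p, df):
    """Value x with P(chi-squared <= x) = p, found by bisection."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

df = 15
q_lo = chi2_quantile(0.025, df)   # table entry for upper tail 0.975
q_hi = chi2_quantile(0.975, df)   # table entry for upper tail 0.025
q_95 = chi2_quantile(0.95, df)    # table entry for upper tail 0.05

# Ratios of a sample variance to the population variance (X^2 / d.f.).
ratio_lo, ratio_hi, ratio_95 = q_lo / df, q_hi / df, q_95 / df

# Reciprocals (d.f. / X^2): population variance relative to an observed
# sample variance, the form used in the usual textbook interval.
sigma2_lo, sigma2_hi = df / q_hi, df / q_lo

print(f"two-sided ratios: {ratio_lo:.2f} to {ratio_hi:.2f}")
print(f"one-sided ratio:  {ratio_95:.2f}")
print(f"reciprocal form:  {sigma2_lo:.2f} to {sigma2_hi:.2f}")
```

The ratios X^2/d.f. describe how a sample variance scatters around a
given population variance. Going the other way, from an observed
variance to the population value, the usual textbook interval uses the
reciprocals d.f./X^2, which the last line prints (roughly 0.55 to 2.4
for 15 d.f.).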
==STATEMENT ABOUT ANOTHER SAMPLE.
We can use a familiar method in order to extrapolate
a confidence limit on the variance, from one sample
to another sample. Namely, we consider the test that
would show them to be different (which is what we do
with the t-test and the CI for the mean), and then
'invert' the test -- describe it in terms of its limits.
The familiar F-test is the ratio of (X^2/df)/ (X^2/df).
So the table of F can be used to see what variance
would be sufficiently different to be "unlikely" at
the 5% level, or whatever.
For a 5% test, one-tailed, with 15 d.f. for the
smaller variance (the denominator), and numerator
d.f. of 15, 75 and infinite:
  numerator d.f.    cutoff-ratio (F)
        15               2.41
        75               2.15
       inf.              2.07
Either 75 or 'infinite' degrees of freedom allow the
to-be-observed variance to be about 2.1 times the
15-df variance that is in hand.
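
The F cutoffs can be checked the same way. What follows is only a
sketch, in Python with the standard library: it integrates the F
density by Simpson's rule and finds the 95% point by bisection. The
names f_cdf and f_quantile are invented for this example, the
numerator d.f. must be greater than 2 (so that the density is zero
at zero), and 'infinite' numerator d.f. is stood in for by one
million, which is my assumption, not an exact limit.

```python
import math

def f_cdf(x, d1, d2, n=1000):
    """P(F <= x) for an F(d1, d2) variable, by Simpson's rule on the
    density. Assumes d1 > 2, so the density is 0 at x = 0."""
    if x <= 0.0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))

    def pdf(t):
        if t <= 0.0:
            return 0.0
        log_pdf = (0.5 * (d1 * math.log(d1 * t) + d2 * math.log(d2)
                          - (d1 + d2) * math.log(d1 * t + d2))
                   - math.log(t) - log_beta)
        return math.exp(log_pdf)

    h = x / n                       # n must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3.0

def f_quantile(p, d1, d2, hi=20.0):
    """Value x with P(F <= x) = p, by bisection on [0, hi]."""
    lo = 0.0
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Denominator: the 15 d.f. variance in hand. Numerator: the future sample.
cut_15 = f_quantile(0.95, 15, 15)
cut_75 = f_quantile(0.95, 75, 15)
cut_big = f_quantile(0.95, 10**6, 15)   # stand-in for infinite d.f.

for label, cut in (("15", cut_15), ("75", cut_75), ("inf.", cut_big)):
    print(f"numerator d.f. {label:>4}: cutoff ratio {cut:.2f}")
```

The printed cutoffs should agree with the tabled values to within
rounding in the second decimal place.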
I wish to spell out the consequences more fully.
This computation has made no assumptions about any
underlying differences or 'effects'. (That is one
thing that a 'power' calculation usually does; so this
is not a 'power' statement in that usual fashion.)
It is saying that if one wants 95% confidence about
a subsequent variance -- in the manner that one places
a confidence limit -- then one has to conclude that
the variance might be as much as 2.1 times what was
observed for that 15 d.f. sample.
==WHEN DOES EXTRAPOLATION BECOME POWER?
Consider: a power statement is not unrelated to the
ideas of CIs. I'm not sure, but I think that Cohen's models
use the CI around the hypothesized value, making use of the
'alternate' distribution, but that they are all CIs. Well,
if you don't want to call the following a 'power' exercise,
I don't really mind, but it seems to me that the relationship
is more a help than a harm.
If we extend the algebra in the example above so that
it refers to a SE, which depends on a new N, I think
it is effectively a power statement -- in some fashion.
The midpoint in the problem that Dennis describes is
the computation of the SE. He gets fancier yet by
requiring use of the t-limits, with whatever-d.f.,
to set up the 'measurement error' derived from that SE.
==ACCEPTABLE ADVICE TO CLIENTS AND TO STUDENTS.
I think that it should be clear, once you see alternatives,
that a statistician should not tell his client, "If you
don't know a population standard deviation, collect any
small sample that you can and use THAT pilot value as your
best estimate" ( - a paraphrase of Final Comment #2).
A piloted value gives you a point-estimate. As I
mentioned before, that point-estimate of the future SD
has a power of about 50%. This stuff has not been
taught well, or even very often, so I won't be surprised
by seeing someone else take the point estimate as a
population estimate; they don't know what else to do.
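
One way to read that 'about 50%' is as the chance that a later
sample's variance comes out above (or below) the pilot value. A small
simulation can illustrate it; this is a sketch only, in Python, with
a pilot of n=16 and a later sample of n=76 drawn from the same normal
population. The later sample size, the number of repetitions and the
seed are all arbitrary choices of mine, and the function names are
invented for the example.

```python
import random

def sample_variance(values):
    """Unbiased sample variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def share_exceeding(n_pilot=16, n_future=76, reps=2000, seed=1):
    """Fraction of simulated pairs in which the later sample's variance
    is larger than the pilot variance, both from the same N(0, 1)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        pilot = [rng.gauss(0.0, 1.0) for _ in range(n_pilot)]
        future = [rng.gauss(0.0, 1.0) for _ in range(n_future)]
        if sample_variance(future) > sample_variance(pilot):
            exceed += 1
    return exceed / reps

share = share_exceeding()
print(f"later variance exceeded the pilot variance in {share:.0%} of runs")
```

The share should land somewhere near one half, which is the sense in
which a bare point-estimate gives no protection against the later
variance turning out larger.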
If you want to end up making a statement, "You would
like to have 95% confidence that your SE is 0.022 gallons",
you have to make the statement about replicating the
variance, and then shrink the SE by increasing the N.
(So: that is *not* what you want to ask under-graduate
psychology students, yet.)
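
To show the arithmetic of that last step, here is a sketch in Python.
The pilot SD of 0.10 gallons is a made-up number for illustration (no
pilot SD is given in this post); the factor 2.1 is the F cutoff from
the table above, and the target SE of 0.022 is the one quoted. The
function name n_for_target_se is hypothetical, and the sketch ignores
the further refinement of using t-limits.

```python
import math

def n_for_target_se(pilot_sd, target_se, variance_inflation=1.0):
    """Smallest N with sqrt(inflation * pilot_sd**2 / N) <= target_se."""
    planning_variance = variance_inflation * pilot_sd ** 2
    return math.ceil(planning_variance / target_se ** 2)

pilot_sd = 0.10     # hypothetical pilot SD (gallons), from 15 d.f.
target_se = 0.022   # target standard error (gallons)

# Taking the pilot value at face value (the point-estimate approach).
naive_n = n_for_target_se(pilot_sd, target_se)

# Allowing the future variance to be 2.1 times the pilot variance.
cautious_n = n_for_target_se(pilot_sd, target_se, variance_inflation=2.1)

print(f"N from point estimate:      {naive_n}")
print(f"N with 2.1x variance limit: {cautious_n}")
```

With these made-up inputs the required N roughly doubles, since N
scales directly with the variance that one plans for.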
Dennis has pointed at a textbook, which I mention here:
The closest example that I find in Moore and McCabe is
on page 517, and even that one fails to discuss the difficulties
that those authors are dodging, too: The example has a small-
sample variance measured at 3; then there is something
like, "The experimenters have several years of data
available, by which they know that the population
variance is also about 3, so they can use 3.0 in their
power analysis." That was sufficient, for that example.
Evasion: What if you don't know the variance?
What happens in practice, with variances unknown? Well,
we try to convert a question into one that we can answer.
That means that we describe our effects as 'd' as
Cohen showed us -- a standardized effect -- rather than
as the more meaningful, natural unit.
We can use the pilot study to enhance our prior intuition
or knowledge, to get a maximum variance ("We used this
value, which we assume to be a safe maximum"), to plug into
a projection for CI or power analysis.
Still, I think more people would be willing to use the real
values, but I now suspect that they have never heard of
this-all.
--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html