James Ankeny wrote:
>
> Hello,
> I am currently taking a first course in statistics, and I was hoping that
> perhaps someone might be kind enough to answer a question for me. I
> understand that, while a quantitative variable may not be normally
> distributed, we may calculate the mean of the sample, and use facts about
> the Central Limit Theorem, to form a 95% confidence interval for the
> population mean. As far as I know, this means that in 95/100 samples, the
> interval will contain the true population mean. This seems very useful at
> first, but then something begins to confuse me. Yes, we have an interval
> that may contain the true population mean, but ... if the distribution is
> heavily skewed to the right, say like income, why do we want an interval for
> the population mean, when we are taught that the median is a better measure
> of central tendency for skewed distributions?
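Excellent question. The answer is, we often don't. Instead, we can
transform the data (say, by taking logs) and get an interval for a
back-transformed mean (here, the geometric mean). If the logs are
symmetrically distributed, as for a lognormal variable, the geometric
mean coincides with the median. Or we can use a nonparametric sign
interval and get an interval estimate for the median directly.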
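For concreteness, here is a minimal Python sketch of both ideas. It is
an illustration under stated assumptions, not a standard library routine.
The function names are made up. The log interval uses a normal quantile
rather than Student's t (a simplification that matters for small n) and
requires all values to be positive. The sign interval uses the fact that
the number of observations below the true median is Binomial(n, 1/2), so
the order statistics x_(k) and x_(n+1-k) bracket the median with
probability 1 - 2 P(B <= k-1). The income figures are invented.

```python
import math
import statistics


def geometric_mean_ci(data, level=0.95):
    """Approximate CI for the geometric mean: a z-interval for the mean
    of log(data), back-transformed with exp. Assumes all data > 0 and
    uses a normal quantile in place of Student's t (a simplification)."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    m = statistics.fmean(logs)
    se = statistics.stdev(logs) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(m - z * se), math.exp(m + z * se)


def sign_interval_median(data, level=0.95):
    """Distribution-free CI for the median from order statistics.
    Returns (x_(k), x_(n+1-k), exact coverage) for the largest k whose
    coverage 1 - 2*P(B <= k-1), B ~ Binomial(n, 1/2), is >= level."""
    xs = sorted(data)
    n = len(xs)
    alpha = 1 - level

    def binom_cdf(j):  # P(B <= j) for B ~ Binomial(n, 1/2)
        return sum(math.comb(n, i) for i in range(j + 1)) / 2 ** n

    k = 0
    # Step k up while the next value still keeps coverage >= level.
    while k + 1 <= n // 2 and 2 * binom_cdf(k) <= alpha:
        k += 1
    if k == 0:
        raise ValueError("sample too small for this confidence level")
    coverage = 1 - 2 * binom_cdf(k - 1)
    return xs[k - 1], xs[n - k], coverage


# Hypothetical right-skewed incomes, in $1000s (invented for illustration).
incomes = [18, 22, 25, 27, 31, 35, 40, 52, 75, 160]
print(geometric_mean_ci(incomes))
print(sign_interval_median(incomes))  # (22, 75, 0.978515625)
```

Note that with n = 10 the sign interval cannot hit 95% exactly. The
order statistics only allow certain coverage levels, so the sketch
reports the exact (conservative) coverage it achieves.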
> There is just one more
> thing I would like to get off my chest. My textbook talks about simple
> random sampling, where you can specify the probability of a sample being
> selected from the population. Yet, there are examples in the book which deal
> with conceptual populations, such as the set of all cars of a particular
> model which may be manufactured in the future. Suppose you have a sample of
> several of these autos, and you want to find a 95% confidence interval for
> mean miles/gallon. How is this an SRS when you can't specify the probability
> of a sample being selected, because the population is conceptual?
It isn't one, but it acts in much the same way.
In practice, the SRS model and real samples rarely coincide. One or
more of the following almost always happens:
*Your population is theoretical (e.g., testing whether whatever
difference may exist between the lifetimes of American Presidents and
British monarchs is statistically significant, using *all* the ages as
a pseudosample)
*Your population is somewhat vaguely defined
*Your population is not all available for listing or sampling (a
"random" sample of wild squirrels in Maine)
*You want to make inferences about all human beings based on
the Psych 1000 subject pool, or any other "convenience" group.
Fortunately, you *may* be able to draw valid conclusions even in these
circumstances if you use common sense, though the formal probability
statements (confidence levels, p-values) will be dubious.
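To make "dubious" concrete, here is a small Python simulation. It is a
sketch under invented assumptions, not data. It posits a hypothetical
squirrel population in which the easy-to-trap animals near town are
heavier on average. It then compares how often a nominal 95% z-interval
covers the true mean under SRS versus a convenience sample drawn only
from the near-town group. All numbers (group sizes, means, n = 50) are
made up for illustration.

```python
import math
import random
import statistics


def z_interval(sample, level=0.95):
    """Nominal large-sample interval for the mean (normal quantile)."""
    n = len(sample)
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return m - z * se, m + z * se


def coverage(draw_sample, true_mean, reps=1000):
    """Fraction of simulated intervals that contain true_mean."""
    hits = 0
    for _ in range(reps):
        lo, hi = z_interval(draw_sample())
        hits += lo <= true_mean <= hi
    return hits / reps


rng = random.Random(1)
# Hypothetical population: 3,000 "near town" squirrels (mean 110 g) and
# 7,000 "far away" ones (mean 95 g); every number here is invented.
near = [rng.gauss(110, 15) for _ in range(3000)]
far = [rng.gauss(95, 15) for _ in range(7000)]
population = near + far
true_mean = statistics.fmean(population)

srs_cov = coverage(lambda: rng.sample(population, 50), true_mean)
conv_cov = coverage(lambda: rng.sample(near, 50), true_mean)
print(f"SRS coverage: {srs_cov:.3f}  convenience coverage: {conv_cov:.3f}")
```

Under these assumptions the SRS intervals cover the true mean close to
the nominal 95% of the time, while the convenience-sample intervals
almost never do. The arithmetic is the same in both cases; only the
sampling mechanism differs.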
-Robert Dawson
=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=================================================================