On Sun, 9 Dec 2001, Ronny Richardson wrote in part:
> Bluman has a figure (2, page 333) that is supposed to show the student
> "When to Use the z or t Distribution." I have seen a similar figure in
> several different textbooks.
So have I, sometimes as a diagram or flow chart, sometimes in paragraph
or outline form.
> The figure is a logic diagram and the first question is "Is sigma
> known?" If the answer is yes, the diagram says to use z. I do not
> question this; however, I doubt that sigma is ever known in a business
> situation and I only have experience with business statistics books.
Depends partly on what parameter one is addressing (either as a
hypothesis test or as a confidence interval). For the mean of an unknown
empirical distribution, I expect you're right. But for the proportion of
persons in a population who would want to purchase (for a currently
topical example) a Segway, the population variance is a known function of
the proportion (the underlying distribution being, presumably, binomial).
For this case the t distribution is simply inappropriate: one ought to use
either the proper binomial distribution function or the normal
approximation to the binomial (perhaps after satisfying oneself that N is
sufficiently large for the approximation to be credible at the
hypothesized (or observed) value of the proportion; various textbook
authors offer assorted recipes for this purpose).
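
To make the proportion case concrete, here is a minimal sketch in Python
(standard library only). The numbers n = 50, p0 = 0.10 and x = 9 are invented
for illustration, and the function names are mine, not from any textbook. The
point is that the standard deviation comes straight from the hypothesized
proportion, so nothing is estimated by s and t never enters.

```python
import math
from statistics import NormalDist

def binom_upper_tail(n, p0, x):
    """Exact P(X >= x) for X ~ Binomial(n, p0)."""
    return sum(math.comb(n, k) * p0**k * (1 - p0)**(n - k)
               for k in range(x, n + 1))

def normal_approx_upper_tail(n, p0, x):
    """Normal approximation to P(X >= x), with a continuity correction.

    The standard deviation is a known function of p0 (binomial variance),
    which is why z, not t, is the reference distribution here.
    """
    mean = n * p0
    sd = math.sqrt(n * p0 * (1 - p0))
    return 1 - NormalDist().cdf((x - 0.5 - mean) / sd)

# Hypothetical survey: 9 of 50 respondents say they would buy; H0: p = 0.10.
n, p0, x = 50, 0.10, 9
print("exact binomial tail :", round(binom_upper_tail(n, p0, x), 4))
print("normal approximation:", round(normal_approx_upper_tail(n, p0, x), 4))
```

With n*p0 as small as 5 the two answers need not agree closely, which is the
reason for checking N before trusting the approximation.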
{ Snip, discourse on N >= 30, although I'd
think the criterion ought rather to be df >= 30. }
> However, other authors go well beyond 30. Aczel (3, inside cover) has
> values for 29, 30, 40, 60, and 120, in addition to infinity. Levine
> (4, pages E7-E8) has values for 29-100 and then 110 and 112, along with
> infinity. I could go on, but you get the point. If you always switch
> to z at 30, then why have t tables that go above 28? Again, the
> infinity entry I understand, just not the others.
{ Snip, assorted quotes ... }
> So, Berenson seems to me to be saying that you always use t when you
> must estimate sigma using s. Levine (4, page 424) says roughly the
> same thing, ...
> So, I conclude {slightly edited -- DB}
> 1) we use z when we know the sigma and either the data are normally
> distributed or the sample size is greater than 30 so we can use the
> central limit theorem.
I would amend this to "the sample size is large enough that we can..."
Whether 30 is in fact large enough or not depends rather heavily on what
the true shape of the parent population actually is. (If it's roughly
symmetrical and bell-shaped, 30 may be O.K.)
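
As a rough illustration of how much the parent shape matters, the sketch
below assumes (my choice, purely for illustration) an exponential parent with
mean 1 and sigma 1, both known. The sum of n such observations has an Erlang
distribution, so the true one-sided error rates of a nominal 5% z procedure
can be computed exactly rather than simulated. Function names are mine.

```python
import math
from statistics import NormalDist

def erlang_cdf(n, x):
    """P(S <= x), where S is the sum of n independent Exponential(1) values.

    Uses the identity P(S > x) = P(Poisson(x) <= n - 1).
    """
    if x <= 0:
        return 0.0
    term = math.exp(-x)          # Poisson(x) probability of 0
    upper = term
    for k in range(1, n):
        term *= x / k            # Poisson(x) probability of k
        upper += term
    return 1.0 - upper

def actual_tail_rates(n, nominal=0.05):
    """True lower and upper tail rates of a one-sided z procedure (sigma = 1
    known) when the parent is Exponential(1) rather than normal."""
    z = NormalDist().inv_cdf(1 - nominal)
    half_width = z * math.sqrt(n)    # z * sigma * sqrt(n), on the scale of the sum
    lower = erlang_cdf(n, n - half_width)
    upper = 1.0 - erlang_cdf(n, n + half_width)
    return lower, upper

for n in (5, 30, 200):
    lower, upper = actual_tail_rates(n)
    print(f"n = {n:3d}: lower tail {lower:.4f}, upper tail {upper:.4f} (nominal 0.05)")
```

For a skewed parent like this one the two tails miss the nominal rate in
opposite directions, and the discrepancy shrinks only slowly as n grows.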
> 2) When n<30 and the data are normally distributed, we use t.
> 3) When n is greater than 30 and we do not know sigma, we must estimate
> sigma using s so we really should be using t rather than z.
> Now, every single business statistics book I have examined, including
> the four referenced below, use z values when performing hypothesis
> testing or computing confidence intervals when n>30.
> Are they
> 1. Wrong
> 2. Just oversimplifying it without telling the reader
> or am I overlooking something?
I vote for both 1. and 2., since 2. is in my view a subset of 1, although
others may not share this opinion. I would add
3. Outdated.
on the grounds that when sigma is unknown, the proper distribution is t
(unless N is small and the parent population is screwy) regardless of how
large the sample size may be. The main (if not the only) reason for the
apparent logical bifurcation at N = 30 or thereabouts was that, when
one's only sources of information about critical values were printed
tables, 30 lines was about what fit on one page (plus maybe a few extra
lines for 40, 60, 120 d.f.) and one could not (or at any rate did not)
expect one's business students to have convenient access to more
extensive tables of the t distribution. And, one suspects latterly,
authors were skeptical that students would pay attention to (or perhaps
be able to master?) the technique of interpolating by reciprocals between
30 df and larger numbers of df (particularly including infinity).
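
For anyone who has not met it, interpolating by reciprocals means
interpolating linearly in 1/df rather than in df. A minimal sketch, using the
upper 2.5% critical values as they appear in a typical printed table (the
dictionary and function names are mine):

```python
import math

# Upper 2.5% points of t (two-sided 5%), as printed in standard tables.
T_975 = {30: 2.042, 40: 2.021, 60: 2.000, 120: 1.980, math.inf: 1.960}

def interpolate_by_reciprocals(df, df_lo, df_hi, table=T_975):
    """Interpolate a critical value linearly in 1/df between two tabled rows.

    df_hi may be math.inf, whose reciprocal is 0.
    """
    r, r_lo, r_hi = 1.0 / df, 1.0 / df_lo, 1.0 / df_hi
    weight = (r_lo - r) / (r_lo - r_hi)
    return table[df_lo] + weight * (table[df_hi] - table[df_lo])

# Pretend the table stops at 30 df and jumps to infinity; recover 60 df.
print(round(interpolate_by_reciprocals(60, 30, math.inf), 4))
# Same target, interpolating between the 40 and 120 df rows instead.
print(round(interpolate_by_reciprocals(60, 40, 120), 4))
```

Both results land within about 0.001 of the tabled 2.000 for 60 df, whereas
switching to z at 30 would have used 1.960.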
But currently, _I_ would not expect business students to carry out the
calculations for hypothesis tests, or confidence intervals, by hand,
except maybe half a dozen times in class for the good of their souls:
I'd expect them to learn to invoke a statistical package, or else
something like Excel that pretends to supply adequate statistical
routines. And for all the packages I know of, there is a built-in
function for calculating, or approximating, the cumulative distribution
of t for ANY number of df. The advice in any _current_ business-
statistics text ought to be, therefore, to use t _whenever_ sigma is not
known. And if the textbook isn't up to that standard, the instructor
jolly well should be.
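
To show that nothing special happens at 30, here is a sketch that computes
the two-sided t critical value for any df. In practice one would call the
package's own routine (SciPy's scipy.stats.t.ppf is one such function); the
version below is only a self-contained stand-in using the Python standard
library, with Simpson's rule and bisection as my own choices, and is not how
any particular package does it.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_crit(conf, df):
    """Two-sided critical value, e.g. conf=0.95 gives the upper 2.5% point."""
    target = 0.5 + conf / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:    # expand until the bracket contains the answer
        hi *= 2
    for _ in range(60):              # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for df in (10, 29, 30, 99, 999):
    print(f"df = {df:4d}: t = {t_crit(0.95, df):.4f}   (z = {z:.4f})")
```

The t value exceeds z at every df and approaches it smoothly, so using t
whenever sigma is estimated costs nothing and is never wrong on that account.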
{ Snip, references. See the original post for more details. }
-- DFB.
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
184 Nashua Road, Bedford, NH 03110 603-471-7128