On Sun, 9 Dec 2001, Ronny Richardson wrote in part:

> Bluman has a figure (2, page 333) that is supposed to show the student
> "When to Use the z or t Distribution."  I have seen a similar figure in
> several different textbooks. 

So have I, sometimes as a diagram or flow chart, sometimes in paragraph 
or outline form.

> The figure is a logic diagram and the first question is "Is sigma
> known?" If the answer is yes, the diagram says to use z. I do not 
> question this;  however, I doubt that sigma is ever known in a business 
> situation and I only have experience with business statistics books. 

That depends partly on which parameter one is addressing (whether in a 
hypothesis test or in a confidence interval).  For the mean of an unknown 
empirical distribution, I expect you're right.  But consider the 
proportion of persons in a population who would want to purchase (for a 
currently topical example) a Segway.  There the population variance is a 
known function of the proportion (the underlying distribution being, 
presumably, binomial), so the t distribution is simply inappropriate.  
One ought to use either the exact binomial distribution or else the 
normal approximation to the binomial (the latter perhaps after satisfying 
oneself that N is sufficiently large for the approximation to be credible 
at the hypothesized (or observed) value of the proportion;  various 
textbook authors offer assorted recipes for this purpose).
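
To make the two alternatives for a proportion concrete, here is a minimal 
sketch in Python (standard library only).  The function names and the 
numbers in the example (50 respondents, a hypothesized proportion of 0.10, 
9 observed buyers) are made up for illustration and do not come from any 
of the textbooks discussed.  Note that no t distribution appears anywhere: 
the variance follows from the hypothesized proportion.

```python
import math
from statistics import NormalDist

def binomial_upper_tail(n, p0, k):
    """Exact P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(math.comb(n, i) * p0**i * (1 - p0)**(n - i)
               for i in range(k, n + 1))

def normal_upper_tail(n, p0, k):
    """Normal approximation to P(X >= k), with continuity correction.

    The standard deviation is computed from p0 itself, which is why
    no separate estimate of sigma (and hence no t) is needed.
    """
    mean = n * p0
    sd = math.sqrt(n * p0 * (1 - p0))
    z = (k - 0.5 - mean) / sd
    return 1.0 - NormalDist().cdf(z)

# Hypothetical example: H0: p = 0.10, n = 50, 9 "would buy" responses.
n, p0, k = 50, 0.10, 9
print("exact binomial :", binomial_upper_tail(n, p0, k))
print("normal approx. :", normal_upper_tail(n, p0, k))
```

With n*p0 as small as 5, as here, the two tail probabilities need not 
agree closely, which is the point of checking N before trusting the 
approximation.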

        {  Snip, discourse on N >= 30, although I'd think the 
           criterion ought rather to be  df >= 30.  }

> However, other authors go well beyond 30.  Aczel (3, inside cover) has
> values for 29, 30, 40, 60, and 120, in addition to infinity.  Levine 
> (4, pages E7-E8) has values for 29-100 and then 110 and 112, along with 
> infinity.  I could go on, but you get the point.  If you always switch 
> to z at 30, then why have t tables that go above 28?  Again, the 
> infinity entry I understand, just not the others. 

        {  Snip, assorted quotes ...  }

> So, Berenson seems to me to be saying that you always use t when you
> must estimate sigma using s.  Levine (4, page 424) says roughly the 
> same thing, ...

> So, I conclude  {slightly edited -- DB}

> 1) we use z when we know the sigma and either the data are normally
> distributed or the sample size is greater than 30 so we can use the
> central limit theorem. 

I would amend this to "the sample size is large enough that we can..." 
Whether 30 is in fact large enough or not depends rather heavily on what 
the true shape of the parent population actually is.  (If it's roughly 
symmetrical and bell-shaped, 30 may be O.K.)
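
As a rough illustration of how the parent shape matters, the sketch below 
assumes (my choice, purely as an example of a skewed parent) an 
exponential population with mean 1 and known sigma = 1.  The sum of n 
such observations has a gamma distribution, so the chance that the sample 
mean falls outside mu +/- z*sigma/sqrt(n) on each side can be computed 
exactly and compared with the nominal 2.5% per tail.

```python
import math
from statistics import NormalDist

def gamma_upper_tail(n, x):
    """Exact P(S > x), where S is the sum of n iid Exponential(1) values.

    Uses the identity P(Gamma(n, 1) > x) = P(Poisson(x) <= n - 1),
    with each term evaluated in log space to avoid overflow.
    """
    if x <= 0:
        return 1.0
    return sum(math.exp(-x + i * math.log(x) - math.lgamma(i + 1))
               for i in range(n))

def tail_miss_rates(n, conf=0.95):
    """Return (lower, upper): the probabilities that the sample mean lies
    below mu - z*sigma/sqrt(n) or above mu + z*sigma/sqrt(n), mu = sigma = 1."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    half_width = z * math.sqrt(n)          # on the scale of the sum
    upper = gamma_upper_tail(n, n + half_width)
    lower = 1.0 - gamma_upper_tail(n, n - half_width)
    return lower, upper

for n in (10, 30, 100, 300):
    lower, upper = tail_miss_rates(n)
    print(f"n = {n:3d}   lower tail = {lower:.4f}   upper tail = {upper:.4f}")
```

For this parent the two tails are still visibly unequal at n = 30 (the 
upper one above 2.5%, the lower one below), and they move toward 2.5% 
only as n grows.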

> 2) When n<30 and the data are normally distributed, we use t. 

> 3) When n is greater than 30 and we do not know sigma, we must estimate 
> sigma using s so we really should be using t rather than z. 

> Now, every single business statistics book I have examined, including 
> the four referenced below, use z values when performing hypothesis 
> testing or computing confidence intervals when n>30. 

> Are they 

> 1. Wrong 
> 2. Just oversimplifying it without telling the reader 

> or am I overlooking something? 

I vote for both 1. and 2., since 2. is in my view a subset of 1, although 
others may not share this opinion.  I would add 

  3.  Outdated.

on the grounds that when sigma is unknown, the proper distribution is t 
(unless N is small and the parent population is screwy) regardless of how 
large the sample size may be.  The main (if not the only) reason for the 
apparent logical bifurcation at N = 30 or thereabouts was that, when 
one's only sources of information about critical values were printed 
tables, 30 lines was about what fit on one page (plus maybe a few extra 
lines for 40, 60, 120 d.f.) and one could not (or at any rate did not) 
expect one's business students to have convenient access to more 
extensive tables of the t distribution.  And, one suspects latterly, 
authors were skeptical that students would pay attention to (or perhaps 
be able to master?) the technique of interpolating by reciprocals between 
30 df and larger numbers of df (particularly including infinity). 
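
For readers who have not met it, interpolating by reciprocals means 
interpolating linearly in 1/df rather than in df, which also makes the 
infinity row usable (1/infinity = 0).  A minimal sketch follows;  the 
function name is mine, and the table holds the usual printed values for 
the upper 2.5% point (two-sided 95%).

```python
import math

# Upper 2.5% points of t as printed in a typical table.
TABLE_975 = {30: 2.042, 40: 2.021, 60: 2.000, 120: 1.980, math.inf: 1.960}

def interpolate_by_reciprocals(df, table=TABLE_975):
    """Approximate a t critical value by linear interpolation in 1/df."""
    if df in table:
        return table[df]
    if df < min(table):
        raise ValueError("df is below the range of the table")
    lo = max(k for k in table if k < df)   # nearest tabled df below
    hi = min(k for k in table if k > df)   # nearest tabled df above
    # Weight on the upper row, measured on the 1/df scale.
    w = (1 / lo - 1 / df) / (1 / lo - 1 / hi)
    return table[lo] + w * (table[hi] - table[lo])

print(interpolate_by_reciprocals(50))    # between the 40 and 60 rows
print(interpolate_by_reciprocals(240))   # between the 120 and infinity rows
```

For 50 df this gives about 2.008, against a true value of about 2.009, 
so the technique works well;  the doubt was only whether students would 
use it.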

But currently, _I_ would not expect business students to carry out the 
calculations for hypothesis tests, or confidence intervals, by hand, 
except maybe half a dozen times in class for the good of their souls:  
I'd expect them to learn to invoke a statistical package, or else 
something like Excel that pretends to supply adequate statistical 
routines.  And for all the packages I know of, there is a built-in 
function for calculating, or approximating, the cumulative distribution 
of t for ANY number of df.  The advice in any _current_ business-
statistics text ought to be, therefore, to use t _whenever_ sigma is not 
known.  And if the textbook isn't up to that standard, the instructor 
jolly well should be.
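
To show that nothing special happens at 30, here is a sketch that computes 
t critical values for any number of df using only the Python standard 
library (Simpson's rule on the t density, then bisection).  It is meant as 
an illustration, not as a replacement for a package's own routine;  the 
function names are mine.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, step=0.01):
    """Cumulative distribution of t, by Simpson's rule from 0 to x."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, step)
    n = max(2, 2 * math.ceil(x / (2 * step)))   # even number of intervals
    h = x / n
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_crit(p, df):
    """Value c with P(T <= c) = p, for p > 0.5, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:       # widen the bracket until it contains c
        lo, hi = hi, hi * 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for df in (10, 29, 30, 60, 120, 1000):
    print(f"df = {df:4d}   t = {t_crit(0.975, df):.4f}   z = {z:.4f}")
```

The t value approaches z smoothly as df increases;  at 30 df it is still 
about 2.04 against 1.96, so switching to z there is a convenience, not a 
consequence of the theory.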

        {  Snip, references.  See the original post for more details.  }

                        -- DFB.
 ------------------------------------------------------------------------
 Donald F. Burrill                                 [EMAIL PROTECTED]
 184 Nashua Road, Bedford, NH 03110                          603-471-7128


