"Ronny Richardson" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...

> A few weeks ago, I posted a message about when to use t and when to use z.

I did not see the earlier postings, so forgive me if I repeat advice already
given. :-)

    1. The consequences of using the t distribution instead of the normal
distribution for sample sizes greater than 30 are of no importance in
practice. The differences in the numbers given as confidence limits are so
small that no sensible person would change their course of action based on
that minuscule variation. In the case of a significance test, a result just
over or just under, say, the 5% level should always be examined in the
knowledge that 5% is an arbitrary level and that a level of 4.9% or
5.1% could equally well have been chosen.
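As a rough check on point 1, the sketch below compares two-sided 95%
multipliers from the t and Normal distributions. It uses only the Python
standard library. The helper names (t_pdf, t_cdf, t_quantile) are
illustrative and not from the original post, and the t quantile is found by
plain numerical integration plus bisection, so treat the figures as
approximate.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Upper quantile (p > 0.5) found by bisection on the CDF."""
    lo, hi = 0.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


Z975 = NormalDist().inv_cdf(0.975)  # Normal multiplier for a 95% interval

for df in (10, 30, 60, 120):
    t975 = t_quantile(0.975, df)
    extra = 100 * (t975 / Z975 - 1)
    print(f"df={df:4d}  t={t975:.3f}  z={Z975:.3f}  interval wider by {extra:.1f}%")
```

At 30 degrees of freedom the t multiplier is about 2.04 against 1.96 for z,
so the interval is only about 4% wider, and the gap shrinks as the degrees
of freedom grow.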

    2. There is no good reason for statistical tables intended for practical
data analysis to give figures for t on more than 30 degrees of freedom,
except that it makes it simple to use one set of tables routinely when
the variance is estimated from the sample.
Another reason that books of tables do not include t values for degrees of
freedom between 30, 60, sometimes 120, and infinity is that there is no
need, even for the extreme tails of the distribution and when, for whatever
reason, high accuracy is required, because the intermediate values can be
obtained by harmonic interpolation. That is, the tail entries in the
distribution can be obtained by linear interpolation on 1/n, where n is the
number of degrees of freedom.
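The harmonic interpolation described in point 2 can be sketched as follows.
The function name harmonic_interp is made up for illustration, and the three
table entries are the usual two-sided 5% points as printed to three decimals
in most t tables.

```python
import math


def harmonic_interp(df, df_lo, t_lo, df_hi, t_hi):
    """Interpolate a t table entry linearly in 1/df between two tabulated rows."""
    x, x_lo, x_hi = 1.0 / df, 1.0 / df_lo, 1.0 / df_hi  # 1/inf evaluates to 0.0
    weight = (x - x_hi) / (x_lo - x_hi)  # 0 at df_hi, 1 at df_lo
    return t_hi + weight * (t_lo - t_hi)


# Two-sided 5% points for 60, 120 and infinite degrees of freedom
T60, T120, T_INF = 2.000, 1.980, 1.960

# 80 df lies halfway between 60 and 120 on the 1/df scale
print(round(harmonic_interp(80, 60, T60, 120, T120), 3))

# 240 df lies halfway between 120 and infinity on the 1/df scale
print(round(harmonic_interp(240, 120, T120, math.inf, T_INF), 3))
```

The two results come out at about 1.990 and 1.970, which agree to three
decimals with fuller tables that do list 80 and 240 degrees of freedom.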

    3. There are situations where the error variance is known. They
generally arise when the errors in the data come from the use of a
measuring instrument with known accuracy or when the figures available are
known to be truncated to a certain number of decimal places. For example:
    Several drivers use cars in a car pool. The distance travelled on each
trip by a driver is recorded, based on the odometer reading. Each
observation has an error which is uniformly distributed in (0,0.2). The
variance of this error is (0.2)^2/12 = 0.003333 and the standard deviation
is 0.0577. To calculate confidence limits for the average distance travelled
by each driver, the z statistic should be used.
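The arithmetic in the car pool example can be sketched as below. The trip
distances are invented purely for illustration, and z_interval is a
hypothetical helper name. The interval reflects only the known recording
error, which is the case the example is about.

```python
import math
from statistics import NormalDist, mean

ERROR_WIDTH = 0.2                     # error uniform on (0, 0.2), as in the example
sigma = ERROR_WIDTH / math.sqrt(12)   # sd of a uniform error is width / sqrt(12)


def z_interval(values, sigma, level=0.95):
    """Confidence limits for a mean when the error sd (sigma) is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(len(values))
    centre = mean(values)
    return centre - half_width, centre + half_width


trips = [12.4, 8.9, 15.2, 10.1, 9.7]  # hypothetical recorded distances
low, high = z_interval(trips, sigma)

print(f"error variance = {sigma ** 2:.6f}, sd = {sigma:.4f}")
print(f"95% limits for the mean recorded distance: {low:.3f} to {high:.3f}")
```

Because sigma is known rather than estimated from the five trips, the
multiplier is the Normal 1.96 and not the much larger t value on 4 degrees
of freedom.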

    A similar situation could arise in dealing with data in which the error
arises from the rounding of all numbers to the nearest thousand.

       This is an uncommon situation in a business context, but it arises
quite often in scientific work where the inherent accuracy of a measuring
instrument may be known from long experience and need not be estimated from
the small sample currently being examined.

    4. You seem to think the Central Limit Theorem is behind the choice
between t and z tables. This is not so. The CLT bears only on the Normal
shape of the distribution of an average or sum, and on the relation of its
variance to the population variance.

        Commenting specifically on points in your posting:

"Ronny Richardson" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...

> A few weeks ago, I posted a message about when to use t and when to use z.
        (snip)
> So, I conclude 1) we use z when we know the sigma and either the data is
> normally distributed or the sample size is greater than 30

   Yes, but the difference if you use t is tiny and of no importance.

>so we can use the central limit theorem.

        No. The CLT is not the reason. The CLT ensures that the average and
sum are approximately Normally distributed for large enough n. Unless the
data is very skewed or bimodal, n=5 is usually large enough in practice.
This is a separate issue from the choice of Normal or t distribution for
inference.
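The claim that n=5 is usually enough can be tried with a small simulation.
This is only a sketch under assumptions of my choosing: the population is
uniform on (0,1), which is flat rather than Normal but not skewed, and the
function name coverage_of_means and the seed are arbitrary.

```python
import math
import random


def coverage_of_means(n=5, reps=20000, seed=1):
    """Fraction of sample means falling within 1.96 sd of the true mean."""
    rng = random.Random(seed)
    mu = 0.5                               # mean of a uniform(0, 1) population
    sd_mean = math.sqrt(1.0 / 12.0 / n)    # sd of the mean of n such values
    inside = 0
    for _ in range(reps):
        avg = sum(rng.random() for _ in range(n)) / n
        if abs(avg - mu) <= 1.96 * sd_mean:
            inside += 1
    return inside / reps


print(f"share of means within 1.96 sd: {coverage_of_means():.3f}")
```

If the means of 5 values behave like Normal variables, the printed share
should be close to 0.95. Note that the population sd is known here, so the
simulation says nothing about t versus z.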
>
> 2) When n<30 and the data is normally distributed, we use t.
>
> 3) When n is greater than 30 and we do not know sigma, we must estimate
> sigma using s so we really should be using t rather than z.

        Yes, but the difference in the resulting numbers is minuscule and of
no importance.
>
> Now, every single business statistics book I have examined, including the
> four referenced below, use z values when performing hypothesis testing or
> computing confidence intervals when n>30.
>
> Are they
>
> 1. Wrong
> 2. Just oversimplifying it without telling the reader
>
> or am I overlooking something?
>
> Ronny Richardson
>
        I hope that helps
            Jim Snow



