"Dr. S. Shapiro" wrote:
> 
> Dear Colleagues,
> 
>      I am seeking a not-too-technical article written for non-statisticians

        OK, here goes.

(A)
        The arithmetic mean is the sum of all the data divided by N; or, for a
continuous distribution, the integral of (the density x the value)
divided by the integral of the density.
        Theory supports its use in several circumstances. Three main ones
are:

        (a)  If the sum of the data is of primary interest, regardless of
distribution. For instance, if you multiply the arithmetic mean of the
weights of a random sample of 100 potatoes by 1000 to estimate the total
weight of a shipment of 1000 potatoes, you will have an unbiased
estimate.  Similarly, the arithmetic mean of a random sample of
residential assessments, scaled up by the number of residences, gives an
unbiased estimate of the tax base.

        (b) If a "least squares" error penalty is appropriate - that is, the
"badness of being wrong" is, for your purposes, proportional to the sum
of the squares of the errors. [This in turn can be justified by the idea
of "maximum likelihood" if the errors are assumed to have a normal
distribution, all with the same variance - and this is a realistic
approximation in many cases, though of course never *exactly* true.] 
It can be shown by elementary calculus that the arithmetic mean of the
data is the value A which minimizes

        (A-x1)^2 + (A-x2)^2 + ... + (A-xN)^2 

that is, the sum-of-squared-differences is less for the arithmetic mean
than for any other value. [The median performs the same function if your
error penalty is the sum of absolute values of errors; and the mode
arises if it's the number of not-exactly-right values.]

        (c) If the data appear to be from a symmetrically-distributed
population, the mean coincides [approximately for the data, exactly for
the distribution] with the median, trimmed mean, and (if unique) the
mode, so you cover a lot of options by using it. [If the distribution is
heavy-tailed, or the data possibly contaminated with outliers, the median
or trimmed mean may nonetheless be a better estimator, but it's the
*same* parameter, the "middle" of a symmetric distribution.]
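
        The claim in (b) can be checked numerically without any calculus.
What follows is only a rough sketch: the data are made up, and the
brute-force scan over candidate values is my own illustration, not a
recommended method. It scans a grid of candidate "centres" and picks the
one with the smallest penalty under each rule.

```python
from statistics import mean, median

# Hypothetical, deliberately skewed data (not from the original post).
data = [2.0, 3.0, 5.0, 9.0, 21.0]

def sum_sq(a, xs):
    """Sum of squared differences between candidate a and the data."""
    return sum((a - x) ** 2 for x in xs)

def sum_abs(a, xs):
    """Sum of absolute differences between candidate a and the data."""
    return sum(abs(a - x) for x in xs)

# Candidate values 0.0, 0.1, ..., 25.0.
candidates = [i / 10 for i in range(251)]

best_sq = min(candidates, key=lambda a: sum_sq(a, data))
best_abs = min(candidates, key=lambda a: sum_abs(a, data))

print(best_sq, mean(data))     # least-squares winner vs. arithmetic mean
print(best_abs, median(data))  # least-absolute winner vs. median
```

        On these data the least-squares winner lands on the arithmetic
mean and the least-absolute winner on the median, as the theory says.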


(B)

        The geometric mean is computed by taking the log of all the values,
computing the arithmetic mean, and then exponentiating. (For
mathematicians, it's the conjugate of the arithmetic mean under the log
transform;  note that the exponential transformation "undoes" the log
transform.)

        It can also be written as 

         N________________
        \/x1 x2 x3  ... xN

        but this hides its real identity as a transformation of the arithmetic
mean.

        
        Theory suggests its use when your real interest is in the product of
the data - for instance, estimating the effect of compounding
investments.
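
        Here is a small sketch of the two ways of writing the geometric
mean, applied to compounding. The growth factors are invented for
illustration, and the helper names are mine; statistics.geometric_mean
and math.prod are standard-library functions (Python 3.8 or later).

```python
import math
from statistics import geometric_mean

# Hypothetical yearly growth factors: +10%, -5%, +20%.
growth_factors = [1.10, 0.95, 1.20]

def geo_mean_via_logs(xs):
    """Log every value, take the arithmetic mean, then exponentiate."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def geo_mean_via_root(xs):
    """N-th root of the product of the N values."""
    return math.prod(xs) ** (1 / len(xs))

g = geo_mean_via_logs(growth_factors)

# Compounding at the constant factor g for three years reproduces the
# overall growth, which is why g is the natural "average" here.
overall = math.prod(growth_factors)
print(g, geo_mean_via_root(growth_factors), geometric_mean(growth_factors))
print(g ** len(growth_factors), overall)
```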

        It is also a good choice when a log transform symmetrizes the data.

        BTW: The median is not affected by monotone transformations such as
the log, so if the median is an appropriate measure to use, don't worry
about transforming. The median of the log(x_i) is the log of the median
of the x_i [exactly so for odd N; for even N it depends on how the two
middle values are averaged].
        If you *can* transform [that is, if your data are really numbers that
you can do arithmetic on sensibly, rather than shoe sizes or something],
the mode is probably useless.
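
        The remark about the median and the log transform can be tried out
directly. This is a sketch on made-up numbers; the second data set is my
own addition, there to show what happens when N is even and the median
is taken as the average of the two middle values.

```python
import math
from statistics import median

# Hypothetical data with an odd number of values.
odd_values = [1.5, 4.0, 2.0, 30.0, 7.0]

log_of_median = math.log(median(odd_values))
median_of_logs = median([math.log(v) for v in odd_values])
print(log_of_median, median_of_logs)  # identical: both come from 4.0

# Hypothetical data with an even number of values: the usual
# "average the two middle values" rule does not commute with the log.
even_values = [1.0, 100.0]

even_log_of_median = math.log(median(even_values))
even_median_of_logs = median([math.log(v) for v in even_values])
print(even_log_of_median, even_median_of_logs)
```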



(C)     The harmonic mean is computed by taking the reciprocal of all
the values, computing the arithmetic mean, and then taking the
reciprocal again.  (For mathematicians, it's the conjugate of the
arithmetic mean under the reciprocal transformation.) It's probably good
to keep in mind that the final reciprocal is not there as a _repetition_
of the first ones, but rather as the _undoing_ of them.

        It can also be written as

        N/(1/x1 + 1/x2 + 1/x3 + ... + 1/xN)

        but this hides its real identity as a transformation of the arithmetic
mean.
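
        A short sketch showing that the two descriptions agree. The two
input values are invented, and the helper names are mine;
statistics.harmonic_mean is the standard-library version.

```python
import math
from statistics import harmonic_mean, mean

# Hypothetical positive values.
values = [40.0, 60.0]

def harm_via_reciprocals(xs):
    """Reciprocal of each value, arithmetic mean, then reciprocal again."""
    return 1 / mean([1 / x for x in xs])

def harm_via_formula(xs):
    """The closed form N / (1/x1 + 1/x2 + ... + 1/xN)."""
    return len(xs) / sum(1 / x for x in xs)

h = harm_via_reciprocals(values)
print(h, harm_via_formula(values), harmonic_mean(values))
```

        The final reciprocal brings the answer back to the original scale:
the result lies between the smallest and largest input values.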

        Its main use is in geometry, not statistics, but it may be useful in
the uncommon event of a reciprocal transform symmetrizing the data.

(D) The root-mean-square is another specialized measure of location,
again conjugated from the arithmetic mean: square all the values, compute
the arithmetic mean, and then take the square root. It is widely used in
physics
and engineering in statistical or semistatistical calculations, when the
measurement of real interest, the energy or power, is the square of the
easily-measurable observable, which may be voltage, current, electrical
field, etc.   This is the same philosophy as in the other cases.

        -Robert Dawson