Two replies to Robert's question:

(1) There was a thread on edstat a year or two ago that included several versions of an updating algorithm that was rather elegant. That was before my then ISP went bankrupt and all my files (including all my e-mail files) vanished into cyberspace, so I can't look them up conveniently; but it might be worth browsing the edstat archive.

(2) You may care to consult my dissertation, "Computer-generated errors in statistical analysis" (Cornell, 1969), the core of which was published in the first issue of J. Statistical Computing and Simulation.

Thumbnail result: the number of decimal digits of precision lost in using the usual "computing formula" is approximately log(k^2), where k is the ratio of the mean to the standard deviation in the data.

One implication of that result is that if 0 lies in the range of the data, the loss in precision will not ordinarily exceed one decimal digit: k = 3 is fairly extreme even when 0 is at one end of the set of data values, and log(9) < 1 in base 10. I conjectured that k could not in any case then exceed 5 (log(25) < 1.5, base 10) -- I think there's a theorem there, but I never developed it.

A further implication: if one stores the first (non-missing) value of each variable in a vector, subtracts that vector from each subsequent observation, and uses those differences as one's data (so that zero is a fortiori within the range of values), the loss in precision due to rounding error in the computational formula is at most one decimal digit. An advantage of this scheme is that the data supplied to the algorithm are always *without rounding error* (unless the original data required more precision than the floating-point hardware supplies), which is not always true of data from an updating-algorithm approach, whose precision depends on the precision of the current temporary mean.

In practice this also means that the differences seldom require more than three decimal digits to express; stored in ASCII form (at, in those days, six characters per 36-bit word, or eight characters per 48-bit word in CDC machines), they took up notably less storage space than if stored in floating-point form, one value per word. (In the late 1960s, memory space was a rather more important consideration than it now is.) But I always considered the more important implication to be the absence of *any* rounding error in the data supplied to the algorithm.
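To make the thumbnail result and the shift-by-the-first-value scheme concrete, here is a small made-up illustration (written for this note, not taken from the dissertation; the seed, sample size, and function names are invented). It applies the usual computing formula to data whose mean-to-SD ratio k is about 10^8 -- by the result above, roughly log(k^2) = 16 digits lost, enough to exhaust double precision -- and then applies the same formula to differences from the first value:

    # A made-up illustration of the precision-loss result and of the
    # shift-by-the-first-value remedy; names, seed, and data are invented.

    import random

    def var_computing_formula(x):
        """The usual 'computing formula': (sum x^2 - (sum x)^2 / n) / (n - 1)."""
        n = len(x)
        s = sum(x)
        ss = sum(v * v for v in x)
        return (ss - s * s / n) / (n - 1)

    def var_two_pass(x):
        """Debiased mean square of residuals about the mean (two passes)."""
        n = len(x)
        m = sum(x) / n
        return sum((v - m) ** 2 for v in x) / (n - 1)

    def var_shifted(x):
        """Computing formula applied to differences from the first value.

        Because every value here lies close to the first one, each subtraction
        is exact, so the data fed to the formula carry no new rounding error;
        zero lies in their range and their mean-to-SD ratio is small."""
        first = x[0]
        return var_computing_formula([v - first for v in x])

    random.seed(2)
    k = 1.0e8                  # mean-to-SD ratio; roughly log10(k**2) = 16 digits lost
    data = [k + random.gauss(0.0, 1.0) for _ in range(1000)]   # true variance near 1

    print("computing formula:", var_computing_formula(data))  # far from 1; may even be negative
    print("two-pass formula :", var_two_pass(data))           # close to 1
    print("shifted formula  :", var_shifted(data))            # close to 1

The two-pass formula is included only as a reference value; the point is that the shifted data rescue the computing formula itself.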
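As for the updating algorithms mentioned in (1): I can't reproduce the edstat thread's versions, but for concreteness here is a sketch of a typical one-pass updating scheme of that kind (the form usually attributed to Welford), which carries the "current temporary mean" mentioned above. The code and names are mine, for illustration only:

    # A sketch of a typical one-pass updating algorithm for the mean and
    # variance (Welford's form); illustrative only, not the edstat thread's code.

    def updating_mean_variance(data):
        """Return (mean, sample variance), updating a running mean as data arrive."""
        n = 0
        mean = 0.0          # the current temporary mean
        sum_sq_dev = 0.0    # running sum of squared deviations from that mean
        for x in data:
            n += 1
            delta = x - mean
            mean += delta / n
            sum_sq_dev += delta * (x - mean)   # note: uses the updated mean
        if n < 2:
            return mean, float("nan")
        return mean, sum_sq_dev / (n - 1)

    # Example with a large mean relative to the spread:
    print(updating_mean_variance([10000.0 + v for v in (0.1, 0.2, 0.3, 0.4, 0.5)]))
    # -> mean 10000.3, sample variance about 0.025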
On Thu, 26 Feb 2004, Robert J. MacG. Dawson wrote:

> The thing usually presented as a "computational formula" for
> variance does NOT avoid rounding error, but on the contrary is known
> to be a far worse offender in this regard than the conceptually simple
> debiased-mean-square-residual formula. In particular, in the
> nightmare scenario, it can yield a negative value for variance,
> causing the program to crash when standard deviation is computed.
>
> It is used (when it is) because it reduces (slightly) the number
> of operations and (significantly) the space complexity and number of
> memory calls.
>
> I seem to recall from my long-ago exposure to numerical analysis
> that there are formulae that work significantly better than the
> "standard" ones in terms of rounding error, but I've never seen them
> in a stats text. Anybody know a good reference?
>
> -Robert Dawson

------------------------------------------------------------
Donald F. Burrill                              [EMAIL PROTECTED]
56 Sebbins Pond Drive, Bedford, NH 03110        (603) 626-0816
