After going home last night, I realized there is one flaw in yesterday's
analysis: the convolution-size-based portion of the roundoff error, i.e.
the leftmost term in

        2*log_2(radix) + C*[log_2 N + log_2 log_2 N] < it (20.6.E)

must also have a constant multiplying it, i.e. the equation should be

        A*log_2(radix) + C*[log_2 N + log_2 log_2 N] < it. (20.6.E')

The two constants A and C should be deducible from actual behavior
of the algorithm, by doing a nonlinear least-squares fit of (20.6.E')
to the empirical (radix vs. N) datasets, assuming that (max. radix) for
any N satisfies (20.6.E') with equality replacing the inequality:

        A*log_2(max. radix) + C*[log_2 N + log_2 log_2 N] = it. (20.6.E'')

Also note that A and C will depend on the digit representation used
(balanced-digit form vs. nonnegative-digit), the floating-point accuracy
of the hardware (including the rounding mode used), and the details of
the algorithmic implementation. To ease the algebra, let's simplify
the notation: let y = log_2(max. radix), z = log_2 N, so we have the
weakly nonlinear relation

        A*y = it - C*[z + log_2 z]. (20.6.E''')

Further letting x = [z + log_2 z] yields a linear relation:

        y = a0 + a1*x, where a0 = it/A, a1 = - C/A.
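
The substitution is trivial to apply numerically; as a sanity check, here is a
throwaway Python snippet (mine, not from any of the codes discussed) mapping z
to x, which reproduces the x column of the table below:

```python
from math import log2

def x_of(z):
    """Map z = log_2(N) to the linear-fit abscissa x = z + log_2(z)."""
    return z + log2(z)

# First table entry: an FFT length N = 2^10, i.e. z = 10:
print(x_of(10))  # ~ 13.322
```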

From empirical testing I have reasonably accurate values of (max. radix)
vs. N for my Mlucas code (the rightmost column comes from the least-squares
analysis below):

z = log_2 N     x = [z + log_2 z]       y = log_2(radix)        y from LS fit
10              13.322                  21.97                   22.00
11              14.459                  21.73                   21.71
12              15.585                  21.48                   21.43
13              16.700                  21.15                   21.14
14              17.807                  20.90                   20.86
15              18.907                  20.45                   20.58
16              20.000                  20.29                   20.30
17              21.087                  19.99                   20.02
18              22.170                  19.84                   19.74
19              23.248                  19.45                   19.47

A plot of these data looks quite linear (plus scatter), so our relation
(20.6.E'') does indeed appear to fit the observed data well.
For the least-squares fit, we need the following arithmetic averages,
where <f> := sum_{i=1...n}(f_i)/n :

<x> = 18.3285, <y> = 20.725, <x*y> = 377.30, <x^2> = 345.95, <y^2> = 430.18.

The slope (a1) and y-intercept (a0) of our best-fit line are given by

     <x*y> - <x>*<y>                <y>*<x^2> - <x>*<x*y>
a1 = --------------- = -.2554, a0 = --------------------- = <y>-a1*<x> = 25.406.
      <x^2> - <x>^2                     <x^2> - <x>^2

These yield constants A = 53/a0 = 2.086, C = -a1*A = 0.5328 in (20.6.E'),
so both assumed constants are positive and order of unity, as expected.
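
The fit is easy to reproduce; the following Python sketch (mine, not part of
Mlucas) redoes the least-squares computation directly from the tabulated
(x, y) pairs, recovering the above constants to within the rounding of the
tabulated inputs:

```python
# (x, y) pairs from the Mlucas table above: x = z + log_2(z), y = log_2(max. radix)
xs = [13.322, 14.459, 15.585, 16.700, 17.807,
      18.907, 20.000, 21.087, 22.170, 23.248]
ys = [21.97, 21.73, 21.48, 21.15, 20.90,
      20.45, 20.29, 19.99, 19.84, 19.45]

n   = len(xs)
mx  = sum(xs) / n                           # <x>
my  = sum(ys) / n                           # <y>
mxy = sum(x*y for x, y in zip(xs, ys)) / n  # <x*y>
mxx = sum(x*x for x in xs) / n              # <x^2>

a1 = (mxy - mx*my) / (mxx - mx*mx)          # slope of best-fit line
a0 = my - a1*mx                             # y-intercept
A  = 53.0 / a0                              # it = 53 mantissa bits for IEEE double
C  = -a1 * A

print(a1, a0, A, C)  # approximately -0.255, 25.40, 2.09, 0.532
```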

We also need the following quantities (the residual and total sums of
squares, respectively):

S_r = n*[(<y^2> - <y>^2) -a1*(<x*y> - <x>*<y>)] = .01006585

S_t = n*(<y^2>-<y>^2) = 6.54375

The standard error (sigma) and correlation coefficient (r) are

sigma = sqrt(S_r/(n-2)) = .0354716,  r = sqrt((S_t-S_r)/S_t) = .9992306,

and the fact that r is very close to unity indicates a very good correlation.
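
For the record, here is how sigma and r follow from the five quoted averages
(a Python sketch; since it starts from the rounded averages, the last digit
or two may wobble):

```python
from math import sqrt

# The five arithmetic averages quoted above, n = 10 data points:
n = 10
mx, my, mxy, mxx, myy = 18.3285, 20.725, 377.30, 345.95, 430.18

a1  = (mxy - mx*my) / (mxx - mx*mx)           # slope, as before
S_r = n * ((myy - my*my) - a1*(mxy - mx*my))  # residual sum of squares
S_t = n * (myy - my*my)                       # total sum of squares

sigma = sqrt(S_r / (n - 2))                   # standard error of the fit
r     = sqrt((S_t - S_r) / S_t)               # correlation coefficient

print(sigma, r)  # ~ 0.0355, 0.9992
```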

For George's breakpoint data at http://www.mersenne.org/status.htm,
using the fact that log_2(radix) = (max. p)/N:

N       z=log_2N x = [z+log_2 z]   pmax     y=log_2(radix)      y (LS)  pmax(LS)
98304   16.585  20.637          2000000         20.345          20.377  2003128
114688  16.807  20.878          2330000         20.316          20.321  2330605
131072  17.000  21.087          2655000         20.256          20.273  2657228
163840  17.322  21.436          3310000         20.203          20.193  3308342
196608  17.585  21.721          3960000         20.142          20.127  3957082
229376  17.807  21.962          4605000         20.076          20.071  4603841
262144  18.000  22.170          5260000         20.065*         20.023  5248952
327680  18.322  22.517          6540000         19.958          19.943  6534954
393216  18.585  22.801          7820000         19.887          19.878  7816179
458752  18.807  23.041          9090000         19.815          19.822  9093472
524288  19.000  23.248         10380000         19.798*         19.774  10367499
655360  19.322  23.594         12895000         19.676          19.695  12907054
786432  19.585  23.877         15400000         19.582          19.629  15437114
917504  19.807  24.115         17945000         19.558          19.574  17959583
1048576 20.000  24.322         20500000         19.550*         19.527  20475156

These give a best-fit line slope and intercept of

a1 = -.23073, a0 = 25.13845  (corresponding to A = 53/a0 = 2.108, C = -a1*A = 0.4865),

where the smaller (in magnitude) slope reflects the slightly greater numerical
precision Prime95 enjoys over its non-x86 counterparts. The standard error and
correlation coefficient are

sigma = .0244426,  r = .9964081.
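
The Prime95 fit can likewise be redone from scratch; this Python sketch
(again mine, using only the (N, pmax) pairs from the table above) recovers
the quoted slope and intercept and regenerates the pmax(LS) column to within
a few parts in 10^5:

```python
from math import log2

# (N, max p) breakpoint pairs from the table above
data = [
    (98304, 2000000), (114688, 2330000), (131072, 2655000),
    (163840, 3310000), (196608, 3960000), (229376, 4605000),
    (262144, 5260000), (327680, 6540000), (393216, 7820000),
    (458752, 9090000), (524288, 10380000), (655360, 12895000),
    (786432, 15400000), (917504, 17945000), (1048576, 20500000),
]

xs = [log2(N) + log2(log2(N)) for N, _ in data]  # x = z + log_2(z)
ys = [p / N for N, p in data]                    # y = log_2(radix) = pmax/N

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
      / sum((x - mx) ** 2 for x in xs))          # slope
a0 = my - a1 * mx                                # intercept
print(a1, a0)  # approximately -0.231 and 25.14

def pmax_fit(N):
    """Predicted exponent limit for FFT length N from the best-fit line."""
    z = log2(N)
    return round(N * (a0 + a1 * (z + log2(z))))
```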

Thus, the Prime95 data are slightly less well-correlated (when fitted to the
assumed function) than those for Mlucas. This may reflect a few slightly
outlying points, indicated by asterisks to the right of their y-value in
the above table. Note that all the outliers correspond to power-of-two FFT
lengths. It may be that Prime95 enjoys slightly better accuracy when the
FFT length is a power of two, although it's not clear to me why this should
be the case (I don't see this behavior with my code, but it's not been used
as much, either). Perhaps careful examination of the radix-3, 5 and 7 front
end routines (George's terminology) might turn up some sequences of floating-
point operations that tend to suffer larger roundoff errors than those found in
power-of-2 routines (where the symmetry of the operations makes instruction
scheduling somewhat easier).

FURTHER REFINEMENTS: Richard Crandall writes, regarding my convolution
error estimate:

>For X steps of an error-accumulation walk, the estimate
>is yet worse for the *maximum* deviation.  I assume
>the maximum is in fact the entity of interest, because
>of the nature of the convolution problem.  The known statistic
>is that for X steps the maximum random walk deviation is
>
>     Sqrt[X] Log X
>
>with probability one, in the sense of the Fisher theorem.
>Because of the Log X factor, the need for "a few" multiples
>of Sqrt[number of operations] has the "few" growing
>out of control.
>
>I believe a practical factor is say
>
>   (1 + B Log[X]) Sqrt[X]
>
>for some empirical constant B.  I have not tested this,
>but believe it so.
>
>There might actually be yet another factor of Log Log X,
>I would have to check.  At any rate, the maximum
>convolution errors should grow faster than just the Sqrt[.].

This would imply that my A*log_2(radix) term in (20.6.E') should
in fact include a log_2 N (and perhaps also a log_2 log_2 N) term.
I leave it as an exercise for the reader to determine whether this leads
to a better correlation with the observed data.

-Ernst