After going home last night, I realized there is one flaw in
yesterday's analysis: the convolution-size-based portion of the
roundoff error, i.e. the leftmost term in
> 2*log_2(radix) + C*[log_2 N + log_2 log_2 N] < it (20.6.E)
must also carry a multiplicative constant, i.e. the equation should read
A*log_2(radix) + C*[log_2 N + log_2 log_2 N] < it. (20.6.E')
The two constants A and C should be deducible from actual behavior
of the algorithm, by doing a nonlinear least-squares fit of (20.6.E')
to the empirical (radix vs. N) datasets, assuming that (max. radix) for
any N satisfies (20.6.E') with equality replacing the inequality:
A*log_2(max. radix) + C*[log_2 N + log_2 log_2 N] = it. (20.6.E'')
Also note that A and C will depend on the digit representation used
(balanced-digit form vs. nonnegative-digit), the floating-point accuracy
of the hardware (including the rounding mode used), and the details of
the algorithmic implementation. To ease the algebra, let's simplify
the notation: let y = log_2(max. radix), z = log_2 N, so we have the
weakly nonlinear relation
A*y = it - C*[z + log_2 z]. (20.6.E''')
Further letting x = [z + log_2 z] yields a linear relation:
y = a0 + a1*x, where a0 = it/A, a1 = - C/A.
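As a sanity check of this linearization, here is a minimal Python sketch
(the function name is mine) that solves (20.6.E'') for y, assuming it = 53
mantissa bits and the A, C constants fitted below for Mlucas:

```python
from math import log2

def log2_max_radix(N, it=53.0, A=2.086, C=0.5328):
    """Solve (20.6.E'') for y = log_2(max. radix):
    y = [it - C*(log_2 N + log_2 log_2 N)] / A.
    A and C default to the Mlucas fit derived below."""
    z = log2(N)
    return (it - C * (z + log2(z))) / A

# e.g. N = 2^10 gives y ~ 22.0 and N = 2^19 gives y ~ 19.47,
# matching the "y from LS fit" column of the table below.
```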
I have decently accurate (based on empirical testing) values of (max. radix)
vs. N for my Mlucas code (rightmost column based on analysis below):
z = log_2 N x = [z + log_2 z] y = log_2(radix) y from LS fit
10 13.322 21.97 22.00
11 14.459 21.73 21.71
12 15.585 21.48 21.43
13 16.700 21.15 21.14
14 17.807 20.90 20.86
15 18.907 20.45 20.58
16 20.000 20.29 20.30
17 21.087 19.99 20.02
18 22.170 19.84 19.74
19 23.248 19.45 19.47
Plotting these yields a distinctly linear trend plus scatter, so
our relation (20.6.E'') does indeed appear to fit the observed data well.
For the least-squares fit, we need the following arithmetic averages,
where <f> := sum_{i=1...n}(f_i)/n :
<x> = 18.3285, <y> = 20.725, <x*y> = 377.30, <x^2> = 345.95, <y^2> = 430.18.
The slope (a1) and y-intercept (a0) of our best-fit line are given by
<x*y> - <x>*<y> <y>*<x^2> - <x>*<x*y>
a1 = --------------- = -.2554, a0 = --------------------- = <y>-a1*<x> = 25.406.
<x^2> - <x>^2 <x^2> - <x>^2
These yield constants A = it/a0 = 53/a0 = 2.086 (taking it = 53, the number
of mantissa bits in an IEEE 64-bit double) and C = -a1*A = 0.5328 in (20.6.E'),
so both assumed constants are positive and order of unity, as expected.
We also need the following quantities (the residual and total sums of
squares, respectively):
S_r = n*[(<y^2> - <y>^2) -a1*(<x*y> - <x>*<y>)] = .01006585
S_t = n*(<y^2>-<y>^2) = 6.54375
The standard error (sigma) and correlation coefficient (r) are
sigma = sqrt(S_r/(n-2)) = .0354716, r = sqrt((S_t-S_r)/S_t) = .9992306,
and the fact that r is very close to unity indicates a very good correlation.
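The whole fit is easy to reproduce from the tabulated (x, y) pairs; a minimal
Python sketch follows (variable names are mine). Note that S_r is a small
difference of nearly-equal quantities, so carrying full precision here gives
slightly different S_r, sigma and r than the values quoted above, which were
computed from the rounded averages:

```python
from math import sqrt

# (x, y) pairs from the Mlucas table above
x = [13.322, 14.459, 15.585, 16.700, 17.807,
     18.907, 20.000, 21.087, 22.170, 23.248]
y = [21.97, 21.73, 21.48, 21.15, 20.90,
     20.45, 20.29, 19.99, 19.84, 19.45]
n = len(x)
xbar, ybar = sum(x)/n, sum(y)/n
Sxx = sum((xi - xbar)**2 for xi in x)
Sxy = sum((xi - xbar)*(yi - ybar) for xi, yi in zip(x, y))
Syy = sum((yi - ybar)**2 for yi in y)

a1 = Sxy / Sxx               # slope of best-fit line
a0 = ybar - a1 * xbar        # y-intercept
A = 53 / a0                  # from a0 = it/A, it = 53
C = -a1 * A                  # from a1 = -C/A
Sr = Syy - a1 * Sxy          # residual sum of squares
sigma = sqrt(Sr / (n - 2))   # standard error
r = sqrt((Syy - Sr) / Syy)   # correlation coefficient
```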
For George's breakpoint data at http://www.mersenne.org/status.htm,
using the fact that log_2(radix) = (max. p)/N:
N z=log_2N x = [z+log_2 z] pmax y=log_2(radix) y (LS) pmax(LS)
98304 16.585 20.637 2000000 20.345 20.377 2003128
114688 16.807 20.878 2330000 20.316 20.321 2330605
131072 17.000 21.087 2655000 20.256 20.273 2657228
163840 17.322 21.436 3310000 20.203 20.193 3308342
196608 17.585 21.721 3960000 20.142 20.127 3957082
229376 17.807 21.962 4605000 20.076 20.071 4603841
262144 18.000 22.170 5260000 20.065* 20.023 5248952
327680 18.322 22.517 6540000 19.958 19.943 6534954
393216 18.585 22.801 7820000 19.887 19.878 7816179
458752 18.807 23.041 9090000 19.815 19.822 9093472
524288 19.000 23.248 10380000 19.798* 19.774 10367499
655360 19.322 23.594 12895000 19.676 19.695 12907054
786432 19.585 23.877 15400000 19.582 19.629 15437114
917504 19.807 24.115 17945000 19.558 19.574 17959583
1048576 20.000 24.322 20500000 19.550* 19.527 20475156
These give a best-fit line slope and intercept of
a1 = -.23073, a0 = 25.13845, (corresponding to A = 53/a0 = 2.108, C = -a1*A = .4865)
where the smaller (in magnitude) slope reflects the slightly greater numerical
precision Prime95 enjoys over its non-x86 counterparts. The standard error and
correlation coefficient are
sigma = .0244426, r = .9964081.
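A quick sketch (function name mine) that reproduces the y (LS) and pmax(LS)
columns of the Prime95 table above from the fitted line, to within the
rounding of the tabulated values:

```python
from math import log2

a1, a0 = -0.23073, 25.13845  # Prime95 best-fit line from above

def pmax_fit(N):
    """Predicted max exponent for FFT length N:
    pmax = N * y, where y = a0 + a1*x and
    x = log_2 N + log_2 log_2 N, per (20.6.E'')."""
    z = log2(N)
    y = a0 + a1 * (z + log2(z))
    return N * y

# e.g. pmax_fit(98304) ~ 2003130 and pmax_fit(1048576) ~ 20475170,
# agreeing with the pmax(LS) column to within table rounding.
```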
Thus, the Prime95 data are slightly less well-correlated (when fitted to the
assumed function) than those for Mlucas. This may reflect a few slightly
outlying points, indicated by asterisks to the right of their y-value in
the above table. Note that all the outliers correspond to power-of-two FFT
lengths. It may be that Prime95 enjoys slightly better accuracy when the
FFT length is a power of two, although it's not clear to me why this should
be the case (I don't see this behavior with my code, but it's not been used
as much, either). Perhaps careful examination of the radix-3, 5 and 7 front
end routines (George's terminology) might turn up some sequences of floating-
point operations that tend to suffer larger roundoff errors than those found in
power-of-2 routines (where the symmetry of the operations makes instruction
scheduling somewhat easier).
FURTHER REFINEMENTS: Richard Crandall writes, regarding my convolution
error estimate:
>For X steps of an error-accumulation walk, the estimate
>is yet worse for the *maximum* deviation. I assume
>the maximum is in fact the entity of interest, because
>of the nature of the convolution problem. The known statistic
>is that for X steps the maximum random walk deviation is
>
> Sqrt[X] Log X
>
>with probability one, in the sense of the Fisher theorem.
>Because of the Log X factor, the need for "a few" multiples
>of Sqrt[number of operations] has the "few" growing
>out of control.
>
>I believe a practical factor is say
>
> (1 + B Log[x]) Sqrt[x]
>
>for some empirical constant B. I have not tested this,
>but believe it so.
>
>There might actually be yet another factor of Log Log X,
>I would have to check. At any rate, the maximum
>convolution errors should grow faster than just the Sqrt[.].
This would imply that my A*log_2(radix) term in (20.6.E') should
in fact include a log_2 N (and perhaps also a log_2 log_2 N) term.
I leave it as an exercise for the reader to determine whether this
leads to a better correlation with the observed data.
-Ernst