Bojan Antonovic writes:
> > (Somebody else wrote)
> >
> > Bah! 128-bit floats are quite useful. Think about precision.
> > There are segments within the 64-bit range where you get terrible decimal
> > precision (such as with any large number). This becomes a problem when
> > scaling large numbers to small, doing the math, and then rescaling them to
> > large numbers. I have seen this occur with many programs. 128-bit
> > floats would be very nice for large numbers with decimals.
>
> True and not true. To avoid needing very high precision, there are rules for
> how to compute the results of functions. For example, if you want to add a
> series of numbers, begin with the smallest and work up to the largest.
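A quick sketch of that summation rule in Python (the magnitudes here are my own illustration, chosen so the effect is visible in 53-bit doubles):

```python
# Summing smallest-to-largest preserves precision that the
# opposite order throws away.
big = 1e16          # the ulp (spacing between doubles) here is 2.0
small = [1.0] * 1000

# Largest first: each 1.0 is at most half an ulp of the running
# total, so every single addition rounds away.
descending = big
for x in small:
    descending += x

# Smallest first: the small terms accumulate to 1000.0 before
# meeting the large term, so nothing is lost.
ascending = sum(small) + big

print(descending - big)   # 0.0    -> all 1000 additions lost
print(ascending - big)    # 1000.0
```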
Suppose we regard the original inputs to a computation
as being exact, and a long (floating point) computation is
run with d significant bits in each mantissa, with no exponent
underflow or overflow.
As d varies, the number of significant output bits is
typically d - c, where the constant c depends upon the depth of
the computation (and how carefully the code avoids round-off errors).
For example, a computation x^16 = (((x^2)^2)^2)^2 has four squarings.
The first squaring loses about 0.5 bit of significance.
The second squaring doubles the old error (to 1.0 bit)
and adds some round-off error of its own,
for a net error between 0.5 and 1.5 bits (average 1 bit).
Two more squarings raise the average error to 4 bits.
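The error growth can be observed directly by simulating a d-bit mantissa. This sketch truncates a double to d explicit mantissa bits after each squaring (truncation rather than round-to-nearest, so it is only a crude model; the choice of x and d = 24 is mine) and compares against an exact rational reference:

```python
import struct
from fractions import Fraction

def chop(x: float, d: int) -> float:
    """Truncate a double to d explicit mantissa bits -- a crude
    model of computing with reduced working precision."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    bits &= ~((1 << (52 - d)) - 1)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

d = 24
x = chop(1.2345, d)     # treat this d-bit value as the exact input

# x^16 by four squarings, truncating to d bits after each one
y = x
for _ in range(4):
    y = chop(y * y, d)

exact = Fraction(x) ** 16
rel_err = abs(Fraction(y) - exact) / exact

# Each squaring doubles the inherited error and adds at most one
# unit of round-off, so after four squarings the relative error
# stays below about 15 * 2^-d, i.e. roughly 4 lost bits.
assert rel_err < Fraction(2) ** (4 - d)
print(float(rel_err))
```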
How does this relate to Mersenne?
Let's assume that the FFT computation loses 12 bits
of significance for FFT lengths in the range of interest.
Now consider an LL test on Mp where p ~= 5 million:
    Mantissa   Output   Radix      FFT
      bits      bits              length
       53        41     2^11     524288   (Alpha)
       64        52     2^16     327680   (Pentium)
      110        98     2^40     131072   (Proposed)
The last two columns give a possible FFT radix and length.
The radix is 2^r where
2r + log_2(FFT length) <= (output bits)
r * (FFT length) >= p ~= 5 million
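A quick sanity check of these two constraints against the three table rows (the bound p ~= 5 million and all row values are taken from above):

```python
import math

# Constraints from the text:
#   2*r + log2(fft_len) <= output_bits   (products must stay exact)
#   r * fft_len         >= p             (enough digits to hold Mp)
P = 5_000_000

configs = [
    # (output bits, r where radix = 2^r, FFT length, machine)
    (41, 11, 524288, "Alpha"),
    (52, 16, 327680, "Pentium"),
    (98, 40, 131072, "Proposed"),
]

for out_bits, r, n, name in configs:
    ok_precision = 2 * r + math.log2(n) <= out_bits
    ok_capacity = r * n >= P
    print(name, ok_precision, ok_capacity)   # True True for every row
```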
Going from 53 mantissa bits (in a 64-bit word)
to 110 mantissa bits (in a 128-bit word), we have reduced
the required FFT length by a factor of 4.
That means approximately 25% as many floating point operations
are needed for the FFT. If our hardware can do 128-bit
floating point operations in twice the time needed for similar 64-bit
operations, then our overall time has improved by a factor of 2.
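The arithmetic can be sketched as follows. The 25% figure ignores the log factor in the FFT's operation count; with the usual n*log2(n) scaling the ratio comes out slightly better (the n*log2(n) model and the 2x cost assumption are the ones stated above, the constant factor cancels in the ratio):

```python
import math

# FFT lengths from the table: 53-bit mantissas vs 110-bit mantissas
n_64 = 524288      # Alpha row
n_128 = 131072     # Proposed row, 4x shorter

ops_64 = n_64 * math.log2(n_64)
ops_128 = n_128 * math.log2(n_128)

ratio = ops_128 / ops_64      # ~0.22: about a quarter of the work
speedup = 1 / (ratio * 2)     # if each 128-bit op costs 2x a 64-bit op
print(f"op ratio {ratio:.2f}, net speedup {speedup:.1f}x")
```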