At 03:01 PM 1999/08/04 -0600, "Aaron Blosser" <[EMAIL PROTECTED]> wrote:
>I think we could keep it simple by just saving every x% iteration's residue
>(in truncated form). Using a WAG of saving it every 10% along the way,
>you'd only have 9 partial and 1 full residue when all is said and done.
>
>So for an exponent like 8027219, you'd save the partial residue at the 10%,
>or 802,722nd iteration (rounding up or down as normal). Of course, the
>number of iterations varies just slightly from the exponent, but
>whatever...you get the idea.
>
>These partial residues could be sent (for the first LL test) during the
>check-ins, or saved up and all sent at once when the test is done.
The 2^n least significant bits of the residue make a good check value,
since once the modulo operation begins to have an effect, the lsb's
quickly change.
It is essential that the different programs agree on exactly which
iterations' residue LSBs are to be saved. This covers both the save
interval in iterations and the numbering convention: whether the starting
value of 4 counts as iteration 0, 1, 2, etc. The programs must also agree
on the starting value itself, which the math permits to be other values
besides 4.
It is more efficient, though not essential, to also agree on how many bits
to save. Even n=4 (2^4 = 16 bits, 2 bytes) gives a fairly good error
detection rate, but n=6 (64 bits, 8 bytes) is much better!
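As a sketch of the bookkeeping this proposal involves (my own Python illustration, not code from any GIMPS client; the convention that the starting value 4 is "iteration 2" is an assumption):

```python
# Run the LL test on Mp = 2^p - 1, saving the low n_bits of the residue
# every `interval` iterations so independent runs can be compared midway.
# (Illustration only; iteration-numbering convention is assumed.)

def ll_partial_residues(p, interval, n_bits=64):
    m = (1 << p) - 1
    s = 4                      # starting value, taken here as "iteration 2"
    checkpoints = []
    for i in range(3, p + 1):  # p - 2 squarings in all
        s = (s * s - 2) % m
        if (i - 2) % interval == 0:
            checkpoints.append((i, s & ((1 << n_bits) - 1)))
    return s == 0, checkpoints

# M13 = 8191 is prime; checkpoint every 4 iterations.
is_prime, partials = ll_partial_residues(13, 4)
print(is_prime, partials)  # -> True [(6, 3953), (10, 1294)]
```

Two runs of the same exponent can then be cross-checked checkpoint by checkpoint instead of only at the final residue.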
I suggest that the number of iterations between residue saves be made
a function of the FFT length, since those scale with the runtime for
a given CPU. As we move to exponents that require months even on fast
machines, we will see more erroneous results per exponent on the same
hardware, even at the same small error rate per execution hour.
We will eventually reach exponents with runtimes long enough that the
system hardware reliability drops significantly before an exponent
completes, due to hardware aging!
My feeling is that about one P90 CPU-month is the right interval at which
to kick out intermediate residues of 64 bits or so.
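As a sketch of how such an interval might be sized (all timing numbers below are placeholder assumptions, not measurements):

```python
# Hypothetical sizing of the save interval: pick the iteration count that
# one P90 CPU-month buys at a given per-iteration cost, which grows with
# FFT length. The 0.5 seconds/iteration figure is a made-up placeholder.

P90_MONTH_SECONDS = 30 * 24 * 3600  # one month of P90 CPU time

def save_interval(seconds_per_iteration):
    """Iterations per interim-residue save, for ~one P90-month spacing."""
    return int(P90_MONTH_SECONDS / seconds_per_iteration)

# e.g. if one iteration at some FFT length cost 0.5 P90-seconds:
print(save_interval(0.5))  # -> 5184000
```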
In a case where two runs of a large exponent are taken to completion, with
dozens of interim residues recorded for both the first and second LL tests,
the third tie-breaker need only be run to the point where the first two
diverge and the third agrees with one but not the other. On average this
will save about half of the third test, which we currently need about 1%
of the time but may need much more often in a few years. Shortening the
third test before a comparison can be made also halves the probability of
the third test itself being in error, and so lessens the time spent on
fourth tests. This saving, too, will become more significant at higher
exponents.
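The early-exit tie-break above can be sketched with a couple of hypothetical helpers (not from any GIMPS program); the third run needs checkpoints only up to the first point where the two completed runs disagree:

```python
def first_divergence(run_a, run_b):
    """Index of the first checkpoint where two residue lists differ."""
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if a != b:
            return i
    return None

def tiebreak(run_a, run_b, run_c):
    """Decide which of two divergent runs the third (partial) run supports."""
    d = first_divergence(run_a, run_b)
    if d is None:
        return "agree"              # no divergence; no tie-break needed
    if run_c[d] == run_a[d] != run_b[d]:
        return "a"
    if run_c[d] == run_b[d] != run_a[d]:
        return "b"
    return "neither"                # third run differs from both: run a fourth

# Two full runs diverge at checkpoint 3; the third run stopped right there.
a = [0x3002, 0xC002, 0x0002, 0xBEEF]
b = [0x3002, 0xC002, 0x0002, 0xDEAD]
c = [0x3002, 0xC002, 0x0002, 0xBEEF]
print(tiebreak(a, b, c))  # -> a
```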
Total runtimes are going up drastically. When I joined GIMPS in mid 1996,
runtimes per exponent were 12 hours (572k on a P-120). Soon we will have
the capability to run exponents requiring more than 12 months on PIII-500's.
>> Might the v.17 problem have been trapped with something like
>> this? I do not
>> recall enough of the discussion to know and the ensuing belly-aching
>> overshadowed the real content of finding/fixing/reworking. (I know I am
>> never going to rise high on the list, so I do not worry a whole lot about
>> how much my report shows.)
There is another trick for dealing with the v17 shift bug, which I've
suggested to George Woltman. Before the modulo kicks in, the rightmost
2n+2 bits follow a set pattern that is calculable for n < log2(p)
or so. Math wizards, please be kind re errors in what follows.
In hexadecimal, the first several iterations for any exponent p of current
interest follow a predictable pattern. The pattern persists until
the result of an iteration grows larger than Mp, and so is independent
of p for a time.
L(n+1) = Ln^2 - 2 (using the convention here that L2 = 4 and we iterate
upward from n = 2):
 n  hex Ln               binary Ln
 2  4
 3  E
 4  C2                   1100 0010
 5  9302                 1001 0011 0000 0010
 6  5468 4C02            0101 0100 0110 1000 0100 1100 0000 0010
 7  1BD6 96D9 F03D 3002  ... 1101 0011 0000 0000 0010
least significant 64 bits only below:
 8  8CC8 8407 A9F4 C002
 9  5559 9F9D 37D3 0002
10  F460 D65D DF4C 0002
11  E180 D807 7D30 0002
12  9BDB 491D F4C0 0002
13  4CEB B477 D300 0002
and so on for all n I was able to check, so long as Mp > Ln, with
Ln <= 2^(2^(n-1)). That is, the pattern holds while:
Mp > 2^(2^(n-1))
2^p > 2^(2^(n-1)) + 1
p > 2^(n-1)
log2(p) + 1 > n
For example: log2(33,219,281) = 24.985 > n
             log2(5,000,000) = 22.25 > n
(In practice I've found the limits at low n occur at slightly lower p
than expected.)
That is, the rightmost 2(n+1) bits begin with binary 0011, then in the middle
there are 2(n-2) zero bits, then a 1 and another 0.
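The table and the bit pattern just described can be verified with pure integer arithmetic; a minimal Python sketch of my own (in my checks the pattern reliably starts at n = 4, matching the table):

```python
# Integer-only check of the low-bit pattern (illustration, not prime95
# code). Convention assumed: L2 = 4, L(n+1) = Ln^2 - 2 with no modulo,
# which is valid while the result still fits below Mp.

def expected_low_bits(n):
    """Predicted rightmost 2(n+1) bits of Ln:
    binary 0011, then 2(n-2) zero bits, then 10."""
    return (0b11 << (2 * n - 2)) | 0b10

def check_pattern(max_n):
    """Verify the low-bit pattern for n = 4 .. max_n."""
    s = 4  # L2
    for n in range(3, max_n + 1):
        s = s * s - 2                      # exact integer squaring, no modulo
        mask = (1 << (2 * (n + 1))) - 1    # keep the rightmost 2(n+1) bits
        if n >= 4 and (s & mask) != expected_low_bits(n):
            return False
    return True

print(check_pattern(13))  # -> True, matching the table above through n = 13
```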
For exponents 16,777,215 < p < ~33,000,000, the max n for which this holds
is only around 24, so it is useful only as a check of the first iterations.
But that may be sufficient to catch programming errors like the one behind
v17's shift-count bug. It seems to me that this check would be extremely
fast, since the bit lengths are short and given by formula.
The test need be done only once per LL test, just before the iteration
whose squared input would exceed Mp. The number of bits that should match
is respectable for exponents of current interest: over 40.
If the shifting of these check values to the shift count of the LL test
were done by a separate algorithm, those bits should still match.
This presents an opportunity for defensive programming.
It does not help with detecting mildly inadequate FFT lengths, however,
since according to George that requires more randomized bit patterns.
It's possible that, if the check pattern is determined with entirely
integer instructions and compared to the result obtained using the FFT in
the FPU, it might smoke out a pattern-sensitive floating-point multiply or
other error in the underlying hardware, akin to the early Pentium FDIV bug.
A part of the QA effort is to run limited numbers of iterations on
many exponents, on more than one chip architecture, OS, and program,
and compare the 64-bit residues. For example, run the first 400 iterations
of an exponent in the millions, on prime95 v19 and something like
maclucasunix, and compare the results for a match, for many different
exponents. Since the convolution error is larger for the larger exponents
in an FFT runlength, these limit exponents will figure prominently.
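A slow but exact integer reference makes such cross-checks unambiguous; a sketch (mine; the starting value 4 and the iteration-count convention are assumed to match the programs being compared):

```python
# Pure-integer reference residue for comparing against FFT-based programs.
# Slow, but exact: any mismatch at the same iteration flags an error.

def reference_residue(p, k):
    """Low 64 bits of the LL residue after k squarings of s = 4 mod 2^p - 1."""
    m = (1 << p) - 1
    s = 4
    for _ in range(k):
        s = (s * s - 2) % m
    return s & 0xFFFFFFFFFFFFFFFF

# e.g. compare reference_residue(<exponent>, 400) against an FFT program's
# interim 64-bit residue at the same iteration count.
```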
A select set of full LLtests of prime95 v19 (prerelease version) against
already-double-checked results will also be made before release; these
will be limited by George's tentative release date to smaller exponents
than the primenet server is currently issuing for first LLtest.
Ken
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers