On Saturday 25 May 2002 22:19, you wrote:

> I noticed that v22.2 and v22.3 automatically do roundoff checking every
> iteration for any exponent close enough to the FFT limit.  Is there any
> reason to be concerned about the possibility of roundoff error for CPUs
> that aren't P4s?  

I don't think so. We are looking at the x87 (non-SSE2) code and may make some 
minor adjustments to the FFT run length crossover points, but there is a lot 
of "experimental evidence" relating to non-SSE2 code; the adjustments are 
probably as likely to be up as down.

Please remember that the crossover points are a compromise between wasting 
time by using an excessive FFT run length and wasting time on runs which 
fail (or need extra checking) because the FFT run length used is really too 
short. There is no completely safe figure.

> What about if the non-P4s are only doing double checks?

This doesn't really matter. Double checks are independent of the first test. 
Don't assume that the first test was correct... if you make that assumption, 
what's the point in running a double-check at all?

> Since numbers of double checking size have been checked by non-P4s for
> years without any problems that I've heard about.

The point is, if you do get an excess roundoff error that makes the run go 
bad, the double-check (when it is eventually done) will fail, and the 
exponent will have to be tested again. There is essentially no possibility of 
the project missing a prime as a consequence of this. However, if you can 
detect the likelihood of there being excess roundoff errors at the time 
they're occurring, you can save time which would be wasted if you continue a 
run which has already gone wrong. This also virtually eliminates the 
possibility of you, personally, missing a prime due to a crossover being too 
aggressive and therefore falling victim to an undetected excess roundoff 
error.

We simply don't know whether extra problems are occurring very close to the 
existing non-SSE2 crossover points, because any "genuine" errors caused by 
the crossover points being too aggressive are overwhelmed by errors caused 
by "random" hardware/software glitches. However, it has become apparent that 
the SSE2 crossover points were initially set too aggressively. We have one 
documented instance where a roundoff error of 0.59375 occurred (aliased to 
0.40625, therefore causing the run to go bad) without there being any other 
instances of roundoff errors between 0.40625 and 0.5. This is probably a 
very, very rare event, but the fact that it has happened at all has made us 
more wary.
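
To make the aliasing concrete, here is a minimal Python sketch (not Prime95 
code; the function name and the sample value 7 are made up for 
illustration). It assumes the usual scheme in which each floating-point FFT 
output is rounded to the nearest integer and the distance to that integer is 
reported as the roundoff error.

```python
def recover_digit(true_integer: int, true_error: float):
    """Round a noisy FFT output to the nearest integer.

    Returns (recovered_integer, observed_error). The observed error is all
    the program can measure: the distance to the NEAREST integer, which is
    not necessarily the distance to the CORRECT integer.
    """
    noisy_value = true_integer + true_error
    recovered = round(noisy_value)
    observed_error = abs(noisy_value - recovered)
    return recovered, observed_error


# A true error of 0.59375 on an output whose correct value is 7 (hypothetical).
recovered, observed = recover_digit(7, 0.59375)
print(recovered, observed)
```

The recovered integer is wrong by one, yet the reported error is below 0.5, 
so nothing in that single measurement distinguishes it from a legitimate 
0.40625.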

v22.3 has a new error checking method which will _correct_ any run which is 
going wrong by running the iteration where the excess roundoff error occurs 
in a slow but safe mode. This of course depends on the excess roundoff error 
being detected. If you have roundoff error checking disabled then you miss 
the chance 127 times out of 128.
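
A toy model of the "127 times out of 128" figure, as a Python sketch. It 
assumes (this is inferred from the figure above, not taken from the Prime95 
source) that with per-iteration checking disabled the roundoff error is 
still examined on one iteration in 128, and that an excess error which is 
seen gets repaired by redoing that iteration in the safe mode. All names 
are hypothetical.

```python
def run_survives(iterations: int, bad_iteration: int, check_interval: int) -> bool:
    """Return True if a run with one excess-roundoff iteration ends up correct.

    check_interval=1 models checking every iteration; check_interval=128
    models the assumed behaviour with per-iteration checking disabled.
    """
    corrupted = False
    for i in range(iterations):
        has_excess_error = (i == bad_iteration)
        is_checked = (i % check_interval == 0)
        if has_excess_error and not is_checked:
            corrupted = True  # the error goes unnoticed and the run goes bad
        # If the error is checked, the iteration is redone safely: no damage.
    return not corrupted


# Put the single bad iteration at every possible position within 128 iterations.
saved_sparse = sum(run_survives(128, k, 128) for k in range(128))
saved_every = sum(run_survives(128, k, 1) for k in range(128))
print(saved_sparse, saved_every)
```

In this model only 1 of the 128 possible positions is caught with sparse 
checking, while checking every iteration catches all of them.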

The roundoff error rises very rapidly with the exponent size - somewhere 
round about the 25th power. This is why it's only worthwhile having roundoff 
error checking every iteration in the top 0.5% or so of the exponent range 
for any particular run length - that 0.5% makes a lot more than 10% 
difference to the expected maximum roundoff error.
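
The arithmetic behind that claim can be checked with a couple of lines of 
Python. This is only a sketch: the exponent of 25 is the rough figure quoted 
above, not a measured constant, and the function name is made up.

```python
def roundoff_growth(exponent_ratio: float, power: float = 25.0) -> float:
    """Factor by which the expected roundoff error grows when the exponent
    is multiplied by exponent_ratio, assuming error ~ exponent**power."""
    return exponent_ratio ** power


# Moving up through the top 0.5% of the range for one FFT run length.
growth = roundoff_growth(1.005)
print(f"{(growth - 1) * 100:.1f}% larger expected roundoff error")
```

Under that assumption the last 0.5% of the range accounts for an increase of 
about 13% in the expected roundoff error.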

Why not just set the crossovers lower? Well, this would work, but running 
with roundoff checking enabled is faster than running with the next bigger 
FFT run length but without roundoff checking.

Another consequence of having roundoff error checking enabled is that random 
hardware glitches (or software glitches due to misbehaviour by device drivers 
etc. unrelated to Prime95) will be detected much more consistently.

> Very specifically, I'm
> wondering if I should be ok if I use the "undocumented" setting in
> prime.ini to turn off roundoff checking every iteration for when my Pentium
> 200 MHz double checks 6502049 ( the next FFT size is at 6520000 ).  Thanks.

Up to you. My feeling is that the new default behaviour is right. However, 
per-iteration roundoff checking probably causes more of a performance hit on 
the Pentium architecture than on PPro or P4, due to the relative shortage of 
registers.
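
For the exponent you mention, a quick Python sketch shows where it sits 
relative to the crossover. The 0.5% width is the approximate figure given 
earlier, not the exact limit used by the program, and the function name is 
hypothetical.

```python
def in_auto_check_zone(exponent: int, crossover: int, top_fraction: float = 0.005) -> bool:
    """True if exponent lies in the top `top_fraction` of the range that
    ends at `crossover` (the first exponent needing the next FFT size)."""
    lower_bound = crossover * (1 - top_fraction)
    return lower_bound <= exponent < crossover


exponent, crossover = 6502049, 6520000
distance = (crossover - exponent) / crossover
print(f"{distance * 100:.3f}% below the crossover")
print(in_auto_check_zone(exponent, crossover))
```

6502049 is less than 0.3% below 6520000, so under that assumption it falls 
inside the zone where checking every iteration is the default.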

Another point here: if people using v22.3+ leave the default behaviour 
alone, we will get much better evidence as to the actual behaviour in the 
critical region just below the run length crossovers. We will be able to 
feed this back in the form of revised crossovers and/or a revised auto 
roundoff error check range limit.

QA work should prevent gross errors, but the amount of data which QA 
volunteers can process is small compared to the total throughput of the 
project. We should have avoided the problems with the aggressive SSE2 
crossovers, but QA volunteers didn't have P4 systems at the time the code was 
introduced.

Regards
Brian Beesley

_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
