Re: Mersenne: Overclocking - bad for project?

Brian J. Beesley Sat, 23 Dec 2000 23:37:20 -0800
On 23 Dec 00, at 10:31, Gareth Randall wrote:

> My opinion is that it's better to have fewer correct results than to
> have the central database poisoned by loads of "don't think it's
> prime, but the user was overclocking" results, which of course cannot
> be distinguished from perfect answers. I'd trade two unreliable
> answers for one honest result. (What ends up happening is even worse.
> Mismatching checksums mean that the tests must be repeated until a
> consensus is reached.)
> 

This is basically true. However, most systems are conservatively
engineered; if care is taken (especially with regard to cooling),
overclocking need not necessarily result in unreliable systems.

Systems which are not overclocked can still be unreliable for a 
number of reasons.

IMHO undercooling (perhaps because a fan has stopped running, 
possibly without the knowledge of the user) represents at least as 
serious a problem as overclocking.

Damage by static discharge to components (especially memory, and 
usually due to mishandling during assembly) is another possible cause 
of systems running less than perfectly reliably.

On 23 Dec 00, at 10:15, John R Pierce wrote:

> at one time I had a number of those ILLEGAL SUMOUT errors, it turned out
> to be caused by an errant internet multimedia plugin (Crescendo MIDI)
> which was somehow interfering with the pentium-II's FPU.  This problem was
> specific to Windows95 too, I think, and went away with a later release of
> the kernel (win98 or 98SE fixed it, I think... it definitely is not a
> problem in NT or Win2000).  I think we all decided it was related to this
> plugin doing MMX processing on an interrupt basis without properly
> notifying the kernel or something similar to this.

Yes, software problems are a real possibility, especially in Win 9x, 
because the memory model used does not protect process memory 
properly.
> 
> Anyways, I suspect the probability of a hardware error causing erroneous
> results without triggering MASSIVE numbers of check errors is
> slim-to-none.

Unfortunately, the absence of errors during normal operation of
Prime95 is not a good indicator of a truly reliable system. Because
the error check is run every iteration (instead of once every 128
iterations) and because the result is compared with a known value,
the 16-hour self-test or the torture test is a better hardware
reliability tool than running LL tests in "production" mode.

A couple of years ago I had an instance of a P100 system with a blown 
CPU fan. It seems to have run for months with no detected errors. 
Eventually it did throw a couple of wobblies, but meanwhile it had 
submitted a couple of results which later turned out to be bad (mixed 
in with a pile of others which were OK).

My only other known error was during a QA test involving a run on a
very large exponent. It turned out that the system had glitched,
probably only once: rerunning in segments revealed a discrepancy
between iterations 8.3 million and 8.4 million, but the other 16 of
the 17 million-iteration segments were clean. This was on a
well-cooled Athlon 650 system running Win 2K Professional, using ECC
RAM. I don't know the cause. Possibly a hardware glitch simply
happens on the order of once a year. The point is that there was
nothing in the log to show that the result might be suspect.
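The segment-by-segment rerun described above can be sketched in Python
as follows, assuming (hypothetically) that the suspect run saved its
residue every `segment` iterations. Each segment is recomputed from its
saved starting residue and compared with the saved ending residue. A
one-off glitch should show up as exactly one segment that fails to
reproduce, while later segments still reproduce because they start
from the already-corrupted value. `glitch_at` is a hypothetical
fault-injection parameter, and the function names are illustrative.

```python
from __future__ import annotations


def ll_step(s: int, m: int) -> int:
    """One Lucas-Lehmer iteration: s -> s^2 - 2 (mod m)."""
    return (s * s - 2) % m


def run_with_checkpoints(p: int, segment: int, glitch_at: int | None = None) -> list[int]:
    """Run the LL test on 2**p - 1, saving the residue at every segment boundary.

    glitch_at (hypothetical) flips the low bit of the residue after that iteration.
    """
    m = (1 << p) - 1
    total = p - 2
    s = 4
    checkpoints = [s]
    for it in range(total):
        s = ll_step(s, m)
        if it == glitch_at:
            s ^= 1  # simulated one-off hardware glitch
        if (it + 1) % segment == 0 or it == total - 1:
            checkpoints.append(s)
    return checkpoints


def bad_segments(p: int, segment: int, checkpoints: list[int]) -> list[tuple[int, int]]:
    """Recompute each segment from its saved start; return iteration ranges that disagree."""
    m = (1 << p) - 1
    total = p - 2
    bad = []
    for k in range(len(checkpoints) - 1):
        start = k * segment
        length = min(segment, total - start)
        s = checkpoints[k]
        for _ in range(length):
            s = ll_step(s, m)
        if s != checkpoints[k + 1]:
            bad.append((start, start + length))
    return bad
```

For example, a clean run on the known prime M61 saved every 10
iterations ends with a zero residue, and `bad_segments` returns an
empty list; injecting `glitch_at=25` should flag only the segment
covering iterations 20 to 30, much as the rerun above isolated
iterations 8.3 to 8.4 million.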

Also bear in mind that we cannot be certain there is no undetected
bug in either the software or the CPU. There seems little point in
excluding results from systems that may be less than perfectly
reliable while other sources of error may exist.

The double-checking mechanism _does_ work!
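Why double-checking works can be sketched in a few lines of Python,
under the assumption (mine, not a description of PrimeNet's actual
rules) that an exponent is settled once two runs report the same
final residue. A glitched run produces an essentially arbitrary
residue, so two bad runs are very unlikely to agree with each other,
and extra runs are needed only when results mismatch. The function
name and the residue strings are made up for illustration.

```python
from __future__ import annotations


def first_consensus(residues: list[str]) -> tuple[str | None, int]:
    """Return (agreed_residue, runs_needed) for residues reported in order.

    Consensus is assumed to mean that some residue has been reported twice.
    If none repeats, returns (None, number_of_runs_so_far).
    """
    seen: set[str] = set()
    for runs, residue in enumerate(residues, start=1):
        if residue in seen:
            return residue, runs
        seen.add(residue)
    return None, len(residues)
```

For instance, `first_consensus(["A1B2", "9F00", "A1B2"])` returns
`("A1B2", 3)`: the mismatch between the first two runs forced a third.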
> 
> How many mismatched checksums does primenet have to reconcile on an
> ongoing basis?
> 
George would need to answer this one, but the incidence of "bad"
results submitted is on the order of 1%, maybe a bit higher, and
perhaps tending to rise with increasing exponent size (or increasing
run times?). I do have a feeling that current systems are less
conservatively engineered than they were years ago: the market is
more competitive, and there is more consumer pressure for ever higher
"performance numbers" than there once was.
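The idea that bad results may rise with run time fits a simple model.
Taking the rough figure of one hardware glitch per year mentioned
earlier as an assumed constant rate (a Poisson process), the chance
that a run is hit grows with its length. The sketch below also
estimates the extra work errors cause, assuming bad residues never
match each other. Both assumptions are illustrative, not measured,
and the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 365 * 24


def p_bad(run_hours: float, glitches_per_year: float = 1.0) -> float:
    """Probability that at least one glitch hits a run of the given length.

    Assumes glitches arrive independently at a constant rate (Poisson model)
    and that any single glitch spoils the result.
    """
    rate_per_hour = glitches_per_year / HOURS_PER_YEAR
    return 1.0 - math.exp(-rate_per_hour * run_hours)


def expected_runs(p_error: float) -> float:
    """Expected runs until two correct results, if bad residues never agree.

    The number of runs needed for two successes with success probability
    q = 1 - p_error is negative-binomially distributed with mean 2 / q.
    """
    return 2.0 / (1.0 - p_error)
```

At one glitch per year, `p_bad(87.6)` is about 0.00995, so a run of
roughly 88 hours would come out bad about 1% of the time, and doubling
the run length roughly doubles that risk. At a 1% error rate,
`expected_runs(0.01)` is about 2.02 runs per exponent.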

Seasonal felicitations
Brian Beesley
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers