On 12 Sep 2001, at 15:44, David Jaggard wrote:

> I have an overclocked AMD Duron machine using windows 98SE running at
> just under 1GHz and have been running prime95 v20 successfully for
> many months. I have occasionally had hardware errors when I have tried
> to overclock further but at the current speed prime95 v20 is happy.
> 
> Yesterday (11-09-2001) I upgraded prime95 from v20 to v21. My tests
> resumed only to quickly fail with a SUMOUT/possible hardware failure.
> I made some small changes to speed and core voltage but could not
> correct the errors. I decided to run the torture test and I left it
> running all night. This morning I viewed results.txt and all tests had
> passed yet the normal operation of the program still fails. 

SUMOUT is often (not invariably, but often) caused by dodgy 
software. Sound card drivers for Windows 9x/ME are the most 
likely culprits. If this is the cause then the Good News is that your 
results are most likely OK. Note that a sound card driver problem 
_could_ be sensitive to Prime95 version as the driver could be 
bombing memory contents which are effectively unused in v20 but 
contain active data or code in v21. This might also explain why the 
torture test fails to reveal the problem - obviously there are _some_ 
minor differences between the code executed during torture testing 
and that used for production LL/DC testing.

> Surely if
> my system was unstable the torture test should fail! Is it possible
> that in the new release the normal primality tests are more stressful
> than the torture test? Or is there a bug?

Umm. Not sure about that. The torture test does not (cannot!) test 
every combination of operand values which can occur in "real" data.

Anyway if it's a driver problem then maybe it runs OK overnight 
because the device with the duff driver isn't being used.

If the sumouts occur at the same iteration number then you 
_could_ have found a bug caused by either a program error or a 
design flaw in the processor which occurs with very specific data. If 
they occur at random then it sounds much more like interference 
from a bad device driver or some other hardware problem - maybe 
overheating.
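
The distinction above can be sketched in a few lines of Python. This 
is only an illustration: the function name and the idea of collecting 
the iteration numbers by hand into a list are assumptions of mine, and 
nothing here reads a real Prime95 log.

```python
def classify_sumout_iterations(iterations):
    """Guess at a cause from the iteration numbers where SUMOUT errors occurred.

    `iterations` is a hypothetical list of ints noted down by hand,
    one entry per failed attempt at the same exponent.
    """
    if len(iterations) < 2:
        # A single failure cannot tell repeatable from random.
        return "not enough data"
    if len(set(iterations)) == 1:
        # Every failure hit the same iteration: data-dependent.
        return "repeatable: possible program bug or CPU design flaw"
    # Failures are scattered across different iterations.
    return "random: more likely a bad driver or other hardware problem"
```

For example, classify_sumout_iterations([81234, 81234, 81234]) reports 
the "repeatable" case, while [1200, 98000] reports the "random" one.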

Did you try re-running the specific selftest for the FFT runlength 
used by the exponent you're having a problem with? (The FFT 
runlength is the nearest "round" number to the size of the 
Pnnnnnnn savefile divided by 4096. To force a re-run, stop Prime95, 
edit local.ini removing the appropriate SelfTestNNNN=1 line & 
restart Prime95.)
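
As a rough sketch of the arithmetic and the edit described above, 
something like the following Python would do. Several things here are 
my assumptions rather than facts about Prime95: that the result of the 
division is the run length in K, that the "round" numbers are powers 
of two times 1, 1.25, 1.5 or 1.75, and that NNNN in SelfTestNNNN=1 is 
that same run length in K. The function names are made up for the 
example.

```python
def candidate_runlengths(max_k=4096):
    """Assumed list of "round" run lengths in K: 2^n times 1, 1.25, 1.5, 1.75."""
    lengths = set()
    base = 4  # start at 4K so base * 5 // 4 etc. are exact integers
    while base <= max_k:
        for num in (4, 5, 6, 7):
            k = base * num // 4
            if k <= max_k:
                lengths.add(k)
        base *= 2
    return sorted(lengths)


def estimate_runlength_k(savefile_bytes):
    """Savefile size divided by 4096, snapped to the nearest assumed round value."""
    raw = savefile_bytes / 4096
    return min(candidate_runlengths(), key=lambda k: abs(k - raw))


def drop_selftest_line(ini_text, runlength_k):
    """Return the local.ini text without the SelfTestNNNN=1 line for this run length."""
    target = f"SelfTest{runlength_k}=1"
    kept = [line for line in ini_text.splitlines() if line.strip() != target]
    return "\n".join(kept) + "\n"
```

With a savefile of about 1,050,000 bytes, estimate_runlength_k gives 
256, and drop_selftest_line(text, 256) removes only the SelfTest256=1 
line. Stop Prime95 before changing local.ini, and keep a copy of the 
original file.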

If that works but the actual assignment keeps failing, it _might_ be 
a problem caused by specific data values. One way to _prove_ this 
would be to take the savefile to a different system & run it on that - 
if it's a data value problem that will fail too. Next try on a system 
with an Intel processor - if that fails it's a program problem, 
otherwise the CPU is doing something strange. I have a variety of 
reasonably reliable systems with different CPU types available if 
that's of any help.
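
The elimination procedure above can be written down as a small 
decision function. This is just a restatement of the reasoning in 
code; the function and parameter names are hypothetical, and the two 
flags are whatever you observe when running the same savefile 
elsewhere.

```python
def diagnose(fails_on_other_system, fails_on_intel=None):
    """Interpret the results of moving the savefile to other machines.

    fails_on_other_system: did the same savefile fail on a different system?
    fails_on_intel: did it also fail on an Intel system? (None = not tried yet)
    """
    if not fails_on_other_system:
        # The data is fine elsewhere, so look at the original machine.
        return "not a data value problem: suspect the original system"
    if fails_on_intel is None:
        return "data value problem: next try an Intel system"
    if fails_on_intel:
        return "program problem"
    return "CPU is doing something strange"
```

So diagnose(True, True) points at the program, and diagnose(True, 
False) points at the processor.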

If you suspect that it might be a driver problem, try upgrading the 
sound card driver (it's usually the sound card, or integrated 
Winmodem if you have one of those), next unloading the sound 
card driver, next physically removing the sound card. If it's a 
Winmodem problem then the system should be reliable so long as 
the modem lead is unplugged. In the event you find a driver problem 
is to blame, your options are (a) put up with it, (b) replace the 
driver, (c) replace the hardware with a unit with a "known safe" driver 
or (d) change your OS to Win NT/2000/XP, or Linux, which are much 
less prone to this sort of driver problem since they use properly 
partitioned memory.

To rule out overheating on your own system, I suggest you try:

(a) making sure that the cooling fan(s) in the case and/or power 
supply are working - on AMD Athlon/Duron systems you don't 
actually need to check the processor fan; the processor will die of 
extreme overheating in seconds if that fails! 

If your mainboard has a chipset cooling fan, check that it is running, 
too.

(b) blowing any accumulated crud (compacted dust) out of the 
processor heatsink and away from the large chipsets on the 
mainboard and the memory;

(c) reducing the processor to its rated speed. If this clears the 
problem then I'm afraid you're maybe going to have to live with it.

BTW processors are designed for 4 to 7 years reliable operation at 
their rated speed. They will age more quickly when overclocked, 
mostly because of the higher temperatures generated by faster 
switching, and especially if the core voltage is increased. As the 
processor ages it will become increasingly less able to run reliably 
at clock speeds in excess of its rating.

Critical components on the mainboard, in particular the voltage 
regulator, age too. Some mainboards are known to have problems 
with capacitors used by the CPU voltage reg which are located 
close to the CPU socket; consequently they tend to run quite 
warm due to being in the exhaust air flow from the CPU cooler. If 
those caps start to go bad then the first thing that will happen is 
that the CPU core voltage will become unstable, and so will the 
CPU - irrespective of the CPU clock speed, though higher clock 
speeds tend to rely on closer control of the core voltage, as well as 
making the CPU voltage reg's job harder by drawing more current.
> 
> On a separate matter why is there no processor category for AMD Duron
> chips and yet there is a category for Intel Celerons? I'm proud of the
> fact that my budget setup is faster than some ready-made Athlon 1GHz
> systems!

Um. There is no technical reason for any difference in coding 
between Duron & Thunderbird. Although it was never implemented, 
and would be a waste of effort now, the early Celeron 266 and 300 
CPUs have _no_ L2 cache and could perhaps have benefited from 
slightly revised code. But I guess the real reason is that no-one 
told George how to tell, from the processor ID values returned, the 
difference between Duron, Thunderbird and the original Athlon with 
512K L2 cache running "slowly".


Regards
Brian Beesley
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
