On 12 Sep 2001, at 15:44, David Jaggard wrote:
> I have an overclocked AMD Duron machine using windows 98SE running at
> just under 1GHz and have been running prime95 v20 successfully for
> many months. I have occasionally had hardware errors when I have tried
> to overclock further but at the current speed prime95 v20 is happy.
>
> Yesterday (11-09-2001) I upgraded prime95 from v20 to v21. My tests
> resumed only to quickly fail with a SUMOUT/possible hardware failure.
> I made some small changes to speed and core voltage but could not
> correct the errors. I decided to run the torture test and I left it
> running all night. This morning I viewed results.txt and all tests had
> passed yet the normal operation of the program still fails.
SUMOUT is often (not invariably, but often) caused by dodgy
software. Sound card drivers for Windows 9x/ME are the most
likely culprits. If this is the cause then the Good News is that your
results are most likely OK. Note that a sound card driver problem
_could_ be sensitive to Prime95 version as the driver could be
bombing memory contents which are effectively unused in v20 but
contain active data or code in v21. This might also explain why the
torture test fails to reveal the problem - obviously there are _some_
minor differences between the code executed during torture testing
and that used for production LL/DC testing.
> Surely if
> my system was unstable the torture test should fail! Is it possible
> that in the new release the normal primality tests are more stressful
> than the torture test? Or is there a bug?
Umm. Not sure about that. The torture test does not (cannot!) test
every combination of operand values which can occur in "real" data.
Anyway if it's a driver problem then maybe it runs OK overnight
because the device with the duff driver isn't being used.
If the sumouts occur at the same iteration number then you
_could_ have found a bug caused by either a program error or a
design flaw in the processor which occurs with very specific data. If
they occur at random then it sounds much more like interference
from a bad device driver or some other hardware problem - maybe
overheating.
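As a rough illustration of that distinction, here is a minimal Python sketch. It assumes you have noted by hand the iteration number at which each SUMOUT occurred over several attempts at the same exponent; the function name and the input list are hypothetical and nothing here is part of Prime95.

```python
from collections import Counter

def classify_errors(iterations):
    """Classify hand-collected iteration numbers at which errors occurred.

    'iterations' is a hypothetical list, one entry per failed attempt.
    """
    if len(iterations) < 2:
        return "not enough data"
    # Find the iteration number that shows up most often.
    _, hits = Counter(iterations).most_common(1)[0]
    if hits == len(iterations):
        # Every failure at the same point: specific data values are suspect.
        return "repeatable"
    if hits > 1:
        return "partly repeatable"
    # No two failures at the same point: looks like outside interference.
    return "random"
```

A "repeatable" result points towards a program error or a processor flaw triggered by specific data; "random" points towards a driver, heat or other hardware problem.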
Did you try re-running the specific selftest for the FFT runlength
used by the exponent you're having a problem with? (The FFT
runlength is the nearest "round" number to the size of the
Pnnnnnnn savefile divided by 4096. To force a re-run, stop Prime95,
edit local.ini removing the appropriate SelfTestNNNN=1 line &
restart Prime95.)
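The rule of thumb in the parenthesis above can be sketched in Python. This is only an illustration under stated assumptions: the list of "round" run lengths is my guess at plausible values (4, 5, 6 or 7 times a power of two), not taken from the Prime95 source, and I am assuming the NNNN in SelfTestNNNN is that same number. Both function names are hypothetical.

```python
# ASSUMPTION: candidate "round" run lengths, for illustration only.
CANDIDATE_RUNLENGTHS = [m * 2**k for k in range(5, 9) for m in (4, 5, 6, 7)]

def estimate_runlength(savefile_bytes):
    """Nearest candidate run length to (savefile size / 4096)."""
    approx = savefile_bytes / 4096
    return min(CANDIDATE_RUNLENGTHS, key=lambda n: abs(n - approx))

def drop_selftest_line(ini_text, runlength):
    """Return the local.ini text without the matching SelfTestNNNN=1 line."""
    target = f"SelfTest{runlength}=1"
    kept = [line for line in ini_text.splitlines() if line.strip() != target]
    return "\n".join(kept) + "\n"
```

The savefile size can be obtained with os.path.getsize(). As with editing by hand, stop Prime95 before changing local.ini, keep a backup copy, and restart Prime95 afterwards.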
If that works but the actual assignment keeps failing, it _might_ be
a problem caused by specific data values. One way to _prove_ this
would be to take the savefile to a different system & run it on that -
if it's a data value problem that will fail too. Next try on a system
with an Intel processor - if that fails it's a program problem,
otherwise the CPU is doing something strange. I have a variety of
reasonably reliable systems with different CPU types available if
that's of any help.
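The elimination procedure above amounts to a small decision tree. A minimal sketch in Python, where the function name and arguments are hypothetical and the outcomes are simply the ones described in the paragraph:

```python
def diagnose(fails_on_second_system, fails_on_intel=None):
    """Interpret the results of moving the savefile to other machines.

    fails_on_second_system: did the savefile also fail on a different system?
    fails_on_intel: did it fail on an Intel system? None if not yet tried.
    """
    if not fails_on_second_system:
        # The data runs fine elsewhere, so the original machine is suspect.
        return "not a data value problem: suspect the original system"
    if fails_on_intel is None:
        return "data value problem: next try an Intel system"
    if fails_on_intel:
        return "program problem"
    return "the CPU is doing something strange"
```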
If you suspect that it might be a driver problem, try upgrading the
sound card driver (it's usually the sound card, or integrated
Winmodem if you have one of those), next unloading the sound
card driver, next physically removing the sound card. If it's a
Winmodem problem then the system should be reliable so long as
the modem lead is unplugged. In the event you find a driver problem
is to blame, your options are (a) put up with it, (b) replace the
driver, (c) replace the hardware with a unit that has a "known safe"
driver, or (d) change your OS to Win NT/2000/XP or Linux, which will
not suffer this sort of driver problem since they use properly
partitioned memory.
To rule out overheating on your own system, I suggest you try:
(a) making sure that the cooling fan(s) in the case and/or power
supply are working - on AMD Athlon/Duron systems you don't
actually need to check the processor fan; the processor will die of
extreme overheating in seconds if that fails!
If your mainboard has a chipset cooling fan, check that it is running,
too.
(b) blowing any accumulated crud (compacted dust) out of the
processor heatsink and away from the large chipsets on the
mainboard and the memory;
(c) reducing the processor to its rated speed. If this clears the
problem then I'm afraid you may just have to live with it.
BTW processors are designed for 4 to 7 years of reliable operation at
their rated speed. They will age more quickly when overclocked,
mostly because of the higher temperatures generated by faster
switching, and especially if the core voltage is increased. As the
processor ages it will become increasingly less able to run reliably
at clock speeds in excess of its rating.
Critical components on the mainboard, in particular the voltage
regulator, age too. Some mainboards are known to have problems
with capacitors used by the CPU voltage reg which are located
close to the CPU socket; consequently they tend to run quite
warm due to being in the exhaust air flow from the CPU cooler. If
those caps start to go bad then the first thing that will happen is
that the CPU core voltage will become unstable, and so will the
CPU - irrespective of the CPU clock speed, though higher clock
speeds tend to rely on closer control of the core voltage, as well as
making the CPU voltage reg's job harder by drawing more current.
>
> On a separate matter why is there no processor category for AMD Duron
> chips and yet there is a category for Intel Celerons? I'm proud of the
> fact that my budget setup is faster than some ready-made Athlon 1GHz
> systems!
Um. There is no technical reason for any difference in coding
between Duron & Thunderbird. Although it was never implemented,
and would be a waste of effort now, the early Celeron 266 and 300
CPUs have _no_ L2 cache and could perhaps have benefited from
slightly revised code. But I guess the real reason is that no-one
told George how to tell, from the processor ID values returned, the
difference between Duron, Thunderbird and the original Athlon with
512K L2 cache running "slowly".
Regards
Brian Beesley