Brian,

Thank you for the help. However, I wrote the email to the mailing list at work
and what I reported as a SUMOUT was actually a SUMINP != SUMOUT error. I
realize these are two different types of error and that the error I actually
see is not going to be caused by driver failure. If I have time tonight I
will try to run the torture test at the same FFT runlength as the LL test.

I'm not going to complain if I have to reduce the processor speed a bit,
because I've obviously gone a bit too far for my particular chip. A chip is
only as fast as it is reliable, after all!

My only concern is that the torture test should be a more accurate measure
of hardware stability than the LL test, and my initial findings suggest that
perhaps it isn't. With the addition of the option at startup to only run
stress tests (added specifically for overclockers, I believe), the torture
test needs to be the more accurate test, as many users may never run the LL
tests.

On the Athlon/Duron ID front, if it's not possible to detect the chip, could
we not have an option to set it manually? I'd just like things to look right
in the stats.

Thanks again
Davy

----- Original Message -----
From: <[EMAIL PROTECTED]>
To: David Jaggard <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Wednesday, September 12, 2001 9:00 PM
Subject: Re: Mersenne: Torture test passes but normal use fails!


> On 12 Sep 2001, at 15:44, David Jaggard wrote:
>
> > I have an overclocked AMD Duron machine using Windows 98SE running at
> > just under 1GHz and have been running Prime95 v20 successfully for
> > many months. I have occasionally had hardware errors when I have tried
> > to overclock further but at the current speed Prime95 v20 is happy.
> >
> > Yesterday (11-09-2001) I upgraded Prime95 from v20 to v21. My tests
> > resumed only to quickly fail with a SUMOUT/possible hardware failure.
> > I made some small changes to speed and core voltage but could not
> > correct the errors. I decided to run the torture test and I left it
> > running all night. This morning I viewed results.txt and all tests had
> > passed yet the normal operation of the program still fails.
>
> SUMOUT is often (not invariably, but often) caused by dodgy
> software. Sound card drivers for Windows 9x/ME are the most
> likely culprits. If this is the cause then the Good News is that your
> results are most likely OK. Note that a sound card driver problem
> _could_ be sensitive to Prime95 version as the driver could be
> bombing memory contents which are effectively unused in v20 but
> contain active data or code in v21. This might also explain why the
> torture test fails to reveal the problem - obviously there are _some_
> minor differences between the code executed during torture testing
> and that used for production LL/DC testing.
>
> > Surely if
> > my system was unstable the torture test should fail! Is it possible
> > that in the new release the normal primality tests are more stressful
> > than the torture test? Or is there a bug?
>
> Umm. Not sure about that. The torture test does not (cannot!) test
> every combination of operand values which can occur in "real" data.
>
> Anyway if it's a driver problem then maybe it runs OK overnight
> because the device with the duff driver isn't being used.
>
> If the sumouts occur at the same iteration number then you
> _could_ have found a bug caused by either a program error or a
> design flaw in the processor which occurs with very specific data. If
> they occur at random then it sounds much more like interference
> from a bad device driver or some other hardware problem - maybe
> overheating.
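
The distinction drawn above can be written down as a small check. This is only a sketch: the function name and the idea of feeding it a plain list of iteration numbers (noted by hand from each failed run) are my own assumptions, not anything Prime95 provides.

```python
def classify_errors(iterations):
    """Classify failures from the iteration numbers at which each run failed.

    `iterations` is a hand-collected list, one entry per failed run
    (hypothetical input; Prime95 does not produce this list itself).
    """
    if len(iterations) < 2:
        return "not enough runs to tell"
    if len(set(iterations)) == 1:
        # Every run failed at the same point: data-specific problem.
        return "same iteration: suspect a program error or a CPU design flaw"
    # Failures move around between runs: environmental problem.
    return "random iterations: suspect a bad driver, overheating or other hardware"
```

Two or three repeat runs from the same savefile should be enough to tell the cases apart.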
>
> Did you try re-running the specific selftest for the FFT runlength
> used by the exponent you're having a problem with? (The FFT
> runlength is the nearest "round" number to the size of the
> Pnnnnnnn savefile divided by 4096. To force a re-run, stop Prime95,
> edit local.ini removing the appropriate SelfTestNNNN=1 line &
> restart Prime95.)
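
A minimal sketch of the rule of thumb quoted above, in Python. It rests on several assumptions of mine: that the size/4096 figure is read as a runlength in K, that the list of "round" runlengths below is roughly right (the program's real table may differ), and that NNNN in the SelfTestNNNN=1 line is that same number. Stop Prime95 and keep a copy of local.ini before trying anything like this.

```python
# Hypothetical list of "round" FFT runlengths in K; the real table may differ.
CANDIDATE_FFT_K = [256, 320, 384, 448, 512, 640, 768, 896, 1024]


def estimate_fft_k(savefile_bytes, candidates=CANDIDATE_FFT_K):
    """Return the candidate runlength nearest to savefile size / 4096."""
    raw = savefile_bytes / 4096
    return min(candidates, key=lambda k: abs(k - raw))


def drop_selftest_line(ini_text, fft_k):
    """Return local.ini text without the SelfTest<fft_k>=1 line.

    The exact key format is an assumption based on the description above.
    """
    target = f"SelfTest{fft_k}=1"
    kept = [line for line in ini_text.splitlines() if line.strip() != target]
    return "".join(line + "\n" for line in kept)
```

For example, a savefile of about 2,100,000 bytes gives 2100000 / 4096 = 512.7, so the nearest candidate is 512 and the line to remove would be SelfTest512=1.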
>
> If that works but the actual assignment keeps failing, it _might_ be
> a problem caused by specific data values. One way to _prove_ this
> would be to take the savefile to a different system & run it on that -
> if it's a data value problem that will fail too. Next try on a system
> with an Intel processor - if that fails it's a program problem,
> otherwise the CPU is doing something strange. I have a variety of
> reasonably reliable systems with different CPU types available if
> that's of any help.
>
> If you suspect that it might be a driver problem, try upgrading the
> sound card driver (it's usually the sound card, or integrated
> Winmodem if you have one of those), next unloading the sound
> card driver, next physically removing the sound card. If it's a
> Winmodem problem then the system should be reliable so long as
> the modem lead is unplugged. In the event you find a driver problem
> is to blame, your options are (a) put up with it, (b) replace the
> driver, (c) replace the hardware with a unit with a "known safe" driver
> or (d) change your OS to Win NT/2000/XP, or Linux, which will not
> suffer this sort of driver problem since they use properly partitioned
> memory.
>
> To rule out overheating on your own system, I suggest you try:
>
> (a) making sure that the cooling fan(s) in the case and/or power
> supply are working - on AMD Athlon/Duron systems you don't
> actually need to check the processor fan; the processor will die of
> extreme overheating in seconds if that fails!
>
> If your mainboard has a chipset cooling fan, check that is running,
> too.
>
> (b) blowing any accumulated crud (compacted dust) out of the
> processor heatsink and away from the large chipsets on the
> mainboard and the memory;
>
> (c) reducing the processor to its rated speed. If this clears the
> problem then I'm afraid you're maybe going to have to live with it.
>
> BTW processors are designed for 4 to 7 years reliable operation at
> their rated speed. They will age more quickly when overclocked,
> mostly because of the higher temperatures generated by faster
> switching, and especially if the core voltage is increased. As the
> processor ages it will become increasingly less able to run reliably
> at clock speeds in excess of its rating.
>
> Critical components on the mainboard, in particular the voltage
> regulator, age too. Some mainboards are known to have problems
> with capacitors used by the CPU voltage reg which are located
> close to the CPU socket; consequently they tend to run quite
> warm due to being in the exhaust air flow from the CPU cooler. If
> those caps start to go bad then the first thing that will happen is
> that the CPU core voltage will become unstable, and so will the
> CPU - irrespective of the CPU clock speed, though higher clock
> speeds tend to rely on closer control of the core voltage, as well as
> making the CPU voltage reg's job harder by drawing more current.
> >
> > On a separate matter, why is there no processor category for AMD Duron
> > chips and yet there is a category for Intel Celerons? I'm proud of the
> > fact that my budget setup is faster than some ready-made Athlon 1GHz
> > systems!
>
> Um. There is no technical reason for any difference in coding
> between Duron & Thunderbird. Although it was never implemented,
> and would be a waste of effort now, the early Celeron 266 and 300
> CPUs have _no_ L2 cache and could perhaps have benefited from
> slightly revised code. But I guess the real reason is that no-one
> told George how to tell the difference between Duron, Thunderbird
> and the original Athlon with 512K L2 cache running "slowly" from
> the processor ID values returned.
>
>
> Regards
> Brian Beesley
> _________________________________________________________________________
> Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
> Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
>
