At 09:23 AM 2/3/2001 -0000, "Brian J. Beesley" <[EMAIL PROTECTED]> wrote:
>George has announced the development of new FFT code optimised for 
>Pentium 4. The FFT code is the true heart of the program: it's really 
>hard for me to put into words just how much we all owe to George for 
>his unstinting efforts to make the program as efficient as it is. 
>Suffice it to say that, without George's input, we would probably be 
>two or three Mersenne primes short of where we actually are.

Possibly 4 primes.

I've been lobbying a bit for a dual-processor optimized version for Intel.  
I have little technical basis on which to judge the potential gains, but 
speculate that memory bus contention and caching efficiency would 
be improved if both processors were working on the same large exponent.

Well before learning of GIMPS and George's program in 1996, I had
coded a program to do limited trial division followed by a Lucas-Lehmer
test.  Doing that, and then seeing the efficiency of George's
program, gave me a better appreciation of how much work and skill
went into prime95.  It's highly optimized, using the performance
counters built into the CPUs, and exploiting virtually everything known
about the properties of potential factors and the best algorithms, to
speed up both the factoring attempts and the Lucas-Lehmer test.
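For reference, the underlying algorithms are simple to state, even though
making them fast is the hard part George solved.  Here is a minimal Python
sketch (big-integer arithmetic stands in for prime95's FFT-based squaring,
and all names are mine, not prime95's):

```python
def lucas_lehmer(p):
    """Return True if M_p = 2**p - 1 is prime, for an odd prime p."""
    m = (1 << p) - 1            # the Mersenne number 2^p - 1
    s = 4                       # Lucas-Lehmer starting value
    for _ in range(p - 2):      # p - 2 squarings
        s = (s * s - 2) % m     # prime95 does this step with FFT squaring
    return s == 0

def trial_factor(p):
    """Search for a small factor of M_p = 2**p - 1.

    Any factor must have the form 2*k*p + 1 and be congruent to
    +/-1 (mod 8); exploiting such properties of potential factors
    is part of what makes prime95's trial division fast.
    """
    m = (1 << p) - 1
    k = 1
    q = 2 * p + 1
    while q * q <= m:           # only candidates below sqrt(M_p)
        if q % 8 in (1, 7) and pow(2, p, q) == 1:
            return q            # q divides 2^p - 1
        k += 1
        q = 2 * k * p + 1
    return None                 # no small factor found
```

For example, `trial_factor(11)` finds the factor 23 of 2047, while
`lucas_lehmer(13)` confirms that 8191 is prime.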

Our best hopes for future speed improvements are
1) the steady march to faster computers in greater numbers (and more
multi-CPU systems running multiple instances of the program, one per CPU);
2) possible future discoveries of better algorithms by mathematicians.
Last I heard, there was still some space between the upper and lower
bounds on the number of operations required to perform a long
multiplication or squaring;
3) modest increments from optimizations for specific architectures,
additional factoring methods, implementation of additional FFT run
lengths, etc.  The easy large gains were implemented long ago, as were
the medium-difficulty moderate gains, leaving diminishing returns that
require significant effort.

>Nothing built by human hands is perfect, so, sure, the program could 
>be improved! Personally I'd like to see an optimization for Athlon; 
>at the expense of having to load different versions for different 
>processor types, I'd like to see separate "streamlined" versions of 
>the code optimized for different processor types rather than one 
>monolithic program with everything embedded in it; and I'd like to 
>see the client/server security code somehow "opened", to facilitate 
>the integration of non-Intel clients into PrimeNet, but without 
>sacrificing the trust we have that results have really been computed.

Let's keep it simple; disk space is cheap (including space for paging
out unused code).
I think the current situation, where the program detects the CPU model
and reacts accordingly, is a good one; you can still take control by
editing the ini file if need be.  It keeps things simple both for the
novice end user who wants to download a program and get going, and for
the program developers.  If CPU-specific programs were made instead,
the number of distinct programs would get large, because the number of
combinations of CPU type and OS is large.
Prime95 supports, or has supported, something like 11 CPU type codes.
Regarding the OS, there are at least 6 builds (currently maintained at V20):
Win95/NT interactive
NT service
linux
statically linked linux
freebsd
statically linked bsd

Setting CPU type in prime95 V20.4.1 and then examining local.ini,
CPUType=3 is a Cyrix 6x86
CPUType=4 is a 486
CPUType=5 is a Pentium
CPUType=6 is a Pentium Pro
CPUType=7 is an AMD K6
CPUType=8 is a Celeron
CPUType=9 is a PentiumII
CPUType=10 is a PentiumIII
CPUType=11 is an AMD Athlon
The P-4 code adds another type.  (Is there another AMD type?)
Presumably CPU types for the 386 and below have been retired
(yes, 386s were supported, around v13.2).

Perhaps some GIMPS participants could offer George & Scott 
nonprivileged account access on some other architectures, so they 
could do the required development.

>With increasing exponent size (and therefore run time), I'd like to 
>see PrimeNet evolve to track intermediate residues & also to be able 
>to coordinate parallel LL testing & double-checking, so that runs 
>which are going wrong can be stopped for investigation without having 
>to be run through to the end.

In the QA effort, we've already seen a few instances of errors caught
midway by doing a manual/email version of this.  Brian Beesley had an
error detected this way in his double-check run of a 10-megadigit
exponent.  This exponent takes a PII-400 about 428 days (yes, 14 months)
to complete, so detecting the one error and restarting early saved about
10.5 PII-400 months.
(Thanks to Rick Pali for providing interim residues to make this savings
possible.)
Another exponent, 20295631, showed similar results; both Paul Victor
Novarese's run and mine produced errors, while Brian Beesley's run
matched Gordon Spence's.
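The savings figure follows from simple arithmetic (a sketch; the
30.4-day average month and the inferred detection point are my
assumptions, back-derived from the stated numbers):

```python
# Rough arithmetic behind the "10.5 PII-400 months saved" figure.
total_days = 428                     # full LL run on a PII-400
total_months = total_days / 30.4     # ~14 months for the whole test
saved_months = 10.5                  # stated savings from early restart

# The saving implies the error was caught this far into the run;
# everything after that point would otherwise have been wasted.
detected_at = total_months - saved_months

print(round(total_months, 1))        # ~14.1 months total
print(round(detected_at, 1))         # error caught ~3.6 months in
```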

I assume that Brian means sending intermediate 64-bit residues to
PrimeNet for comparison.  (The intermediate save files are too big to
send with any frequency and would require a lot of storage.)

To automate checking via interim residues would require significant
long-term storage at PrimeNet of quadruples containing exponent,
iteration, 64-bit residue, and the source of the information (person or
machine ID).  When two with matching exponent and iteration but
different sources were available, a comparison would be made; if a
discrepancy was found, both runs should be halted while a tiebreaker
run is made by a different source, to avoid wasting the CPU time of one
or both original sources.  Since the most likely cause of a discrepancy
is an error in one run, not both, a resume capability as well as a
discard capability would be needed.  I feel exponents halted for a
tiebreaker run should not be expired.
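The server-side bookkeeping this would take can be sketched in a few
lines (an illustrative in-memory model only; PrimeNet has no such
interface, and all names here are mine):

```python
from collections import defaultdict

# (exponent, iteration) -> list of (source, 64-bit residue) quadruple parts
reports = defaultdict(list)

def report(exponent, iteration, residue, source):
    """Record an interim residue from one source.

    Returns "mismatch" if a different source has already reported a
    different residue at the same (exponent, iteration) -- the signal
    to halt both runs and schedule a tiebreaker on a third machine,
    without expiring the exponent.  Otherwise records and returns
    "recorded".
    """
    entries = reports[(exponent, iteration)]
    for other_source, other_residue in entries:
        if other_source != source and other_residue != residue:
            return "mismatch"
    entries.append((source, residue))
    return "recorded"
```

A discrepancy shows up as soon as the second source checks in, e.g.
`report(20295631, 1000000, 0x1234, "A")` then
`report(20295631, 1000000, 0x9999, "B")` returns `"mismatch"`, months
before either run would otherwise finish.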


Ken

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
