Ken Kriesel wrote:
> At 09:23 AM 2/3/2001 -0000, "Brian J. Beesley" <[EMAIL PROTECTED]> wrote:
>
>> George has announced the development of new FFT code optimised for
>> Pentium 4. The FFT code is the true heart of the program: it's really
>> hard for me to put into words just how much we all owe to George for
>> his unstinting efforts to make the program as efficient as it is.
>> Suffice it to say that, without George's input, we would probably be
>> two or three Mersenne primes short of where we actually are.
>
>
> Possibly 4 primes.
That would obviously depend on how fast David Slowinski was progressing
at the time. Remember that much of George's contribution was organizing
the project itself, though he certainly has put a huge amount of effort
into developing the x86 software.
>
> I've been lobbying a bit for a dual-processor optimized version for Intel.
> I have little technical basis on which to judge the potential gains, but
> speculate that memory bus contention and caching efficiency would
> be improved if both processors were working on the same large exponent.
IIRC, Slowinski usually ran one exponent on each processor, except when
verifying a prime.
>
> Well before learning of GIMPS and George's program in 1996, I had
> coded a program to do limited trial division followed by a Lucas-Lehmer
> test. Having done that, and then seeing the efficiency of George's
> program, gave me a better appreciation for how much work and skill
> went into prime95. It's highly optimized, using the counters built into
> the cpus for the purpose, and using virtually everything known about
> properties of potential factors and the best algorithms to speed things up
> both in the factoring attempts and the Lucas Lehmer test.
>
> Our best hopes for future speed improvements are
> 1) the steady march to faster computers in greater numbers (& more
> multi-cpu systems running multiple instances of the program 1 per cpu)
> 2) possible future discoveries of better algorithms by mathematicians.
> Last I heard, there was still some space between the upper and lower
> bounds to the limit on number of operations required to perform a
> long multiplication or squaring.
> 3) modest increments due to optimizations to specific architectures,
> additional factoring methods, implementation of additional FFT runlengths etc.
> The easy large gains were implemented long ago, and medium difficulty
> & moderate gains also, leaving diminishing returns requiring significant
> effort.
Obviously, something else we can all do is encourage our friends to run
the program. I am a member of a website known as everything2, and have
written articles for that site regarding GIMPS. Feedback is welcome. I
am aware that my article is now extremely brief, but it's hard to see
what more can be said in a non-technical way.
http://www.everything2.com/index.pl?node_id=877902
I have also mentioned GIMPS on my homepage,
http://www.acsu.buffalo.edu/~nrussell/
>
>> Nothing built by human hands is perfect, so, sure, the program could
>> be improved! Personally I'd like to see an optimization for Athlon;
>> at the expense of having to load different versions for different
>> processor types, I'd like to see seperate "streamlined" versions of
>> the code optimized for different processor types rather than one
>> monolithic program with everything embedded in it; and I'd like to
>> see the client/server security code somehow "opened", to facilitate
>> the integration of non-Intel clients into PrimeNet, but without
>> sacrificing the trust we have that results have really been computed.
>
>
> Let's keep it simple; disk space is cheap (including space for paging
> out unused code).
> I think the current situation where the program detects cpu model
> and reacts accordingly is a good one; you can still take control
> by editing the ini file if need be. It keeps it simple for both the novice
> end user wanting to download a program and get going, and for the
> program developers. If cpu-specific programs were made
> instead, the number of distinct programs gets large because the
> combinations of cpu type and OS is large.
Agreed there! There is nothing wrong with monolithic programs.
Compared with, eg, Netscape or even Winamp, Prime95 requires a very
small amount of resources.
(snip)
>
> Perhaps some GIMPS participants could offer George & Scott
> nonprivileged account access on some other architectures, so they
> could do the required development.
Certainly not a bad idea. I think an Amiga client in particular would
attract a fair amount of interest, as would a version specially designed
for NetBSD or OpenBSD
>
>> With increasing exponent size (and therefore run time), I'd like to
>> see PrimeNet evolve to track intermediate residues & also to be able
>> to coordinate parallel LL testing & double-checking, so that runs
>> which are going wrong can be stopped for investigation without having
>> to be run through to the end.
>
>
> In the QA effort, we've seen a few instances already of errors caught
> midway by doing a manual/email version of this. Brian Beesley had an error
> detected this way in his run of a double-check of a 10-megadigit exponent.
> This exponent takes a PII-400 428 days (yes 14 months) to complete,
> so detecting the one error and restarting early saves about 10.5 PII-400
> months.
> (Thanks to Rick Pali for providing interim residues to make this savings
> possible.)
> Another exponent, 20295631 showed similar results; both Paul Victor Novarese's
> run and mine produced errors while Brian Beesley's run matched Gordon
> Spence's.
>
> I assume that Brian means sending intermediate 64-bit residues to Primenet
> for comparison. (The intermediate save files are too big to send with any
> frequency and would require a lot of storage.)
Agreed there. On the other hand, on a fast machine, the SETI project
requires receiving the better part of a megabyte every day, and they
have had no problem finding new recruits.
>
> To automate checking via interim residues would require significant longterm
> storage at primenet, of quadruples containing exponent, iteration, 64-bit
> residue,
> and the source of the information (person or machine ID). When two with
> matching exponent and iteration but different source were available a
> comparison
> would be made; if a discrepancy was found, both runs should be halted while
> a tiebreaker run was made via a different source, to avoid wasting cpu time
> of one or both original sources.
The difficulty here is explaining the entire process to new users. I
know that when i was new I had enough trouble deciding on issues like
how much memory to make available, even without worrying about comparing
residues at that time.
> Since the most likely cause of a discrepancy
> is error in one run not both, a resume capability as well as a discard
> capability
> would be needed. I feel exponents halted for a tiebreaker run should not be
> expired.
Note that all of this will require making changes in the server, and
there has historically been a fair backlog of such changes.
>
>
> Ken
Regards,
Nathan
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers