Ken Kriesel wrote:

> At 09:23 AM 2/3/2001 -0000, "Brian J. Beesley" <[EMAIL PROTECTED]> wrote:
> 
>> George has announced the development of new FFT code optimised for 
>> Pentium 4. The FFT code is the true heart of the program: it's really 
>> hard for me to put into words just how much we all owe to George for 
>> his unstinting efforts to make the program as efficient as it is. 
>> Suffice it to say that, without George's input, we would probably be 
>> two or three Mersenne primes short of where we actually are.
> 
> 
> Possibly 4 primes.

That would obviously depend on how fast David Slowinski was progressing 
at the time.  Remember that much of George's contribution was organizing 
the project itself, though he certainly has put a huge amount of effort 
into developing the x86 software. 

> 
> I've been lobbying a bit for a dual-processor optimized version for Intel.  
> I have little technical basis on which to judge the potential gains, but 
> speculate that memory bus contention and caching efficiency would 
> be improved if both processors were working on the same large exponent.

IIRC, Slowinski usually ran one exponent on each processor, except when 
verifying a prime. 

> 
> Well before learning of GIMPS and George's program in 1996, I had
> coded a program to do limited trial division followed by a Lucas-Lehmer
> test.  Having done that, and then seeing the efficiency of George's
> program, gave me a better appreciation for how much work and skill
> went into prime95.  It's highly optimized, using the counters built into
> the cpus for the purpose, and using virtually everything known about
> properties of potential factors and the best algorithms to speed things up
> both in the factoring attempts and the Lucas Lehmer test.
> 
> Our best hopes for future speed improvements are
> 1) the steady march to faster computers in greater numbers (& more 
> multi-cpu systems running multiple instances of the program 1 per cpu)
> 2) possible future discoveries of better algorithms by mathematicians.
> Last I heard, there was still some space between the upper and lower 
> bounds to the limit on number of operations required to perform a
> long multiplication or squaring.
> 3) modest increments due to optimizations to specific architectures, 
> additional factoring methods, implementation of additional FFT runlengths etc.
> The easy large gains were implemented long ago, and medium difficulty
> & moderate gains also, leaving diminishing returns requiring significant
> effort.

Obviously, something else we can all do is encourage our friends to run 
the program.  I am a member of a website known as everything2, and have 
written articles for that site regarding GIMPS.  Feedback is welcome.  I 
am aware that my article is now extremely brief, but it's hard to see 
what more can be said in a non-technical way. 

http://www.everything2.com/index.pl?node_id=877902
I have also mentioned GIMPS on my homepage,
http://www.acsu.buffalo.edu/~nrussell/

> 
>> Nothing built by human hands is perfect, so, sure, the program could 
>> be improved! Personally I'd like to see an optimization for Athlon; 
>> at the expense of having to load different versions for different 
>> processor types, I'd like to see seperate "streamlined" versions of 
>> the code optimized for different processor types rather than one 
>> monolithic program with everything embedded in it; and I'd like to 
>> see the client/server security code somehow "opened", to facilitate 
>> the integration of non-Intel clients into PrimeNet, but without 
>> sacrificing the trust we have that results have really been computed.
> 
> 
> Let's keep it simple; disk space is cheap (including space for paging
> out unused code).
> I think the current situation where the program detects cpu model
> and reacts accordingly is a good one; you can still take control
> by editing the ini file if need be.  It keeps it simple for both the novice
> end user wanting to download a program and get going, and for the
> program developers.  If cpu-specific programs were made
> instead, the number of distinct programs gets large because the
> combinations of cpu type and OS is large.

Agreed there!  There is nothing wrong with monolithic programs.  
Compared with, eg, Netscape or even Winamp, Prime95 requires a very 
small amount of resources. 
(snip)

> 
> Perhaps some GIMPS participants could offer George & Scott 
> nonprivileged account access on some other architectures, so they 
> could do the required development.

Certainly not a bad idea.  I think an Amiga client in particular would 
attract a fair amount of interest, as would a version specially designed 
for NetBSD or OpenBSD

> 
>> With increasing exponent size (and therefore run time), I'd like to 
>> see PrimeNet evolve to track intermediate residues & also to be able 
>> to coordinate parallel LL testing & double-checking, so that runs 
>> which are going wrong can be stopped for investigation without having 
>> to be run through to the end.
> 
> 
> In the QA effort, we've seen a few instances already of errors caught
> midway by doing a manual/email version of this.  Brian Beesley had an error
> detected this way in his run of a double-check of a 10-megadigit exponent.
> This exponent takes a PII-400 428 days (yes 14 months) to complete,
> so detecting the one error and restarting early saves about 10.5 PII-400
> months.
> (Thanks to Rick Pali for providing interim residues to make this savings
> possible.)
> Another exponent, 20295631 showed similar results; both Paul Victor Novarese's
> run and mine produced errors while Brian Beesley's run matched Gordon
> Spence's.
> 
> I assume that Brian means sending intermediate 64-bit residues to Primenet
> for comparison.  (The intermediate save files are too big to send with any
> frequency and would require a lot of storage.)

Agreed there.  On the other hand, on a fast machine, the SETI project 
requires receiving the better part of a megabyte every day, and they 
have had no problem finding new recruits. 

> 
> To automate checking via interim residues would require significant longterm 
> storage at primenet, of quadruples containing exponent, iteration, 64-bit
> residue,
> and the source of the information (person or machine ID).  When two with
> matching exponent and iteration but different source were available a
> comparison
> would be made; if a discrepancy was found, both runs should be halted while
> a tiebreaker run was made via a different source, to avoid wasting cpu time
> of one or both original sources.  

The difficulty here is explaining the entire process to new users.  I 
know that when i was new I had enough trouble deciding on issues like 
how much memory to make available, even without worrying about comparing 
residues at that time. 

> Since the most likely cause of a discrepancy
> is error in one run not both, a resume capability as well as a discard
> capability
> would be needed.  I feel exponents halted for a tiebreaker run should not be 
> expired.

Note that all of this will require making changes in the server, and 
there has historically been a fair backlog of such changes. 

> 
> 
> Ken

Regards,
Nathan

_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

Reply via email to