Mersenne Digest        Wednesday, January 3 2001        Volume 01 : Number 806

----------------------------------------------------------------------

Date: Sun, 31 Dec 2000 18:18:37 -0600
From: Nathan Russell <[EMAIL PROTECTED]>
Subject: Mersenne: Shutdown script for mprime

Hi,

I was wondering if anyone knows of a script to shut down mprime and make
sure the save files have finished being written before SIGKILL is sent
during the shutdown process.

In terms of starting it up, I now have a line in my inittab file to start
it whenever the system enters runlevel 5, with runtime options to use the
Prime95 directory of the Windows drive as its working directory.  It is
set up to run setuid as a particular user, and I've checked that it is
indeed running as that user.

Thanks!

Nathan
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Sun, 31 Dec 2000 18:20:42 -0500
From: [EMAIL PROTECTED]
Subject: Mersenne: Automatic reply

I have moved.  My new address is:

[EMAIL PROTECTED]

Please use this address in the future.  Thanks.

------------------------------

Date: Mon, 1 Jan 2001 00:46:48 +0100 (MET)
From: [EMAIL PROTECTED]
Subject: Re: Mersenne: 2000 - first calendar year without a Mersenne

Nathan Russell observes:
> Well, unless there is an announcement within the next few hours, 2000
> will be the first calendar year in the history of GIMPS without a
> Mersenne prime.
>
> Is the number of searchers, and the power of their hardware, increasing
> enough to keep up with the increased runtimes and (slightly) decreased
> chance of a given exponent being prime?
> Or can we expect only one prime
> every several years, as was the case before the start of the GIMPS
> effort?

The chance that a given Mp is prime is O(1/log(Mp)) = O(1/p).
The LL test for Mp needs O(p) iterations.
The time per iteration (using an FFT) is about O(p*log(p)).
So the estimated time per LL test is O(p^2 * log(p)),
whereas the chance of success on one LL test is O(1/p).

We need a few adjustments to the formula since (for example)
the time to trial divide by primes < N is approximately
O(log(p)/p) for fixed N.  [We test O(N/p) divisors below N,
with a cost of O(log(p)) per divisor.]  This trial division
time decreases with p, whereas LL time increases, which
affects our strategy.

Nonetheless, to test all O(2^b / b) primes p with b bits,
we estimate O(2^b / b) * O(2^(2b) * b) = O(8^b) operations,
with O(1) expected successes.  Doing all 24-bit p's takes
about eight times as long as doing all 23-bit p's.  Even with
more contributors and faster hardware, we soon get diminishing
returns.

                        Peter Montgomery

------------------------------

Date: Sun, 31 Dec 2000 23:10:35 -0500
From: "Frank_A_L_I_N_Y" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: 2000 - first calendar year without a Mersenne

----- Original Message -----
From: "Frank_A_L_I_N_Y" <[EMAIL PROTECTED]>
To: "Nathan Russell" <[EMAIL PROTECTED]>
Sent: Sunday, December 31, 2000 7:53 PM
Subject: Re: Mersenne: 2000 - first calendar year without a Mersenne

> Even with hardware twice as fast, an exponent that currently takes two
> years would still take a year.
> I don't think the rate of doubling in speed every two years can hold
> out forever.
> Even if the next generation of hardware can get through these exponents
> faster, the longer exponents would come down the pipe sooner, requiring
> the next hardware jump that much sooner.
>
> Regards Frank
>
> ----- Original Message -----
> From: "Nathan Russell" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Sunday, December 31, 2000 7:01 PM
> Subject: Mersenne: 2000 - first calendar year without a Mersenne
>
> > Well, unless there is an announcement within the next few hours, 2000
> > will be the first calendar year in the history of GIMPS without a
> > Mersenne prime.
> >
> > Is the number of searchers, and the power of their hardware, increasing
> > enough to keep up with the increased runtimes and (slightly) decreased
> > chance of a given exponent being prime?  Or can we expect only one
> > prime every several years, as was the case before the start of the
> > GIMPS effort?
> >
> > This might be a good discussion, since the list is fairly quiet now.
> >
> > Nathan

------------------------------

Date: Sun, 31 Dec 2000 23:10:23 -0500
From: "Frank_A_L_I_N_Y" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: 2000 - first calendar year without a Mersenne

----- Original Message -----
From: "Frank_A_L_I_N_Y" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, December 31, 2000 7:59 PM
Subject: Re: Mersenne: 2000 - first calendar year without a Mersenne

> Sorry, I replied to Mr Russell before reading Mr Montgomery's reply.
> Mr M said it much better than I.
>
> Regards Frank
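Montgomery's O(8^b) estimate can be checked numerically.  The following
little sketch is my own illustration, not code from the list; the
function name and constants are mine.

```python
# Rough numeric illustration of Peter Montgomery's estimate: testing all
# prime exponents p with b bits costs about
#     (number of b-bit primes) * (cost per LL test)
#     ~ (2^b / b) * (p^2 * log p)   with p ~ 2^b,
# so each extra bit of exponent multiplies the total work by about 8.

import math

def work(b):
    """Approximate total LL-test work for all b-bit prime exponents."""
    p = 2.0 ** b                      # typical exponent size
    count = p / (b * math.log(2))     # ~ number of b-bit primes (prime number theorem)
    per_test = p * p * math.log(p)    # O(p^2 log p) clocks per LL test
    return count * per_test

for b in (22, 23, 24):
    print(b, work(b + 1) / work(b))   # ratio is ~8 for every b
```

The b factors cancel, so the total is ~2^(3b) = 8^b: doubling the
exponent range costs eight times the work, exactly the diminishing
returns Montgomery describes.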
------------------------------

Date: Mon, 01 Jan 2001 12:14:14 +0100
From: Martijn <[EMAIL PROTECTED]>
Subject: Re: Mersenne: Shutdown script for mprime

Nathan Russell wrote:
> Hi,
>
> I was wondering if anyone knows of a script to shut down mprime and make
> sure the save files are finished saving before SIGKILL is sent in the
> shutdown process.

Hi,

I use a SysV-style init and start and stop mprime from the rc.d
directory.  The script (/etc/rc.d/init.d/mersenne, added to the
respective rc.n directories with chkconfig) is as follows:

#!/bin/sh
#
# mersenne
#
# chkconfig: 2345 10 50
# description: mersenne
#

# Source function library.
. /etc/init.d/functions

# See how we were called.
case "$1" in
  start)
        action "Starting mersenne: " su - -c /home/mersenne/startmprime mersenne
        touch /var/lock/subsys/mersenne
        ;;
  stop)
        action "Stopping mersenne: " killall mprime
        rm -f /var/lock/subsys/mersenne
        ;;
  restart|reload)
        $0 stop
        $0 start
        ;;
  *)
        # do not advertise unreasonable commands that there is no reason
        # to use with this device
        echo "Usage: mersenne {start|stop|restart|reload}"
        exit 1
esac

exit 0

The /home/mersenne/startmprime script is as follows:

#!/bin/bash
cd /home/mersenne
./mprime -B -A0
./mprime -B -A1

By killing it relatively early in the shutdown sequence I know for sure
that mprime has enough time to shut down before the generic kills are
sent.
(kill and killall send SIGTERM by default.)

Kind Regards,

Martijn

------------------------------

Date: Mon, 1 Jan 2001 16:52:34 -0500 (EST)
From: Jason Stratos Papadopoulos <[EMAIL PROTECTED]>
Subject: Mersenne: Integer Convolution Library v1.1 released

Hello again.  Version 1.1 of ICL is a service release; there are minor
speedups, a single major bug fix in the Mersenne code, and improved test
programs.  The build process is now much more generic, so you don't need
the absolute latest GNU tools to compile it.  You will still need an
Alpha, however; preferably an ev6.

The code now includes a hacked x86 assembler, whose preprocessor is used
to build the library.  Thanks are due to Simon Tatham for NASM, even
though I've cut it to pieces for use here.
www.glue.umd.edu/~jasonp/icl11.tar.gz

Let me know what you think,
jasonp

------------------------------

Date: Mon, 01 Jan 2001 21:01:28 -0500
From: George Woltman <[EMAIL PROTECTED]>
Subject: Mersenne: What I've learned about the P4 thus far (long)

Happy new year to all,

Current timings on a 512K FFT (exponents up to 10.32 million):

        1.4GHz P4:      0.126 sec
        1.1GHz T-bird:  0.103 sec
        1.0GHz P3:      0.145 sec

As pointed out in an earlier email, using the new SSE2 instructions and
coding for the changed cache line size should allow much better timings
on the P4.

I coded the most common FFT building block, four_complex_fft, using the
SSE2 instructions.  This macro takes 8 floating point values (4 complex
numbers) and does 2 "levels" of the FFT.  The SSE2 implementation
operates on two different sets of 8 floating point values and also does
2 levels of the FFT (i.e. it does exactly twice as much work).

Next I timed this macro using consecutive memory locations.  The P3
version of this macro takes 52 clocks.  The new P4 implementation takes
48 clocks!  More than twice the throughput.

Now the bad news: when I use the prime95 memory layout, where the 8
input values come from 8 different cache lines and the modified values
are output to the same cache lines (an in-place FFT), the P4 code takes
112 clocks.

The cause is the new 64-byte cache line.  The L1 cache is write-back:
any changes to the L1 cache are written back to the L2 cache.  On the P3
the L1 and L2 cache line size is 32 bytes.  A write to the L2 cache
takes 7 clocks (I think).  Thus, the P3 macro will run in 8*7=56 clocks.
The P4 has a 64-byte L1 cache line.  The update of the L2 cache is done
with two writes of 32 bytes, each taking 7 clocks.  Thus, the P4 macro
takes 8*14=112 clocks.
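George's clock accounting can be reproduced in a few lines.  This is
only a sketch built from the figures in his post; the variable names
are mine.

```python
# Cache-line write-back cost of the four_complex macro: it dirties 8
# cache lines, each 32-byte write to the L2 cache costs about 7 clocks,
# and the P4's 64-byte L1 lines need two 32-byte writes over the
# 256-bit L1-to-L2 path (vs. one write for the P3's 32-byte lines).

DIRTY_LINES = 8          # cache lines touched by one macro invocation
L2_WRITE_CLOCKS = 7      # per 32-byte write to L2 (George's figure)

p3_writes_per_line = 1   # P3: 32-byte L1 line -> one 32-byte write
p4_writes_per_line = 2   # P4: 64-byte L1 line -> two 32-byte writes

p3_clocks = DIRTY_LINES * p3_writes_per_line * L2_WRITE_CLOCKS
p4_clocks = DIRTY_LINES * p4_writes_per_line * L2_WRITE_CLOCKS

print(p3_clocks, p4_clocks)   # 56 112
```

So the doubled line size alone accounts for the 56-to-112-clock jump,
even though the macro itself got faster.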
After much thought (coming up with a scheme that does not cause
thrashing of the 64 TLBs is not easy), I think I've worked out a new
memory layout for prime95.  This is a mostly-in-place FFT where the
outputs of the macros are written to consecutive memory locations.  I've
now begun the process of recoding all of the prime95 building blocks;
then I'll code up the new memory layout for the P4.  Obviously this will
take a good deal of time.

I thought you might be interested in a rough estimate of how fast a 512K
FFT might run in the finished version.  The same macro above, when
operating out of the L2 cache, is a little slower - call it 56 clocks.
The macro processes 16 doubles, or 56/16 = 3.5 clocks/double.

The 512K FFT has 19 forward FFT levels and 19 inverse FFT levels.  The
first 3 levels are special and are about as expensive as 2 levels later
on.  Remember, the macro does 2 levels, meaning we will run it 9 times
in the forward FFT and 9 times in the inverse FFT.  Total cost is 512K
doubles * 3.5 clocks/double * 18 macro passes / 1.4G clocks/sec.  This
is .024 sec.

I've estimated the carry propagation step at .004 sec.  The FFT will
also access main memory 3 times; much of this cost can be hidden with
judicious use of the PREFETCH instruction.  This estimate also does not
include the cost of accessing the sin/cos data.  All in all, I think a
time of 0.040 seconds should be achievable.
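The estimate above can be written out as a quick sanity check.  All
constants are from George's post; the arrangement and names are mine.

```python
# Back-of-envelope timing for a 512K-double FFT on a 1.4GHz P4,
# following George's arithmetic step by step.

DOUBLES = 512 * 1024              # FFT length in doubles
MACRO_CLOCKS = 56                 # macro cost when running out of L2
MACRO_DOUBLES = 16                # doubles processed per macro call
clocks_per_double = MACRO_CLOCKS / MACRO_DOUBLES   # 3.5

# 19 levels each way; the first 3 cost about as much as 2, so each
# direction is ~18 effective levels = 9 two-level macro passes.
PASSES = 9 + 9                    # forward + inverse macro passes

CPU_HZ = 1.4e9                    # 1.4 GHz P4

fft_sec = DOUBLES * clocks_per_double * PASSES / CPU_HZ
total_sec = fft_sec + 0.004       # add the estimated carry propagation

print(round(fft_sec, 3), round(total_sec, 3))   # 0.024 0.028
```

The remaining gap up to the 0.040-second target covers the three
main-memory passes and the sin/cos table accesses that this sketch
deliberately ignores.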
Having fun,
George

------------------------------

Date: Mon, 1 Jan 2001 22:02:42 -0500 (EST)
From: Jason Stratos Papadopoulos <[EMAIL PROTECTED]>
Subject: Re: Mersenne: What I've learned about the P4 thus far (long)

On Mon, 1 Jan 2001, George Woltman wrote:

> Now the bad news: when I use the prime95 memory layout, where the 8
> input values come from 8 different cache lines and the modified values
> are output to the same cache lines (an in-place FFT), the P4 code takes
> 112 clocks.
>
> The cause is the new 64-byte cache line.  The L1 cache is write-back.
> Any changes to the L1 cache are written back to the L2 cache.  On the P3

Two things here.  First, doesn't the P4 have 128-byte cache lines?
Also, do you mean write *back* or write *through*?  To computer
architects, "write-back" usually means cache lines don't automatically
update the cache level below them.

> P4 has a 64-byte L1 cache line.  The update of the L2 cache is done
> with two writes of 32 bytes, each taking 7 clocks.  Thus, the P4 macro
> takes 8*14=112 clocks.

Are you sure of this timing?  The memory bus to L2 is supposed to be 256
bits wide, and is supposed to deliver data once per clock (or maybe
every other clock; the Coppermine P3 does that).  This would mean a
cache line write-back would take at least four processor clocks.  Or is
that super-speed delivery only supposed to be for reads, with writes
being buffered?

> After much thought (coming up with a scheme that does not cause
> thrashing of the 64 TLBs is not easy), I think I've worked out a new
> memory layout for prime95.  This is a mostly-in-place FFT where the
> outputs of the macros are written to consecutive memory locations.
> I've now begun the process of recoding all of the prime95 building
> blocks, then I'll code up the new memory layout for the P4.
> Obviously this will take a good deal of time.

Since the new L1 cache is tiny and the L2 is very large and almost as
fast, would your life become any easier if you coded for L2 cache
latency instead of L1?  Most of the work gets done in a single pass
through main memory; then the "rows" of the transform get processed in a
second pass that can barely tolerate L2 latency.  Or is the latency of
SSE2 instructions so high that you run out of registers and rely on
renaming already?

jasonp

------------------------------

Date: Mon, 01 Jan 2001 22:44:22 -0500
From: George Woltman <[EMAIL PROTECTED]>
Subject: Re: Mersenne: What I've learned about the P4 thus far (long)

Hi Jason,

At 10:02 PM 1/1/2001 -0500, Jason Stratos Papadopoulos wrote:
> > The cause is the new 64-byte cache line.  The L1 cache is write-back.
> > Any changes to the L1 cache are written back to the L2 cache.  On the P3
>
> Two things here; first, doesn't the P4 have 128 byte cache lines?  Also,
> do you mean write *back* or write *through*?

The L2 cache line is 128 bytes, but the L1 cache line is 64 bytes.

I get back/through confused all the time.  When you write to the L1
cache it is also written to the L2 cache, but not to main memory (until
the line is thrown out of the L2 cache).

> > P4 has a 64-byte L1 cache line.  The update of the L2 cache is done
> > with two writes of 32 bytes, each taking 7 clocks.  Thus, the P4 macro
> > takes 8*14=112 clocks.
>
> Are you sure of this timing?

Yes.

> The memory bus to L2 is supposed to be 256 bits wide,

256 bits is 32 bytes, which is why it takes 2 writes to copy an L1 cache
line to the L2 cache.

> and is supposed to deliver data once per clock (or maybe every
> other clock, the Coppermine P3 does that).  This would mean a cache line
> write back would take at least four processor clocks.
The 7-clock figure came from Tom's Hardware Guide; I'm sure it is in the
Intel manual somewhere.  It so beautifully explained the 112-clock
result I was seeing that I've not looked for the Intel reference.

> Or is that super-speed delivery only supposed to be for reads, with
> writes being buffered?

I'm not sure what the latency is for a read from the L2 cache.  Your
theory could be correct.  I'm sure that Intel engineers want to optimize
the read path more than the write path.

> Since the new L1 cache is tiny and the L2 is very large and almost as
> fast, would your life become any easier if you coded for L2 cache
> latency instead of L1?  Most of the work gets done in a single pass
> through main memory, then the "rows" of the transform get processed in
> a second pass that can barely tolerate L2 latency.

My plan is to work through main memory in 128KB chunks.  Within these
128KB chunks I'll make several passes working on 2KB sub-chunks to
maximize L1 cache hits.

> Or is the latency of SSE2 instructions so high that you run out of
> registers and rely on renaming already?

The latency is high, but so far I've not had too much trouble with
register pressure.  The four_complex_unfft macro used 3 temporary memory
locations due to register pressure.  Fortunately, store forwarding seems
to be keeping the cost at a minimum.

Regards,
George

------------------------------

Date: Tue, 2 Jan 2001 10:44:20 -0000
From: "Brindle, Gordon" <[EMAIL PROTECTED]>
Subject: Mersenne: Understanding the Propagate Carry Step in the LL Test

Hi everyone, and a Happy New Year to you all,

I wondered if anyone can help me understand (perhaps by way of an
example) how the propagate-carry step works in the Crandall/Fagin LL
test.
I have browsed the archives of this list but can't find anything that
fully answers my question.  I would like to implement an LL test in
Mathematica along the lines of the partial example given by Crandall in
Chapter 3 of "Topics in Advanced Scientific Computation", p. 94, but I
am having difficulty understanding this step (I've read the
Crandall/Fagin 1994 paper, too).  I believe this step is what Crandall
refers to as "add-with-carry, immediately after the weighted
convolution" on p. 96 of his book.

My interest is in trying to gain a mathematical understanding of all the
steps of a practical implementation of the LL test using the DWT (so
that I can extend Crandall's example into a fully-fledged LL test,
albeit one that works only with small exponents).

If you can help me to understand this step, I should be most grateful.

Regards,
Gordon.

Gordon Brindle
E [EMAIL PROTECTED]

------------------------------

Date: Wed, 03 Jan 2001 17:58:14 EST
From: [EMAIL PROTECTED]
Subject: Mersenne: Microsoft patents zeroes and ones

Check out http://www.theonion.com/onion3311/microsoftpatents.html

At $0.10 per binary digit, the royalty fee GIMPS would owe Microsoft for
the last Mersenne prime is nearly $700,000.  Should we divide this
equally amongst GIMPS participants, or bill people proportionally to
their CPU-year contributions?  I hope the fellow whose run actually
discovered the prime hasn't spent all his EFF prize money yet...

On the plus side, such royalty fees are legitimately tax deductible as
business expenses.  As usual, we can count on Bill Gates to look out for
the little guy.  Thanks, Bill!
-Ernst

------------------------------

End of Mersenne Digest V1 #806
******************************
