Mersenne Digest       Monday, December 13 1999       Volume 01 : Number 670

----------------------------------------------------------------------

Date: Thu, 9 Dec 1999 22:40:32 -0000
From: "McMac" <[EMAIL PROTECTED]>
Subject: Mersenne: Re: Mersenne Digest V1 #669

> How quickly can M(110503) be tested now-a-days on the fastest
> machines?
>
> - --Luke

Erm, not sure about fastest but my PII-450 took under four minutes to do
it (wasn't watching it, so I missed it ending).

McMac

"There must have been a door there in the wall, when I came in"
- -The Wall, Pink Floyd.

_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Thu, 9 Dec 1999 17:40:11 EST
From: [EMAIL PROTECTED]
Subject: Mersenne: Re: Testing M(110503)

Luke Welsh wrote:

>How quickly can M(110503) be tested now-a-days on the fastest machines?

Paul Novarese replied:

>All times below are with Mlucas2.7z
>
>Processor  Speed  OS      Time(sec)
>
>EV6        500    DU4.0f  128
>EV6        466    RH6.0   130
>EV5        300    DU4.0e  247
>EV4        166    DU5.0   1078
>
>How long did it take you back in 1988?

Luke replied:

>Ten, (or was it eleven?), minutes.

It's silly, since the architectures and algorithms are quite different,
but I remember how pleased I felt when I started timing out the
pre-release v2.7 Mlucas code and finally broke 15 minutes (which is what
I thought I recalled Luke saying he needed) to test M110503 on my
now-rather-humble 200 MHz ev4.

Luke (now feeling the powahh of the dahhk side :) again:

>I don't recall counting instructions and computing the MFLOP
>rate. Instead, I relied upon the compiler and profiler
>which said I was running very near the theoretical peak
>of 2.4 GFLOP. I just told people 2.0 GFLOP.
>
>So now it takes only 2 minutes on a 500 MHz EV6? Is that a
>1 GFLOP machine?
In theory, the ev6 can complete one floating add (FADD) and one FMUL per
cycle (as can all generations of the Alpha), so yes, a 500 MHz ev6 is
potentially a 1 GFLOP machine. However, at long FFT lengths memory
traffic and cache misses impose a penalty even on a beast like the ev6,
and since the FFT (at least the one in Mlucas) is dominated by floating
add/subtracts (on average two of these for every FMUL), the best one sees
in practice at large lengths is around 1 FLOP per cycle on average.

For M110503, on the other hand, one needs only a 6K FFT, which needs less
than 64KB for double-precision operands using an in-place FFT (6144
doubles at 8 bytes each is 48KB), i.e. the entire dataset can fit into
the ev6's 64KB L1 cache. Thus one is probably getting something
approaching the theoretical maximum (based on the clock rate and the
code's mix of instructions) of 750 MFLOP: with two FADDs per FMUL the add
pipeline is the limit, which gives 3 FLOP every 2 cycles.

>The DWT is twice as fast as my poor old
>FFT, and mixed-radix yields more improvements, but I don't
>see how the EV6 (1 GFLOP?) Lucas Lehmer is 5 times faster
>than the NEC SX/2 (2 GFLOP) was.

Actually, a factor-of-five speedup at similar FLOP rates is not as
unreasonable as it may appear at first sight. Indeed, the DWT cuts the
FFT length by a factor of two, but this generally speeds things up by
more than a factor of 2, especially when the dataset sizes are close to
the cache size(s). As you say, being able to use an FFT length of 6K
rather than the 8K a power-of-2 FFT would need gives a further speedup,
so now you're probably 3-4x faster than a 16K non-DWT-based FFT code
would be. Throw in the additional speedups in a code like Mlucas (e.g.
the combined FFT-pass/pointwise-square/IFFT-pass and
IFFT-pass/carry-propagation/FFT-pass routines) and you can expect an
additional 20-40% speedup. There's your factor of 5.

>Maybe the NEC profiler lied to me? Specs-manship? I always
>trusted it because we were much faster than Slowinski and
>a lot of prospective clients were testing the machine (oil
>companies).
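The figures in the reply above (peak rate, FFT working set, and the
pieces of the factor of 5) can be checked with a few lines of arithmetic.
This is only a sketch: the function and variable names are invented here
for illustration, and the inputs are simply the numbers quoted in the
message, not measurements.

```python
# Back-of-the-envelope check of the numbers quoted in the message.
# All names are hypothetical; nothing here comes from Mlucas itself.

CLOCK_HZ = 500e6        # 500 MHz ev6
ADDS_PER_MUL = 2        # stated average mix: two add/subtracts per FMUL


def peak_flops(clock_hz, adds_per_mul):
    """Peak rate when at most one FADD and one FMUL complete per cycle.

    A group of `adds_per_mul` adds plus one multiply needs at least
    `adds_per_mul` cycles (the add pipe is the bottleneck), or one
    cycle if there are no adds at all.
    """
    flops_per_group = adds_per_mul + 1
    cycles_per_group = max(adds_per_mul, 1)
    return clock_hz * flops_per_group / cycles_per_group


def fft_bytes(n_words, bytes_per_word=8):
    """Memory for an in-place FFT on n_words double-precision values."""
    return n_words * bytes_per_word


ideal_peak = peak_flops(CLOCK_HZ, 1)            # perfectly balanced mix
mix_peak = peak_flops(CLOCK_HZ, ADDS_PER_MUL)   # 2 adds per multiply
working_set = fft_bytes(6 * 1024)               # 6K FFT
l1_cache = 64 * 1024                            # 64KB L1, as stated

# Factor of 5: "3-4x" from DWT plus non-power-of-2 length,
# times "an additional 20-40%" from the combined passes.
speedup_low = 3 * 1.2
speedup_high = 4 * 1.4

print(f"ideal peak : {ideal_peak / 1e6:.0f} MFLOP")
print(f"mix peak   : {mix_peak / 1e6:.0f} MFLOP")
print(f"working set: {working_set // 1024} KB of {l1_cache // 1024} KB L1")
print(f"speedup    : {speedup_low:.1f}x to {speedup_high:.1f}x")
```

The range that comes out of the last step brackets 5, which is all the
"factor of 5" claim needs; the individual factors are rough estimates in
the original message and should be read that way.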
Indeed, 2-2.4 GFLOP does sound unreasonable, since that would imply
Mlucas is 10-12x faster than your code was. One other possibility
(besides the NEC profiler being wrong) comes to mind: your code (like my
pre-v2.7 ones) used an out-of-place FFT strategy, in which the data are
bounced between two arrays on each pass. In my old code I combined that
with a higher-radix FFT, i.e. one would read a block of data from one
array, do a bunch of operations (perhaps as large as a radix-16 DFT) on
them, and then write the results to a second array. That makes for very
simple data access patterns, which lend themselves to vector
architectures like the NEC. Doing lots of stuff between the array loads
and stores limits the penalty imposed by the extra memory traffic of the
two-array scheme to perhaps 20-30%. On the other hand, if one uses simple
radix-2 FFT passes (as I believe your code did), the loads and stores
will dominate the opcount, and the out-of-place scheme will have roughly
twice as many of these as an in-place FFT. So perhaps that accounts for
the additional factor of two that is apparently needed to reconcile the
timings of the two codes.

So I guess the oil companies were probably looking for pipelined
architectures? (Sorry Luke, I couldn't resist.)

- -Ernst

------------------------------

Date: Sat, 11 Dec 1999 16:15:27 -0500
From: "Frank_A_L_I_N_Y" <[EMAIL PROTECTED]>
Subject: Mersenne: P III took a 10 count

I just fried my 550 P III. My exponent was due Dec 26. Sending this
e-mail via my 166 MHz P I. P III will be fixed by 12/15/1999. Will this
change my exponent run in any way?

My news reader does not have a spell checker. I will have to use the
spell checker that doubles as a fly swatter.

Why is the word dictionary in the dictionary?
If you need the spelling, it's on the cover.
If you need the definition, you wouldn't know where to go to get the
word defined.

eye epologize N ad Vance fore N E spelin Miss steaks. <grin> (8>)>

------------------------------

Date: Mon, 13 Dec 1999 21:19:40 -0500
From: Pierre Abbat <[EMAIL PROTECTED]>
Subject: Mersenne: mprime 19.1 glibc

I'd like to upgrade mprime, and I see it's linked with glibc 2.1. I have
glibc 2.0.7. Will it work? I'm currently running 18.1.

phma

------------------------------

End of Mersenne Digest V1 #670
******************************
