Folks,

Compaq have a DS20 Alpha with two 500MHz 21264 CPUs on the internet for
people to try out.  I've modified the mers package so that it can print
out iteration timing (patches coming soon, Will!).  The iteration times
are fairly constant across 10 samples, so I've only listed one per
program/exponent.  The -C means don't dump a checkpoint at any time, and
-S N means print iteration times every N iterations.  The programs were
compiled with the DEC C compiler using "cc -fast -arch host -O4".  The
exponents were chosen just to demonstrate different FFT lengths.

Before anyone says anything, I know that the "iters/sec" should be
"secs/iter" :-)
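
For anyone who wants the numbers the right way round, here is a minimal
Python sketch that reads one "speed:" line and works out both secs/iter
and true iters/sec.  The regular expression is my assumption, based only
on the output format shown in this post, and rates() is a made-up helper
name, not part of the mers package.

```python
import re

# Assumed format, taken from the "speed:" lines in this post:
#   speed: <iters> iters in <seconds> seconds, ...
SPEED_RE = re.compile(r"speed:\s*(\d+)\s+iters in\s+([\d.]+)\s+seconds")

def rates(line):
    """Return (secs_per_iter, iters_per_sec) for one 'speed:' line."""
    match = SPEED_RE.search(line)
    if match is None:
        raise ValueError("not a speed line: %r" % line)
    iters = int(match.group(1))
    seconds = float(match.group(2))
    return seconds / iters, iters / seconds

sample = "speed: 10 iters in  1.362 seconds, 0.136 iters/sec (fft len   64k)"
secs_per_iter, iters_per_sec = rates(sample)
print(f"{secs_per_iter:.3f} secs/iter, {iters_per_sec:.2f} iters/sec")
```

The figure the programs label "iters/sec" is the first of the two values
returned here.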

% ./fftlucas -C -S 10 900001
speed: 10 iters in  1.362 seconds, 0.136 iters/sec (fft len   64k)
% ./fftlucas -C -S 10 1400001
speed: 10 iters in  3.168 seconds, 0.317 iters/sec (fft len  128k)  
% ./fftlucas -C -S 10 2900001
speed: 10 iters in  7.124 seconds, 0.712 iters/sec (fft len  256k) 
% ./fftlucas -C -S 10 5800001
speed: 10 iters in 16.410 seconds, 1.641 iters/sec (fft len  512k)

% ./mersenne1 -C -S 10 900001
speed: 10 iters in  0.832 seconds, 0.083 iters/sec (fft len   64k)
% ./mersenne1 -C -S 10 1400001
speed: 10 iters in  1.797 seconds, 0.180 iters/sec (fft len  128k)
% ./mersenne1 -C -S 10 2900001
speed: 10 iters in  4.257 seconds, 0.426 iters/sec (fft len  256k)
% ./mersenne1 -C -S 10 5800001
speed: 10 iters in  9.055 seconds, 0.905 iters/sec (fft len  512k)

% ./MacLucasUNIX -C -S 10 1400001
speed: 10 iters in  0.152 seconds, 0.015 iters/sec (fft len   64k)
% ./MacLucasUNIX -C -S 10 2900001
speed: 10 iters in  0.362 seconds, 0.036 iters/sec (fft len  128k)
% ./MacLucasUNIX -C -S 10 5800001
speed: 10 iters in  0.782 seconds, 0.078 iters/sec (fft len  256k)
% ./MacLucasUNIX -C -S 10 11600001
speed: 10 iters in  1.777 seconds, 0.178 iters/sec (fft len  512k)
% ./MacLucasUNIX -C -S 10 23200001
speed: 10 iters in  4.606 seconds, 0.461 iters/sec (fft len 1024k)
% ./MacLucasUNIX -C -S 10 46400001
speed: 10 iters in 13.601 seconds, 1.360 iters/sec (fft len 2048k)

And for the 10^n digit fans:

% ./MacLucasUNIX -C -S 10 33219281
speed: 10 iters in  4.634 seconds, 0.463 iters/sec (fft len  1024k)
% ./MacLucasUNIX -C -S 3 332192831
speed: 3 iters in 65.950 seconds, 21.983 iters/sec (fft len 16384k)

The machine "only" had 1GB of RAM, and the 16M FFT took up about 675MB
of RAM.  I couldn't test any larger numbers :-)


For comparison, this is MacLucasUNIX on a 500MHz AlphaPC164 (21164 CPU)
compiled with "gcc -mcpu=21164a -Wa,-m21164a -O6":

speed: 10 iters in  0.331 seconds, 0.033 iters/sec (fft len   64k)
speed: 10 iters in  0.842 seconds, 0.084 iters/sec (fft len  128k)
speed: 10 iters in  1.918 seconds, 0.192 iters/sec (fft len  256k)
speed: 10 iters in  4.219 seconds, 0.422 iters/sec (fft len  512k)
speed: 10 iters in 10.531 seconds, 1.053 iters/sec (fft len 1024k)

GCC isn't the best compiler around for floating point, so these figures
might not be the fairest comparison between the 21164 and 21264.

Also for comparison, here are some figures for MacLucasUNIX on a 200MHz
UltraSparc with different FFT lengths:

speed: 10 iters in  0.530 seconds, 0.053 iters/sec (fft len   64k)
speed: 10 iters in  1.739 seconds, 0.174 iters/sec (fft len  128k)
speed: 10 iters in  3.737 seconds, 0.374 iters/sec (fft len  256k)
speed: 10 iters in  7.459 seconds, 0.746 iters/sec (fft len  512k)
speed: 10 iters in 16.261 seconds, 1.626 iters/sec (fft len 1024k)

Even after dividing the UltraSparc iteration times by 2.5 to allow for
the clock difference (which assumes that memory bandwidth scales just as
well on the UltraSparcs), the Alpha 21264 comes out favorably.
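
The scaling can be checked with a few lines of Python.  This is only a
sketch: the times are the 10-iteration MacLucasUNIX figures copied from
the listings in this post, the runs are matched by FFT length alone (the
UltraSparc exponents aren't listed), and scaled_comparison() is a
made-up helper name.

```python
# Seconds per 10 iterations, keyed by FFT length, copied from this post.
ALPHA_21264 = {"64k": 0.152, "128k": 0.362, "256k": 0.782,
               "512k": 1.777, "1024k": 4.606}
ULTRASPARC_200 = {"64k": 0.530, "128k": 1.739, "256k": 3.737,
                  "512k": 7.459, "1024k": 16.261}

CLOCK_RATIO = 500 / 200  # 500MHz vs 200MHz, i.e. the factor of 2.5

def scaled_comparison(slow, fast, ratio):
    """Map fft_len -> (slow time / ratio, fast time, scaled / fast)."""
    result = {}
    for fft_len, fast_time in fast.items():
        scaled = slow[fft_len] / ratio
        result[fft_len] = (scaled, fast_time, scaled / fast_time)
    return result

comparison = scaled_comparison(ULTRASPARC_200, ALPHA_21264, CLOCK_RATIO)
for fft_len, (scaled, alpha, advantage) in comparison.items():
    print(f"{fft_len:>5}: UltraSparc/2.5 = {scaled:.3f}s, "
          f"21264 = {alpha:.3f}s, ratio {advantage:.2f}")
```

A ratio above 1 means the 21264 is still ahead after the clock
adjustment.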


Ernst - since nigel is no more, where can I get the latest f90 code?
I've got 2.5b, and it's giving me some errors:

    no restart file found...looking for range file...
    no range file found...switching to interactive mode.
   Enter p,n (set n=0 for default FFT length) >5100071,262144
   Enter 'y' to run a self-test, <return> for a full LL test >y
    p is prime...proceeding with Lucas-Lehmer test...
    using an FFT length of      262144
    this gives an average    19.4552268981934      bits per digit
   M 5100071 Roundoff warning on iteration      10 maxerr =  0.499997074861
    FATAL ERROR...Halting execution.

Testing 3100079 (at about 11 bits per digit) gives the same errors, and
this happens with or without optimisation turned on.
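
The "average bits per digit" figure in the log is just the exponent
divided by the FFT length, which a short Python sketch can reproduce.
bits_per_digit() is a name I've made up for illustration, and the second
call assumes the 3100079 run used the same FFT length of 262144.

```python
def bits_per_digit(p, fft_len):
    """Average number of bits of 2^p - 1 carried by each FFT element."""
    return p / fft_len

# Exponent and FFT length from the session shown above.
print(bits_per_digit(5100071, 262144))

# Assumption: same FFT length for the second exponent.
print(bits_per_digit(3100079, 262144))
```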


Well, that's my benchmarking done for the day...

Simon.
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
