[EMAIL PROTECTED] wrote:
> Dear All: I'm catching up on lots of postings, so forgive me if this
> is long-winded.
I'll give you a long-winded reply then :-)
> Simon Burge writes:
>
> >Ernst - since nigel is no more, where can I get the latest f90
> >code? I've got 2.5b, and it's giving me some errors:
>
> I'm not sure why it's error-exiting, but since you compiled locally
> I suspect overly aggressive compile options. In particular, if you
> use -fast, you must also use -assume accuracy_sensitive, to keep
> the compiler from eliminating the (x+rnd)-rnd operations used to
> effect a fast NINT in the carry phase. Also, you might try both
> -O4 and -O5: the latter sometimes gives slower executables, in a
> platform-dependent way, and should be used with caution.
It looks like the DEC C compiler has the same problem: adding "-assume
accuracy_sensitive" to the compiler command line fixed the problem where
MacLucasUnix thought it could do 33219281 with a 1M FFT. It now detects
an error and bumps up to a 2M FFT pretty quickly (within the first ten
iterations).
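For anyone who hasn't seen the trick, here is a minimal sketch of the
(x+rnd)-rnd rounding and of what an over-eager optimizer effectively turns
it into. It is written in Python rather than Fortran or C, and the constant
RND = 1.5*2^52 is an assumption for IEEE-754 doubles; the value actually
used in Mlucas or MacLucasUnix is not given above.

```python
# Assumed magic constant for IEEE-754 double precision (not taken from the
# Mlucas or MacLucasUnix sources): at this magnitude the spacing between
# adjacent doubles is exactly 1, so adding it discards the fraction of x.
RND = 1.5 * 2.0**52


def fast_nint(x: float) -> float:
    """Round x to the nearest integer by adding and subtracting RND.

    Only meaningful for |x| well below 2**51.
    """
    return (x + RND) - RND


def reassociated(x: float) -> float:
    """What the expression becomes if it is rewritten as x + (RND - RND).

    This models an optimizer that treats floating-point addition as exact
    real arithmetic: the rounding step disappears entirely.
    """
    return x + (RND - RND)


if __name__ == "__main__":
    for value in (2.4, -3.7, 0.49):
        print(value, fast_nint(value), reassociated(value))
```

If the compiler performs the second form, the carry step receives
unrounded values, which would explain wrong results or a failure to
notice that the FFT is too short.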
> In any event, you can get the new improved version 2.6b via
>
> ftp://209.133.33.182/pub/mayer/Mlucas_2.6b.f90.gz
>
> (David Willmore is the only person who got the short-lived v2.6a;
> David, 2.6b is about 10% faster, so you may want to grab it now.)
Have you got 2.6x and 2.6a mixed up? From what I understand, David used
2.6x for the double check, and 2.6a is currently on your ftp site.
> Brian Beesley writes (about MacLucasUnix):
>
> >I find, running MLU on a Alpha 21164-533, 128K FFT works up to about
> >exponent 2.35 million, & pro rata. MLU on a Sparc seems to be able to
> >run a bit higher, somewhere around 2.45 million seems to be OK for a
> >128K FFT.
>
> Hmm, those upper limits seem a bit low. Does MacLucasUnix tell you when
> the exponent is too large? Some related postings in the last digest
> seem to show people using exponents much too large for a given FFT size
> but getting no error messages - that would be bad.
It does, when (as I mentioned above) compiled with the right options...
> For comparison,
> Mlucas, on machines that support real*16 sincos inits (Alpha and SGI),
> can go up to the following p's (I omit 160, 192 and 224K for brevity):
>
> size: 128K 256K 320K 384K 448K 512K 640K 768K 896K 1024K
> pmax: 2.62M 5.20M 6.46M 7.71M 8.96M 10.2M 12.6M 15.1M 17.5M 20M
Here are some _very_ rough figures for MacLucasUnix on the DS20:
size:  128K   256K   320K   384K   448K   512K   640K   768K   896K   1024K
pmax:  2.38M  4.98M   -      -      -     9.3M    -      -      -     18.8M
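One way to compare these limits is as average bits per FFT digit, i.e.
the exponent divided by the FFT length. The sketch below uses the Mlucas
pmax figures quoted above; the helper name bits_per_digit is made up for
illustration, and "K" is assumed to mean 1024.

```python
# pmax values copied from the quoted Mlucas table (FFT length in K -> pmax).
MLUCAS_PMAX = {
    128: 2.62e6, 256: 5.20e6, 320: 6.46e6, 384: 7.71e6, 448: 8.96e6,
    512: 10.2e6, 640: 12.6e6, 768: 15.1e6, 896: 17.5e6, 1024: 20e6,
}


def bits_per_digit(exponent: float, fft_k: int) -> float:
    """Average number of bits of the exponent carried by each FFT element."""
    return exponent / (fft_k * 1024)


if __name__ == "__main__":
    for fft_k, pmax in MLUCAS_PMAX.items():
        print(f"{fft_k:5d}K  {bits_per_digit(pmax, fft_k):6.2f} bits/digit")
    # The case mentioned earlier: 33219281 with a 1M FFT versus a 2M FFT.
    print(bits_per_digit(33219281, 1024), bits_per_digit(33219281, 2048))
```

By this measure the Mlucas limits sit between roughly 19 and 20 bits per
digit, while 33219281 on a 1M FFT would need well over 30, so an error
there is to be expected.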
> David Willmore's timings of the beta of Mlucas 2.6 indicate that the
> code runs about 3 times faster on a 500MHz 21264 than on a 400MHz 21164,
> so dividing the numbers in the 21164 column by 3 should yield a decent
> estimate of 21264 timings until I can get some actual timing data.
Slightly more than three times - here's your table with a column for the
21264 added:
Platform/per-iteration time (sec)

              200MHz 21064    400MHz 21164    195MHz R10000   250MHz R10000   500MHz 21264
cache sizes:  unknown         8kB D-cache     32kB D-cache    32kB D-cache    64kB I-cache
                              96kB mixed I/D                                  64kB D-cache
                              512kB L2        4MB L2          1MB L2          4MB L2
FFT length:   ------------    ------------    -------------   -------------   ------------
128K          0.32            0.12            0.096           0.095           0.043
160K          0.37            0.17            0.14            0.14            0.051
192K          0.48            0.22            0.17            0.17            0.062
224K          0.58            0.26            0.21            0.20            0.081
256K          0.63            0.29            0.25            0.23            0.10
320K          0.87            0.39            0.33            0.29            0.12
384K          1.06            0.49            0.40            0.35            0.16
448K          1.29            0.58            0.49            0.42            0.19
512K          1.39            0.65            0.56            0.47            0.21
640K          1.88            0.84            0.70            0.60            0.28
768K          2.35            1.15            0.96            0.80            0.38
896K          2.73            1.22            1.04            0.86            0.40
1024K         2.96            1.36            1.17            0.96            0.46
These results are for a copy of Mlucas_2.6a.f90 I compiled with:
f90 -o lm -tune ev6 -O5 lucas_mayer_V2.5b.f90
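The "slightly more than three times" figure can be checked against the
table with a few lines of Python. This is only a sketch: the numbers are
copied from the 400MHz 21164 and 500MHz 21264 columns, and the function
name speedups is invented for the example.

```python
from statistics import mean

FFT_K = [128, 160, 192, 224, 256, 320, 384, 448, 512, 640, 768, 896, 1024]
# Per-iteration times in seconds, copied from the table.
T_21164 = [0.12, 0.17, 0.22, 0.26, 0.29, 0.39, 0.49, 0.58, 0.65, 0.84,
           1.15, 1.22, 1.36]
T_21264 = [0.043, 0.051, 0.062, 0.081, 0.10, 0.12, 0.16, 0.19, 0.21, 0.28,
           0.38, 0.40, 0.46]


def speedups(slow, fast):
    """Element-wise ratio of the slower timings to the faster ones."""
    return [s / f for s, f in zip(slow, fast)]


ratios = speedups(T_21164, T_21264)

if __name__ == "__main__":
    for fft_k, ratio in zip(FFT_K, ratios):
        print(f"{fft_k:5d}K  {ratio:4.2f}x")
    print(f"mean speedup: {mean(ratios):4.2f}x")
```

The ratio varies with FFT length, so a single divide-by-3 estimate is
only a rough guide for any one size.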
The Mlucas_2.6a.exe from your FTP site gives the same results but is
slightly slower (around 5%) on the DS20:
% cat foo
15000017,0
y
750
% time ./Mlucas_2.6a.exe < foo
no restart file found...looking for range file...
no range file found...switching to interactive mode.
Enter p,n (set n=0 for default FFT length) >
Enter 'y' to run a self-test, <return> for a full LL test >
Enter number of iterations for timing test>
p is prime...proceeding with Lucas-Lehmer test...
M( 15000017 ): using an FFT length of 786432
this gives an average 19.0735079447428 bits per digit
750 iterations of M15000017 with FFT length 786432
Res64: 545ACAF7C5DB12F5. Program: E2.6a
Clocks = 00:04:59.369
298.72u 0.09s 4:59 99% 0+368k 0+7io 0pf+0w
% time ./lm26a < foo
no restart file found...looking for range file...
no range file found...switching to interactive mode.
Enter p,n (set n=0 for default FFT length) >
Enter 'y' to run a self-test, <return> for a full LL test >
Enter number of iterations for timing test>
p is prime...proceeding with Lucas-Lehmer test...
M( 15000017 ): using an FFT length of 786432
this gives an average 19.0735079447428 bits per digit
750 iterations of M15000017 with FFT length 786432
Res64: 545ACAF7C5DB12F5. Program: E2.6a
Clocks = 00:04:42.171
281.59u 0.11s 4:42 99% 0+368k 0+6io 0pf+0w
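To put numbers on the difference, the user-time figures from the two
`time` lines above can be turned into per-iteration times and a relative
slowdown. A small Python sketch, with helper names chosen only for this
example:

```python
ITERATIONS = 750  # iteration count from the "foo" input file

# User CPU seconds reported by `time` for each binary.
USER_SECONDS = {"Mlucas_2.6a.exe": 298.72, "lm26a": 281.59}


def per_iteration(user_seconds: float, iterations: int = ITERATIONS) -> float:
    """Average CPU seconds spent on one Lucas-Lehmer iteration."""
    return user_seconds / iterations


def relative_slowdown(slow: float, fast: float) -> float:
    """Fraction by which the slower run exceeds the faster one."""
    return slow / fast - 1.0


if __name__ == "__main__":
    for name, seconds in USER_SECONDS.items():
        print(f"{name}: {per_iteration(seconds):.4f} s/iteration")
    gap = relative_slowdown(USER_SECONDS["Mlucas_2.6a.exe"],
                            USER_SECONDS["lm26a"])
    print(f"slowdown of the FTP binary: {gap:.1%}")
```

The locally compiled binary comes out near 0.375 s per iteration, in line
with the 768K entry in the 21264 column, and the FTP binary is about 6%
slower on this run.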
I can give you the raw data for my additions to the table (including
exponents, iterations and residues) if you want.
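For reference, this is the kind of check the residues allow. The sketch
below is a plain big-integer Lucas-Lehmer iteration in Python, not the
FFT-based method Mlucas uses, and it assumes that Res64 means the low 64
bits of the residue after the given number of iterations. The names
ll_residue and res64 are made up here.

```python
def ll_residue(p: int, iterations=None) -> int:
    """Run the Lucas-Lehmer recurrence s -> s*s - 2 (mod 2**p - 1) from s = 4.

    With iterations=None the full p - 2 steps are done, and a result of 0
    means that 2**p - 1 is prime.
    """
    modulus = (1 << p) - 1
    steps = p - 2 if iterations is None else iterations
    s = 4
    for _ in range(steps):
        s = (s * s - 2) % modulus
    return s


def res64(residue: int) -> str:
    """Low 64 bits of a residue as 16 hex digits (assumed Res64 convention)."""
    return format(residue & 0xFFFFFFFFFFFFFFFF, "016X")


if __name__ == "__main__":
    for p in (7, 11, 13):
        verdict = "prime" if ll_residue(p) == 0 else "composite"
        print(f"M{p}: {verdict}")
    print(res64(ll_residue(7, iterations=2)))
```

This is far too slow for exponents like 15000017, but it is handy for
sanity-checking a residue convention on small cases.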
Simon.