[EMAIL PROTECTED] wrote:

> Dear All: I'm catching up on lots of postings, so forgive me if this
> is long-winded.

I'll give you a long-winded reply then :-)

> Simon Burge writes:
> 
> >Ernst - since nigel is no more, where can I get the latest f90
> >code? I've got 2.5b, and it's giving me some errors:
> 
> I'm not sure why it's error-exiting, but since you compiled locally
> I suspect overly aggressive compile options. In particular if you
> use -fast, you must also use -assume accuracy_sensitive, to keep
> the compiler from eliminating the (x+rnd)-rnd operations used to
> effect a fast NINT in the carry phase. Also, you might try both
> -O4 and -O5: the latter sometimes gives slower executables, in a
> platform-dependent way, and should be used with caution.

It looks like the DEC C compiler has the same problem - adding "-assume
accuracy_sensitive" to the compiler command line fixed the problem where
MacLucasUnix thought it could do 33219281 with a 1M FFT - it does detect
an error and bumps up to a 2M FFT pretty quickly (within the first ten
iterations).
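
For anyone wondering why the optimizer matters here, the (x+rnd)-rnd trick can be sketched as follows. This is only an illustration in Python (whose floats are IEEE doubles), not code from Mlucas or MacLucasUnix. The value of RND (1.5 * 2^52) is an assumption; the actual constant in either program may differ. Note that ties round to even here, which is not the same as Fortran's NINT on exact halves.

```python
# Assumed constant: 1.5 * 2**52. Doubles in [2**52, 2**53) are spaced exactly
# 1.0 apart, so adding RND forces the sum to be rounded to an integer.
RND = 3.0 * 2.0**51


def fast_nint(x: float) -> float:
    """Round x to the nearest integer via (x + rnd) - rnd; valid for |x| < 2**51."""
    return (x + RND) - RND


def algebraically_simplified(x: float) -> float:
    """What is left if a compiler treats (x + rnd) - rnd as plain x."""
    return x


if __name__ == "__main__":
    for x in (2.3, -1.7, 123456.49, 0.5000001):
        print(x, fast_nint(x), algebraically_simplified(x))
```

If the compiler performs the simplification in the second function, no rounding happens at all, and the carry step silently works on unrounded values.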

> In any event, you can get the new improved version 2.6b via
> 
> ftp://209.133.33.182/pub/mayer/Mlucas_2.6b.f90.gz
> 
> (David Willmore is the only person who got the short-lived v2.6a;
> David, 2.6b is about 10% faster, so you may want to grab it now.)

Have you got 2.6x and 2.6a mixed up?  From what I understand, David used
2.6x for the double check, and 2.6a is currently on your ftp site.

> Brian Beesley writes (about MacLucasUnix):
> 
> >I find, running MLU on an Alpha 21164-533, 128K FFT works up to about
> >exponent 2.35 million, & pro rata. MLU on a Sparc seems to be able to 
> >run a bit higher, somewhere around 2.45 million seems to be OK for a 
> >128K FFT.
> 
> Hmm, those upper limits seem a bit low. Does MacLucasUnix tell you when
> the exponent is too large? Some related postings in the last digest
> seem to show people using exponents much too large for a given FFT size
> but getting no error messages - that would be bad.

It does, when (as I mentioned above) compiled with the right options...

> For comparison,
> Mlucas, on machines that support real*16 sincos inits (Alpha and SGI),
> can go up to the following p's (I omit 160, 192 and 224K for brevity):
> 
> size: 128K   256K   320K   384K   448K   512K   640K   768K   896K  1024K
> pmax: 2.62M  5.20M  6.46M  7.71M  8.96M  10.2M  12.6M  15.1M  17.5M  20M

Here are some _very_ rough figures for MacLucasUnix on the DS20:

size: 128K   256K   320K   384K   448K   512K   640K   768K   896K  1024K
pmax: 2.38M  4.98M                       9.3M                       18.8M 

> David Willmore's timings of the beta of Mlucas 2.6 indicate that the
> code runs about 3 times faster on a 500MHz 21264 than on a 400MHz 21164,
> so dividing the numbers in the 21164 column by 3 should yield a decent
> estimate of 21264 timings until I can get some actual timing data.

Slightly more than three times - here's your table with a column for the
21264 added:

                                Platform/per-iteration time (sec)
            200MHz 21064   400MHz 21164   195MHz R10000   250MHz R10000   500MHz 21264
            cache sizes    8kB D-cache    32kB D-cache    32kB D-cache    64kB I-cache
            unknown        96kB mixed I/D                                 64kB D-cache
                           512kB L2       4MB L2          1MB L2          4MB L2
FFT length: ------------   ------------   -------------   -------------   -------------
 128K       0.32           0.12           0.096           0.095           0.043
 160K       0.37           0.17           0.14            0.14            0.051
 192K       0.48           0.22           0.17            0.17            0.062
 224K       0.58           0.26           0.21            0.20            0.081
 256K       0.63           0.29           0.25            0.23            0.10
 320K       0.87           0.39           0.33            0.29            0.12
 384K       1.06           0.49           0.40            0.35            0.16
 448K       1.29           0.58           0.49            0.42            0.19
 512K       1.39           0.65           0.56            0.47            0.21
 640K       1.88           0.84           0.70            0.60            0.28
 768K       2.35           1.15           0.96            0.80            0.38
 896K       2.73           1.22           1.04            0.86            0.40
1024K       2.96           1.36           1.17            0.96            0.46

These results are for a copy of Mlucas_2.6a.f90 I compiled with:

        f90 -o lm -tune ev6 -O5 lucas_mayer_V2.5b.f90
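
As a rough check on the "slightly more than three times" estimate, the ratio between the 400MHz 21164 and 500MHz 21264 columns of the table above can be computed per FFT length. This is a small sketch in Python; the numbers are copied from the table, and the names TIMES and speedups are just for illustration.

```python
# FFT length (in K) -> (21164 sec/iter, 21264 sec/iter), copied from the table.
TIMES = {
    128: (0.12, 0.043), 160: (0.17, 0.051), 192: (0.22, 0.062),
    224: (0.26, 0.081), 256: (0.29, 0.10), 320: (0.39, 0.12),
    384: (0.49, 0.16), 448: (0.58, 0.19), 512: (0.65, 0.21),
    640: (0.84, 0.28), 768: (1.15, 0.38), 896: (1.22, 0.40),
    1024: (1.36, 0.46),
}


def speedups(times):
    """Return {FFT length: 21164 time / 21264 time}."""
    return {size: old / new for size, (old, new) in times.items()}


if __name__ == "__main__":
    ratios = speedups(TIMES)
    for size, ratio in ratios.items():
        print(f"{size:5d}K  {ratio:.2f}x")
    print(f"mean   {sum(ratios.values()) / len(ratios):.2f}x")
```

The timings are only given to two significant figures, so the individual ratios should not be read too closely.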

The Mlucas_2.6a.exe from your FTP site gives the same results but is
slightly slower (around 5%) on the DS20:

        % cat foo
        15000017,0
        y
        750

        % time ./Mlucas_2.6a.exe < foo
          no restart file found...looking for range file...
          no range file found...switching to interactive mode.
         Enter p,n (set n=0 for default FFT length) >
         Enter 'y' to run a self-test, <return> for a full LL test >
          Enter number of iterations for timing test>
          p is prime...proceeding with Lucas-Lehmer test...
         M( 15000017 ): using an FFT length of  786432
          this gives an average    19.0735079447428      bits per digit
             750 iterations of M15000017 with FFT length  786432
         Res64: 545ACAF7C5DB12F5. Program: E2.6a
         Clocks = 00:04:59.369
        298.72u 0.09s 4:59 99% 0+368k 0+7io 0pf+0w

        % time ./lm26a < foo
          no restart file found...looking for range file...
          no range file found...switching to interactive mode.
         Enter p,n (set n=0 for default FFT length) >
         Enter 'y' to run a self-test, <return> for a full LL test >
          Enter number of iterations for timing test>
          p is prime...proceeding with Lucas-Lehmer test...
         M( 15000017 ): using an FFT length of  786432
          this gives an average    19.0735079447428      bits per digit
             750 iterations of M15000017 with FFT length  786432
         Res64: 545ACAF7C5DB12F5. Program: E2.6a
         Clocks = 00:04:42.171
        281.59u 0.11s 4:42 99% 0+368k 0+6io 0pf+0w
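
The figures in the two runs above can be cross-checked with a little arithmetic. The "bits per digit" value printed by the program is consistent with the exponent divided by the FFT length, and the user time divided by the iteration count gives the per-iteration time. A minimal sketch in Python, with hypothetical helper names:

```python
def bits_per_digit(p: int, fft_length: int) -> float:
    """Average number of bits of the exponent p carried per FFT element."""
    return p / fft_length


def seconds_per_iteration(user_seconds: float, iterations: int) -> float:
    """Per-iteration time from the user CPU time reported by `time`."""
    return user_seconds / iterations


if __name__ == "__main__":
    # 786432 is the 768K FFT length chosen for M15000017 in the runs above.
    print(bits_per_digit(15000017, 768 * 1024))
    ftp_build = seconds_per_iteration(298.72, 750)    # Mlucas_2.6a.exe
    local_build = seconds_per_iteration(281.59, 750)  # locally compiled lm26a
    print(ftp_build, local_build, ftp_build / local_build)
```

The locally compiled run works out to roughly 0.38 sec per iteration, in line with the 768K entry in the 21264 column of the table.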

I can give you the raw data for my additions to the table (including
exponents, iterations and residues) if you want.

Simon.
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
