[EMAIL PROTECTED] wrote:

> Simon Burge writes (Hi, Simon!):

Howdy!

>  The 2.6a source has been moved to the /pub/archived directory, for those
> of you who want to see what changes I made to the source to squeeze more
> speed out of the code.

I'm curious - there seem to be definite patterns in the sections of
code (especially when comparing the different length radix subroutines).
Is any of this code machine generated, or is it all hand generated?  In
the past I attempted to get a C version of your code working but didn't
quite get there, and then Gord Palameta did a version last year.  I was
thinking of taking a stab at trying 2.6b again...

> I look forward to the revised (and hopefully improved) ev6 timings!

Here are some more times with a few explanatory notes below:

                                Platform/per-iteration time (sec)
            500MHz 21264    500MHz 21264    500MHz 21264
            64kB I-cache    64kB I-cache    64kB I-cache
            64kB D-cache    64kB D-cache    64kB D-cache
            4MB L2          4MB L2          4MB L2
            Mlucas2.6a[1]   Mlucas2.6b[2]   Mlucas2.6b[3]  [4]
FFT length: -------------   -------------   -------------  -------
 128K       0.043           0.040           0.041          0.001
 160K       0.051           0.049           0.053          0.003
 192K       0.062           0.062           0.066          0.004
 224K       0.081           0.081           0.081          0.001
 256K       0.10            0.089           0.093          0.003
 320K       0.12            0.112           0.118          0.005
 384K       0.16            0.14            0.15           0.009
 448K       0.19            0.18            0.18           0.003
 512K       0.21            0.19            0.20           0.004
 640K       0.28            0.25            0.26           0.015
 768K       0.38            0.34            0.36           0.011
 896K       0.40            0.38            0.38           0.006
1024K       0.46            0.42

[1] - This is the average for the first N iters INCLUDING SETUP TIME.
      N ranges from 8000 for a 128K FFT to 500 for a 1024K FFT.
[2] - This is the average for the time difference between a 100 iter
      run and a 200 iter run as described in the README file and
      completely ignored by me the first time around :-)  This is using
      my binary compiled with:
        f90 -arch ev6 -tune ev6 -fast -O5 -assume accuracy_sensitive
[3] - As for [2], but using Ernst's Mlucas_2.6b.exe.ev6 binary.  In
      both [2] and [3], the residues matched.
[4] - The time saved in secs/iter by my binary [2] relative to Ernst's
      binary [3].  Ernst - what compiler options did you use?
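
For anyone wanting to repeat the measurement in [2], here is a rough
sketch of the arithmetic in Python.  The function name and the sample
totals are made up for illustration (they are not measurements from the
table above); the only idea taken from the README method is that
subtracting a 100 iter run from a 200 iter run cancels the fixed setup
time.

```python
def per_iteration_time(t_short, t_long, n_short=100, n_long=200):
    """Estimate seconds per iteration from two total run times.

    Both runs pay the same one-off setup cost, so it cancels in the
    difference and only the extra iterations are left.
    """
    if n_long <= n_short:
        raise ValueError("n_long must exceed n_short")
    return (t_long - t_short) / (n_long - n_short)


# Hypothetical totals: 3.0 s of setup plus 0.04 s per iteration.
setup, per_iter = 3.0, 0.04
t100 = setup + 100 * per_iter   # total wall-clock time, 100 iter run
t200 = setup + 200 * per_iter   # total wall-clock time, 200 iter run

estimate = per_iteration_time(t100, t200)   # setup cancelled out
naive = t200 / 200                          # setup included, as in [1]

print(f"differenced: {estimate:.3f} s/iter, naive: {naive:.3f} s/iter")
```

With these made-up numbers the naive average comes out noticeably
higher than the differenced figure, which is the same kind of bias the
2.6a column in [1] carries.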
     
Simon.
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
