Dear All: some corrections to my two postings of yesterday (West Coast U.S.
yesterday, at least):

The error summary for Mlucas 2.6c should read as follows:

1) The first exponent tested at a particular FFT length should be fine;
2) Any subsequent exponent at the same length (whether or not exponents
   using a different runlength were tested in between) will give bad results.

Some of the Alpha 21064 timings in the Mlucas 2.7 timings table were wrong
(they were for 2.6, not 2.7). The corrected table follows; the corrected
timings are indicated with a +. I also replaced the tabs with spaces, so
hopefully the table will transmit better this time. (If it's misaligned on
your end, try switching your browser or edit window to a fixed-width font):

                   Platform/per-iteration time (sec)
           200MHz 21064 400MHz 21164 195MHz R10K  250MHz R10K
           cache sizes  8KB L1       32KB L1      32KB L1
           unknown      96KB mixed
                        512KB L2     4MB L2       1MB L2
FFT length ----------   ----------   ----------   -------------
  64K        .095        .035         .041         .035
  80K        .12         .045         .054         .047
  96K        .16         .057         .069         .062
 112K        .19         .069         .082         .074
 128K        .21         .078         .100         .090
 160K        .27         .098         .118         .115
 192K        .32         .115         .143         .144
 224K        .39         .140         .170         .170
 256K        .48         .177         .221         .210
 320K        .65         .241         .261         .248
 384K        .81+        .316         .345         .317
 448K        .98+        .399         .388         .354
 512K       1.17+        .545         .525         .451
 640K       1.50+        .620         .649         .543
 768K       1.82+        .756         .814         .659
 896K       2.16+        .890         .932         .771
1024K       2.42+       1.20*        1.16          .937
1280K       3.20        1.32         1.40         1.13
1536K       4.15        1.86         1.90*        1.54*
1792K       4.99        2.13         2.04         1.68
2048K       5.45        2.73         2.57         2.22
2560K       6.93        3.16         3.25         2.61
3072K       8.33        4.02         3.92         3.16
3584K       9.96        4.53         4.58         3.69
4096K      11.42        5.62         6.14         7.26*

Also, in my comments regarding the anomalous timings (*) in the table
yesterday, I had no explanation for the slowish 21164 time at 1024K. It
may in fact be that at 1024K FFT length, the small FFT sincos and DWT
weights tables (which contain sqrt(n) 64-bit floats each) are each 8KB and
thus can't reside completely in the 21164's 8KB L1 cache along with anything
else. The MIPS R10000 has a 32KB L1 cache, so doesn't suffer the same
problem. Thus, the only remaining unexplained anomaly is the truly bizarre
behavior on the 250MHz R10000 at 4096K.
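
The arithmetic behind that guess can be checked with a few lines of Python.
This is only a sketch of the size estimate, not anything taken from Mlucas:
the function name and the cache constants are illustrative, and it assumes
nothing beyond what is stated above (each table holds sqrt(n) 64-bit floats,
i.e. 8 bytes per entry).

```python
import math

BYTES_PER_FLOAT64 = 8  # one 64-bit float

def table_bytes(fft_length):
    """Bytes used by one table of sqrt(n) 64-bit floats (assumed layout)."""
    entries = math.isqrt(fft_length)  # exact when n is a perfect square
    return entries * BYTES_PER_FLOAT64

n = 1024 * 1024            # 1024K FFT length
one_table = table_bytes(n) # 1024 entries * 8 bytes = 8192 bytes = 8KB

L1_21164 = 8 * 1024        # 8KB L1, as listed in the table header
L1_R10K = 32 * 1024        # 32KB L1, as listed in the table header

# A single table already fills the 21164's L1 completely, leaving no
# room for the second table or for any data.
fills_21164_l1 = one_table >= L1_21164

# Both tables together (sincos + DWT weights) still fit in the R10000's L1
# with room to spare.
both_fit_r10k_l1 = 2 * one_table < L1_R10K

print(one_table, fills_21164_l1, both_fit_r10k_l1)
```

At 256K the same estimate gives 4KB per table, so the pressure on an 8KB L1
would only become complete at 1024K, which is where the 21164 anomaly appears.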

-Ernst
