Dear All: some corrections to my two postings of yesterday (West Coast U.S.
yesterday, at least):
The error summary for Mlucas 2.6c should read as follows:
1) Any first exponent at a particular FFT length should be fine;
2) Any subsequent exponent at the same length (whether or not exponents
using a different runlength are run in between) will be bad.
Some of the Alpha 21064 timings in the Mlucas 2.7 timings table were wrong
(they were for 2.6, not 2.7). The corrected table follows; the corrected
timings are indicated with a +. I also replaced the tabs with spaces, so
hopefully the table will transmit better this time. (If it's misaligned on
your end, try switching your browser or edit window to a fixed-width font):
Platform/per-iteration time (sec)

              200MHz 21064   400MHz 21164   195MHz R10K    250MHz R10K
cache sizes                  8KB L1         32KB L1        32KB L1
              unknown        96KB mixed
                             512KB L2       4MB L2         1MB L2
FFT length    ------------   ------------   -----------    -----------
64K             .095           .035           .041           .035
80K             .12            .045           .054           .047
96K             .16            .057           .069           .062
112K            .19            .069           .082           .074
128K            .21            .078           .100           .090
160K            .27            .098           .118           .115
192K            .32            .115           .143           .144
224K            .39            .140           .170           .170
256K            .48            .177           .221           .210
320K            .65            .241           .261           .248
384K            .81+           .316           .345           .317
448K            .98+           .399           .388           .354
512K           1.17+           .545           .525           .451
640K           1.50+           .620           .649           .543
768K           1.82+           .756           .814           .659
896K           2.16+           .890           .932           .771
1024K          2.42+          1.20*          1.16            .937
1280K          3.20           1.32           1.40           1.13
1536K          4.15           1.86           1.90*          1.54*
1792K          4.99           2.13           2.04           1.68
2048K          5.45           2.73           2.57           2.22
2560K          6.93           3.16           3.25           2.61
3072K          8.33           4.02           3.92           3.16
3584K          9.96           4.53           4.58           3.69
4096K         11.42           5.62           6.14           7.26*

Also, in my comments yesterday regarding the anomalous timings (*) in the
table, I had no explanation for the slowish 21164 time at 1024K. It may in
fact be that at 1024K FFT length, the small FFT sincos and DWT weights
tables (which contain sqrt(n) 64-bit floats each, i.e. 1024 x 8 bytes) are
each 8KB and thus can't reside completely in the 21164's 8KB L1 cache along
with anything else. The MIPS R10000 has a 32KB L1 cache, so it doesn't
suffer the same problem. Thus, the only remaining unexplained anomaly is the
truly bizarre behavior on the 250MHz R10000 at 4096K.
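
For anyone who wants to check the table-size arithmetic above, here is a
rough sketch in Python. It is not taken from the Mlucas source: the names
table_bytes and leaves_room are made up for illustration, and the test is
a plain capacity comparison that ignores cache associativity, line size
and whatever else competes for L1. It assumes each table holds exactly
sqrt(n) 8-byte floats, as stated above, so it is only meaningful for FFT
lengths that are perfect squares.

```python
import math

BYTES_PER_FLOAT64 = 8  # a 64-bit float occupies 8 bytes


def table_bytes(fft_length):
    """Bytes in one table of sqrt(fft_length) 64-bit floats.

    Assumption: the table has exactly sqrt(n) entries; isqrt rounds
    down, so only perfect-square lengths give an exact answer.
    """
    entries = math.isqrt(fft_length)
    return entries * BYTES_PER_FLOAT64


def leaves_room(table_size, l1_size):
    """True if one table fits in L1 with at least some space left over."""
    return table_size < l1_size


# L1 data cache sizes as quoted in this post.
L1_21164 = 8 * 1024
L1_R10000 = 32 * 1024

for kilo in (256, 1024, 4096):
    size = table_bytes(kilo * 1024)
    print(f"{kilo}K: {size // 1024}KB per table, "
          f"room left on 21164: {leaves_room(size, L1_21164)}, "
          f"room left on R10000: {leaves_room(size, L1_R10000)}")
```

At 1024K this gives 8192 bytes per table, which exactly fills an 8KB L1
and leaves nothing for the second table or the data being transformed.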
-Ernst