Guillermo Ballester Valor writes:

<< Do you Know the GNU-FFTW package ?(The Fastest Fourier Transform in the
West). Last week I thought it would be interesting to see if it is as
fast as they say. It is really a fast C-written FFT code. >>

Hi, Guillermo (please, let's use first names - I'm an informal guy) -

While not having used it myself, certainly I've heard of FFTW. Note that
a better person to comment on your questions might be Jason Papadopoulos
([EMAIL PROTECTED]) - he adapted FFTW for his very fast SPARC Fermat
number code.

My comment is this: If you want to create a fast FFT-based program in a
short time, FFTW is certainly a good way to do it. On the other hand, if
you're really striving for extreme performance (i.e. trying to write a
code that possibly many people will use intensively), it is best to have
a code whose details you understand well. In my case, the latter criterion
(and the fact that I wanted to learn as much as I could about FFTs) led me
to write my own code. I started with the Numerical Recipes FFT (slow and
not very accurate, but easy to play with) about 3 years ago and have been
working on it ever since - my current code looks nothing like the NR FFT,
but you have to start somewhere.

Looking at the FFTW timings page, for a length-262144 real-vector transform
they list (http://www.fftw.org/benchfft/results/alpha467.html)
a performance of around  105 "MFlops" on a 467 MHz Alpha 21164. Using
their definition of MFlops for real FFTs, this translates to a per-FFT time of

0.5*[5*262144*log2(262144) Flop]/[115 MFlop/sec] = 0.112 sec.

My LL code does 2 FFT's plus other operations per Mersenne-mod squaring,
so we estimate about 80% of the per-iteration time equals one FFT. At 256K
vector length it needs .177 sec per iteration on a 400 MHz 21164, which
leads to an estimate of .40*.177*400/467 = 0.061 sec on a 467MHz 21164
which is significantly faster than FFTW.

At real-vector length 362880 (the closest I could find to 384K), FFTW needs

0.5*[5*362880*log2(362880) Flop]/[125 MFlop/sec] = .134 sec per FFT,

whereas Mlucas 2.7 (multiplying the 384K = 393216 timing by 362880/393216
to get a comparison with the FFTW timing for the latter length) needs

.40*.316*(400/467)*(362880/393216)) = .100 sec per FFT on a 467 MHz 21164.


On a 195 MHz MIPS R10000 CPU (SGI Origin 2000 workstation) FFTW needs

0.5*[5*262144*log2(262144) Flop]/[62 MFlop/sec] = .190 sec at n=262144
0.5*[5*362880*log2(362880) Flop]/[67 MFlop/sec] = .250 sec at n=362880

whereas Mlucas 2.7 needs (again extrapolating the 384K timing backward to
362880) .088 and .127 seconds, respectively, per FFT on the same hardware.

The fact that it took me so much work to achieve these speeds is a
testament to the speed of FFTW - the sample timings I sent to Steven Johnson
(one of the authors of FFTW) last year were still generally slower than
FFTW. However, "rolling one's own" (as it were) allows one to do lots of
algorithm-specific optimizations not available to the FFTW folks. Since
FFT-based large-integer arithmetic can do a pointwise (dyadic) squaring
on the outputs of the forward FFT without them being ordered in any special
way, one can do a forward decimation-in-frequency FFT, followed by a
pointwise squaring of the (bit-reversal-reordered) data, followed by
decimation-in-time inverse FFT, thus avoiding the need for explicit
bit-reversal-reorderings of data (or matrix transposes, in the context
of the 4-step FFT used by FFTW), for example. One can also combine the
last pass of the forward FFT, the dyadic squaring and the first pass
of the inverse FFT into a single pass through the data. Similarly, one
can fuse the final pass of the inverse FFT, the rounding-and-carry-
propagation step, and the first pass of the forward FFT, which both
minimizes data movement (memory accesses) and allows one to propagate
carries for several sub-blocks of the full array in parallel fashion.

I'm sending a copy of this to both the Mersenne list and to Steven
Johnson, the latter so he can correct any gross errors I might have
made in my timings calculations for FFTW.) 

Cheers,
Ernst

_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

Reply via email to