On Sun, 22 Aug 1999 [EMAIL PROTECTED] wrote:
> For example, a typical complex radix-16 FFT pass in my Mlucas code
> takes 16 complex data (32 8-byte floats), and including multiplies by
> "twiddle" factors (FFT sincos data) does 168 FADDs and 88 FMULs on them-
> that's nearly twice as many adds as multiplies, i.e. the floating multiplier
> is idle nearly half the time. If one could do two adds per cycle, one could
> remedy that imbalance and (neglecting the fact that memory traffic is also
> responsible for many processor stalls) in theory nearly double the speed.
I've wondered for a while if the DFT can be re-factorized to have *more*
multiplies and fewer total ops. From the doodling I've done, though, it
seems that the ratio can never be better than 3:2 adds to muls.
There was a paper in (I believe) Appl. Math and Comp. from about two
years ago that reformulated radix-2,3,4 and 5 FFT butterfiles to use
multiply-adds wherever possible. The savings can be impressive: a radix-2
FFT butterfly has 4 multiplies and six adds, but this boils down to
six multiply-adds. Can't the RS6000 do two multiply-adds per clock?
Or, if you just want to cheapen a complex multiply, turn the standard
form
xr,xi * yr,ri -> xr*yr-xi*yi, xr*yi+xi*yr
into
xr,xi * yr,ri -> xr*(yr - xi/xr * yi), xr*(yi + xi/xr * yr)
and precompute xr and xi/xr. Presto! 2 multiplies and 2 multiply-adds.
> I've always thought the cheapest way to improve performance of a transform-
> based code (since floating hardware is very expensive) would be to design
> a floating adder that could take two data x and y and return both the sum
> and difference, x+y and x-y, in one clock. This would be a much cheaper way to
> speed an FFT than adding an entire third unit to the FPU, since much of this
> pair of operations is the same (for instance, the exponent compare and
> mantissa
> shift are the same for the sum and diference, i.e. need only be done once.)
Analog Devices' SHARC DSP chips have a fused add-subtract (and a fused
multiply add); a 50MHz SHARC can do an FFT in only twice as long as a
300MHz PPC 604e (sorry, single precision only).
jasonp
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers