Hi, Jason:
>Or, if you just want to cheapen a complex multiply, turn the standard
>form
>
> xr,xi * yr,yi -> xr*yr-xi*yi, xr*yi+xi*yr
>
>into
>
> xr,xi * yr,yi -> xr*(yr - xi/xr * yi), xr*(yi + xi/xr * yr)
>
>and precompute xr and xi/xr. Presto! 2 multiplies and 2 multiply-adds.
An excellent suggestion - why rely on the compiler perhaps being smarter
than it really is, when you can drop a broad hint with very little
modification of the normal code?
I tried the above in just the forward radix-16 FFT loop of my code, replacing
sines with tangents in the precomputation of the FFT sincos data. No timing
change on the MIPS R10K, indicating that the MIPSPro f90 compiler was likely
already doing such a replacement for me. But on the Alpha 21064 (which has
no MADD instruction) my times for large FFT lengths dropped about 5%!
(I expect another 5% when I modify the inverse FFT similarly).
Weird, but welcome. Any ideas how the above replacement might improve
pipelineability of a twiddle-multiply/add/subtract sequence, assuming just
FMUL and FADD are available?
Onward and Upward,
-Ernst