On 18 Jun 00, at 1:27, Jason Stratos Papadopoulos wrote:
> For example, integer multiply-adds take only a little longer
> than floating point multiply-adds; should IA64Prime use an integer or
> floating point FFT? If integer, there are big delays in shuffling between
> integer and FPU registers (only the FPU can multiply). If floating point,
> loads and stores will all take longer, the cache behavior is totally
> different, and the arrays involved get longer because you can't pack bits
> as densely as an integer solution.
This obviously needs to be looked at & evaluated on real hardware.
>
> Itanium can do two FPU operations per clock, but both can be multiply-adds
> instead of just multiplies or adds. Can you rearrange a real-valued FFT to
> use multiply-adds as much as possible? It could cut the operation count in
> half if you do, but to my knowledge no one has yet done so.
Yes, but you get only about a 40% saving: you need about 10 additions
for every 6 multiplications, so at best 6 of every 16 operations can be
absorbed into multiply-adds. "No-one has bothered" because the effort
is pointless on IA32 architecture processors, and there simply aren't
enough processors with an efficient "a*b+c" instruction around to make
the required programming effort worthwhile.
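The "about 40%" figure can be checked directly from the 6:10 ratio. The sketch below assumes, for illustration only, that each multiplication can be fused with at most one addition and that nothing else limits the fusion:

```python
def fma_savings(mults: int, adds: int) -> float:
    """Fraction of instructions saved if each multiplication is fused with
    at most one addition into a single multiply-add instruction."""
    before = mults + adds                 # separate multiply and add instructions
    fused = min(mults, adds)              # each multiply absorbs at most one add
    after = fused + (mults - fused) + (adds - fused)
    return 1 - after / before


print(f"{fma_savings(6, 10):.1%}")  # 37.5%
```

So 16 operations become 10 (6 multiply-adds plus 4 plain additions); the 50% best case would need one addition per multiplication.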
(What goes around comes around. The very, very CISC VAX architecture
had three-operand instructions, but the trend to RISC systems made
such things unfashionable. Now hardware designers are realizing that
we need such things for performance reasons!)
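A minimal sketch of what such a rearrangement can look like for a single radix-2 butterfly, written in Python only for readability. The function names and the (wr, tw) twiddle layout are illustrative assumptions, not code from Prime95, Mlucas or any IA64 library. Factoring the real part wr out of the twiddle w = wr + i*wi turns the usual 4 multiplies + 6 adds into 6 multiply-adds:

```python
import cmath


def fma(a: float, b: float, c: float) -> float:
    # Stand-in for a hardware multiply-add instruction. Python evaluates this
    # with two roundings; a true fused multiply-add rounds only once.
    return a * b + c


def butterfly_plain(a: complex, b: complex, w: complex) -> tuple[complex, complex]:
    # Textbook radix-2 butterfly: the complex product w*b costs 4 real
    # multiplies + 2 adds, and a +/- (w*b) costs 4 more adds (10 operations).
    t = w * b
    return a + t, a - t


def butterfly_fma(a: complex, b: complex, wr: float, tw: float) -> tuple[complex, complex]:
    # The same butterfly as 6 multiply-adds. The twiddle w = wr + i*wi is
    # stored as (wr, tw) with tw = wi / wr precomputed, so this needs wr != 0.
    # Uses w*b = wr * ((br - tw*bi) + i*(bi + tw*br)).
    xr = fma(-tw, b.imag, b.real)  # br - tw*bi
    xi = fma(tw, b.real, b.imag)   # bi + tw*br
    top = complex(fma(wr, xr, a.real), fma(wr, xi, a.imag))       # a + w*b
    bottom = complex(fma(-wr, xr, a.real), fma(-wr, xi, a.imag))  # a - w*b
    return top, bottom


w = cmath.exp(-2j * cmath.pi / 8)  # example twiddle factor, e^(-2*pi*i/8)
a, b = complex(1.0, 2.0), complex(3.0, -1.0)
p = butterfly_plain(a, b, w)
q = butterfly_fma(a, b, w.real, w.imag / w.real)
print(all(abs(x - y) < 1e-12 for x, y in zip(p, q)))  # True
```

Dividing by wr is unsafe for twiddles close to +/-i; a practical version would factor out wi instead for those. Whether cutting 10 operations to 6 gives real speed on Itanium still has to be measured on real hardware.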
> Moving to IA64 will be a much bigger challenge than simply rewriting
> half a meg of assembly language.
I wonder whether we will get better performance running IA32 software
or running a high-level-language (HLL) program like Mlucas in native
IA64 mode. That depends to a large extent on the availability and
performance of optimizing compilers, I suppose.
Regards
Brian Beesley
_________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ -- http://www.tasam.com/~lrwiman/FAQ-mers