> Aside from the fact the we need to do a single FFT instead
> of two, why isn't squaring inherently faster than multiplying?
>
a * b = .25 * [ (a + b)^2 - (a - b)^2 ]
So if you figured out a fundamentally faster way to square, a multiply
is automatically as fast (within a constant factor, i.e. about 2)
That being said, if you were using Schnhage-Strassen multiplies (where
a huge number is broken up into a few big numbers), squaring is 4x faster
since you only do one FFT and element-by element squaring is twice as fast
as element by element multiplying. For floating pt FFTs, multiplication
hardware takes equal time to square as to multiply.
jasonp