https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116979
--- Comment #33 from Paul Caprioli <paul at hpkfft dot com> ---
> Despite the issue described above?

Yes, despite the issue you raised, I would still advise GCC to use the
mul_fma algorithm instead of the mul_mul_add algorithm.

I agree, of course, that there are input values for which rounding errors
cancel in the latter but not in the former. But the reverse is true as
well: there are input values for which the mul_fma algorithm is the more
accurate one. So, for some inputs, one algorithm is better, and for other
inputs, the other algorithm is better. If error is measured norm-wise,
then the mul_fma algorithm has the lower maximum error.

I think norm-wise error is a reasonable measure of goodness. Without
application-specific knowledge of why complex numbers are being used, it's
reasonable to consider a complex number as a single mathematical entity.
Cartesian coordinates are a useful representation, but not the only one;
polar coordinates are a valid representation as well. For (a + bi) where
a >> b, the complex number is close to the real axis: its magnitude is
close to |a| and its angle is close to zero (if a is positive). Is the
relative error of the computed b important if b itself is very tiny? An
error in b may be large relative to b, yet not significantly change the
magnitude or angle of the complex number.

For component-wise relative accuracy, both mul_mul_add and mul_fma can be
highly inaccurate due to possible catastrophic cancellation in the real or
imaginary part of the computed product.

mul_fma also has better throughput (performance) than mul_mul_add. For
_Float16, x86 has standardized the complex multiply instruction as:

    tmp.fp16[0]  := src1.fp16[0] * src2.fp16[0]
    tmp.fp16[1]  := src1.fp16[1] * src2.fp16[0]
    dest.fp16[0] := tmp.fp16[0] - src1.fp16[1] * src2.fp16[1]
    dest.fp16[1] := tmp.fp16[1] + src1.fp16[0] * src2.fp16[1]

so it's reasonable to use mul_fma also for single and double.
> And note that since use of FMA is asymmetric, complex multiplication
> becomes non-commutative (imag(c0*c1) != imag(c1*c0) if you expand both
> sides with the same use of FMA), which can make the result unpredictable
> as long as GCC doesn't promise to always keep operands of complex
> multiplication in the source order.

Yes. It lacks beauty; I mean that sincerely. Maybe compilers (or even the
C or C++ standards) should promise something (similar to how x86 documents
VFMULCSH). But even if they do, it's not commutative.

I do like the idea of implementing a more accurate complex multiply.
Probably, for performance reasons, many users would not want that to be
the default, but a highly accurate cmulha() could be provided. Then users
could mix that with operator*(). Also, a compiler flag could cause
cmulha() to be used anywhere a developer writes *.

I have seen applications that care about component-wise error of complex
numbers. In some sense, they have (or want) a tuple of two real numbers,
and a complex number is a convenient way to represent their data in a
computer program; it's not really a mathematical complex number.
