https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116979

--- Comment #33 from Paul Caprioli <paul at hpkfft dot com> ---
> Despite the issue described above?

Yes, despite the issue you raised, I would still advise GCC to use the mul_fma
algorithm instead of the mul_mul_add algorithm.  I agree, of course, that there
are input values for which rounding errors cancel in the latter but not in the
former.  But the reverse is true as well: there are input values for which
the mul_fma algorithm is the more accurate one.  Neither algorithm dominates;
each is better for some inputs.

If error is measured norm-wise, then the mul_fma algorithm has lower maximum
error.  I think norm-wise error is a reasonable measure of goodness.  Without
application-specific knowledge of why complex numbers are being used, it's
reasonable to consider a complex number as a single mathematical entity.

Cartesian coordinates are a useful representation, but not the only one.  Polar
coordinates are a valid representation as well.  For (a + bi) where a >> b, the
complex number is close to the real axis.  Its magnitude is close to |a| and
its angle is close to zero (if a is positive).  Is the relative error of the
computed b important if b itself is very tiny?  An error in b may be large
relative to b, but not significantly change the magnitude or angle of the
complex number.

For component-wise relative accuracy, both mul_mul_add and mul_fma can be
highly inaccurate due to possible catastrophic cancellation in the real or
imaginary part of the computed product.

The mul_fma algorithm also has better throughput than mul_mul_add.

For _Float16, x86 has standardized the complex multiply instruction as:

  tmp.fp16[0] := src1.fp16[0] * src2.fp16[0]
  tmp.fp16[1] := src1.fp16[1] * src2.fp16[0]
  dest.fp16[0] := tmp.fp16[0] - src1.fp16[1] * src2.fp16[1]
  dest.fp16[1] := tmp.fp16[1] + src1.fp16[0] * src2.fp16[1]

so it's reasonable to use mul_fma also for single and double.

> And note that since use of FMA is
> asymmetric, complex multiplication becomes non-commutative (imag(c0*c1) !=
> imag(c1*c0) if you expand both sides with the same use of FMA), which can
> make the result unpredictable as long as GCC doesn't promise to always keep
> operands of complex multiplication in the source order.

Yes.  It lacks beauty.  I mean that sincerely.  Maybe compilers (or even the C
or C++ standards) should promise something (similar to how x86 documents
VFMULCSH).  But, even if they do, it's not commutative.

I do like the idea of implementing a more accurate complex multiply.  Probably,
for performance reasons, many users would not want that to be the default.  A
highly accurate cmulha() could be provided.  Then users could mix that with
operator*().  Also, a compiler flag could cause cmulha() to be used anytime a
developer writes *.

I have seen applications that care about component-wise error of complex
numbers.  In some sense, they have (or want) a tuple of two real numbers, and a
complex number is a convenient way to represent their data in a computer
program.  In that sense, it's not really a mathematical complex number.
