https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116979

--- Comment #34 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Paul Caprioli from comment #33)
> > Despite the issue described above?
> 
> Yes, despite the issue you raised, I would still advise GCC to use the
> mul_fma algorithm instead of the mul_mul_add algorithm.  I agree, of course,
> that there are input values for which rounding errors cancel in the latter
> but not in the former.  But the reverse is true as well.  There are input
> values for which the mul_fma algorithm is the more accurate one.
> So, for some inputs, one algorithm is better, and for other inputs, the
> other algorithm is better.
> 
> If error is measured norm-wise, then the mul_fma algorithm has lower maximum
> error.  I think norm-wise error is a reasonable measure of goodness. 
> Without application-specific knowledge of why complex numbers are being
> used, it's reasonable to consider a complex number as a single mathematical
> entity.
> 
> Cartesian coordinates are a useful representation, but not the only one. 
> Polar coordinates are a valid representation as well.  For (a + bi) where a
> >> b, the complex number is close to the real axis.  Its magnitude is close
> to |a| and its angle is close to zero (if a is positive).  Is the relative
> error of the computed b important if b itself is very tiny?  An error in b
> may be large relative to b, but not significantly change the magnitude or
> angle of the complex number.
> 
> For component-wise relative accuracy, both mul_mul_add and mul_fma can be
> highly inaccurate due to possible catastrophic cancellations in the real or
> imaginary part of
> the computed product.
> 
> The mul_fma algorithm has better throughput (performance) than mul_mul_add.
> 
> For _Float16, x86 has standardized the complex multiply instruction as:
> 
>   tmp.fp16[0] := src1.fp16[0] * src2.fp16[0]
>   tmp.fp16[1] := src1.fp16[1] * src2.fp16[0]
>   dest.fp16[0] := tmp.fp16[0] - src1.fp16[1] * src2.fp16[1]
>   dest.fp16[1] := tmp.fp16[1] + src1.fp16[0] * src2.fp16[1]
> 
> so it's reasonable to use mul_fma also for single and double.
> 
> > And note that since use of FMA is
> > asymmetric, complex multiplication becomes non-commutative (imag(c0*c1) !=
> > imag(c1*c0) if you expand both sides with the same use of FMA), which can
> > make the result unpredictable as long as GCC doesn't promise to always keep
> > operands of complex multiplication in the source order.
> 
> Yes.  It lacks beauty.  I mean that sincerely.  Maybe compilers (or even the
> C or C++ standards) should promise something (similar to how x86 documents
> VFMULCSH).  But, even if they do, it's not commutative.
> 
> I do like the idea of implementing a more accurate complex multiply. 
> Probably, for performance reasons, many users would not want that to be the
> default.  A highly accurate cmulha() could be provided.  Then users could mix
> that with operator*().  Also, a compiler flag could cause cmulha() to be
> used anytime a developer writes *.
> 
> I have seen applications that care about component-wise error of complex
> numbers.  In some sense, they have (or want) a tuple of two real numbers,
> and a complex number is a convenient way to represent their data in a
> computer program.  In some sense, it's not really a mathematical complex
> number.

I'd mention that we have -fcx-method={limited-range,fortran,stdc}, which
could be "easily" extended to implement alternate inline lowerings for
complex multiplication.  Note the option also affects division; splitting
it, or allowing combined arguments like -fcx-method=stdc,mul-fma, might be
possible.  Note there's also libgcc's multiplication/division implementation,
which to date does not use IFUNCs to select advanced ISA variants but is a
plain C implementation.
