Fix PR 118541 (V3), do not generate unordered fp cmoves for IEEE compares

Michael Meissner Wed, 21 May 2025 22:16:48 -0700

Fix PR 118541, do not generate unordered fp cmoves for IEEE compares.

This is version 3 of the patch.  I re-implemented the patch to just focus on the
generation of the XSCMP{EQ,GT,GE}{DP,QP} instructions.


In bug PR target/118541 on power9, power10, and power11 systems, for the
function:

        extern double __ieee754_acos (double);

        double
        __acospi (double x)
        {
          double ret = __ieee754_acos (x) / 3.14;
          return __builtin_isgreater (ret, 1.0) ? 1.0 : ret;
        }

GCC currently generates the following code:

        Power9                          Power10 and Power11
        ======                          ===================
        bl __ieee754_acos               bl __ieee754_acos@notoc
        nop                             plfd 0,.LC0@pcrel
        addis 9,2,.LC2@toc@ha           xxspltidp 12,1065353216
        addi 1,1,32                     addi 1,1,32
        lfd 0,.LC2@toc@l(9)             ld 0,16(1)
        addis 9,2,.LC0@toc@ha           fdiv 0,1,0
        ld 0,16(1)                      mtlr 0
        lfd 12,.LC0@toc@l(9)            xscmpgtdp 1,0,12
        fdiv 0,1,0                      xxsel 1,0,12,1
        mtlr 0                          blr
        xscmpgtdp 1,0,12
        xxsel 1,0,12,1
        blr

This is because ifcvt.cc optimizes the conditional floating point move to use
the XSCMPGTDP instruction.

However, the XSCMPGTDP instruction will generate an interrupt if one of the
arguments is a signalling NaN.  The IEEE comparison functions (isgreater, etc.)
require that the comparison not raise an interrupt.

The root cause of this is we allow floating point comparisons to be reversed
(i.e. LT will be reversed to UNGE).  Before power9, this was ok because we only
generated the FCMPU or XSCMPUDP instructions.
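
To illustrate why reversal is a problem once ordered compare instructions are
in play, here is a minimal C sketch (not part of the patch; the function names
is_lt, is_unge, is_ge, and check_reversal are hypothetical).  It shows that the
logical reverse of LT is UNGE, which is true for a NaN operand, while the
ordered GE is false for a NaN operand:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* The original comparison: true only when both operands are ordered
   and a is less than b.  */
bool
is_lt (double a, double b)
{
  return a < b;
}

/* Logical reversal of LT.  This is UNGE: true when a >= b, and also
   true when either operand is a NaN.  */
bool
is_unge (double a, double b)
{
  return !(a < b);
}

/* Ordered GE: false whenever either operand is a NaN.  */
bool
is_ge (double a, double b)
{
  return a >= b;
}

/* Hypothetical self-check, for illustration only.  */
void
check_reversal (void)
{
  volatile double qnan = NAN;   /* volatile to discourage folding.  */
  volatile double one = 1.0;

  /* For ordered operands, UNGE and GE agree.  */
  assert (is_unge (2.0, one) == is_ge (2.0, one));
  assert (is_unge (0.5, one) == is_ge (0.5, one));

  /* With a NaN operand they differ, so GE is not the reverse of LT.  */
  assert (!is_lt (qnan, one));
  assert (is_unge (qnan, one));
  assert (!is_ge (qnan, one));
}
```

The sketch only covers the truth values of the comparisons.  It does not
model which instruction is selected or whether an exception is raised.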

But with power9, we can generate the XSCMPEQDP, XSCMPGTDP, or XSCMPGEDP
instructions.  With this patch, the code no longer converts an unordered compare
into an ordered compare.  Instead, it does the opposite comparison and swaps the
arguments.  I.e. it converts:

        r = (a < b) ? c : d;

into:

        r = (b > a) ? c : d;
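
As a sanity check on the idea of swapping operands, the following C sketch
(again not part of the patch; all function names are hypothetical) compares a
conditional move written directly with one written with the operands swapped
and the operator mirrored (< with >, <= with >=).  Under those assumptions the
two forms should select the same value for every input, including NaNs:

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Conditional move on a < b, written directly.  */
double
select_lt (double a, double b, double c, double d)
{
  return (a < b) ? c : d;
}

/* Same selection with the operands swapped and the operator mirrored.  */
double
select_gt_swapped (double a, double b, double c, double d)
{
  return (b > a) ? c : d;
}

/* Conditional move on a <= b, written directly.  */
double
select_le (double a, double b, double c, double d)
{
  return (a <= b) ? c : d;
}

/* Same selection with the operands swapped and the operator mirrored.  */
double
select_ge_swapped (double a, double b, double c, double d)
{
  return (b >= a) ? c : d;
}

/* Hypothetical self-check: try every pair from a small set of values
   that includes a NaN and an infinity.  */
void
check_swapped (void)
{
  const double vals[] = { -1.0, 0.0, 1.0, INFINITY, NAN };
  const size_t n = sizeof (vals) / sizeof (vals[0]);

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
        double a = vals[i], b = vals[j];

        assert (select_lt (a, b, 10.0, 20.0)
                == select_gt_swapped (a, b, 10.0, 20.0));
        assert (select_le (a, b, 10.0, 20.0)
                == select_ge_swapped (a, b, 10.0, 20.0));
      }
}
```

Unlike reversing the condition, mirroring the operator never turns an ordered
comparison into an unordered one, so a NaN operand selects d in both forms.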

For the following code:

        double
        ordered_compare (double a, double b, double c, double d)
        {
          return __builtin_isgreater (a, b) ? c : d;
        }

        /* Verify normal > does generate xscmpgtdp.  */

        double
        normal_compare (double a, double b, double c, double d)
        {
          return a > b ? c : d;
        }

with this patch applied, GCC generates the following for power9, power10, and
power11:

        ordered_compare:
                fcmpu 0,1,2
                fmr 1,4
                bnglr 0
                fmr 1,3
                blr

        normal_compare:
                xscmpgtdp 1,1,2
                xxsel 1,4,3,1
                blr

I have built bootstrap compilers on big endian power9 systems and little endian
power9/power10 systems and there were no regressions.  Can I check this patch
into the GCC trunk, and after a waiting period, can I check this into the active
older branches?

2025-05-21  Michael Meissner  <meiss...@linux.ibm.com>

gcc/

        PR target/118541
        * config/rs6000/predicates.md (invert_fpmask_comparison_operator):
        Delete.
        (fpmask_reverse_args_comparison_operator): New predicate.
        * config/rs6000/rs6000-protos.h (rs6000_fpmask_reverse_args): New
        declaration.
        * config/rs6000/rs6000.cc (rs6000_fpmask_reverse_args): New function.
        * config/rs6000/rs6000.h (REVERSIBLE_CC_MODE): Do not allow floating
        point comparisons to be reversed unless -ffinite-math-only is used.
        * config/rs6000/rs6000.md (mov<SFDF:mode><SFDF2:mode>cc_p9): Add
        comment.
        (mov<SFDF:mode><SFDF2:mode>cc_invert_p9): Reverse the argument order for
        the comparison, and use an unordered comparison, instead of ordered
        comparison.
        (mov<mode>cc_invert_p10): Likewise.

gcc/testsuite/

        PR target/118541
        * gcc.target/powerpc/pr118541.c: New test.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
