This has gone round and round a few times... But I think we're finally
at a resolution of the regression. There's more work to do for gcc-17
though.
---
At its core this regression is caused by more accurately costing
branches during if-conversion. The more accurate cost causes the
multi-set path during the first ifcvt pass to (profitably) convert the
sequence.  However, it would have been more profitable to wait: by the
2nd ifcvt pass things would have been cleaned up enough to convert
using the condzero path, which is *more* profitable than the multi-set
code (and is what happens in gcc-15).
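For concreteness, the affected shape is the one captured in the new
testcase:

  long f (long a, long b, long c) { if (c) a += b; return a; }

The condzero path handles that with a single czero (sketch is
illustrative; czero.eqz rd,rs1,rs2 computes rs2 ? rs1 : 0):

  czero.eqz t0,a1,a2   # t0 = c ? b : 0
  add       a0,a0,t0   # a += t0, a no-op when c == 0

while the multi-set path needs four instructions (described below).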
One possible solution would be to disable the multi-set ifcvt code
during its first pass. That was evaluated and ultimately rejected
because of clear undesirable impacts. It may still be a good thing to
do, but certainly not at this stage of gcc-16.
Another path that was considered was recognizing the sign extension in
the if-convertible block and avoiding the multi-set path in that case on
the assumption it'd be cleaned up later and the condzero path would be
used. That felt way too target specific and optimistic about removal of
the sign extension.
Daniel looked at if-converting these cases during phiopt; that generally
looks like a good thing to do, but had notable undesirable fallout on
RISC-V and I was significantly worried it would re-introduce sequences
on x86 and aarch64 that we just fixed; i.e., solving one regression and
creating another. So that feels like a gcc-17 evaluation.
After much head-banging it occurred to me that we could tackle this in
the target with a tiny bit of help from combine. These conditional ops
are 4 instruction sequences: the operation itself, a pair of czeros,
and an add/ior to select between the outputs of the two czeros.  Combine is
pretty conservative in when it chooses to try 4 insn sequences.
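Roughly, the sequence combine has to consider looks like this
(registers and modes are illustrative):

  (set (reg t0) (plus (reg a) (reg b)))                  ;; the operation
  (set (reg t1) (if_then_else (eq (reg c) (const_int 0))
                              (reg a) (const_int 0)))    ;; czero
  (set (reg t2) (if_then_else (ne (reg c) (const_int 0))
                              (reg t0) (const_int 0)))   ;; czero
  (set (reg d) (ior (reg t1) (reg t2)))                  ;; select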
So step #1 was to loosen the restrictions on combine a little bit. If
it sees an IF_THEN_ELSE, then it considers that "good" which in turn
allows us to try 4 insn combinations a bit more often.
That exposes the potential for 4 insn combinations, and since the
IOR/XOR case is a 4->2 sequence, fixing those regressions is just a good
define_split.  In fact, we can use that splitter for any case where the
neutral operand is zero (IOR, XOR, PLUS, MINUS, etc). We need a second
pattern to handle reversed operands since we're using code iterators and
match_dup for the matching operand rather than something like a matching
constraint.
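So a conditional xor, for example, splits into something like
(illustrative):

  czero.eqz t0,a1,a2   # t0 = c ? b : 0
  xor       a0,a0,t0   # zero is neutral for xor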
That still leaves conditional shifts regressing due to the same
problem. For shifts/rotates, the count is in QImode, so we need a
different pattern to handle those cases, but it has the same basic
structure and behavior.
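The only structural difference is that the count operand is matched in
QImode, something like (ashift:DI (reg:DI a) (reg:QI n)) rather than a
word-mode operand, so the zero_is_neutral_op pattern can't cover it.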
AND was *still* regressing though. That would require a 4->3 split
which isn't supported by combine. Given the issues with the other paths
attempted, I ultimately decided on a define_insn_and_split (boo!).  It
shouldn't trigger anywhere near as often as mvconst_internal does, so
not great, but workable.  This also required recognizing the pattern in
riscv_rtx_costs and giving it a
suitable cost (COSTS_N_INSNS (3)).
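The resulting three instruction sequence for c ? (a & b) : a looks
something like this (illustrative; czero.nez rd,rs1,rs2 computes
rs2 ? 0 : rs1):

  and       t0,a0,a1   # unconditional and
  czero.nez t1,a0,a2   # t1 = c ? 0 : a
  or        a0,t1,t0   # c ? (a & b) : a, since a | (a & b) == a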
That fixes the regressions. We're still not generating ideal code for
rv64 when the operands are 32-bit quantities and the architecture
provides an instruction variant that sign extends from 32 to 64 bits
(add, subtract, shifts and rotates). But the code we generate for
gcc-16 is better than we were generating for gcc-15, it's just not
optimal. So there's a TODO for gcc-17 as well.
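For the 32-bit cases on rv64 the ideal sequence would presumably be
something like (illustrative):

  czero.eqz t0,a1,a2   # t0 = c ? b : 0
  addw      a0,a0,t0   # 32-bit add, sign extending from bit 31

but getting there safely runs into the mixed-mode issue noted in the
zicond.md comments.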
This was bootstrapped and regression tested on x86, alpha, riscv, armv7,
hppa, and possibly others, but those for certain.  It was also tested
on the various crosses without regressions. Waiting on pre-commit
testing to render a verdict before going forward.
jeff
PR rtl-optimization/123322
gcc/
* combine.cc (try_combine): Consider an IF_THEN_ELSE "good" when
evaluating if 4 insn combinations should be tried.
* config/riscv/iterators.md (zero_is_neutral_op): New iterator.
(zero_is_neutral_op_c): Likewise.
(any_shift_rotate): Likewise.
* config/riscv/riscv.cc (riscv_rtx_costs): Recognize the conditional
AND RTL and cost it appropriately.
* config/riscv/zicond.md: Add patterns to rewrite general conditional
move sequences into simpler forms.
gcc/testsuite/
* gcc.target/riscv/pr123322.c: New test.
diff --git a/gcc/combine.cc b/gcc/combine.cc
index 573ed716ad63..e0dab3d88280 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -2615,6 +2615,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
}
else if (BINARY_P (src) && CONSTANT_P (XEXP (src, 1)))
ngood++;
+ else if (GET_CODE (src) == IF_THEN_ELSE)
+ ngood++;
else if (GET_CODE (src) == ASHIFT || GET_CODE (src) == ASHIFTRT
|| GET_CODE (src) == LSHIFTRT)
nshift++;
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 4cda08848f69..192556e92e27 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -211,6 +211,13 @@ (define_code_attr extend_name [
(sign_extend "extend") (zero_extend "zero_extend")
])
+;; This code iterator captures cases where a zero value for an operand
+;; neutralizes the operation, i.e., a + 0 -> a.  That basic idea is what
+;; lets us form these conditional operations.
+(define_code_iterator zero_is_neutral_op [plus minus ior xor ashift lshiftrt
+ ashiftrt rotatert rotate])
+(define_code_iterator zero_is_neutral_op_c [plus ior xor])
+
;; These code iterators allow unsigned and signed extraction to be generated
;; from the same template.
(define_code_iterator any_extract [sign_extract zero_extract])
@@ -227,6 +234,8 @@ (define_code_iterator any_shiftrt [ashiftrt lshiftrt])
;; from the same template.
(define_code_iterator any_shift [ashift ashiftrt lshiftrt])
+(define_code_iterator any_shift_rotate [ashift ashiftrt lshiftrt rotate rotatert])
+
;; This code iterator allows the three bitwise instructions to be generated
;; from the same template.
(define_code_iterator any_bitwise [and ior xor])
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 32e72128f049..472c7e5f4920 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4630,6 +4630,16 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
return true;
}
+  /* Conditional AND.  */
+ if (TARGET_ZICOND
+ && GET_CODE (XEXP (x, 0)) == IF_THEN_ELSE
+ && GET_CODE (XEXP (x, 1)) == IF_THEN_ELSE
+ && GET_CODE (XEXP (XEXP (x, 1), 1)) == AND)
+ {
+ *total = COSTS_N_INSNS (3);
+ return true;
+ }
+
if (float_mode_p)
*total = tune_param->fp_add[mode == DFmode];
else
diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
index 383859b30ced..663d70235a30 100644
--- a/gcc/config/riscv/zicond.md
+++ b/gcc/config/riscv/zicond.md
@@ -270,3 +270,146 @@ (define_split
[(set (match_dup 0) (ashiftrt:X (match_dup 1) (match_dup 2)))
(set (match_dup 0) (ior:X (match_dup 0) (const_int 1)))]
{ operands[2] = GEN_INT (GET_MODE_BITSIZE (word_mode) - 1); })
+
+;; The next several splitters are meant to capture cases where if
+;; conversion was successful, but used a generalized conditional
+;; move during the first pass of if-conversion (typically because
+;; the conditional branch jumps over multiple sets).
+;;
+;; If the multiple sets would simplify into a single set, then we
+;; would often have been better off not converting during the first
+;; pass because we could use a more efficient sequence with a single
+;; czero.
+;;
+;; The basic idea here is to recognize when the two arms of the
+;; generalized conditional move have related values and where
+;; zero is a neutral operand.
+;;
+;; We conditionally zero the relevant operand, then emit the
+;; operation unconditionally.
+;;
+;; This is made more complex by the fact that 32-bit ops on rv64
+;; have an embedded sign extend. Even worse, the mode of a shift
+;; count is QImode. These quirks mean we have many more patterns
+;; than one might otherwise think we should.
+;;
+;; This does not handle the 32-bit ops on rv64 like addw right now.
+;; It's unclear how to do that safely since we have different modes
+;; in the two arms.
+
+;; Simple rv32 or rv64 ops where we can zero either operand to make
+;; it neutral.  There are two, as the common and potentially neutral op in
+;; the 2nd if-then-else can be swapped.
+(define_split
+ [(set (match_operand:X 0 "register_operand")
+ (plus:X (if_then_else:X
+ (eq:X (match_operand:X 1 "register_operand") (const_int 0))
+ (match_operand:X 2 "register_operand")
+ (const_int 0))
+ (if_then_else:X
+ (ne:X (match_dup 1) (const_int 0))
+ (zero_is_neutral_op:X
+ (match_dup 2)
+ (match_operand:X 3 "register_operand"))
+ (const_int 0))))
+ (clobber (match_operand:X 4 "register_operand"))]
+ "TARGET_ZICOND_LIKE || TARGET_XTHEADCONDMOV"
+ [(set (match_dup 4) (if_then_else:X (ne:X (match_dup 1) (const_int 0))
+ (match_dup 3)
+ (const_int 0)))
+ (set (match_dup 0) (zero_is_neutral_op:X (match_dup 2) (match_dup 4)))])
+
+(define_split
+ [(set (match_operand:X 0 "register_operand")
+ (plus:X (if_then_else:X
+ (eq:X (match_operand:X 1 "register_operand") (const_int 0))
+ (match_operand:X 2 "register_operand")
+ (const_int 0))
+ (if_then_else:X
+ (ne:X (match_dup 1) (const_int 0))
+ (zero_is_neutral_op_c:X
+ (match_operand:X 3 "register_operand")
+ (match_dup 2))
+ (const_int 0))))
+ (clobber (match_operand:X 4 "register_operand"))]
+ "TARGET_ZICOND_LIKE || TARGET_XTHEADCONDMOV"
+ [(set (match_dup 4) (if_then_else:X (ne:X (match_dup 1) (const_int 0))
+ (match_dup 3)
+ (const_int 0)))
+ (set (match_dup 0) (zero_is_neutral_op_c:X (match_dup 2) (match_dup 4)))])
+
+;; This is a separate pattern because the mode on the shift count
+;; varies. I guess we could make it modeless here and check it in
+;; the condition. But that tends to trigger warnings from gen*.
+(define_split
+ [(set (match_operand:X 0 "register_operand")
+ (plus:X (if_then_else:X
+ (eq:X (match_operand:X 1 "register_operand") (const_int 0))
+ (match_operand:X 2 "register_operand")
+ (const_int 0))
+ (if_then_else:X
+ (ne:X (match_dup 1) (const_int 0))
+ (any_shift_rotate:X
+ (match_dup 2)
+ (match_operand:QI 3 "register_operand"))
+ (const_int 0))))
+ (clobber (match_operand:X 4 "register_operand"))]
+ "TARGET_ZICOND_LIKE || TARGET_XTHEADCONDMOV"
+ [(set (match_dup 4) (if_then_else:X (ne:X (match_dup 1) (const_int 0))
+ (match_dup 3)
+ (const_int 0)))
+  (set (match_dup 0) (any_shift_rotate:X (match_dup 2) (subreg:QI (match_dup 4) 0)))]
+ "operands[3] = gen_lowpart (word_mode, operands[3]);")
+
+;; AND is special as the czero doesn't produce the neutral operand. It
+;; would be a natural 4->3 split, but combine doesn't support that, so
+;; a define_insn_and_split seems to be the safest path forward to address
+;; the remaining code quality regression (match.pd approaches will likely
+;; have undesirable fallout based on what's been seen in the gcc-16 cycle).
+(define_insn_and_split "conditional_and<mode>"
+ [(set (match_operand:X 0 "register_operand" "=&r")
+ (plus:X (if_then_else:X
+             (eq:X (match_operand:X 1 "register_operand" "r") (const_int 0))
+ (match_operand:X 2 "register_operand" "r")
+ (const_int 0))
+ (if_then_else:X
+ (ne:X (match_dup 1) (const_int 0))
+ (and:X
+ (match_dup 2)
+ (match_operand:X 3 "register_operand" "r"))
+ (const_int 0))))]
+ "(TARGET_ZICOND_LIKE || TARGET_XTHEADCONDMOV) && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(set (match_dup 4) (and:X (match_dup 2) (match_dup 3)))
+   (set (match_dup 5) (if_then_else:X (eq:X (match_dup 1) (const_int 0))
+ (match_dup 2)
+ (const_int 0)))
+ (set (match_dup 0) (ior:X (match_dup 5) (match_dup 4)))]
+ "operands[4] = gen_reg_rtx (word_mode);
+ operands[5] = gen_reg_rtx (word_mode);")
+
+;; Same thing, but with operand order reversed
+(define_insn_and_split "conditional_rand<mode>"
+ [(set (match_operand:X 0 "register_operand" "=&r")
+ (plus:X (if_then_else:X
+             (eq:X (match_operand:X 1 "register_operand" "r") (const_int 0))
+ (match_operand:X 2 "register_operand" "r")
+ (const_int 0))
+ (if_then_else:X
+ (ne:X (match_dup 1) (const_int 0))
+ (and:X
+ (match_operand:X 3 "register_operand" "r")
+ (match_dup 2))
+ (const_int 0))))]
+ "(TARGET_ZICOND_LIKE || TARGET_XTHEADCONDMOV) && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(set (match_dup 4) (and:X (match_dup 2) (match_dup 3)))
+   (set (match_dup 5) (if_then_else:X (eq:X (match_dup 1) (const_int 0))
+ (match_dup 2)
+ (const_int 0)))
+ (set (match_dup 0) (ior:X (match_dup 5) (match_dup 4)))]
+ "operands[4] = gen_reg_rtx (word_mode);
+ operands[5] = gen_reg_rtx (word_mode);")
+
diff --git a/gcc/testsuite/gcc.target/riscv/pr123322.c b/gcc/testsuite/gcc.target/riscv/pr123322.c
new file mode 100644
index 000000000000..da6759602dbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr123322.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcb_zicond -mabi=ilp32 -mbranch-cost=4" { target {
rv32 } } } */
+/* { dg-options "-march=rv64gcb_zicond -mabi=lp64d -mbranch-cost=4" { target {
rv64 } } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" "-Os" "-Oz" } } */
+
+#define TEST(NAME,OP) int f_int_##NAME(int a, int b, int c) { if (c) a OP b; return a; } \
+                      long f_long_##NAME(long a, long b, long c) { if (c) a OP b; return a; }
+
+TEST(xor, ^=)
+TEST(ior, |=)
+TEST(and, &=)
+TEST(plus, +=)
+TEST(minus, -=)
+TEST(rshift, >>=)
+TEST(lshift, <<=)
+
+/* MULT could be handled too, but needs a different neutral element
+   that we aren't handling yet; AND gets its own define_insn_and_split.  */
+
+/* Each test should have precisely one czero, except that for rv64 the
+   int tests generate 2 when the op is widened.  */
+/* { dg-final { scan-assembler-times "czero" 14 { target rv32 } } } */
+/* { dg-final { scan-assembler-times "czero" 18 { target rv64 } } } */