Whee. So what got me wandering down this path was looking for a good bug for Shreya or Austin and concluding this one would be dreadful for both :-)

We're basically looking at single-bit extractions where there's a bit-not somewhere in the sequence.

Here are a few examples from the motivating PR64345. They were reported for the SH, but they aren't handled well for RISC-V either.

unsigned int test0 (unsigned int x)
{
  return ((x >> 4) ^ 1) & 1;
}

unsigned int test1 (unsigned int x)
{
  return ((x >> 4) & 1) ^ 1;
}

unsigned int test2 (unsigned int x)
{
  return ~(x >> 4) & 1;
}

Right now those generate sequences like this:

        li      a5,1
        srliw   a0,a0,4
        andn    a0,a5,a0

But we can do better. This is semantically equivalent, two bytes shorter and at least as fast.

        xori    a0,a0,16        # 8     [c=4 l=4]  *xordi3/1
        bexti   a0,a0,4 # 16    [c=4 l=4]  *bexti

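As a quick sanity check of the equivalence claim, here is a minimal C sketch. It models the xori/bexti pair as "flip bit 4, then extract bit 4" on a 32-bit value; xor_then_extract and forms_agree are hypothetical helper names for illustration only, not part of the patch or the testsuite.

```c
#include <assert.h>

/* The three source forms from PR64345.  */
static unsigned int test0 (unsigned int x) { return ((x >> 4) ^ 1) & 1; }
static unsigned int test1 (unsigned int x) { return ((x >> 4) & 1) ^ 1; }
static unsigned int test2 (unsigned int x) { return ~(x >> 4) & 1; }

/* Assumed C model of "xori a0,a0,16; bexti a0,a0,4":
   flip bit 4, then extract that single bit.  */
static unsigned int xor_then_extract (unsigned int x)
{
  return ((x ^ 16u) >> 4) & 1u;
}

/* Hypothetical helper: returns 1 if all forms agree for every
   input below LIMIT, 0 otherwise.  */
static int forms_agree (unsigned int limit)
{
  for (unsigned int x = 0; x < limit; x++)
    {
      unsigned int want = xor_then_extract (x);
      if (test0 (x) != want || test1 (x) != want || test2 (x) != want)
        return 0;
    }
  return 1;
}
```

Only bit 4 of the input affects the result, so checking a small range such as forms_agree (1u << 16) already exercises both values of that bit many times over.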

The core problem is the little white lie we have for and-not:

(define_insn_and_split "*<optab>_not_const<mode>"
  [(set (match_operand:X 0 "register_operand" "=r")
       (bitmanip_bitwise:X (not:X (match_operand:X 1 "register_operand" "r"))
              (match_operand:X 2 "const_arith_operand" "I")))

There is no such insn. andn, orn, xorn do not accept constants. But pretending they do may help generate better-performing code in some cases.

That pattern is a single insn from combine's standpoint. So when we see this:

Trying 6 -> 10:
    6: r140:DI=zero_extract(r145:DI,0x1c,0x4)
      REG_DEAD r145:DI
   10: {r143:DI=~r140:DI&0x1;clobber scratch;}
      REG_DEAD r140:DI
Failed to match this instruction:
(parallel [
        (set (reg:DI 143)
            (zero_extract:DI (xor:DI (reg:DI 145 [ x ])
                    (const_int 16 [0x10]))
                (const_int 1 [0x1])
                (const_int 4 [0x4])))
        (clobber (scratch:DI))
    ])

We can't split it because the result would be 2 insns and it was already 2 insns from combine's standpoint. The little white lie shows up in insn 10, which is really 2 instructions but just one insn.


I looked at the wacky possibility of making the problem pattern only available after reload in the hopes that late-combine could generate it, but late-combine doesn't handle scratches/clobbers like that.

I consider the cases where the lie helps code generation to be very much on the margins, and realized that we could turn the pattern into a peephole2. That way we don't regress on those marginal cases, but the problem pattern doesn't get in the way of combine's work.

So that's a good first step, but it's not entirely sufficient to get the best possible code for those tests. In particular, given equal costs this patch also steers noce_try_sign_bit_splat towards the AND form. On an OoO core the constant load for the AND can sometimes issue for free, or the constant might be encodable directly in the instruction. On an in-order core it gives the scheduler more freedom.
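
The tie-break can be sketched in isolation. This is a simplified model, not the ifcvt.cc code: choose_form, enum splat_form and the prefer_and_on_tie flag are hypothetical names, and the real function also checks exact_log2 on the constant and distinguishes left and right shifts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two candidate sequences.  */
enum splat_form { FORM_SHIFT, FORM_AND };

/* Simplified model of the cost comparison.  With PREFER_AND_ON_TIE
   false this mirrors the old "<=" test (ties go to the shift); with
   it true this mirrors the new "<" test (ties go to the AND).  */
static enum splat_form
choose_form (int shift_cost, int and_cost, bool prefer_and_on_tie)
{
  if (prefer_and_on_tie)
    return shift_cost < and_cost ? FORM_SHIFT : FORM_AND;
  return shift_cost <= and_cost ? FORM_SHIFT : FORM_AND;
}
```

The two variants only differ when the costs are equal; a strictly cheaper shift is still chosen either way.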

There's a bit of tension on that topic and some issues I'm not trying to tackle at this time. Essentially, in the sign-bit-splat path different sequences might be preferred depending on the constants (particularly when there's a NOT in the sequence). It's on the margins and touched on by a different BZ.

The net is that we can fix the various extraction problems on RISC-V exposed by the testcases in PR64345 without regressing the minor and-not cases the define_insn_and_split was handling, and with one fewer define_insn_and_split in the port. It likely improves pr80770 as well, though I haven't checked that.

Bootstrapped and regression tested on both the Pioneer and BPI. Also regression tested on riscv64-elf and riscv32-elf. And since this touched ifcvt.cc, bootstrapped and regression tested on x86_64 as well :-) It's also regression tested across all the embedded targets in my tester.


Pushing to the trunk after pre-commit CI gives it the green light.

jeff

        PR target/64345
        PR tree-optimization/80770
gcc/

        * config/riscv/bitmanip.md (<optab>_not_const<mode>): Turn into a
        peephole2 to avoid matching prior to combine.
        * ifcvt.cc (noce_try_sign_bit_splat): When costs are equal steer
        towards an AND based sequence.

gcc/testsuite/
        * gcc.target/riscv/pr120553-2.c: Update expected output.
        * gcc.target/riscv/pr64345.c: New test.
        * gcc.target/riscv/zbb-andn-orn-01.c: Skip when peephole2 isn't run.
        * gcc.target/riscv/zbb-andn-orn-02.c: Likewise.


diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 59b71ed263b0..697198fcc913 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -1,4 +1,4 @@
-;); Machine description for RISC-V Bit Manipulation operations.
+;; Machine description for RISC-V Bit Manipulation operations.
 ;; Copyright (C) 2021-2025 Free Software Foundation, Inc.
 
 ;; This file is part of GCC.
@@ -237,19 +237,20 @@ (define_insn "<optab>_not<mode>3"
   [(set_attr "type" "bitmanip")
    (set_attr "mode" "<X:MODE>")])
 
-(define_insn_and_split "*<optab>_not_const<mode>"
-  [(set (match_operand:X 0 "register_operand" "=r")
-       (bitmanip_bitwise:X (not:X (match_operand:X 1 "register_operand" "r"))
-              (match_operand:X 2 "const_arith_operand" "I")))
-  (clobber (match_scratch:X 3 "=&r"))]
+(define_peephole2
+  [(match_scratch:X 4 "r")
+   (set (match_operand:X 0 "register_operand")
+       (not:X (match_operand:X 1 "register_operand")))
+   (set (match_operand:X 2 "register_operand")
+       (bitmanip_bitwise:X (match_dup 0)
+                           (match_operand 3 "const_int_operand")))
+   (match_dup 4)]
   "(TARGET_ZBB || TARGET_ZBKB) && !TARGET_ZCB
-   && !optimize_function_for_size_p (cfun)"
-  "#"
-  "&& reload_completed"
-  [(set (match_dup 3) (match_dup 2))
-   (set (match_dup 0) (bitmanip_bitwise:X (not:X (match_dup 1)) (match_dup 3)))]
-  ""
-  [(set_attr "type" "bitmanip")])
+   && !optimize_function_for_size_p (cfun)
+   && rtx_equal_p (operands[0], operands[2])
+   && riscv_const_insns (operands[3], false) == 1"
+  [(set (match_dup 4) (match_dup 3))
+   (set (match_dup 0) (bitmanip_bitwise:X (not:X (match_dup 1)) (match_dup 4)))])
 
 ;; '(a >= 0) ? b : 0' is emitted branchless (from if-conversion).  Without a
 ;; bit of extra help for combine (i.e., the below split), we end up emitting
diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 3dcb1be48692..97ef09a7a680 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -1289,13 +1289,13 @@ noce_try_sign_bit_splat (struct noce_if_info *if_info)
       bool speed_p = optimize_insn_for_speed_p ();
       if (exact_log2 (val_a + 1) >= 0
          && (rtx_cost (shift_right, mode, SET, 1, speed_p)
-             <= rtx_cost (and_form, mode, SET, 1, speed_p)))
+             < rtx_cost (and_form, mode, SET, 1, speed_p)))
        temp = expand_simple_binop (mode, LSHIFTRT, temp,
                                    GEN_INT (rshift_count),
                                    if_info->x, false, OPTAB_WIDEN);
       else if (exact_log2 (~val_a + 1) >= 0
               && (rtx_cost (shift_left, mode, SET, 1, speed_p)
-                  <= rtx_cost (and_form, mode, SET, 1, speed_p)))
+                  < rtx_cost (and_form, mode, SET, 1, speed_p)))
        temp = expand_simple_binop (mode, ASHIFT, temp,
                                    GEN_INT (ctz_hwi (val_a)),
                                    if_info->x, false, OPTAB_WIDEN);
@@ -1341,13 +1341,13 @@ noce_try_sign_bit_splat (struct noce_if_info *if_info)
       bool speed_p = optimize_insn_for_speed_p ();
       if (exact_log2 (val_b + 1) >= 0
          && (rtx_cost (shift_right, mode, SET, 1, speed_p)
-             <= rtx_cost (and_form, mode, SET, 1, speed_p)))
+             < rtx_cost (and_form, mode, SET, 1, speed_p)))
        temp = expand_simple_binop (mode, LSHIFTRT, temp,
                                    GEN_INT (rshift_count),
                                    if_info->x, false, OPTAB_WIDEN);
       else if (exact_log2 (~val_b + 1) >= 0
               && (rtx_cost (shift_left, mode, SET, 1, speed_p)
-                  <= rtx_cost (and_form, mode, SET, 1, speed_p)))
+                  < rtx_cost (and_form, mode, SET, 1, speed_p)))
        temp = expand_simple_binop (mode, ASHIFT, temp,
                                    GEN_INT (ctz_hwi (val_b)),
                                    if_info->x, false, OPTAB_WIDEN);
diff --git a/gcc/testsuite/gcc.target/riscv/pr120553-2.c b/gcc/testsuite/gcc.target/riscv/pr120553-2.c
index 1501f8654d9a..000f4bb687cd 100644
--- a/gcc/testsuite/gcc.target/riscv/pr120553-2.c
+++ b/gcc/testsuite/gcc.target/riscv/pr120553-2.c
@@ -83,8 +83,8 @@ T1(63)
 #endif
 
 /* { dg-final { scan-assembler-times "\\t(srai)" 128 { target rv64 } } } */
-/* { dg-final { scan-assembler-times "\\t(orn|ori|bset)" 128 { target rv64 } } } */
+/* { dg-final { scan-assembler-times "\\t(orn|ori|bset)" 196 { target rv64 } } } */
 
 /* { dg-final { scan-assembler-times "\\t(srai)" 64 { target rv32 } } } */
-/* { dg-final { scan-assembler-times "\\t(orn|ori|bset)" 64 { target rv32 } } } */
+/* { dg-final { scan-assembler-times "\\t(orn|ori|bset)" 66 { target rv32 } } } */
 
diff --git a/gcc/testsuite/gcc.target/riscv/pr64345.c b/gcc/testsuite/gcc.target/riscv/pr64345.c
new file mode 100644
index 000000000000..8ca4e2411ee9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr64345.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gcbv_zicond -mabi=lp64d" { target rv64 } } */
+/* { dg-options "-O2 -march=rv32gcbv_zicond -mabi=ilp32" { target rv32 } } */
+
+
+
+unsigned int test0 (unsigned int x) { return ((x >> 4) ^ 1) & 1; }
+
+unsigned int test1 (unsigned int x) { return ((x >> 4) & 1) ^ 1; }
+
+unsigned int test2 (unsigned int x) { return ~(x >> 4) & 1; }
+
+unsigned int test3 (unsigned int x) { return ((~x >> 4) & 1); }
+
+unsigned int test4 (unsigned int x) { return (x >> 4) & 1; }
+
+int test5 (int vi) { return vi - (((vi >> 6) & 0x01) << 1); }
+
+int test6 (int vi) { return vi - (((vi >> 6) & 0x01) << 1) - 1; }
+
+
+/* { dg-final { scan-assembler-times "\\tbexti" 5 } } */
+/* { dg-final { scan-assembler-times "\\txori" 3 } } */
+/* { dg-final { scan-assembler-times "\\tnot" 1 } } */
+/* { dg-final { scan-assembler-times "\\tsrli" 2 } } */
+/* { dg-final { scan-assembler-times "\\tandi" 2 } } */
+/* { dg-final { scan-assembler-times "\\tsub" 2 } } */
+/* { dg-final { scan-assembler-times "\\taddi" 1 } } */
+
+
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-andn-orn-01.c b/gcc/testsuite/gcc.target/riscv/zbb-andn-orn-01.c
index f9f32227bd58..9d8a772fb8f3 100644
--- a/gcc/testsuite/gcc.target/riscv/zbb-andn-orn-01.c
+++ b/gcc/testsuite/gcc.target/riscv/zbb-andn-orn-01.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-march=rv64gc_zbb -mabi=lp64" } */
-/* { dg-skip-if "" { *-*-* } { "-O0" "-g" "-Oz" "-Os" } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-g" "-O1" "-Oz" "-Os" } } */
 
 int foo1(int rs1)
 {
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-andn-orn-02.c b/gcc/testsuite/gcc.target/riscv/zbb-andn-orn-02.c
index 112c0fa968eb..430d9984c4b3 100644
--- a/gcc/testsuite/gcc.target/riscv/zbb-andn-orn-02.c
+++ b/gcc/testsuite/gcc.target/riscv/zbb-andn-orn-02.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-march=rv32gc_zbb -mabi=ilp32" } */
-/* { dg-skip-if "" { *-*-* } { "-O0" "-g" "-Oz" "-Os" } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-g" "-O1" "-Oz" "-Os" } } */
 
 int foo1(int rs1)
 {
