[to-be-committed][RISC-V][tree-optimization/106244] Improve code when generating (1 << N) & 0x1

Jeffrey Law Thu, 07 May 2026 12:24:06 -0700

So as noted in the PR, GCC fails to optimize this well:



int8_t f(int8_t x)
{
    int8_t sh = 1 << x;
    return sh & 1;
}

I strongly suspect this kind of code is exceedingly rare in practice. I just happened to notice that it could be improved when looking for bugs to pass along to Shreya & Austin. As noted in the PR, most of the time this is cleaned up in gimple, but in some cases it slips through.

I'd love to tackle this in simplify-rtx, but SHIFT_COUNT_TRUNCATED, mode handling for shift counts, subregs to deal with 32 bit objects on 64 bit targets, etc. make it fairly messy. Rather than spend a ton of time on it, I've just created a simple RISC-V splitter to handle the case of a 32 bit shift on rv64. The other cases can't be optimized.




For rv64 we generate:

        li      a5,1            # 7     [c=4 l=4] *movsi_internal/1
        sllw    a0,a5,a0        # 8     [c=8 l=4]  ashlsi3_extend
        andi    a0,a0,1         # 17    [c=4 l=4]  *anddi3/1


Instead we can generate:


        andi    a0,a0,31        # 8     [c=4 l=4]  *anddi3/1
        seqz    a0,a0           # 17    [c=4 l=4]  *seq_zero_didi

I purposefully added the masking of the shift count. While the RISC-V port does not define SHIFT_COUNT_TRUNCATED, it does have patterns that optimize away the masking when they can. If the masking got optimized away on the assumption the count would be used in a shift/rotate and thus masked by the hardware, we could have junk in the upper bits. It's worth noting that because of the need to sanitize the shift count we're generating 2 insns, thus we can't really improve for rv32 or for 64 bit objects on rv64. If we didn't need to do that this would be a define_insn that generated a single instruction.
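To illustrate why the two sequences agree, and why the mask matters, here is a minimal C sketch. It models the instruction sequences rather than the C-level semantics of the testcase, and it assumes sllw uses only the low 5 bits of the count register. The function names (shift_then_mask, mask_then_seqz, unmasked_seqz, check_all_counts) are hypothetical and exist only for this illustration; none of this is part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Models li + sllw + andi.  Assumption: sllw only looks at the low
   5 bits of the count register, so the count is masked with 31 here.  */
int shift_then_mask(uint64_t count)
{
    uint32_t sh = (uint32_t)1 << (count & 31);
    return (int)(sh & 1);
}

/* Models andi + seqz: sanitize the count, then test it against zero.  */
int mask_then_seqz(uint64_t count)
{
    return (count & 31) == 0;
}

/* Hypothetical single-instruction variant that skips the mask.  It is
   only here to show where it would disagree with the shift.  */
int unmasked_seqz(uint64_t count)
{
    return count == 0;
}

/* Compare the two models for small counts and for counts with junk in
   the upper bits.  Returns the number of counts checked.  */
int check_all_counts(void)
{
    int checked = 0;
    for (uint64_t low = 0; low < 256; low++) {
        uint64_t junk = low | 0xabcd000000000000ull;
        assert(shift_then_mask(low) == mask_then_seqz(low));
        assert(shift_then_mask(junk) == mask_then_seqz(junk));
        checked += 2;
    }
    return checked;
}
```

With a count register holding 32, shift_then_mask and mask_then_seqz both yield 1, while unmasked_seqz yields 0, which is the kind of mismatch the masking guards against.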

Bootstrapped and regression tested on rv64 on both the BPI and the Pioneer. Also regression tested on riscv32-elf and riscv64-elf. Planning to push once pre-commit CI gives the green light.


Jeff

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 4a49e778fed5..e4dac32d71db 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -5168,10 +5168,22 @@ (define_insn "*sign_bit_splat_equality_test"
 }
   [(set_attr "type" "branch")
    (set_attr "mode" "none")])
-                                       
-       
 
-;; Standard extensions and pattern for optimization
+;; We can save an instruction for this case.  Essentially we can
+;; test the (sanitized) shift count against zero.  This only comes
+;; up for 32 bit objects on rv64.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+       (and:DI (subreg:DI
+                (ashift:SI (const_int 1)
+                           (match_operand:QI 1 "register_operand")) 0)
+               (const_int 1)))
+   (clobber (match_operand:DI 2 "register_operand"))]
+  "TARGET_64BIT"
+  [(set (match_dup 2) (and:DI (match_dup 1) (const_int 31)))
+   (set (match_dup 0) (eq:DI (match_dup 2) (const_int 0)))]
+  { operands[1] = gen_lowpart (DImode, operands[1]); })
+
 (include "bitmanip.md")
 (include "crypto.md")
 (include "sync.md")
diff --git a/gcc/testsuite/gcc.target/riscv/pr106244.c b/gcc/testsuite/gcc.target/riscv/pr106244.c
new file mode 100644
index 000000000000..18c2e1b49504
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr106244.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target rv64 } } */
+/* { dg-additional-options "-march=rv64gc_zicond -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og"} } */
+
+typedef signed char int8_t;
+int8_t f(int8_t x)
+{
+    int8_t sh = 1 << x;
+    return sh & 1;
+}
+
+/* { dg-final { scan-assembler-not "li\t" } } */
+/* { dg-final { scan-assembler-not "sllw\t" } } */
+/* { dg-final { scan-assembler-times "andi\t" 1 } } */
+/* { dg-final { scan-assembler-times "seqz\t" 1 } } */
