So as noted in the PR, GCC fails to optimize this well:
  int8_t f(int8_t x)
  {
    int8_t sh = 1 << x;
    return sh & 1;
  }
I strongly suspect this kind of code is exceedingly rare in practice. I
just happened to notice that it could be improved when looking for bugs
to pass along to Shreya & Austin. As noted in the PR, most of the time
this is cleaned up in gimple, but in some cases it slips through.
I'd love to tackle this in simplify-rtx, but SHIFT_COUNT_TRUNCATED, mode
handling for shift counts, subregs to deal with 32-bit objects on 64-bit
targets, etc. make it fairly messy. Rather than spend a ton of time on
it, I've just created a simple RISC-V splitter to handle the case of a
32-bit shift on rv64. The other cases can't really be improved, for the
reason given below.
For rv64 we generate:
li a5,1 # 7 [c=4 l=4] *movsi_internal/1
sllw a0,a5,a0 # 8 [c=8 l=4] ashlsi3_extend
andi a0,a0,1 # 17 [c=4 l=4] *anddi3/1
Instead we can generate:
andi a0,a0,31 # 8 [c=4 l=4] *anddi3/1
seqz a0,a0 # 17 [c=4 l=4] *seq_zero_didi
I purposefully added the masking of the shift count. While the RISC-V
port does not define SHIFT_COUNT_TRUNCATED, it does have patterns that
optimize away the masking when they can. If the masking got
optimized away on the assumption the count would be used in a
shift/rotate and thus masked by the hardware, we could have junk in the
upper bits. It's worth noting that because of the need to sanitize the
shift count we're generating 2 insns, thus we can't really improve for
rv32 or for 64 bit objects on rv64. If we didn't need to do that this
would be a define_insn that generated a single instruction.
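
To illustrate why the masked compare is equivalent, here is a small C
model of the two rv64 sequences above. It is only a sketch: the function
names are made up for illustration, and it assumes the architectural
behavior of sllw (only the low 5 bits of the count register are used)
rather than anything GCC-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the original sequence: li 1; sllw; andi 1.
   sllw shifts a 32-bit value by the low 5 bits of the count register.
   The sign extension of the sllw result is omitted because only bit 0
   of the result is examined afterwards.  */
static uint64_t seq_before (uint64_t count)
{
  uint32_t shifted = (uint32_t) 1 << (count & 31);
  return shifted & 1;
}

/* Hypothetical model of the new sequence: andi 31; seqz.  */
static uint64_t seq_after (uint64_t count)
{
  uint64_t masked = count & 31;
  return masked == 0;
}

/* Compare both models for every int8_t value, sign-extended the way
   the argument would sit in a 64-bit register.  Returns the number of
   inputs where the two sequences disagree.  */
static int count_mismatches (void)
{
  int mismatches = 0;
  for (int x = -128; x <= 127; x++)
    {
      uint64_t reg = (uint64_t) (int64_t) x;
      if (seq_before (reg) != seq_after (reg))
	mismatches++;
    }
  return mismatches;
}

/* Convenience wrapper: aborts if the two models ever disagree.  */
static void self_check (void)
{
  assert (count_mismatches () == 0);
}
```

Bit 0 of 1 << n is set only when the effective shift amount is zero, so
the test reduces to comparing the masked count against zero. Dropping
the "& 31" in seq_after would make the two models disagree for counts
such as 32, which is the junk-in-the-upper-bits problem described above.
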
Bootstrapped and regression tested on rv64 on both the BPI and the
Pioneer. Also regression tested on riscv32-elf and riscv64-elf.
Planning to push once pre-commit CI gives the green light.
Jeff
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 4a49e778fed5..e4dac32d71db 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -5168,10 +5168,22 @@ (define_insn "*sign_bit_splat_equality_test"
}
[(set_attr "type" "branch")
(set_attr "mode" "none")])
-
-
-;; Standard extensions and pattern for optimization
+;; We can save an instruction for this case. Essentially we can
+;; test the (sanitized) shift count against zero. This only comes
+;; up for 32 bit objects on rv64.
+(define_split
+ [(set (match_operand:DI 0 "register_operand")
+ (and:DI (subreg:DI
+ (ashift:SI (const_int 1)
+ (match_operand:QI 1 "register_operand")) 0)
+ (const_int 1)))
+ (clobber (match_operand:DI 2 "register_operand"))]
+ "TARGET_64BIT"
+ [(set (match_dup 2) (and:DI (match_dup 1) (const_int 31)))
+ (set (match_dup 0) (eq:DI (match_dup 2) (const_int 0)))]
+ { operands[1] = gen_lowpart (DImode, operands[1]); })
+
(include "bitmanip.md")
(include "crypto.md")
(include "sync.md")
diff --git a/gcc/testsuite/gcc.target/riscv/pr106244.c b/gcc/testsuite/gcc.target/riscv/pr106244.c
new file mode 100644
index 000000000000..18c2e1b49504
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr106244.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target rv64 } } */
+/* { dg-additional-options "-march=rv64gc_zicond -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og"} } */
+
+typedef signed char int8_t;
+int8_t f(int8_t x)
+{
+ int8_t sh = 1 << x;
+ return sh & 1;
+}
+
+/* { dg-final { scan-assembler-not "li\t" } } */
+/* { dg-final { scan-assembler-not "sllw\t" } } */
+/* { dg-final { scan-assembler-times "andi\t" 1 } } */
+/* { dg-final { scan-assembler-times "seqz\t" 1 } } */