Same functional change as was already posted, this time with a
testcase. Given it's been in my tester and through the pre-commit CI
system, I'm going forward now.
--
So as the PR notes, this is an attempt to squeeze out some instructions
from a hot part of leela, the random number generator in particular.
typedef unsigned int uint32;
uint32 random(uint32 s1) {
  const uint32 mask = 0xffffffff;
  s1 = (((s1 & 0xFFFFFFFEU) << 12) & mask);
  return s1;
}
Generates this RISC-V code:
slli a5,a0,44 # 25 [c=4 l=4] ashldi3
srai a0,a5,44 # 26 [c=4 l=4] ashrdi3
andi a0,a0,-2 # 21 [c=4 l=4] *anddi3/1
slli a0,a0,12 # 22 [c=4 l=4] ashldi3
But this is an equivalent sequence:
andi a0, a0, -2
slliw a0, a0, 12
The key is realizing that the first two statements are just a sign
extended bitfield of length 20. That ultimately gets shifted left 12
bits. 20+12 = 32, so we can at least conceptually use slliw (shift left
sign extending result from SI to DI). The andi just turns off the low bit.
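To sanity check that claim outside the compiler, here is a minimal C sketch that models both instruction sequences on a 64-bit register value and counts inputs where they disagree. The helper names (sext, seq_original, seq_slliw, count_mismatches) are made up for this illustration and are not part of the patch; the instruction semantics are modeled by hand under the usual RV64 definitions.

```c
#include <stdint.h>

/* Sign extend the low `bits` bits of `value` to 64 bits using only
   unsigned arithmetic, so the result does not depend on how the host
   compiler treats signed shifts.  Valid for 1 <= bits <= 63.  */
static uint64_t sext(uint64_t value, unsigned bits)
{
  uint64_t sign = 1ULL << (bits - 1);
  value &= (sign << 1) - 1;
  return (value ^ sign) - sign;
}

/* Hand-written model of the original four-instruction sequence.  */
static uint64_t seq_original(uint64_t a0)
{
  uint64_t field = sext(a0, 20);  /* slli a5,a0,44 ; srai a0,a5,44 */
  field &= ~1ULL;                 /* andi a0,a0,-2 */
  return field << 12;             /* slli a0,a0,12 */
}

/* Hand-written model of the proposed two-instruction sequence.  */
static uint64_t seq_slliw(uint64_t a0)
{
  a0 &= ~1ULL;                    /* andi a0,a0,-2 */
  return sext(a0 << 12, 32);      /* slliw a0,a0,12 */
}

/* Try every 20-bit low field under a few arbitrary upper-bit patterns
   and count the inputs where the two models differ.  */
static unsigned long count_mismatches(void)
{
  static const uint64_t upper[] = {
    0x0ULL, 0xFFFFFFFFFFF00000ULL, 0xDEADBEEF00000000ULL,
    0x0000000080000000ULL
  };
  unsigned long bad = 0;
  for (unsigned u = 0; u < sizeof upper / sizeof upper[0]; u++)
    for (uint64_t low = 0; low < (1ULL << 20); low++)
      if (seq_original(upper[u] | low) != seq_slliw(upper[u] | low))
        bad++;
  return bad;
}
```

Calling count_mismatches() from a small main is enough to exercise it. Bits 20 and above of the input never reach the result in either model, which is why only a handful of upper patterns are tried.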
A sign extracted bitfield starting at bit 0, of size N, that is then
left shifted by M where N+M == 32 is a natural slliw instruction.
However, when I tried to recognize that in general and generate the
slliw form, I saw code quality regressions that didn't look
particularly reasonable to try and fix. So we are more selective and
only recognize the idiom when we subsequently mask off some bits and
the mask can be encoded via andi. This likely could be extended to
other logical operations that don't ultimately affect the SI sign bit.
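As a rough illustration of that selectivity, the check can be sketched in plain C as below. This is not the splitter itself: slliw_candidate, fits_simm12 and asr are hypothetical names, the size/pos/shift/mask parameters stand in for the constant operands of the pattern, and the pos == 0 test follows the "starting at bit 0" wording above rather than quoting the patch condition. fits_simm12 assumes the usual 12-bit signed immediate range of andi.

```c
#include <stdbool.h>
#include <stdint.h>

/* Arithmetic right shift written so it does not rely on
   implementation-defined behaviour for negative values.  */
static int64_t asr(int64_t v, unsigned n)
{
  return v < 0 ? ~(~v >> n) : v >> n;
}

/* Range of a 12-bit signed immediate, as accepted by andi.  */
static bool fits_simm12(int64_t v)
{
  return v >= -2048 && v <= 2047;
}

/* Hypothetical stand-in for the recognition step: a sign extracted
   field of `size` bits at bit `pos`, shifted left by `shift`, then
   ANDed with `mask`.  On success, *pre_shift_mask receives the mask
   to apply with andi before the slliw.  */
static bool slliw_candidate(int size, int pos, int shift, int64_t mask,
                            int64_t *pre_shift_mask)
{
  /* The field's sign bit must land exactly in the SI sign bit.  */
  if (pos != 0 || size + shift != 32)
    return false;
  /* Moving the AND ahead of the shift means shifting the mask right.  */
  int64_t m = asr(mask, (unsigned) shift);
  if (!fits_simm12(m))
    return false;
  *pre_shift_mask = m;
  return true;
}
```

For instance, a mask of -8192 (that is, -2 << 12) with size 20 and shift 12 yields a pre-shift mask of -2, which matches the andi a0,a0,-2 in the sequence above.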
Jeff
commit 046bc3484c90a25fb09c851d6afac13b790bb20c
Author: Jeff Law <[email protected]>
Date: Fri May 8 11:40:29 2026 -0600
[V2][RISC-V][PR target/124955] Utilize slliw for some left shifted signed
bitfield extractions
Same functional change as was already posted, this time with a testcase.
Given it's been in my tester and through the pre-commit CI system, I'm going
forward now.
--
So as the PR notes, this is an attempt to squeeze out some instructions from
a hot part of leela, the random number generator in particular.
typedef unsigned int uint32;
uint32 random(uint32 s1) {
  const uint32 mask = 0xffffffff;
  s1 = (((s1 & 0xFFFFFFFEU) << 12) & mask);
  return s1;
}
Generates this RISC-V code:
slli a5,a0,44 # 25 [c=4 l=4] ashldi3
srai a0,a5,44 # 26 [c=4 l=4] ashrdi3
andi a0,a0,-2 # 21 [c=4 l=4] *anddi3/1
slli a0,a0,12 # 22 [c=4 l=4] ashldi3
But this is an equivalent sequence:
andi a0, a0, -2
slliw a0, a0, 12
The key is realizing that the first two statements are just a sign extended
bitfield of length 20. That ultimately gets shifted left 12 bits. 20+12 = 32,
so we can at least conceptually use slliw (shift left sign extending result
from SI to DI). The andi just turns off the low bit.
A sign extracted bitfield starting at bit 0, of size N, that is then left
shifted by M where N+M == 32 is a natural slliw instruction. However, when I
tried to recognize that in general and generate the slliw form, I saw code
quality regressions that didn't look particularly reasonable to try and fix.
So we are more selective and only recognize the idiom when we subsequently
mask off some bits and the mask can be encoded via andi. This likely could be
extended to other logical operations that don't ultimately affect the SI sign
bit.
PR target/124955
gcc/
* config/riscv/riscv.md (masked shifted bitfield extraction): New
splitter to utilize slliw to eliminate the need for sign extension.
gcc/testsuite/
* gcc.target/riscv/pr124955.c: New test.
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index b7fba2e88a3..869061e18ae 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -5184,6 +5184,36 @@ (define_split
(set (match_dup 0) (eq:DI (match_dup 2) (const_int 0)))]
{ operands[1] = gen_lowpart (DImode, operands[1]); })
+;; The basic idea is to realize that we can get the sign extension
+;; for free when sign extracting a field and shifting it such that
+;; the sign bit of the field ends up in the SI sign bit. In that
+;; case it's just a slliw.
+;;
+;; It is tempting to do the extract+shift rewriting independent of
+;; the outer AND. But that's shown to regress code quality in other
+;; contexts. So we're being more conservative about trying to
+;; exploit the free sign extension opportunities that show up with
+;; shifted sign extractions
+(define_split
+ [(set (match_operand:DI 0 "register_operand")
+ (and:DI
+ (ashift:DI (sign_extract:DI (match_operand:DI 1 "register_operand")
+ (match_operand 2 "const_int_operand")
+ (match_operand 3 "const_int_operand"))
+ (match_operand 4 "const_int_operand"))
+ (match_operand:DI 5 "const_int_operand")))
+ (clobber (match_operand:DI 6 "register_operand"))]
+ "(TARGET_64BIT
+ && INTVAL (operands[2]) + INTVAL (operands[4]) == 32
+ && SMALL_OPERAND (INTVAL (operands[5]) >> INTVAL (operands[4])))"
+ [(set (match_dup 6) (and:DI (match_dup 1) (match_dup 5)))
+ (set (match_dup 0) (sign_extend:DI (ashift:SI (match_dup 7) (match_dup 4))))]
+{
+ HOST_WIDE_INT new_mask = INTVAL (operands[5]) >> INTVAL (operands[4]);
+ operands[5] = GEN_INT (new_mask);
+ operands[7] = gen_lowpart (SImode, operands[6]);
+})
+
(include "bitmanip.md")
(include "crypto.md")
(include "sync.md")
diff --git a/gcc/testsuite/gcc.target/riscv/pr124955.c b/gcc/testsuite/gcc.target/riscv/pr124955.c
new file mode 100644
index 00000000000..db6a08b3878
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr124955.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target rv64} } */
+/* { dg-additional-options "-march=rv64gc_zicond -mabi=lp64d" } */
+
+typedef unsigned int uint32;
+
+uint32 random(uint32 s1) {
+ const uint32 mask = 0xffffffff;
+ s1 = (((s1 & 0xFFFFFFFEU) << 12) & mask);
+ return s1;
+}
+
+/* { dg-final { scan-assembler-not "slli\t" } } */
+/* { dg-final { scan-assembler-not "srai\t" } } */
+/* { dg-final { scan-assembler-times "slliw\t" 1 } } */