So as the PR notes, this is an attempt to squeeze out some instructions
from a hot part of leela, the random number generator in particular.

  typedef unsigned int uint32;

  uint32 random(uint32 s1) {
    const uint32 mask = 0xffffffff;
    s1 = (((s1 & 0xFFFFFFFEU) << 12) & mask);
    return s1;
  }

That test case currently generates this RISC-V code:

  slli    a5,a0,44    # 25    [c=4 l=4]  ashldi3
  srai    a0,a5,44    # 26    [c=4 l=4]  ashrdi3
  andi    a0,a0,-2    # 21    [c=4 l=4]  *anddi3/1
  slli    a0,a0,12    # 22    [c=4 l=4]  ashldi3

But this is an equivalent sequence:

  andi    a0,a0,-2
  slliw   a0,a0,12

The key is realizing that the first two instructions just extract a sign
extended bitfield of length 20 starting at bit 0. That ultimately gets
shifted left 12 bits. 20+12 = 32, so the field's sign bit lands in bit 31
and we can at least conceptually use slliw (shift left, sign extending the
result from SI to DI). The andi just turns off the low bit.
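
To make the equivalence concrete, here is a small C model of the two
sequences. It is only a sketch, not part of the patch: the names
seq_current, seq_proposed and check_sequences are made up for
illustration, and it assumes the host compiler does arithmetic right
shifts on signed values and modular narrowing conversions, as GCC does.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the current four-instruction sequence on a 64-bit register.
   Assumes >> on a negative int64_t is an arithmetic shift.  */
static uint64_t seq_current(uint64_t a0)
{
    int64_t a5 = (int64_t)(a0 << 44);   /* slli a5,a0,44 */
    int64_t r = a5 >> 44;               /* srai a0,a5,44 */
    r &= -2;                            /* andi a0,a0,-2 */
    return (uint64_t)r << 12;           /* slli a0,a0,12 */
}

/* Model of the proposed two-instruction sequence.  */
static uint64_t seq_proposed(uint64_t a0)
{
    a0 &= (uint64_t)-2;                       /* andi a0,a0,-2 */
    uint32_t low = (uint32_t)a0 << 12;        /* slliw: 32-bit shift ... */
    return (uint64_t)(int64_t)(int32_t)low;   /* ... then sign extend to 64 */
}

/* Compare both models over every 20-bit field value, with and without
   arbitrary junk in the bits above the field.  */
static void check_sequences(void)
{
    for (uint64_t low = 0; low < (1u << 20); low++) {
        uint64_t junk = low | 0xABCDE00000ULL;
        assert(seq_current(low) == seq_proposed(low));
        assert(seq_current(junk) == seq_proposed(junk));
    }
}
```

Calling check_sequences() walks all 2^20 field values. Bits above bit 19
of the input never reach the result in either sequence, which is why the
junk in the upper bits does not matter.
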
More generally, a sign extracted bitfield starting at bit 0, of size N,
that is then left shifted by M, where N+M == 32, is a natural fit for slliw.
However, when I tried to recognize that and generate the slliw form
unconditionally, I saw code quality regressions that didn't look
particularly reasonable to try and fix. So we want to be more selective
about recognizing that idiom: we only recognize it when we subsequently
mask off some bits and the mask, shifted right by M, can be encoded via
andi. This likely could be extended to other logical operations that
don't ultimately affect the SI sign bit.
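
As a rough sanity check of the mask handling, the C sketch below compares
"shift, then mask" against "andi with the mask shifted right by M, then
slliw". It is an illustration under stated assumptions rather than
anything from the patch: fits_andi_imm is a hypothetical stand-in for the
SMALL_OPERAND test, the other names are made up, the field is assumed to
start at bit 0, and the host compiler is assumed to do arithmetic right
shifts on signed values and modular narrowing conversions, as GCC does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for SMALL_OPERAND: does V fit in andi's
   12-bit signed immediate?  */
static int fits_andi_imm(int64_t v)
{
    return v >= -2048 && v <= 2047;
}

/* Sign extend the low N bits of X (0 < N < 64).  */
static int64_t sext(uint64_t x, unsigned n)
{
    return (int64_t)(x << (64 - n)) >> (64 - n);
}

/* Original shape: (sign extracted N-bit field << M) & MASK.  */
static uint64_t mask_after_shift(uint64_t x, unsigned n, unsigned m,
                                 int64_t mask)
{
    return ((uint64_t)sext(x, n) << m) & (uint64_t)mask;
}

/* Rewritten shape: andi with (MASK >> M), then slliw by M.  */
static uint64_t mask_before_shift(uint64_t x, unsigned m, int64_t mask)
{
    uint64_t t = x & (uint64_t)(mask >> m);   /* andi */
    uint32_t low = (uint32_t)t << m;          /* slliw: 32-bit shift ... */
    return (uint64_t)(int64_t)(int32_t)low;   /* ... then sign extend */
}

/* Try every andi-encodable shifted mask for shift counts 1..20.  */
static void check_masks(void)
{
    for (unsigned m = 1; m <= 20; m++) {
        unsigned n = 32 - m;
        uint64_t xs[] = { 0, 1, (uint64_t)1 << (n - 1),
                          ((uint64_t)1 << (n - 1)) - 1,
                          0xFFFFFFFFFFFFFFFFULL, 0xDEADBEEFCAFEF00DULL };
        for (int64_t imm = -2048; imm <= 2047; imm++) {
            int64_t mask = imm * ((int64_t)1 << m);   /* mask >> m == imm */
            assert(fits_andi_imm(mask >> m));
            for (unsigned i = 0; i < sizeof xs / sizeof xs[0]; i++)
                assert(mask_after_shift(xs[i], n, m, mask)
                       == mask_before_shift(xs[i], m, mask));
        }
    }
}
```

The loop deliberately stops at a shift count of 20. Up to that point an
andi-encodable shifted mask necessarily has identical bits from bit 31
upward, so the mask cannot change the SI sign bit without changing the
upper bits the same way. Larger shift counts are not exercised here and
may need an extra condition.
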
So here's the patch I'm playing with right now. It's passed riscv32-elf
and riscv64-elf. Bootstrap on the BPI and Pioneer is in progress. I'm
posting it now to get the CI system chewing on it overnight.
Jeff
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 4a49e778fed5..b6c29db13c1d 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -5168,8 +5168,37 @@ (define_insn "*sign_bit_splat_equality_test"
}
[(set_attr "type" "branch")
(set_attr "mode" "none")])
-
-
+
+
+;; The basic idea is to realize that we can get the sign extension
+;; for free when sign extracting a field and shifting it such that
+;; the sign bit of the field ends up in the SI sign bit. In that
+;; case it's just a slliw.
+;;
+;; It is tempting to do the extract+shift rewriting independent of
+;; the outer AND. But that's shown to regress code quality in other
+;; contexts. So we're being more conservative about trying to
+;; exploit the free sign extension opportunities that show up with
+;; shifted sign extractions.
+(define_split
+ [(set (match_operand:DI 0 "register_operand")
+ (and:DI
+ (ashift:DI (sign_extract:DI (match_operand:DI 1 "register_operand")
+ (match_operand 2 "const_int_operand")
+ (match_operand 3 "const_int_operand"))
+ (match_operand 4 "const_int_operand"))
+ (match_operand:DI 5 "const_int_operand")))
+ (clobber (match_operand:DI 6 "register_operand"))]
+ "(TARGET_64BIT
+ && INTVAL (operands[2]) + INTVAL (operands[4]) == 32
+ && SMALL_OPERAND (INTVAL (operands[5]) >> INTVAL (operands[4])))"
+ [(set (match_dup 6) (and:DI (match_dup 1) (match_dup 5)))
+ (set (match_dup 0) (sign_extend:DI (ashift:SI (match_dup 7) (match_dup 4))))]
+{
+ HOST_WIDE_INT new_mask = INTVAL (operands[5]) >> INTVAL (operands[4]);
+ operands[5] = GEN_INT (new_mask);
+ operands[7] = gen_lowpart (SImode, operands[6]);
+})
;; Standard extensions and pattern for optimization
(include "bitmanip.md")