So I instrumented the 3 ALU synthesis routines and built 502.gcc, then
fed those results into some python code that allows me to compare
instruction counts and total size for tests across LLVM and GCC.
Naturally the idea was to see if there were cases we should handle but
were missing.
This fixes two cases in synthesize_add. First, we weren't utilizing
add.uw. There's a relatively small set of cases where we can take the
original constant C and sign extend it from 32 to 64 bits, resulting in
C'. If C' is cheaper to synthesize than C, then we can load C' into a
GPR and use add.uw. This (of course) requires the upper 32 bits of C
to be zero and bit 31 to be on.
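To make the add.uw identity concrete, here is a small host-side sketch in C. It is not code from the patch: add_uw, sext32 and add_uw_candidate_p are hypothetical helper names that only model the arithmetic, assuming the Zba definition of add.uw (add the zero-extended low 32 bits of one source to the other source).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of Zba add.uw: add the zero-extended low 32 bits
   of WORD to ADDEND.  Not a GCC function.  */
static uint64_t
add_uw (uint64_t word, uint64_t addend)
{
  return addend + (uint64_t) (uint32_t) word;
}

/* Sign extend the low 32 bits of C to 64 bits, giving C'.  Written with
   unsigned arithmetic only, so there is no reliance on signed overflow.  */
static uint64_t
sext32 (uint64_t c)
{
  return ((c & UINT64_C (0xffffffff)) ^ UINT64_C (0x80000000))
	 - UINT64_C (0x80000000);
}

/* Precondition described above: upper 32 bits of C are zero and
   bit 31 is set.  */
static int
add_uw_candidate_p (uint64_t c)
{
  return (c >> 32) == 0 && ((c >> 31) & 1) != 0;
}
```

For any C that satisfies add_uw_candidate_p, add_uw (sext32 (C), x) equals x + C. With C = 0xffffffff (the F1 case in the new test) C' is -1, which should be a single li, whereas C itself typically needs li plus a shift.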
The second case is for INT_MIN. Adding INT_MIN to a register ultimately
just flips the uppermost bit and thus can be implemented with a binvi.
Combine (of course) collapses the bit inversion case back into
arithmetic. Given the result is just a binvi, this patch recognizes
that special case as a new pattern. That has a secondary effect of
fixing the xfail for xor-synthesis-2.c which was failing for precisely
this reason.
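The INT_MIN equivalence is easy to check on the host. The sketch below is illustrative only: binvi and add_int_min are hypothetical helpers modelling a 64-bit target, with binvi assumed to invert a single bit as in Zbs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of Zbs binvi on a 64-bit target: invert bit
   number SHAMT of X.  */
static uint64_t
binvi (uint64_t x, unsigned shamt)
{
  return x ^ (UINT64_C (1) << (shamt & 63));
}

/* x + INT_MIN for a 64-bit word, computed modulo 2^64.  */
static uint64_t
add_int_min (uint64_t x)
{
  return x + (UINT64_C (1) << 63);
}
```

Adding 1 << 63 can only affect bit 63, and any carry out of that bit is discarded, so add_int_min (x) == binvi (x, 63) for every x.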
While exploring the logical space it also came to light that we should
be using riscv_integer_cost rather than riscv_const_insns. The latter
clamps at 3. So if we had C with cost 5 and C' with cost 4 and we can
use either, we really want to use C', but didn't have a way to make that
selection. Using riscv_integer_cost resolves that *and* we generate
less junk RTL since we don't have to call GEN_INT so often. I haven't
included a testcase for that in this patch, but definitely will for the
ior/xor/and space.
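A toy model of the selection problem, for illustration only. saturating_cost is a hypothetical stand-in that assumes "clamps at 3" means the reported cost saturates at a cap; the real riscv_const_insns may report over-budget constants differently, but the effect on the comparison is the same.

```c
#include <assert.h>

/* Hypothetical stand-in for a cost query that saturates at CAP.  */
static int
saturating_cost (int true_cost, int cap)
{
  return true_cost > cap ? cap : true_cost;
}

/* Nonzero when the alternative constant C' looks strictly cheaper
   than the original constant C.  */
static int
prefer_alternative_p (int cost_c, int cost_cprime)
{
  return cost_cprime < cost_c;
}
```

With true costs 5 for C and 4 for C', the saturated values are both 3, so prefer_alternative_p returns 0. With the unclamped costs it returns 1 and C' is selected.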
At this time the synthesis side for addition looks good relative to
LLVM, but sometimes combine is going to undo its work. I checked every
case from that set where GCC has more instructions than LLVM and each
and every one was a scenario where combine+mvconst_internal undid the
early synthesis work. So just more reasons to keep pushing on that
problem. I did add a special pattern for the INT_MIN case. That was
trivial and since it collapses to a single insn with Zbs it seemed like
the right thing to do in case combine discovers it from some other path.
Both GCC and LLVM seem to be missing shNadd.uw support; after some
head-banging I did manage to characterize some cases where shNadd.uw was
unique enough to be useful. That exploration was ongoing when the
latest test run fired up so that support will land in a later patch.
I mentioned my evaluation also looked at code size differences. That
brings in general constant synthesis and there's a significant cluster
of cases where LLVM consistently does better (li|lui+shift sometimes
encodes better than lui+addi). That's already being tracked in bugzilla.
The other insight from this effort is that ADD, IOR, XOR are relatively
minor when compared to AND. I'm filtering out simm12 constants because
those are trivially handled. What was left was ~1k unique constants
passed to AND, ~100 to ADD and ~100 to IOR/XOR. The point being that the
larger effort towards AND handling seems more likely to pay dividends. Given
the larger set of primitives for AND it's no surprise we've already
spent considerably more effort there.
Tested on riscv32-elf and riscv64-elf with no regressions. Bootstrapped
and regression tested on the K3 and c920 platforms. Waiting on
pre-commit CI before pushing.
Jeff
gcc/
* config/riscv/bitmanip.md (xor_for_plus_minint): New pattern.
* config/riscv/riscv.cc (synthesize_add): Handle INT_MIN as
bit inversion. Add support for add.uw. Use riscv_integer_cost
rather than riscv_const_insns.
(synthesize_add_extended): Use riscv_integer_cost rather than
riscv_const_insns.
gcc/testsuite/
* gcc.target/riscv/add-synthesis-3.c: New test.
* gcc.target/riscv/xor-synthesis-2.c: No longer xfail.
diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 992e949a0990..786072035601 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -899,6 +899,20 @@ (define_insn "*<bit_optab>i<mode>"
"<bit_optab>i\t%0,%1,%S2"
[(set_attr "type" "bitmanip")])
+;; This form can be created by combine.
+(define_insn "*xor_for_plus_minint"
+ [(set (match_operand:X 0 "register_operand" "=r")
+ (plus:X (match_operand:X 1 "register_operand" "r")
+ (match_operand 2 "const_int_operand")))]
+ "(TARGET_ZBS
+ && (INTVAL (operands[2])
+ == sext_hwi ((HOST_WIDE_INT_1U << (BITS_PER_WORD - 1)), BITS_PER_WORD)))"
+{
+ operands[2] = GEN_INT (BITS_PER_WORD - 1);
+ return "binvi\t%0,%1,%2";
+}
+ [(set_attr "type" "bitmanip")])
+
;; We can easily handle zero extensions
(define_split
[(set (match_operand:DI 0 "register_operand")
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d5eab5421318..74cbca077147 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -16166,10 +16215,20 @@ synthesize_add (rtx operands[3])
if (SMALL_OPERAND (INTVAL (operands[2])))
return false;
- int budget1 = riscv_const_insns (operands[2], true);
- int budget2 = riscv_const_insns (GEN_INT (-INTVAL (operands[2])), true);
-
HOST_WIDE_INT ival = INTVAL (operands[2]);
+ int budget1 = riscv_integer_cost (ival, true);
+ int budget2 = riscv_integer_cost (-ival, true);
+
+ /* If the constant is MIN_INT for the target, then it's just a bit flip
+ of the highest bit. */
+ HOST_WIDE_INT sextval = sext_hwi (HOST_WIDE_INT_1U << (BITS_PER_WORD - 1),
+ BITS_PER_WORD);
+ if (TARGET_ZBS && ival == sextval)
+ {
+ rtx x = gen_rtx_XOR (word_mode, operands[1], operands[2]);
+ emit_insn (gen_rtx_SET (operands[0], x));
+ return true;
+ }
/* If we can emit two addi insns then that's better than synthesizing
the constant into a temporary, then adding the temporary to the
@@ -16200,11 +16259,11 @@ synthesize_add (rtx operands[3])
ival = INTVAL (operands[2]);
if (TARGET_ZBA
&& (((ival % 2) == 0 && budget1
- > riscv_const_insns (GEN_INT (ival >> 1), true))
+ > riscv_integer_cost (ival >> 1, true))
|| ((ival % 4) == 0 && budget1
- > riscv_const_insns (GEN_INT (ival >> 2), true))
+ > riscv_integer_cost (ival >> 2, true))
|| ((ival % 8) == 0 && budget1
- > riscv_const_insns (GEN_INT (ival >> 3), true))))
+ > riscv_integer_cost (ival >> 3, true))))
{
// Load the shifted constant into a temporary
int shct = ctz_hwi (ival);
@@ -16225,6 +16284,24 @@ synthesize_add (rtx operands[3])
return true;
}
+ /* If the constant has the upper 32 bits clear and if after sign
+ extension from 32 to 64 bits it's synthesizable cheaply,
+ then synthesize C' and use add.uw. */
+ if ((TARGET_64BIT && TARGET_ZBA)
+ && (ival & HOST_WIDE_INT_UC (0xffffffff00000000)) == 0
+ && riscv_integer_cost (sext_hwi (ival, 32), true) < budget1)
+ {
+ /* Load the sign extended constant into a temporary. */
+ rtx tempreg = force_reg (word_mode, GEN_INT (sext_hwi (ival, 32)));
+
+ /* Add the zero-extended temporary to the other input to construct
+ the add.uw insn. */
+ rtx x = gen_rtx_ZERO_EXTEND (word_mode, gen_lowpart (SImode, tempreg));
+ x = gen_rtx_PLUS (word_mode, x, operands[1]);
+ emit_insn (gen_rtx_SET (operands[0], x));
+ return true;
+ }
+
/* If the negated constant is cheaper than the original, then negate
the constant and use sub. */
if (budget2 < budget1)
@@ -16272,8 +16349,8 @@ synthesize_add_extended (rtx operands[3])
return false;
HOST_WIDE_INT ival = INTVAL (operands[2]);
- int budget1 = riscv_const_insns (operands[2], true);
- int budget2 = riscv_const_insns (GEN_INT (-INTVAL (operands[2])), true);
+ int budget1 = riscv_integer_cost (INTVAL (operands[2]), true);
+ int budget2 = riscv_integer_cost (-UINTVAL (operands[2]), true);
/* If operands[2] can be split into two 12-bit signed immediates,
split add into two adds. */
diff --git a/gcc/testsuite/gcc.target/riscv/add-synthesis-3.c b/gcc/testsuite/gcc.target/riscv/add-synthesis-3.c
new file mode 100644
index 000000000000..e10b4032d464
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/add-synthesis-3.c
@@ -0,0 +1,8 @@
+/* { dg-do compile { target rv64 } } */
+/* { dg-options "-march=rv64gcb -mabi=lp64d" } */
+
+long F1 (long x) { return x + 0xffffffff; }
+long F2 (long x) { return x + (1UL << (sizeof (long) * 8 - 1) ); }
+
+/* { dg-final { scan-assembler-times "\\tadd.uw\\t" 1 } } */
+/* { dg-final { scan-assembler-times "\\tbinvi\t" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c b/gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c
index 25457d260750..b250cc2e6d6c 100644
--- a/gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c
+++ b/gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c
@@ -4,7 +4,6 @@
unsigned long foo(unsigned long src) { return src ^ 0x8800000000000007; }
-/* xfailed until we remove mvconst_internal. */
-/* { dg-final { scan-assembler-times "\\sbinvi\t" 2 { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times "\\sxori\t" 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times "\\sbinvi\t" 2 } } */
+/* { dg-final { scan-assembler-times "\\sxori\t" 1 } } */