https://gcc.gnu.org/g:ea1a75bfd580788b6c1f0a772051797bbddedf82

commit r17-2097-gea1a75bfd580788b6c1f0a772051797bbddedf82
Author: Jeff Law <[email protected]>
Date:   Thu Jul 2 11:41:09 2026 -0600

    [RISC-V] Improve ADD synthesis
    
    So testing passed for the V2 of this patch, but there were some minor issues I
    felt needed to be addressed.
    
    First, in the new pattern, we can use %S to output the right value rather than
    recomputing it ourselves.  Second, the test was tightened up slightly by adding
    missing escapes.  Finally, typos in the ChangeLog and a comment in bitmanip.md
    were fixed.  I didn't go through a full test cycle on those changes, but I did
    test them on riscv32-elf and riscv64-elf with no regressions.
    
    Attached is the final patch I'm pushing to the trunk.
    
    --
    
    So I instrumented the 3 ALU synthesis routines and built 502.gcc, then fed
    those results into some python code that allows me to compare instruction
    counts and total size for tests across LLVM and GCC.  Naturally the idea was to
    see if there were cases we should handle but were missing.
    
    This fixes cases in synthesize_add.  First we weren't utilizing add.uw, so
    there's a relatively small set of cases where we can take the original constant
    C, sign extend it from 32 to 64 bits resulting in C'.  If C' is cheaper to
    synthesize than C, then we can load up C' into a GPR, then use add.uw.  This
    (of course) requires the upper 32 bits of C to be zero and bit 31 to be set.
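
The add.uw identity described above can be checked with a small stand-alone C model. This is only a sketch: add_uw, sext32 and add_via_add_uw are hypothetical helper names invented for illustration, not functions from riscv.cc, and the model assumes the Zba definition of add.uw (rd = rs2 + zero-extended low 32 bits of rs1).

```c
#include <assert.h>
#include <stdint.h>

/* Model of the Zba add.uw instruction:
   rd = rs2 + zero_extend (rs1[31:0]).  */
static uint64_t add_uw (uint64_t rs1, uint64_t rs2)
{
  return rs2 + (uint64_t) (uint32_t) rs1;
}

/* Sign extend the low 32 bits of C to 64 bits, giving C'.
   The xor/subtract idiom stays in unsigned arithmetic.  */
static uint64_t sext32 (uint64_t c)
{
  return ((c & 0xffffffffu) ^ 0x80000000u) - 0x80000000u;
}

/* Compute x + c by materializing C' and using add.uw.
   Only valid when the upper 32 bits of C are clear.  */
static uint64_t add_via_add_uw (uint64_t x, uint64_t c)
{
  assert ((c >> 32) == 0);
  return add_uw (sext32 (c), x);
}
```

For C = 0xffffffff, C' is -1, which is a single li, and add.uw zero extends it back to 0xffffffff before the addition. That is the F1 case in add-synthesis-3.c.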
    
    The second case is for INT_MIN.  Adding INT_MIN to a register ultimately just
    flips the uppermost bit and thus can be implemented with a binvi.  Combine (of
    course) collapses the bit inversion case back into arithmetic.  Given the
    result is just a binvi, this patch recognizes that special case as a new
    pattern. That has a secondary effect of fixing the xfail for xor-synthesis-2.c
    which was failing for precisely this reason.
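
The INT_MIN equivalence above follows from modular arithmetic: adding 2^63 modulo 2^64 can only change bit 63, because any carry out of the top bit is discarded. A minimal C sketch for a 64-bit word, using hypothetical helper names (add_int_min, binvi_63, check_flip) that do not appear in the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Bit pattern of INT_MIN for a 64-bit word: only bit 63 set.  */
#define SIGN_BIT (UINT64_C (1) << 63)

/* x + INT_MIN, computed modulo 2^64.  */
static uint64_t add_int_min (uint64_t x)
{
  return x + SIGN_BIT;
}

/* Model of binvi rd, rs1, 63: invert bit 63 of the input.  */
static uint64_t binvi_63 (uint64_t x)
{
  return x ^ SIGN_BIT;
}

/* Abort if the two forms disagree for X.  */
static void check_flip (uint64_t x)
{
  assert (add_int_min (x) == binvi_63 (x));
}
```

Calling check_flip on values such as 0, SIGN_BIT and UINT64_MAX exercises both the case where bit 63 is set and the case where it is cleared.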
    
    While exploring the logical space it also came to light that we should be using
    riscv_integer_cost rather than riscv_const_insns. The latter clamps at 3.  So
    if we had C with cost 5 and C' with cost 4 and we can use either, we really
    want to use C', but didn't have a way to make that selection.  Using
    riscv_integer_cost resolves that *and* we generate less junk RTL since we don't
    have to call GEN_INT so often.  I haven't included a testcase for that in this
    patch, but definitely will on the ior/xor/and space.
    
    At this time the synthesis side for addition looks good relative to LLVM, but
    sometimes combine is going to undo its work.  I checked every case from that
    set where GCC has more instructions than LLVM and each and every one was a
    scenario where combine+mvconst_internal undid the early synthesis work.  So
    just more reasons to keep pushing on that problem.  I did add a special pattern
    for the INT_MIN case.  That was trivial and since it collapses to a single insn
    with Zbs it seemed like the right thing to do in case combine discovers it from
    some other path.
    
    Both GCC and LLVM seem to be missing shNadd.uw support; after some head-banging
    I did manage to characterize some cases where shNadd.uw was unique enough to be
    useful.  That exploration was ongoing when the latest test run fired up so that
    support will land in a later patch.
    
    I mentioned my evaluation also looked at code size differences. That brings in
    general constant synthesis and there's a significant cluster of cases where
    LLVM consistently does better (li|lui+shift sometimes encodes better than
    lui+addi).  That's already being tracked in bugzilla.
    
    The other insight from this effort is that ADD, IOR, XOR are relatively minor
    when compared to AND.  I'm filtering out simm12 constants because those are
    trivially handled.  What was left was ~1k unique constants passed to AND,
    ~100 to ADD and ~100 to IOR/XOR.  Point being the larger effort towards AND
    handling seems more likely to pay dividends.  Given the larger set of
    primitives for AND it's no surprise we've already spent considerably more
    effort there.
    
    Tested on riscv32-elf and riscv64-elf with no regressions. Bootstrapped and
    regression tested on the K3 and c920 platforms. Waiting on pre-commit CI before
    pushing.
    
    gcc/
    
            * config/riscv/bitmanip.md (xor_for_plus_minint): New pattern.
            * config/riscv/riscv.cc (synthesize_add): Handle INT_MIN as
            bit inversion.  Add support for add.uw.  Use riscv_integer_cost
            rather than riscv_const_insns.
            (synthesize_add_extended): Use riscv_integer_cost rather than
            riscv_const_insns.
    
    gcc/testsuite/
    
            * gcc.target/riscv/add-synthesis-3.c: New test.
            * gcc.target/riscv/xor-synthesis-2.c: No longer xfail.

Diff:
---
 gcc/config/riscv/bitmanip.md                     | 11 ++++++
 gcc/config/riscv/riscv.cc                        | 44 +++++++++++++++++++-----
 gcc/testsuite/gcc.target/riscv/add-synthesis-3.c |  8 +++++
 gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c |  5 ++-
 4 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 992e949a0990..be893f12d917 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -899,6 +899,17 @@
   "<bit_optab>i\t%0,%1,%S2"
   [(set_attr "type" "bitmanip")])
 
+;; This form can be created by combine.
+(define_insn "*xor_for_plus_minint"
+  [(set (match_operand:X 0 "register_operand" "=r")
+       (plus:X (match_operand:X 1 "register_operand" "r")
+               (match_operand 2 "const_int_operand")))]
+  "(TARGET_ZBS
+    && (INTVAL (operands[2])
+       == sext_hwi ((HOST_WIDE_INT_1U << (BITS_PER_WORD - 1)), BITS_PER_WORD)))"
+  "binvi\t%0,%1,%S2"
+  [(set_attr "type" "bitmanip")])
+
 ;; We can easily handle zero extensions
 (define_split
   [(set (match_operand:DI 0 "register_operand")
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d5eab5421318..09dc41930aa7 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -16166,10 +16166,20 @@ synthesize_add (rtx operands[3])
   if (SMALL_OPERAND (INTVAL (operands[2])))
     return false;
 
-  int budget1 = riscv_const_insns (operands[2], true);
-  int budget2 = riscv_const_insns (GEN_INT (-INTVAL (operands[2])), true);
-
   HOST_WIDE_INT ival = INTVAL (operands[2]);
+  int budget1 = riscv_integer_cost (ival, true);
+  int budget2 = riscv_integer_cost (-ival, true);
+
+  /* If the constant is MIN_INT for the target, then it's just a bit flip
+     of the highest bit.  */
+  HOST_WIDE_INT sextval = sext_hwi (HOST_WIDE_INT_1U << (BITS_PER_WORD - 1),
+                                   BITS_PER_WORD);
+  if (TARGET_ZBS && ival == sextval)
+    {
+      rtx x = gen_rtx_XOR (word_mode, operands[1], operands[2]);
+      emit_insn (gen_rtx_SET (operands[0], x));
+      return true;
+    }
 
   /* If we can emit two addi insns then that's better than synthesizing
      the constant into a temporary, then adding the temporary to the
@@ -16200,11 +16210,11 @@ synthesize_add (rtx operands[3])
   ival = INTVAL (operands[2]);
   if (TARGET_ZBA
       && (((ival % 2) == 0 && budget1
-          > riscv_const_insns (GEN_INT (ival >> 1), true))
+          > riscv_integer_cost (ival >> 1, true))
           || ((ival % 4) == 0 && budget1
-              > riscv_const_insns (GEN_INT (ival >> 2), true))
+              > riscv_integer_cost (ival >> 2, true))
           || ((ival % 8) == 0 && budget1
-              > riscv_const_insns (GEN_INT (ival >> 3), true))))
+              > riscv_integer_cost (ival >> 3, true))))
     {
       // Load the shifted constant into a temporary
       int shct = ctz_hwi (ival);
@@ -16225,6 +16235,24 @@ synthesize_add (rtx operands[3])
       return true;
     }
 
+  /* If the constant has the upper 32 bits clear and if after sign
+     extension from 32 to 64 bits it's synthesizable cheaply,
+     then synthesize C' and use add.uw.  */
+  if ((TARGET_64BIT && TARGET_ZBA)
+      && (ival & HOST_WIDE_INT_UC (0xffffffff00000000)) == 0
+      && riscv_integer_cost (sext_hwi (ival, 32), true) < budget1)
+    {
+      /* Load the sign extended constant into a temporary.  */
+      rtx tempreg = force_reg (word_mode, GEN_INT (sext_hwi (ival, 32)));
+
+      /* Add the zero-extended temporary to the other input to construct
+        the add.uw insn.  */
+      rtx x = gen_rtx_ZERO_EXTEND (word_mode, gen_lowpart (SImode, tempreg));
+      x = gen_rtx_PLUS (word_mode, x, operands[1]);
+      emit_insn (gen_rtx_SET (operands[0], x));
+      return true;
+    }
+
   /* If the negated constant is cheaper than the original, then negate
      the constant and use sub.  */
   if (budget2 < budget1)
@@ -16272,8 +16300,8 @@ synthesize_add_extended (rtx operands[3])
     return false;
 
   HOST_WIDE_INT ival = INTVAL (operands[2]);
-  int budget1 = riscv_const_insns (operands[2], true);
-  int budget2 = riscv_const_insns (GEN_INT (-INTVAL (operands[2])), true);
+  int budget1 = riscv_integer_cost (INTVAL (operands[2]), true);
+  int budget2 = riscv_integer_cost (-UINTVAL (operands[2]), true);
 
 /*  If operands[2] can be split into two 12-bit signed immediates,
     split add into two adds.  */
diff --git a/gcc/testsuite/gcc.target/riscv/add-synthesis-3.c b/gcc/testsuite/gcc.target/riscv/add-synthesis-3.c
new file mode 100644
index 000000000000..ffed5735f3d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/add-synthesis-3.c
@@ -0,0 +1,8 @@
+/* { dg-do compile { target rv64 } } */
+/* { dg-options "-march=rv64gcb -mabi=lp64d" } */
+
+long F1 (long x) { return x + 0xffffffff; }
+long F2 (long x) { return x + (1UL << (sizeof (long) * 8 - 1) ); }
+
+/* { dg-final { scan-assembler-times "\\tadd\\.uw\\t" 1 } } */
+/* { dg-final { scan-assembler-times "\\tbinvi\\t" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c b/gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c
index 25457d260750..b250cc2e6d6c 100644
--- a/gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c
+++ b/gcc/testsuite/gcc.target/riscv/xor-synthesis-2.c
@@ -4,7 +4,6 @@
 
 unsigned long foo(unsigned long src) { return src ^ 0x8800000000000007; }
 
-/* xfailed until we remove mvconst_internal.  */
-/* { dg-final { scan-assembler-times "\\sbinvi\t" 2 { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times "\\sxori\t" 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times "\\sbinvi\t" 2 } } */
+/* { dg-final { scan-assembler-times "\\sxori\t" 1 } } */
