On Sun, Aug 10, 2025 at 09:28:50AM -0600, Jeff Law wrote:
> Conceptually OK.  I slightly worry about the use of PREV_INSN as we could
> conceptually end up with notes, line markers, etc between the two fusible
> insns.
> 
> Right now the scheduler is the only pass that ever examines SCHED_GROUP_P
> and, IIRC, we remove all the notes from the IL, schedule, then put all the
> notes back in.  Point being within the scheduler PREV_INSN is always going
> to get us the previous real insn.  Outside the scheduler context PREV_INSN
> might not do exactly what we want and may even introduce compare-debug
> failures.
> 
> ISTM it's probably safer to use prev_nonnote_nondebug_insn.  The biggest
> worry with using that API is walking past a BB marker.  But in the fusion
> case, I think we're going to be safe from those problems.
> 
> 
> Given the simplicity of this patch, I'd be inclined to ACK it independent of
> the main body of work, so if you could bootstrap and regression test it in
> isolation after adjusting the PREV_INSN uses it'd be appreciated.

Hi Jeff, thanks for taking a look.

Please find a new version of this patch attached.  I've bootstrapped and
regtested it in isolation and with patch #3 on x86_64, i386, aarch64,
and riscv64, and nothing came up.  I've also done the same on top of
patch #1 (which provides quite a lot more coverage), and everything
passed as well.

> 
> Jeff

Thanks,
Artemiy

------------ 8< ------------

Some of the instruction pairs recognized as fusible by a preceding
invocation of the dep_fusion pass require that both components of a pair
have the same hard register output for the fusion to work in hardware.
(An example of this would be a multiply-add operation, or a zero-extract
operation composed of two shifts.)

For all such pairs, the following conditions will hold:
  (a) Both insns are single_sets
  (b) Both insns have a register destination
  (c) The pair has been marked as fusible by setting the second insn's
      SCHED_GROUP_P flag
  (d) Additionally, post-RA, both instructions' destination regnos are
      equal

(All of these conditions are encapsulated in the newly created
single_output_fused_pair_p () predicate.)
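
For readers who want to try the conditions outside of GCC, here is a
minimal standalone sketch of the check.  Everything in it is an assumption
made for illustration: ModelInsn, prev_real_insn, model_fused_pair_p and
example_stream are hypothetical names, not GCC types or functions, and the
insn stream is reduced to three kinds of entries.  It only models the logic
of conditions (a)-(d), including skipping notes and debug insns when
looking for the first insn of the pair.

```cpp
// Standalone model only: none of these types are GCC's rtx / rtx_insn.
#include <cassert>
#include <optional>
#include <vector>

enum class Kind { Insn, Note, DebugInsn };

struct ModelInsn {
  Kind kind;
  bool sched_group;             // stands in for SCHED_GROUP_P
  std::optional<int> dest_reg;  // set iff the insn is a single_set whose
                                // destination is a register: (a) and (b)
};

// Stands in for prev_nonnote_nondebug_insn: index of the nearest
// preceding real insn, or -1 if there is none.
int prev_real_insn(const std::vector<ModelInsn>& s, int i) {
  for (int j = i - 1; j >= 0; --j)
    if (s[j].kind == Kind::Insn)
      return j;
  return -1;
}

// Model of conditions (a)-(d) for the insn at index I.
bool model_fused_pair_p(const std::vector<ModelInsn>& s, int i,
                        bool reload_completed) {
  assert(i >= 0 && i < static_cast<int>(s.size()));
  const ModelInsn& insn = s[i];
  if (insn.kind != Kind::Insn || !insn.sched_group)  // (c)
    return false;
  int p = prev_real_insn(s, i);
  if (p < 0)
    return false;
  if (!insn.dest_reg || !s[p].dest_reg)              // (a), (b)
    return false;
  // (d) is only checked once register allocation has finished.
  return !reload_completed || *insn.dest_reg == *s[p].dest_reg;
}

// A fused pair separated by a note and a debug insn, with different
// destination register numbers.
std::vector<ModelInsn> example_stream() {
  return {
    {Kind::Insn, false, 5},
    {Kind::Note, false, std::nullopt},
    {Kind::DebugInsn, false, std::nullopt},
    {Kind::Insn, true, 7},
  };
}
```

With example_stream (), the last entry passes the check before RA, since
(a)-(c) hold and the note and debug insn in between are skipped.  It fails
after RA until both destinations carry the same register number.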

During IRA, if conditions (a)-(c) above hold, we need to tie the two
instructions' destination allocnos together so that they are allocated
to the same hard register.  We do this in add_insn_allocno_copies () by
recording a constraint copy between the output operands of the two
instructions.

gcc/ChangeLog:

        * ira-conflicts.cc (add_insn_allocno_copies): Handle fused insn pairs.
        * rtl.h (single_output_fused_pair_p): Declare new function.
        * rtlanal.cc (single_output_fused_pair_p): Define it.

Suggested-by: Jeff Law <j...@ventanamicro.com>
Signed-off-by: Artemiy Volkov <arte...@synopsys.com>
---
 gcc/ira-conflicts.cc | 12 ++++++++++--
 gcc/rtl.h            |  1 +
 gcc/rtlanal.cc       | 20 ++++++++++++++++++++
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/gcc/ira-conflicts.cc b/gcc/ira-conflicts.cc
index d8f7c1e1c37..e9ab16a5d20 100644
--- a/gcc/ira-conflicts.cc
+++ b/gcc/ira-conflicts.cc
@@ -448,7 +448,7 @@ process_reg_shuffles (rtx_insn *insn, rtx reg, int op_num, int freq,
 static void
 add_insn_allocno_copies (rtx_insn *insn)
 {
-  rtx set, operand, dup;
+  rtx set = single_set (insn), operand, dup;
   bool bound_p[MAX_RECOG_OPERANDS];
   int i, n, freq;
   alternative_mask alts;
@@ -456,7 +456,15 @@ add_insn_allocno_copies (rtx_insn *insn)
   freq = REG_FREQ_FROM_BB (BLOCK_FOR_INSN (insn));
   if (freq == 0)
     freq = 1;
-  if ((set = single_set (insn)) != NULL_RTX
+
+  /* Tie output register operands of two consecutive single_sets
+     marked as a fused pair.  */
+  if (single_output_fused_pair_p (insn))
+    process_regs_for_copy (SET_DEST (set),
+                  SET_DEST (single_set (prev_nonnote_nondebug_insn (insn))),
+                  true, NULL, freq);
+
+  if (set != NULL_RTX
       && REG_SUBREG_P (SET_DEST (set)) && REG_SUBREG_P (SET_SRC (set))
       && ! side_effects_p (set)
       && find_reg_note (insn, REG_DEAD,
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 5bd0bd4d168..9684b45f2a5 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -3670,6 +3670,7 @@ extern bool contains_symbol_ref_p (const_rtx);
 extern bool contains_symbolic_reference_p (const_rtx);
 extern bool contains_constant_pool_address_p (const_rtx);
 extern void add_auto_inc_notes (rtx_insn *, rtx);
+extern bool single_output_fused_pair_p (rtx_insn *);
 
 /* Handle the cheap and common cases inline for performance.  */
 
diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
index 87332ffebce..19b66456e45 100644
--- a/gcc/rtlanal.cc
+++ b/gcc/rtlanal.cc
@@ -6976,6 +6976,26 @@ add_auto_inc_notes (rtx_insn *insn, rtx x)
     }
 }
 
+/* Return true if INSN is the second element of a pair of macro-fused
+   single_sets, both of which have the same register output as one another.  */
+bool
+single_output_fused_pair_p (rtx_insn *insn)
+{
+  rtx set, prev_set;
+  rtx_insn *prev;
+
+  return INSN_P (insn)
+        && SCHED_GROUP_P (insn)
+        && (prev = prev_nonnote_nondebug_insn (insn))
+        && (set = single_set (insn)) != NULL_RTX
+        && (prev_set = single_set (prev))
+            != NULL_RTX
+        && REG_P (SET_DEST (set))
+        && REG_P (SET_DEST (prev_set))
+        && (!reload_completed
+            || REGNO (SET_DEST (set)) == REGNO (SET_DEST (prev_set)));
+}
+
 /* Return true if X is register asm.  */
 
 bool
-- 
2.43.0
