On Wed, Feb 04, 2026 at 12:32:11AM +0000, Richard Sandiford wrote:
> Artemiy Volkov <[email protected]> writes:
> > This patch addresses a bug exposed by the dep_fusion pass on aarch64 and
> > reported in PR123786.  Essentially, the generic-armv9-a defines a CMP+CSEL
> > fused instruction pair (in the following example those are insns 73 and
> > 74) and dep_fusion moves 74 right after 73, as in:
> >
> > Before:
> >
> > (note 62 61 66 10 [bb 10] NOTE_INSN_BASIC_BLOCK)
> > (...)
> > (insn 73 71 69 10 (set (reg:CC 66 cc)
> >         (compare:CC (reg:SI 148 [ _4 ])
> >             (const_int 2 [0x2]))) "../test.c":27:33 discrim 2 452 {cmpsi}
> >      (expr_list:REG_DEAD (reg:SI 148 [ _4 ])
> >         (nil)))
> > (jump_insn 69 73 70 10 (set (pc)
> >         (if_then_else (eq (reg:SI 109 [ _19 ])
> >                 (const_int 0 [0]))
> >             (label_ref:DI 114)
> >             (pc))) "../test.c":27:33 7 {*aarch64_cbzeqsi}
> >      (int_list:REG_BR_PROB 536870913 (nil))
> >  -> 114)
> > (note 70 69 74 11 [bb 11] NOTE_INSN_BASIC_BLOCK)
> > (insn 74 70 142 11 (set (reg:SI 110 [ iftmp.3_21 ])
> >         (if_then_else:SI (leu (reg:CC 66 cc)
> >                 (const_int 0 [0]))
> >             (reg:SI 109 [ _19 ])
> >             (const_int 0 [0]))) "../test.c":27:33 discrim 2 504 
> > {*cmovsi_insn}
> >      (expr_list:REG_DEAD (reg:SI 109 [ _19 ])
> >         (expr_list:REG_DEAD (reg:CC 66 cc)
> >             (nil))))
> > (jump_insn 142 74 143 11 (set (pc)
> >         (label_ref 76)) 6 {jump}
> >      (nil)
> >  -> 76)
> >
> > After:
> >
> > (note 62 61 66 10 [bb 10] NOTE_INSN_BASIC_BLOCK)
> > (...)
> > (insn 73 71 74 10 (set (reg:CC 66 cc)
> >         (compare:CC (reg:SI 148 [ _4 ])
> >             (const_int 2 [0x2]))) "../test.c":27:33 discrim 2 452 {cmpsi}
> >      (expr_list:REG_DEAD (reg:SI 148 [ _4 ])
> >         (nil)))
> > (insn/s 74 73 69 10 (set (reg:SI 110 [ iftmp.3_21 ])
> >         (if_then_else:SI (leu (reg:CC 66 cc)
> >                 (const_int 0 [0]))
> >             (reg:SI 109 [ _19 ])
> >             (const_int 0 [0]))) "../test.c":27:33 discrim 2 504 
> > {*cmovsi_insn}
> >      (nil))
> > (jump_insn 69 74 70 10 (set (pc)
> >         (if_then_else (eq (reg:SI 109 [ _19 ])
> >                 (const_int 0 [0]))
> >             (label_ref:DI 114)
> >             (pc))) "../test.c":27:33 7 {*aarch64_cbzeqsi}
> >      (int_list:REG_BR_PROB 536870913 (nil))
> >  -> 114)
> > (note 70 69 142 11 [bb 11] NOTE_INSN_BASIC_BLOCK)
> > (jump_insn 142 70 143 11 (set (pc)
> >         (label_ref 76)) 6 {jump}
> >      (nil)
> >  -> 76)
> >
> > That is, we are moving insn 74 up past a conditional branch (insn 69),
> > resulting in it being executed unconditionally.
> >
> > The root problem is the following: rtl_ssa::restrict_movement_for_defs (),
> > when narrowing the allowed move range of a def within an EBB, uses the
> > last use of the previous def (return by the last_access () function) for
> > the earliest valid destination location (taken in the RTL list sense).
> > For the following CFG (taken from the same function as the RTL fragments
> > above):
> >
> >                     |
> >                     |   live-in
> >                     |    r110
> >                     v
> >                   ebb10
> >             +--------------------+
> >             |    bb10            |
> >             | +------------+     |
> >             | | ...        |     |
> >             | +------------+     |
> >             |   /    \           |
> >             |  /      v          |
> >             | /      bb11        |
> >             |/    +------------+ |
> >             |     | r110 = ... | |
> >      live-out  /|     |     ...        | |
> >     r110  / |     +------------+ |
> >          /  |               /    |
> >         v   +--------------------+
> >        bb12               /
> >       +------------+         /
> >       | r110 = ... |        /
> >       | ...        |       /
> >       +------------+      /
> >             \        /
> >              \      /
> >               v    v
> >                bb13
> >             +------------+
> >             | use r110   |
> >             | ...        |
> >             +------------+
> >
> > restrict_movement_for_defs () will detect that, from the perspective of
> > the r110 def in bb11, the last use of the previous def was somewhere
> > before the start of ebb10 and will therefore allow unrestricted movement
> > of that def to any point within bb10.
> >
> > To address this, during initial traversal of a new EBB, we create an
> > artificial live-out use at the bottom of the current BB when (a) a
> > variable is live-in at the entry to the EBB, (b) it hasn't yet been
> > redefined in the current EBB, and (c) it is redefined in the next BB.
> > This live-out use serves as a barrier that restricts movement of new defs
> > from later BBs having a new value to earlier ones still having the old
> > live-in value.  In the diagram, this live-out use will be created for r110
> > at the end of bb10.
> >
> > Since the def that this newly created live-out use sees has to be
> > dominating (otherwise whenever the last use occurs after the current EBB,
> > the movement will be restricted to an empty range), we create a new def at
> > the beginning of the EBB.  For this, we use the create_reg_use () function
> > that will also create a degenerate PHI for this purpose when needed.
> >
> > One problem that we run into with this approach is that during RTL-SSA
> > construction single-input PHI nodes are sometimes eliminated as
> > unnecessary, which causes the last dominating def to be reverted to an
> > earlier EBB and the move range restriction failing as before.  To remedy
> > this, we mark PHIs that we create as 'persistent' and do not eliminate
> > them where previously they would be replaced by their single input.  (To
> > remain aligned with the design goal of minimizing memory overhead that
> > RTL-SSA incurs, the m_has_been_superceded flag has been repurposed to
> > track this in the phi_info class.)
> >
> > Bootstrapped and regtested on aarch64-linux-gnu; no performance
> > regressions across SPEC2017.  New testcase added as provided by Alex
> > Coplan in the BZ, many thanks for that.
>
> Thanks for looking at this and for the nice explanation.
> I agree that the underlying problem is the lack of a live-out use.
> However, rather than add a separate mechanism for this case, I think we
> should instead force a phi to be created from the outset.  That will
> make sure that EBBs with a single dominating live-in definition
> (as here) are handled similarly to EBBs with potentially multiple
> incoming live-in definitions.
> 
> Also, rather than add a flag to the phi, I think we can test whether
> the next definition is in the same EBB.  This should also ensure that we
> handle EBBs that start out with potentially multiple incoming live-in
> definitions but eventually simplify to a single value.

Hi Richard, apologies for the slow reply.

I obviously agree that your approach is the right one to take.  I can
confirm that your patch fixes the original testcase; what is still missing
for it to pass bootstrap/regtest is handling of chains of degenerate PHIs
which didn't occur before.  Where before your patch phi->input_value (0)
would always point to a "real" definition (either an insn or a
non-degenerate PHI), now a CFG can be conceived where you'd have to
traverse a chain of (non-simplified) degenerate PHIs to get to that
definition.

I've made two main fixes on top of your patch to cater to this new
possibility:  (1) in function_info::live_out_value (), when one degenerate
PHI is feeding into another, we need to use the former as the input for
the latter, as otherwise we'd be setting the latter's def to NULL; (2) in
look_through_degenerate_phi (), we need to walk the PHI chain as explained
before (this also necessitates a small assertion change in
function_info::make_use_available ()).  There will be a performance
penalty associated with this walk in the worst case, but I believe that
this should not be a problem on real-life inputs.

Please let me know if these changes make sense to you and whether I should
post them as a v2.  (Full patch attached at the end of the message.)

> 
> The patch below is a rough sketch -- only tested on the testcase so far.
> I've tried to ensure that the final loop in place_phis still only has one
> AND with the live-in set, due to:
> 
>   // In extreme cases, the number of live-in registers can be much
>   // greater than the number of phi nodes needed in a block (see PR98863).
>   // Try to reduce the number of operations involving live-in sets by using
>   // PENDING as a staging area: registers in PENDING need phi nodes if
>   // they are live on entry to the corresponding block, but do not need
>   // phi nodes otherwise.
> 
> The change to create_ebbs is a bit of a hack.  Normally the block links
> are set up during the dominance walk, but that's too late for the new code.
> Perhaps we could move append_bb earlier, but I've not yet looked at whether
> that would adversely affect something else.

This works without any issues; I even went so far as to move the
append_bb () call from function_info::start_block () to
function_info::create_ebbs (), which also works.  (There are other
simplifications in order in function_info::start_block (), but they are
beyond the scope of this patch.)

Thanks,
Artemiy

diff --git a/gcc/rtl-ssa/access-utils.h b/gcc/rtl-ssa/access-utils.h
index 769446584bc..b938d80e960 100644
--- a/gcc/rtl-ssa/access-utils.h
+++ b/gcc/rtl-ssa/access-utils.h
@@ -220,9 +220,17 @@ template<typename T>
 inline const T *
 look_through_degenerate_phi (const T *access)
 {
-  if (auto *phi = dyn_cast<const phi_info *> (access))
-    if (phi->is_degenerate ())
-      return phi->input_value (0);
+  while (access)
+    {
+      auto *phi = dyn_cast <const phi_info *> (access);
+      if (!phi)
+       return access;
+
+      if (!phi->input_value (0) || !phi->is_degenerate ())
+       return phi;
+
+      access = phi->input_value (0);
+    }
   return access;
 }
 
diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
index bdba5f734d3..d3a21078a7e 100644
--- a/gcc/rtl-ssa/accesses.cc
+++ b/gcc/rtl-ssa/accesses.cc
@@ -1450,7 +1450,8 @@ function_info::make_use_available (use_info *use, bb_info 
*bb,
        {
          // There is an existing phi.
          phi = as_a<phi_info *> (set);
-         gcc_checking_assert (phi->input_value (0) == ultimate_def);
+         gcc_checking_assert (ultimate_def
+               == look_through_degenerate_phi (phi->input_value (0)));
        }
       else
        {
diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
index d445fcc2cf2..b4cc5ac1f90 100644
--- a/gcc/rtl-ssa/blocks.cc
+++ b/gcc/rtl-ssa/blocks.cc
@@ -349,6 +349,12 @@ function_info::live_out_value (bb_info *bb, set_info *set)
       {
        set = phi->input_value (0);
 
+       // If the only input is NULL (which means that the SSA is still
+       // being constructed), return the PHI itself in order not to lose
+       // the use-def chain.
+       if (!set)
+         return phi;
+
        // Remove the phi if it turned out to be useless.  This is
        // mainly useful for memory, because we don't know ahead of time
        // whether a block will use memory or not.
@@ -454,7 +460,9 @@ function_info::replace_phi (phi_info *phi, set_info 
*new_value)
 
   if (new_value)
     for (use_info *use : phi->nondebug_insn_uses ())
-      if (!use->is_live_out_use ())
+      if (!use->is_live_out_use ()
+         || (phi->next_def ()
+             && phi->next_def ()->ebb () == phi->ebb ()))
        {
          // We need to keep the phi around for its local uses.
          // Turn it into a degenerate phi, if it isn't already.
@@ -731,12 +739,34 @@ function_info::place_phis (build_info &bi)
   basic_block cfg_bb;
   FOR_ALL_BB_FN (cfg_bb, m_fn)
     {
-      // Calculate the set of phi nodes for blocks that don't have any
-      // dominance frontiers.  We only need to do this once per block.
       unsigned int i = cfg_bb->index;
       bb_phi_info &phis = bi.bb_phis[i];
+
+      // The loop above propagated unfiltered[i] to phis[i].regs for
+      // blocks that have dominance frontiers.  Anything we do below
+      // is in addition to that, so we can start from a fresh set.
+      if (!bitmap_empty_p (&frontiers[i]))
+       bitmap_clear (&unfiltered[i]);
+
+      // If a live input is redefined by later blocks in the same EBB,
+      // we need to add live-out uses to prevent the later definition
+      // being moved into earlier blocks.  Force phis for that purpose,
+      // if they're not naturally needed.
+      auto *bb = m_bbs[i];
+      if (bb == bb->ebb ()->first_bb ())
+       for (auto other_bb : bb->ebb ()->bbs ())
+        if (other_bb != bb)
+          bitmap_ior_into (&unfiltered[i],
+                           &DF_LR_BB_INFO (other_bb->cfg_bb ())->def);
+
+      // Calculate the set of phi nodes for blocks that don't have any
+      // dominance frontiers.  Add any extra phis that are needed for
+      // blocks that do have dominance frontiers.  We only need to do
+      // this once per block.
       if (bitmap_empty_p (&frontiers[i]))
        bitmap_and (&phis.regs, &unfiltered[i], DF_LR_IN (cfg_bb));
+      else if (!bitmap_empty_p (&unfiltered[i]))
+       bitmap_ior_and_into (&phis.regs, &unfiltered[i], DF_LR_IN (cfg_bb));
 
       // Create an array that contains all phi inputs for this block.
       // See the comment above the member variables for more information.
@@ -1079,9 +1109,6 @@ function_info::start_block (build_info &bi, bb_info *bb)
   // Record the start of this block's definitions in the definitions stack.
   bi.old_def_stack_limit.safe_push (bi.def_stack.length ());
 
-  // Add the block itself.
-  append_bb (bb);
-
   // If the block starts an EBB, create the phi insn.  This insn should exist
   // for all EBBs, even if they don't (yet) need phis.
   if (bb == ebb->first_bb ())
@@ -1299,7 +1326,10 @@ function_info::create_ebbs (build_info &bi)
        // Create the EBB itself.
        auto *ebb = allocate<ebb_info> (bbs[0], bbs.last ());
        for (bb_info *bb : bbs)
-         bb->set_ebb (ebb);
+         {
+           bb->set_ebb (ebb);
+           append_bb (bb);
+         }
        bbs.truncate (0);
       }
 
> 
> (BTW, bb->cfg_bb ()->next_bb is not necessarily the next block in the EBB;
> see choose_next_block_in_ebb for details.)
> 
> Thanks,
> Richard
> 
> diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
> index d445fcc2cf2..af7fb435ace 100644
> --- a/gcc/rtl-ssa/blocks.cc
> +++ b/gcc/rtl-ssa/blocks.cc
> @@ -454,7 +454,9 @@ function_info::replace_phi (phi_info *phi, set_info 
> *new_value)
>  
>    if (new_value)
>      for (use_info *use : phi->nondebug_insn_uses ())
> -      if (!use->is_live_out_use ())
> +      // Live-out uses act as a movement barrier for later definitions
> +      // in the same EBB.
> +      if (!use->is_live_out_use () || phi->next_def ()->ebb () == phi->ebb 
> ())
>       {
>         // We need to keep the phi around for its local uses.
>         // Turn it into a degenerate phi, if it isn't already.
> @@ -731,12 +733,34 @@ function_info::place_phis (build_info &bi)
>    basic_block cfg_bb;
>    FOR_ALL_BB_FN (cfg_bb, m_fn)
>      {
> -      // Calculate the set of phi nodes for blocks that don't have any
> -      // dominance frontiers.  We only need to do this once per block.
>        unsigned int i = cfg_bb->index;
>        bb_phi_info &phis = bi.bb_phis[i];
> +
> +      // The loop above propagated unfiltered[i] to phis[i].regs for
> +      // blocks that have dominance frontiers.  Anything we do below
> +      // is in addition to that, so we can start from a fresh set.
> +      if (!bitmap_empty_p (&frontiers[i]))
> +     bitmap_clear (&unfiltered[i]);
> +
> +      // If a live input is redefined by later blocks in the same EBB,
> +      // we need to add live-out uses to prevent the later definition
> +      // being moved into earlier blocks.  Force phis for that purpose,
> +      // if they're not naturally needed.
> +      auto *bb = m_bbs[i];
> +      if (bb == bb->ebb ()->first_bb ())
> +     for (auto other_bb : bb->ebb ()->bbs ())
> +       if (other_bb != bb)
> +         bitmap_ior_into (&unfiltered[i],
> +                          &DF_LR_BB_INFO (other_bb->cfg_bb ())->def);
> +
> +      // Calculate the set of phi nodes for blocks that don't have any
> +      // dominance frontiers.  Add any extra phis that are needed for
> +      // blocks that do have dominance frontiers.  We only need to do
> +      // this once per block.
>        if (bitmap_empty_p (&frontiers[i]))
>       bitmap_and (&phis.regs, &unfiltered[i], DF_LR_IN (cfg_bb));
> +      else if (!bitmap_empty_p (&unfiltered[i]))
> +     bitmap_ior_and_into (&phis.regs, &unfiltered[i], DF_LR_IN (cfg_bb));
>  
>        // Create an array that contains all phi inputs for this block.
>        // See the comment above the member variables for more information.
> @@ -1298,8 +1322,17 @@ function_info::create_ebbs (build_info &bi)
>  
>       // Create the EBB itself.
>       auto *ebb = allocate<ebb_info> (bbs[0], bbs.last ());
> +     bb_info *prev_bb = nullptr;
>       for (bb_info *bb : bbs)
> -       bb->set_ebb (ebb);
> +       {
> +         bb->set_ebb (ebb);
> +         if (prev_bb)
> +           {
> +             bb->set_prev_bb (prev_bb);
> +             prev_bb->set_next_bb (bb);
> +           }
> +         prev_bb = bb;
> +       }
>       bbs.truncate (0);
>        }
>  

Reply via email to