Artemiy Volkov <[email protected]> writes:
> This patch addresses a bug exposed by the dep_fusion pass on aarch64 and
> reported in PR123786.  Essentially, the generic-armv9-a tuning defines a
> CMP+CSEL fused instruction pair (in the following example those are insns
> 73 and 74) and dep_fusion moves 74 right after 73, as shown below:
>
> Before:
>
> (note 62 61 66 10 [bb 10] NOTE_INSN_BASIC_BLOCK)
> (...)
> (insn 73 71 69 10 (set (reg:CC 66 cc)
>         (compare:CC (reg:SI 148 [ _4 ])
>             (const_int 2 [0x2]))) "../test.c":27:33 discrim 2 452 {cmpsi}
>      (expr_list:REG_DEAD (reg:SI 148 [ _4 ])
>         (nil)))
> (jump_insn 69 73 70 10 (set (pc)
>         (if_then_else (eq (reg:SI 109 [ _19 ])
>                 (const_int 0 [0]))
>             (label_ref:DI 114)
>             (pc))) "../test.c":27:33 7 {*aarch64_cbzeqsi}
>      (int_list:REG_BR_PROB 536870913 (nil))
>  -> 114)
> (note 70 69 74 11 [bb 11] NOTE_INSN_BASIC_BLOCK)
> (insn 74 70 142 11 (set (reg:SI 110 [ iftmp.3_21 ])
>         (if_then_else:SI (leu (reg:CC 66 cc)
>                 (const_int 0 [0]))
>             (reg:SI 109 [ _19 ])
>             (const_int 0 [0]))) "../test.c":27:33 discrim 2 504 {*cmovsi_insn}
>      (expr_list:REG_DEAD (reg:SI 109 [ _19 ])
>         (expr_list:REG_DEAD (reg:CC 66 cc)
>             (nil))))
> (jump_insn 142 74 143 11 (set (pc)
>         (label_ref 76)) 6 {jump}
>      (nil)
>  -> 76)
>
> After:
>
> (note 62 61 66 10 [bb 10] NOTE_INSN_BASIC_BLOCK)
> (...)
> (insn 73 71 74 10 (set (reg:CC 66 cc)
>         (compare:CC (reg:SI 148 [ _4 ])
>             (const_int 2 [0x2]))) "../test.c":27:33 discrim 2 452 {cmpsi}
>      (expr_list:REG_DEAD (reg:SI 148 [ _4 ])
>         (nil)))
> (insn/s 74 73 69 10 (set (reg:SI 110 [ iftmp.3_21 ])
>         (if_then_else:SI (leu (reg:CC 66 cc)
>                 (const_int 0 [0]))
>             (reg:SI 109 [ _19 ])
>             (const_int 0 [0]))) "../test.c":27:33 discrim 2 504 {*cmovsi_insn}
>      (nil))
> (jump_insn 69 74 70 10 (set (pc)
>         (if_then_else (eq (reg:SI 109 [ _19 ])
>                 (const_int 0 [0]))
>             (label_ref:DI 114)
>             (pc))) "../test.c":27:33 7 {*aarch64_cbzeqsi}
>      (int_list:REG_BR_PROB 536870913 (nil))
>  -> 114)
> (note 70 69 142 11 [bb 11] NOTE_INSN_BASIC_BLOCK)
> (jump_insn 142 70 143 11 (set (pc)
>         (label_ref 76)) 6 {jump}
>      (nil)
>  -> 76)
>
> That is, we are moving insn 74 up past a conditional branch (insn 69),
> resulting in it being executed unconditionally.
>
> The root problem is the following: rtl_ssa::restrict_movement_for_defs (),
> when narrowing the allowed move range of a def within an EBB, uses the
> last use of the previous def (returned by the last_access () function) as
> the earliest valid destination location (taken in the RTL list sense).
> For the following CFG (taken from the same function as the RTL fragments
> above):
>
>                        |
>                        |   live-in
>                        |    r110
>                        v
>                      ebb10
>                +--------------------+
>                |    bb10            |
>                | +------------+     |
>                | | ...        |     |
>                | +------------+     |
>                |   /    \           |
>                |  /      v          |
>                | /      bb11        |
>                |/    +------------+ |
>                |     | r110 = ... | |
>   live-out     /|     | ...        | |
>    r110       / |     +------------+ |
>              /  |           /        |
>             v   +--------------------+
>             bb12           /
>         +------------+    /
>         | r110 = ... |   /
>         | ...        |  /
>         +------------+ /
>                \      /
>                 \    /
>                  v  v
>                  bb13
>               +------------+
>               | use r110   |
>               | ...        |
>               +------------+
>
> restrict_movement_for_defs () will detect that, from the perspective of
> the r110 def in bb11, the last use of the previous def was somewhere
> before the start of ebb10 and will therefore allow unrestricted movement
> of that def to any point within bb10.
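>
> In pseudo-code (with made-up helper names -- this is just the shape of
> the logic, not the actual rtl-ssa interface), the narrowing step behaves
> roughly like:
>
>   struct insn { int position; };  /* position in the RTL insn list */
>
>   /* Earliest position a def may be moved to within its EBB, given the
>      last use of the previous def of the same register (or null).  */
>   int
>   earliest_allowed_position (int ebb_start, const insn *last_use_of_prev)
>   {
>     if (!last_use_of_prev || last_use_of_prev->position < ebb_start)
>       /* The old value was last used before this EBB, so nothing here
>          restricts upward movement -- which is what lets insn 74 cross
>          the branch in insn 69.  */
>       return ebb_start;
>     return last_use_of_prev->position + 1;
>   }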
>
> To address this, during initial traversal of a new EBB, we create an
> artificial live-out use at the bottom of the current BB when (a) a
> variable is live-in at the entry to the EBB, (b) it hasn't yet been
> redefined in the current EBB, and (c) it is redefined in the next BB.
> This live-out use serves as a barrier: it prevents defs from later BBs,
> which carry the new value, from being moved into earlier BBs that still
> hold the old live-in value.  In the diagram, this live-out use will be
> created for r110 at the end of bb10.
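>
> Schematically (illustrative code only, not the actual patch; std::set is
> a stand-in for the liveness/def bitmaps), the per-register check while
> walking a BB of the EBB amounts to:
>
>   #include <set>
>
>   /* Should an artificial live-out use of REGNO be added at the end of
>      the current BB?  The three sets correspond to conditions (a)-(c).  */
>   bool
>   needs_live_out_barrier (unsigned regno,
>                           const std::set<unsigned> &ebb_live_in,
>                           const std::set<unsigned> &defined_in_ebb,
>                           const std::set<unsigned> &next_bb_defs)
>   {
>     return ebb_live_in.count (regno)         /* (a) live-in to the EBB */
>            && !defined_in_ebb.count (regno)  /* (b) not yet redefined */
>            && next_bb_defs.count (regno);    /* (c) redefined in next BB */
>   }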
>
> Since the def that this newly created live-out use sees has to be
> dominating (otherwise whenever the last use occurs after the current EBB,
> the movement will be restricted to an empty range), we create a new def at
> the beginning of the EBB.  For this, we use the create_reg_use () function
> that will also create a degenerate PHI for this purpose when needed.
>
> One problem that we run into with this approach is that during RTL-SSA
> construction single-input PHI nodes are sometimes eliminated as
> unnecessary, which causes the last dominating def to revert to an
> earlier EBB and the move range restriction to fail as before.  To remedy
> this, we mark PHIs that we create as 'persistent' and do not eliminate
> them where previously they would be replaced by their single input.  (To
> remain aligned with the design goal of minimizing memory overhead that
> RTL-SSA incurs, the m_has_been_superceded flag has been repurposed to
> track this in the phi_info class.)
>
> Bootstrapped and regtested on aarch64-linux-gnu; no performance
> regressions across SPEC2017.  New testcase added as provided by Alex
> Coplan in the BZ, many thanks for that.

Thanks for looking at this and for the nice explanation.
I agree that the underlying problem is the lack of a live-out use.
However, rather than add a separate mechanism for this case, I think we
should instead force a phi to be created from the outset.  That will
make sure that EBBs with a single dominating live-in definition
(as here) are handled similarly to EBBs with potentially multiple
incoming live-in definitions.

Also, rather than add a flag to the phi, I think we can test whether
the next definition is in the same EBB.  This should also ensure that we
handle EBBs that start out with potentially multiple incoming live-in
definitions but eventually simplify to a single value.

The patch below is a rough sketch -- only tested on the testcase so far.
I've tried to ensure that the final loop in place_phis still only has one
AND with the live-in set, due to:

  // In extreme cases, the number of live-in registers can be much
  // greater than the number of phi nodes needed in a block (see PR98863).
  // Try to reduce the number of operations involving live-in sets by using
  // PENDING as a staging area: registers in PENDING need phi nodes if
  // they are live on entry to the corresponding block, but do not need
  // phi nodes otherwise.
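
For reference (this is from memory of bitmap.h, so worth double-checking),
bitmap_ior_and_into (A, B, C) computes A |= (B & C) in one combined
operation, which is what keeps the new else branch down to a single AND
with the live-in set.  A standalone stand-in using std::set in place of
GCC's bitmaps:

  #include <set>

  /* Models dst |= (b & c), i.e. what the else branch does with phis.regs,
     unfiltered[i] and the block's live-in set.  */
  void
  ior_and_into (std::set<int> &dst, const std::set<int> &b,
                const std::set<int> &c)
  {
    for (int reg : b)
      if (c.count (reg))    /* membership in the live-in set (the AND) */
        dst.insert (reg);   /* IOR the intersection into the phi set */
  }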

The change to create_ebbs is a bit of a hack.  Normally the block links
are set up during the dominance walk, but that's too late for the new code.
Perhaps we could move append_bb earlier, but I've not yet looked at whether
that would adversely affect something else.

(BTW, bb->cfg_bb ()->next_bb is not necessarily the next block in the EBB;
see choose_next_block_in_ebb for details.)

Thanks,
Richard

diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
index d445fcc2cf2..af7fb435ace 100644
--- a/gcc/rtl-ssa/blocks.cc
+++ b/gcc/rtl-ssa/blocks.cc
@@ -454,7 +454,9 @@ function_info::replace_phi (phi_info *phi, set_info *new_value)
 
   if (new_value)
     for (use_info *use : phi->nondebug_insn_uses ())
-      if (!use->is_live_out_use ())
+      // Live-out uses act as a movement barrier for later definitions
+      // in the same EBB.
+      if (!use->is_live_out_use () || phi->next_def ()->ebb () == phi->ebb ())
        {
          // We need to keep the phi around for its local uses.
          // Turn it into a degenerate phi, if it isn't already.
@@ -731,12 +733,34 @@ function_info::place_phis (build_info &bi)
   basic_block cfg_bb;
   FOR_ALL_BB_FN (cfg_bb, m_fn)
     {
-      // Calculate the set of phi nodes for blocks that don't have any
-      // dominance frontiers.  We only need to do this once per block.
       unsigned int i = cfg_bb->index;
       bb_phi_info &phis = bi.bb_phis[i];
+
+      // The loop above propagated unfiltered[i] to phis[i].regs for
+      // blocks that have dominance frontiers.  Anything we do below
+      // is in addition to that, so we can start from a fresh set.
+      if (!bitmap_empty_p (&frontiers[i]))
+       bitmap_clear (&unfiltered[i]);
+
+      // If a live input is redefined by later blocks in the same EBB,
+      // we need to add live-out uses to prevent the later definition
+      // being moved into earlier blocks.  Force phis for that purpose,
+      // if they're not naturally needed.
+      auto *bb = m_bbs[i];
+      if (bb == bb->ebb ()->first_bb ())
+       for (auto other_bb : bb->ebb ()->bbs ())
+         if (other_bb != bb)
+           bitmap_ior_into (&unfiltered[i],
+                            &DF_LR_BB_INFO (other_bb->cfg_bb ())->def);
+
+      // Calculate the set of phi nodes for blocks that don't have any
+      // dominance frontiers.  Add any extra phis that are needed for
+      // blocks that do have dominance frontiers.  We only need to do
+      // this once per block.
       if (bitmap_empty_p (&frontiers[i]))
        bitmap_and (&phis.regs, &unfiltered[i], DF_LR_IN (cfg_bb));
+      else if (!bitmap_empty_p (&unfiltered[i]))
+       bitmap_ior_and_into (&phis.regs, &unfiltered[i], DF_LR_IN (cfg_bb));
 
       // Create an array that contains all phi inputs for this block.
       // See the comment above the member variables for more information.
@@ -1298,8 +1322,17 @@ function_info::create_ebbs (build_info &bi)
 
        // Create the EBB itself.
        auto *ebb = allocate<ebb_info> (bbs[0], bbs.last ());
+       bb_info *prev_bb = nullptr;
        for (bb_info *bb : bbs)
-         bb->set_ebb (ebb);
+         {
+           bb->set_ebb (ebb);
+           if (prev_bb)
+             {
+               bb->set_prev_bb (prev_bb);
+               prev_bb->set_next_bb (bb);
+             }
+           prev_bb = bb;
+         }
        bbs.truncate (0);
       }
 
