Artemiy Volkov <[email protected]> writes:
> This patch addresses a bug exposed by the dep_fusion pass on aarch64 and
> reported in PR123786.  Essentially, the generic-armv9-a tuning defines a
> CMP+CSEL fused instruction pair (in the following example, those are
> insns 73 and 74), and dep_fusion moves 74 to immediately after 73:
>
> Before:
>
> (note 62 61 66 10 [bb 10] NOTE_INSN_BASIC_BLOCK)
> (...)
> (insn 73 71 69 10 (set (reg:CC 66 cc)
> (compare:CC (reg:SI 148 [ _4 ])
> (const_int 2 [0x2]))) "../test.c":27:33 discrim 2 452 {cmpsi}
> (expr_list:REG_DEAD (reg:SI 148 [ _4 ])
> (nil)))
> (jump_insn 69 73 70 10 (set (pc)
> (if_then_else (eq (reg:SI 109 [ _19 ])
> (const_int 0 [0]))
> (label_ref:DI 114)
> (pc))) "../test.c":27:33 7 {*aarch64_cbzeqsi}
> (int_list:REG_BR_PROB 536870913 (nil))
> -> 114)
> (note 70 69 74 11 [bb 11] NOTE_INSN_BASIC_BLOCK)
> (insn 74 70 142 11 (set (reg:SI 110 [ iftmp.3_21 ])
> (if_then_else:SI (leu (reg:CC 66 cc)
> (const_int 0 [0]))
> (reg:SI 109 [ _19 ])
> (const_int 0 [0]))) "../test.c":27:33 discrim 2 504 {*cmovsi_insn}
> (expr_list:REG_DEAD (reg:SI 109 [ _19 ])
> (expr_list:REG_DEAD (reg:CC 66 cc)
> (nil))))
> (jump_insn 142 74 143 11 (set (pc)
> (label_ref 76)) 6 {jump}
> (nil)
> -> 76)
>
> After:
>
> (note 62 61 66 10 [bb 10] NOTE_INSN_BASIC_BLOCK)
> (...)
> (insn 73 71 74 10 (set (reg:CC 66 cc)
> (compare:CC (reg:SI 148 [ _4 ])
> (const_int 2 [0x2]))) "../test.c":27:33 discrim 2 452 {cmpsi}
> (expr_list:REG_DEAD (reg:SI 148 [ _4 ])
> (nil)))
> (insn/s 74 73 69 10 (set (reg:SI 110 [ iftmp.3_21 ])
> (if_then_else:SI (leu (reg:CC 66 cc)
> (const_int 0 [0]))
> (reg:SI 109 [ _19 ])
> (const_int 0 [0]))) "../test.c":27:33 discrim 2 504 {*cmovsi_insn}
> (nil))
> (jump_insn 69 74 70 10 (set (pc)
> (if_then_else (eq (reg:SI 109 [ _19 ])
> (const_int 0 [0]))
> (label_ref:DI 114)
> (pc))) "../test.c":27:33 7 {*aarch64_cbzeqsi}
> (int_list:REG_BR_PROB 536870913 (nil))
> -> 114)
> (note 70 69 142 11 [bb 11] NOTE_INSN_BASIC_BLOCK)
> (jump_insn 142 70 143 11 (set (pc)
> (label_ref 76)) 6 {jump}
> (nil)
> -> 76)
>
> That is, we are moving insn 74 up past a conditional branch (insn 69),
> resulting in it being executed unconditionally.
>
> The root problem is the following: rtl_ssa::restrict_movement_for_defs (),
> when narrowing the allowed move range of a def within an EBB, uses the
> last use of the previous def (returned by the last_access () function) as
> the earliest valid destination location (in the RTL instruction-list
> sense).
> For the following CFG (taken from the same function as the RTL fragments
> above):
>
> |
> | live-in
> | r110
> v
> ebb10
> +--------------------+
> | bb10 |
> | +------------+ |
> | | ... | |
> | +------------+ |
> | / \ |
> | / v |
> | / bb11 |
> |/ +------------+ |
> | | r110 = ... | |
> live-out /| | ... | |
> r110 / | +------------+ |
> / | / |
> v +--------------------+
> bb12 /
> +------------+ /
> | r110 = ... | /
> | ... | /
> +------------+ /
> \ /
> \ /
> v v
> bb13
> +------------+
> | use r110 |
> | ... |
> +------------+
>
> restrict_movement_for_defs () will detect that, from the perspective of
> the r110 def in bb11, the last use of the previous def was somewhere
> before the start of ebb10 and will therefore allow unrestricted movement
> of that def to any point within bb10.
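>
> Schematically, the narrowing step behaves like the following (a
> paraphrase with hypothetical helper names, not the actual rtl-ssa
> code):
>
>   /* Paraphrase only; last_use_of () and require_after () stand in
>      for the real move-range machinery.  */
>   if (def_info *prev = def->prev_def ())
>     {
>       /* Lower bound: the last use of the previous definition, as
>          found via last_access ().  */
>       insn_info *last = last_use_of (prev);
>       /* For the r110 def in bb11, LAST lies before ebb10 entirely,
>          so no lower bound applies inside the EBB and the def is free
>          to float up into bb10, past the conditional branch.  */
>       require_after (move_range, last);
>     }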
>
> To address this, during the initial traversal of a new EBB, we create
> an artificial live-out use at the bottom of the current BB when (a) a
> variable is live-in at the entry to the EBB, (b) it hasn't yet been
> redefined in the current EBB, and (c) it is redefined in the next BB.
> This live-out use serves as a barrier that prevents defs in later BBs
> (which produce the new value) from being moved into earlier BBs where
> the old live-in value still holds.  In the diagram, this live-out use
> will be created for r110 at the end of bb10.
>
> Since the def that this newly created live-out use refers to has to be
> dominating (otherwise, whenever the last use occurs after the current
> EBB, the movement would be restricted to an empty range), we create a
> new def at the beginning of the EBB.  For this, we use the
> create_reg_use () function, which will also create a degenerate PHI
> when needed.
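>
> In pseudo-code, the check at the end of each BB looks roughly like
> this (ebb_live_in_regs (), redefined_so_far_p () and live_in_value ()
> are hypothetical helpers; add_live_out_use () is the existing rtl-ssa
> routine):
>
>   /* Rough sketch of the proposed barrier; helper names are
>      hypothetical.  */
>   for (unsigned int regno : ebb_live_in_regs (ebb))    // (a)
>     if (!redefined_so_far_p (ebb, bb, regno)           // (b)
>         && bitmap_bit_p (&DF_LR_BB_INFO (next_bb->cfg_bb ())->def,
>                          regno))                       // (c)
>       /* The use sees the dominating def created at the EBB entry
>          (a degenerate PHI when necessary).  */
>       add_live_out_use (bb, live_in_value (ebb, regno));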
>
> One problem that we run into with this approach is that, during RTL-SSA
> construction, single-input PHI nodes are sometimes eliminated as
> unnecessary, which reverts the last dominating def to one in an earlier
> EBB and makes the move range restriction fail as before.  To remedy
> this, we mark the PHIs that we create as 'persistent' and do not
> eliminate them where previously they would have been replaced by their
> single input.  (To remain aligned with the design goal of minimizing the
> memory overhead that RTL-SSA incurs, the m_has_been_superceded flag has
> been repurposed to track this in the phi_info class.)
>
> Bootstrapped and regtested on aarch64-linux-gnu; no performance
> regressions across SPEC2017.  A new testcase has been added, as provided
> by Alex Coplan in the BZ; many thanks for that.

Thanks for looking at this and for the nice explanation.

I agree that the underlying problem is the lack of a live-out use.
However, rather than add a separate mechanism for this case, I think we
should instead force a phi to be created from the outset. That will
make sure that EBBs with a single dominating live-in definition
(as here) are handled similarly to EBBs with potentially multiple
incoming live-in definitions.

Also, rather than add a flag to the phi, I think we can test whether
the next definition is in the same EBB. This should also ensure that we
handle EBBs that start out with potentially multiple incoming live-in
definitions but eventually simplify to a single value.

The patch below is a rough sketch -- only tested on the testcase so far.
I've tried to ensure that the final loop in place_phis still only has one
AND with the live-in set, due to:

// In extreme cases, the number of live-in registers can be much
// greater than the number of phi nodes needed in a block (see PR98863).
// Try to reduce the number of operations involving live-in sets by using
// PENDING as a staging area: registers in PENDING need phi nodes if
// they are live on entry to the corresponding block, but do not need
// phi nodes otherwise.
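
For the record, bitmap_ior_and_into (DST, B, C) computes DST |= (B & C)
in a single pass, which is what preserves that property.  Below is a
self-contained illustration of the same set algebra, using std::bitset
purely for exposition (not GCC's bitmap API):

  #include <bitset>
  #include <cassert>

  int main ()
  {
    // Model phis.regs, unfiltered[i] and DF_LR_IN (bb) as bitsets.
    std::bitset<8> phis ("00000001");
    std::bitset<8> unfiltered ("00001110");
    std::bitset<8> live_in ("00000110");
    // bitmap_ior_and_into (&phis.regs, &unfiltered[i], DF_LR_IN (bb)):
    phis |= unfiltered & live_in;   // still only one AND with live-in
    assert (phis == std::bitset<8> ("00000111"));
    return 0;
  }
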
The change to create_ebbs is a bit of a hack. Normally the block links
are set up during the dominance walk, but that's too late for the new code.
Perhaps we could move append_bb earlier, but I've not yet looked at whether
that would adversely affect something else.

(BTW, bb->cfg_bb ()->next_bb is not necessarily the next block in the EBB;
see choose_next_block_in_ebb for details.)

Thanks,
Richard

diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
index d445fcc2cf2..af7fb435ace 100644
--- a/gcc/rtl-ssa/blocks.cc
+++ b/gcc/rtl-ssa/blocks.cc
@@ -454,7 +454,9 @@ function_info::replace_phi (phi_info *phi, set_info *new_value)
if (new_value)
for (use_info *use : phi->nondebug_insn_uses ())
- if (!use->is_live_out_use ())
+ // Live-out uses act as a movement barrier for later definitions
+ // in the same EBB.
+ if (!use->is_live_out_use () || phi->next_def ()->ebb () == phi->ebb ())
{
// We need to keep the phi around for its local uses.
// Turn it into a degenerate phi, if it isn't already.
@@ -731,12 +733,34 @@ function_info::place_phis (build_info &bi)
basic_block cfg_bb;
FOR_ALL_BB_FN (cfg_bb, m_fn)
{
- // Calculate the set of phi nodes for blocks that don't have any
- // dominance frontiers. We only need to do this once per block.
unsigned int i = cfg_bb->index;
bb_phi_info &phis = bi.bb_phis[i];
+
+ // The loop above propagated unfiltered[i] to phis[i].regs for
+ // blocks that have dominance frontiers. Anything we do below
+ // is in addition to that, so we can start from a fresh set.
+ if (!bitmap_empty_p (&frontiers[i]))
+ bitmap_clear (&unfiltered[i]);
+
+ // If a live input is redefined by later blocks in the same EBB,
+ // we need to add live-out uses to prevent the later definition
+ // being moved into earlier blocks. Force phis for that purpose,
+ // if they're not naturally needed.
+ auto *bb = m_bbs[i];
+ if (bb == bb->ebb ()->first_bb ())
+ for (auto other_bb : bb->ebb ()->bbs ())
+ if (other_bb != bb)
+ bitmap_ior_into (&unfiltered[i],
+ &DF_LR_BB_INFO (other_bb->cfg_bb ())->def);
+
+ // Calculate the set of phi nodes for blocks that don't have any
+ // dominance frontiers. Add any extra phis that are needed for
+ // blocks that do have dominance frontiers. We only need to do
+ // this once per block.
if (bitmap_empty_p (&frontiers[i]))
bitmap_and (&phis.regs, &unfiltered[i], DF_LR_IN (cfg_bb));
+ else if (!bitmap_empty_p (&unfiltered[i]))
+ bitmap_ior_and_into (&phis.regs, &unfiltered[i], DF_LR_IN (cfg_bb));
// Create an array that contains all phi inputs for this block.
// See the comment above the member variables for more information.
@@ -1298,8 +1322,17 @@ function_info::create_ebbs (build_info &bi)
// Create the EBB itself.
auto *ebb = allocate<ebb_info> (bbs[0], bbs.last ());
+ bb_info *prev_bb = nullptr;
for (bb_info *bb : bbs)
- bb->set_ebb (ebb);
+ {
+ bb->set_ebb (ebb);
+ if (prev_bb)
+ {
+ bb->set_prev_bb (prev_bb);
+ prev_bb->set_next_bb (bb);
+ }
+ prev_bb = bb;
+ }
bbs.truncate (0);
}