This patch addresses a bug exposed by the dep_fusion pass on aarch64 and
reported in PR123786.  Essentially, the generic-armv9-a defines a CMP+CSEL
fused instruction pair (in the following example those are insns 73 and
74) and dep_fusion moves 74 right after 73, as in:

Before:

(note 62 61 66 10 [bb 10] NOTE_INSN_BASIC_BLOCK)
(...)
(insn 73 71 69 10 (set (reg:CC 66 cc)
        (compare:CC (reg:SI 148 [ _4 ])
            (const_int 2 [0x2]))) "../test.c":27:33 discrim 2 452 {cmpsi}
     (expr_list:REG_DEAD (reg:SI 148 [ _4 ])
        (nil)))
(jump_insn 69 73 70 10 (set (pc)
        (if_then_else (eq (reg:SI 109 [ _19 ])
                (const_int 0 [0]))
            (label_ref:DI 114)
            (pc))) "../test.c":27:33 7 {*aarch64_cbzeqsi}
     (int_list:REG_BR_PROB 536870913 (nil))
 -> 114)
(note 70 69 74 11 [bb 11] NOTE_INSN_BASIC_BLOCK)
(insn 74 70 142 11 (set (reg:SI 110 [ iftmp.3_21 ])
        (if_then_else:SI (leu (reg:CC 66 cc)
                (const_int 0 [0]))
            (reg:SI 109 [ _19 ])
            (const_int 0 [0]))) "../test.c":27:33 discrim 2 504 {*cmovsi_insn}
     (expr_list:REG_DEAD (reg:SI 109 [ _19 ])
        (expr_list:REG_DEAD (reg:CC 66 cc)
            (nil))))
(jump_insn 142 74 143 11 (set (pc)
        (label_ref 76)) 6 {jump}
     (nil)
 -> 76)

After:

(note 62 61 66 10 [bb 10] NOTE_INSN_BASIC_BLOCK)
(...)
(insn 73 71 74 10 (set (reg:CC 66 cc)
        (compare:CC (reg:SI 148 [ _4 ])
            (const_int 2 [0x2]))) "../test.c":27:33 discrim 2 452 {cmpsi}
     (expr_list:REG_DEAD (reg:SI 148 [ _4 ])
        (nil)))
(insn/s 74 73 69 10 (set (reg:SI 110 [ iftmp.3_21 ])
        (if_then_else:SI (leu (reg:CC 66 cc)
                (const_int 0 [0]))
            (reg:SI 109 [ _19 ])
            (const_int 0 [0]))) "../test.c":27:33 discrim 2 504 {*cmovsi_insn}
     (nil))
(jump_insn 69 74 70 10 (set (pc)
        (if_then_else (eq (reg:SI 109 [ _19 ])
                (const_int 0 [0]))
            (label_ref:DI 114)
            (pc))) "../test.c":27:33 7 {*aarch64_cbzeqsi}
     (int_list:REG_BR_PROB 536870913 (nil))
 -> 114)
(note 70 69 142 11 [bb 11] NOTE_INSN_BASIC_BLOCK)
(jump_insn 142 70 143 11 (set (pc)
        (label_ref 76)) 6 {jump}
     (nil)
 -> 76)

That is, we are moving insn 74 up past a conditional branch (insn 69),
resulting in it being executed unconditionally.

The root problem is the following: rtl_ssa::restrict_movement_for_defs (),
when narrowing the allowed move range of a def within an EBB, uses the
last use of the previous def (return by the last_access () function) for
the earliest valid destination location (taken in the RTL list sense).
For the following CFG (taken from the same function as the RTL fragments
above):

                        |
                        |   live-in
                        |    r110
                        v
                      ebb10
                +--------------------+
                |    bb10            |
                | +------------+     |
                | | ...        |     |
                | +------------+     |
                |   /    \           |
                |  /      v          |
                | /      bb11        |
                |/    +------------+ |
                |     | r110 = ... | |
     live-out  /|     | ...        | |
        r110  / |     +------------+ |
             /  |               /    |
            v   +--------------------+
           bb12               /
      +------------+         /
      | r110 = ... |        /
      | ...        |       /
      +------------+      /
                \        /
                 \      /
                  v    v
                   bb13
                +------------+
                | use r110   |
                | ...        |
                +------------+

restrict_movement_for_defs () will detect that, from the perspective of
the r110 def in bb11, the last use of the previous def was somewhere
before the start of ebb10 and will therefore allow unrestricted movement
of that def to any point within bb10.

To address this, during initial traversal of a new EBB, we create an
artificial live-out use at the bottom of the current BB when (a) a
variable is live-in at the entry to the EBB, (b) it hasn't yet been
redefined in the current EBB, and (c) it is redefined in the next BB.
This live-out use serves as a barrier that restricts movement of new defs
from later BBs having a new value to earlier ones still having the old
live-in value.  In the diagram, this live-out use will be created for r110
at the end of bb10.

Since the def that this newly created live-out use sees has to be
dominating (otherwise whenever the last use occurs after the current EBB,
the movement will be restricted to an empty range), we create a new def at
the beginning of the EBB.  For this, we use the create_reg_use () function
that will also create a degenerate PHI for this purpose when needed.

One problem that we run into with this approach is that during RTL-SSA
construction single-input PHI nodes are sometimes eliminated as
unnecessary, which causes the last dominating def to be reverted to an
earlier EBB and the move range restriction failing as before.  To remedy
this, we mark PHIs that we create as 'persistent' and do not eliminate
them where previously they would be replaced by their single input.  (To
remain aligned with the design goal of minimizing memory overhead that
RTL-SSA incurs, the m_has_been_superceded flag has been repurposed to
track this in the phi_info class.)

Bootstrapped and regtested on aarch64-linux-gnu; no performance
regressions across SPEC2017.  New testcase added as provided by Alex
Coplan in the BZ, many thanks for that.

        PR rtl-optimization/123786

gcc/ChangeLog:

        * rtl-ssa/accesses.cc (access_info::print_prefix_flags): Amend
        output prefix for persistent phis.
        * rtl-ssa/accesses.h (class access_info): Add friend class
        phi_info.
        (access_info::m_has_been_superceded): Amend comment.
        (phi_info::set_is_persistent_phi): New method.
        (phi_info::is_persistent_phi): Likewise.
        * rtl-ssa/blocks.cc (function_info::live_out_value): Refuse to
        simplify phis marked as persistent.
        (function_info::record_block_live_out): Add live-out uses for
        variables live-in at the entry to an EBB and redefined in the next BB of
        that EBB.  Create degenerate phi nodes if needed and mark those as
        persistent.
        * rtl-ssa/functions.cc (function_info::simplify_phis): Refuse to
        simplify phis marked as persistent.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/pr123786.c: New test.
---
 gcc/rtl-ssa/accesses.cc                     |  7 +++-
 gcc/rtl-ssa/accesses.h                      | 10 ++++-
 gcc/rtl-ssa/blocks.cc                       | 43 ++++++++++++++++++++-
 gcc/rtl-ssa/functions.cc                    |  4 +-
 gcc/testsuite/gcc.target/aarch64/pr123786.c | 38 ++++++++++++++++++
 5 files changed, 98 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr123786.c

diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
index bdba5f734d3..c2f1d8c6993 100644
--- a/gcc/rtl-ssa/accesses.cc
+++ b/gcc/rtl-ssa/accesses.cc
@@ -137,7 +137,12 @@ access_info::print_prefix_flags (pretty_printer *pp) const
   if (m_is_temp)
     pp_string (pp, "temporary ");
   if (m_has_been_superceded)
-    pp_string (pp, "superceded ");
+    {
+      if (is_a <phi_info *>(this))
+       pp_string (pp, "persistent ");
+      else
+       pp_string (pp, "superceded ");
+    }
 }
 
 // Print properties not handled by print_prefix_flags to PP, putting
diff --git a/gcc/rtl-ssa/accesses.h b/gcc/rtl-ssa/accesses.h
index 257c6a2fd39..d1a2d5fad42 100644
--- a/gcc/rtl-ssa/accesses.h
+++ b/gcc/rtl-ssa/accesses.h
@@ -116,6 +116,7 @@ class access_info
 {
   // Size: 1 LP64 word
   friend class function_info;
+  friend class phi_info;
 
 public:
   // Return the resource that is being accessed.
@@ -248,7 +249,7 @@ protected:
 
 private:
   // Used as a flag during various update routines; has no long-lasting
-  // meaning.
+  // meaning.  For PHI nodes, it's used to inhibit simplification.
   unsigned int m_has_been_superceded : 1;
 
   // Indicates that this access has been allocated on the function_info's
@@ -826,6 +827,13 @@ public:
   // value is completely undefined for that edge.
   set_info *input_value (unsigned int e) const;
 
+  // Set the persistent flag so that no simplification is attempted even
+  // for degenerate PHIs.
+  void set_is_persistent_phi (bool value) { m_has_been_superceded = value; }
+
+  // Return the value of the persistent flag.
+  bool is_persistent_phi () const { return m_has_been_superceded; }
+
   // Print a description of the phi node to PP under the control of
   // PP_ACCESS_* flags FLAGS.
   void print (pretty_printer *pp,
diff --git a/gcc/rtl-ssa/blocks.cc b/gcc/rtl-ssa/blocks.cc
index d445fcc2cf2..21104421d9e 100644
--- a/gcc/rtl-ssa/blocks.cc
+++ b/gcc/rtl-ssa/blocks.cc
@@ -352,7 +352,9 @@ function_info::live_out_value (bb_info *bb, set_info *set)
        // Remove the phi if it turned out to be useless.  This is
        // mainly useful for memory, because we don't know ahead of time
        // whether a block will use memory or not.
-       if (bb == bb->ebb ()->last_bb () && all_uses_are_live_out_uses (phi))
+       if (bb == bb->ebb ()->last_bb ()
+           && !phi->is_persistent_phi ()
+           && all_uses_are_live_out_uses (phi))
          replace_phi (phi, set);
       }
 
@@ -1058,6 +1060,45 @@ function_info::record_block_live_out (build_info &bi)
          record_live_out_regs (DF_LR_IN (e->dest));
       }
 
+  // Create live-out values for those registers that were live-in on entry
+  // to the EBB, still contain the live-in value, and are going to be
+  // redefined in the next BB.  This is needed to track liveness of
+  // live-in values and restrict movement of defs between BBs.
+  if (bb != bi.current_ebb->last_bb ())
+    {
+      unsigned int regno;
+      bitmap_iterator out_bi;
+      basic_block first_cfg_bb = ebb->first_bb ()->cfg_bb ();
+      EXECUTE_IF_AND_IN_BITMAP (DF_LR_IN (first_cfg_bb),
+                               &DF_LR_BB_INFO (bb->cfg_bb ()->next_bb)->def,
+                               0, regno, out_bi)
+       {
+         // If the live-in definition of REGNO has already been
+         // overwritten in this EBB, we don't need to do anything here.
+         if (bitmap_bit_p (bi.ebb_def_regs, regno))
+           continue;
+
+         // Check if there isn't a use of REGNO at the end of BB already.
+         set_info *value = bi.current_reg_value (regno);
+         if (!value || find_use (value, bb->end_insn ()).matching_use ())
+           continue;
+
+         // None found, create a new live-out use and maybe a new
+         // dominating def.
+         use_info *live_out_use = create_reg_use (bi,
+                                                  bb->end_insn (),
+                                                  value->resource ());
+         live_out_use->set_is_live_out_use (true);
+
+         // The create_reg_use () function could have created a
+         // degenerate PHI to serve as a dominating def of the new
+         // live-out use.  We need to mark it as immutable so that the
+         // post-processing pass does not simplify it away.
+         if (phi_info *phi = dyn_cast<phi_info *>(live_out_use->def ()))
+           phi->set_is_persistent_phi (true);
+       }
+    }
+
   // Record the live-out memory value.
   bi.bb_mem_live_out[cfg_bb->index]
     = live_out_value (bb, bi.current_mem_value ());
diff --git a/gcc/rtl-ssa/functions.cc b/gcc/rtl-ssa/functions.cc
index ad5ed95fc78..17c7c595db7 100644
--- a/gcc/rtl-ssa/functions.cc
+++ b/gcc/rtl-ssa/functions.cc
@@ -286,7 +286,9 @@ function_info::simplify_phis ()
 
   // Update any phis that turned out to be equivalent to a single input.
   for (unsigned int i = 0; i < m_next_phi_uid; ++i)
-    if (bitmap_bit_p (valid_phi_uids, i) && phis[i] != assumed_values[i])
+    if (bitmap_bit_p (valid_phi_uids, i)
+       && !phis[i]->is_persistent_phi ()
+       && phis[i] != assumed_values[i])
       replace_phi (phis[i], assumed_values[i]);
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/pr123786.c 
b/gcc/testsuite/gcc.target/aarch64/pr123786.c
new file mode 100644
index 00000000000..1b59f35949c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr123786.c
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mtune=generic-armv9-a" } */
+
+#include <arm_neon.h>
+
+int g_36[3];
+short g_s;
+signed char g_82, g_179, g_x, g_101;
+
+__attribute__((noipa))
+void check(void)
+{
+  if (g_36[0] != 1)
+    __builtin_abort ();
+}
+
+int main(void)
+{
+BS_LABEL_3:
+  short l_65 = g_s;
+  if (g_x) goto BS_LABEL_6;
+  for (; g_101; g_101 = 5)
+  {
+  BS_LABEL_6:
+  }
+  for (; g_82 < 1; g_82++)
+  {
+    switch (vqadds_u32 (0, 0))
+      {
+        case 4: goto BS_LABEL_3;
+        case 9: goto BS_LABEL_3;
+      }
+    short si1 = (l_65 &= 1) || (g_179 &= 0) != 6;
+    g_36[0] = g_36[2] == 0 ? 1 : si1 / g_36[2];
+  }
+
+  check ();
+}
-- 
2.43.0

Reply via email to