The following fixes a regression introduced by r11-5542, which
restricts replacing uses of live original defs of now-vectorized
stmts to cases where that does not require inserting new loop-closed
PHIs.  That restriction keeps the original scalar definition live,
which is sub-optimal and also not reflected in costing.

The particular case fixed here, which can be seen in
gcc.dg/vect/bb-slp-57.c, is the one where we are replacing an
existing loop-closed PHI argument.
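
As a rough illustration of the condition being relaxed, here is a
self-contained toy model of the decision made for each use.  The
UseSite struct, its fields and keep_original_scalar are hypothetical
names invented for this sketch; they only mirror the shape of the
check in vectorizable_live_operation and are not GCC code.

```cpp
// Toy model only: these types and names are hypothetical, not GCC's.
struct UseSite
{
  bool in_def_loop;             // use_stmt is in the same loop as the vector def
  bool nested_in_def_loop;      // use_stmt's loop is nested inside the def's loop
  bool is_phi;                  // use_stmt is a PHI node
  bool phi_edge_exits_def_loop; // the PHI argument's edge leaves the def's loop
};

// Returns true when the original scalar computation has to stay live
// because replacing this use would need a new loop-closed PHI.
constexpr bool
keep_original_scalar (bool new_def_is_real_ssa_name, const UseSite &use)
{
  return new_def_is_real_ssa_name
         && !use.in_def_loop
         // A use that already is an LC PHI argument can simply be replaced.
         && (!use.is_phi || !use.phi_edge_exits_def_loop)
         && !use.nested_in_def_loop;
}
```

In this model only the third conjunct is new: a use that is the
argument of an existing loop-closed PHI no longer forces the scalar
def to stay live, while any other out-of-loop use still does.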

Bootstrap and regtest are running on x86_64-unknown-linux-gnu; I will
push once that has succeeded.  I'm also considering backporting to at
least GCC 15 after a while (code gen for gcc.dg/vect/bb-slp-57.c is
really awful w/o this fix).

        PR tree-optimization/98064
        * tree-vect-loop.cc (vectorizable_live_operation): Do
        not restrict replacing uses in a LC PHI.

        * gcc.dg/vect/bb-slp-57.c: Verify we do not keep original
        stmts live.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-57.c |  1 +
 gcc/tree-vect-loop.cc                 | 45 ++++++++++++++++-----------
 2 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-57.c b/gcc/testsuite/gcc.dg/vect/bb-slp-57.c
index 6f13507fd67..6633a3092ad 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-57.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-57.c
@@ -36,3 +36,4 @@ void l()
 
 /* { dg-final { scan-tree-dump-times "transform load" 1 "slp1" { target { { x86_64-*-* i?86-*-* } && lp64 } } } } */
 /* { dg-final { scan-tree-dump "optimized: basic block" "slp1" { target { { x86_64-*-* i?86-*-* } && lp64 } } } } */
+/* { dg-final { scan-tree-dump-not "missed: Using original scalar computation" "slp1" } } */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 670a03ea06b..4818a8e88a1 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10441,26 +10441,35 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
                                   "def\n");
                continue;
              }
-           /* ???  It can also happen that we end up pulling a def into
-              a loop where replacing out-of-loop uses would require
-              a new LC SSA PHI node.  Retain the original scalar in
-              those cases as well.  PR98064.  */
-           if (TREE_CODE (new_tree) == SSA_NAME
-               && !SSA_NAME_IS_DEFAULT_DEF (new_tree)
-               && (gimple_bb (use_stmt)->loop_father
-                   != gimple_bb (vec_stmt)->loop_father)
-               && !flow_loop_nested_p (gimple_bb (vec_stmt)->loop_father,
-                                       gimple_bb (use_stmt)->loop_father))
+           FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
              {
-               if (dump_enabled_p ())
-                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                                  "Using original scalar computation for "
-                                  "live lane because there is an out-of-loop "
-                                  "definition for it\n");
-               continue;
+               /* ???  It can also happen that we end up pulling a def into
+                  a loop where replacing out-of-loop uses would require
+                  a new LC SSA PHI node.  Retain the original scalar in
+                  those cases as well.  PR98064.  */
+               edge e;
+               if (TREE_CODE (new_tree) == SSA_NAME
+                   && !SSA_NAME_IS_DEFAULT_DEF (new_tree)
+                   && (gimple_bb (use_stmt)->loop_father
+                       != gimple_bb (vec_stmt)->loop_father)
+                   /* But a replacement in a LC PHI is OK.  This happens
+                      in gcc.dg/vect/bb-slp-57.c for example.  */
+                   && (gimple_code (use_stmt) != GIMPLE_PHI
+                       || (((e = phi_arg_edge_from_use (use_p)), true)
+                           && !loop_exit_edge_p
+                                 (gimple_bb (vec_stmt)->loop_father, e)))
+                   && !flow_loop_nested_p (gimple_bb (vec_stmt)->loop_father,
+                                           gimple_bb (use_stmt)->loop_father))
+                 {
+                   if (dump_enabled_p ())
+                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                      "Using original scalar computation for "
+                                      "live lane because there is an "
+                                      "out-of-loop definition for it\n");
+                   continue;
+                 }
+               SET_USE (use_p, new_tree);
              }
-           FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-             SET_USE (use_p, new_tree);
            update_stmt (use_stmt);
          }
     }
-- 
2.51.0
