From: Xin Wang <[email protected]>

When split_loop does iteration space splitting, split_at_bb_p may
swap the guard condition so that operand 0 is always the loop IV
and operand 1 is the invariant.  For example, "t < i" (LT_EXPR)
becomes "i > t" (GT_EXPR).  This can cause initial_true to be
false, meaning loop1 handles iterations where the guard is false
and loop2 handles iterations where the guard is true.

The function fix_loop_bb_probability scales loop1's body by
true_edge->probability and loop2's body by its inverse.  But when
initial_true is false, loop1's body executes the false edge path.
Loop1 should be scaled by false_edge->probability instead.

This inconsistency is visible in the guard patching code a few
lines below, which does swap force_true/force_false based on
initial_true.  The profile scaling should apply the same logic.

The bug caused BB counts in the split loops to be swapped when
initial_true is false: the loop body whose guard is forced false
(loop1, executing fewer iterations) would get the higher profile
count, and vice versa.

gcc/ChangeLog:

        * tree-ssa-loop-split.cc (split_loop): Pass edges to
        fix_loop_bb_probability with consideration of initial_true,
        so that loop1 is always scaled by the edge probability
        corresponding to the branch that actually executes in it.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-prof/loop-split-4.c: New test.

Signed-off-by: Xin Wang <[email protected]>
---
 gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c | 34 +++++++++++++++++++
 gcc/tree-ssa-loop-split.cc                    | 13 ++++++-
 2 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c

diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c 
b/gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c
new file mode 100644
index 00000000000..7e0aa883276
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c
@@ -0,0 +1,34 @@
+/* PR tree-optimization/XXXXX */
+/* { dg-options "-O2 -fdump-tree-lsplit-details" } */
+
+volatile int sink;
+
+__attribute__((noinline)) int
+helper (int a, int b)
+{
+  return a + b;
+}
+
+int
+main (void)
+{
+  int n = 100, t = 3, total = 0;
+  /* With t=3, n=100 the guard "t < i" is false for i=0..3 (4 iterations,
+     empty else) and true for i=4..99 (96 iterations, calls helper).
+     split_at_bb_p swaps "t < i" to "i > t" (GT_EXPR), giving
+     initial_true = false.  Loop1 (cold, 4 iterations) handles the false
+     case, loop2 (hot, 96 iterations) handles the true case.
+     Without the fix loop1 was scaled by true_edge->probability (96%),
+     inverting the counts.  */
+  for (int i = 0; i < n; i++)
+    if (t < i)
+      total += helper (i, t);
+  sink = total;
+  return 0;
+}
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" } 
} */
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 "lsplit" 
} } */
+/* With the fix loop1 (cold, 4 iterations) count ~4, loop2 (hot, 96
+   iterations) count ~96.  Without the fix the counts are inverted.
+   Check loop1 is a single-digit and loop2 is 90+.  */
+/* { dg-final-use-not-autofdo { scan-tree-dump "loop1 count \[0-9\], loop2 
count 9\[0-9\]" "lsplit" } } */
diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
index ba6cc45d7f0..8aa6275695b 100644
--- a/gcc/tree-ssa-loop-split.cc
+++ b/gcc/tree-ssa-loop-split.cc
@@ -712,7 +712,18 @@ split_loop (class loop *loop1)
                (loop_preheader_edge (loop2)->src)->probability
                        = loop1_prob.invert ();
 
-       fix_loop_bb_probability (loop1, loop2, true_edge, false_edge);
+       fix_loop_bb_probability (loop1, loop2,
+                                initial_true ? true_edge : false_edge,
+                                initial_true ? false_edge : true_edge);
+
+       if (dump_file && (dump_flags & TDF_DETAILS))
+         fprintf (dump_file,
+                  ";; Split loop: initial_true %s, "
+                  "loop1 count %" PRId64 ", loop2 count %" PRId64 "\n",
+                  initial_true ? "true" : "false",
+                  (int64_t) loop1->header->count.to_gcov_type (),
+                  (int64_t) loop2->header->count.to_gcov_type ());
+
        /* If conditional we split on has reliable profilea nd both
           preconditionals of loop1 and loop2 are constant true, we can
           only redistribute the iteration counts to the split loops.
-- 
2.33.0

Reply via email to