From: Xin Wang <[email protected]>
When split_loop does iteration space splitting, split_at_bb_p may
swap the guard condition so that operand 0 is always the loop IV
and operand 1 is the invariant. For example, "t < i" (LT_EXPR)
becomes "i > t" (GT_EXPR). This can cause initial_true to be
false, meaning loop1 handles iterations where the guard is false
and loop2 handles iterations where the guard is true.
The function fix_loop_bb_probability scales loop1's body by
true_edge->probability and loop2's body by its inverse. But when
initial_true is false, loop1's body executes the false edge path.
Loop1 should be scaled by false_edge->probability instead.
This inconsistency is visible in the guard patching code a few
lines below, which does swap force_true/force_false based on
initial_true. The profile scaling should apply the same logic.
The bug caused BB counts in the split loops to be swapped when
initial_true is false: the loop body whose guard is forced false
(loop1, executing fewer iterations) would get the higher profile
count, and vice versa.
gcc/ChangeLog:
* tree-ssa-loop-split.cc (split_loop): Pass edges to
fix_loop_bb_probability with consideration of initial_true,
so that loop1 is always scaled by the edge probability
corresponding to the branch that actually executes in it.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/loop-split-4.c: New test.
Signed-off-by: Xin Wang <[email protected]>
---
gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c | 34 +++++++++++++++++++
gcc/tree-ssa-loop-split.cc | 13 ++++++-
2 files changed, 46 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c
diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c
b/gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c
new file mode 100644
index 00000000000..7e0aa883276
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-4.c
@@ -0,0 +1,34 @@
+/* PR tree-optimization/XXXXX */
+/* { dg-options "-O2 -fdump-tree-lsplit-details" } */
+
+volatile int sink;
+
+__attribute__((noinline)) int
+helper (int a, int b)
+{
+ return a + b;
+}
+
+int
+main (void)
+{
+ int n = 100, t = 3, total = 0;
+ /* With t=3, n=100 the guard "t < i" is false for i=0..3 (4 iterations,
+ empty else) and true for i=4..99 (96 iterations, calls helper).
+ split_at_bb_p swaps "t < i" to "i > t" (GT_EXPR), giving
+ initial_true = false. Loop1 (cold, 4 iterations) handles the false
+ case, loop2 (hot, 96 iterations) handles the true case.
+ Without the fix loop1 was scaled by true_edge->probability (96%),
+ inverting the counts. */
+ for (int i = 0; i < n; i++)
+ if (t < i)
+ total += helper (i, t);
+ sink = total;
+ return 0;
+}
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit" }
} */
+/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0 "lsplit"
} } */
+/* With the fix loop1 (cold, 4 iterations) count ~4, loop2 (hot, 96
+ iterations) count ~96. Without the fix the counts are inverted.
+ Check loop1 is a single-digit and loop2 is 90+. */
+/* { dg-final-use-not-autofdo { scan-tree-dump "loop1 count \[0-9\], loop2
count 9\[0-9\]" "lsplit" } } */
diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
index ba6cc45d7f0..8aa6275695b 100644
--- a/gcc/tree-ssa-loop-split.cc
+++ b/gcc/tree-ssa-loop-split.cc
@@ -712,7 +712,18 @@ split_loop (class loop *loop1)
(loop_preheader_edge (loop2)->src)->probability
= loop1_prob.invert ();
- fix_loop_bb_probability (loop1, loop2, true_edge, false_edge);
+ fix_loop_bb_probability (loop1, loop2,
+ initial_true ? true_edge : false_edge,
+ initial_true ? false_edge : true_edge);
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ ";; Split loop: initial_true %s, "
+ "loop1 count %" PRId64 ", loop2 count %" PRId64 "\n",
+ initial_true ? "true" : "false",
+ (int64_t) loop1->header->count.to_gcov_type (),
+ (int64_t) loop2->header->count.to_gcov_type ());
+
/* If conditional we split on has reliable profilea nd both
preconditionals of loop1 and loop2 are constant true, we can
only redistribute the iteration counts to the split loops.
--
2.33.0