Before this patch, init_loop_unswitch_info always walked outwards through
a singly-nested loop chain (up to max-unswitch-depth) and selected the
outermost loop as the unswitching candidate. That heuristic assumes
unswitching the outer loop is always preferable, but it can cause large
code growth when the outer loop body is much bigger than the inner one:
the entire outer body gets duplicated for each guarded version, even
though the unswitched condition only matters inside the inner loop.
Switch to a guarded ascent: only keep walking out when the outer
candidate is not much larger than the current inner loop, or when its
size still fits within max-unswitch-insns. Concretely, stop ascending
when the outer is both more than twice the size of the inner and larger
than max-unswitch-insns. This preserves outer-loop unswitching for the
common case of small outer loops, while preventing the pathological
duplication when the outer body dominates the nest.
This patch fixes a regression in SPEC2026 772.marian_r introduced by
r13-3875-g9e11ceef165bc0
In the benchmark, the inner loop contains two predicates. After the guilty
commit, only one of them is hoisted out via unswitching; the second fails to be
hoisted because it exceeds param_max_unswitch_insns, and causes miss
vectorization. The outermost loop is very large — more than twice the size of
its inner loop — which makes unswitching it especially costly.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ok for trunk?
gcc/ChangeLog:
* tree-ssa-loop-unswitch.cc (estimate_loop_insns): New function.
(init_loop_unswitch_info): Do not select an outer loop for
unswitching when it is both more than twice the size of the inner
loop and larger than max-unswitch-insns.
gcc/testsuite/ChangeLog:
* gcc.dg/loop-unswitch-19.c: New test.
---
gcc/testsuite/gcc.dg/loop-unswitch-19.c | 42 +++++++++++++++++++++++
gcc/tree-ssa-loop-unswitch.cc | 44 ++++++++++++++++++++++---
2 files changed, 82 insertions(+), 4 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-19.c
diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-19.c
b/gcc/testsuite/gcc.dg/loop-unswitch-19.c
new file mode 100644
index 00000000000..24d450bd7a8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-unswitch-19.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-unswitch-optimized" } */
+
+extern volatile int sink;
+
+void
+foo (int flag, int n, int m)
+{
+ for (int i = 0; i < n; ++i)
+ {
+ /* Pad the outer loop to make it noticeably larger than the inner. */
+ sink += i;
+ sink += 2;
+ sink += 3;
+ sink += 4;
+ sink += 5;
+ sink += 6;
+ sink += 7;
+ sink += 8;
+ sink += 9;
+ sink += 10;
+ sink += 11;
+ sink += 12;
+ sink += 13;
+ sink += 14;
+ sink += 15;
+ sink += 16;
+ sink += 17;
+ sink += 18;
+ sink += 19;
+ sink += 20;
+
+ for (int j = 0; j < m; ++j)
+ if (flag)
+ sink += j;
+ }
+}
+
+/* We should still unswitch, but stay on the inner loop because the outer
+ body is much larger than the inner and exceeds the unswitch size cap. */
+/* { dg-final { scan-tree-dump "unswitching loop" "unswitch" } } */
+/* { dg-final { scan-tree-dump-not "unswitching outer loop" "unswitch" } } */
diff --git a/gcc/tree-ssa-loop-unswitch.cc b/gcc/tree-ssa-loop-unswitch.cc
index 96680bc64ea..06763b13911 100644
--- a/gcc/tree-ssa-loop-unswitch.cc
+++ b/gcc/tree-ssa-loop-unswitch.cc
@@ -250,6 +250,21 @@ set_predicates_for_bb (basic_block bb,
vec<unswitch_predicate *> predicates)
bb_predicates->safe_push (predicates);
}
+/* Estimate number of instructions in LOOP using eni_size_weights. */
+
+static unsigned
+estimate_loop_insns (class loop *loop)
+{
+ unsigned insns = 0;
+ basic_block *body = get_loop_body (loop);
+ for (unsigned i = 0; i < loop->num_nodes; i++)
+ for (gimple_stmt_iterator gsi = gsi_start_bb (body[i]);
+ !gsi_end_p (gsi); gsi_next (&gsi))
+ insns += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);
+ free (body);
+ return insns;
+}
+
/* Initialize LOOP information reused during the unswitching pass.
Return total number of instructions in the loop. Adjusts LOOP to
the outermost loop all candidates are invariant in. */
@@ -266,9 +281,31 @@ init_loop_unswitch_info (class loop *&loop,
unswitch_predicate *&hottest,
class loop *outer_loop = loop;
unsigned max_depth = param_max_unswitch_depth;
while (loop_outer (outer_loop)->num != 0
- && !loop_outer (outer_loop)->inner->next
- && --max_depth != 0)
- outer_loop = loop_outer (outer_loop);
+ && !loop_outer (outer_loop)->inner->next)
+ {
+ if (--max_depth == 0)
+ break;
+
+ class loop *candidate = loop_outer (outer_loop);
+ unsigned inner_size = estimate_loop_insns (outer_loop);
+ unsigned candidate_size = estimate_loop_insns (candidate);
+
+ /* Avoid selecting large outer loops when they are also much bigger
+ than the current inner loop. */
+ if (candidate_size > 2 * inner_size
+ && candidate_size > (unsigned) param_max_unswitch_insns)
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_NOTE, find_loop_location (loop),
+ "Not unswitching outer loop, size %u > 2x inner "
+ "%u and exceeds max-unswitch-insns %u\n",
+ candidate_size, inner_size,
+ (unsigned) param_max_unswitch_insns);
+ break;
+ }
+
+ outer_loop = candidate;
+ }
hottest = NULL;
hottest_bb = NULL;
/* Find all unswitching candidates in the innermost loop. */
@@ -1678,4 +1715,3 @@ make_pass_tree_unswitch (gcc::context *ctxt)
return new pass_tree_unswitch (ctxt);
}
-
--
2.34.1