On Tue, Oct 24, 2017 at 3:30 PM, Michael Matz <[email protected]> wrote:
> Hello,
>
> On Fri, 22 Sep 2017, Bin.Cheng wrote:
>
>> This is updated patch for loop interchange with review suggestions
>> resolved. Changes are:
>> 1) It does more light weight checks like rectangle loop nest check
>> earlier than before.
>> 2) It checks profitability of interchange before data dependence
>> computation.
>> 3) It calls find_data_references_in_loop only once for a loop nest now.
>> 4) Data dependence is open-computed so that we can skip instantly at
>> unknown dependence.
>> 5) It improves code generation in mapping induction variables for
>> loop nest, as well as
>> adding a simple dead code elimination pass.
>> 6) It changes magic constants into parameters.
>
> So I have a couple comments/questions. Something stylistic:
Hi Michael,
Thanks for reviewing.
>
>> +class loop_cand
>> +{
>> +public:
>> ...
>> + friend class tree_loop_interchange;
>> +private:
>
> Just make this all public (and hence a struct, not class).
> No need for friends in file local classes.
Done.
>
>> +single_use_in_loop (tree var, struct loop *loop)
>> ...
>> + FOR_EACH_IMM_USE_FAST (use_p, iterator, var)
>> + {
>> + stmt = USE_STMT (use_p);
>> ...
>> + basic_block bb = gimple_bb (stmt);
>> + gcc_assert (bb != NULL);
>
> This pattern reoccurs often in your patch: you check for a bb associated
> for a USE_STMT. Uses of SSA names always occur in basic blocks, no need
> for checking.
Done.
>
> Then, something about your handling of simple reductions:
>
>> +void
>> +loop_cand::classify_simple_reduction (reduction_p re)
>> +{
>> ...
>> + /* Require memory references in producer and consumer are the same so
>> + that we can undo reduction during interchange. */
>> + if (re->init_ref && !operand_equal_p (re->init_ref, re->fini_ref, 0))
>> + return;
>
> Where is it checked that the undoing transformation is legal also
> from a data dep point of view? Think code like this:
>
> sum = X[i];
> for (j ...)
> sum += X[j];
> X[i] = sum;
>
> Moving the store into the inner loop isn't always correct and I don't seem
> to find where the above situation is rejected.
Yes. With the old patch it was possible for such a loop to be wrongly
interchanged, although in practice it is hard to construct an example,
because the pass usually gives up when computing data dependence between
references in the inner/outer loops. In this updated patch it is fixed
properly: we give up if there is any dependence between references of the
inner and outer loops.
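To make the rejected case concrete, here is your example completed into a
function (hypothetical, for illustration only; names foo/X/n are made up).
The updated pass now refuses to interchange it because of the dependence
between the inner loop's reads of X[] and the outer loop's store to X[i]:

/* Illustration only, not part of the patch: interchange must be rejected,
   since moving the store X[i] = sum into the inner loop would change the
   values later read by X[j].  */
void
foo (int *X, int n)
{
  for (int i = 0; i < n; i++)
    {
      int sum = X[i];
      for (int j = 0; j < n; j++)
        sum += X[j];
      X[i] = sum;
    }
}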
>
> Maybe I'm confused because I also don't see where you even can get into
> the above situation (though I do see testcases about this). The thing is,
> for an 2d loop nest to contain something like the above reduction it can't
> be perfect:
>
> for (j) {
> int sum = X[j]; // 1
> for (i)
> sum += Y[j][i];
> X[j] = sum; // 2
> }
>
> But you do check for perfectness in proper_loop_form_for_interchange and
> prepare_perfect_loop_nest, so either you can't get into the situation or
> the checking can't be complete, or you define the above to be perfect
> nevertheless (probably because the load and store are in outer loop
> header/exit blocks?). The latter would mean that you accept also other
> code in header/footer of loops from a pure CFG perspective, so where is it
> checked that that other code (which aren't simple reductions) isn't
> harmful to the transformation?
Yes, I used the name "perfect loop nest", but the pass can also handle a
special form of imperfect loop nest for the simple reduction case. I added
comments describing this before function prepare_perfect_loop_nest.
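For example, simple_reduc_1 in the new loop-interchange-9.c test is such a
nest; it is perfect except for the simple reduction's load and store in the
outer loop's body:

  for (int j = 0; j < n; j++)
    {
      int sum = c[j];
      for (int i = 0; i < n; i++)
        sum = sum + a[i][j]*b[i][j];

      c[j] = sum;
    }

The load into sum and the store back to c[j] are recognized as the
reduction's producer/consumer and moved into the inner loop (see
undo_simple_reduction), after which the nest is treated as perfect.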
>
> Then, the data dependence part of the new pass:
>
>> +bool
>> +tree_loop_interchange::valid_data_dependences (unsigned inner, unsigned outer)
>> +{
>> + struct data_dependence_relation *ddr;
>> +
>> + for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i)
>> + {
>> + /* Skip no-dependence case. */
>> + if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
>> + continue;
>> +
>> + for (unsigned j = 0; j < DDR_NUM_DIR_VECTS (ddr); ++j)
>> + {
>> + lambda_vector dist_vect = DDR_DIST_VECT (ddr, j);
>> + unsigned level = dependence_level (dist_vect, loop_nest.length ());
>> +
>> + /* If there is no carried dependence. */
>> + if (level == 0)
>> + continue;
>> +
>> + level --;
>> + /* Skip case which has '>' as the leftmost direction. */
>> + if (!lambda_vector_lexico_pos (dist_vect, level))
>> + return false;
>
> Shouldn't happen as dist vectors are forced positive via DDR_REVERSED.
Done.
>
>> + /* If dependence is carried by outer loop of the two loops for
>> + interchange. */
>> + if (level < outer)
>> + continue;
>> +
>> + lambda_vector dir_vect = DDR_DIR_VECT (ddr, j);
>> + /* If directions at both inner/outer levels are the same. */
>> + if (dir_vect[inner] == dir_vect[outer])
>> + continue;
>> +
>> + /* Be conservative, skip case if either direction at inner/outer
>> + levels is not '=' or '<'. */
>> + if (dir_vect[inner] != dir_equal
>> + && dir_vect[inner] != dir_positive
>> + && dir_vect[inner] != dir_independent
>> + && dir_vect[inner] != dir_positive_or_equal)
>> + return false;
>> +
>> + if (dir_vect[outer] != dir_equal
>> + && dir_vect[outer] != dir_positive
>> + && dir_vect[outer] != dir_independent
>> + && dir_vect[outer] != dir_positive_or_equal)
>> + return false;
>
> Checking dir vectors doesn't make much sense in GCC: the elements are only
> ever set to dir_positive, dir_negative or dir_equal, exactly when distance
> is
> > 0, < 0 or == 0. So checking dist vector is enough. (though sameness of
> direction checks sameness of sign with zero). Incidentally:
Done.
>
>> +tree_loop_interchange::update_data_deps (unsigned inner, unsigned outer)
>> +{
>> + struct data_dependence_relation *ddr;
>> +
>> + for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i)
>> + {
>> + /* Skip no-dependence case. */
>> + if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
>> + continue;
>> +
>> + for (unsigned j = 0; j < DDR_NUM_DIR_VECTS (ddr); ++j)
>> + {
>> + lambda_vector dir_vect = DDR_DIR_VECT (ddr, j);
>> + std::swap (dir_vect[inner], dir_vect[outer]);
>> + }
>> + }
>> +}
>
> Here you swap only the direction but not the distance vector, which can't
> be right. I suggest only using (and updating) the distance vector.
Yeah, fixed.
>
> And then your usage and update of DR_ACCESS_FNs: there's quite some
> complexity connected with that and I'm not sure how worthwhile it is.
> You're basically using the ACCESS_FNs to determine profitability (and not
> for validity, and that's good). But e.g. for pointer based accesses like
> in fortran with explicit address arithmetic the relation between access-fn
> step and stride and actual access stride isn't that easy (e.g. in your
> should_interchange_loops function iloop_stride and oloop_stride will
> always be one for pointer based accesses).
>
> Conceptually what you should check is how the access address for each data
> ref revolves for each loop, so why not doing this explicitely? What I
> mean is: calculate a (complicated) chrec for the DR addresses for the
> whole nest at the beginning. It should be in the form like (assume "+"
> always):
>
> {{{init, s1}_l1, s2}_l2, s3}_l3
>
> (i.e. all steps should be invariants/constants, and only one non-chrec
> init value). Addresses which aren't in this form you're already ignoring
> right now, so you could continue doing that. (Or better said, all
> non-constant steps you regard as being AVG_DIM_SIZE, which you still can
> continue doing).
>
> Now, with the above form you can form expressions for the difference
> between addresses per iteration for each loop (i.e. the address stride per
> loop); store these. Then, when interchanging loops you need to merely
> swap these expressions like you have to with the distance vector, instead
> of fiddling inside the DR_ACCESS_FNs themself. Much code would go away.
Yes, I did a similar thing in the loop nest distribution pass; see
compute_access_range in tree-loop-distribution.c. I would have done the
same here had I implemented this pass after the loop nest distribution
patches. Done in this updated patch.
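For the archive, the idea is roughly as below. This is a simplified sketch
with an assumed helper name, not the exact code in the patch (which also
records zero strides for loops the address does not vary in, handles
unanalyzable addresses, etc.): compute the address chrec of each data
reference once over the whole nest, record the step per loop level in
DR_ACCESS_STRIDE, and when interchanging a loop pair simply swap the two
recorded strides, like the distance vector, instead of rewriting the
DR_ACCESS_FNs.

static void
compute_access_strides_sketch (struct loop *outermost, struct loop *loop,
                               data_reference_p dr)
{
  vec<tree> *strides = new vec<tree> ();

  /* Analyze the address of the reference in the innermost loop and
     instantiate it relative to the whole nest.  */
  tree scev = analyze_scalar_evolution (loop,
                                        build_fold_addr_expr (DR_REF (dr)));
  scev = instantiate_scev (loop_preheader_edge (outermost), loop, scev);

  /* Peel the chrec {{{init, s1}_l1, s2}_l2, s3}_l3 from the inside out,
     recording one stride per loop level (innermost first).  */
  if (!chrec_contains_undetermined (scev))
    while (TREE_CODE (scev) == POLYNOMIAL_CHREC)
      {
        strides->safe_push (CHREC_RIGHT (scev));
        scev = CHREC_LEFT (scev);
      }

  dr->aux = strides;
}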
>
> Testcases: given that we had to remove our old separate interchange pass
> because it miscompiled stuff all over I'm missing some testcases where
> interchange should _not_ happen for validity reasons, like my above
> example with an reduction that can't be moved inside. Perhaps you can
> think of some more.
As mentioned above, it is hard to create a test that fails exactly for this
reason. I added one in which data dependence prevents us from interchanging
the loop.
>
> I hope this is of some help to you :)
Thanks again, it's very helpful.
I also fixed several bugs in the previous implementation, mostly related to
debug info statements and simple reductions. As for testing, I enabled this
pass by default, bootstrapped and regression-tested GCC, and also built and
ran SPEC. There may be other latent bugs in it, but I guess we have to
exercise the pass by enabling it at some point.
So, any comments?
Thanks,
bin
2017-11-15 Bin Cheng <[email protected]>
* Makefile.in (tree-ssa-loop-interchange.o): New object file.
* common.opt (ftree-loop-interchange): New option.
* doc/invoke.texi (-ftree-loop-interchange): Document new option.
* gimple-iterator.c (gsi_remove): New parameter determining if dbg
bind stmt is inserted or not.
* gimple-iterator.h (gsi_remove): New parameter in declaration.
* params.def (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS): New parameter.
(PARAM_LOOP_INTERCHANGE_STRIDE_RATIO): New parameter.
* passes.def (pass_linterchange): New pass.
* timevar.def (TV_LINTERCHANGE): New time var.
* tree-pass.h (make_pass_linterchange): New declaration.
* tree-ssa-loop-interchange.cc: New file.
* tree-ssa-loop-ivcanon.c (create_canonical_iv): Change to external.
Record IV before/after increment in new parameters.
* tree-ssa-loop-ivopts.h (create_canonical_iv): New declaration.
gcc/testsuite
2017-11-15 Bin Cheng <[email protected]>
* gcc.dg/tree-ssa/loop-interchange-1.c: New test.
* gcc.dg/tree-ssa/loop-interchange-2.c: New test.
* gcc.dg/tree-ssa/loop-interchange-3.c: New test.
* gcc.dg/tree-ssa/loop-interchange-4.c: New test.
* gcc.dg/tree-ssa/loop-interchange-5.c: New test.
* gcc.dg/tree-ssa/loop-interchange-6.c: New test.
* gcc.dg/tree-ssa/loop-interchange-7.c: New test.
* gcc.dg/tree-ssa/loop-interchange-8.c: New test.
* gcc.dg/tree-ssa/loop-interchange-9.c: New test.
* gcc.dg/tree-ssa/loop-interchange-10.c: New test.
* gcc.dg/tree-ssa/loop-interchange-11.c: New test.
>
>
> Ciao,
> Michael.
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 5db7855..a720bed 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1524,6 +1524,7 @@ OBJS = \
tree-ssa-live.o \
tree-ssa-loop-ch.o \
tree-ssa-loop-im.o \
+ tree-ssa-loop-interchange.o \
tree-ssa-loop-ivcanon.o \
tree-ssa-loop-ivopts.o \
tree-ssa-loop-manip.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index f8f2ed3..f4eeaa7 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2574,6 +2574,10 @@ ftree-loop-distribute-patterns
Common Report Var(flag_tree_loop_distribute_patterns) Optimization
Enable loop distribution for patterns transformed into a library call.
+ftree-loop-interchange
+Common Report Var(flag_tree_loop_interchange) Optimization
+Enable loop interchange on trees.
+
ftree-loop-im
Common Report Var(flag_tree_loop_im) Init(1) Optimization
Enable loop invariant motion on trees.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4427328..00bdc1d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8620,6 +8620,25 @@ ENDDO
@end smallexample
and the initialization loop is transformed into a call to memset zero.
+@item -ftree-loop-interchange
+@opindex ftree-loop-interchange
+Perform loop interchange outside of graphite. This flag can improve cache
+performance on loop nests and allow further loop optimizations, like
+vectorization, to take place. For example, the loop
+@smallexample
+for (int i = 0; i < N; i++)
+ for (int j = 0; j < N; j++)
+ for (int k = 0; k < N; k++)
+ c[i][j] = c[i][j] + a[i][k]*b[k][j];
+@end smallexample
+is transformed to
+@smallexample
+for (int i = 0; i < N; i++)
+ for (int k = 0; k < N; k++)
+ for (int j = 0; j < N; j++)
+ c[i][j] = c[i][j] + a[i][k]*b[k][j];
+@end smallexample
+
@item -ftree-loop-im
@opindex ftree-loop-im
Perform loop invariant motion on trees. This pass moves only invariants that
diff --git a/gcc/gimple-iterator.c b/gcc/gimple-iterator.c
index d9d02d3..a4e8f46 100644
--- a/gcc/gimple-iterator.c
+++ b/gcc/gimple-iterator.c
@@ -549,16 +549,20 @@ gsi_insert_after (gimple_stmt_iterator *i, gimple *stmt,
from the IL and not reinserted elsewhere. In that case we remove the
statement pointed to by iterator I from the EH tables, and free its
operand caches. Otherwise we do not modify this information. Returns
- true whether EH edge cleanup is required. */
+ true whether EH edge cleanup is required.
+
+ If INSERT_DBG is true, this function inserts a debug bind stmt for each
+ variable defined by the current stmt when necessary, replacing uses of
+ the def with the newly-created debug temp. */
bool
-gsi_remove (gimple_stmt_iterator *i, bool remove_permanently)
+gsi_remove (gimple_stmt_iterator *i, bool remove_permanently, bool insert_dbg)
{
gimple_seq_node cur, next, prev;
gimple *stmt = gsi_stmt (*i);
bool require_eh_edge_purge = false;
- if (gimple_code (stmt) != GIMPLE_PHI)
+ if (insert_dbg && gimple_code (stmt) != GIMPLE_PHI)
insert_debug_temps_for_defs (i);
/* Free all the data flow information for STMT. */
diff --git a/gcc/gimple-iterator.h b/gcc/gimple-iterator.h
index 70f18be..f1c43cc 100644
--- a/gcc/gimple-iterator.h
+++ b/gcc/gimple-iterator.h
@@ -77,7 +77,7 @@ extern void gsi_insert_after_without_update
(gimple_stmt_iterator *, gimple *,
enum gsi_iterator_update);
extern void gsi_insert_after (gimple_stmt_iterator *, gimple *,
enum gsi_iterator_update);
-extern bool gsi_remove (gimple_stmt_iterator *, bool);
+extern bool gsi_remove (gimple_stmt_iterator *, bool, bool = true);
extern gimple_stmt_iterator gsi_for_stmt (gimple *);
extern gphi_iterator gsi_for_phi (gphi *);
extern void gsi_move_after (gimple_stmt_iterator *, gimple_stmt_iterator *);
diff --git a/gcc/params.def b/gcc/params.def
index 8881f4c..b5e5893 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -790,6 +790,20 @@ DEFPARAM (PARAM_L2_CACHE_SIZE,
"The size of L2 cache.",
512, 0, 0)
+/* Maximum number of statements in loop nest for loop interchange. */
+
+DEFPARAM (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS,
+ "loop-interchange-max-num-stmts",
+ "The maximum number of stmts in loop nest for loop interchange.",
+ 64, 0, 0)
+
+/* Minimum stride ratio for loop interchange to be profitable. */
+
+DEFPARAM (PARAM_LOOP_INTERCHANGE_STRIDE_RATIO,
+ "loop-interchange-stride-ratio",
+ "The minimum stride ratio for loop interchange to be profitable",
+ 2, 0, 0)
+
/* Whether we should use canonical types rather than deep "structural"
type checking. Setting this value to 1 (the default) improves
compilation performance in the C++ and Objective-C++ front end;
diff --git a/gcc/passes.def b/gcc/passes.def
index 00e75d2..2e38c6f 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -278,6 +278,7 @@ along with GCC; see the file COPYING3. If not see
NEXT_PASS (pass_cd_dce);
NEXT_PASS (pass_iv_canon);
NEXT_PASS (pass_loop_distribution);
+ NEXT_PASS (pass_linterchange);
NEXT_PASS (pass_copy_prop);
NEXT_PASS (pass_graphite);
PUSH_INSERT_PASSES_WITHIN (pass_graphite)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-1.c
b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-1.c
new file mode 100644
index 0000000..f2392e3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-1.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-interchange -fdump-tree-linterchange-details"
} */
+
+/* Copied from graphite/interchange-4.c */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+double u[1782225];
+
+static int __attribute__((noinline))
+foo (int N, int *res)
+{
+ int i, j;
+ double sum = 0;
+ for (i = 0; i < N; i++)
+ for (j = 0; j < N; j++)
+ sum = sum + u[i + 1335 * j];
+
+ for (i = 0; i < N; i++)
+ u[1336 * i] *= 2;
+
+ *res = sum + N + u[1336 * 2] + u[1336];
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+ int i, j, res;
+
+ for (i = 0; i < 1782225; i++)
+ u[i] = 2;
+
+ foo (1335, &res);
+
+#if DEBUG
+ fprintf (stderr, "res = %d \n", res);
+#endif
+
+ if (res != 3565793)
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is
interchanged" 1 "linterchange"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-10.c
b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-10.c
new file mode 100644
index 0000000..610610b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-10.c
@@ -0,0 +1,43 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-interchange -fdump-tree-linterchange-details"
} */
+
+#define M 256
+int a[M][M], b[M][M];
+int __attribute__((noinline))
+double_reduc (int n)
+{
+ int sum = 0;
+ for (int j = 0; j < n; j++)
+ {
+ for (int i = 0; i < n; i++)
+ sum = sum + a[i][j]*b[i][j];
+ }
+ return sum;
+}
+
+extern void abort ();
+
+static void __attribute__((noinline))
+init (int i)
+{
+ for (int j = 0; j < M; j++)
+ {
+ a[i][j] = i;
+ b[i][j] = j;
+ }
+}
+
+int main (void)
+{
+ for (int i = 0; i < M; ++i)
+ init (i);
+
+ int sum = double_reduc (M);
+
+ if (sum != 1065369600)
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is
interchanged" 1 "linterchange" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-11.c
b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-11.c
new file mode 100644
index 0000000..e7acc1b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-11.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-interchange -fdump-tree-linterchange-details"
} */
+
+#define M 256
+int a[M][M], b[M][M];
+
+void
+simple_reduc_1 (int n, int *p)
+{
+ for (int j = 0; j < n; j++)
+ {
+ int sum = p[j];
+ for (int i = 0; i < n; i++)
+ {
+ sum = sum + b[i][j];
+ b[i][j] += a[i][j];
+ }
+
+ p[j] = sum;
+ }
+}
+/* { dg-final { scan-tree-dump-not "Loop_pair<outer:., inner:.> is
interchanged" "linterchange" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-2.c
b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-2.c
new file mode 100644
index 0000000..8341a22
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-2.c
@@ -0,0 +1,58 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-interchange -fdump-tree-linterchange-details"
} */
+
+/* Copied from graphite/interchange-5.c */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define N 100
+#define M 1111
+int A[N][M];
+
+static int __attribute__((noinline))
+foo (void)
+{
+ int i, j;
+
+ for( i = 0; i < M; i++)
+ for( j = 0; j < N; j++)
+ A[j][i] = 5 * A[j][i];
+
+ return A[0][0] + A[N-1][M-1];
+}
+
+extern void abort ();
+
+static void __attribute__((noinline))
+init (int i)
+{
+ int j;
+
+ for (j = 0; j < M; j++)
+ A[i][j] = 2;
+}
+
+int
+main (void)
+{
+ int i, j, res;
+
+ for (i = 0; i < N; i++)
+ init (i);
+
+ res = foo ();
+
+#if DEBUG
+ fprintf (stderr, "res = %d \n", res);
+#endif
+
+ if (res != 20)
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is
interchanged" 1 "linterchange"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-3.c
b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-3.c
new file mode 100644
index 0000000..ca2a114
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-3.c
@@ -0,0 +1,59 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-interchange -fdump-tree-linterchange-details"
} */
+
+/* Copied from graphite/interchange-6.c */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define N 100
+#define M 200
+
+static int __attribute__((noinline))
+foo (int A[N][M])
+{
+ int i, j;
+
+ /* This loop should be interchanged. */
+ for(j = 0; j < M; j++)
+ for(i = 0; i < N; i++)
+ A[i][j] = A[i][j] + A[i][j];
+
+ return A[0][0] + A[N-1][M-1];
+}
+
+extern void abort ();
+
+static void __attribute__((noinline))
+init (int *arr, int i)
+{
+ int j;
+
+ for (j = 0; j < M; j++)
+ arr[j] = 2;
+}
+
+int
+main (void)
+{
+ int A[N][M];
+ int i, j, res;
+
+ for (i = 0; i < N; i++)
+ init (A[i], i);
+
+ res = foo (A);
+
+#if DEBUG
+ fprintf (stderr, "res = %d \n", res);
+#endif
+
+ if (res != 8)
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is
interchanged" 1 "linterchange"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-4.c
b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-4.c
new file mode 100644
index 0000000..ff820f3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-4.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-interchange -fdump-tree-linterchange-details"
} */
+
+/* Copied from graphite/interchange-7.c */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define N 111
+#define M 1111
+
+static int __attribute__((noinline))
+foo (double *a)
+{
+ int i,j;
+ int r = 0;
+
+ for (i = 0; i < N; ++i)
+ for (j = 0; j < M; ++j)
+ r += a[j * N + i];
+
+ return r;
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+ double A[N*M];
+ int i, res;
+
+ for (i = 0; i < N*M; i++)
+ A[i] = 2;
+
+ res = foo (A);
+
+#if DEBUG
+ fprintf (stderr, "res = %d \n", res);
+#endif
+
+ if (res != 246642)
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is
interchanged" 1 "linterchange"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-5.c
b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-5.c
new file mode 100644
index 0000000..706da88
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-5.c
@@ -0,0 +1,71 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-interchange -fdump-tree-linterchange-details"
} */
+
+#define M 256
+int a[M][M], b[M][M], c[M][M], d[M][M];
+void __attribute__((noinline))
+matrix_mul_1 (int n)
+{
+ for (int i = 0; i < n; i++)
+ for (int j = 0; j < n; j++)
+ for (int k = 0; k < n; k++)
+ c[i][j] = c[i][j] + a[i][k]*b[k][j];
+}
+
+void __attribute__((noinline))
+matrix_mul_2 (int n)
+{
+ for (int i = 0; i < n; i++)
+ {
+ for (int j = 0; j < n; j++)
+ {
+ for (int k = 0; k < n; k++)
+ d[i][j] = d[i][j] + a[i][k]*b[k][j];
+
+ asm volatile ("" ::: "memory");
+ }
+ asm volatile ("" ::: "memory");
+ }
+}
+
+extern void abort ();
+
+static void __attribute__((noinline))
+init (int i)
+{
+ for (int j = 0; j < M; j++)
+ {
+ a[i][j] = i;
+ b[i][j] = j;
+ c[i][j] = 0;
+ d[i][j] = 0;
+ }
+}
+
+static int __attribute__((noinline))
+check (int i)
+{
+ for (int j = 0; j < M; j++)
+ if (c[i][j] != d[i][j])
+ return 0;
+
+ return 1;
+}
+
+int main (void)
+{
+ for (int i = 0; i < M; ++i)
+ init (i);
+
+ matrix_mul_1 (M);
+ matrix_mul_2 (M);
+
+ for (int i = 0; i < M; ++i)
+ if (!check (i))
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is
interchanged" 1 "linterchange" } } */
+/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is not
interchanged" 1 "linterchange" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-6.c
b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-6.c
new file mode 100644
index 0000000..97555ed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-6.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-interchange -fdump-tree-linterchange-details"
} */
+
+#define M 256
+int a[M][M], b[M][M], c[M][M], d[M][M];
+void __attribute__((noinline))
+matrix_mul_1 (int n)
+{
+ for (int j = 0; j < n; j++)
+ for (int k = 0; k < n; k++)
+ for (int i = 0; i < n; i++)
+ c[i][j] = c[i][j] + a[i][k]*b[k][j];
+}
+
+void __attribute__((noinline))
+matrix_mul_2 (int n)
+{
+ for (int i = 0; i < n; i++)
+ {
+ for (int j = 0; j < n; j++)
+ {
+ for (int k = 0; k < n; k++)
+ d[i][j] = d[i][j] + a[i][k]*b[k][j];
+
+ asm volatile ("" ::: "memory");
+ }
+ asm volatile ("" ::: "memory");
+ }
+}
+
+extern void abort ();
+
+static void __attribute__((noinline))
+init (int i)
+{
+ for (int j = 0; j < M; j++)
+ {
+ a[i][j] = i;
+ b[i][j] = j;
+ c[i][j] = 0;
+ d[i][j] = 0;
+ }
+}
+
+static int __attribute__((noinline))
+check (int i)
+{
+ for (int j = 0; j < M; j++)
+ if (c[i][j] != d[i][j])
+ return 0;
+
+ return 1;
+}
+
+int main (void)
+{
+ for (int i = 0; i < M; ++i)
+ init (i);
+
+ matrix_mul_1 (M);
+ matrix_mul_2 (M);
+
+ for (int i = 0; i < M; ++i)
+ if (!check (i))
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is
interchanged" 2 "linterchange" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-7.c
b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-7.c
new file mode 100644
index 0000000..b93ca78
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-7.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-interchange -fdump-tree-linterchange-details"
} */
+
+#define M 256
+int a[M][M], b[M][M], c[M][M], d[M][M];
+void __attribute__((noinline))
+matrix_mul_1 (int n)
+{
+ for (int k = 0; k < n; k++)
+ for (int j = 0; j < n; j++)
+ for (int i = 0; i < n; i++)
+ c[i][j] = c[i][j] + a[i][k]*b[k][j];
+}
+
+void __attribute__((noinline))
+matrix_mul_2 (int n)
+{
+ for (int i = 0; i < n; i++)
+ {
+ for (int j = 0; j < n; j++)
+ {
+ for (int k = 0; k < n; k++)
+ d[i][j] = d[i][j] + a[i][k]*b[k][j];
+
+ asm volatile ("" ::: "memory");
+ }
+ asm volatile ("" ::: "memory");
+ }
+}
+
+extern void abort ();
+
+static void __attribute__((noinline))
+init (int i)
+{
+ for (int j = 0; j < M; j++)
+ {
+ a[i][j] = i;
+ b[i][j] = j;
+ c[i][j] = 0;
+ d[i][j] = 0;
+ }
+}
+
+static int __attribute__((noinline))
+check (int i)
+{
+ for (int j = 0; j < M; j++)
+ if (c[i][j] != d[i][j])
+ return 0;
+
+ return 1;
+}
+
+int main (void)
+{
+ for (int i = 0; i < M; ++i)
+ init (i);
+
+ matrix_mul_1 (M);
+ matrix_mul_2 (M);
+
+ for (int i = 0; i < M; ++i)
+ if (!check (i))
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is
interchanged" 2 "linterchange" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-8.c
b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-8.c
new file mode 100644
index 0000000..29f5917
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-8.c
@@ -0,0 +1,70 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-interchange -fdump-tree-linterchange-details"
} */
+
+#define M 256
+int a[M][M], b[M][M], c[M][M], d[M][M];
+void __attribute__((noinline))
+matrix_mul_1 (int n)
+{
+ for (int i = 0; i < n; i++)
+ for (int k = 0; k < n; k++)
+ for (int j = 0; j < n; j++)
+ c[i][j] = c[i][j] + a[i][k]*b[k][j];
+}
+
+void __attribute__((noinline))
+matrix_mul_2 (int n)
+{
+ for (int i = 0; i < n; i++)
+ {
+ for (int j = 0; j < n; j++)
+ {
+ for (int k = 0; k < n; k++)
+ d[i][j] = d[i][j] + a[i][k]*b[k][j];
+
+ asm volatile ("" ::: "memory");
+ }
+ asm volatile ("" ::: "memory");
+ }
+}
+
+extern void abort ();
+
+static void __attribute__((noinline))
+init (int i)
+{
+ for (int j = 0; j < M; j++)
+ {
+ a[i][j] = i;
+ b[i][j] = j;
+ c[i][j] = 0;
+ d[i][j] = 0;
+ }
+}
+
+static int __attribute__((noinline))
+check (int i)
+{
+ for (int j = 0; j < M; j++)
+ if (c[i][j] != d[i][j])
+ return 0;
+
+ return 1;
+}
+
+int main (void)
+{
+ for (int i = 0; i < M; ++i)
+ init (i);
+
+ matrix_mul_1 (M);
+ matrix_mul_2 (M);
+
+ for (int i = 0; i < M; ++i)
+ if (!check (i))
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "Loop_pair<outer:., inner:.> is
interchanged" "linterchange" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-9.c
b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-9.c
new file mode 100644
index 0000000..d6a3f5c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-interchange-9.c
@@ -0,0 +1,62 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-interchange -fdump-tree-linterchange-details"
} */
+
+#define M 256
+int a[M][M], b[M][M], c[M], d[M];
+void __attribute__((noinline))
+simple_reduc_1 (int n)
+{
+ for (int j = 0; j < n; j++)
+ {
+ int sum = c[j];
+ for (int i = 0; i < n; i++)
+ sum = sum + a[i][j]*b[i][j];
+
+ c[j] = sum;
+ }
+}
+
+void __attribute__((noinline))
+simple_reduc_2 (int n)
+{
+ for (int j = 0; j < n; j++)
+ {
+ int sum = d[j];
+ for (int i = 0; i < n; i++)
+ sum = sum + a[i][j]*b[i][j];
+
+ asm volatile ("" ::: "memory");
+ d[j] = sum;
+ }
+}
+
+extern void abort ();
+
+static void __attribute__((noinline))
+init (int i)
+{
+ c[i] = 0;
+ d[i] = 0;
+ for (int j = 0; j < M; j++)
+ {
+ a[i][j] = i;
+ b[i][j] = j;
+ }
+}
+
+int main (void)
+{
+ for (int i = 0; i < M; ++i)
+ init (i);
+
+ simple_reduc_1 (M);
+ simple_reduc_2 (M);
+
+ for (int i = 0; i < M; ++i)
+ if (c[i] != d[i])
+ abort ();
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is
interchanged" 1 "linterchange" } } */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 8cec6af..730a1dc 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -184,6 +184,7 @@ DEFTIMEVAR (TV_TREE_LOOP , "tree loop optimization")
DEFTIMEVAR (TV_TREE_NOLOOP , "loopless fn")
DEFTIMEVAR (TV_TREE_LOOP_BOUNDS , "tree loop bounds")
DEFTIMEVAR (TV_LIM , "tree loop invariant motion")
+DEFTIMEVAR (TV_LINTERCHANGE , "tree loop interchange")
DEFTIMEVAR (TV_TREE_LOOP_IVCANON , "tree canonical iv")
DEFTIMEVAR (TV_SCEV_CONST , "scev constant prop")
DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH , "tree loop unswitching")
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 9777308..18b06bd 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -368,6 +368,7 @@ extern gimple_opt_pass *make_pass_tree_loop (gcc::context
*ctxt);
extern gimple_opt_pass *make_pass_tree_no_loop (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_tree_loop_init (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_linterchange (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop-interchange.cc b/gcc/tree-ssa-loop-interchange.cc
new file mode 100644
index 0000000..abffbf6
--- /dev/null
+++ b/gcc/tree-ssa-loop-interchange.cc
@@ -0,0 +1,2274 @@
+/* Loop interchange.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3. If not see
+<http://www.gnu.org/licenses/>. */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-pretty-print.h"
+#include "fold-const.h"
+#include "gimplify.h"
+#include "gimple-iterator.h"
+#include "cfgloop.h"
+#include "params.h"
+#include "tree-ssa.h"
+#include "tree-scalar-evolution.h"
+#include "tree-ssa-loop-manip.h"
+#include "tree-ssa-loop-niter.h"
+#include "tree-ssa-loop-ivopts.h"
+#include "tree-data-ref.h"
+
+/* This pass performs loop interchange: for example, the loop nest
+
+ for (int j = 0; j < N; j++)
+ for (int k = 0; k < N; k++)
+ for (int i = 0; i < N; i++)
+ c[i][j] = c[i][j] + a[i][k]*b[k][j];
+
+ is transformed to
+
+ for (int i = 0; i < N; i++)
+ for (int j = 0; j < N; j++)
+ for (int k = 0; k < N; k++)
+ c[i][j] = c[i][j] + a[i][k]*b[k][j];
+
+ This pass implements loop interchange in the following steps:
+
+ 1) Find perfect loop nest for each innermost loop and compute data
+ dependence relations for it. For above example, loop nest is
+ <loop_j, loop_k, loop_i>.
+ 2) From innermost to outermost loop, this pass tries to interchange
+ each loop pair. For above case, it firstly tries to interchange
+ <loop_k, loop_i> and loop nest becomes <loop_j, loop_i, loop_k>.
+ Then it tries to interchange <loop_j, loop_i> and loop nest becomes
+ <loop_i, loop_j, loop_k>. The overall effect is to move innermost
+ loop to the outermost position. For loop pair <loop_i, loop_j>
+ to be interchanged, we:
+ 3) Check if data dependence relations are valid for loop interchange.
+ 4) Check if both loops can be interchanged in terms of transformation.
+ 5) Check if interchanging the two loops is profitable.
+ 6) Interchange the two loops by mapping induction variables.
+
+ This pass also handles reductions in loop nest. So far we only support
+ simple reduction of inner loop and double reduction of the loop nest. */
+
+/* Maximum number of stmts in each loop that should be interchanged. */
+#define MAX_NUM_STMT (PARAM_VALUE (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS))
+
+/* Default average number of loop iterations. */
+#define AVG_LOOP_NITER (PARAM_VALUE (PARAM_AVG_LOOP_NITER))
+
+/* Comparison ratio of access stride between inner/outer loops to be
+ interchanged. This is the minimum stride ratio for loop interchange
+ to be profitable. */
+#define OUTER_STRIDE_RATIO (PARAM_VALUE (PARAM_LOOP_INTERCHANGE_STRIDE_RATIO))
+/* The same as above, but we require higher ratio for interchanging the
+ innermost two loops. */
+#define INNER_STRIDE_RATIO ((OUTER_STRIDE_RATIO) + 1)
+
+/* Vector of strides that DR accesses in each level loop of a loop nest. */
+#define DR_ACCESS_STRIDE(dr) ((vec<tree> *) dr->aux)
+
+/* Structure recording loop induction variable. */
+typedef struct induction
+{
+ /* IV itself. */
+ tree var;
+ /* Initializer. */
+ tree init;
+ /* IV's base and step part of SCEV. */
+ tree base;
+ tree step;
+ /* Mapped IV variable used for interchanging loops. */
+ tree mapped_var;
+} *induction_p;
+
+/* Enum type for loop reduction variable. */
+enum reduction_type
+{
+ UNKNOWN_RTYPE = 0,
+ SIMPLE_RTYPE,
+ DOUBLE_RTYPE
+};
+
+/* Structure recording loop reduction variable. */
+typedef struct reduction
+{
+ /* Reduction itself. */
+ tree var;
+ /* PHI node defining reduction variable. */
+ gphi *phi;
+ /* Init and next variables of the reduction. */
+ tree init;
+ tree next;
+ /* Lcssa PHI node if reduction is used outside of its definition loop. */
+ gphi *lcssa_phi;
+ /* Stmts defining init and next. */
+ gimple *producer;
+ gimple *consumer;
+ /* If init is loaded from memory, this is the loading memory reference. */
+ tree init_ref;
+ /* If reduction is finally stored to memory, this is the stored memory
+ reference. */
+ tree fini_ref;
+ enum reduction_type type;
+} *reduction_p;
+
+
+/* Dump reduction RE. */
+
+static void
+dump_reduction (reduction_p re)
+{
+ if (re->type == SIMPLE_RTYPE)
+ fprintf (dump_file, " Simple reduction: ");
+ else if (re->type == DOUBLE_RTYPE)
+ fprintf (dump_file, " Double reduction: ");
+ else
+ fprintf (dump_file, " Unknown reduction: ");
+
+ print_gimple_stmt (dump_file, re->phi, 0);
+}
+
+/* Dump LOOP's induction IV. */
+static void
+dump_induction (struct loop *loop, induction_p iv)
+{
+ fprintf (dump_file, " Induction: ");
+ print_generic_expr (dump_file, iv->var, TDF_SLIM);
+ fprintf (dump_file, " = {");
+ print_generic_expr (dump_file, iv->base, TDF_SLIM);
+ fprintf (dump_file, ", ");
+ print_generic_expr (dump_file, iv->step, TDF_SLIM);
+ fprintf (dump_file, "}_%d\n", loop->num);
+}
+
+/* Loop candidate for interchange. */
+
+struct loop_cand
+{
+ loop_cand (unsigned, struct loop *, struct loop *);
+ ~loop_cand ();
+
+ reduction_p find_reduction_by_init (tree);
+ reduction_p find_reduction_by_stmt (gimple *);
+ void classify_simple_reduction (reduction_p);
+ bool analyze_iloop_reduction_var (tree);
+ bool analyze_oloop_reduction_var (loop_cand *, tree);
+ bool analyze_reduction_var (loop_cand *, tree);
+ bool analyze_induction_var (tree, tree);
+ bool analyze_carried_vars (loop_cand *);
+ bool analyze_lcssa_phis (void);
+ bool can_interchange_p (loop_cand *);
+ bool unsupported_operation (basic_block, loop_cand *);
+ void undo_simple_reduction (reduction_p);
+ void eliminate_dead_code (void);
+
+ /* The index of this loop in the whole loop nest vector. */
+ unsigned idx;
+ /* The loop itself. */
+ struct loop *loop;
+ /* The outer loop for interchange. It equals loop if this loop_cand
+ itself represents the outer loop. */
+ struct loop *outer;
+ /* Vector of induction variables in loop. */
+ vec<induction_p> inductions;
+ /* Vector of reduction variables in loop. */
+ vec<reduction_p> reductions;
+ /* Lcssa PHI nodes of this loop. */
+ vec<gphi *> lcssa_nodes;
+ /* # of iterations of this loop. */
+ tree niters;
+ /* Single exit edge of this loop. */
+ edge exit;
+ /* Basic blocks of this loop. */
+ basic_block *bbs;
+ /* # of stmts of this loop. */
+ int num_stmts;
+};
+
+/* Constructor. */
+
+loop_cand::loop_cand (unsigned idx, struct loop *loop, struct loop *outer)
+ : idx (idx), loop (loop), outer (outer),
+ niters (unshare_expr (number_of_latch_executions (loop))),
+ exit (single_exit (loop)), bbs (get_loop_body (loop)), num_stmts (0)
+{
+ inductions.create (3);
+ reductions.create (3);
+ lcssa_nodes.create (3);
+}
+
+/* Destructor. */
+
+loop_cand::~loop_cand ()
+{
+ induction_p iv;
+ for (unsigned i = 0; inductions.iterate (i, &iv); ++i)
+ free (iv);
+
+ reduction_p re;
+ for (unsigned i = 0; reductions.iterate (i, &re); ++i)
+ free (re);
+
+ inductions.release ();
+ reductions.release ();
+ lcssa_nodes.release ();
+ free (bbs);
+}
+
+/* Return single use stmt of VAR in LOOP, otherwise return NULL. */
+
+static gimple *
+single_use_in_loop (tree var, struct loop *loop)
+{
+ gimple *stmt, *res = NULL;
+ use_operand_p use_p;
+ imm_use_iterator iterator;
+
+ FOR_EACH_IMM_USE_FAST (use_p, iterator, var)
+ {
+ stmt = USE_STMT (use_p);
+ if (is_gimple_debug (stmt))
+ continue;
+
+ if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
+ continue;
+
+ if (res)
+ return NULL;
+
+ res = stmt;
+ }
+ return res;
+}
+
+/* Return true if E is unsupported in loop interchange, i.e., E is a complex
+ edge or part of an irreducible loop. */
+
+static inline bool
+unsupported_edge (edge e)
+{
+ return (e->flags & (EDGE_COMPLEX | EDGE_IRREDUCIBLE_LOOP));
+}
+
+/* Return true if PHI is unsupported in loop interchange, i.e., PHI contains
+ an SSA var appearing in an abnormal PHI node. */
+
+static inline bool
+unsupported_phi_node (gphi *phi)
+{
+ if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (PHI_RESULT (phi)))
+ return true;
+
+ for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+ {
+ tree arg = PHI_ARG_DEF (phi, i);
+ if (TREE_CODE (arg) == SSA_NAME
+ && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (arg))
+ return true;
+ }
+
+ return false;
+}
+
+/* Return reduction whose init variable is VAR, otherwise return NULL. */
+
+reduction_p
+loop_cand::find_reduction_by_init (tree var)
+{
+ reduction_p re;
+
+ for (unsigned i = 0; reductions.iterate (i, &re); ++i)
+ if (re->init == var || operand_equal_p (re->init, var, 0))
+ return re;
+
+ return NULL;
+}
+
+/* Return the reduction if STMT is one of its lcssa PHI, producer or consumer
+ stmt. */
+
+reduction_p
+loop_cand::find_reduction_by_stmt (gimple *stmt)
+{
+ gphi *phi = NULL;
+ reduction_p re;
+
+ if (is_a <gphi *> (stmt))
+ phi = as_a <gphi *> (stmt);
+
+ for (unsigned i = 0; reductions.iterate (i, &re); ++i)
+ if ((phi != NULL && phi == re->lcssa_phi)
+ || (stmt == re->producer || stmt == re->consumer))
+ return re;
+
+ return NULL;
+}
+
+/* Return true if all stmts in BB can be supported by loop interchange,
+ otherwise return false. ILOOP is not NULL if this loop_cand is the
+ outer loop in loop nest. */
+
+bool
+loop_cand::unsupported_operation (basic_block bb, loop_cand *iloop)
+{
+ int bb_num_stmts = 0;
+ gphi_iterator psi;
+ gimple_stmt_iterator gsi;
+
+ for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+ {
+ gimple *stmt = gsi_stmt (gsi);
+ if (is_gimple_debug (stmt))
+ continue;
+
+ if (gimple_has_volatile_ops (stmt)
+ || gimple_has_side_effects (stmt))
+ return false;
+
+ bb_num_stmts++;
+ if (is_gimple_call (stmt))
+ {
+ int cflags = gimple_call_flags (stmt);
+ /* Only support const/pure calls. */
+ if (!(cflags & (ECF_CONST | ECF_PURE)))
+ return false;
+
+ /* In basic block of outer loop, the call should be cheap since
+ it will be moved to inner loop. */
+ if (iloop != NULL
+ && !gimple_inexpensive_call_p (as_a <gcall *> (stmt)))
+ return false;
+
+ continue;
+ }
+
+ if (!iloop || !gimple_vuse (stmt))
+ continue;
+
+ /* Support stmt accessing memory in outer loop only if it is for inner
+ loop's reduction. */
+ if (iloop->find_reduction_by_stmt (stmt))
+ continue;
+
+ tree lhs;
+ /* Or it's invariant memory reference and only used by inner loop. */
+ if (gimple_assign_single_p (stmt)
+ && (lhs = gimple_assign_lhs (stmt)) != NULL_TREE
+ && TREE_CODE (lhs) == SSA_NAME
+ && single_use_in_loop (lhs, iloop->loop))
+ continue;
+
+ return false;
+ }
+ num_stmts += bb_num_stmts;
+
+ /* Allow PHI nodes in any basic block of inner loop, or PHI nodes in
+ (outer) loop's header. */
+ if (!iloop || bb == loop->header)
+ return true;
+
+ for (psi = gsi_start_phis (bb); !gsi_end_p (psi); gsi_next (&psi))
+ {
+ gphi *phi = psi.phi ();
+
+ if (unsupported_phi_node (phi))
+ return false;
+
+ if (virtual_operand_p (PHI_RESULT (phi)))
+ continue;
+
+ /* For outer loop, we only support PHI in loop header and lcssa PHI
+ of inner loop's reduction. */
+ if (!iloop->find_reduction_by_stmt (phi))
+ return false;
+ }
+ return true;
+}
+
+/* Return true if the current loop_cand can be interchanged. ILOOP is not
+ NULL if the current loop_cand is the outer loop in the loop nest. */
+
+bool
+loop_cand::can_interchange_p (loop_cand *iloop)
+{
+ /* For now we only support at most one reduction. */
+ unsigned allowed_reduction_num = 1;
+
+ /* Only support reduction if the loop nest to be interchanged is the
+ innermost two loops. */
+ if ((iloop == NULL && loop->inner != NULL)
+ || (iloop != NULL && iloop->loop->inner != NULL))
+ allowed_reduction_num = 0;
+
+ if (reductions.length () > allowed_reduction_num
+ || (reductions.length () == 1
+ && reductions[0]->type == UNKNOWN_RTYPE))
+ return false;
+
+ /* Only support lcssa PHI node which is for reduction. */
+ if (lcssa_nodes.length () > allowed_reduction_num)
+ return false;
+
+ /* Check basic blocks other than loop header/exit. */
+ for (unsigned i = 0; i < loop->num_nodes; i++)
+ {
+ basic_block bb = bbs[i];
+
+ /* Skip basic blocks of inner loops. */
+ if (bb->loop_father != loop)
+ continue;
+
+ /* Check if basic block has any unsupported operation. */
+ if (!unsupported_operation (bb, iloop))
+ return false;
+
+ /* Check if loop has too many stmts. */
+ if (num_stmts > MAX_NUM_STMT)
+ return false;
+ }
+
+ return true;
+}
+
+/* Classify if reduction RE is a simple one. */
+
+void
+loop_cand::classify_simple_reduction (reduction_p re)
+{
+ gimple *producer, *consumer;
+ enum tree_code code;
+ tree lhs, rhs;
+
+ /* Check init variable of reduction and how it is initialized. */
+ if (TREE_CODE (re->init) == SSA_NAME)
+ {
+ producer = SSA_NAME_DEF_STMT (re->init);
+ re->producer = producer;
+ basic_block bb = gimple_bb (producer);
+ if (!bb || bb->loop_father != outer)
+ return;
+
+ if (!is_gimple_assign (producer))
+ return;
+
+ code = gimple_assign_rhs_code (producer);
+ if (get_gimple_rhs_class (code) != GIMPLE_SINGLE_RHS)
+ return;
+
+ if ((lhs = gimple_assign_lhs (producer)) == NULL_TREE
+ || lhs != re->init)
+ return;
+
+ if ((rhs = gimple_assign_rhs1 (producer)) == NULL_TREE
+ || !REFERENCE_CLASS_P (rhs))
+ return;
+
+ re->init_ref = rhs;
+ }
+ else if (!CONSTANT_CLASS_P (re->init))
+ return;
+
+ /* Check how the reduction variable is used. Note the reduction variable
+ is usually used outside of its defining loop, but we don't require that
+ for loop interchange. */
+ if (re->lcssa_phi == NULL)
+ consumer = single_use_in_loop (re->next, loop);
+ else
+ consumer = single_use_in_loop (PHI_RESULT (re->lcssa_phi), outer);
+
+ if (consumer == NULL)
+ return;
+
+ re->consumer = consumer;
+
+ if (!is_gimple_assign (consumer))
+ return;
+
+ code = gimple_assign_rhs_code (consumer);
+ if (get_gimple_rhs_class (code) != GIMPLE_SINGLE_RHS)
+ return;
+
+ if ((rhs = gimple_assign_rhs1 (consumer)) == NULL_TREE
+ || rhs != PHI_RESULT (re->lcssa_phi))
+ return;
+
+ if ((lhs = gimple_assign_lhs (consumer)) == NULL_TREE
+ || !REFERENCE_CLASS_P (lhs))
+ return;
+
+ re->fini_ref = lhs;
+
+ /* Simple reduction with constant initializer. */
+ if (re->init_ref == NULL_TREE)
+ {
+ gcc_assert (CONSTANT_CLASS_P (re->init));
+ re->init_ref = unshare_expr (re->fini_ref);
+ }
+
+ /* Require memory references in producer and consumer are the same so
+ that we can undo reduction during interchange. */
+ if (re->init_ref && !operand_equal_p (re->init_ref, re->fini_ref, 0))
+ return;
+
+ re->type = SIMPLE_RTYPE;
+}
+
+/* Analyze reduction variable VAR. ILOOP is not NULL if current loop_cand
+ is outer loop in loop nest. Return true if analysis succeeds. */
+
+bool
+loop_cand::analyze_reduction_var (loop_cand *iloop, tree var)
+{
+ if (iloop != NULL)
+ return analyze_oloop_reduction_var (iloop, var);
+ else
+ return analyze_iloop_reduction_var (var);
+}
+
+/* Analyze reduction variable VAR for inner loop of the loop nest to be
+ interchanged. Return true if analysis succeeds. */
+
+bool
+loop_cand::analyze_iloop_reduction_var (tree var)
+{
+ gphi *phi = as_a <gphi *> (SSA_NAME_DEF_STMT (var));
+ gphi *lcssa_phi = NULL, *use_phi;
+ tree init = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
+ tree next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
+ edge e = single_exit (loop);
+ reduction_p re;
+ gimple *stmt, *next_def, *single_use = NULL;
+ use_operand_p use_p;
+ imm_use_iterator iterator;
+
+ if (TREE_CODE (next) != SSA_NAME)
+ return false;
+
+ next_def = SSA_NAME_DEF_STMT (next);
+ basic_block bb = gimple_bb (next_def);
+ if (!bb || !flow_bb_inside_loop_p (loop, bb))
+ return false;
+
+ /* In restricted reduction, the var is (and must be) used in defining
+ the updated var. The process can be depicted as below:
+
+ var ;; = PHI<init, next>
+ |
+ |
+ v
+ +---------------------+
+ | reduction operators | <-- other operands
+ +---------------------+
+ |
+ |
+ v
+ next
+
+ In terms of loop interchange, we don't change how NEXT is computed from
+ VAR and OTHER OPERANDS. In the case of double reduction in the loop nest
+ to be interchanged, we don't change it at all. In the case of simple
+ reduction in the inner loop, we only change how VAR/NEXT is loaded or
+ stored. With these conditions, we can relax the restrictions on reductions
+ so that the reduction operation is treated as a black box. In general,
+ we can ignore reassociation of the reduction operator; we can handle fake
+ reductions in which VAR is not even used to compute NEXT. */
+ FOR_EACH_IMM_USE_FAST (use_p, iterator, var)
+ {
+ stmt = USE_STMT (use_p);
+ if (is_gimple_debug (stmt))
+ continue;
+
+ if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
+ return false;
+
+ if (single_use != NULL)
+ return false;
+
+ single_use = stmt;
+ }
+
+ if (single_use != next_def
+ && !stmt_dominates_stmt_p (single_use, next_def))
+ return false;
+
+ /* Only support cases in which INIT is used in inner loop. */
+ if (TREE_CODE (init) == SSA_NAME)
+ FOR_EACH_IMM_USE_FAST (use_p, iterator, init)
+ {
+ stmt = USE_STMT (use_p);
+ if (is_gimple_debug (stmt))
+ continue;
+
+ if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
+ return false;
+ }
+
+ FOR_EACH_IMM_USE_FAST (use_p, iterator, next)
+ {
+ stmt = USE_STMT (use_p);
+ if (is_gimple_debug (stmt))
+ continue;
+
+ /* Or else it's used in PHI itself. */
+ use_phi = NULL;
+ if (is_a <gphi *> (stmt)
+ && (use_phi = as_a <gphi *> (stmt)) != NULL
+ && use_phi == phi)
+ continue;
+
+ if (use_phi != NULL
+ && lcssa_phi == NULL
+ && gimple_bb (stmt) == e->dest
+ && PHI_ARG_DEF_FROM_EDGE (use_phi, e) == next)
+ lcssa_phi = use_phi;
+ else
+ return false;
+ }
+ re = XCNEW (struct reduction);
+ re->var = var;
+ re->init = init;
+ re->next = next;
+ re->phi = phi;
+ re->lcssa_phi = lcssa_phi;
+
+ classify_simple_reduction (re);
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ dump_reduction (re);
+
+ reductions.safe_push (re);
+ return true;
+}
+
+/* Analyze reduction variable VAR for outer loop of the loop nest to be
+ interchanged. ILOOP is not NULL and points to inner loop. For the
+ moment, we only support double reduction for outer loop, like:
+
+ for (int i = 0; i < n; i++)
+ {
+ int sum = 0;
+
+ for (int j = 0; j < n; j++) // outer loop
+ for (int k = 0; k < n; k++) // inner loop
+ sum += a[i][k]*b[k][j];
+
+ s[i] = sum;
+ }
+
+ Note the innermost two loops are the loop nest to be interchanged.
+ Return true if analysis succeeds. */
+
+bool
+loop_cand::analyze_oloop_reduction_var (loop_cand *iloop, tree var)
+{
+ gphi *phi = as_a <gphi *> (SSA_NAME_DEF_STMT (var));
+ gphi *lcssa_phi = NULL, *use_phi;
+ tree init = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
+ tree next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
+ edge e = single_exit (loop);
+ reduction_p re;
+ gimple *stmt, *next_def;
+ use_operand_p use_p;
+ imm_use_iterator iterator;
+
+ if (TREE_CODE (next) != SSA_NAME)
+ return false;
+
+ next_def = SSA_NAME_DEF_STMT (next);
+ basic_block bb = gimple_bb (next_def);
+ if (!bb || !flow_bb_inside_loop_p (loop, bb))
+ return false;
+
+ /* Find inner loop's simple reduction that uses var as initializer. */
+ reduction_p inner_re = iloop->find_reduction_by_init (var);
+ if (inner_re == NULL
+ || inner_re->type != UNKNOWN_RTYPE
+ || inner_re->producer != phi)
+ return false;
+
+ /* In case of double reduction, outer loop's reduction should be updated
+ by inner loop's simple reduction. */
+ if (next_def != inner_re->lcssa_phi)
+ return false;
+
+ /* Outer loop's reduction should only be used to initialize inner loop's
+ simple reduction. */
+ FOR_EACH_IMM_USE_FAST (use_p, iterator, var)
+ {
+ stmt = USE_STMT (use_p);
+ if (is_gimple_debug (stmt))
+ continue;
+
+ if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
+ return false;
+
+ if (! is_a <gphi *> (stmt)
+ || (use_phi = as_a <gphi *> (stmt)) == NULL
+ || use_phi != inner_re->phi)
+ return false;
+ }
+
+ /* Check this reduction is correctly used outside of loop via lcssa phi. */
+ FOR_EACH_IMM_USE_FAST (use_p, iterator, next)
+ {
+ stmt = USE_STMT (use_p);
+ if (is_gimple_debug (stmt))
+ continue;
+
+ /* Or else it's used in PHI itself. */
+ use_phi = NULL;
+ if (is_a <gphi *> (stmt)
+ && (use_phi = as_a <gphi *> (stmt)) != NULL
+ && use_phi == phi)
+ continue;
+
+ if (lcssa_phi == NULL
+ && use_phi != NULL
+ && gimple_bb (stmt) == e->dest
+ && PHI_ARG_DEF_FROM_EDGE (use_phi, e) == next)
+ lcssa_phi = use_phi;
+ else
+ return false;
+ }
+
+ re = XCNEW (struct reduction);
+ re->var = var;
+ re->init = init;
+ re->next = next;
+ re->phi = phi;
+ re->lcssa_phi = lcssa_phi;
+ re->type = DOUBLE_RTYPE;
+ inner_re->type = DOUBLE_RTYPE;
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ dump_reduction (re);
+
+ reductions.safe_push (re);
+ return true;
+}
+
+/* Return true if VAR is an induction variable of the current loop whose
+ scev is specified by CHREC. */
+
+bool
+loop_cand::analyze_induction_var (tree var, tree chrec)
+{
+ gphi *phi = as_a <gphi *> (SSA_NAME_DEF_STMT (var));
+ tree init = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
+
+ /* Var is loop invariant, though it's unlikely to happen. */
+ if (tree_does_not_contain_chrecs (chrec))
+ {
+ struct induction *iv = XCNEW (struct induction);
+ iv->var = var;
+ iv->init = init;
+ iv->base = chrec;
+ iv->step = build_int_cst (TREE_TYPE (chrec), 0);
+ inductions.safe_push (iv);
+ return true;
+ }
+
+ if (TREE_CODE (chrec) != POLYNOMIAL_CHREC
+ || CHREC_VARIABLE (chrec) != (unsigned) loop->num
+ || tree_contains_chrecs (CHREC_LEFT (chrec), NULL)
+ || tree_contains_chrecs (CHREC_RIGHT (chrec), NULL))
+ return false;
+
+ struct induction *iv = XCNEW (struct induction);
+ iv->var = var;
+ iv->init = init;
+ iv->base = CHREC_LEFT (chrec);
+ iv->step = CHREC_RIGHT (chrec);
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ dump_induction (loop, iv);
+
+ inductions.safe_push (iv);
+ return true;
+}
+
+/* Return true if all loop carried variables defined in loop header can
+ be successfully analyzed. */
+
+bool
+loop_cand::analyze_carried_vars (loop_cand *iloop)
+{
+ edge e = loop_preheader_edge (outer);
+ gphi_iterator gsi;
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "\nLoop(%d) carried vars:\n", loop->num);
+
+ for (gsi = gsi_start_phis (loop->header); !gsi_end_p (gsi); gsi_next (&gsi))
+ {
+ gphi *phi = gsi.phi ();
+
+ if (unsupported_phi_node (phi))
+ return false;
+
+ tree var = PHI_RESULT (phi);
+ if (virtual_operand_p (var))
+ continue;
+
+ tree chrec = analyze_scalar_evolution (loop, var);
+ chrec = instantiate_scev (e, loop, chrec);
+
+ /* Analyze var as reduction variable. */
+ if (chrec_contains_undetermined (chrec)
+ || chrec_contains_symbols_defined_in_loop (chrec, outer->num))
+ {
+ if (!analyze_reduction_var (iloop, var))
+ return false;
+ }
+ /* Analyze var as induction variable. */
+ else if (!analyze_induction_var (var, chrec))
+ return false;
+ }
+
+ return true;
+}
+
+/* Return TRUE if loop closed PHI nodes can be analyzed successfully. */
+
+bool
+loop_cand::analyze_lcssa_phis (void)
+{
+ edge e = single_exit (loop);
+ gphi_iterator gsi;
+
+ for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi))
+ {
+ gphi *phi = gsi.phi ();
+
+ if (unsupported_phi_node (phi))
+ return false;
+
+ if (virtual_operand_p (PHI_RESULT (phi)))
+ continue;
+
+ /* TODO: We only support lcssa phi for reduction for now. */
+ if (!find_reduction_by_stmt (phi))
+ return false;
+ }
+
+ return true;
+}
+
+/* Find all stmts on which CONSUMER depends in basic block BB, put the
+ result stmts in sequence STMTS. */
+
+static void
+find_deps_in_bb_for_stmt (gimple_seq *stmts, basic_block bb, gimple *consumer)
+{
+ auto_vec<gimple *> worklist;
+ use_operand_p use_p;
+ ssa_op_iter iter;
+ gimple *stmt;
+
+ worklist.safe_push (consumer);
+ while (!worklist.is_empty ())
+ {
+ stmt = worklist.pop ();
+ FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
+ {
+ gimple *def_stmt = SSA_NAME_DEF_STMT (USE_FROM_PTR (use_p));
+
+ if (is_a <gphi *> (def_stmt))
+ continue;
+
+ if (gimple_bb (def_stmt) != bb)
+ continue;
+
+ if (gimple_plf (def_stmt, GF_PLF_1))
+ continue;
+
+ worklist.safe_push (def_stmt);
+ }
+ gimple_set_plf (stmt, GF_PLF_1, true);
+ }
+ for (gimple_stmt_iterator gsi = gsi_start_bb_nondebug (bb); !gsi_end_p (gsi);)
+ {
+ stmt = gsi_stmt (gsi);
+ if (stmt == consumer)
+ break;
+ if (gimple_plf (stmt, GF_PLF_1))
+ {
+ gsi_remove (&gsi, false, false);
+ gimple_seq_add_stmt (stmts, stmt);
+ }
+ else
+ gsi_next_nondebug (&gsi);
+ }
+}
+
+/* Given inner loop with simple reduction as below:
+
+ for (i = 0; i < N; i++)
+ for (j = 0; j < N; j++)
+ {
+ int red = c[i][j]; // producer
+ for (k = 0; k < N; k++)
+ red += a[i][k] * b[k][j];
+
+ c[i][j] = red; // consumer
+ }
+
+ This function undoes the reduction and generates the loop nest below:
+
+ for (i = 0; i < N; i++)
+ for (j = 0; j < N; j++)
+ {
+ for (k = 0; k < N; k++)
+ c[i][j] += a[i][k] * b[k][j];
+ }
+
+ This basically reverts the transformation done by LIM or PRE. */
+
+void
+loop_cand::undo_simple_reduction (reduction_p re)
+{
+ gimple *phi = SSA_NAME_DEF_STMT (re->var);
+ gimple_stmt_iterator from, gsi = gsi_after_labels (loop->header);
+
+ if (re->producer != NULL)
+ {
+ /* Move producer stmt into inner loop. */
+ if (gimple_vuse (re->producer))
+ gimple_set_vuse (re->producer, NULL_TREE);
+ reset_debug_uses (re->producer);
+ gimple_assign_set_lhs (re->producer, re->var);
+ update_stmt (re->producer);
+ from = gsi_for_stmt (re->producer);
+ gsi_remove (&from, false, false);
+ gsi_insert_before (&gsi, re->producer, GSI_SAME_STMT);
+
+ /* Replace all uses of init in loop. */
+ gcc_assert (TREE_CODE (re->init) == SSA_NAME);
+ gimple *stmt;
+ use_operand_p use_p;
+ imm_use_iterator iterator;
+ FOR_EACH_IMM_USE_STMT (stmt, iterator, re->init)
+ {
+ FOR_EACH_IMM_USE_ON_STMT (use_p, iterator)
+ SET_USE (use_p, re->var);
+
+ update_stmt (stmt);
+ }
+ }
+ else
+ {
+ /* Transform simple reduction of below form:
+
+ init = 0;
+ loop:
+ var = phi<init, next>
+ next = var op ...
+ reduc_sum = phi<next>
+ MEM_REF[...] = reduc_sum
+
+ into:
+
+ init = 0;
+ loop:
+ tmp = MEM_REF[...]
+ var = !first_iteration ? tmp : init
+ next = var op ...
+ MEM_REF[...] = next
+ reduc_sum = phi<next>
+
+ Note constant init value is used in the first iteration. */
+
+ /* Find all stmts on which the consumer depends, because we generate a
+ new load stmt to TMP for it. */
+ gimple_seq stmts = NULL;
+ find_deps_in_bb_for_stmt (&stmts, gimple_bb (re->consumer),
+ re->consumer);
+
+ /* Generate stmt loading from mem reference to TMP variable. */
+ tree tmp = copy_ssa_name (re->var);
+ gimple *stmt = gimple_build_assign (tmp, re->init_ref);
+ gimple_seq_add_stmt (&stmts, stmt);
+
+ /* Generate cond stmt so that constant init value is used in the first
+ iteration. */
+ induction_p iv = inductions[0];
+ tree cond = fold_build2 (NE_EXPR, boolean_type_node, iv->var, iv->init);
+ stmt = gimple_build_assign (re->var, COND_EXPR, cond, tmp, re->init);
+ gimple_seq_add_stmt (&stmts, stmt);
+
+ gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
+ }
+
+ /* Delete loop header PHI node of reduction. */
+ gsi = gsi_for_stmt (phi);
+ gsi_remove (&gsi, true);
+
+ /* Move consumer stmt into inner loop, just after the definition of NEXT. */
+ if (gimple_vdef (re->consumer))
+ gimple_set_vdef (re->consumer, NULL_TREE);
+ if (gimple_vuse (re->consumer))
+ gimple_set_vuse (re->consumer, NULL_TREE);
+ gimple_assign_set_rhs1 (re->consumer, re->next);
+ update_stmt (re->consumer);
+ from = gsi_for_stmt (re->consumer);
+ gsi = gsi_for_stmt (SSA_NAME_DEF_STMT (re->next));
+ gsi_remove (&from, false, false);
+ gsi_insert_after (&gsi, re->consumer, GSI_NEW_STMT);
+
+ /* Delete loop closed PHI node of reduction. */
+ gsi = gsi_for_stmt (re->lcssa_phi);
+ reset_debug_uses (re->lcssa_phi);
+ gsi_remove (&gsi, true);
+}
+
+/* Eliminate dead code after loop interchange. */
+
+void
+loop_cand::eliminate_dead_code (void)
+{
+ /* Process all basic blocks that belong directly to LOOP. */
+ for (unsigned i = 0; i < loop->num_nodes; i++)
+ {
+ basic_block bb = bbs[i];
+
+ /* Skip basic blocks of inner loops. */
+ if (bb->loop_father != loop)
+ continue;
+
+ for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
+ {
+ tree lhs;
+ gimple *stmt = gsi_stmt (gsi);
+
+ /* Since copy propagation is done during interchange, we can simply
+ check for zero uses of the LHS var and eliminate the stmt. */
+ if (is_gimple_assign (stmt)
+ && !gimple_vuse (stmt)
+ && !gimple_has_volatile_ops (stmt)
+ && !gimple_has_side_effects (stmt)
+ && (lhs = gimple_assign_lhs (stmt)) != NULL_TREE
+ && TREE_CODE (lhs) == SSA_NAME
+ && has_zero_uses (lhs))
+ gsi_remove (&gsi, true);
+ else
+ gsi_next (&gsi);
+ }
+ }
+}
+
+/* Free DATAREFS and their auxiliary memory. */
+
+static void
+free_data_refs_with_aux (vec<data_reference_p> datarefs)
+{
+ data_reference_p dr;
+ for (unsigned i = 0; datarefs.iterate (i, &dr); ++i)
+ if (dr->aux != NULL)
+ {
+ DR_ACCESS_STRIDE (dr)->release ();
+ free (dr->aux);
+ }
+
+ free_data_refs (datarefs);
+}
+
+/* Class for loop interchange transformation. */
+
+class tree_loop_interchange
+{
+public:
+ tree_loop_interchange (vec<loop_p> loop_nest,
+ vec<data_reference_p> datarefs, vec<ddr_p> ddrs)
+ : loop_nest (loop_nest), datarefs (datarefs),
+ ddrs (ddrs), niters_iv_var (NULL_TREE) { }
+ ~tree_loop_interchange () {
+ free_dependence_relations (ddrs);
+ free_data_refs_with_aux (datarefs);
+ loop_nest.release ();
+ }
+ bool interchange ();
+
+private:
+ void update_data_refs (unsigned, unsigned);
+ void update_data_deps (unsigned, unsigned);
+ bool valid_data_dependences (unsigned, unsigned);
+ bool can_interchange_loops (loop_cand &, loop_cand &);
+ void interchange_loops (loop_cand &, loop_cand &);
+ void interchange_reductions (loop_cand &, loop_cand &);
+ void interchange_inductions (loop_cand &, loop_cand &);
+ void map_inductions_to_loop (loop_cand &, loop_cand &);
+ void move_code_to_inner_loop (struct loop *, struct loop *, basic_block *);
+
+ /* Vector of the loop nest. */
+ vec<struct loop *> loop_nest;
+ /* Vector of data references in loop nest. */
+ vec<data_reference_p> datarefs;
+ /* Vector of data dependence relations in loop nest. */
+ vec<ddr_p> ddrs;
+
+ /* We create a new IV that is only used in the loop's exit condition
+ check. In case of a 3-level loop nest interchange, when we interchange
+ the innermost two loops, the new IV created in the middle-level loop
+ does not need to be preserved when interchanging the outermost two
+ loops later. We record the IV so that it can be skipped. */
+ tree niters_iv_var;
+};
+
+/* Update data refs' access stride after interchanging loops. I_IDX/O_IDX
+ give the indices of the interchanged loops in the loop nest. */
+
+void
+tree_loop_interchange::update_data_refs (unsigned i_idx, unsigned o_idx)
+{
+ struct data_reference *dr;
+ for (unsigned i = 0; datarefs.iterate (i, &dr); ++i)
+ {
+ vec<tree> *stride = DR_ACCESS_STRIDE (dr);
+ gcc_assert (stride->length () > i_idx);
+ std::swap ((*stride)[i_idx], (*stride)[o_idx]);
+ }
+}
+
+/* Update data dependence relations after interchanging loops. I_IDX/O_IDX
+ give the indices of the interchanged loops in the loop nest; they are
+ used to access DIST_MATRIX. */
+
+void
+tree_loop_interchange::update_data_deps (unsigned i_idx, unsigned o_idx)
+{
+ struct data_dependence_relation *ddr;
+
+ for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i)
+ {
+ /* Skip no-dependence case. */
+ if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
+ continue;
+
+ for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); ++j)
+ {
+ lambda_vector dist_vect = DDR_DIST_VECT (ddr, j);
+ std::swap (dist_vect[i_idx], dist_vect[o_idx]);
+ }
+ }
+}
+
+/* Check data dependence relations and return TRUE if it's valid to
+ interchange the two loops specified by I_IDX/O_IDX. Theoretically,
+ interchanging the two loops is valid only if no dist vector, after
+ interchanging, has '>' as the leftmost non-'=' direction. For example,
+ dist vector <1, -1> becomes <-1, 1> after interchange and must be
+ rejected. In practice, this function is conservative and also rejects
+ some valid cases. */
+
+bool
+tree_loop_interchange::valid_data_dependences (unsigned i_idx, unsigned o_idx)
+{
+ struct data_dependence_relation *ddr;
+
+ for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i)
+ {
+ /* Skip no-dependence case. */
+ if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
+ continue;
+
+ for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); ++j)
+ {
+ lambda_vector dist_vect = DDR_DIST_VECT (ddr, j);
+ unsigned level = dependence_level (dist_vect, loop_nest.length ());
+
+ /* Skip if there is no carried dependence. */
+ if (level == 0)
+ continue;
+
+ level--;
+
+ /* Skip if the dependence is not carried by any loop in the range
+ [oloop, iloop] being interchanged. */
+ if (level < o_idx || level > i_idx)
+ continue;
+
+ /* Be conservative, skip the case if the direction at either the
+ i_idx or o_idx level is not '=' or '<'. */
+ if (dist_vect[i_idx] < 0 || dist_vect[o_idx] < 0)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/* Return true if ILOOP and OLOOP can be interchanged in terms of code
+ transformation. */
+
+bool
+tree_loop_interchange::can_interchange_loops (loop_cand &iloop,
+ loop_cand &oloop)
+{
+ return (iloop.analyze_carried_vars (NULL)
+ && iloop.analyze_lcssa_phis ()
+ && oloop.analyze_carried_vars (&iloop)
+ && oloop.analyze_lcssa_phis ()
+ && iloop.can_interchange_p (NULL)
+ && oloop.can_interchange_p (&iloop));
+}
+
+/* Interchange the niters information of ILOOP and OLOOP, and reset any
+ other niters estimate information for now. */
+
+static inline void
+interchange_nb_iterations (struct loop *iloop, struct loop *oloop)
+{
+ tree nb_iterations = oloop->nb_iterations;
+
+ oloop->any_upper_bound = false;
+ oloop->any_likely_upper_bound = false;
+ free_numbers_of_iterations_estimates (oloop);
+
+ oloop->nb_iterations = iloop->nb_iterations;
+
+ iloop->any_upper_bound = false;
+ iloop->any_likely_upper_bound = false;
+ free_numbers_of_iterations_estimates (iloop);
+
+ iloop->nb_iterations = nb_iterations;
+}
+
+/* Interchange two loops specified by ILOOP and OLOOP. */
+
+void
+tree_loop_interchange::interchange_loops (loop_cand &iloop, loop_cand &oloop)
+{
+ interchange_reductions (iloop, oloop);
+ interchange_inductions (iloop, oloop);
+
+ interchange_nb_iterations (iloop.loop, oloop.loop);
+
+ iloop.eliminate_dead_code ();
+}
+
+/* If STMT is a debug stmt that uses VAR, set the pass-local flag if VAR is
+ its only SSA use; otherwise, remove the stmt. */
+
+static inline void
+mark_or_remove_dbg_stmt (gimple *stmt, tree var)
+{
+ if (!is_gimple_debug (stmt))
+ return;
+
+ tree t = SINGLE_SSA_TREE_OPERAND (stmt, SSA_OP_USE);
+ if (t == var)
+ gimple_set_plf (stmt, GF_PLF_1, true);
+ else
+ {
+ gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+ gsi_remove (&gsi, true);
+ }
+}
+
+/* Interchange transformation for reductions of ILOOP and OLOOP. We only
+ support two types of reductions for now:
+ 1) simple reduction of the inner loop.
+ 2) double reduction of the loop nest.
+ For a simple reduction, we simply undo it by moving the producer/consumer
+ into the inner loop; for a double reduction, we don't need to do anything. */
+
+void
+tree_loop_interchange::interchange_reductions (loop_cand &iloop,
+ loop_cand &oloop)
+{
+ unsigned i;
+ reduction_p re;
+
+ for (i = 0; iloop.reductions.iterate (i, &re); ++i)
+ {
+ if (re->type == DOUBLE_RTYPE)
+ continue;
+
+ /* Undo simple reductions. */
+ iloop.undo_simple_reduction (re);
+ }
+
+ for (i = 0; oloop.reductions.iterate (i, &re); ++i)
+ {
+ if (re->type != DOUBLE_RTYPE)
+ gcc_unreachable ();
+
+ use_operand_p use_p;
+ imm_use_iterator iterator;
+ FOR_EACH_IMM_USE_FAST (use_p, iterator, re->var)
+ mark_or_remove_dbg_stmt (USE_STMT (use_p), re->var);
+ FOR_EACH_IMM_USE_FAST (use_p, iterator, re->next)
+ mark_or_remove_dbg_stmt (USE_STMT (use_p), re->next);
+ if (TREE_CODE (re->init) == SSA_NAME)
+ {
+ FOR_EACH_IMM_USE_FAST (use_p, iterator, re->init)
+ mark_or_remove_dbg_stmt (USE_STMT (use_p), re->init);
+ }
+ }
+}
+
+/* Interchange transformation for inductions of ILOOP and OLOOP. */
+
+void
+tree_loop_interchange::interchange_inductions (loop_cand &iloop,
+ loop_cand &oloop)
+{
+ /* Map outer loop's IV to inner loop. */
+ map_inductions_to_loop (oloop, iloop);
+ /* Map inner loop's IV to outer loop. */
+ map_inductions_to_loop (iloop, oloop);
+
+ /* Create canonical IVs for both loops. Note the canonical IV for the
+ outer/inner loop actually iterates on the inner/outer loop's niters.
+ Also record the new IV created for the outer loop so that it can be
+ skipped in a later loop interchange. */
+ create_canonical_iv (oloop.loop, single_exit (oloop.loop), iloop.niters,
+ &niters_iv_var);
+ create_canonical_iv (iloop.loop, single_exit (iloop.loop), oloop.niters);
+}
+
+/* Map induction variables of SRC loop to TGT loop. The function first
+ creates the same IV of the SRC loop in the TGT loop, then deletes the
+ original IV in SRC and replaces its uses with the newly created IV.
+ For example, the loop nest:
+
+ for (i = 0; i < N; i++)
+ for (j = 0; j < M; j++)
+ {
+ //use of i;
+ //use of j;
+ }
+
+ will be transformed into:
+
+ for (jj = 0; jj < M; jj++)
+ for (ii = 0; ii < N; ii++)
+ {
+ //use of ii;
+ //use of jj;
+ }
+
+ after loop interchange. */
+
+void
+tree_loop_interchange::map_inductions_to_loop (loop_cand &src, loop_cand &tgt)
+{
+ induction_p iv;
+ edge e = single_exit (tgt.loop);
+ gimple_stmt_iterator incr_pos = gsi_last_bb (e->src), gsi;
+ bool move_code_p = flow_loop_nested_p (src.loop, tgt.loop);
+
+ /* Move src's code to tgt loop. This is necessary when src is the outer
+ loop and tgt is the inner loop. */
+ if (move_code_p)
+ move_code_to_inner_loop (src.loop, tgt.loop, src.bbs);
+
+ /* Map source loop's IV to target loop. */
+ for (unsigned i = 0; src.inductions.iterate (i, &iv); ++i)
+ {
+ gimple *stmt = SSA_NAME_DEF_STMT (iv->var);
+ gcc_assert (is_a <gphi *> (stmt));
+
+ /* Remove the PHI node defining the original IV in the source loop. */
+ gsi = gsi_for_stmt (stmt);
+ gsi_remove (&gsi, true);
+
+ /* No need to map the IV to the target loop if it was created by a
+ previous loop interchange. */
+ if (niters_iv_var == iv->var)
+ {
+ gcc_assert (!move_code_p);
+ continue;
+ }
+
+ /* Map the IV by creating the same one in target loop. */
+ tree base = unshare_expr (iv->base), step = unshare_expr (iv->step);
+ create_iv (base, step, SSA_NAME_VAR (iv->var),
+ tgt.loop, &incr_pos, false, &iv->mapped_var, NULL);
+
+ /* Replace uses of the original IV var with newly created IV var. */
+ use_operand_p imm_use_p;
+ imm_use_iterator iterator;
+ FOR_EACH_IMM_USE_STMT (stmt, iterator, iv->var)
+ {
+ FOR_EACH_IMM_USE_ON_STMT (imm_use_p, iterator)
+ SET_USE (imm_use_p, iv->mapped_var);
+
+ update_stmt (stmt);
+ }
+ }
+}
+
+/* Compute the insert position in the inner loop when moving code from the
+ outer loop to the inner one. */
+
+static inline void
+insert_pos_at_inner_loop (struct loop *outer, struct loop *inner,
+ basic_block bb, gimple_stmt_iterator *pos)
+{
+ /* Move code from header/latch to header/latch. */
+ if (bb == outer->header)
+ *pos = gsi_after_labels (inner->header);
+ else if (bb == outer->latch)
+ *pos = gsi_after_labels (inner->latch);
+ else
+ {
+ /* Otherwise, simply move to exit->src. */
+ edge e = single_exit (inner);
+ *pos = gsi_last_bb (e->src);
+ }
+}
+
+/* Move stmts of outer loop to inner loop. */
+
+void
+tree_loop_interchange::move_code_to_inner_loop (struct loop *outer,
+ struct loop *inner,
+ basic_block *outer_bbs)
+{
+ unsigned int i;
+ edge oloop_exit = single_exit (outer);
+ gimple_stmt_iterator insert_pos, gsi;
+
+ for (i = 0; i < outer->num_nodes; i++)
+ {
+ basic_block bb = outer_bbs[i];
+
+ /* Skip basic blocks of inner loop. */
+ if (flow_bb_inside_loop_p (inner, bb))
+ continue;
+
+ insert_pos_at_inner_loop (outer, inner, bb, &insert_pos);
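+ /* Move each stmt, skipping labels, the exit condition stmt of the
+ outer loop and stmts already marked with the pass-local flag. */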
+ for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
+ {
+ gimple *stmt = gsi_stmt (gsi);
+ if (gimple_code (stmt) == GIMPLE_LABEL)
+ {
+ gsi_next (&gsi);
+ continue;
+ }
+
+ if (oloop_exit->src == bb
+ && stmt == gsi_stmt (gsi_last_bb (oloop_exit->src)))
+ {
+ gsi_next (&gsi);
+ continue;
+ }
+
+ if (gimple_plf (stmt, GF_PLF_1))
+ {
+ gsi_next (&gsi);
+ continue;
+ }
+
+ if (gimple_vuse (stmt))
+ gimple_set_vuse (stmt, NULL_TREE);
+ if (gimple_vdef (stmt))
+ gimple_set_vdef (stmt, NULL_TREE);
+
+ gsi_remove (&gsi, false, false);
+ gsi_insert_before (&insert_pos, stmt, GSI_SAME_STMT);
+ }
+ }
+}
+
+/* Estimate and return the value of EXPR by replacing variables in EXPR
+ with CST_TREE and simplifying. */
+
+static tree
+estimate_val_by_simplify_replace (tree expr, tree cst_tree)
+{
+ unsigned i, n;
+ tree ret = NULL_TREE, e, se;
+
+ if (!expr)
+ return NULL_TREE;
+
+ /* Do not bother to replace constants. */
+ if (CONSTANT_CLASS_P (expr))
+ return expr;
+
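+ /* A bare variable (e.g. an SSA name) is replaced by CST_TREE directly. */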
+ if (!EXPR_P (expr))
+ return cst_tree;
+
+ n = TREE_OPERAND_LENGTH (expr);
+ for (i = 0; i < n; i++)
+ {
+ e = TREE_OPERAND (expr, i);
+ se = estimate_val_by_simplify_replace (e, cst_tree);
+ if (e == se)
+ continue;
+
+ if (!ret)
+ ret = copy_node (expr);
+
+ TREE_OPERAND (ret, i) = se;
+ }
+
+ return (ret ? fold (ret) : expr);
+}
+
+/* Given data reference DR in LOOP_NEST, the function computes DR's access
+ stride at each loop level from the innermost LOOP outwards. On success,
+ it saves the access stride at each loop level in a vector pointed to by
+ DR->aux. For example:
+
+ int arr[100][100][100];
+ for (i = 0; i < 100; i++) ;(DR->aux)strides[0] = 40000
+ for (j = 100; j > 0; j--) ;(DR->aux)strides[1] = 400
+ for (k = 0; k < 100; k++) ;(DR->aux)strides[2] = 4
+ arr[i][j - 1][k] = 0; */
+
+static void
+compute_access_stride (struct loop *loop_nest, struct loop *loop,
+ data_reference_p dr)
+{
+ vec<tree> *strides = new vec<tree> ();
+ basic_block bb = gimple_bb (DR_STMT (dr));
+
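+ /* DR's stmt may be in an outer loop of the nest; record zero strides
+ for the inner levels in which the reference does not exist. */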
+ while (!flow_bb_inside_loop_p (loop, bb))
+ {
+ strides->safe_push (build_int_cst (sizetype, 0));
+ loop = loop_outer (loop);
+ }
+
+ gcc_assert (loop == bb->loop_father);
+
+ tree ref = DR_REF (dr);
+ tree scev_base = build_fold_addr_expr (ref);
+ tree access_size = TYPE_SIZE_UNIT (TREE_TYPE (ref));
+ tree niters = build_int_cst (sizetype, AVG_LOOP_NITER);
+ access_size = fold_build2 (MULT_EXPR, sizetype, niters, access_size);
+
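+ /* Peel the address SCEV one loop level at a time, from the innermost
+ loop outwards, recording each level's step as the access stride. */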
+ do {
+ tree scev_fn = analyze_scalar_evolution (loop, scev_base);
+ if (chrec_contains_undetermined (scev_fn)
+ || chrec_contains_symbols_defined_in_loop (scev_fn, loop->num))
+ break;
+
+ if (TREE_CODE (scev_fn) != POLYNOMIAL_CHREC)
+ {
+ scev_base = scev_fn;
+ strides->safe_push (build_int_cst (sizetype, 0));
+ continue;
+ }
+
+ scev_base = CHREC_LEFT (scev_fn);
+ if (tree_contains_chrecs (scev_base, NULL))
+ break;
+
+ tree scev_step = fold_convert (sizetype, CHREC_RIGHT (scev_fn));
+
+ enum ev_direction scev_dir = scev_direction (scev_fn);
+ /* Estimate if step isn't constant. */
+ if (scev_dir == EV_DIR_UNKNOWN)
+ {
+ scev_step = estimate_val_by_simplify_replace (scev_step, niters);
+ if (TREE_CODE (scev_step) != INTEGER_CST
+ || tree_int_cst_lt (scev_step, access_size))
+ scev_step = access_size;
+ }
+ /* Compute absolute value of scev step. */
+ else if (scev_dir == EV_DIR_DECREASES)
+ scev_step = fold_build1 (NEGATE_EXPR, sizetype, scev_step);
+
+ strides->safe_push (scev_step);
+ } while (loop != loop_nest && (loop = loop_outer (loop)) != NULL);
+
+ dr->aux = strides;
+}
+
+/* Given loop nest LOOP_NEST with innermost LOOP, the function computes
+ access strides with respect to each loop level for all data refs in
+ DATAREFS, from the inner loop to the outer loop. On success, it returns
+ the outermost loop for which access strides can be computed successfully
+ for all data references. If access strides cannot be computed for at
+ least two loop levels of some data reference, it returns NULL. */
+
+static struct loop *
+compute_access_strides (struct loop *loop_nest, struct loop *loop,
+ vec<data_reference_p> datarefs)
+{
+ unsigned i, j, num_loops = (unsigned) -1;
+ data_reference_p dr;
+ vec<tree> *stride;
+
+ for (i = 0; datarefs.iterate (i, &dr); ++i)
+ {
+ compute_access_stride (loop_nest, loop, dr);
+ stride = DR_ACCESS_STRIDE (dr);
+ if (stride->length () < num_loops)
+ {
+ num_loops = stride->length ();
+ if (num_loops < 2)
+ return NULL;
+ }
+ }
+
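+ /* Trim all stride vectors to the same (minimum) number of levels and
+ reverse them so strides are indexed from outer loop to inner loop. */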
+ for (i = 0; datarefs.iterate (i, &dr); ++i)
+ {
+ stride = DR_ACCESS_STRIDE (dr);
+ if (stride->length () > num_loops)
+ stride->truncate (num_loops);
+
+ for (j = 0; j < (num_loops >> 1); ++j)
+ std::swap ((*stride)[j], (*stride)[num_loops - j - 1]);
+ }
+
+ while (--num_loops > 0)
+ loop = loop_outer (loop);
+
+ gcc_assert (loop_nest == loop || flow_loop_nested_p (loop_nest, loop));
+ return loop;
+}
+
+/* Count and return the number of loops in LOOP_NEST. */
+
+unsigned int
+num_loops_in_loop_nest (struct loop *loop_nest)
+{
+ unsigned num_loops;
+ for (num_loops = 0; loop_nest; num_loops++, loop_nest = loop_nest->inner)
+ ;
+ return num_loops;
+}
+
+/* Prune access strides for data references in DATAREFS by removing strides
+ of loops that are not in the current LOOP_NEST. */
+
+static void
+prune_access_strides_not_in_loop (struct loop *loop_nest,
+ vec<data_reference_p> datarefs)
+{
+ data_reference_p dr;
+ unsigned num_loops = num_loops_in_loop_nest (loop_nest);
+ gcc_assert (num_loops > 1);
+
+ /* Block-remove strides of loops that are not in the current loop nest. */
+ for (unsigned i = 0; datarefs.iterate (i, &dr); ++i)
+ {
+ vec<tree> *stride = DR_ACCESS_STRIDE (dr);
+ if (stride->length () > num_loops)
+ stride->block_remove (0, stride->length () - num_loops);
+ }
+}
+
+/* Dump access strides for all DATAREFS. */
+
+static void
+dump_access_strides (vec<data_reference_p> datarefs)
+{
+ data_reference_p dr;
+ fprintf (dump_file, "Access Strides for DRs:\n");
+ for (unsigned i = 0; datarefs.iterate (i, &dr); ++i)
+ {
+ fprintf (dump_file, " ");
+ print_generic_expr (dump_file, DR_REF (dr), TDF_SLIM);
+ fprintf (dump_file, ":\t\t<");
+
+ vec<tree> *stride = DR_ACCESS_STRIDE (dr);
+ unsigned num_loops = stride->length ();
+ for (unsigned j = 0; j < num_loops; ++j)
+ {
+ print_generic_expr (dump_file, (*stride)[j], TDF_SLIM);
+ fprintf (dump_file, "%s", (j < num_loops - 1) ? ",\t" : ">\n");
+ }
+ }
+}
+
+/* Return true if it's profitable to interchange two loops whose indices
+ in the whole loop nest vector are I_IDX/O_IDX respectively. The function
+ computes and compares three types of information from all DATAREFS:
+ 1) Access stride for loop I_IDX and O_IDX.
+ 2) Number of invariant memory references with respect to I_IDX before
+ and after loop interchange.
+ 3) Flags indicating if all memory references access sequential memory
+ in the inner loop, before and after loop interchange.
+ If INNERMOST_LOOPS_P is true, the two loops for interchanging are the two
+ innermost loops in the loop nest. This function also dumps information
+ if DUMP_INFO_P is true. */
+
+static bool
+should_interchange_loops (unsigned i_idx, unsigned o_idx,
+ vec<data_reference_p> datarefs,
+ bool innermost_loops_p, bool dump_info_p = true)
+{
+ unsigned HOST_WIDE_INT ratio;
+ unsigned i, j, num_old_inv_drs = 0, num_new_inv_drs = 0;
+ struct data_reference *dr;
+ bool all_seq_dr_before_p = true, all_seq_dr_after_p = true;
+ widest_int iloop_strides = 0, oloop_strides = 0;
+
+ if (dump_info_p && dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "\nData ref strides:\n\tmem_ref:\t\tiloop\toloop\n");
+
+ for (i = 0; datarefs.iterate (i, &dr); ++i)
+ {
+ vec<tree> *stride = DR_ACCESS_STRIDE (dr);
+ tree iloop_stride = (*stride)[i_idx], oloop_stride = (*stride)[o_idx];
+ gcc_assert (TREE_CODE (iloop_stride) == INTEGER_CST);
+ gcc_assert (TREE_CODE (oloop_stride) == INTEGER_CST);
+
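+ /* Check whether the reference also strides in some loop nested inside
+ the inner loop under consideration. */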
+ bool subloop_stride_p = false;
+ for (j = i_idx + 1; j < stride->length (); ++j)
+ if (integer_nonzerop ((*stride)[j]))
+ {
+ subloop_stride_p = true;
+ break;
+ }
+
+ if (integer_nonzerop (iloop_stride))
+ iloop_strides = wi::add (iloop_strides, wi::to_widest (iloop_stride));
+ else if (!subloop_stride_p)
+ num_old_inv_drs++;
+
+ if (integer_nonzerop (oloop_stride))
+ oloop_strides = wi::add (oloop_strides, wi::to_widest (oloop_stride));
+ else if (!subloop_stride_p)
+ num_new_inv_drs++;
+
+ /* Track if all data references access sequential memory before and
+ after loop interchange. */
+ if (subloop_stride_p)
+ {
+ /* The data ref can't be sequential if its address varies with respect
+ to any subloop of the inner loop. */
+ all_seq_dr_before_p = false;
+ all_seq_dr_after_p = false;
+ }
+ else
+ {
+ tree access_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
+ /* Consider a data reference as a sequential access if its stride is
+ not larger than its access size. Note an invariant reference is
+ considered sequential too. */
+ if (all_seq_dr_before_p
+ && integer_nonzerop (iloop_stride)
+ && wi::gtu_p (wi::to_wide (iloop_stride),
+ wi::to_wide (access_size)))
+ all_seq_dr_before_p = false;
+ if (all_seq_dr_after_p
+ && integer_nonzerop (oloop_stride)
+ && wi::gtu_p (wi::to_wide (oloop_stride),
+ wi::to_wide (access_size)))
+ all_seq_dr_after_p = false;
+ }
+ }
+
+ if (dump_info_p && dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "\toverall:\t\t");
+ print_decu (iloop_strides, dump_file);
+ fprintf (dump_file, "\t");
+ print_decu (oloop_strides, dump_file);
+ fprintf (dump_file, "\n");
+
+ fprintf (dump_file, "Invariant data ref: before(%d), after(%d)\n",
+ num_old_inv_drs, num_new_inv_drs);
+ fprintf (dump_file, "Consecutive stride: before(%s), after(%s)\n",
+ all_seq_dr_before_p ? "true" : "false",
+ all_seq_dr_after_p ? "true" : "false");
+ }
+
+ /* We use different stride comparison ratios depending on whether we are
+ interchanging the innermost two loops or not. The idea is to be
+ conservative when interchanging the innermost loops. */
+ ratio = innermost_loops_p ? INNER_STRIDE_RATIO : OUTER_STRIDE_RATIO;
+ /* Do interchange if it gives better data locality behavior. */
+ if (wi::gtu_p (iloop_strides, wi::mul (oloop_strides, ratio)))
+ return true;
+ if (wi::gtu_p (iloop_strides, oloop_strides))
+ {
+ /* Or it creates more invariant memory references. */
+ if ((!all_seq_dr_before_p || all_seq_dr_after_p)
+ && num_new_inv_drs > num_old_inv_drs)
+ return true;
+ /* Or it makes all memory references sequential. */
+ if (num_new_inv_drs >= num_old_inv_drs
+ && !all_seq_dr_before_p && all_seq_dr_after_p)
+ return true;
+ }
+
+ return false;
+}
+
+/* Try to interchange inner loop of a loop nest to outer level. */
+
+bool
+tree_loop_interchange::interchange ()
+{
+ bool changed_p = false;
+ /* In each iteration we try to interchange I-th loop with (I+1)-th loop.
+ The overall effect is to push inner loop to outermost level in whole
+ loop nest. */
+ for (unsigned i = loop_nest.length (); i > 1; --i)
+ {
+ unsigned i_idx = i - 1, o_idx = i - 2;
+
+ /* Check validity for loop interchange. */
+ if (!valid_data_dependences (i_idx, o_idx))
+ break;
+
+ loop_cand iloop (i_idx, loop_nest[i_idx], loop_nest[o_idx]);
+ loop_cand oloop (o_idx, loop_nest[o_idx], loop_nest[o_idx]);
+
+ /* Check if we can do transformation for loop interchange. */
+ if (!can_interchange_loops (iloop, oloop))
+ break;
+
+ /* Check profitability for loop interchange. */
+ if (should_interchange_loops (i_idx, o_idx, datarefs,
+ iloop.loop->inner == NULL))
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ "Loop_pair<outer:%d, inner:%d> is interchanged\n\n",
+ oloop.loop->num, iloop.loop->num);
+
+ interchange_loops (iloop, oloop);
+ /* No need to update information if there is no further loop
+ interchange. */
+ if (o_idx > 0)
+ {
+ update_data_refs (i_idx, o_idx);
+ update_data_deps (i_idx, o_idx);
+ }
+ changed_p = true;
+ }
+ else
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ "Loop_pair<outer:%d, inner:%d> is not interchanged\n\n",
+ oloop.loop->num, iloop.loop->num);
+ }
+ }
+
+ return changed_p;
+}
+
+
+/* Loop interchange pass. */
+
+namespace {
+
+const pass_data pass_data_linterchange =
+{
+ GIMPLE_PASS, /* type */
+ "linterchange", /* name */
+ OPTGROUP_LOOP, /* optinfo_flags */
+ TV_LINTERCHANGE, /* tv_id */
+ PROP_cfg, /* properties_required */
+ 0, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ 0, /* todo_flags_finish */
+};
+
+class pass_linterchange : public gimple_opt_pass
+{
+public:
+ pass_linterchange (gcc::context *ctxt)
+ : gimple_opt_pass (pass_data_linterchange, ctxt)
+ {}
+
+ /* opt_pass methods: */
+ opt_pass * clone () { return new pass_linterchange (m_ctxt); }
+ virtual bool gate (function *) { return flag_tree_loop_interchange; }
+ virtual unsigned int execute (function *);
+
+}; // class pass_linterchange
+
+
+/* Return true if LOOP has proper form for interchange. */
+
+static bool
+proper_loop_form_for_interchange (struct loop *loop)
+{
+ edge e0, e1, exit;
+
+ /* For now, don't interchange if the loop carries information we don't support. */
+ if (loop->safelen > 0
+ || loop->constraints != 0
+ || loop->can_be_parallel
+ || loop->dont_vectorize
+ || loop->force_vectorize
+ || loop->in_oacc_kernels_region
+ || loop->orig_loop_num != 0
+ || loop->simduid != NULL_TREE)
+ return false;
+
+ /* Don't interchange if the outer loop has basic blocks other than the
+ header, exit->src and latch. In general, only a loop nest of the form:
+ header<---+
+ | |
+ v |
+ INNER_LOOP |
+ | |
+ v |
+ exit--->latch
+ is supported. */
+ if (loop->inner != NULL
+ && loop->num_nodes != loop->inner->num_nodes + 3)
+ return false;
+
+ if ((exit = single_dom_exit (loop)) == NULL)
+ return false;
+
+ /* Check control flow on loop header/exit blocks. */
+ if (loop->header == exit->src
+ && (EDGE_COUNT (loop->header->preds) != 2
+ || EDGE_COUNT (loop->header->succs) != 2))
+ return false;
+ else if (loop->header != exit->src
+ && (EDGE_COUNT (loop->header->preds) != 2
+ || !single_succ_p (loop->header)
+ || unsupported_edge (single_succ_edge (loop->header))
+ || EDGE_COUNT (exit->src->succs) != 2
+ || !single_pred_p (exit->src)
+ || unsupported_edge (single_pred_edge (exit->src))))
+ return false;
+
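+ /* One predecessor of the loop header must be the latch and the other
+ must come from outside the loop. */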
+ e0 = EDGE_PRED (loop->header, 0);
+ e1 = EDGE_PRED (loop->header, 1);
+ if (unsupported_edge (e0) || unsupported_edge (e1)
+ || (e0->src != loop->latch && e1->src != loop->latch)
+ || (e0->src->loop_father == loop && e1->src->loop_father == loop))
+ return false;
+
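+ /* One successor of the exit source must be the latch and the other
+ must leave the loop. */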
+ e0 = EDGE_SUCC (exit->src, 0);
+ e1 = EDGE_SUCC (exit->src, 1);
+ if (unsupported_edge (e0) || unsupported_edge (e1)
+ || (e0->dest != loop->latch && e1->dest != loop->latch)
+ || (e0->dest->loop_father == loop && e1->dest->loop_father == loop))
+ return false;
+
+ return true;
+}
+
+/* Return true if any two adjacent loops in loop nest OUTERMOST_LOOP should
+ be interchanged by looking into all DATAREFS. INNERMOST_LOOP is the
+ innermost loop of this loop nest. */
+
+static bool
+should_interchange_loop_nest (struct loop *outermost_loop,
+ struct loop *innermost_loop,
+ vec<data_reference_p> datarefs)
+{
+ unsigned idx = num_loops_in_loop_nest (outermost_loop) - 1;
+ gcc_assert (idx > 0);
+
+ /* Check if any two adjacent loops should be interchanged. */
+ for (struct loop *loop = innermost_loop;
+ loop != outermost_loop;
+ loop = loop_outer (loop), idx--)
+ if (should_interchange_loops (idx, idx - 1, datarefs,
+ loop == innermost_loop, false))
+ return true;
+
+ return false;
+}
+
+/* Given loop nest LOOP_NEST and data references DATAREFS, compute data
+ dependences for loop interchange and store them in DDRS. Note we compute
+ dependences directly rather than calling the generic interface so that we
+ can return instantly on an unknown dependence. */
+
+static bool
+tree_loop_interchange_compute_ddrs (vec<loop_p> loop_nest,
+ vec<data_reference_p> datarefs,
+ vec<ddr_p> *ddrs)
+{
+ struct data_reference *a, *b;
+ struct loop *innermost = loop_nest.last ();
+
+ for (unsigned i = 0; datarefs.iterate (i, &a); ++i)
+ {
+ bool a_outer_p = gimple_bb (DR_STMT (a))->loop_father != innermost;
+ for (unsigned j = i + 1; datarefs.iterate (j, &b); ++j)
+ if (DR_IS_WRITE (a) || DR_IS_WRITE (b))
+ {
+ bool b_outer_p = gimple_bb (DR_STMT (b))->loop_father != innermost;
+ /* Don't support multiple write references in outer loop. */
+ if (a_outer_p && b_outer_p && DR_IS_WRITE (a) && DR_IS_WRITE (b))
+ return false;
+
+ ddr_p ddr = initialize_data_dependence_relation (a, b, loop_nest);
+ ddrs->safe_push (ddr);
+ compute_affine_dependence (ddr, loop_nest[0]);
+
+ /* Give up if ddr is an unknown dependence or the classic direction
+ vector is not available. */
+ if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know
+ || (DDR_ARE_DEPENDENT (ddr) == NULL_TREE
+ && DDR_NUM_DIR_VECTS (ddr) == 0))
+ return false;
+
+ /* If either data reference is in the outer loop of the nest, we require
+ no dependence here because the reference needs to be moved into the
+ inner loop during interchange. */
+ if (a_outer_p && b_outer_p
+ && operand_equal_p (DR_REF (a), DR_REF (b), 0))
+ continue;
+ if (DDR_ARE_DEPENDENT (ddr) != chrec_known
+ && (a_outer_p || b_outer_p))
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/* Prune DATAREFS by removing any data reference not inside of LOOP. */
+
+static inline void
+prune_datarefs_not_in_loop (struct loop *loop, vec<data_reference_p> datarefs)
+{
+ struct data_reference *dr;
+
+ for (unsigned i = 0; datarefs.iterate (i, &dr);)
+ if (flow_bb_inside_loop_p (loop, gimple_bb (DR_STMT (dr))))
+ i++;
+ else
+ {
+ datarefs.ordered_remove (i);
+ if (dr->aux)
+ {
+ DR_ACCESS_STRIDE (dr)->release ();
+ free (dr->aux);
+ }
+ free_data_ref (dr);
+ }
+}
+
+/* Given a loop nest like <OLOOP, ILOOP>, the function strips off outer
+ loops if they form a non-rectangular loop nest. The outermost loop of
+ the remaining rectangular loop nest, or NULL, is returned. */
+
+struct loop *
+prune_non_rectangle_loop_nest (struct loop *iloop, struct loop *oloop)
+{
+ if (!oloop || !iloop || oloop == iloop)
+ return NULL;
+
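+ /* For each loop from ILOOP outwards, check that its niters does not
+ depend on any enclosing loop up to OLOOP; if it does, shrink the nest
+ so that the offending outer loops are excluded. */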
+ struct loop *loop1 = iloop;
+
+ while (loop1 != NULL && flow_loop_nested_p (oloop, loop1))
+ {
+ tree niters = number_of_latch_executions (loop1);
+ struct loop *loop2 = loop_outer (loop1);
+
+ while (loop2 != NULL
+ && (loop2 == oloop || flow_loop_nested_p (oloop, loop2)))
+ {
+ /* Strip off the outermost loop if the nest isn't rectangular. */
+ if (chrec_contains_symbols_defined_in_loop (niters, loop2->num))
+ {
+ oloop = loop2->inner;
+ break;
+ }
+
+ loop2 = loop_outer (loop2);
+ }
+ if (oloop == iloop)
+ return NULL;
+
+ loop1 = loop_outer (loop1);
+ }
+
+ return oloop;
+}
+
+/* Clear the pass-local flag on all stmts in LOOP. */
+
+static void
+clear_stmt_flag (struct loop *loop)
+{
+ basic_block *bbs = get_loop_body (loop);
+
+ for (unsigned i = 0; i < loop->num_nodes; i++)
+ {
+ basic_block bb = bbs[i];
+
+ for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+ !gsi_end_p (gsi); gsi_next (&gsi))
+ gimple_set_plf (gsi_stmt (gsi), GF_PLF_1, false);
+ }
+
+ free (bbs);
+}
+
+/* Return true if the number of executions of any data reference in DATAREFS
+ differs from the niters of its enclosing loop. */
+
+static bool
+dataref_niters_diff_to_loop_header (vec<data_reference_p> datarefs)
+{
+ data_reference_p dr;
+ for (unsigned i = 0; datarefs.iterate (i, &dr); ++i)
+ {
+ basic_block bb = gimple_bb (DR_STMT (dr));
+ struct loop *loop = bb->loop_father;
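+ /* If BB doesn't dominate the loop's single exit, stmts in BB may
+ execute a different number of times than the loop header does. */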
+ if (!dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src, bb))
+ return true;
+ }
+ return false;
+}
+
+/* Given innermost LOOP, return true if a perfect loop nest can be found and
+ data dependences can be computed. On success, record the perfect loop
+ nest in LOOP_NEST; record all data references in DATAREFS and record all
+ data dependence relations in DDRS.
+
+ We do support a restricted form of imperfect loop nest, i.e., a loop nest
+ with a load/store in the outer loop initializing/finalizing a simple
+ reduction of the innermost loop. For such an outer loop reference, we
+ require that it has no dependence with others since it will be moved into
+ the inner loop during interchange. */
+
+static bool
+prepare_perfect_loop_nest (struct loop *loop, vec<loop_p> *loop_nest,
+ vec<data_reference_p> *datarefs, vec<ddr_p> *ddrs)
+{
+ tree niters;
+ struct loop *start_loop = NULL, *innermost_loop = loop;
+
+ /* Find loop nest from the innermost loop. */
+ while (loop->num != 0 && loop->inner == start_loop)
+ {
+ if (!proper_loop_form_for_interchange (loop))
+ break;
+
+ /* Loop must have determined niters. */
+ niters = number_of_latch_executions (loop);
+ if (!niters || chrec_contains_undetermined (niters))
+ break;
+
+ start_loop = loop;
+ /* If this loop has a sibling loop, its parent loop can't be part of a
+ perfect loop nest. */
+ if (loop->next != NULL)
+ break;
+
+ loop = loop_outer (loop);
+ }
+
+ start_loop = prune_non_rectangle_loop_nest (innermost_loop, start_loop);
+
+ if (!start_loop || !start_loop->inner)
+ return false;
+
+ datarefs->create (20);
+ if (find_data_references_in_loop (start_loop, datarefs) == chrec_dont_know
+ /* Check if there is no data reference. */
+ || datarefs->length () == 0
+ /* Check if there are too many data references. */
+ || ((int) datarefs->length ()
+ > PARAM_VALUE (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS))
+ /* Check if there is any data reference in the loop latch. We can't
+ handle loops whose loop header and data references execute different
+ numbers of times. */
+ || dataref_niters_diff_to_loop_header (*datarefs)
+ /* Compute access strides for all data references. */
+ || ((start_loop = compute_access_strides (start_loop, innermost_loop,
+ *datarefs)) == NULL)
+ /* Check if loop nest should be interchanged. */
+ || !should_interchange_loop_nest (start_loop, innermost_loop, *datarefs))
+ {
+ free_data_refs_with_aux (*datarefs);
+ return false;
+ }
+
+ /* Check if data dependences can be computed for loop nest starting from
+ start_loop. */
+ loop = start_loop;
+ loop_nest->create (3);
+ do {
+ ddrs->create (20);
+ loop_nest->truncate (0);
+
+ if (loop != start_loop)
+ prune_datarefs_not_in_loop (start_loop, *datarefs);
+
+ if (find_loop_nest (start_loop, loop_nest)
+ && tree_loop_interchange_compute_ddrs (*loop_nest, *datarefs, ddrs))
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ "\nConsider loop interchange for loop_nest<%d - %d>\n",
+ start_loop->num, innermost_loop->num);
+
+ if (loop != start_loop)
+ prune_access_strides_not_in_loop (start_loop, *datarefs);
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ dump_access_strides (*datarefs);
+
+ clear_stmt_flag (start_loop);
+ return true;
+ }
+
+ free_dependence_relations (*ddrs);
+ /* Try to compute data dependences with the outermost loop stripped. */
+ loop = start_loop;
+ start_loop = start_loop->inner;
+ } while (start_loop && start_loop->inner);
+
+ loop_nest->release ();
+ free_data_refs_with_aux (*datarefs);
+ return false;
+}
+
+/* Main entry for loop interchange pass. */
+
+unsigned int
+pass_linterchange::execute (function *fun)
+{
+ if (number_of_loops (fun) <= 2)
+ return 0;
+
+ bool changed_p = false;
+ struct loop *loop;
+ vec<loop_p> loop_nest;
+ vec<data_reference_p> datarefs;
+ vec<ddr_p> ddrs;
+
+ FOR_EACH_LOOP (loop, LI_ONLY_INNERMOST)
+ if (prepare_perfect_loop_nest (loop, &loop_nest, &datarefs, &ddrs))
+ {
+ tree_loop_interchange loop_interchange (loop_nest, datarefs, ddrs);
+ changed_p |= loop_interchange.interchange ();
+ }
+
+ if (changed_p)
+ scev_reset_htab ();
+
+ return changed_p ? (TODO_update_ssa_only_virtuals) : 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_linterchange (gcc::context *ctxt)
+{
+ return new pass_linterchange (ctxt);
+}
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 8b1daa6..205534f 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -76,10 +76,13 @@ enum unroll_level
};
/* Adds a canonical induction variable to LOOP iterating NITER times. EXIT
- is the exit edge whose condition is replaced. */
+ is the exit edge whose condition is replaced. The ssa versions of the new
+ IV before and after increment will be stored in VAR_BEFORE and VAR_AFTER
+ if they are not NULL. */
-static void
-create_canonical_iv (struct loop *loop, edge exit, tree niter)
+void
+create_canonical_iv (struct loop *loop, edge exit, tree niter,
+ tree *var_before = NULL, tree *var_after = NULL)
{
edge in;
tree type, var;
@@ -112,7 +115,9 @@ create_canonical_iv (struct loop *loop, edge exit, tree niter)
create_iv (niter,
build_int_cst (type, -1),
NULL_TREE, loop,
- &incr_at, false, NULL, &var);
+ &incr_at, false, var_before, &var);
+ if (var_after)
+ *var_after = var;
cmp = (exit->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
gimple_cond_set_code (cond, cmp);
diff --git a/gcc/tree-ssa-loop-ivopts.h b/gcc/tree-ssa-loop-ivopts.h
index bd92051..a723f46 100644
--- a/gcc/tree-ssa-loop-ivopts.h
+++ b/gcc/tree-ssa-loop-ivopts.h
@@ -32,4 +32,6 @@ extern tree strip_offset (tree, unsigned HOST_WIDE_INT *);
bool may_be_nonaddressable_p (tree expr);
void tree_ssa_iv_optimize (void);
+void create_canonical_iv (struct loop *, edge, tree,
+ tree * = NULL, tree * = NULL);
#endif /* GCC_TREE_SSA_LOOP_IVOPTS_H */