On Wed, Aug 30, 2017 at 3:18 PM, Bin Cheng <bin.ch...@arm.com> wrote: > Hi, > This patch implements a simple loop interchange pass in GCC, as described by > its comments: > +/* This pass performs loop interchange: for example, the loop nest > + > + for (int j = 0; j < N; j++) > + for (int k = 0; k < N; k++) > + for (int i = 0; i < N; i++) > + c[i][j] = c[i][j] + a[i][k]*b[k][j]; > + > + is transformed to > + > + for (int i = 0; i < N; i++) > + for (int j = 0; j < N; j++) > + for (int k = 0; k < N; k++) > + c[i][j] = c[i][j] + a[i][k]*b[k][j]; > + > + This pass implements loop interchange in the following steps: > + > + 1) Find perfect loop nest for each innermost loop and compute data > + dependence relations for it. For above example, loop nest is > + <loop_j, loop_k, loop_i>. > + 2) From innermost to outermost loop, this pass tries to interchange > + each loop pair. For above case, it firstly tries to interchange > + <loop_k, loop_i> and loop nest becomes <loop_j, loop_i, loop_k>. > + Then it tries to interchange <loop_j, loop_i> and loop nest becomes > + <loop_i, loop_j, loop_k>. The overall effect is to move innermost > + loop to the outermost position. For loop pair <loop_i, loop_j> > + to be interchanged, we: > + 3) Check if data dependence relations are valid for loop interchange. > + 4) Check if both loops can be interchanged in terms of transformation. > + 5) Check if interchanging the two loops is profitable. > + 6) Interchange the two loops by mapping induction variables. > + > + This pass also handles reductions in loop nest. So far we only support > + simple reduction of inner loop and double reduction of the loop nest. */ > > Actually, this pass only does loop shift which outermosting inner loop to > outer, rather > than permutation. Also as a traditional loop optimizer, it only works for > perfect loop > nest. I put it just after loop distribution thus ideally loop > split/distribution could > create perfect nest for it. Unfortunately, we don't get any perfect nest > from distribution > for now because it only works for innermost loop. For example, the > motivation case in > spec2k6/bwaves is not handled on this pass alone. I have a patch extending > distribution > for (innermost) loop nest and with that patch bwaves case can be handled. > Another point is I deliberately make both the cost model and code > transformation (very) > conservative. We can support more cases, or more transformations with great > care when > it is for sure known beneficial. IMHO, we already hit over-baked issues > quite often and > don't want to introduce more. > As for code generation, this patch has an issue that invariant code in outer > loop could > be moved to inner loop. For the moment, we rely on the last lim pass to > handle all INV > generated during interchange. In the future, we may need to avoid that in > interchange > itself, or another lim pass just like the one after graphite optimizations. > > Boostrap and test on x86_64 and AArch64. Various benchmarks built and run > successfully. > Note this pass is disabled in patch, while the code is exercised by > bootstrap/building > programs with it enabled by default. Any comments?
+/* The same as above, but this one is only used for interchanging not + innermost loops. */ +#define OUTER_STRIDE_RATIO (2) please make all these knobs --params. +/* Enum type for loop reduction variable. */ +enum reduction_type +{ + UNKNOWN_RTYPE = 0, + SIMPLE_RTYPE, + DOUBLE_RTYPE +}; seeing this we should have some generic data structure / analysis for reduction detection. This adds a third user (autopar and vectorizer are the others). Just an idea. +/* Return true if E is abnormal edge. */ + +static inline bool +abnormal_edge (edge e) +{ + return (e->flags & (EDGE_EH | EDGE_ABNORMAL | EDGE_IRREDUCIBLE_LOOP)); +} bad name/comment for what it does. ... jumping to end of file / start of pass + /* Get the outer loop. */ + loop = superloop_at_depth (loop, loop_depth (loop) - 1); loop_outer (loop)? + /* Only support rectangle loop nest, i.e, inner loop's niters doesn't + depends on outer loop's IV. */ + if (chrec_contains_symbols_defined_in_loop (niters, loop->num)) + break; but you don't check for a three-nest whether niters depends on outer outer loop's IV that way. Either the check is superfluous here or incomplete. + /* Check if start_loop forms a perfect loop nest. */ + loop_nest->create (3); + do { + datarefs->create (20); + ddrs->create (20); + loop_nest->truncate (0); + if (compute_data_dependences_for_loop (start_loop, false, loop_nest, + datarefs, ddrs)) + { + unsigned i; + struct data_dependence_relation *ddr; + + for (i = 0; ddrs->iterate (i, &ddr); ++i) + { + if (DDR_ARE_DEPENDENT (ddr) == chrec_known) + continue; + /* Give up on any unknown dependence. */ + if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know + || DDR_NUM_DIR_VECTS (ddr) == 0) + break; + } + + if (i == ddrs->length ()) + return true; better open-code this so we dont' waste time computing all dependences when we give up in the majority of cases (unknown dependence). Memleak here (ddrs and datarefs). + start_loop = start_loop->inner; + } while (start_loop && start_loop->inner); ick. So this is cubic -- nest depth * #drs * #drs ... (exactly why I never committed loop distribution for nests ;)). I see that should_interchange_loops only uses datarefs. This means I'd rather do that as a very first step before considering validity (and computing dependences). That analysis (for all possible interchanges) should be much cheaper? I see it probably doesn't fit with the iteration you do very well... can't we somehow compute a loop permutation and apply it in a single step rather than piecewise with update_data_refs/deps? valid_data_dependences has almost no comments, it would be nice to add some (overall) one(s). Thanks, Richard. > Thanks, > bin > 2017-08-29 Bin Cheng <bin.ch...@arm.com> > > * Makefile.in (tree-ssa-loop-interchange.o): New object file. > * common.opt (ftree-loop-interchange): New option. > * doc/invoke.texi (-ftree-loop-interchange): Document new option. > * passes.def (pass_linterchange): New pass. > * timevar.def (TV_LINTERCHANGE): New time var. > * tree-pass.h (make_pass_linterchange): New declaration. > * tree-ssa-loop-interchange.cc: New file. > * tree-ssa-loop-ivcanon.c (create_canonical_iv): Change to external. > * tree-ssa-loop-ivopts.h (create_canonical_iv): New declaration. > > gcc/testsuite > 2017-08-29 Bin Cheng <bin.ch...@arm.com> > > * gcc.dg/tree-ssa/loop-interchange-1.c: New test. > * gcc.dg/tree-ssa/loop-interchange-2.c: New test. > * gcc.dg/tree-ssa/loop-interchange-3.c: New test. > * gcc.dg/tree-ssa/loop-interchange-4.c: New test. > * gcc.dg/tree-ssa/loop-interchange-5.c: New test. > * gcc.dg/tree-ssa/loop-interchange-6.c: New test. > * gcc.dg/tree-ssa/loop-interchange-7.c: New test. > * gcc.dg/tree-ssa/loop-interchange-8.c: New test. > * gcc.dg/tree-ssa/loop-interchange-9.c: New test. > * gcc.dg/tree-ssa/loop-interchange-10.c: New test.