On Wed, Aug 30, 2017 at 3:19 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Wed, Aug 30, 2017 at 3:18 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>> Hi,
>> This patch implements a simple loop interchange pass in GCC, as described by 
>> its comments:
>> +/* This pass performs loop interchange: for example, the loop nest
>> +
>> +   for (int j = 0; j < N; j++)
>> +     for (int k = 0; k < N; k++)
>> +       for (int i = 0; i < N; i++)
>> +        c[i][j] = c[i][j] + a[i][k]*b[k][j];
>> +
>> +   is transformed to
>> +
>> +   for (int i = 0; i < N; i++)
>> +     for (int j = 0; j < N; j++)
>> +       for (int k = 0; k < N; k++)
>> +        c[i][j] = c[i][j] + a[i][k]*b[k][j];
>> +
>> +   This pass implements loop interchange in the following steps:
>> +
>> +     1) Find perfect loop nest for each innermost loop and compute data
>> +       dependence relations for it.  For above example, loop nest is
>> +       <loop_j, loop_k, loop_i>.
>> +     2) From innermost to outermost loop, this pass tries to interchange
>> +       each loop pair.  For above case, it firstly tries to interchange
>> +       <loop_k, loop_i> and loop nest becomes <loop_j, loop_i, loop_k>.
>> +       Then it tries to interchange <loop_j, loop_i> and loop nest becomes
>> +       <loop_i, loop_j, loop_k>.  The overall effect is to move innermost
>> +       loop to the outermost position.  For loop pair <loop_i, loop_j>
>> +       to be interchanged, we:
>> +     3) Check if data dependence relations are valid for loop interchange.
>> +     4) Check if both loops can be interchanged in terms of transformation.
>> +     5) Check if interchanging the two loops is profitable.
>> +     6) Interchange the two loops by mapping induction variables.
>> +
>> +   This pass also handles reductions in loop nest.  So far we only support
>> +   simple reduction of inner loop and double reduction of the loop nest.  */
>>
>> Actually, this pass only does loop shift which outermosting inner loop to 
>> outer, rather
>> than permutation.  Also as a traditional loop optimizer, it only works for 
>> perfect loop
>> nest.  I put it just after loop distribution thus ideally loop 
>> split/distribution could
>> create perfect nest for it.  Unfortunately, we don't get any perfect nest 
>> from distribution
>> for now because it only works for innermost loop.  For example, the 
>> motivation case in
>> spec2k6/bwaves is not handled on this pass alone.  I have a patch extending 
>> distribution
>> for (innermost) loop nest and with that patch bwaves case can be handled.
>> Another point is I deliberately make both the cost model and code 
>> transformation (very)
>> conservative.  We can support more cases, or more transformations with great 
>> care when
>> it is for sure known beneficial.  IMHO, we already hit over-baked issues 
>> quite often and
>> don't want to introduce more.
>> As for code generation, this patch has an issue that invariant code in outer 
>> loop could
>> be moved to inner loop.  For the moment, we rely on the last lim pass to 
>> handle all INV
>> generated during interchange.  In the future, we may need to avoid that in 
>> interchange
>> itself, or another lim pass just like the one after graphite optimizations.
>>
>> Boostrap and test on x86_64 and AArch64.  Various benchmarks built and run 
>> successfully.
>> Note this pass is disabled in patch, while the code is exercised by 
>> bootstrap/building
>> programs with it enabled by default.  Any comments?
>
Thanks for quick review.
> +/* The same as above, but this one is only used for interchanging not
> +   innermost loops.  */
> +#define OUTER_STRIDE_RATIO     (2)
>
> please make all these knobs --params.
>
> +/* Enum type for loop reduction variable.  */
> +enum reduction_type
> +{
> +  UNKNOWN_RTYPE = 0,
> +  SIMPLE_RTYPE,
> +  DOUBLE_RTYPE
> +};
>
> seeing this we should have some generic data structure / analysis for
> reduction detection.  This adds a third user (autopar and vectorizer
> are the others).  Just an idea.
>
> +/* Return true if E is abnormal edge.  */
> +
> +static inline bool
> +abnormal_edge (edge e)
> +{
> +  return (e->flags & (EDGE_EH | EDGE_ABNORMAL | EDGE_IRREDUCIBLE_LOOP));
> +}
>
> bad name/comment for what it does.
>
> ... jumping to end of file / start of pass
>
> +      /* Get the outer loop.  */
> +      loop = superloop_at_depth (loop, loop_depth (loop) - 1);
>
> loop_outer (loop)?
>
> +      /* Only support rectangle loop nest, i.e, inner loop's niters doesn't
> +        depends on outer loop's IV.  */
> +      if (chrec_contains_symbols_defined_in_loop (niters, loop->num))
> +       break;
>
> but you don't check for a three-nest whether niters depends on outer outer
> loop's IV that way.  Either the check is superfluous here or incomplete.
It is checked for multi-nest case in can_interchange_loops.  I will
move the check to this function so that we can save compilation time.
>
> +  /* Check if start_loop forms a perfect loop nest.  */
> +  loop_nest->create (3);
> +  do {
> +    datarefs->create (20);
> +    ddrs->create (20);
> +    loop_nest->truncate (0);
> +    if (compute_data_dependences_for_loop (start_loop, false, loop_nest,
> +                                          datarefs, ddrs))
> +      {
> +       unsigned i;
> +       struct data_dependence_relation *ddr;
> +
> +       for (i = 0; ddrs->iterate (i, &ddr); ++i)
> +         {
> +           if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
> +             continue;
> +           /* Give up on any unknown dependence.  */
> +           if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know
> +               || DDR_NUM_DIR_VECTS (ddr) == 0)
> +             break;
> +         }
> +
> +       if (i == ddrs->length ())
> +         return true;
>
> better open-code this so we dont' waste time computing all dependences
> when we give up in the majority of cases (unknown dependence).
Right.  Will make the change.
> Memleak here (ddrs and datarefs).
>
> +    start_loop = start_loop->inner;
> +  } while (start_loop && start_loop->inner);
>
> ick.  So this is cubic -- nest depth * #drs * #drs ... (exactly why I
> never committed loop distribution for nests ;)).
Hmm, loop distribution for (innermost) nest is necessary for this pass
to handle bwaves unfortunately.  It is also necessary to distribute
for (i;)
  for (j)
    arr[i * len + j] = 0;
into a single memcall, rather than a loop of sub-memcall.

>
> I see that should_interchange_loops only uses datarefs.  This means
> I'd rather do that as a very first step before considering validity
> (and computing dependences).  That analysis (for all possible
> interchanges) should be much cheaper?  I see it probably doesn't
Yes it makes the code less straightforward.  I think we can
additionally call should_interchange_loops before computing
dependencies.  This will bypass for most cases, for example, ~3500
loops can be interchanged without cost model, while only  ~200 loops
actually pass the cost model.

> fit with the iteration you do very well...  can't we somehow compute
> a loop permutation and apply it in a single step rather than
> piecewise with update_data_refs/deps?
Arbitrary loop permutation would be non-trivial.  If you meant
computing loop shit (outermosting here) and transforming the code once
for all, I think it's doable.  Cost model check can be easily extended
in that way, though transformation part needs quite lot rework.  One
of my concern is, other than matrix-mul, I don't have motivation case
for multi-nest interchange so far.
>
> valid_data_dependences has almost no comments, it would be nice
> to add some (overall) one(s).
Sorry, I just realized it's wrong because direct vector is misused for
distance vector...  Surprised it didn't bite with thousands of
interchanged loops in spec...  Will fix it.

>
> Thanks,
> Richard.
>
>
>> Thanks,
>> bin
>> 2017-08-29  Bin Cheng  <bin.ch...@arm.com>
>>
>>         * Makefile.in (tree-ssa-loop-interchange.o): New object file.
>>         * common.opt (ftree-loop-interchange): New option.
>>         * doc/invoke.texi (-ftree-loop-interchange): Document new option.
>>         * passes.def (pass_linterchange): New pass.
>>         * timevar.def (TV_LINTERCHANGE): New time var.
>>         * tree-pass.h (make_pass_linterchange): New declaration.
>>         * tree-ssa-loop-interchange.cc: New file.
>>         * tree-ssa-loop-ivcanon.c (create_canonical_iv): Change to external.
>>         * tree-ssa-loop-ivopts.h (create_canonical_iv): New declaration.
>>
>> gcc/testsuite
>> 2017-08-29  Bin Cheng  <bin.ch...@arm.com>
>>
>>         * gcc.dg/tree-ssa/loop-interchange-1.c: New test.
>>         * gcc.dg/tree-ssa/loop-interchange-2.c: New test.
>>         * gcc.dg/tree-ssa/loop-interchange-3.c: New test.
>>         * gcc.dg/tree-ssa/loop-interchange-4.c: New test.
>>         * gcc.dg/tree-ssa/loop-interchange-5.c: New test.
>>         * gcc.dg/tree-ssa/loop-interchange-6.c: New test.
>>         * gcc.dg/tree-ssa/loop-interchange-7.c: New test.
>>         * gcc.dg/tree-ssa/loop-interchange-8.c: New test.
>>         * gcc.dg/tree-ssa/loop-interchange-9.c: New test.
>>         * gcc.dg/tree-ssa/loop-interchange-10.c: New test.

Reply via email to