Hello,
On Fri, 22 Sep 2017, Bin.Cheng wrote:
> This is updated patch for loop interchange with review suggestions
> resolved. Changes are:
> 1) It does more light weight checks like rectangle loop nest check
> earlier than before.
> 2) It checks profitability of interchange before data dependence
> computation.
> 3) It calls find_data_references_in_loop only once for a loop nest now.
> 4) Data dependence is open-computed so that we can skip instantly at
> unknown dependence.
> 5) It improves code generation in mapping induction variables for
> loop nest, as well as
> adding a simple dead code elimination pass.
> 6) It changes magic constants into parameters.
So I have a couple comments/questions. Something stylistic:
> +class loop_cand
> +{
> +public:
> ...
> + friend class tree_loop_interchange;
> +private:
Just make this all public (and hence a struct, not class).
No need for friends in file local classes.
> +single_use_in_loop (tree var, struct loop *loop)
> ...
> + FOR_EACH_IMM_USE_FAST (use_p, iterator, var)
> + {
> + stmt = USE_STMT (use_p);
> ...
> + basic_block bb = gimple_bb (stmt);
> + gcc_assert (bb != NULL);
This pattern reoccurs often in your patch: you check for a bb associated
for a USE_STMT. Uses of SSA names always occur in basic blocks, no need
for checking.
Then, something about your handling of simple reductions:
> +void
> +loop_cand::classify_simple_reduction (reduction_p re)
> +{
> ...
> + /* Require memory references in producer and consumer are the same so
> + that we can undo reduction during interchange. */
> + if (re->init_ref && !operand_equal_p (re->init_ref, re->fini_ref, 0))
> + return;
Where is it checked that the undoing transformation is legal also
from a data dep point of view? Think code like this:
sum = X[i];
for (j ...)
sum += X[j];
X[i] = sum;
Moving the store into the inner loop isn't always correct and I don't seem
to find where the above situation is rejected.
Maybe I'm confused because I also don't see where you even can get into
the above situation (though I do see testcases about this). The thing is,
for an 2d loop nest to contain something like the above reduction it can't
be perfect:
for (j) {
int sum = X[j]; // 1
for (i)
sum += Y[j][i];
X[j] = sum; // 2
}
But you do check for perfectness in proper_loop_form_for_interchange and
prepare_perfect_loop_nest, so either you can't get into the situation or
the checking can't be complete, or you define the above to be perfect
nevertheless (probably because the load and store are in outer loop
header/exit blocks?). The latter would mean that you accept also other
code in header/footer of loops from a pure CFG perspective, so where is it
checked that that other code (which aren't simple reductions) isn't
harmful to the transformation?
Then, the data dependence part of the new pass:
> +bool
> +tree_loop_interchange::valid_data_dependences (unsigned inner, unsigned
> outer)
> +{
> + struct data_dependence_relation *ddr;
> +
> + for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i)
> + {
> + /* Skip no-dependence case. */
> + if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
> + continue;
> +
> + for (unsigned j = 0; j < DDR_NUM_DIR_VECTS (ddr); ++j)
> + {
> + lambda_vector dist_vect = DDR_DIST_VECT (ddr, j);
> + unsigned level = dependence_level (dist_vect, loop_nest.length ());
> +
> + /* If there is no carried dependence. */
> + if (level == 0)
> + continue;
> +
> + level --;
> + /* Skip case which has '>' as the leftmost direction. */
> + if (!lambda_vector_lexico_pos (dist_vect, level))
> + return false;
Shouldn't happen as dist vectors are forced positive via DDR_REVERSED.
> + /* If dependence is carried by outer loop of the two loops for
> + interchange. */
> + if (level < outer)
> + continue;
> +
> + lambda_vector dir_vect = DDR_DIR_VECT (ddr, j);
> + /* If directions at both inner/outer levels are the same. */
> + if (dir_vect[inner] == dir_vect[outer])
> + continue;
> +
> + /* Be conservative, skip case if either direction at inner/outer
> + levels is not '=' or '<'. */
> + if (dir_vect[inner] != dir_equal
> + && dir_vect[inner] != dir_positive
> + && dir_vect[inner] != dir_independent
> + && dir_vect[inner] != dir_positive_or_equal)
> + return false;
> +
> + if (dir_vect[outer] != dir_equal
> + && dir_vect[outer] != dir_positive
> + && dir_vect[outer] != dir_independent
> + && dir_vect[outer] != dir_positive_or_equal)
> + return false;
Checking dir vectors doesn't make much sense in GCC: the elements are only
ever set to dir_positive, dir_negative or dir_equal, exactly when distance
is
> 0, < 0 or == 0. So checking dist vector is enough. (though sameness of
direction checks sameness of sign with zero). Incidentally:
> +tree_loop_interchange::update_data_deps (unsigned inner, unsigned outer)
> +{
> + struct data_dependence_relation *ddr;
> +
> + for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i)
> + {
> + /* Skip no-dependence case. */
> + if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
> + continue;
> +
> + for (unsigned j = 0; j < DDR_NUM_DIR_VECTS (ddr); ++j)
> + {
> + lambda_vector dir_vect = DDR_DIR_VECT (ddr, j);
> + std::swap (dir_vect[inner], dir_vect[outer]);
> + }
> + }
> +}
Here you swap only the direction but not the distance vector, which can't
be right. I suggest only using (and updating) the distance vector.
And then your usage and update of DR_ACCESS_FNs: there's quite some
complexity connected with that and I'm not sure how worthwhile it is.
You're basically using the ACCESS_FNs to determine profitability (and not
for validity, and that's good). But e.g. for pointer based accesses like
in fortran with explicit address arithmetic the relation between access-fn
step and stride and actual access stride isn't that easy (e.g. in your
should_interchange_loops function iloop_stride and oloop_stride will
always be one for pointer based accesses).
Conceptually what you should check is how the access address for each data
ref revolves for each loop, so why not doing this explicitely? What I
mean is: calculate a (complicated) chrec for the DR addresses for the
whole nest at the beginning. It should be in the form like (assume "+"
always):
{{{init, s1}_l1, s2}_l2, s3}_l3
(i.e. all steps should be invariants/constants, and only one non-chrec
init value). Addresses which aren't in this form you're already ignoring
right now, so you could continue doing that. (Or better said, all
non-constant steps you regard as being AVG_DIM_SIZE, which you still can
continue doing).
Now, with the above form you can form expressions for the difference
between addresses per iteration for each loop (i.e. the address stride per
loop); store these. Then, when interchanging loops you need to merely
swap these expressions like you have to with the distance vector, instead
of fiddling inside the DR_ACCESS_FNs themself. Much code would go away.
Testcases: given that we had to remove our old separate interchange pass
because it miscompiled stuff all over I'm missing some testcases where
interchange should _not_ happen for validity reasons, like my above
example with an reduction that can't be moved inside. Perhaps you can
think of some more.
I hope this is of some help to you :)
Ciao,
Michael.