Hi,

> This one only works for known misalignment, otherwise it's overkill.
>
> OTOH if with some refactoring we can end up using a single cost model
> that would be great.  That is for the SAME_ALIGN_REFS we want to
> choose the unknown misalignment with the maximum number of
> SAME_ALIGN_REFS.  And if we know the misalignment of a single
> ref then we still may want to align a unknown misalign ref if that has
> more SAME_ALIGN_REFS (I think we always choose the known-misalign
> one currently).

[0/3]
Attempt to unify the peeling cost model as follows:

 - Keep the treatment of known misalignments.

 - Save the load and the store with the most frequent misalignment.
   - Compare their costs to determine which one the hardware prefers.

 - Choose the best peeling from the best peeling with known
   misalignment and the best with unknown misalignment according to
   the number of aligned data refs.

 - Calculate costs for leaving everything misaligned and compare with
   the best peeling so far.
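
As a rough illustration of the last two steps, here is a minimal
standalone sketch.  It is not the code from [3/3]: struct
peel_candidate, choose_peeling and peeling_pays_off are hypothetical
names made up for this example, and the tie-break in favor of the
known misalignment is an assumption.

```c
#include <stdbool.h>

/* Hypothetical summary of one peeling option; not a GCC type.  */
struct peel_candidate
{
  bool valid;             /* false if no such candidate was found */
  unsigned aligned_refs;  /* data refs aligned by this peeling */
  unsigned inside_cost;   /* estimated loop cost with this peeling */
};

/* Choose between the best candidate with known misalignment and the
   best one with unknown misalignment by the number of data refs each
   would align.  Ties go to the known one (an assumption).  */
static struct peel_candidate
choose_peeling (struct peel_candidate known, struct peel_candidate unknown)
{
  if (!known.valid)
    return unknown;
  if (!unknown.valid)
    return known;
  return unknown.aligned_refs > known.aligned_refs ? unknown : known;
}

/* Compare the chosen peeling against leaving everything misaligned.
   Peeling is only kept if it is strictly cheaper.  */
static bool
peeling_pays_off (struct peel_candidate best, unsigned nopeel_cost)
{
  return best.valid && best.inside_cost < nopeel_cost;
}
```

With a known candidate aligning 2 refs and an unknown one aligning 5,
choose_peeling returns the unknown one; peeling_pays_off then decides
whether it beats the no-peeling cost.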

I also did some refactoring that seemed necessary while writing the
patch ([1/3] and [2/3]).  It is no longer strictly required, but imho
it makes the code easier to understand.  The bulk of the changes is
in [3/3].

The testsuite is clean on i386 and s390x.  Some additional test cases
won't hurt and I will add them later; however, I did not succeed in
defining a test case with two datarefs that have the same but unknown
misalignment.  How can this be done?


One thing I did not understand when going over the existing code: in
vect_get_known_peeling_cost() we have

  /* If peeled iterations are known but number of scalar loop
     iterations are unknown, count a taken branch per peeled loop.  */
  retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken,
                             NULL, 0, vect_prologue);
  retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken,
                             NULL, 0, vect_epilogue);

In all uses of the function, prologue_cost_vec is discarded afterwards
and only the return value is used.  Should the second statement read
retval +=?  This code is only executed when the number of loop
iterations is unknown.  Currently we indeed count only one taken
branch, but then why call record_stmt_cost twice, and why discard the
first retval?
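
To make the question concrete, here is a small standalone sketch of the
two variants.  stub_record_cost is a hypothetical stand-in, not the
real record_stmt_cost (whose signature and costs differ), and
TAKEN_BRANCH_COST is an arbitrary value chosen for illustration.

```c
/* Arbitrary illustrative cost of one taken branch (assumption).  */
enum { TAKEN_BRANCH_COST = 3 };

/* Hypothetical stand-in for record_stmt_cost: returns the cost of
   COUNT taken branches.  */
static unsigned
stub_record_cost (unsigned count)
{
  return count * TAKEN_BRANCH_COST;
}

/* Mirrors the quoted code: the second assignment overwrites the
   first, so only one taken branch ends up in the result.  */
static unsigned
cost_overwritten (void)
{
  unsigned retval = 0;
  retval = stub_record_cost (1);   /* prologue branch */
  retval = stub_record_cost (1);   /* epilogue branch, replaces the above */
  return retval;
}

/* The "+=" variant: both taken branches are counted.  */
static unsigned
cost_accumulated (void)
{
  unsigned retval = 0;
  retval += stub_record_cost (1);  /* prologue branch */
  retval += stub_record_cost (1);  /* epilogue branch */
  return retval;
}
```

In this sketch cost_overwritten yields the cost of a single branch,
while cost_accumulated yields twice that, which is the difference the
question is about.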

Regards
 Robin
