Hi,

> This one only works for known misalignment, otherwise it's overkill.
>
> OTOH if with some refactoring we can end up using a single cost model
> that would be great.  That is for the SAME_ALIGN_REFS we want to
> choose the unknown misalignment with the maximum number of
> SAME_ALIGN_REFS.  And if we know the misalignment of a single
> ref then we still may want to align a unknown misalign ref if that has
> more SAME_ALIGN_REFS (I think we always choose the known-misalign
> one currently).

[0/3] Attempt to unify the peeling cost model as follows:

 - Keep the treatment of known misalignments.

 - Save the load and store with the most frequent misalignment.

 - Compare their costs and get the hardware-preferred one via costs.

 - Choose the best peeling from the best peeling with known
   misalignment and the best with unknown misalignment according to
   the number of aligned data refs.

 - Calculate costs for leaving everything misaligned and compare with
   the best peeling so far.

I also did some refactoring that seemed necessary while writing the
patch.  It is not strictly needed anymore ([1/3] and [2/3]), but imho
it makes the code easier to understand.  The bulk of the changes is in
[3/3].

Testsuite on i386 and s390x is clean.  I guess some additional test
cases won't hurt and I will add them later.  However, I didn't succeed
in defining a test case with two datarefs that have the same but
unknown misalignment.  How can this be done?

A thing I did not understand when going over the existing code: In
vect_get_known_peeling_cost() we have

  /* If peeled iterations are known but number of scalar loop
     iterations are unknown, count a taken branch per peeled loop.  */
  retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken,
                             NULL, 0, vect_prologue);
  retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken,
                             NULL, 0, vect_epilogue);

In all uses of the function, prologue_cost_vec is discarded afterwards;
only the return value is used.  Should the second statement read
retval +=?  This code is only executed when the number of loop
iterations is unknown.  Currently we indeed count one taken branch, but
why then call record_stmt_cost twice and discard the first retval?

Regards
 Robin