https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123225

--- Comment #6 from Victor Do Nascimento <victorldn at gcc dot gnu.org> ---
Thanks for the feedback, both in terms of code examples and observations
regarding the prologue peeling expense.

Also, sorry for the slow turnaround time. After the holidays, I've been ramping
up on the code for the loop costing.

I figured the easiest way (though I've yet to convince myself it's the right
way) to tweak which uncounted loops we accept for vectorization is to replicate
what we do when loop_cost_model (loop) == VECT_COST_MODEL_VERY_CHEAP, where we
check min_profitable_estimate against some constant, e.g. vect_vf_for_cost
(loop_vinfo).
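
To make the idea concrete, here is a minimal, self-contained sketch of that
kind of check. It is a toy model and not the vectorizer code: the struct, its
fields and toy_accept_loop are hypothetical names invented for illustration,
and the cut-off is passed in explicitly so different values can be tried.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-loop vectorizer state.  This is an
   assumption made for illustration, not GCC's loop_vec_info.  */
struct toy_loop_info
{
  int min_profitable_estimate; /* Estimated break-even iteration count.  */
  int vf_for_cost;             /* Stand-in for vect_vf_for_cost (loop_vinfo).  */
  bool uncounted;              /* True if the iteration count is unknown.  */
};

/* Return true if the toy criterion accepts the loop for vectorization.
   An uncounted loop is rejected when its break-even estimate exceeds
   CUTOFF; counted loops are not affected by this check.  */
bool
toy_accept_loop (const struct toy_loop_info *info, int cutoff)
{
  assert (cutoff > 0);
  if (info->uncounted && info->min_profitable_estimate > cutoff)
    return false;
  return true;
}
```

With vf_for_cost = 4 used as the cut-off, an uncounted loop with an estimate
of 3 is accepted and one with an estimate of 9 is rejected.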

Even using vect_vf_for_cost (loop_vinfo) as the cut-off, as we do for
VECT_COST_MODEL_VERY_CHEAP, the uncounted loop criterion allows us to recover
86% of the increase in code-size for 523.xalancbmk_r and most of the
performance degradation we observe on AArch64 (though admittedly the
performance loss is considerably smaller on AArch64 than it is on x86_64).

I'll try other cut-off values (Richi mentioned the vector loop being less
than 2x as expensive as a single scalar iteration, while I had thought of half
of vect_vf_for_cost) and report back, though equally any feedback on my as yet
rudimentary approach to the problem is most welcome.
