https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123225

--- Comment #14 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Victor Do Nascimento from comment #13)
> > So rather than restricting to PGO we could just handle the cases above and
> > restrict uncounted loops to cases that don't require a forced epilogue.
> Forgive my ignorance here, but surely we are talking about 2 separate
> (though closely-related) problems...
> 
> If we can elide the epilogue, I understand we are definitely making the
> vectorized code cheaper to execute (and smaller, improving the resulting
> code-size), but surely we still need to make sure we get costing right, no?
> 
> No expensive epilogue will mean the loop becomes profitable faster, yes, but
> we still need to either:
> 
> 1. know whether we will execute enough iterations to reach that
> profitability threshold (which is where the PGO idea comes in) or

The profitability threshold is mainly a function of whether you have amortized
the cost of the epilogue loop. PGO is a great idea, but barely anyone (outside
of people running benchmarks) uses PGO today, so that's a severe limitation.
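
To make the amortization point concrete, here is a rough sketch of a
simplified break-even model. It is not GCC's cost model: the function names,
the VF of 4 and the flat epilogue overhead of 20 are all hypothetical, and only
the per-iteration costs of 5 (scalar) and 10 (vector) are the ones quoted for
the Adv. SIMD loop in this thread.

```python
# Simplified, hypothetical model: not how GCC computes its threshold.

def scalar_path_cost(n, scalar_cost):
    """Cost of running all n iterations in the scalar loop."""
    return n * scalar_cost


def vector_path_cost(n, vf, vector_cost, scalar_cost, epilogue_overhead):
    """Cost of n iterations through the vector loop plus an epilogue.

    Full vector iterations cover n // vf chunks, the remaining n % vf
    iterations run as scalar, and epilogue_overhead is a flat
    (assumed) cost paid for having the epilogue at all.
    """
    full, rem = divmod(n, vf)
    return full * vector_cost + rem * scalar_cost + epilogue_overhead


def break_even_iterations(vf, vector_cost, scalar_cost, epilogue_overhead,
                          limit=10_000):
    """Smallest n at which the vector path is strictly cheaper, or None."""
    for n in range(1, limit + 1):
        if (vector_path_cost(n, vf, vector_cost, scalar_cost,
                             epilogue_overhead)
                < scalar_path_cost(n, scalar_cost)):
            return n
    return None


# Costs 5 and 10 are from this thread; VF=4 and overhead=20 are assumptions.
no_epilogue = break_even_iterations(4, 10, 5, 0)
with_epilogue = break_even_iterations(4, 10, 5, 20)
print(no_epilogue, with_epilogue)
```

In this toy model the threshold moves only because of the epilogue overhead
term, which is why being able to elide the epilogue matters more than knowing
the trip count in advance.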

> 2. ensure we have a conservative enough assumption about min iterations
> (e.g. going back to Richi's idea that the vectorized loop should be no more
> expensive than 2 scalar iterations) so that we always reject loops that will
> need too many iterations for profitability.
> 

Those numbers don't add up to me. The Adv. SIMD loop above has a scalar
iteration cost of 5 and a vector iteration cost of 10, so the loop in your
example will be profitable under the "no more than 2 scalar iteration costs"
rule. Furthermore, at least the aarch64 cost model works by comparing the cost
per scalar iteration, taking the VF into account.
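
As a rough sketch of that arithmetic (not the actual aarch64 cost model code):
the costs of 5 and 10 are the ones quoted above, while the function names and
the VF of 4 are assumptions made purely for illustration.

```python
SCALAR_ITER_COST = 5    # scalar iteration cost quoted above
VECTOR_ITER_COST = 10   # vector iteration cost quoted above
ASSUMED_VF = 4          # hypothetical vectorization factor


def passes_two_scalar_rule(scalar_cost, vector_cost):
    """The proposed static rule: one vector iteration may cost no more
    than two scalar iterations."""
    return vector_cost <= 2 * scalar_cost


def vector_cost_per_scalar_iter(vector_cost, vf):
    """Vector iteration cost spread over the vf scalar iterations it covers."""
    return vector_cost / vf


rule_ok = passes_two_scalar_rule(SCALAR_ITER_COST, VECTOR_ITER_COST)
per_iter = vector_cost_per_scalar_iter(VECTOR_ITER_COST, ASSUMED_VF)

print(rule_ok)                      # True: 10 <= 2 * 5
print(per_iter)                     # 2.5
print(per_iter < SCALAR_ITER_COST)  # True: 2.5 < 5
```

Both checks pass with these numbers, and neither of them accounts for the
extra work done on exit.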

So by every metric you'd pass the profitability check unless a very high and
random penalty is applied. And that's not wrong. The problem is that you do
extra work when you exit. We don't feel this as much on AArch64 because e.g.
Neon has short vectors and the majority of SVE implementations are 128 to 256
bits.

So personally I don't think a static number like that will work since we're
trying to work around an issue with codegen by fiddling with a random cost
multiplier.
