https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123225

--- Comment #9 from Victor Do Nascimento <victorldn at gcc dot gnu.org> ---
> I wonder if for now (w/o the ability to elide the epilog, w/o the ability
> to use first-fault loads) we should restrict this to PGO when we have
> a more reliable expected iteration count to work with?  Though as we
> do not have a histogram of actual loop iterations an estimated count
> of 10 can result from a mix of 1 and 20 loop iterations ...
> 
> Plus eventually handling loops marked as force_vectorize (we do not
> yet have a #pragma users can use, but OMP SIMD marks loops this way).

Yes, I do think that the current poor handling of both the prologue and the
epilogue severely hurts the usefulness of this approach. As for the prologue,
AArch64 targets with SVE can considerably reduce the performance hit by using
masking for alignment rather than a peeled scalar prologue.  This, in
particular, is something I am working on as a follow-up to this work and will
be looking to submit once we are back in stage 1.
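To illustrate what masking for alignment means, here is a scalar C simulation
of the idea. It is not SVE code and not the planned GCC implementation: VL is a
hypothetical fixed vector length (real SVE vector lengths are only known at run
time), and masked_aligned_sum is a hypothetical name. The first "vector" chunk
starts at the aligned address at or below the data, and its leading lanes are
predicated off, so no scalar peeled prologue is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed vector length in elements; stands in for the
   run-time SVE vector length. */
#define VL 4

/* Sum a[0..n) in VL-wide chunks whose first lane sits on a
   VL * sizeof(int32_t) byte boundary.  Assumes a is at least
   sizeof(int32_t)-aligned.  Lanes before a[0] or at/after a[n] are
   masked off by the per-lane predicate instead of being handled by
   scalar prologue/epilogue loops. */
long masked_aligned_sum(const int32_t *a, size_t n)
{
    const size_t vbytes = VL * sizeof(int32_t);
    uintptr_t addr = (uintptr_t)a;
    /* Number of leading inactive lanes in the first chunk. */
    size_t skip = (addr % vbytes) / sizeof(int32_t);
    long sum = 0;

    /* i is the index (relative to a) of lane 0; it starts at or below
       zero so that a + i is aligned. */
    for (ptrdiff_t i = -(ptrdiff_t)skip; i < (ptrdiff_t)n; i += VL) {
        for (int lane = 0; lane < VL; lane++) {
            ptrdiff_t idx = i + lane;
            /* Per-lane predicate: only in-range elements are active.
               Inactive lanes are never read here; on real SVE hardware,
               inactive lanes of a predicated load do not fault. */
            int active = idx >= 0 && idx < (ptrdiff_t)n;
            if (active)
                sum += a[idx];
        }
    }
    return sum;
}
```

For example, with an array aligned to 16 bytes and a starting one element past
it, the first chunk has one inactive lane and the last chunk is masked at the
tail, with no scalar loop at either end.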

Regarding the proposed use of PGO to guide vectorization, I guess that the
easiest thing to do for now is to collect some numbers and see whether the
results look good enough using average iteration counts alone, without
recourse to histograms.
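The concern about averages can be made concrete with the example from the
quoted comment. The sketch below uses hypothetical per-execution trip counts
(10 executions of 1 iteration and 9 of 20) and hypothetical helper names; it
only shows that the average comes out at exactly 10 while most executions are
too short to benefit, which a histogram would reveal and an average hides.

```c
#include <stddef.h>

/* Average trip count over all recorded executions of a loop: the kind
   of single summary figure available without a histogram. */
double mean_trip_count(const unsigned *trips, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += trips[i];
    return n ? (double)total / n : 0.0;
}

/* Fraction of executions whose trip count is below a threshold, for
   example a hypothetical minimum count at which vectorization pays
   off.  Computing this requires the per-execution distribution. */
double fraction_below(const unsigned *trips, size_t n, unsigned threshold)
{
    size_t below = 0;
    for (size_t i = 0; i < n; i++)
        if (trips[i] < threshold)
            below++;
    return n ? (double)below / n : 0.0;
}
```

With 10 executions of 1 iteration and 9 of 20, the total is 10 + 180 = 190
iterations over 19 executions, so mean_trip_count returns 10.0, yet more than
half of the executions ran a single iteration.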

As for a `force_vectorize' pragma, it is something I had considered as a way
of marking uncounted loops that we know will run long enough to warrant
vectorization. Between time constraints and uncertainty about whether it would
be well received, I punted on it.  I'll be happy to add it to my list of
things to do, again for submission once we are back in stage 1.
