https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122234

--- Comment #2 from Manuel López-Ibáñez <manu at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> We do not implement this kind of prologue peeling on GIMPLE (only full loop
> peeling).  I'm also not sure if doing this would be profitable on modern
> uarchs.

AVX should be able to subtract 4 doubles at a time with one instruction and
multiply the 4 differences with 2 instructions. 

AVX-512 should be able to do 8 doubles at a time.

Is that slower than the scalar loop?
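
To make the question concrete, here is a minimal sketch of handling 4 doubles
per iteration with the GCC/Clang vector extension. The kernel itself is an
assumption and is not taken from the bug report: squared_diff is a hypothetical
function that stores (a[i] - b[i]) squared, chosen only because it contains one
subtraction and one multiplication on the differences. The type name v4df is
likewise illustrative. When built with -mavx, the vector subtract and multiply
would typically each become a single 256-bit instruction; without it the
compiler lowers them to narrower operations.

```c
#include <stddef.h>
#include <string.h>

/* Four doubles in one 256-bit vector (GCC/Clang vector extension). */
typedef double v4df __attribute__((vector_size(32)));

/* Hypothetical kernel, NOT the loop from the bug report:
   out[i] = (a[i] - b[i]) * (a[i] - b[i]). */
void squared_diff(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;

    /* Main loop: 4 doubles per iteration. */
    for (; i + 4 <= n; i += 4) {
        v4df va, vb;
        memcpy(&va, a + i, sizeof va);   /* unaligned load of 4 doubles */
        memcpy(&vb, b + i, sizeof vb);
        v4df d = va - vb;                /* 4 subtractions at once */
        v4df p = d * d;                  /* 4 multiplications at once */
        memcpy(out + i, &p, sizeof p);   /* unaligned store of 4 doubles */
    }

    /* Scalar tail for the remaining 0 to 3 elements. */
    for (; i < n; i++) {
        double d = a[i] - b[i];
        out[i] = d * d;
    }
}
```

Comparing the assembly from "gcc -O2 -mavx -S" for this function against a
plain scalar loop is one way to check what the compiler actually emits.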
