https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107
--- Comment #12 from N Schaeffer <nathanael.schaeffer at gmail dot com> ---
I found the "offending" option, and it does indeed seem to be a cost-model problem, as Andrew Pinski said.

Good code is generated by:
  gcc -O2 -ftree-vectorize -march=skylake   (since gcc 6.1)
  gcc -O1 -ftree-vectorize -march=skylake   (since gcc 8.1)
  gcc -O3 -fvect-cost-model=very-cheap -march=skylake   (with gcc 13.1+)

Bad code is generated otherwise, and in particular:
  gcc -O2 -march=skylake   (does not vectorize)
  gcc -O3 -march=skylake   (vectorizes badly, with many permutations)
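
For anyone wanting to compare these flag sets side by side, here is a minimal sketch. It assumes the reproducer from this report has been saved locally as test.c (a hypothetical file name, as are the helper names flag_sets and compare), and it uses the number of assembly lines matching "vperm" as a crude proxy for the permutation count. Other shuffle instructions are not counted, so the numbers are only indicative.

```shell
#!/bin/sh
# Sketch: compile one source file with each flag set from the comment and
# count permute instructions in the generated assembly.
# SRC is a hypothetical file name; substitute the reproducer from the bug report.
SRC="${SRC:-test.c}"

# One flag set per line, as listed in the comment (-march=skylake is added below).
flag_sets() {
cat <<'EOF'
-O2 -ftree-vectorize
-O1 -ftree-vectorize
-O3 -fvect-cost-model=very-cheap
-O2
-O3
EOF
}

compare() {
    flag_sets | while IFS= read -r flags; do
        # $flags is left unquoted on purpose so it splits into separate options.
        if gcc $flags -march=skylake -S -o out.s "$1" </dev/null; then
            # grep -c prints 0 (and exits non-zero) when nothing matches.
            n=$(grep -c 'vperm' out.s)
            printf '%-36s %s lines matching vperm\n' "$flags" "$n"
        else
            printf '%-36s compilation failed\n' "$flags"
        fi
    done
    rm -f out.s
}

# Run only when the source file is actually present.
if [ -f "$SRC" ]; then
    compare "$SRC"
fi
```

Older compilers that do not know -fvect-cost-model=very-cheap will simply report a failed compilation for that row.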