https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97707
--- Comment #3 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
The main point of using -mprefer-vector-width=256 is to avoid clock throttling in "mixed" workloads. In small benchmarks like this one, AVX-512 is faster (even on an old Silver) even if it triggers a slower clock (and the test should really be performed with the machine fully loaded). Still, if I ask for -mprefer-vector-width=256, I would like to see no 512-bit-wide instructions used. Another disturbing feature is the difference between using int or long long as the loop index.
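
To make the int versus long long comparison concrete, here is a minimal sketch of the kind of loop pair that could be compared. It is not the test case attached to this report: the function names and the kernel body are hypothetical and chosen only for illustration. The two functions differ solely in the type of the loop index.

```c
/* Hypothetical kernel, not the test case from this report:
   the same element-wise update, written twice with different
   loop index types. */

/* Loop index and trip count are int. */
void scale_add_int(float *restrict y, const float *restrict x,
                   float a, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Loop index and trip count are long long. */
void scale_add_ll(float *restrict y, const float *restrict x,
                  float a, long long n)
{
    for (long long i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

One way to check which vector width was chosen is to compile to assembly with something like `gcc -O3 -march=skylake-avx512 -mprefer-vector-width=256 -S kernel.c` (the file name and -march value are assumptions) and search the output for zmm registers, which indicate 512-bit instructions. Whether the two functions actually differ will depend on the GCC version and target.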