https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503
--- Comment #13 from Wilco <wdijkstr at arm dot com> ---
(In reply to Andrew Pinski from comment #11)
> (In reply to Wilco from comment #10)
> > The loops shown are not the correct inner loops for those options - with
> > -ffast-math they are vectorized. LLVM unrolls 2x but GCC doesn't. So the
> > question is why GCC doesn't unroll vectorized loops like LLVM?
>
> Because unrolling is not enabled at -O3. Try adding -funroll-loops.

Isn't it odd that GCC doesn't do even the most basic unrolling at its maximum
optimization setting, yet it does do vectorization?

Note that -funroll-loops is not sufficient either: for this particular loop you
also need -fvariable-expansion-in-unroller, which isn't enabled at -O3 either,
plus setting the associated param to 4 or 8.

So GCC is certainly capable of generating quality code for this example, it
just doesn't do so by default - unlike LLVM.
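
To illustrate what variable expansion buys in an unrolled loop, here is a minimal hand-written sketch. The loop from the bug report is not shown in this comment, so the dot product below is a hypothetical stand-in, and the names dot_simple and dot_expanded are made up for illustration. The second function approximates by hand the kind of transformation the unroller is being asked to do: the single accumulator is split into several independent ones so that consecutive multiply-adds no longer form one serial dependency chain.

```c
#include <stddef.h>

/* Hypothetical reduction loop: every iteration depends on the previous
   value of sum, so the additions form one long dependency chain. */
float dot_simple(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hand-written approximation of 4x unrolling with the accumulator
   expanded into four independent partial sums. */
float dot_expanded(const float *a, const float *b, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    /* Main unrolled body: the four chains can proceed in parallel. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    /* Combine the partial sums, then handle the leftover elements. */
    float sum = (s0 + s1) + (s2 + s3);
    for (; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Splitting the accumulator reorders the floating-point additions, so in general the two functions can round differently; that is why the compiler only does this for floating-point code when reassociation is permitted, for example with -ffast-math. Assuming "the associated param" refers to max-variable-expansions-in-unroller, a command line along the lines of `gcc -O3 -ffast-math -funroll-loops -fvariable-expansion-in-unroller --param max-variable-expansions-in-unroller=4` would be the way to ask GCC to attempt this on dot_simple itself.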