[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

wdijkstr at arm dot com Wed, 22 Oct 2014 05:10:50 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503


--- Comment #13 from Wilco <wdijkstr at arm dot com> ---
(In reply to Andrew Pinski from comment #11)
> (In reply to Wilco from comment #10)
> > The loops shown are not the correct inner loops for those options - with
> > -ffast-math they are vectorized. LLVM unrolls 2x but GCC doesn't. So the
> > question is why GCC doesn't unroll vectorized loops like LLVM?
> 
> Because unrolling is not enabled at -O3.  Try adding -funroll-loops.

Isn't it odd that GCC doesn't do even the most basic unrolling at its maximum
optimization setting, yet it does do vectorization?

Note that -funroll-loops alone is not sufficient either. This particular loop
also needs -fvariable-expansion-in-unroller, which likewise isn't enabled at
-O3, with the associated param set to 4 or 8.
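
To illustrate what variable expansion buys here, below is a rough hand-written
sketch of the transformation on a scalar dot product. It is not the loop from
this bug (that one is vectorized, so the accumulators would be vector
registers), and the names dot_single and dot_expanded4 are made up for the
example. I am assuming the param in question is
max-variable-expansions-in-unroller, so the compiler-side equivalent would be
roughly: gcc -O3 -ffast-math -funroll-loops -fvariable-expansion-in-unroller
--param max-variable-expansions-in-unroller=4.

```c
#include <stddef.h>

/* Baseline: a single accumulator, so each multiply-add has to wait for
   the result of the previous one (one long dependency chain). */
float dot_single(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Illustrative hand-written version of "unroll 4x and expand the
   accumulator": four independent partial sums give four independent
   dependency chains. This reassociates the floating-point additions,
   which is why the compiler only does it under -ffast-math. */
float dot_expanded4(const float *a, const float *b, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    /* Combine the partial sums, then handle the leftover elements. */
    float sum = (s0 + s1) + (s2 + s3);
    for (; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

With only one accumulator, unrolling by itself just makes the same serial
chain of multiply-adds longer; it is the extra accumulators that let
consecutive multiply-adds overlap.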

So GCC is certainly capable of generating quality code for this example; it
just doesn't do so by default, unlike LLVM.
