[Bug target/63503] [AArch64] A57 executes fused multiply-add poorly in some situations

wdijkstr at arm dot com Fri, 10 Oct 2014 06:25:04 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503


Wilco <wdijkstr at arm dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wdijkstr at arm dot com

--- Comment #6 from Wilco <wdijkstr at arm dot com> ---
I ran the assembler examples on A57 hardware with identical input. The FMADD
code is ~20% faster irrespective of the size of the input. This is not a
surprise given that the FMADD latency is lower than the combined FADD and FMUL
latency.

Neither the alignment of the loop nor the scheduling matters at all, as the
FMADD latency dominates by far. With serious optimization this code could run
4-5 times as fast and would only be limited by memory bandwidth on datasets
larger than the L2 cache.
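
To illustrate what "latency dominates" means, here is a minimal C sketch. It
is not the code from this bug: it assumes a simple dot-product style reduction,
and the names dot_single and dot_multi are hypothetical. The first loop carries
every multiply-add through one accumulator, so each operation has to wait for
the previous result. The second splits the sum over four independent
accumulators so that several multiply-adds can be in flight at once.

```c
#include <stddef.h>

/* Hypothetical example: one accumulator. Each multiply-add depends on the
   previous value of acc, so the loop is bound by the latency of that
   dependency chain rather than by throughput. */
double dot_single(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];   /* a compiler may contract this to FMADD */
    return acc;
}

/* Hypothetical example: four independent accumulators. The four chains do
   not depend on each other, so their latencies can overlap. */
double dot_multi(const double *a, const double *b, size_t n)
{
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)        /* leftover elements when n is not a multiple of 4 */
        acc0 += a[i] * b[i];

    return (acc0 + acc1) + (acc2 + acc3);
}
```

Splitting the sum changes the order of the floating-point additions, so the
result can differ in the last bits from the single-accumulator version. The
choice of four accumulators is arbitrary here; how much this helps on a given
core depends on its FMADD latency and issue width.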

So this particular example shows issues in LLVM, not in GCC.
