https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503

Wilco <wdijkstr at arm dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wdijkstr at arm dot com

--- Comment #6 from Wilco <wdijkstr at arm dot com> ---
I ran the assembler examples on A57 hardware with identical input. The FMADD
code is ~20% faster irrespective of the size of the input. This is not a
surprise given that the FMADD latency is lower than the FADD and FMUL latency.

Neither the alignment of the loop nor the scheduling matters at all, as the
FMADD latency dominates by far - with serious optimization this code could run
4-5 times as fast and would only be limited by memory bandwidth on datasets
larger than L2.
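
To illustrate what "latency dominates" means, here is a minimal sketch. It
assumes a reduction of the form acc += a[i] * b[i]; the actual test case from
the bug is not reproduced here, and the names dot_single and dot_unrolled4 are
made up for this example. With a single accumulator every multiply-add has to
wait for the previous result, so the loop runs at the speed of one dependency
chain. Several independent accumulators keep several chains in flight. The
factor of four is only an illustrative choice, and whether the compiler
contracts the expression into FMADD depends on the target and on flags such as
-ffp-contract.

```c
#include <stddef.h>

/* Single accumulator: one serial dependency chain through acc, so each
   iteration waits for the previous multiply-add to complete. */
double dot_single(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Four accumulators: four independent dependency chains, so up to four
   multiply-adds can overlap in the pipeline. */
double dot_unrolled4(const double *a, const double *b, size_t n)
{
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }

    /* Handle the remaining 0-3 elements. */
    for (; i < n; i++)
        acc0 += a[i] * b[i];

    return (acc0 + acc1) + (acc2 + acc3);
}
```

Note that the second version adds the products in a different order, so the
floating-point result can differ in the last bits. That is why a compiler will
normally not do this transformation on its own unless reassociation is allowed
(for example with -ffast-math in GCC).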

So this particular example shows issues in LLVM, not in GCC.
