https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111143

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #4 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Paul Eggert from comment #0)
> The "movl $1, %eax" immediately followed by "addq %rax, %rbx" is poorly
> scheduled; the resulting dependency makes the code run quite a bit slower
> than it should. Replacing it with "addq $1, %rbx" and readjusting the
> surrounding code accordingly, as is done in the attached file
> code-mcel-opt.s, causes the benchmark to run 38% faster on my laptop's Intel
> i5-1335U.

This is a mischaracterization. The modified loop has one uop fewer, because you
are replacing 'mov eax, 1; add rbx, rax' with 'add rbx, 1'.

To evaluate the scheduling aspect in isolation, keep 'mov eax, 1' while
changing 'add rbx, rax' to 'add rbx, 1'.
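
The three variants can be put side by side in a small stand-in loop. This is
only a sketch: it is not the loop from the attached benchmark, and the VARIANT
symbol, the iteration count, and the use of ecx as a loop counter are
assumptions made for illustration. It assumes GNU as and gcc on x86-64 Linux.

```asm
# Build:          gcc -o loop loop.s
# Pick a variant: gcc -Wa,--defsym,VARIANT=1 -o loop loop.s
        .intel_syntax noprefix
        .ifndef VARIANT
        .set VARIANT, 0           # default to the original sequence
        .endif

        .text
        .globl main
main:
        push rbx                  # rbx is callee-saved in the SysV ABI
        xor ebx, ebx              # rbx = 0, the accumulator
        mov ecx, 1000000000       # hypothetical iteration count
1:
        .if VARIANT == 0          # original: add takes its operand from mov
        mov eax, 1
        add rbx, rax
        .elseif VARIANT == 1      # same uop count, add no longer reads rax
        mov eax, 1
        add rbx, 1
        .else                     # the change from comment #0: one uop fewer
        add rbx, 1
        .endif
        dec ecx
        jnz 1b
        xor eax, eax              # return 0 from main
        pop rbx
        ret

        .section .note.GNU-stack,"",@progbits
```

Comparing variant 0 against variant 1 isolates the effect of the dependency on
'mov eax, 1'; comparing variant 1 against variant 2 isolates the effect of the
extra uop. Timings from this stand-in loop need not match the real benchmark,
whose loop body contains much more work.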

There are two separate loop-carried data dependencies, both one cycle per
iteration (addition chains over r12 and rbx).
