https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111143

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #4 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Paul Eggert from comment #0)
> The "movl $1, %eax" immediately followed by "addq %rax, %rbx" is poorly
> scheduled; the resulting dependency makes the code run quite a bit slower
> than it should. Replacing it with "addq $1, %rbx" and readjusting the
> surrounding code accordingly, as is done in the attached file
> code-mcel-opt.s, causes the benchmark to run 38% faster on my laptop's Intel
> i5-1335U.

This is a mischaracterization. The modified loop has one uop fewer, because you
are replacing 'mov eax, 1; add rbx, rax' with 'add rbx, 1'.

To evaluate the scheduling aspect, keep 'mov eax, 1' while changing
'add rbx, rax' to 'add rbx, 1'.

There are two separate loop-carried data dependencies, both one cycle per
iteration (addition chains over r12 and rbx).
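
As a rough sketch of the comparison suggested above, the three variants of the instruction pair look like this in Intel syntax. Only these instructions come from the report; the rest of the loop body is elided, and the sketch assumes the control variant needs no other changes to the loop.

```asm
; (a) original: 2 uops, the add reads rax produced by the mov
        mov     eax, 1
        add     rbx, rax

; (b) code-mcel-opt.s: 1 uop, the mov is gone
        add     rbx, 1

; (c) control: 2 uops as in (a), but the add no longer reads rax
        mov     eax, 1
        add     rbx, 1
```

If (c) performs like (a), the speedup of (b) would point to the removed uop. If (c) performs like (b), it would point to the mov-to-add dependency.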