https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628
--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Ken Jin from comment #9)
> 0000000000000000 <entry>:
>        0: 55                    pushq   %rbp
>        1: 48 89 e5              movq    %rsp, %rbp
>        4: 48 89 fb              movq    %rdi, %rbx
>        7: 49 89 f4              movq    %rsi, %r12
>        a: 49 89 d5              movq    %rdx, %r13
>        d: 49 89 ce              movq    %rcx, %r14
>       10: e8 00 00 00 00        callq   0x15 <entry+0x15>
>       15: 4c 89 f1              movq    %r14, %rcx
>       18: 4c 89 ea              movq    %r13, %rdx
>       1b: 4c 89 e6              movq    %r12, %rsi
>       1e: 48 89 df              movq    %rbx, %rdi
>       21: 5d                    popq    %rbp
>       22: e9 00 00 00 00        jmp     0x27 <entry+0x27>

Note that I am not sure the moves are the cause of the slowdown, though. On
most recent Intel and AMD processors (by now that covers designs more than
10 years old), such moves are handled during renaming and don't take up an
issue/exec slot. I suspect there are other things going on.
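
For readers unfamiliar with the renaming point, here is a toy sketch of the idea. It is not a model of any particular Intel or AMD core, and the `rename` function, the tuple encoding of instructions, and the counter are all made up for illustration. It assumes an idealized renamer that can alias the destination of any register-to-register move to the source's physical register, so the move never reaches an execution unit; real hardware has limits that this ignores.

```python
# Toy model of register renaming with move elimination.
# Hypothetical and heavily simplified: every name here is illustrative.

def rename(instructions, eliminate_moves=True):
    """Return (number of uops sent to execution, final register alias table)."""
    rat = {}        # architectural register -> physical register id
    next_phys = 0   # next free physical register id
    issued = 0      # uops that would need an issue/exec slot

    def phys_for(reg):
        # Give a register a physical id the first time it is read.
        nonlocal next_phys
        if reg not in rat:
            rat[reg] = next_phys
            next_phys += 1
        return rat[reg]

    for op, dst, src in instructions:
        src_phys = phys_for(src)
        if op == "mov" and eliminate_moves:
            # Point dst at the same physical register; nothing to execute.
            rat[dst] = src_phys
        else:
            # Allocate a fresh physical register and send a uop to execute.
            rat[dst] = next_phys
            next_phys += 1
            issued += 1
    return issued, rat


# The four register-to-register moves before the call in the quoted listing,
# written as (op, destination, source).
saves = [
    ("mov", "rbx", "rdi"),
    ("mov", "r12", "rsi"),
    ("mov", "r13", "rdx"),
    ("mov", "r14", "rcx"),
]

with_elim, rat = rename(saves, eliminate_moves=True)
without_elim, _ = rename(saves, eliminate_moves=False)
print("executed with elimination:", with_elim)
print("executed without elimination:", without_elim)
```

In this simplified model the eliminated moves still pass through the front end (fetch, decode, rename); they only avoid the execution stage, which is why they may cost less than the instruction count suggests.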