https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84328
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-* i?86-*-*
           Priority|P3                          |P2
          Component|c++                         |target
   Target Milestone|---                         |6.5
            Summary|[6 Regression]              |[6/7/8 Regression]
                   |-finline-small-functions    |-finline-small-functions
                   |and inline keyword lead to  |and inline keyword lead to
                   |slowdown since version 6    |slowdown since version 6

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The loop is (fast)

.L3:
        movzwl  %bx, %eax
        movl    %eax, %edx
        sall    $8, %edx
        orl     %eax, %edx
        movl    %edx, %eax
        andl    $16711935, %eax
        movl    %eax, %edx
        sall    $4, %edx
        orl     %eax, %edx
        andl    $252645135, %edx
        leal    0(,%rdx,4), %eax
        orl     %edx, %eax
        andl    $858993459, %eax
        movl    %eax, %edx
        leal    (%rax,%rax), %eax
        orl     %edx, %eax
        andl    $1431655765, %eax
        orl     $74565, %eax
        xorl    %eax, %ebx
        subq    $1, %rcx
        jne     .L3

vs. (slow)

.L3:
        movl    %ebx, %eax
        andl    $1023, %eax
        movl    %eax, %edx
        sall    $16, %edx
        xorl    %eax, %edx
        movl    %edx, %eax
        andl    $-16776961, %eax
        movl    %eax, %edx
        sall    $8, %edx
        xorl    %eax, %edx
        andl    $50393103, %edx
        movl    %edx, %eax
        sall    $4, %eax
        xorl    %edx, %eax
        andl    $51130563, %eax
        movl    %eax, %edx
        leal    0(,%rax,4), %eax
        xorl    %edx, %eax
        andl    $153391689, %eax
        orl     $74565, %eax
        xorl    %eax, %ebx
        subq    $1, %rcx
        jne     .L3

I didn't try to confirm the actual slowdown.  I think it has nothing to do
with inlining but with something x86 specific.
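
The decimal immediates in the two loop bodies quoted above are easier to read in hex. What follows is only a rough C++ transcription read off the assembly, not the testcase from the report (its source is not quoted in this comment). The function names are hypothetical, and the loop counter in %rcx is left out. Under that reading, the fast loop masks with 0x00FF00FF, 0x0F0F0F0F, 0x33333333 and 0x55555555 after or-ing in shifts of 8, 4, 2 and 1, while the slow loop masks with 0xFF0000FF, 0x0300F00F, 0x030C30C3 and 0x09249249 after xor-ing in shifts of 16, 8, 4 and 2.

```cpp
#include <cstdint>

// Hypothetical name: transcription of the "fast" loop body.
// Takes the low 16 bits (movzwl %bx) and spreads them so that
// one zero bit separates each input bit.
constexpr std::uint32_t spread_bits_fast(std::uint32_t v) {
    std::uint32_t x = v & 0xFFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;  // andl $16711935
    x = (x | (x << 4)) & 0x0F0F0F0Fu;  // andl $252645135
    x = (x | (x << 2)) & 0x33333333u;  // andl $858993459
    x = (x | (x << 1)) & 0x55555555u;  // andl $1431655765
    return x;
}

// Hypothetical name: transcription of the "slow" loop body.
// Takes the low 10 bits (andl $1023) and spreads them so that
// two zero bits separate each input bit.
constexpr std::uint32_t spread_bits_slow(std::uint32_t v) {
    std::uint32_t x = v & 0x3FFu;
    x = (x ^ (x << 16)) & 0xFF0000FFu; // andl $-16776961
    x = (x ^ (x << 8))  & 0x0300F00Fu; // andl $50393103
    x = (x ^ (x << 4))  & 0x030C30C3u; // andl $51130563
    x = (x ^ (x << 2))  & 0x09249249u; // andl $153391689
    return x;
}

// One iteration of each loop as it appears in the listings:
// orl $74565 (0x12345), then xorl into the accumulator in %ebx.
constexpr std::uint32_t step_fast(std::uint32_t ebx) {
    return ebx ^ (spread_bits_fast(ebx) | 0x12345u);
}

constexpr std::uint32_t step_slow(std::uint32_t ebx) {
    return ebx ^ (spread_bits_slow(ebx) | 0x12345u);
}
```

If this transcription is right, the slow body has one more shift/mask round than the fast one (four rounds plus the initial 10-bit mask, against four rounds with the mask folded into movzwl), so the two listings differ by more than instruction selection. That is an observation about the listings only; it says nothing about the measured slowdown, which the comment itself leaves unconfirmed.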