https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90249
Bug ID: 90249
Summary: [9 regression] Code size regression on thumb2 due to sub-optimal register allocation.
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Keywords: ra
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rearnsha at gcc dot gnu.org
CC: ramana.radhakrishnan at arm dot com, vmakarov at redhat dot com, wdijkstr at arm dot com
Target Milestone: ---
Target: arm

Created attachment 46244
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46244&action=edit
testcase

GCC 9 has regressed on code size due to some sub-optimal register allocation.
For this example, the only difference in the output is that the assignments
for r7 and r8 have been switched, but the result is significant growth in code
size since r8 requires predominantly 32-bit instructions to be used while r7
requires predominantly 16-bit instructions.

cc1 -fpreprocessed binding2.i -quiet -dumpbase binding2.i -mthumb -mcpu=cortex-a8 -march=armv7-a -auxbase-strip binding.o -Os -w -version -fno-short-enums -fgnu89-inline -o binding2.s

In gcc-8 the output was

DefineConnectorBinding:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        push    {r0, r1, r2, r3, r4, r5, r6, r7, r8, lr}
        mov     r4, r1
        mov     r8, r0
        mov     r1, r2
        mov     r0, r4
        mov     r5, r2
        mov     r7, r3
        bl      LookupBinding
        mov     r6, r0
        cbz     r0, .L2
        ldr     r7, .L5
        mov     r1, r4
        ldr     r0, [r7]        // 16-bit instruction
        bl      GetAtomString
        mov     r1, r5
        mov     r4, r0
        ldr     r0, [r7]        // 16-bit instruction
        bl      GetAtomString
        ldrh    r1, [r6, #8]
        mov     r5, r0
        ldr     r0, [r7]        // 16-bit instruction
        bl      GetAtomString
        ldrh    r3, [r6, #10]
        ldr     r2, .L5+4
        movs    r1, #103
        str     r5, [sp]
        strd    r0, r3, [sp, #4]
        mov     r3, r4
        mov     r0, r8
        bl      SemanticError
        add     sp, sp, #16
        @ sp needed
        pop     {r4, r5, r6, r7, r8, pc}
.L2:
        mov     r3, r7
        mov     r2, r5
        mov     r1, r4
        mov     r0, r8
        bl      NewConnectorBindingTree
        add     sp, sp, #16
        @ sp needed
        pop     {r4, r5, r6, r7, r8, lr}
        b       AddBinding

In gcc-9 we get

DefineConnectorBinding:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        push    {r0, r1, r2, r3, r4, r5, r6, r7, r8, lr}
        mov     r4, r1
        mov     r7, r0
        mov     r1, r2
        mov     r0, r4
        mov     r5, r2
        mov     r8, r3
        bl      LookupBinding
        mov     r6, r0
        cbz     r0, .L2
        ldr     r8, .L5+4
        mov     r1, r4
        ldr     r0, [r8]        // 32-bit instruction
        bl      GetAtomString
        mov     r1, r5
        mov     r4, r0
        ldr     r0, [r8]        // 32-bit instruction
        bl      GetAtomString
        ldrh    r1, [r6, #8]
        mov     r5, r0
        ldr     r0, [r8]        // 32-bit instruction
        bl      GetAtomString
        ldrh    r3, [r6, #10]
        ldr     r2, .L5
        movs    r1, #103
        str     r5, [sp]
        strd    r0, r3, [sp, #4]
        mov     r3, r4
        mov     r0, r7
        bl      SemanticError
        add     sp, sp, #16
        @ sp needed
        pop     {r4, r5, r6, r7, r8, pc}
.L2:
        mov     r3, r8
        mov     r2, r5
        mov     r1, r4
        mov     r0, r7
        bl      NewConnectorBindingTree
        add     sp, sp, #16
        @ sp needed
        pop     {r4, r5, r6, r7, r8, lr}
        b       AddBinding

R8 is used more often than R7, so it seems odd that it is preferred over the
latter.
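
To make the size difference concrete, here is a rough sketch (not part of the original report) that models the encoding width of the three annotated loads only. It assumes the usual Thumb-2 rule that `ldr Rt, [Rn, #imm]` has a 16-bit encoding only when both registers are r0-r7 and the offset is a multiple of 4 in the range 0..124; the function names `ldr_imm_size` and `total_size` are hypothetical, and SP-relative, PC-relative and all other instructions are deliberately ignored.

```python
# Simplified, illustrative model of Thumb-2 encoding width for
# "ldr Rt, [Rn, #imm]". Assumption: only the plain immediate-offset word
# load is modelled; SP/PC-relative forms and other instructions are ignored.

def ldr_imm_size(rt: int, rn: int, imm: int = 0) -> int:
    """Return the encoding size in bytes (2 or 4) for ldr Rt, [Rn, #imm]."""
    low_regs = 0 <= rt <= 7 and 0 <= rn <= 7          # 3-bit register fields
    narrow_offset = 0 <= imm <= 124 and imm % 4 == 0  # 5-bit offset scaled by 4
    return 2 if (low_regs and narrow_offset) else 4


def total_size(loads):
    """Sum the encoding sizes of a list of (rt, rn, imm) loads."""
    return sum(ldr_imm_size(rt, rn, imm) for rt, rn, imm in loads)


# The three "ldr r0, [rX]" loads annotated in the listings above.
gcc8_loads = [(0, 7, 0)] * 3  # base register r7
gcc9_loads = [(0, 8, 0)] * 3  # base register r8

print("gcc-8 annotated loads:", total_size(gcc8_loads), "bytes")
print("gcc-9 annotated loads:", total_size(gcc9_loads), "bytes")
```

Under this model the three loads alone take 6 bytes with r7 as the base register and 12 bytes with r8. The sketch does not account for any other instruction in the function, so it understates or misstates the full difference between the two listings.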