https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116329
Bug ID: 116329
Summary: Arm M0+ doesn't do tail-call optimization
Product: gcc
Version: 13.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: terrygreeniaus at gmail dot com
Target Milestone: ---
Godbolt link: https://godbolt.org/z/9vMTzx4dq
Building with -mcpu=cortex-m0plus on gcc 13.3.1 shows that gcc doesn't perform
tail-call optimization:
#include <stdint.h>

uint32_t x;

void __attribute__((noinline)) foo()
{
    x = 1;
}

void bar()
{
    foo();
}
Disassembles as:
foo():
        movs    r2, #1
        ldr     r3, .L3
        str     r2, [r3]
        bx      lr
.L3:
        .word   .LANCHOR0
bar():
        push    {r4, lr}
        bl      foo()
        pop     {r4, pc}
x:
        .space  4
Compiling with -mcpu=cortex-m4 does the right thing:
foo():
        ldr     r3, .L3
        movs    r2, #1
        str     r2, [r3]
        bx      lr
.L3:
        .word   .LANCHOR0
bar():
        b       foo()
x:
        .space  4
I purposely avoided a trivial call to an extern function, in case the issue
was that M0+ lacks an instruction wide enough to branch to an arbitrary
address; in this contrived example bar() only needs to branch back a few
bytes, so a direct branch should have no problem reaching foo().
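
To make the branch-range argument above concrete, here is a minimal sketch of
the arithmetic. It assumes the 16-bit Thumb unconditional branch encoding
(an 11-bit signed immediate scaled by 2, so even offsets from -2048 to +2046
bytes relative to the branch address plus 4). The helper names and the
addresses are hypothetical; the layout is only inferred from the M0+ listing
above (four 2-byte instructions plus one 4-byte literal before bar()) and may
differ after linking.

```cpp
#include <cstdint>

// Offset a Thumb branch encodes: target minus (branch address + 4),
// because the PC reads as the current instruction address plus 4.
constexpr std::int32_t thumb16_b_offset(std::int32_t branch_addr,
                                        std::int32_t target_addr)
{
    return target_addr - (branch_addr + 4);
}

// Assumed reach of the 16-bit `b <label>` encoding: even offsets
// in [-2048, +2046] bytes.
constexpr bool fits_thumb16_b(std::int32_t branch_addr,
                              std::int32_t target_addr)
{
    return thumb16_b_offset(branch_addr, target_addr) >= -2048
        && thumb16_b_offset(branch_addr, target_addr) <= 2046
        && thumb16_b_offset(branch_addr, target_addr) % 2 == 0;
}

// Hypothetical addresses: foo() at 0x0, bar() right after it at 0xC.
constexpr std::int32_t foo_addr = 0x0;
constexpr std::int32_t bar_addr = 0xC;

// If bar() were a single `b foo()`, the branch would sit at bar_addr.
static_assert(fits_thumb16_b(bar_addr, foo_addr),
              "foo() is within reach of a 16-bit branch from bar()");

// A target 4 KiB ahead would be out of reach for this encoding.
static_assert(!fits_thumb16_b(0x0, 0x1000),
              "4 KiB is beyond the 16-bit branch range");
```

Under these assumptions the required offset is -16 bytes, far inside the
range, which is consistent with the expectation that branch reach is not what
prevents the tail call here.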
I observed this with arm-none-eabi-gcc 13.3.1, and experimenting in Godbolt
shows that it also exists in ARM GCC trunk.