https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628
--- Comment #13 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Ken Jin from comment #9)
> I tried this out with CPython's interpreter that uses tail calls with the
> patch at
> https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/saved/master?ref_type=heads
> applied.
>
> I get a roughly 10% speedup on the pystones benchmark:
>
> Without preserve_none
> This machine benchmarks at 912722 pystones/second
>
> With preserve_none
> This machine benchmarks at 1.02601e+06 pystones/second
>
> (Higher is better).
>
> I noticed it's still about 10% slower than clang-20 though. It's shuffling
> registers a lot at calls to external functions compared to Clang. Please see
> https://github.com/llvm/llvm-project/pull/88333. On GCC I get this with the
> patch applied:
>
> a.out: file format elf64-x86-64
>
> Disassembly of section .text:
>
> 0000000000000000 <entry>:
>        0: 55                    pushq   %rbp
>        1: 48 89 e5              movq    %rsp, %rbp
>        4: 48 89 fb              movq    %rdi, %rbx
>        7: 49 89 f4              movq    %rsi, %r12
>        a: 49 89 d5              movq    %rdx, %r13
>        d: 49 89 ce              movq    %rcx, %r14
>       10: e8 00 00 00 00        callq   0x15 <entry+0x15>
>       15: 4c 89 f1              movq    %r14, %rcx
>       18: 4c 89 ea              movq    %r13, %rdx
>       1b: 4c 89 e6              movq    %r12, %rsi
>       1e: 48 89 df              movq    %rbx, %rdi
>       21: 5d                    popq    %rbp
>       22: e9 00 00 00 00        jmp     0x27 <entry+0x27>

Please try my latest patch.
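
The test source behind the quoted disassembly is not included in this thread. The disassembly is consistent with a function that takes four arguments, calls one external function, and then jumps to another function with the same four arguments. The following is a minimal sketch of such a reproducer, not the original test case: the names `external_helper`, `next_handler`, and `helper_calls` are hypothetical, and the `PRESERVE_NONE` macro falls back to nothing on compilers that do not provide the attribute.

```c
/* Hypothetical reproducer sketch; not the test case from the report. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Use preserve_none only where the compiler reports support for it. */
#if __has_attribute(preserve_none)
#define PRESERVE_NONE __attribute__((preserve_none))
#else
#define PRESERVE_NONE
#endif

/* Counter that lets a caller observe that the helper ran (assumption). */
int helper_calls;

/* Stand-in for an external function using the default calling convention.
   noinline keeps the call visible in the generated code. */
__attribute__((noinline)) void external_helper(void)
{
    helper_calls++;
}

/* Stand-in for the next interpreter handler. */
PRESERVE_NONE __attribute__((noinline))
long next_handler(long a, long b, long c, long d)
{
    return a + b + c + d;
}

/* All four arguments are live across the call to external_helper, so the
   compiler has to keep them somewhere that survives the call. When
   optimizing, the final call may be emitted as a jump (a tail call). */
PRESERVE_NONE long entry(long a, long b, long c, long d)
{
    external_helper();
    return next_handler(a, b, c, d);
}
```

Compiling a file like this with optimization enabled and disassembling `entry` should make it possible to compare how each compiler moves the arguments around the call to `external_helper`. The exact output depends on the compiler version and on the patch applied.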