https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628

--- Comment #13 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Ken Jin from comment #9)
> I tried this out with CPython's interpreter that uses tail calls with the
> patch at
> https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/saved/master?ref_type=heads
> applied.
> 
> I get a roughly 10% speedup on the pystones benchmark:
> 
> Without preserve_none
> This machine benchmarks at 912722 pystones/second
> 
> With preserve_none
> This machine benchmarks at 1.02601e+06 pystones/second
> 
> (Higher is better).
> 
> I noticed it's still about 10% slower than clang-20 though. It's shuffling
> registers a lot at calls to external functions compared to Clang. Please see
> https://github.com/llvm/llvm-project/pull/88333. On GCC I get this with the
> patch applied:
> 
> a.out:        file format elf64-x86-64
> 
> Disassembly of section .text:
> 
> 0000000000000000 <entry>:
>        0: 55                                  pushq   %rbp
>        1: 48 89 e5                            movq    %rsp, %rbp
>        4: 48 89 fb                            movq    %rdi, %rbx
>        7: 49 89 f4                            movq    %rsi, %r12
>        a: 49 89 d5                            movq    %rdx, %r13
>        d: 49 89 ce                            movq    %rcx, %r14
>       10: e8 00 00 00 00                      callq   0x15 <entry+0x15>
>       15: 4c 89 f1                            movq    %r14, %rcx
>       18: 4c 89 ea                            movq    %r13, %rdx
>       1b: 4c 89 e6                            movq    %r12, %rsi
>       1e: 48 89 df                            movq    %rbx, %rdi
>       21: 5d                                  popq    %rbp
>       22: e9 00 00 00 00                      jmp     0x27 <entry+0x27>

Please try my latest patch.
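
For anyone retesting, a reproducer along the following lines should show
whether the argument registers are still copied around the call. This is
only a sketch and is not taken from the report: `entry` is the symbol
name in the quoted disassembly, while `boring`, `continuation`, the
`long` signatures and the `PRESERVE_NONE` / `MUSTTAIL` macros are
assumptions made for illustration. The macros expand to nothing on
compilers that lack the attributes, so the file still builds there.

```c
/* Sketch of a reproducer; all names except `entry` are hypothetical. */
#if defined(__has_attribute)
# if __has_attribute(preserve_none)
#  define PRESERVE_NONE __attribute__((preserve_none))
# endif
# if __has_attribute(musttail)
#  define MUSTTAIL __attribute__((musttail))
# endif
#endif
#ifndef PRESERVE_NONE
# define PRESERVE_NONE /* attribute not available: default calling convention */
#endif
#ifndef MUSTTAIL
# define MUSTTAIL      /* attribute not available: plain return */
#endif

static int boring_calls;

/* Stand-in for an external function using the default calling convention.
   noinline keeps the call visible in the disassembly. */
__attribute__((noinline)) void boring(void)
{
    boring_calls++;
}

/* Tail-call target with the same signature and convention as entry(). */
__attribute__((noinline)) PRESERVE_NONE
long continuation(long a, long b, long c, long d)
{
    return a + b + c + d;
}

/* Four arguments must stay live across the call to boring(). */
PRESERVE_NONE long entry(long a, long b, long c, long d)
{
    boring();
    MUSTTAIL return continuation(a, b, c, d);
}
```

Compiling this with `-O2 -c` and running `objdump -d` on the object file
gives output that can be compared with the listing quoted above.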