https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628
--- Comment #21 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Ken Jin from comment #15)
> I tested again this time with taskset, turbo boost off, on a quiet system,
> with PGO. These are the results. They're quite good:
>
> # Indirect goto + LTO + PGO
> This machine benchmarks at 576728 pystones/second
>
> # Tail calls, no preserve_none + LTO + PGO*
> This machine benchmarks at 539522 pystones/second
>
> # Tail calls, preserve_none + LTO + PGO*
> This machine benchmarks at 572234 pystones/second
>
> So roughly a 6-7% gain from preserve_none on the pystones benchmark over no
> preserve_none. Thanks again H.J. for the patch.
>
> *PGO is disabled for tail calling functions in the bytecode interpreter, but
> enabled for everything else, as it seems PGO slows down those functions. I
> used the attributes `no_instrument_function,no_profile_instrument_function`
> to turn it off for the bytecode functions.
>
> Something strange is going on with PGO for tail calls on my system. However,
> I can't figure it out right now.
>
> Everything is benchmarked on this branch
> https://github.com/Fidget-Spinner/cpython/pull/new/Fidget-Spinner:cpython:tail-call-gcc-3

Hi Ken, my patch has been merged into the GCC master branch. Can you give it a try?
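This sketch shows the shape of the setup described in the quoted footnote. It uses a hypothetical toy accumulator bytecode, not CPython's interpreter. The opcodes, `Instr`, the `op_*` handlers, `dispatch_table`, `run`, and `run_demo` are all made up for illustration. Each opcode handler is a separate function that ends by tail-calling the next handler. Every handler carries the `no_instrument_function, no_profile_instrument_function` attribute pair Ken mentions, so profile instrumentation skips it.

Two things are assumptions on my part. First, a guaranteed tail call is requested only under Clang, via `__attribute__((musttail))`. Second, `preserve_none` is left out entirely, because compiler support for it varies.

```c
#include <assert.h>
#include <stddef.h>

/* Exclude the bytecode handlers from -finstrument-functions and from
   profile instrumentation (e.g. -fprofile-generate), as in the footnote. */
#define NO_PGO __attribute__((no_instrument_function, no_profile_instrument_function))

/* Assumption: only Clang is trusted to honor musttail here. Elsewhere this
   expands to nothing and the optimizer (e.g. at -O2) is relied on to emit
   sibling calls. */
#if defined(__clang__) && defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Hypothetical toy bytecode: an accumulator machine with three opcodes. */
enum { OP_ADD, OP_MUL, OP_HALT };

typedef struct {
    int op;
    long arg;
} Instr;

/* All handlers share one signature so each can tail-call the next. */
typedef long (*handler_t)(const Instr *ip, long acc);

NO_PGO static long op_add(const Instr *ip, long acc);
NO_PGO static long op_mul(const Instr *ip, long acc);
NO_PGO static long op_halt(const Instr *ip, long acc);

/* Indexed by opcode value. */
static const handler_t dispatch_table[] = { op_add, op_mul, op_halt };

/* Look up the handler for *ip and transfer control to it as a tail call. */
#define DISPATCH(ip, acc) MUSTTAIL return dispatch_table[(ip)->op]((ip), (acc))

NO_PGO static long op_add(const Instr *ip, long acc) {
    acc += ip->arg;
    ++ip;
    DISPATCH(ip, acc);
}

NO_PGO static long op_mul(const Instr *ip, long acc) {
    acc *= ip->arg;
    ++ip;
    DISPATCH(ip, acc);
}

NO_PGO static long op_halt(const Instr *ip, long acc) {
    (void)ip; /* execution stops here; the accumulator is the result */
    return acc;
}

/* Ordinary (non-tail) call into the first handler. */
static long run(const Instr *program) {
    assert(program != NULL);
    return dispatch_table[program->op](program, 0);
}

/* Hypothetical demo program computing (0 + 2) * 5 + 1. */
long run_demo(void) {
    static const Instr program[] = {
        { OP_ADD, 2 }, { OP_MUL, 5 }, { OP_ADD, 1 }, { OP_HALT, 0 },
    };
    return run(program);
}
```

Calling `run_demo()` returns 11. Because each opcode lives in its own function, attributes can be applied per handler. A single computed-goto loop cannot do that.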