https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91515

Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #1 from Peter Cordes <peter at cordes dot ca> ---
The real missed optimization is that GCC is returning its own incoming arg
instead of returning the copy of it that create() will return in RAX.

This is what blocks tailcall optimization; it doesn't "trust" the callee to
return what it's passing as RDI.
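
As a rough illustration (not the test case from the bug report), the hidden
return-slot pointer can be modelled with an explicit pointer argument.  The
names Vec3, create_into, use_distrust and use_trust below are hypothetical,
and the sketch assumes the x86-64 System V convention, where the callee
receives the destination pointer in RDI and returns the same pointer in RAX.

```c
/* Hypothetical sketch: all names here are assumptions for illustration. */

/* 24 bytes, so under x86-64 System V it is returned in memory via a
   hidden pointer rather than in registers. */
struct Vec3 { double x, y, z; };

/* Explicit model of the hidden pointer: the destination comes in as the
   first argument (RDI) and the same pointer is handed back (RAX). */
struct Vec3 *create_into(struct Vec3 *out)
{
    out->x = 1.0;
    out->y = 2.0;
    out->z = 3.0;
    return out;
}

/* Models what the comment describes GCC doing: ignore the callee's
   return value and return our own copy of the incoming pointer.  That
   copy must survive the call, so this cannot be a plain tail call. */
struct Vec3 *use_distrust(struct Vec3 *out)
{
    create_into(out);
    return out;
}

/* Models "trusting" the callee: whatever it returns is our return
   value, so the call is in tail position and can become a jump. */
struct Vec3 *use_trust(struct Vec3 *out)
{
    return create_into(out);
}

/* The by-value form, where the pointer is implicit. */
struct Vec3 create(void)
{
    struct Vec3 v = { 1.0, 2.0, 3.0 };
    return v;
}

struct Vec3 use(void)
{
    return create();
}
```

The callees are defined here only so the sketch is self-contained.  To look
at the code generation itself, create() and create_into() would more
plausibly be declared only and defined in another translation unit, so the
compiler cannot inline them or see what they return.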

See https://stackoverflow.com/a/57597039/224132 for my analysis (the OP asked
the same thing on SO before reporting this, but forgot to link it in the bug
report.)

The pointer returned in RAX is rarely used, but it probably should be: it's
less likely to have just been reloaded.

RAX is more likely than R12 to be ready sooner for out-of-order exec.  It was
either reloaded earlier (somewhere inside the callee, if the callee is complex
and/or non-leaf) or never spilled/reloaded at all.

So we're not even gaining a benefit from saving/restoring R12 to hold our
incoming RDI.  Thus it's not worth the extra cost (in code-size and
instructions executed), IMO.  Trust the callee to return the pointer in RAX.
