https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837

--- Comment #20 from Lukas Grätz <lukas.gra...@tu-darmstadt.de> ---
(In reply to Petr Skocik from comment #19)
> IMO(In reply to Xi Ruoyao from comment #16)
>  
> > In practice most _Noreturn functions are abort, exit, ..., i.e. they are
> > only executed one time so optimizing against a cold path does not help much.
> > I don't think it's a good idea to encourage people to construct some fancy
> > code by a recursive _Noreturn function (why not just use a loop?!)  And if
> > you must write such fancy code anyway IMO musttail attribute (PR83324) will
> > be a better solution.
> 
> There's also longjmp, which may not be all that super cold and may be
> executed multiple times. And while yeah, nobody will notice a single call vs
> jmp time save against a process spawn/exit, for a longjmp wrapper, it'll
> make it a few % faster (as would utilizing _Noreturn attributes for better
> register allocation: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097,
> which would also save a bit of codesize too). Tail calls can also save a bit
> of codesize if the target is near.


Just to emphasize, tail call optimization is not just about speed. It is
essential for avoiding wasted stack space. In particular, it should _not_ be
necessary to replace every recursion with a loop, as Xi Ruoyao suggests, just
to avoid potential stack overflows. I also think that recursion in C is not
fancy (anymore), since everyone expects the compiler to do sibcall or similar
optimizations. Noreturn functions are the exception to that. So it would
indeed be consistent to do sibcall optimization for noreturn functions, too!
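
As a minimal sketch of the kind of recursion meant here (sum_to is a
hypothetical name used only for illustration): the recursive call is the last
action of the function, so a compiler that performs sibcall optimization can
turn it into a jump and reuse the current frame. Whether that actually happens
depends on the compiler and the optimization level.

```c
/* Hypothetical example: tail-recursive sum of 1..n with an accumulator.
   The recursive call is in tail position, so no work remains in the
   caller's frame after it returns. */
unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    /* With sibcall optimization this can become a jump instead of a call. */
    return sum_to(n - 1, acc + n);
}
```

For example, sum_to(1000, 0) evaluates to 500500. Without the optimization,
each level of recursion keeps its own frame, so stack use grows linearly
with n.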

Personally, I would be satisfied with the new musttail attribute to enforce
tail calls whenever necessary (given that it will be available for C, not only
for C++). But speed-wise, musttail might not have the desired effect. It is
meant for preserving stack space.
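
A hedged sketch of how such an attribute might be used from C, assuming the
statement-attribute spelling __attribute__((musttail)) that Clang documents.
Whether a given GCC release accepts it is an assumption here, so the sketch
checks with __has_attribute and falls back to a plain return. MUSTTAIL and
sum_to_musttail are hypothetical names.

```c
/* Use musttail only where the compiler reports support for it. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL /* fallback: ordinary return, tail call not guaranteed */
#endif

/* Hypothetical example: same accumulator recursion, but the tail call is
   requested explicitly instead of left to the optimizer. */
unsigned long sum_to_musttail(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    MUSTTAIL return sum_to_musttail(n - 1, acc + n);
}
```

Where the attribute is honored, the compiler either emits a tail call or
rejects the code, so the stack bound no longer depends on the optimization
level.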

---

Following Petr Skocik, I ran a quick test on my computer:

===== longjmp_wrapper.c =====================
#include <setjmp.h>

__attribute__((noreturn))
void longjmp_wrapper(jmp_buf env, int val) {
    longjmp(env, val);
}

===== longjmp_main.c ========================
#include <setjmp.h>
#include <limits.h>

__attribute__((noreturn))
void longjmp_wrapper(jmp_buf env, int val);

int main(void) {
    jmp_buf env;
    for (int i = 0; i < INT_MAX; i++) {
        if (setjmp(env) == 0) {
            longjmp_wrapper(env, 1);
        }
    }
}
=============================================

After compiling with

$ gcc -O3 -m32 -c -S longjmp_wrapper.c -o longjmp_wrapper.S

I copied and manually modified the generated longjmp_wrapper.S as follows:

9,15c9
<       subl    $20, %esp
<       .cfi_def_cfa_offset 24
<       pushl   28(%esp)
<       .cfi_def_cfa_offset 28
<       pushl   28(%esp)
<       .cfi_def_cfa_offset 32
<       call    longjmp
---
>       jmp     longjmp


Then I compiled both versions with longjmp_main.c, again with -m32. Measured
with "time", the sibcall version took around 23.5 sec and the unmodified
version around 24.5 sec on my computer, an improvement of around 4 % for
32-bit x86. For 64-bit x86, both took around 18 sec with no noticeable
difference (perhaps because the 64-bit calling convention passes both
arguments in registers instead of on the stack).
