https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91257
--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Rogério de Souza Moraes from comment #12)
> Hi Richard,
>
> first, thank you for the great work improving GCC's performance.
>
> The R&D team I am working with provided two test cases. They show that
> it was possible to reduce the build time by moving the block containing
> setjmp/longjmp into a separate routine, which is only called from the
> original routine.
>
> Both attached files, example_base.c and example_routines.c, are generated in
> a very similar way, but in example_routines.c all the 'try' macros are
> moved out into separate routines.
>
> The compilation times:
> example_base.c:
>   v4.8.3 - 0m1.096s
>   v6.3.0 - 0m16.017s
>   v9.3.0 - 0m26.829s
> example_routines.c:
>   v4.8.3 - 0m0.955s
>   v6.3.0 - 0m1.205s
>   v9.3.0 - 0m1.617s
>
> Is this approach OK for improving the build performance?

Yes, that avoids the complex CFG.

> Even if this approach is OK, there are still details unclear to us, and some
> might not even be known:
>
> - Should we worry about inlining? Can we hint this to compilers, or should we
>   make sure it's avoided (by using routine pointers, for example)?

In principle GCC's own heuristics should make sure it does not inline all of
the single-use routines, but for extra safety I'd suggest using

  static void __attribute__((noinline))
  routine_for_try_298(t__reg_s reg, int* v, int n0, int n1, int n2)
  {
    TRY_BEGIN {
    ...

> - Can we assume that the routine call (with all the low-level work like
>   copying data on the stack etc.) is the only runtime performance price for
>   this approach?

I think so, yes (make sure to declare the functions static as above so the
compiler can do IPA constant propagation, avoiding passing n0, n1, ..).

> - Is having many small routines instead of a few very large ones universally
>   good, or are there cases where it can cause a problem by itself?
You are trading a complex CFG for a complex callgraph (though in the
setjmp/longjmp case the CFG is artificially far more complex than the
callgraph variant), so in general you trade intra-function compile time for
inter-function compile time. So yes, there could be similar issues in GCC's
IPA passes. But while it is possible to short-cut all IPA optimization, there
are select per-function "transforms" that do not scale well to arbitrarily
large functions / complex CFGs.

A further step would be to split the TU with the many small functions into
multiple TUs (if you compile with LTO, that is of course a no-op).

> We very much appreciate any feedback.
>
> Best regards,
> --
> Rogerio
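A minimal sketch of the outlining pattern discussed above, assuming a plain setjmp/longjmp "try". The TRY_BEGIN macro and t__reg_s type from the attached files are not reproduced here, so checked_div, routine_for_try_1, routine_for_try_2 and process are hypothetical stand-ins modelled on routine_for_try_298. Each former try block becomes its own static, noinline routine, and the caller only sequences them, so the caller's CFG contains no setjmp receivers.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical operation that can fail: divides, or "throws" by
   longjmp-ing back to the enclosing try block when b == 0. */
static int checked_div(jmp_buf env, int a, int b)
{
    if (b == 0)
        longjmp(env, 1);
    return a / b;
}

/* Former try block #1 (hypothetical), outlined into its own routine.
   static: all call sites are visible, so IPA constant propagation may
   specialize it on constant arguments.  noinline: keeps the setjmp (and
   its abnormal edges) out of the caller's CFG.
   Returns 1 on normal completion, 0 if the block was aborted. */
static int __attribute__((noinline))
routine_for_try_1(const int *v, int n, int divisor, int *sum)
{
    jmp_buf env;
    /* Modified after setjmp and read on the longjmp path: must be volatile. */
    volatile int acc = 0;

    if (setjmp(env) != 0) {      /* "catch": reached via longjmp */
        *sum = acc;              /* partial result before the failure */
        return 0;
    }
    for (int i = 0; i < n; i++)  /* "try" body */
        acc += checked_div(env, v[i], divisor);
    *sum = acc;
    return 1;
}

/* Former try block #2 (hypothetical): same shape, dividing 100 by each
   element, so a zero element aborts the block. */
static int __attribute__((noinline))
routine_for_try_2(const int *v, int n, int *sum)
{
    jmp_buf env;
    volatile int acc = 0;

    if (setjmp(env) != 0) {
        *sum = acc;
        return 0;
    }
    for (int i = 0; i < n; i++)
        acc += checked_div(env, 100, v[i]);
    *sum = acc;
    return 1;
}

/* The formerly monolithic function (hypothetical): it no longer calls
   setjmp itself, it only sequences the outlined blocks.  v must point to
   at least 4 ints.  Returns 1 if every block completed; *total is the sum
   of all (possibly partial) block results. */
int process(const int *v, int *total)
{
    int s1 = 0, s2 = 0;
    int ok;

    assert(v && total);
    ok = routine_for_try_1(v, 4, 2, &s1);
    ok &= routine_for_try_2(v, 4, &s2);
    *total = s1 + s2;
    return ok;
}
```

With v = {4, 8, 6, 10}, process() returns 1 and sets *total to 77. With v = {4, 0, 6, 10}, the second block "throws" at the zero element, so process() returns 0 and *total is 35 (10 from the first block plus the partial 25 from the second). Because process() passes constant arguments (n = 4, divisor = 2), GCC may be able to specialize the static callees on them via IPA constant propagation; whether it does depends on its heuristics. The accumulator is volatile because, per the C standard, a non-volatile local modified between setjmp and longjmp has an indeterminate value after the jump.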