https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91257

--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Rogério de Souza Moraes from comment #12)
> Hi Richard,
> 
> first, thank you for the great work improving the GCC performance.
> 
> The R&D team which I am working with provided two test cases, they show that
> it was possible to reduce the build time by taking out the block containing
> setjmp/longjmp to a separate routine, which is only called from the original
> routine.
> 
> Both attached files, example_base.c and example_routines.c, are generated in
> a very similar way, but in example_routines.c, all the 'try' macros are
> taken out to separate routines. 
> 
> The compilation times:
> example_base.c:
> v4.8.3 - 0m1.096s
> v6.3.0 - 0m16.017s
> v9.3.0 - 0m26.829s
> example_routines.c
> v4.8.3 - 0m0.955s
> v6.3.0 - 0m1.205s
> v9.3.0 - 0m1.617s
> 
> Is this approach ok to improve the build performance?

Yes, that avoids the complex CFG.
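
For illustration only, here is a minimal sketch of that outlining pattern.
The TRY macros from the attached test cases are not shown in this thread, so
plain setjmp/longjmp stands in for them; try_env, fail_if_negative,
routine_for_try and big_routine are hypothetical names, not taken from
example_routines.c.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-in for whatever state the real TRY macros use. */
static jmp_buf try_env;

/* Hypothetical "throw": unwinds to the most recent setjmp on try_env. */
static void fail_if_negative(int x) {
    if (x < 0)
        longjmp(try_env, 1);
}

/* Outlined 'try' block.  The setjmp call and the abnormal edges it
   creates are confined to this small function.  noinline is a
   precaution so the block is not merged back into its caller. */
static int __attribute__((noinline))
routine_for_try(const int *v, int n) {
    assert(n >= 0);
    if (setjmp(try_env) == 0) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            fail_if_negative(v[i]);   /* may longjmp out of the loop */
            sum += v[i];
        }
        return sum;
    }
    return -1;   /* reached only via longjmp */
}

/* The large routine now contains only ordinary calls, so its own
   control-flow graph has no setjmp-related edges. */
int big_routine(const int *a, int na, const int *b, int nb) {
    int r1 = routine_for_try(a, na);
    int r2 = routine_for_try(b, nb);
    return (r1 < 0 || r2 < 0) ? -1 : r1 + r2;
}
```

In the generated sources each 'try' block would presumably get its own
helper; a single shared helper is used here only to keep the sketch short.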

> Even if this approach is OK, there are still details unclear to us, and some
> might be not even known:
> 
> - Should we worry about inlining? Can we hint this to the compiler, or
> should we make sure it's avoided (by using routine pointers, for example)?

In principle GCC's own heuristics should make sure it does not inline all
of the single-use routines, but for extra safety I'd suggest using

static void __attribute__((noinline))
routine_for_try_298(t__reg_s reg, int* v, int n0, int n1, int n2) {
    TRY_BEGIN {


> - Can we assume that routine call (with all low-level work like copying data
> on the stack etc.) is the only runtime performance price for this approach?

I think so, yes (make sure to declare the functions static, as above, so
the compiler can do IPA constant propagation, avoiding passing n0, n1, ...)
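
As a rough sketch of why static matters here: when a helper has internal
linkage, the compiler sees every call site, and if they all pass the same
constants it may specialize the helper instead of passing those arguments
at run time. Whether GCC actually does so depends on the optimization
level and its heuristics. sum_range, first_block and second_block are
hypothetical names; n0 and n1 only echo the parameter names above.

```c
#include <assert.h>

/* static: all callers are visible in this translation unit, which is
   what lets interprocedural constant propagation consider the helper.
   noinline keeps it as a separate function, as suggested above. */
static int __attribute__((noinline))
sum_range(const int *v, int n0, int n1) {
    assert(n0 <= n1);
    int sum = 0;
    for (int i = n0; i < n1; i++)
        sum += v[i];
    return sum;
}

/* Every call passes n0 = 0 and n1 = 4, so both values are known
   constants at all call sites. */
int first_block(const int *v)  { return sum_range(v, 0, 4); }
int second_block(const int *v) { return sum_range(v + 4, 0, 4); }
```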

> - Is having many small routines instead of a few very large ones
> universally good, or are there cases when it by itself can cause a problem?

You are trading a complex CFG for a complex callgraph (though in the
setjmp/longjmp case the CFG is artificially way more complex than the
callgraph variant), so in general you trade intra-FN compile-time for
inter-FN compile-time.  So yes, there could be similar issues in GCC's
IPA passes.

But while it is possible to short-cut all IPA optimization, there are
select "transforms" on functions that do not scale well to arbitrarily
large functions / complex CFGs.  A step further would decompose the
TU with the many small functions into multiple TUs (if you'd use LTO
for compiling then that's a no-op of course).

> We appreciate very much any feedback.
> 
> Best regards,
> --
> Rogerio
