Saieiei wrote:

> What's the before/after performance difference?
>
> ```
> module m1
>   integer, parameter :: iters = 10000000
>  contains
>   subroutine callme(proc)
>     interface
>       subroutine proc
>       end
>     end interface
>     call proc
>   end
> end
>
> module m2
>   use m1
>   private
>   public modproctest
>   integer :: n = 0
>  contains
>   subroutine modproc
>     n = n + 1
>   end
>   subroutine modproctest
>     do j = 1, iters
>       call callme(modproc)
>     end do
>     print *, ' modproc', n
>   end
> end
>
> module m3
>   use m1
>  contains
>   subroutine innerproctest
>     integer :: m = 0
>     do j = 1, iters
>       call callme(innerproc)
>     end do
>     print *, 'innerproc', m
>    contains
>     subroutine innerproc
>       m = m + 1
>     end
>   end
> end
>
> program main
>   use m1
>   use m2
>   use m3
>   integer :: n = 0
>   do j = 1, iters
>     call callme(mainproc)
>   end do
>   print *, ' mainproc', n
>   call modproctest
>   call innerproctest
>  contains
>   subroutine mainproc
>     n = n + 1
>   end
> end
> ```
Your test as written doesn't exercise the trampoline path: `integer :: m = 0` gives `m` the implicit SAVE attribute, making it static. The internal procedures then access static variables directly, so no host-association tuple or trampoline is generated, and both paths produce identical code.

I modified the test to use true stack variables (`integer :: m`, then `m = 0` on a separate line). Results on x86-64 (AMD EPYC, 10M iterations):

| Opt | Legacy (stack trampoline) | Safe (`-fsafe-trampoline`) | Result |
| :--- | :--- | :--- | :--- |
| `-O0` | 6.96s | 0.20s | **safe 35x faster** |
| `-O1` | 0.072s | 0.052s | **safe 1.4x faster** |
| `-O2` | 0.005s | 0.050s | **legacy 10x faster** |

At `-O0` and `-O1`, safe wins because Init/Adjust are hoisted to function entry: the trampoline is set up once, and the 10M iterations just call through an already-initialized pointer. The legacy path calls `llvm.init.trampoline` + `llvm.adjust.trampoline` on every iteration.

At `-O2`, LLVM sees through the trampoline intrinsics and inlines the call, while the opaque runtime calls on the safe path block that optimization. This gap could be narrowed with LLVM attributes that help the optimizer reason through the runtime trampoline calls.

https://github.com/llvm/llvm-project/pull/183108
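
For reference, here is a minimal sketch of the stack-variable change described above, shown for the `m3` case only. The exact modified source is not included in this message, so the layout is an assumption; `implicit none` and the explicit declaration of `j` are additions for clarity and are not part of the original test.

```fortran
module m1
  implicit none
  integer, parameter :: iters = 10000000
contains
  subroutine callme(proc)
    interface
      subroutine proc
      end subroutine
    end interface
    call proc
  end subroutine
end module

module m3
  use m1
  implicit none
contains
  subroutine innerproctest
    integer :: m   ! no initializer, so m is not implicitly SAVEd
    integer :: j
    m = 0          ! ordinary assignment; m stays a local (stack) variable
    do j = 1, iters
      ! innerproc reads and writes the host's local m, so passing it
      ! as an actual argument requires host-association context
      call callme(innerproc)
    end do
    print *, 'innerproc', m
  contains
    subroutine innerproc
      m = m + 1
    end subroutine
  end subroutine
end module

program main
  use m3
  implicit none
  call innerproctest
end program
```

The only functional difference from the quoted `m3` is that `m` is declared without an initializer and assigned separately, which keeps it out of static storage. The program should still report a count of 10000000.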
