Jeff Law wrote:
> aarch64 is the first target that does not have any implicit probes in
> the caller. Thus at prologue entry it must make conservative
> assumptions about the offset of the most recent probed address relative
> to the stack pointer.

No - as I mentioned before, that is neither correct nor acceptable, as
it would imply that ~70% of functions need a probe at entry. I did a
quick run across SPEC and found that the outgoing argument size is
> 1024 in just 9 functions out of 147000! Those 9 were odd special cases
due to auto-generated code to interface between C and Fortran. This is
extremely unlikely to occur anywhere else. So even assuming an unchecked
caller, large outgoing arguments are simply not a realistic threat.

Therefore even when using a tiny 4K probe size we can safely adjust SP
by 3KB before needing an explicit probe - now only 0.6% of functions
need a probe. If we choose a proper minimum probe distance, say 64KB,
explicit probes are basically non-existent (just 35 functions, or ~0.02%
of all functions, have frames > 64KB). Clearly inserting probes can be
the default, as the impact on code quality is negligible.

With respect to implementation, it is relatively easy to decide in
aarch64_layout_frame which frames need probes and where. I don't think
keeping a running offset of the last probe/store is useful; it'll just
lead to inefficiencies and bugs. The patch doesn't deal with the delayed
stores due to shrink-wrapping, for example.

Inserting probes before the prolog would be easier, e.g.:

    sub tmp, sp, 65536
    str xzr, [tmp, 1024]  // allow up to 1KB of outgoing arguments in callee
    sub tmp, sp, 131072
    str xzr, [tmp, 1024]
    ... normal prolog for frame size 128-192KB

Wilco