OK, so about a week later than I wanted. Too many fires, not enough water. The V3 patch has expanded a bit...

1. For constant sized dynamic allocations we'll allocate/probe up to 4
   STACK_CLASH_PROTECTION_PROBE_INTERVAL regions inline and unrolled.

2. For larger constant sized dynamic allocations we rotate the loop,
   saving a compare/jump.

3. Blockage insns added to prevent scheduler reordering, particularly in
   the inline/unrolled loop case.

4. Generic code for dynamic allocations handles the case where the target
   makes optimistic assumptions about the probing state in its prologue
   (i.e., aarch64).

5. PARAMs to control the assumed size of the guard and the probing
   interval. Both default to 4k. Note that the backends may not support
   all possible values for these PARAMs.

   a. The size of the guard helps determine how large a local static
      frame can be allocated without probing on targets that have an
      implicit probe in the caller.

   b. The interval determines how often we probe once we decide probing
      is required.

   c. Backends can override the default values. aarch64, for example,
      overrides the guard size.

6. More aarch64 improvements based on discussions with Wilco, Richard
   and Ramana.

   a. Support for a probing interval > 4k.

   b. Assume a guard of 64k, with 1k for the outgoing arglist. Thus
      frames less than 63k require no probing.

   c. Fix missed probe for outgoing arguments.

   d. Add missing notes and barriers.

   e. Some aarch64 specific testcases for issues identified by Wilco
      and some of my own.

   f. Some simplifications based on invariants I was previously unaware
      of for aarch64 prologues.

7. Scheduler honors the stack probing notes and avoids breaking memory
   dependencies when it encounters them.

8. PPC port takes advantage of improved generic code for dynamic stack
   allocations.

9. Many s390 improvements from IBM.

10. Additional tests for the unrolled inline dynamic case, rotated loop
    case and use of a large guard value to avoid probing (x86 and ppc
    only).

Jeff
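
P.S. As a rough illustration of the decisions described in items 1, 2 and 6b, here is a minimal sketch in C. It is not code from the patch: every name in it (choose_strategy, static_frame_needs_probe, MAX_UNROLLED_INTERVALS) is hypothetical, and it assumes that "up to 4" intervals is an inclusive bound. It only models the size arithmetic, not the RTL that actually gets emitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; none of them are GCC internals. */

enum probe_strategy {
    PROBE_INLINE_UNROLLED,  /* item 1: a few intervals, emitted inline */
    PROBE_ROTATED_LOOP      /* item 2: larger sizes, loop test at the bottom */
};

/* Item 1 caps the unrolled sequence at 4 probe intervals. */
#define MAX_UNROLLED_INTERVALS 4

/* Pick a strategy for a constant sized dynamic allocation.
   Assumption: "up to 4" is inclusive, i.e. size <= 4 * interval. */
static enum probe_strategy
choose_strategy(size_t size, size_t probe_interval)
{
    assert(probe_interval > 0);
    if (size <= MAX_UNROLLED_INTERVALS * probe_interval)
        return PROBE_INLINE_UNROLLED;
    return PROBE_ROTATED_LOOP;
}

/* Item 6b: part of the guard is reserved for the outgoing argument
   list, so a static frame smaller than (guard - reserve) needs no
   probe.  With a 64k guard and a 1k reserve the threshold is 63k. */
static bool
static_frame_needs_probe(size_t frame_size, size_t guard_size,
                         size_t outgoing_reserve)
{
    assert(guard_size >= outgoing_reserve);
    return frame_size >= guard_size - outgoing_reserve;
}
```

With a 4k interval, a 16k constant allocation would take the unrolled path in this sketch and anything larger would take the rotated loop; with the aarch64 numbers from 6b, a frame one byte short of 63k would skip probing.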