Hi,

> Split-stack prologue on function entry is as follow (this goes before the
> usual function prologue):
>
> mrs    x9, tpidr_el0
> mov    x10, -<required stack allocation>

As Jiong already remarked, the nop won't work. Do we know the maximum adjustment
that the linker is allowed to make? If so, and we can limit the adjustment to 16MB
in most cases, emitting 2 subtracts is best. Larger offsets need mov/movk/sub but
that should be extremely rare.

> nop/movk
> add    x10, sp, x10
> ldr    x9, [x9, 16]

Is there any need to detect underflow of x10 or is there a guarantee that stacks
are never allocated in the low 2GB (given the maximum adjustment is 2GB)? It's
safe to do a signed comparison.

> cmp    x10, x9
> b.cs   enough

Why save/restore x30 and the call x30+8 trick when we could pass the continuation
address and use a tailcall? That also avoids emitting extra unwind info.

> stp    x30, [sp, -16]
> bl     __morestack
> ldp    x30, [sp], 16
> ret

This part doesn't make any sense - both x28 and carry flag as an input, and spread
across the prolog - why???

> enough:
> mov    x10, sp
> [prolog]
> b.cs   continue
> mov    x10, x28
> continue:
> [rest of function]

Why not do this?

function:
        mrs    x9, tpidr_el0
        sub    x10, sp, N & 0xfff000
        sub    x10, x10, N & 0xfff
        ldr    x9, [x9, 16]
        adr    x12, main_fn_entry
        mov    x11, sp   [if function has stacked arguments]
        cmp    x10, x9
        b.ge   main_fn_entry
        b      __morestack
main_fn_entry: [x11 is argument pointer]
        [prolog]
        [rest of function]

In __morestack you need to save x8 as well (another argument register!) and x12
(the continuation address). After returning from the call x8 doesn't need to be
preserved.

There are several issues with unwinding in __morestack. x28 is not described as a
callee-save so will be corrupted if unwinding across a __morestack call.

This won't unwind correctly after the ldp as the unwinder will use the restored
frame pointer to try to restore x29/x30:

+       ldp     x29, x30, [x28, STACKFRAME_BASE]
+       ldr     x28, [x28, STACKFRAME_BASE + 80]
+
+       .cfi_remember_state
+       .cfi_restore 30
+       .cfi_restore 29
+       .cfi_def_cfa 31, 0

This stores a random x30 value on the stack, what is the purpose of this? Nothing
can unwind to here:

+       # Start using new stack
+       stp     x29, x30, [x0, -16]!
+       mov     sp, x0

Also we no longer need split_stack_arg_pointer_used_p () or any code that uses it
(functions that don't have any arguments passed on the stack could omit the
mov x11, sp).

Wilco
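
P.S. For reference, a rough sketch of how the N & 0xfff000 / N & 0xfff split in
the two-subtract sequence above could be computed. It assumes N is the required
stack allocation in bytes. The name split_stack_adjust is hypothetical and not
an existing GCC function. The 16MB limit follows from the two 12-bit immediates,
one of them shifted left by 12.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (illustration only, not GCC code): split a stack
   allocation N into the immediates for
       sub x10, sp,  hi     (12-bit immediate, shifted left by 12)
       sub x10, x10, lo     (plain 12-bit immediate)
   Returns false when N is 16MB or more, which is the rare case that
   would need a mov/movk/sub sequence instead.  */
bool
split_stack_adjust (uint64_t n, uint32_t *hi, uint32_t *lo)
{
  if (n >= (UINT64_C (1) << 24))
    return false;

  *hi = (uint32_t) (n & 0xfff000);  /* bits 12..23 */
  *lo = (uint32_t) (n & 0xfff);     /* bits 0..11 */
  return true;
}
```

On success hi + lo always equals N, so the two subtracts together compute
sp - N.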