Hi,

> Split-stack prologue on function entry is as follow (this goes before the
> usual function prologue):

>       mrs    x9, tpidr_el0
>       mov    x10, -<required stack allocation>

As Jiong already remarked, the nop won't work. Do we know the maximum adjustment
that the linker is allowed to make? If so, and we can limit the adjustment to
16MB in most cases, emitting 2 subtracts is best. Larger offsets need
mov/movk/sub, but that should be extremely rare.
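
To make the 16MB limit concrete, here is a minimal C sketch of how a frame size
could be split into the two immediates. An AArch64 sub takes a 12-bit immediate,
optionally shifted left by 12, so two subtracts cover any size below 2^24. The
names split_frame_size and struct sub_pair are made up for illustration and are
not from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the two immediates for
   "sub x10, sp, #hi" followed by "sub x10, x10, #lo".  */
struct sub_pair { uint32_t hi; uint32_t lo; };

/* Returns true when two subtracts suffice, i.e. size < 16MB (2^24).
   Larger sizes would need the mov/movk/sub sequence instead.  */
static bool split_frame_size(uint64_t size, struct sub_pair *out)
{
    if (size >= (UINT64_C(1) << 24))
        return false;
    out->hi = (uint32_t)(size & 0xfff000);  /* 12-bit immediate, LSL #12 */
    out->lo = (uint32_t)(size & 0xfff);     /* plain 12-bit immediate */
    return true;
}
```

For example, a size of 0x123456 would split into 0x123000 and 0x456.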

>       nop/movk

>       add    x10, sp, x10
>       ldr    x9, [x9, 16]

Is there any need to detect underflow of x10 or is there a guarantee that stacks
are never allocated in the low 2GB (given the maximum adjustment is 2GB)? It's
safe to do a signed comparison.
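
As a rough illustration of the underflow concern, the C sketch below models both
comparisons. It assumes all addresses are below 2^63 and that converting to
int64_t wraps in two's complement (implementation-defined before C23, but what
GCC does). The function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned check, modelling "cmp x10, x9; b.cs enough".  */
static bool enough_unsigned(uint64_t sp, uint64_t size, uint64_t limit)
{
    uint64_t new_sp = sp - size;    /* wraps modulo 2^64 on underflow */
    return new_sp >= limit;
}

/* Signed check, modelling "cmp x10, x9; b.ge main_fn_entry".  */
static bool enough_signed(uint64_t sp, uint64_t size, uint64_t limit)
{
    int64_t new_sp = (int64_t)(sp - size);  /* negative on underflow */
    return new_sp >= (int64_t)limit;
}
```

With sp = 0x10000, size = 0x20000 and limit = 0x8000, the subtraction
underflows: the unsigned check sees a huge value and wrongly reports enough
stack, while the signed check sees a negative value and goes to __morestack.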

>       cmp    x10, x9
>       b.cs    enough

Why save/restore x30 and the call x30+8 trick when we could pass the
continuation address and use a tailcall? That also avoids emitting extra unwind
info.

>       stp    x30, [sp, -16]
>       bl     __morestack
>       ldp    x30, [sp], 16
>       ret

This part doesn't make any sense - both x28 and carry flag as an input, and
spread across the prolog - why???

> enough:
>       mov     x10, sp
>       [prolog]
>       b.cs    continue
>       mov     x10, x28
> continue:
>       [rest of function]

Why not do this?

function:
        mrs     x9, tpidr_el0
        sub     x10, sp, N & 0xfff000
        sub     x10, x10, N & 0xfff
        ldr     x9, [x9, 16]
        adr     x12, main_fn_entry
        mov     x11, sp   [if function has stacked arguments]
        cmp     x10, x9
        b.ge    main_fn_entry
        b       __morestack
main_fn_entry: [x11 is argument pointer]
        [prolog]
        [rest of function]

In __morestack you need to save x8 as well (another argument register!) and x12
(the continuation address). After returning from the call x8 doesn't need to be
preserved.

There are several issues with unwinding in __morestack. x28 is not described as
a callee-save register, so it will be corrupted if unwinding across a
__morestack call. This won't unwind correctly after the ldp, as the unwinder
will use the restored frame pointer to try to restore x29/x30:

+       ldp     x29, x30, [x28, STACKFRAME_BASE]
+       ldr     x28, [x28, STACKFRAME_BASE + 80]
+
+       .cfi_remember_state
+       .cfi_restore 30
+       .cfi_restore 29
+       .cfi_def_cfa 31, 0

This stores a random x30 value on the stack. What is the purpose of this?
Nothing can unwind to here:

+       # Start using new stack
+       stp     x29, x30, [x0, -16]!
+       mov     sp, x0

Also, we no longer need split_stack_arg_pointer_used_p () or any code that uses
it (functions that don't have any arguments passed on the stack could omit the
mov x11, sp).

Wilco
