Re: [PATCH v2] aarch64: Add split-stack initial support

Adhemerval Zanella Mon, 14 Nov 2016 05:31:59 -0800

Ping.


On 07/11/2016 16:59, Adhemerval Zanella wrote:
> 
> 
> On 14/10/2016 15:59, Wilco Dijkstra wrote:
>> Hi,
>>
> 
> Thanks for the thoughtful review and sorry for late response. 
> 
>>> Split-stack prologue on function entry is as follow (this goes before the
>>> usual function prologue):
>>
>>>     mrs    x9, tpidr_el0
>>>     mov    x10, -<required stack allocation>
>>
>> As Jiong already remarked, the nop won't work. Do we know the maximum 
>> adjustment
>> that the linker is allowed to make? If so, and we can limit the adjustment 
>> to 16MB in
>> most cases, emitting 2 subtracts is best. Larger offset need mov/movk/sub 
>> but that
>> should be extremely rare.
> 
> There is no limit afaik on gold split stack allocation handling,
> and I think one could be added for each backend (in the method
> override require to implement it).
> 
> In fact it is not really required to tie the nop generation with the
> instruction generated by 'aarch64_internal_mov_immediate', it is
> just a matter to simplify linker code.  
> 
> And although 16MB should be rare, nilptr2.go tests allocates 134217824
> so this test fails with this low stack limit.  I am not sure how well
> is the stack usage on 'go', but I think we should at least support
> current testcase scenario.  So for current iteration I kept my
> current approach, but I am open to suggestions.
> 
> 
>>
>>>     nop/movk
>>
>>>     add    x10, sp, x10
>>>     ldr    x9, [x9, 16]
>>
>> Is there any need to detect underflow of x10 or is there a guarantee that 
>> stacks are
>> never allocated in the low 2GB (given the maximum adjustment is 2GB)? It's 
>> safe
>> to do a signed comparison.
> 
> I do not think so, at least none of current backend that implements
> split stack do so.
> 
>>
>>>     cmp    x10, x9
>>>     b.cs    enough
>>
>> Why save/restore x30 and the call x30+8 trick when we could pass the
>> continuation address and use a tailcall? That also avoids emitting extra 
>> unwind info.
>>
>>>     stp    x30, [sp, -16]
>>>     bl     __morestack
>>>     ldp    x30, [sp], 16
>>>     ret
>>
>> This part doesn't make any sense - both x28 and carry flag as an input, and 
>> spread
>> across the prolog - why???
>>
>>> enough:
>>>     mov     x10, sp
>>      [prolog]
>>>     b.cs    continue
>>>     mov     x10, x28
>> continue:
>>      [rest of function]
>>
>> Why not do this?
>>
>> function:
>>      mrs    x9, tpidr_el0
>>      sub    x10, sp, N & 0xfff000
>>      sub    x10, x10, N & 0xfff
>>      ldr    x9, [x9, 16]
>>      adr     x12, main_fn_entry
>>      mov    x11, sp   [if function has stacked arguments]
>>      cmp    x10, x9
>>      b.ge    main_fn_entry
>>      b     __morestack
>> main_fn_entry: [x11 is argument pointer]
>>      [prolog]
>>      [rest of function]
>>
>> In __morestack you need to save x8 as well (another argument register!) and 
>> x12 (the 
>> continuation address). After returning from the call x8 doesn't need to be 
>> preserved.
> 
> Indeed this strategy is way better and I adjusted the code follow it.
> The only change is I am using a:
> 
>       [...]
>       cmp     x9, x10
>       b.lt    main_fn_entr
>       b       __morestack.
>       [...]
> 
> So I can issue a 'cmp <continuation address>, 0' on __morestack to indicate
> the function was called.
> 
>>
>> There are several issues with unwinding in __morestack. x28 is not described 
>> as a callee-save
>> so will be corrupted if unwinding across a __morestack call. This won't 
>> unwind correctly after
>> the ldp as the unwinder will use the restored frame pointer to try to 
>> restore x29/x30:
>>
>> +    ldp     x29, x30, [x28, STACKFRAME_BASE]
>> +    ldr     x28, [x28, STACKFRAME_BASE + 80]
>> +
>> +    .cfi_remember_state
>> +    .cfi_restore 30
>> +    .cfi_restore 29
>> +    .cfi_def_cfa 31, 0
> 
> Indeed, it misses x28 save/restore. I think I have added the missing bits, 
> but I
> must confess that I am not well versed in CFI directives.  I will appreciate 
> if 
> you could help me on this new version.
> 
>>
>> This stores a random x30 value on the stack, what is the purpose of this? 
>> Nothing can unwind
>> to here:
>>
>> +    # Start using new stack
>> +    stp     x29, x30, [x0, -16]!
>> +    mov     sp, x0
>>
>> Also we no longer need split_stack_arg_pointer_used_p () or any code that 
>> uses it (functions
>> that don't have any arguments passed on the stack could omit the mov x11, 
>> sp).
> 
> Right, we new strategy you proposed to do a branch this is indeed not
> really required.  I remove it from on this new patch.
> 
>>
>> Wilco
>>

Re: [PATCH v2] aarch64: Add split-stack initial support

Reply via email to