On Mon, Jun 20, 2016 at 9:51 AM, Borislav Petkov <[email protected]> wrote:
> On Sat, Jun 18, 2016 at 04:56:17PM -0400, Brian Gerst wrote:
>> Instead of setting up a fake pt_regs context, put the kernel thread
>> function pointer and arg into the unused callee-restored registers
>> of struct fork_frame.
>>
>> Signed-off-by: Brian Gerst <[email protected]>
>
>> ---
>>  arch/x86/entry/entry_32.S    | 31 +++++++++++++++----------------
>>  arch/x86/entry/entry_64.S    | 35 ++++++++++++++++-------------------
>>  arch/x86/kernel/process_32.c | 14 ++++----------
>>  arch/x86/kernel/process_64.c | 10 +++-------
>>  4 files changed, 38 insertions(+), 52 deletions(-)
>
> ...
>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index 7574528..23e764c 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -404,37 +404,34 @@ END(__switch_to_asm)
>>   * A newly forked process directly context switches into this address.
>>   *
>>   * rax: prev task we switched from
>> + * rbx: kernel thread func
>> + * r12: kernel thread arg
>>   */
>>  ENTRY(ret_from_fork)
>>       movq    %rax, %rdi
>>       call    schedule_tail                   /* rdi: 'prev' task parameter 
>> */
>>
>> -     testb   $3, CS(%rsp)                    /* from kernel_thread? */
>> +     testq   %rbx, %rbx                      /* from kernel_thread? */
>>       jnz     1f
>>
>> -     /*
>> -      * We came from kernel_thread.  This code path is quite twisted, and
>> -      * someone should clean it up.
>> -      *
>> -      * copy_thread_tls stashes the function pointer in RBX and the
>> -      * parameter to be passed in RBP.  The called function is permitted
>> -      * to call do_execve and thereby jump to user mode.
>> -      */
>> -     movq    RBP(%rsp), %rdi
>> -     call    *RBX(%rsp)
>> -     movl    $0, RAX(%rsp)
>> -
>> -     /*
>> -      * Fall through as though we're exiting a syscall.  This makes a
>> -      * twisted sort of sense if we just called do_execve.
>> -      */
>> -
>> -1:
>> +2:
>>       movq    %rsp, %rdi
>>       call    syscall_return_slowpath /* returns with IRQs disabled */
>>       TRACE_IRQS_ON                   /* user mode is traced as IRQS on */
>>       SWAPGS
>>       jmp     restore_regs_and_iret
>> +
>> +1:
>> +     /* kernel thread */
>> +     movq    %r12, %rdi
>> +     call    *%rbx
>> +     /*
>> +      * A kernel thread is allowed to return here after successfully
>> +      * calling do_execve().  Exit to userspace to complete the execve()
>> +      * syscall.
>> +      */
>> +     movq    $0, RAX(%rsp)
>> +     jmp     2b
>>  END(ret_from_fork)
>
> Wouldn't it be simpler to reverse the JNZ after the TESTB above and thus
> have a single label and simpler code flow (pasting the whole thing here
> because a diff is unreadable):
>
> ENTRY(ret_from_fork)
>         movq    %rax, %rdi
>         call    schedule_tail                   /* rdi: 'prev' task parameter 
> */
>
>         testq   %rbx, %rbx                      /* from kernel_thread? */
>         jz      1f
>
>         /* kernel thread */
>         movq    %r12, %rdi
>         call    *%rbx
>         /*
>          * A kernel thread is allowed to return here after successfully
>          * calling do_execve().  Exit to userspace to complete the execve()
>          * syscall.
>          */
>         movq    $0, RAX(%rsp)
>
> 1:
>         movq    %rsp, %rdi
>         call    syscall_return_slowpath /* returns with IRQs disabled */
>         TRACE_IRQS_ON                   /* user mode is traced as IRQS on */
>         SWAPGS
>         jmp     restore_regs_and_iret
> END(ret_from_fork)
>
> It boots fine in my guest this way too.

The idea was to put the uncommon case (kernel thread) out of line for
performance reasons.

--
Brian Gerst

Reply via email to