On Thu, Mar 04, 2021 at 11:05:57AM -0800, Andy Lutomirski wrote:
> For kernel threads, task_pt_regs is currently all zeros, a valid user state
> (if kernel_execve() has been called), or some combination thereof during
> execution of kernel_execve().  If a stack trace is printed, the unwinder
> might get confused and treat task_pt_regs as a kernel state.  Indeed,
> forcing a stack dump results in an attempt to display _kernel_ code bytes
> from a bogus address at the very top of kernel memory.
> 
> Fix the confusion by setting cs=3 so that user_mode(task_pt_regs(...)) ==
> true for kernel threads.
> 
> Also improve the error when failing to fetch code bytes to show which type
> of fetch failed.  This makes it much easier to understand what's happening.
> 
> Cc: Josh Poimboeuf <jpoim...@redhat.com>
> Signed-off-by: Andy Lutomirski <l...@kernel.org>
> ---
>  arch/x86/kernel/dumpstack.c |  4 ++--
>  arch/x86/kernel/process.c   | 13 +++++++++++++
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> index 55cf3c8325c6..9b7b69bb12e5 100644
> --- a/arch/x86/kernel/dumpstack.c
> +++ b/arch/x86/kernel/dumpstack.c
> @@ -128,8 +128,8 @@ void show_opcodes(struct pt_regs *regs, const char 
> *loglvl)
>               /* No access to the user space stack of other tasks. Ignore. */
>               break;
>       default:
> -             printk("%sCode: Unable to access opcode bytes at RIP 0x%lx.\n",
> -                    loglvl, prologue);
> +             printk("%sCode: Unable to access %s opcode bytes at RIP 
> 0x%lx.\n",
> +                    loglvl, user_mode(regs) ? "user" : "kernel", prologue);
>               break;
>       }
>  }
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 145a7ac0c19a..f6f16df04cb9 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -163,6 +163,19 @@ int copy_thread(unsigned long clone_flags, unsigned long 
> sp, unsigned long arg,
>       /* Kernel thread ? */
>       if (unlikely(p->flags & PF_KTHREAD)) {
>               memset(childregs, 0, sizeof(struct pt_regs));
> +
> +             /*
> +              * Even though there is no real user state here, these
> +              * are were user regs belong, and kernel_execve() will
                       ^^^^^
                       where?

Ira

> +              * overwrite them with user regs.  Put an obviously
> +              * invalid value that nonetheless causes user_mode(regs)
> +              * to return true in CS.
> +              *
> +              * This also prevents the unwinder from thinking that there
> +              * is invalid kernel state at the top of the stack.
> +              */
> +             childregs->cs = 3;
> +
>               kthread_frame_init(frame, sp, arg);
>               return 0;
>       }
> -- 
> 2.29.2
> 

Reply via email to