On Tue, Sep 08, 2020 at 10:55:18AM -0700, Tony Luck wrote:
> From: Youquan Song <[email protected]>
> 
> Existing kernel code can only recover from a machine check on code that
> tagged in the exception table with a fault handling recovery path.

"is tagged"

> New field in the task structure mce_vaddr is initialized to the
> user virtual address of the fault. This is so that kill_me_maybe()
> can provide that information to the user SIGBUS handler.
> 
> Add code to recover from a machine check while copying data from user
> space to the kernel. Action for this case is the same as if the user
> touched the poison directly; unmap the page and send a SIGBUS to the task.
> 
> Signed-off-by: Youquan Song <[email protected]>
> Signed-off-by: Tony Luck <[email protected]>
> ---
>  arch/x86/kernel/cpu/mce/core.c | 51 ++++++++++++++++++++++++++++++++++
>  include/linux/sched.h          |  1 +
>  2 files changed, 52 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5512318a07ae..2a3c42329c3f 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -53,6 +53,8 @@
>  #include <asm/mce.h>
>  #include <asm/msr.h>
>  #include <asm/reboot.h>
> +#include <asm/insn.h>
> +#include <asm/insn-eval.h>
>  
>  #include "internal.h"
>  
> @@ -1197,6 +1199,32 @@ static void kill_me_maybe(struct callback_head *cb)
>       kill_me_now(cb);
>  }
>  
> +/*
> + * Decode a kernel instruction that faulted while reading from a user
> + * address and return the linear address that was being read.
> + */
> +static void __user *get_virtual_address(struct pt_regs *regs)
> +{
> +     u8 insn_buf[MAX_INSN_SIZE];
> +     struct insn insn;
> +
> +     if (copy_from_kernel_nofault(insn_buf, (void *)regs->ip, MAX_INSN_SIZE))
> +             return (void __user *)~0ul;

You're initializing ->mce_vaddr to NULL below but you're returning ~0
here. You should return NULL here too. If it is NULL, this check from
your next patch will pass:

        if (p->mce_vaddr != (void __user *)~0ul) {

which would be the wrong thing to do so you need to think about a single
invalid vaddr value and stick with it.

> +     kernel_insn_init(&insn, insn_buf, MAX_INSN_SIZE);
> +     insn_get_length(&insn);
> +     insn_get_modrm(&insn);
> +     insn_get_sib(&insn);

AFAICT, you need the opcode only so why do all those?

I think you simply need to do:

        insn_get_opcode()

and then check opcode->got because otherwise you might be looking at
garbage below.

> +
> +     /*
> +      * For MOVS[BWLQ] the source address is in %rsi

Pls end your sentences with a fullstop.

> +      */
> +     if (insn.opcode.value == 0xa4 || insn.opcode.value == 0xa5)
> +             return (void __user *)regs->si;

How do you know just by looking at the opcodes, that the source operand
in rSI is __user memory?

I see is_copy_from_user() in your next patch so I guess I'll verify that
there...

> +     else
> +             return insn_get_addr_ref(&insn, regs);
> +}
> +
>  /*
>   * The actual machine check handler. This only handles real
>   * exceptions when something got corrupted coming in through int 18.
> @@ -1342,6 +1370,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
>               /* If this triggers there is no way to recover. Die hard. */
>               BUG_ON(!on_thread_stack() || !user_mode(regs));
>  
> +             current->mce_vaddr = NULL;
>               current->mce_addr = m.addr;
>               current->mce_ripv = !!(m.mcgstatus & MCG_STATUS_RIPV);
>               current->mce_whole_page = whole_page(&m);
> @@ -1350,6 +1379,13 @@ noinstr void do_machine_check(struct pt_regs *regs)
>                       current->mce_kill_me.func = kill_me_now;
>               task_work_add(current, &current->mce_kill_me, true);
>       } else {
> +             /*
> +              * Before fixing the exception IP, find the user address
> +              * in the MCE_IN_KERNEL_COPYIN case
                                                   ^
                                                   |-- Fullstop

> +              */
> +             if (m.kflags & MCE_IN_KERNEL_COPYIN)
> +                     current->mce_vaddr = get_virtual_address(regs);
> +
>               /*
>                * Handle an MCE which has happened in kernel space but from
>                * which the kernel can recover: ex_has_fault_handler() has
> @@ -1363,6 +1399,21 @@ noinstr void do_machine_check(struct pt_regs *regs)
>                       if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
>                               mce_panic("Failed kernel mode recovery", &m, 
> msg);
>               }
> +
> +             /*
> +              * MCE on user data while copying to kernel. Action here is
> +              * very similar to the user hitting the poison themself.
> +              * Poison page will be unmapped and signal sent to process.
> +              */
> +             if (m.kflags & MCE_IN_KERNEL_COPYIN) {
> +                     current->mce_addr = m.addr;
> +                     current->mce_ripv = !!(m.mcgstatus & MCG_STATUS_RIPV);
> +                     current->mce_whole_page = whole_page(&m);
> +                     current->mce_kill_me.func = kill_me_maybe;
> +                     if (kill_it)
> +                             current->mce_kill_me.func = kill_me_now;
> +                     task_work_add(current, &current->mce_kill_me, true);

This hunk is mostly copied from the in-user case above. How about a
"goto recover;" label instead of the duplication?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Reply via email to