On Fri, May 05, 2017 at 11:17:19AM -0700, Ricardo Neri wrote:
> The feature User-Mode Instruction Prevention present in recent Intel
> processor prevents a group of instructions from being executed with
> CPL > 0. Otherwise, a general protection fault is issued.

This is one of the best opening paragraphs of a commit message I've
read this year! This is how you open: short, succinct, to the point, no
marketing bullshit. Good!

> Rather than relaying this fault to the user space (in the form of a SIGSEGV
> signal), the instructions protected by UMIP can be emulated to provide
> dummy results. This allows to conserve the current kernel behavior and not
> reveal the system resources that UMIP intends to protect (the global
> descriptor and interrupt descriptor tables, the segment selectors of the
> local descriptor table and the task state and the machine status word).
> 
> This emulation is needed because certain applications (e.g., WineHQ and
> DOSEMU2) rely on this subset of instructions to function.
> 
> The instructions protected by UMIP can be split in two groups. Those who

s/who/which/

> return a kernel memory address (sgdt and sidt) and those who return a

ditto.

> value (sldt, str and smsw).
>
> For the instructions that return a kernel memory address, applications
> such as WineHQ rely on the result being located in the kernel memory space.
> The result is emulated as a hard-coded value that, lies close to the top
> of the kernel memory. The limit for the GDT and the IDT are set to zero.

Nice.

> Given that sldt and str are not used in common in programs supported by

You wanna say "in common programs" here? Or "not commonly used in programs" ?

> WineHQ and DOSEMU2, they are not emulated.
> 
> The instruction smsw is emulated to return the value that the register CR0
> has at boot time as set in the head_32.
> 
> Care is taken to appropriately emulate the results when segmentation is
> used. This is, rather than relying on USER_DS and USER_CS, the function

        "That is,... "

> insn_get_addr_ref() inspects the segment descriptor pointed by the
> registers in pt_regs. This ensures that we correctly obtain the segment
> base address and the address and operand sizes even if the user space
> application uses local descriptor table.

Btw, I could very well use all that nice explanation in umip.c too so
that the high-level behavior is documented.

> Cc: Andy Lutomirski <l...@kernel.org>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: H. Peter Anvin <h...@zytor.com>
> Cc: Borislav Petkov <b...@suse.de>
> Cc: Brian Gerst <brge...@gmail.com>
> Cc: Chen Yucong <sla...@gmail.com>
> Cc: Chris Metcalf <cmetc...@mellanox.com>
> Cc: Dave Hansen <dave.han...@linux.intel.com>
> Cc: Fenghua Yu <fenghua...@intel.com>
> Cc: Huang Rui <ray.hu...@amd.com>
> Cc: Jiri Slaby <jsl...@suse.cz>
> Cc: Jonathan Corbet <cor...@lwn.net>
> Cc: Michael S. Tsirkin <m...@redhat.com>
> Cc: Paul Gortmaker <paul.gortma...@windriver.com>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Ravi V. Shankar <ravi.v.shan...@intel.com>
> Cc: Shuah Khan <sh...@kernel.org>
> Cc: Vlastimil Babka <vba...@suse.cz>
> Cc: Tony Luck <tony.l...@intel.com>
> Cc: Paolo Bonzini <pbonz...@redhat.com>
> Cc: Liang Z. Li <liang.z...@intel.com>
> Cc: Alexandre Julliard <julli...@winehq.org>
> Cc: Stas Sergeev <s...@list.ru>
> Cc: x...@kernel.org
> Cc: linux-msdos@vger.kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calde...@linux.intel.com>
> ---
>  arch/x86/include/asm/umip.h |  15 +++
>  arch/x86/kernel/Makefile    |   1 +
>  arch/x86/kernel/umip.c      | 245 
> ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 261 insertions(+)
>  create mode 100644 arch/x86/include/asm/umip.h
>  create mode 100644 arch/x86/kernel/umip.c
> 
> diff --git a/arch/x86/include/asm/umip.h b/arch/x86/include/asm/umip.h
> new file mode 100644
> index 0000000..077b236
> --- /dev/null
> +++ b/arch/x86/include/asm/umip.h
> @@ -0,0 +1,15 @@
> +#ifndef _ASM_X86_UMIP_H
> +#define _ASM_X86_UMIP_H
> +
> +#include <linux/types.h>
> +#include <asm/ptrace.h>
> +
> +#ifdef CONFIG_X86_INTEL_UMIP
> +bool fixup_umip_exception(struct pt_regs *regs);
> +#else
> +static inline bool fixup_umip_exception(struct pt_regs *regs)
> +{
> +     return false;
> +}

Let's save some header lines:

static inline bool fixup_umip_exception(struct pt_regs *regs)   { return false; 
}

those trunks take too much space as it is.

> +#endif  /* CONFIG_X86_INTEL_UMIP */
> +#endif  /* _ASM_X86_UMIP_H */
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 4b99423..cc1b7cc 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -123,6 +123,7 @@ obj-$(CONFIG_EFI)                 += sysfb_efi.o
>  obj-$(CONFIG_PERF_EVENTS)            += perf_regs.o
>  obj-$(CONFIG_TRACING)                        += tracepoint.o
>  obj-$(CONFIG_SCHED_MC_PRIO)          += itmt.o
> +obj-$(CONFIG_X86_INTEL_UMIP)         += umip.o
>  
>  ifdef CONFIG_FRAME_POINTER
>  obj-y                                        += unwind_frame.o
> diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
> new file mode 100644
> index 0000000..c7c5795
> --- /dev/null
> +++ b/arch/x86/kernel/umip.c
> @@ -0,0 +1,245 @@
> +/*
> + * umip.c Emulation for instruction protected by the Intel User-Mode
> + * Instruction Prevention. The instructions are:
> + *    sgdt
> + *    sldt
> + *    sidt
> + *    str
> + *    smsw
> + *
> + * Copyright (c) 2017, Intel Corporation.
> + * Ricardo Neri <ricardo.n...@linux.intel.com>
> + */
> +
> +#include <linux/uaccess.h>
> +#include <asm/umip.h>
> +#include <asm/traps.h>
> +#include <asm/insn.h>
> +#include <asm/insn-eval.h>
> +#include <linux/ratelimit.h>
> +
> +/*
> + * == Base addresses of GDT and IDT
> + * Some applications to function rely finding the global descriptor table 
> (GDT)

That formulation reads funny.

> + * and the interrupt descriptor table (IDT) in kernel memory.
> + * For x86_32, the selected values do not match any particular hole, but it
> + * suffices to provide a memory location within kernel memory.
> + *
> + * == CRO flags for SMSW
> + * Use the flags given when booting, as found in head_32.S
> + */
> +
> +#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | \
> +                X86_CR0_WP | X86_CR0_AM)

Why not pull those up in asm/processor-flags.h or so and share the
definition instead of duplicating it?

> +#define UMIP_DUMMY_GDT_BASE 0xfffe0000
> +#define UMIP_DUMMY_IDT_BASE 0xffff0000
> +
> +enum umip_insn {
> +     UMIP_SGDT = 0,  /* opcode 0f 01 ModR/M reg 0 */
> +     UMIP_SIDT,      /* opcode 0f 01 ModR/M reg 1 */
> +     UMIP_SLDT,      /* opcode 0f 00 ModR/M reg 0 */
> +     UMIP_SMSW,      /* opcode 0f 01 ModR/M reg 4 */
> +     UMIP_STR,       /* opcode 0f 00 ModR/M reg 1 */

Let's stick to a single spelling: ModRM.reg=0, etc.

Better yet, use the SDM format:

        UMIP_SGDT = 0,          /* 0F 01 /0 */
        UMIP_SIDT,              /* 0F 01 /1 */
        ...

> +};
> +
> +/**
> + * __identify_insn() - Identify a UMIP-protected instruction
> + * @insn:    Instruction structure with opcode and ModRM byte.
> + *
> + * From the instruction opcode and the reg part of the ModRM byte, identify,
> + * if any, a UMIP-protected instruction.
> + *
> + * Return: an enumeration of a UMIP-protected instruction; -EINVAL on 
> failure.
> + */
> +static int __identify_insn(struct insn *insn)

static enum umip_insn __identify_insn(...

But frankly, that enum looks pointless to me - it is used locally only
and you can just as well use plain ints.

> +{
> +     /* By getting modrm we also get the opcode. */
> +     insn_get_modrm(insn);
> +
> +     /* All the instructions of interest start with 0x0f. */
> +     if (insn->opcode.bytes[0] != 0xf)
> +             return -EINVAL;
> +
> +     if (insn->opcode.bytes[1] == 0x1) {
> +             switch (X86_MODRM_REG(insn->modrm.value)) {
> +             case 0:
> +                     return UMIP_SGDT;
> +             case 1:
> +                     return UMIP_SIDT;
> +             case 4:
> +                     return UMIP_SMSW;
> +             default:
> +                     return -EINVAL;
> +             }
> +     }
> +     /* SLDT AND STR are not emulated */
> +     return -EINVAL;
> +}
> +
> +/**
> + * __emulate_umip_insn() - Emulate UMIP instructions with dummy values
> + * @insn:    Instruction structure with ModRM byte
> + * @umip_inst:       Instruction to emulate
> + * @data:    Buffer onto which the dummy values will be copied
> + * @data_size:       Size of the emulated result
> + *
> + * Emulate an instruction protected by UMIP. The result of the emulation
> + * is saved in the provided buffer. The size of the results depends on both
> + * the instruction and type of operand (register vs memory address). Thus,
> + * the size of the result needs to be updated.
> + *
> + * Result: 0 if success, -EINVAL on failure to emulate
> + */
> +static int __emulate_umip_insn(struct insn *insn, enum umip_insn umip_inst,
> +                            unsigned char *data, int *data_size)
> +{
> +     unsigned long dummy_base_addr;
> +     unsigned short dummy_limit = 0;
> +     unsigned int dummy_value = 0;
> +
> +     switch (umip_inst) {
> +     /*
> +      * These two instructions return the base address and limit of the
> +      * global and interrupt descriptor table. The base address can be
> +      * 24-bit, 32-bit or 64-bit. Limit is always 16-bit. If the operand
> +      * size is 16-bit the returned value of the base address is supposed
> +      * to be a zero-extended 24-byte number. However, it seems that a
> +      * 32-byte number is always returned in legacy protected mode
> +      * irrespective of the operand size.
> +      */
> +     case UMIP_SGDT:
> +             /* fall through */
> +     case UMIP_SIDT:
> +             if (umip_inst == UMIP_SGDT)
> +                     dummy_base_addr = UMIP_DUMMY_GDT_BASE;
> +             else
> +                     dummy_base_addr = UMIP_DUMMY_IDT_BASE;
> +             if (X86_MODRM_MOD(insn->modrm.value) == 3) {
> +                     /* SGDT and SIDT do not take register as argument. */

Comment above the if.

> +                     return -EINVAL;
> +             }

So that check needs to go first, then the dummy_base_addr assignment.

> +
> +             memcpy(data + 2, &dummy_base_addr, sizeof(dummy_base_addr));
> +             memcpy(data, &dummy_limit, sizeof(dummy_limit));
> +             *data_size = sizeof(dummy_base_addr) + sizeof(dummy_limit);

Huh, that value will always be the same - why do you have a specific
variable? It could be a define, once for 32-bit and once for 64-bit.

> +             break;
> +     case UMIP_SMSW:
> +             /*
> +              * Even though CR0_STATE contain 4 bytes, the number
> +              * of bytes to be copied in the result buffer is determined
> +              * by whether the operand is a register or a memory location.
> +              */
> +             dummy_value = CR0_STATE;

Something's wrong here: how does that local, write-only variable have
any effect?

> +             /*
> +              * These two instructions return a 16-bit value. We return
> +              * all zeros. This is equivalent to a null descriptor for
> +              * str and sldt.
> +              */
> +             /* SLDT and STR are not emulated */
> +             /* fall through */
> +     case UMIP_SLDT:
> +             /* fall through */
> +     case UMIP_STR:
> +             /* fall through */
> +     default:
> +             return -EINVAL;

That switch-case has a majority of fall-throughs. So make it an if-else
instead.

> +     }
> +     return 0;
> +}
> +
> +/**
> + * fixup_umip_exception() - Fixup #GP faults caused by UMIP
> + * @regs:    Registers as saved when entering the #GP trap
> + *
> + * The instructions sgdt, sidt, str, smsw, sldt cause a general protection
> + * fault if with CPL > 0 (i.e., from user space). This function can be
> + * used to emulate the results of the aforementioned instructions with
> + * dummy values. Results are copied to user-space memory as indicated by
> + * the instruction pointed by EIP using the registers indicated in the
> + * instruction operands. This function also takes care of determining
> + * the address to which the results must be copied.
> + */
> +bool fixup_umip_exception(struct pt_regs *regs)
> +{
> +     struct insn insn;
> +     unsigned char buf[MAX_INSN_SIZE];
> +     /* 10 bytes is the maximum size of the result of UMIP instructions */
> +     unsigned char dummy_data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

        unsigned char dummy_data[10] = { 0 };

One 0 should be enough :)

> +     unsigned long seg_base;
> +     int not_copied, nr_copied, reg_offset, dummy_data_size;
> +     void __user *uaddr;
> +     unsigned long *reg_addr;
> +     enum umip_insn umip_inst;
> +     struct insn_code_seg_defaults seg_defs;

Please sort function local variables declaration in a reverse christmas
tree order:

        <type> longest_variable_name;
        <type> shorter_var_name;
        <type> even_shorter;
        <type> i;

> +
> +     /*
> +      * Use the segment base in case user space used a different code
> +      * segment, either in protected (e.g., from an LDT) or virtual-8086
> +      * modes. In most of the cases seg_base will be zero as in USER_CS.
> +      */
> +     seg_base = insn_get_seg_base(regs, &insn,
> +                                  offsetof(struct pt_regs, ip));

Oh boy, where's the error handling?! That can return -1L.

> +     not_copied = copy_from_user(buf, (void __user *)(seg_base + regs->ip),

-1L + regs->ip is then your pwnage.

> +                                 sizeof(buf));

Just let them stick out.

> +     nr_copied = sizeof(buf) - not_copied;

<---- newline here.

> +     /*
> +      * The copy_from_user above could have failed if user code is protected
                            ()

> +      * by a memory protection key. Give up on emulation in such a case.
> +      * Should we issue a page fault?

Why? AFAICT, you're in the #GP handler. Simply you return unhandled.

> +      */
> +     if (!nr_copied)
> +             return false;
> +
> +     insn_init(&insn, buf, nr_copied, user_64bit_mode(regs));
> +
> +     /*
> +      * Override the default operand and address sizes to what is specified
> +      * in the code segment descriptor. The instruction decoder only sets
> +      * the address size it to either 4 or 8 address bytes and does nothing
> +      * for the operand bytes. This OK for most of the cases, but we could
> +      * have special cases where, for instance, a 16-bit code segment
> +      * descriptor is used.
> +      * If there are overrides, the instruction decoder correctly updates
> +      * these values, even for 16-bit defaults.
> +      */
> +     seg_defs = insn_get_code_seg_defaults(regs);
> +     insn.addr_bytes = seg_defs.address_bytes;
> +     insn.opnd_bytes = seg_defs.operand_bytes;
> +
> +     if (!insn.addr_bytes || !insn.opnd_bytes)
> +             return false;
> +
> +     if (user_64bit_mode(regs))
> +             return false;
> +
> +     insn_get_length(&insn);
> +     if (nr_copied < insn.length)
> +             return false;
> +
> +     umip_inst = __identify_insn(&insn);
> +     /* Check if we found an instruction protected by UMIP */

Put comment above the function call.

> +     if (umip_inst < 0)
> +             return false;
> +
> +     if (__emulate_umip_insn(&insn, umip_inst, dummy_data, &dummy_data_size))
> +             return false;
> +
> +     /* If operand is a register, write directly to it */
> +     if (X86_MODRM_MOD(insn.modrm.value) == 3) {
> +             reg_offset = insn_get_modrm_rm_off(&insn, regs);

Grr, error handling!! That reg_offset can be -E<something>.

> +             reg_addr = (unsigned long *)((unsigned long)regs + reg_offset);
> +             memcpy(reg_addr, dummy_data, dummy_data_size);
> +     } else {
> +             uaddr = insn_get_addr_ref(&insn, regs);
> +             /* user address could not be determined, abort emulation */

That comment is kinda obvious. But yes, this has error handling.

> +             if ((unsigned long)uaddr == -1L)
> +                     return false;
> +             nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size);
> +             if (nr_copied  > 0)
> +                     return false;
> +     }
> +
> +     /* increase IP to let the program keep going */
> +     regs->ip += insn.length;
> +     return true;
> +}
> -- 
> 2.9.3
> 

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imend├Ârffer, Jane Smithard, Graham Norton, HRB 21284 
(AG N├╝rnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-msdos" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to