This is a shortened version of the series "x86: Enable User-Mode Instruction Prevention (UMIP)". This series only includes the code that is used to compute 64-bit linear addresses with and without segmentation, plus the handling of the special cases of each addressing mode.
The purpose of this series is to gather all the patches that have been reviewed during the last nine versions of the series and have them merged into the tip tree (hopefully!). Thus far, this code is used by MPX. It will also be used by UMIP once enabled. A separate series will deal with the UMIP emulation code as well as 32-bit and 16-bit addresses. A discussion on UMIP and the need for emulation code can be found here [9]. For reference, the nine previous submissions can be found here [1], here [2], here [3], here [4], here [5], here [6], here [7], here [8] and here [9].

This version addresses the feedback comments from Borislav Petkov received on v9. Please see details in the change log.

=== How is this series laid out?

++ Preparatory work
As per suggestions from Andy Lutomirski and Borislav Petkov, I moved the x86 page fault error codes to a header. Also, I made user_64bit_mode() available to x86_32 builds. This helps to reuse code and reduce the number of #ifdef's in these patches. Borislav also suggested that uprobes should use the existing definitions in arch/x86/include/asm/inat.h instead of hard-coded values when checking instruction prefixes. I included this change in the series.

++ Fix bugs in MPX address decoder
I found the Intel MPX (Memory Protection Extensions) code very useful: it parses opcodes and obtains the memory locations contained in the general purpose registers when they are used as operands. I put this code in a separate library file that MPX, UMIP and potentially others can access, thus avoiding code duplication. Before creating the new library, I fixed several bugs that I found in corner cases of how MPX determines the address contained in the instruction and operands.

++ Provide a new x86 instruction evaluating library
With bugs fixed, the MPX evaluating code is relocated to a new insn-eval.c library.
The basic functionality of this library is extended to obtain the segment descriptor selected by either segment override prefixes or the default segment associated with the registers involved in the calculation of the effective address. It was also extended to obtain the default address and operand sizes as well as the segment base address. Armed with this arsenal, it is now possible to determine the linear address indicated by the operands of an instruction structure. This new library relies on and extends the capabilities of the existing instruction decoder in arch/x86/lib/insn.c.

++ Extensive tests
Extensive tests were performed to test all the combinations of ModRM, SIB and displacements for 64-bit addresses, including segmentation via the FS and GS segment registers. For this purpose, I relied on a CPU that features UMIP support for 64-bit processes. Emulation could also be tested by using instructions that cause a #GP in readily available systems (e.g., use lgdt instead of sgdt). This change is not part of this patchset. Code of these tests can be found here [13].

++ Merging this series?
As stated, this series contains code that has been reasonably reviewed through 9 versions. It should be in good condition to be merged [14].

[1]. https://lwn.net/Articles/705877/
[2]. https://lkml.org/lkml/2016/12/23/265
[3]. https://lkml.org/lkml/2017/1/25/622
[4]. https://lkml.org/lkml/2017/2/23/40
[5]. https://lkml.org/lkml/2017/3/3/678
[6]. https://lkml.org/lkml/2017/3/7/866
[7]. https://lkml.org/lkml/2017/5/5/398
[8]. https://lkml.org/lkml/2017/8/18/992
[9]. https://lkml.org/lkml/2017/10/3/1066
[13]. https://github.com/01org/luv-yocto/tree/rneri/umip/meta-luv/recipes-core/umip/files
[14]. https://lkml.org/lkml/2017/10/20/763

Thanks and BR,
Ricardo

Changes since V9:
*Shortened the series to group the patches that have been reviewed thus far. This comprises the code to resolve 64-bit linear addresses with segmentation.
*Reworked the handling of segment resolution for rIP.
This is the only case in which we don't have a valid instruction structure.
*Added a new function get_seg_base_addr() that resolves the segment register and finds its associated address.
*Added a new function resolve_default_seg() to determine the default segment associated with a given register. This further simplifies the function resolve_seg_reg().
*Renamed function get_overridden_seg_reg_idx() as get_seg_reg_override_idx() and several automatic variables of such function.
*Renamed function allow_seg_reg_overrides() as check_seg_overrides().
*Renamed the function insn_get_code_seg_defaults() as insn_get_code_seg_params().

Changes since V8:
*Simplified error handling in the family of get_addr_ref_xx functions by initializing the linear address to -1L.
*Reworded the commit that #defines an initial state of CR0 and removed an unneeded comment.
*Reworked get_desc() to get rid of one mutex_unlock(). Used a new local variable to improve readability.
*Reworked the utility functions used to obtain the segment selector:
 + get_overridden_seg_reg_idx() now only inspects the instruction to find segment override prefixes.
 + A new function allow_seg_reg_overrides() determines if segment override prefixes can be used based on the register operand in use and the nature of the instruction (i.e., string instructions vs not).
 + resolve_seg_reg() uses the two functions above, along with user_64bit_mode(), to resolve the segment register index: overridden, default or ignored.
*Renamed local variables to reflect the fact that our segment registers are indexes and not the actual hardware registers.
*Reworded function documentation for improved readability.

Changes since V7:
*UMIP is not enabled by default.
*Relocated definition of the initial state of CR0 into processor-flags.h
*Updated uprobes to use the autogenerated INAT_PFX_xS definitions instead of hard-coded values.
*In insn-eval.c, refer to segment override prefixes using the autogenerated INAT_PFX_xS definitions.
*Removed enumeration for segment registers that reused the segment override instruction prefixes. Instead, a new, separate set of #defines is used in arch/x86/include/asm/inat.h
*Simplified function to identify string instructions.
*Split the code used to determine the relevant segment register into two functions: one to inspect segment overrides and a second one to determine default segment registers based on the instruction and operands. A third function reads the segment register to obtain the segment selector.
*Reworked arithmetic to compute 32-bit and 64-bit effective addresses. Instead of type casts, a separate function is used for each case.
*Removed structure to hold segment default address and operand sizes. Used #defines instead.
*Corrected bug when determining the limit of a segment.
*Updated various functions to use error codes from errno-base.h
*Replaced printk_ratelimited with pr_err_ratelimited.
*Corrected typos and format errors in functions' documentation.
*Fixed unimplemented handling of emulation of the SMSW instruction.
*Added documentation to file containing implementation for UMIP.
*Improved error handling in fixup_umip_exception() function.

Changes since V6:
*Reworded and added more details on the special cases of ModRM and SIB bytes. To avoid confusion, I omitted mentioning the involved registers (EBP and ESP).
*Replaced BUG() with printk_ratelimited in function get_reg_offset of insn-eval.c
*Removed unused utility functions that obtain a register value from pt_regs given a SIB base and index.
*Clarified nomenclature to call CS, DS, ES, FS, GS and SS segment registers and their values segment selectors.
*Reworked function resolve_seg_register to issue an error when more than one segment override prefix is used in the instruction.
*Added logic in resolve_seg_register to ignore segment register when in long mode and not using FS or GS.
*Added logic to ensure the effective address is within the limits of the segment in protected mode.
*Added logic to ensure segment override prefixes are ignored when resolving the segment of EIP and EDI with string instructions.
*Added code to make user_64bit_mode() available in CONFIG_X86_32... and make it return false, of course.
*Merged the two functions that obtain the default address and operand size of a code segment into one as they are always used together.
*Corrected logic of displacement-only addressing in long mode to make the displacement relative to the RIP of the next instruction.
*Reworked logic to sign-extend 32-bit memory offsets into 64-bit signed memory offsets. This includes more checks and puts everything together in a utility function.
*Removed the 'unlikely' of conditional statements as we are not in a critical path.
*In virtual-8086 mode, ensure that effective addresses are always less than 0x10000, even when address override prefixes are used. Also, ensure that linear addresses have a size of 20 bits.

Changes since V5:
* Relocate the page fault error code enumerations to traps.h

Changes since V4:
* Audited patches to use braces in all the branches of conditional statements, except those in which the conditional action only takes one line.
* Implemented support in 64-bit builds for both 32-bit and 64-bit tasks in the instruction evaluating library.
* Split segment selector function in the instruction evaluating library into two functions: one to resolve the segment type by instruction override or default, and a separate function to actually read the segment selector.
* Fixed a bug when evaluating 32-bit effective addresses with 64-bit kernels.
* Split patches further for easier review.
* Use signed variables for computation of effective address.
* Fixed issue with a spurious static modifier in function insn_get_addr_ref found by kbuild test bot.
* Removed comparison between true and fixup_umip_exception.
* Reworked check logic when identifying erroneous vs invalid values of the SIB base and index.
Changes since V3:
* Limited emulation to 32-bit and 16-bit modes. For 64-bit mode, a general protection fault is still issued when UMIP-protected instructions are executed with CPL > 0.
* Expanded instruction-evaluating code to obtain segment descriptors along with their attributes such as base address and default address and operand sizes. Also, support for 16-bit encodings in protected mode was implemented.
* When getting a segment descriptor, support is included to obtain those of a local descriptor table.
* Now the instruction-evaluating code returns -EDOM when the value of registers should not be used in calculating the effective address. The value -EINVAL is left for errors.
* Incorporate the value of the segment base address in the computation of linear addresses.
* Renamed new instruction evaluation library from insn-kernel.c to insn-eval.c
* Exported functions insn_get_reg_offset_* to obtain the register offset by ModRM r/m, SIB base and SIB index.
* Improved documentation of functions.
* Split patches further for easier review.

Changes since V2:
* Added new utility functions to decode the memory addresses contained in registers when the 16-bit addressing encodings are used. This includes code to obtain and compute memory addresses using segment selectors for real-mode address translation.
* Added support to emulate UMIP-protected instructions for virtual-8086 tasks.
* Added self-tests for virtual-8086 mode that contain representative use cases: address represented as a displacement, address in registers and registers as operands.
* Instead of maintaining a static variable for the dummy base addresses of the IDT and GDT, a hard-coded value is used.
* The emulated SMSW instructions now return the value with which the CR0 register is programmed in head_32/64.S, that is: PE | MP | ET | NE | WP | AM. For x86_64, PG is also enabled.
* The new file arch/x86/lib/insn-utils.c is now renamed as arch/x86/lib/insn-kernel.c. It also has its own header.
This helps keep the kernel and objtool instruction decoders in sync. Also, the new insn-kernel.c contains utility functions that are only relevant in a kernel context.
* Removed printed warnings for errors that occur when decoding instructions with invalid operands.
* Added more comments on fixes in the instruction-decoding MPX functions.
* Now user_64bit_mode(regs) is used instead of test_thread_flag(TIF_IA32) to determine if the task is 32-bit or 64-bit.
* Found and fixed a bug in insn-decoder in which X86_MODRM_RM was incorrectly used to obtain the mod part of the ModRM byte.
* Added more explanatory code in emulation and instruction decoding code. This includes a comment noting that copy_from_user could fail if there exists a memory protection key in place.
* Tested code with CONFIG_X86_DECODER_SELFTEST=y and everything passes now.
* Prefixed get_reg_offset_rm with insn_ as this function is exposed via a header file. For clarity, this function was added in a separate patch.

Changes since V1:
* Virtual-8086 mode tasks are not treated in a special manner. All code for this purpose was removed.
* Instead of attempting to disable UMIP during a context switch or when entering virtual-8086 mode, UMIP remains enabled all the time. General protection faults that occur are fixed up by returning dummy values as detailed above.
* Removed umip= kernel parameter in favor of using clearcpuid=514 to disable UMIP.
* Removed selftests designed to detect the absence of SIGSEGV signals when running in virtual-8086 mode.
* Reused code from MPX to decode instruction operands. For this purpose, code was put in a common location.
* Fixed two bugs in MPX code that decodes operands.
Ricardo Neri (18):
  x86/mm: Relocate page fault error codes to traps.h
  x86/boot: Relocate definition of the initial state of CR0
  ptrace,x86: Make user_64bit_mode() available to 32-bit builds
  uprobes/x86: Use existing definitions for segment override prefixes
  x86/mpx: Simplify handling of errors when computing linear addresses
  x86/mpx: Use signed variables to compute effective addresses
  x86/mpx: Do not use SIB.index if its value is 100b and ModRM.mod is not 11b
  x86/mpx: Do not use SIB.base if its value is 101b and ModRM.mod = 0
  x86/mpx, x86/insn: Relocate insn util functions to a new insn-eval file
  x86/insn-eval: Do not BUG on invalid register type
  x86/insn-eval: Add a utility function to get register offsets
  x86/insn-eval: Add utility function to identify string instructions
  x86/insn-eval: Add utility functions to get segment selector
  x86/insn-eval: Add utility function to get segment descriptor
  x86/insn-eval: Add utility functions to get segment descriptor base address and limit
  x86/insn-eval: Add function to get default params of code segment
  x86/insn-eval: Indicate a 32-bit displacement if ModRM.mod is 0 and ModRM.rm is 101b
  x86/insn-eval: Incorporate segment base in linear address computation

 arch/x86/include/asm/inat.h                 |  10 +
 arch/x86/include/asm/insn-eval.h            |  23 +
 arch/x86/include/asm/ptrace.h               |   6 +-
 arch/x86/include/asm/traps.h                |  18 +
 arch/x86/include/uapi/asm/processor-flags.h |   3 +
 arch/x86/kernel/head_32.S                   |   3 -
 arch/x86/kernel/head_64.S                   |   3 -
 arch/x86/kernel/uprobes.c                   |  15 +-
 arch/x86/lib/Makefile                       |   2 +-
 arch/x86/lib/insn-eval.c                    | 854 ++++++++++++++++++++++++++++
 arch/x86/mm/fault.c                         |  88 ++-
 arch/x86/mm/mpx.c                           | 120 +---
 12 files changed, 959 insertions(+), 186 deletions(-)
 create mode 100644 arch/x86/include/asm/insn-eval.h
 create mode 100644 arch/x86/lib/insn-eval.c

--
2.7.4