On Sat, 16 Feb 2019, Jann Horn wrote:
> On Sat, Feb 16, 2019 at 12:59 AM <ba...@gandi.net> wrote:
> > When extracting an initramfs, a filename may be near an allocation boundary.
> > Should that happen, strncopy_from_user will invoke unsafe_get_user which
> > may cross the allocation boundary. Should that happen, unsafe_get_user will
> > trigger a page fault, and strncopy_from_user would then bailout to
> > byte_at_a_time behavior.
> >
> > unsafe_get_user is unsafe by nature, and rely on pagefault to detect
> > boundaries.
> > After 9da3f2b74054 ("x86/fault: BUG() when uaccess helpers fault on kernel
> > addresses")
> > it may no longer rely on pagefault as the new page fault handler would
> > trigger a BUG().
> >
> > This commit allows unsafe_get_user to explicitly trigger pagefaults and
> > handle them directly with the error target label.
>
> Oof. So basically the init code is full of things that just call
> syscalls instead of using VFS functions (which don't actually exist
> for everything), and the VFS syscalls use getname_flags(), which uses
> strncpy_from_user(), which can access out-of-bounds pages on
> architectures that set CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, and
> that in summary means that all the init code is potentially prone to
> tripping over this?

Not all init code. It should be only the initramfs decompression.

> I don't particularly like this approach to fixing it, but I also don't
> have any better ideas, so I guess unless someone else has a bright
> idea, this patch might have to go in.

So we know that this happens in the context of decompress() which calls
flush_buffer() for every chunk. flush_buffer() gets the start_address and
the length. We also know that the fault can only happen within:

	start_address <= fault_address < start_address + length + 8;

So something like the untested workaround below should cover the initramfs
oddity and avoid weakening the protection for all other cases.

Thanks,

	tglx

8<---------------
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -1,5 +1,6 @@
 #include <linux/extable.h>
 #include <linux/uaccess.h>
+#include <linux/initrd.h>
 #include <linux/sched/debug.h>
 #include <xen/xen.h>
 
@@ -161,6 +162,14 @@ static bool bogus_uaccess(struct pt_regs
 	if (current->kernel_uaccess_faults_ok)
 		return false;
 
+	/*
+	 * initramfs decompression can trigger a fault when
+	 * unsafe_get_user() goes over the boundary of the buffer. That's a
+	 * valid case for e.g. strncpy_from_user().
+	 */
+	if (initramfs_fault_in_decompress(fault_addr))
+		return false;
+
 	/* This is bad. Refuse the fixup so that we go into die(). */
 	if (trapnr == X86_TRAP_PF) {
 		pr_emerg("BUG: pagefault on kernel address 0x%lx in non-whitelisted uaccess\n",
--- a/include/linux/initrd.h
+++ b/include/linux/initrd.h
@@ -1,5 +1,8 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 
+#ifndef LINUX_INITRD_H
+#define LINUX_INITRD_H
+
 #define INITRD_MINOR 250 /* shouldn't collide with /dev/ram* too soon ... */
 
 /* 1 = load ramdisk, 0 = don't load */
@@ -25,3 +28,14 @@ extern phys_addr_t phys_initrd_start;
 extern unsigned long phys_initrd_size;
 
 extern unsigned int real_root_dev;
+
+#ifdef CONFIG_BLK_DEV_INITRD
+bool initramfs_fault_in_decompress(unsigned long addr);
+#else
+static inline bool initramfs_fault_in_decompress(unsigned long addr)
+{
+	return false;
+}
+#endif
+
+#endif
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -403,13 +403,27 @@ static __initdata int (*actions[])(void)
 	[Reset]		= do_reset,
 };
 
+static unsigned long flush_start;
+static unsigned long flush_length;
+
+bool initramfs_fault_in_decompress(unsigned long addr)
+{
+	return addr >= flush_start && addr < flush_start + flush_length + 8;
+}
+
 static long __init write_buffer(char *buf, unsigned long len)
 {
+	/* Store address and length for uaccess fault handling */
+	flush_start = (unsigned long) buf;
+	flush_length = len;
+
 	byte_count = len;
 	victim = buf;
 
 	while (!actions[state]())
 		;
+	/* Clear the uaccess fault handling region */
+	flush_start = flush_length = 0;
 	return len - byte_count;
 }
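
For illustration only, the range check in initramfs_fault_in_decompress() can be exercised outside the kernel. The sketch below is plain user-space C, not kernel code: struct fault_window, WORD_SLACK and fault_in_window() are hypothetical names standing in for flush_start, flush_length and the literal 8 in the patch above. It assumes the slack of 8 covers a word-sized read that starts inside the buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the flush_start / flush_length pair. */
struct fault_window {
	uintptr_t start;	/* first byte of the buffer being flushed */
	size_t length;		/* number of valid bytes in the buffer */
};

/* Slack past the end of the buffer, same value as in the patch. */
#define WORD_SLACK 8

/*
 * Returns true when addr lies in [start, start + length + WORD_SLACK),
 * i.e. the same half-open interval the patch tests.
 */
bool fault_in_window(const struct fault_window *w, uintptr_t addr)
{
	return addr >= w->start && addr < w->start + w->length + WORD_SLACK;
}
```

With start = 0x1000 and length = 0x100, addresses 0x1000 through 0x1107 are accepted, while 0x0fff and 0x1108 are rejected.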