From: Dave Hansen <[email protected]>
The kernel image is mapped into two places in the virtual address
space (addresses without KASLR, of course):
1. The kernel direct map (0xffff880000000000)
2. The "high kernel map" (0xffffffff81000000)
We actually execute out of #2. If we get the address of a kernel
symbol, it points to #2, but almost all physical-to-virtual
translations point to #1.
Parts of the "high kernel map" alias are mapped in the userspace
page tables with the Global bit for performance reasons. The
parts that we map to userspace do not (er, should not) have
secrets.
This is fine, except that some areas in the kernel image that
are adjacent to the non-secret-containing areas are unused holes.
We free these holes back into the normal page allocator and
reuse them as normal kernel memory. The memory will, of course,
get *used* via the normal map, but the alias mapping is kept.
This otherwise unused alias mapping of the holes will, by default,
keep the Global bit, be mapped out to userspace, and be
vulnerable to Meltdown.
Remove the alias mapping of these pages entirely. This is likely
to fracture the 2M page mapping the kernel image near these areas,
but this should affect a minority of the area.
This unmapping behavior is currently dependent on PTI being in
place. Going forward, we should at least consider doing this for
all configurations. Having an extra read-write alias for memory
is not exactly ideal for debugging things like random memory
corruption and this does undercut features like DEBUG_PAGEALLOC
or future work like eXclusive Page Frame Ownership (XPFO).
Before this patch:
current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000   16M                 pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000   14M ro PSE GLB x    pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000   68K ro     GLB x    pte
current_kernel-0xffffffff81e11000-0xffffffff82000000 1980K RW        NX    pte
current_kernel-0xffffffff82000000-0xffffffff82600000    6M ro PSE GLB NX   pmd
current_kernel-0xffffffff82600000-0xffffffff82c00000    6M RW PSE     NX   pmd
current_kernel-0xffffffff82c00000-0xffffffff82e00000    2M RW        NX    pte
current_kernel-0xffffffff82e00000-0xffffffff83200000    4M RW PSE     NX   pmd
current_kernel-0xffffffff83200000-0xffffffffa0000000  462M                 pmd
current_user:---[ High Kernel Mapping ]---
current_user-0xffffffff80000000-0xffffffff81000000   16M                 pmd
current_user-0xffffffff81000000-0xffffffff81e00000   14M ro PSE GLB x    pmd
current_user-0xffffffff81e00000-0xffffffff81e11000   68K ro     GLB x    pte
current_user-0xffffffff81e11000-0xffffffff82000000 1980K RW        NX    pte
current_user-0xffffffff82000000-0xffffffff82600000    6M ro PSE GLB NX   pmd
current_user-0xffffffff82600000-0xffffffffa0000000  474M                 pmd
After this patch:
current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000   16M                 pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000   14M ro PSE GLB x    pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000   68K ro     GLB x    pte
current_kernel-0xffffffff81e11000-0xffffffff82000000 1980K                 pte
current_kernel-0xffffffff82000000-0xffffffff82400000    4M ro PSE GLB NX   pmd
current_kernel-0xffffffff82400000-0xffffffff82488000  544K ro         NX   pte
current_kernel-0xffffffff82488000-0xffffffff82600000 1504K                 pte
current_kernel-0xffffffff82600000-0xffffffff82c00000    6M RW PSE     NX   pmd
current_kernel-0xffffffff82c00000-0xffffffff82c0d000   52K RW         NX   pte
current_kernel-0xffffffff82c0d000-0xffffffff82dc0000 1740K                 pte
current_user:---[ High Kernel Mapping ]---
current_user-0xffffffff80000000-0xffffffff81000000   16M                 pmd
current_user-0xffffffff81000000-0xffffffff81e00000   14M ro PSE GLB x    pmd
current_user-0xffffffff81e00000-0xffffffff81e11000   68K ro     GLB x    pte
current_user-0xffffffff81e11000-0xffffffff82000000 1980K                 pte
current_user-0xffffffff82000000-0xffffffff82400000    4M ro PSE GLB NX   pmd
current_user-0xffffffff82400000-0xffffffff82488000  544K ro         NX   pte
current_user-0xffffffff82488000-0xffffffff82600000 1504K                 pte
current_user-0xffffffff82600000-0xffffffffa0000000  474M                 pmd
Signed-off-by: Dave Hansen <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andi Kleen <[email protected]>
---
b/arch/x86/mm/init.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff -puN arch/x86/mm/init.c~x86-unmap-freed-areas-from-kernel-image arch/x86/mm/init.c
--- a/arch/x86/mm/init.c~x86-unmap-freed-areas-from-kernel-image	2018-07-30 09:53:14.862915689 -0700
+++ b/arch/x86/mm/init.c 2018-07-30 09:53:14.866915689 -0700
@@ -778,8 +778,26 @@ void free_init_pages(char *what, unsigne
*/
void free_kernel_image_pages(void *begin, void *end)
{
- free_init_pages("unused kernel image",
- (unsigned long)begin, (unsigned long)end);
+ unsigned long begin_ul = (unsigned long)begin;
+ unsigned long end_ul = (unsigned long)end;
+ unsigned long len_pages = (end_ul - begin_ul) >> PAGE_SHIFT;
+
+
+ free_init_pages("unused kernel image", begin_ul, end_ul);
+
+ /*
+ * PTI maps some of the kernel into userspace. For
+ * performance, this includes some kernel areas that
+ * do not contain secrets. Those areas might be
+ * adjacent to the parts of the kernel image being
+ * freed, which may contain secrets. Remove the
+ * "high kernel image mapping" for these freed areas,
+ * ensuring they are not even potentially vulnerable
+ * to Meltdown regardless of the specific optimizations
+ * PTI is currently using.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_PTI))
+ set_memory_np(begin_ul, len_pages);
}
void __ref free_initmem(void)
_