On Wed, 1 Aug 2018, Dave Hansen wrote:
> 
> From: Dave Hansen <[email protected]>
> 
> The kernel image is mapped into two places in the virtual address
> space (addresses without KASLR, of course):
> 
>       1. The kernel direct map (0xffff880000000000)
>       2. The "high kernel map" (0xffffffff81000000)
> 
> We actually execute out of #2.  If we get the address of a kernel
> symbol, it points to #2, but almost all physical-to-virtual
> translations point to #1.
> 
> Parts of the "high kernel map" alias are mapped in the userspace
> page tables with the Global bit for performance reasons.  The
> parts that we map to userspace do not (er, should not) have
> secrets.
> 
> This is fine, except that some areas in the kernel image that
> are adjacent to the non-secret-containing areas are unused holes.
> We free these holes back into the normal page allocator and
> reuse them as normal kernel memory.  The memory will, of course,
> get *used* via the normal map, but the alias mapping is kept.
> 
> This otherwise unused alias mapping of the holes will, by default,
> keep the Global bit, be mapped out to userspace, and be
> vulnerable to Meltdown.
> 
> Remove the alias mapping of these pages entirely.  This is likely
> to fracture the 2M page mapping the kernel image near these areas,
> but this should affect a minority of the area.
> 
> This unmapping behavior is currently dependent on PTI being in
> place.  Going forward, we should at least consider doing this for
> all configurations.  Having an extra read-write alias for memory
> is not exactly ideal for debugging things like random memory
> corruption and this does undercut features like DEBUG_PAGEALLOC
> or future work like eXclusive Page Frame Ownership (XPFO).
> 
> Before this patch:
> 
> current_kernel:---[ High Kernel Mapping ]---
> current_kernel-0xffffffff80000000-0xffffffff81000000     16M                    pmd
> current_kernel-0xffffffff81000000-0xffffffff81e00000     14M  ro  PSE  GLB  x   pmd
> current_kernel-0xffffffff81e00000-0xffffffff81e11000     68K  ro       GLB  x   pte
> current_kernel-0xffffffff81e11000-0xffffffff82000000   1980K  RW            NX  pte
> current_kernel-0xffffffff82000000-0xffffffff82600000      6M  ro  PSE  GLB  NX  pmd
> current_kernel-0xffffffff82600000-0xffffffff82c00000      6M  RW  PSE       NX  pmd
> current_kernel-0xffffffff82c00000-0xffffffff82e00000      2M  RW            NX  pte
> current_kernel-0xffffffff82e00000-0xffffffff83200000      4M  RW  PSE       NX  pmd
> current_kernel-0xffffffff83200000-0xffffffffa0000000    462M                    pmd
> 
>   current_user:---[ High Kernel Mapping ]---
>   current_user-0xffffffff80000000-0xffffffff81000000     16M                    pmd
>   current_user-0xffffffff81000000-0xffffffff81e00000     14M  ro  PSE  GLB  x   pmd
>   current_user-0xffffffff81e00000-0xffffffff81e11000     68K  ro       GLB  x   pte
>   current_user-0xffffffff81e11000-0xffffffff82000000   1980K  RW            NX  pte
>   current_user-0xffffffff82000000-0xffffffff82600000      6M  ro  PSE  GLB  NX  pmd
>   current_user-0xffffffff82600000-0xffffffffa0000000    474M                    pmd
> 
> 
> After this patch:
> 
> current_kernel:---[ High Kernel Mapping ]---
> current_kernel-0xffffffff80000000-0xffffffff81000000     16M                    pmd
> current_kernel-0xffffffff81000000-0xffffffff81e00000     14M  ro  PSE  GLB  x   pmd
> current_kernel-0xffffffff81e00000-0xffffffff81e11000     68K  ro       GLB  x   pte
> current_kernel-0xffffffff81e11000-0xffffffff82000000   1980K                    pte
> current_kernel-0xffffffff82000000-0xffffffff82400000      4M  ro  PSE  GLB  NX  pmd
> current_kernel-0xffffffff82400000-0xffffffff82488000    544K  ro            NX  pte
> current_kernel-0xffffffff82488000-0xffffffff82600000   1504K                    pte
> current_kernel-0xffffffff82600000-0xffffffff82c00000      6M  RW  PSE       NX  pmd
> current_kernel-0xffffffff82c00000-0xffffffff82c0d000     52K  RW            NX  pte
> current_kernel-0xffffffff82c0d000-0xffffffff82dc0000   1740K                    pte
> 
>   current_user:---[ High Kernel Mapping ]---
>   current_user-0xffffffff80000000-0xffffffff81000000     16M                    pmd
>   current_user-0xffffffff81000000-0xffffffff81e00000     14M  ro  PSE  GLB  x   pmd
>   current_user-0xffffffff81e00000-0xffffffff81e11000     68K  ro       GLB  x   pte
>   current_user-0xffffffff81e11000-0xffffffff82000000   1980K                    pte
>   current_user-0xffffffff82000000-0xffffffff82400000      4M  ro  PSE  GLB  NX  pmd
>   current_user-0xffffffff82400000-0xffffffff82488000    544K  ro            NX  pte
>   current_user-0xffffffff82488000-0xffffffff82600000   1504K                    pte
>   current_user-0xffffffff82600000-0xffffffffa0000000    474M                    pmd
> 
> Signed-off-by: Dave Hansen <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Juergen Gross <[email protected]>
> Cc: Josh Poimboeuf <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Hugh Dickins <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Andi Kleen <[email protected]>
> ---
> 
>  b/arch/x86/mm/init.c |   22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff -puN arch/x86/mm/init.c~x86-unmap-freed-areas-from-kernel-image arch/x86/mm/init.c
> --- a/arch/x86/mm/init.c~x86-unmap-freed-areas-from-kernel-image      2018-07-30 09:53:14.862915689 -0700
> +++ b/arch/x86/mm/init.c      2018-07-30 09:53:14.866915689 -0700
> @@ -778,8 +778,26 @@ void free_init_pages(char *what, unsigne
>   */
>  void free_kernel_image_pages(void *begin, void *end)
>  {
> -     free_init_pages("unused kernel image",
> -                     (unsigned long)begin, (unsigned long)end);
> +     unsigned long begin_ul = (unsigned long)begin;
> +     unsigned long end_ul = (unsigned long)end;
> +     unsigned long len_pages = (end_ul - begin_ul) >> PAGE_SHIFT;
> +
> +
> +     free_init_pages("unused kernel image", begin_ul, end_ul);
> +
> +     /*
> +      * PTI maps some of the kernel into userspace.  For
> +      * performance, this includes some kernel areas that
> +      * do not contain secrets.  Those areas might be
> +      * adjacent to the parts of the kernel image being
> +      * freed, which may contain secrets.  Remove the
> +      * "high kernel image mapping" for these freed areas,
> +      * ensuring they are not even potentially vulnerable
> +      * to Meltdown regardless of the specific optimizations
> +      * PTI is currently using.
> +      */
> +     if (cpu_feature_enabled(X86_FEATURE_PTI))
> +             set_memory_np(begin_ul, len_pages);
>  }
>  
>  void __ref free_initmem(void)
> _

Ironically, that set_memory_np() is giving me a problem.

I don't see it when booting the 8GB laptop normally, but when booting
with "mem=1G", I get a not-present fault when ext4_iget() is trying to
do its business in starting init.  But boots fine with "mem=1G nopti".

I get the feeling that set_memory_np() is marking those freed pages
as not-present in the direct map, so they're no longer usable at all.
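
To make the aliasing concrete, here is a minimal user-space sketch (not
kernel code) of the address arithmetic behind the two mappings.  The bases
are the non-KASLR values from the quoted description and dump; PHYS_BASE
being zero (an unrelocated kernel) is an assumption, and the helper names
are made up for illustration.  It only shows which direct-map address
aliases a given high-map address; it does not show whether set_memory_np()
actually touches that alias, which is the open question here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed non-KASLR layout, taken from the quoted patch description.
 * PHYS_BASE == 0 is an assumption (kernel not relocated).
 */
#define DIRECT_MAP_BASE 0xffff880000000000ULL  /* kernel direct map */
#define HIGH_MAP_BASE   0xffffffff80000000ULL  /* start of high kernel map */
#define PHYS_BASE       0x0ULL
#define PAGE_SHIFT_4K   12

/* Physical address backing an address in the high kernel map. */
uint64_t high_to_phys(uint64_t high_va)
{
	return high_va - HIGH_MAP_BASE + PHYS_BASE;
}

/* Direct-map virtual address that aliases a physical address. */
uint64_t phys_to_direct(uint64_t pa)
{
	return pa + DIRECT_MAP_BASE;
}

/* Number of 4K pages in [begin, end), as len_pages is computed in the patch. */
uint64_t range_pages(uint64_t begin, uint64_t end)
{
	assert(end >= begin);
	return (end - begin) >> PAGE_SHIFT_4K;
}
```

Under those assumptions, the first freed hole in the dump starts at
0xffffffff81e11000 in the high map, which is physical 0x1e11000 and
0xffff880001e11000 in the direct map; the hole up to 0xffffffff82000000
is 495 pages, matching the 1980K shown.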

I can jot down some console messages if you need, but hope I've said
enough for you to see it immediately, and just say whoops, forget 5/5?

Hugh
