Do we have test cases with guest.wp=0 in the KVM test suite?
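If not, the sketch below is roughly the guest-side logic I have in mind
(purely illustrative, untested ring-0 guest code; read_cr0/write_cr0 and
the pre-mapped read-only page are assumed helpers, not actual
kvm-unit-tests APIs):

#define X86_CR0_WP (1UL << 16)

static inline unsigned long read_cr0(void)
{
	unsigned long cr0;

	asm volatile("mov %%cr0, %0" : "=r"(cr0));
	return cr0;
}

static inline void write_cr0(unsigned long cr0)
{
	asm volatile("mov %0, %%cr0" : : "r"(cr0) : "memory");
}

/* 'ro_page' is assumed mapped by the harness with pte.w=0, pte.u=1. */
static int test_wp0_supervisor_write(volatile char *ro_page)
{
	unsigned long cr0 = read_cr0();

	/* With cr0.wp=0, a supervisor write ignores pte.w=0... */
	write_cr0(cr0 & ~X86_CR0_WP);
	*ro_page = 1;		/* must not fault */

	/*
	 * ...and with cr0.wp=1 the same write must #PF (the harness
	 * would install a #PF handler and expect a fault here).
	 */
	write_cr0(cr0 | X86_CR0_WP);

	return *ro_page == 1;
}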
Thanks!
-Xin

> -----Original Message-----
> From: Avi Kivity [mailto:[email protected]]
> Sent: Monday, June 06, 2011 9:19 PM
> To: Marcelo Tosatti; [email protected]; Yang, Wei Y; Shan, Haitao; Li, Xin
> Subject: [PATCH] KVM: Adjust shadow paging to work when SMEP=1 and CR0.WP=0
> 
> When CR0.WP=0, we sometimes map user pages as kernel pages (to allow
> the kernel to write to them).  Unfortunately this also allows the kernel
> to fetch from these pages, even if CR4.SMEP is set.
> 
> Adjust for this by also setting NX on the spte in these circumstances.
> 
> Signed-off-by: Avi Kivity <[email protected]>
> ---
> 
> Turned out a little more complicated than I thought.
> 
>  Documentation/virtual/kvm/mmu.txt |   18 ++++++++++++++++++
>  arch/x86/include/asm/kvm_host.h   |    1 +
>  arch/x86/kvm/mmu.c                |   14 +++++++++++++-
>  3 files changed, 32 insertions(+), 1 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
> index f46aa58..5dc972c 100644
> --- a/Documentation/virtual/kvm/mmu.txt
> +++ b/Documentation/virtual/kvm/mmu.txt
> @@ -165,6 +165,10 @@ Shadow pages contain the following information:
>      Contains the value of efer.nxe for which the page is valid.
>    role.cr0_wp:
>      Contains the value of cr0.wp for which the page is valid.
> +  role.smep_andnot_wp:
> +    Contains the value of cr4.smep && !cr0.wp for which the page is valid
> +    (pages for which this is true are different from other pages; see the
> +    treatment of cr0.wp=0 below).
>    gfn:
>      Either the guest page table containing the translations shadowed by this
>      page, or the base page frame for linear translations.  See role.direct.
> @@ -317,6 +321,20 @@ on fault type:
> 
>  (user write faults generate a #PF)
> 
> +In the first case there is an additional complication if CR4.SMEP is
> +enabled: since we've turned the page into a kernel page, the kernel may now
> +execute it.  We handle this by also setting spte.nx.  If we get a user
> +fetch or read fault, we'll set spte.u=1 and restore spte.nx=gpte.nx.
> +
> +To prevent an spte that was converted into a kernel page with cr0.wp=0
> +from being written by the kernel after cr0.wp has changed to 1, we make
> +the value of cr0.wp part of the page role.  This means that an spte created
> +with one value of cr0.wp cannot be used when cr0.wp has a different value -
> +it will simply be missed by the shadow page lookup code.  A similar issue
> +exists when an spte created with cr0.wp=0 and cr4.smep=0 is used after
> +changing cr4.smep to 1.  To avoid this, the value of !cr0.wp && cr4.smep
> +is also made a part of the page role.
> +
>  Large pages
>  ===========
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index fc38eca..c7e7f53 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -205,6 +205,7 @@ union kvm_mmu_page_role {
>               unsigned invalid:1;
>               unsigned nxe:1;
>               unsigned cr0_wp:1;
> +             unsigned smep_andnot_wp:1;
>       };
>  };
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 2d14434..8aaaa23f 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1985,8 +1985,17 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>               spte |= PT_WRITABLE_MASK;
> 
>               if (!vcpu->arch.mmu.direct_map
> -                 && !(pte_access & ACC_WRITE_MASK))
> +                 && !(pte_access & ACC_WRITE_MASK)) {
>                       spte &= ~PT_USER_MASK;
> +                     /*
> +                      * If we converted a user page to a kernel page,
> +                      * so that the kernel can write to it when cr0.wp=0,
> +                      * then we should prevent the kernel from executing it
> +                      * if SMEP is enabled.
> +                      */
> +                     if (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
> +                             spte |= PT64_NX_MASK;
> +             }
> 
>               /*
>                * Optimization: for pte sync, if spte was writable the hash
> @@ -2955,6 +2964,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
>  int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
>  {
>       int r;
> +     bool smep = kvm_read_cr4_bits(vcpu, X86_CR4_SMEP);
>       ASSERT(vcpu);
>       ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
> 
> @@ -2969,6 +2979,8 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
> 
>       vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
>       vcpu->arch.mmu.base_role.cr0_wp  = is_write_protection(vcpu);
> +     vcpu->arch.mmu.base_role.smep_andnot_wp
> +             = smep && !is_write_protection(vcpu);
> 
>       return r;
>  }
> --
> 1.7.5.3
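
Btw, to make sure I understand the role-bit trick in the mmu.txt hunk:
the role word is part of the hash key for shadow pages, so an spte
created under one cr0.wp/smep combination is simply never found once
either value changes.  A simplified sketch of my understanding (not the
real kvm_mmu_get_page(); names and layout are made up):

#include <stdint.h>

union page_role {
	uint32_t word;
	struct {
		unsigned level:4;
		unsigned cr0_wp:1;
		unsigned smep_andnot_wp:1;
		/* ...other role bits... */
	};
};

struct shadow_page {
	uint64_t gfn;
	union page_role role;
	struct shadow_page *hash_next;
};

/*
 * A page matches only if both the gfn and the whole role word match,
 * so a page created with cr0.wp=0 becomes unreachable after cr0.wp
 * flips to 1; the caller then allocates a fresh page for the new role.
 */
static struct shadow_page *lookup(struct shadow_page *bucket,
				  uint64_t gfn, union page_role role)
{
	struct shadow_page *sp;

	for (sp = bucket; sp; sp = sp->hash_next)
		if (sp->gfn == gfn && sp->role.word == role.word)
			return sp;
	return NULL;
}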
