Re: KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction
Hollis Blanchard wrote:
On Thu, 2008-07-10 at 20:49 -0300, Marcelo Tosatti wrote:
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index b850d24..2c438a7 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -170,6 +170,11 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
return 0;
}
+void kvm_arch_flush_shadow(struct kvm *kvm)
+{
+ return;
+}
+
struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
{
struct kvm_vcpu *vcpu;
By the way, what is the testcase for this, i.e. how do I remove a
memslot?
1. Memory hotunplug (perhaps not very relevant to embedded operating
systems)
2. PCI memory resource remapping (aka framebuffers)
--
Do not meddle in the internals of kernels, for they are subtle and quick to
panic.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction
On Monday 21 July 2008 16:34:40 Marcelo Tosatti wrote:
> On Mon, Jul 21, 2008 at 04:03:27PM -0500, Hollis Blanchard wrote:
> >
> > By the way, what is the testcase for this, i.e. how do I remove a
> > memslot?
>
> The testcase I used was RH6.2 graphical install, which changes the
> cirrus mode from linear frame buffer to the standard one, thus
> destroying the vram memory slot (should be able to do that with the
> framebuffer driver).
Hmm, we're not using any graphics adapters with PowerPC right now. I've
skimmed through cirrus_vga.c but I don't see any obvious "mode change"
routine; what is the function that triggers this behavior? Some
"(un)register physical memory" call?
> All you need on this callback is to destroy all shadow mappings, to
> avoid the MMU code referencing a memslot that no longer exists.
In addition to that, I also need a testcase to actually exercise the
code... :)
--
Hollis Blanchard
IBM Linux Technology Center
Re: KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction
On Mon, Jul 21, 2008 at 04:03:27PM -0500, Hollis Blanchard wrote:
> On Thu, 2008-07-10 at 20:49 -0300, Marcelo Tosatti wrote:
> > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> > index b850d24..2c438a7 100644
> > --- a/arch/powerpc/kvm/powerpc.c
> > +++ b/arch/powerpc/kvm/powerpc.c
> > @@ -170,6 +170,11 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
> > return 0;
> > }
> >
> > +void kvm_arch_flush_shadow(struct kvm *kvm)
> > +{
> > + return;
> > +}
> > +
> > struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
> > {
> > struct kvm_vcpu *vcpu;
>
> By the way, what is the testcase for this, i.e. how do I remove a
> memslot?
The testcase I used was RH6.2 graphical install, which changes the
cirrus mode from linear frame buffer to the standard one, thus
destroying the vram memory slot (should be able to do that with the
framebuffer driver).
All you need on this callback is to destroy all shadow mappings, to
avoid the MMU code referencing a memslot that no longer exists.
Re: KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction
On Thu, 2008-07-10 at 20:49 -0300, Marcelo Tosatti wrote:
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index b850d24..2c438a7 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -170,6 +170,11 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
> return 0;
> }
>
> +void kvm_arch_flush_shadow(struct kvm *kvm)
> +{
> + return;
> +}
> +
> struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
> {
> struct kvm_vcpu *vcpu;
By the way, what is the testcase for this, i.e. how do I remove a
memslot?
--
Hollis Blanchard
IBM Linux Technology Center
Re: KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction
Marcelo Tosatti wrote:
> KVM: MMU: improve invalid shadow root page handling
>
> Harden kvm_mmu_zap_page() against invalid root pages that
> had been shadowed from memslots that are gone.
Applied, thanks.
--
Do not meddle in the internals of kernels, for they are subtle and quick to
panic.
Re: KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction
Marcelo Tosatti wrote:
> On Thu, Jul 10, 2008 at 01:58:24PM -0500, Hollis Blanchard wrote:
> > > This (and its friends) ought to be static inlines.
> > >
> > > On the other hand, don't the other arches have to flush their tlbs?
> > > Xiantao/Hollis? So maybe this function needs to be renamed
> > > kvm_flush_shadow() and implemented across the board.
> >
> > Agreed, I think that's the right approach.
>
> Ok, here it is, Hollis and Xiantao can you fill in the blanks?
>
> --
>
> Flush the shadow mmu before removing regions to avoid stale entries.
Applied, thanks.
--
Do not meddle in the internals of kernels, for they are subtle and quick to
panic.
Re: KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction
KVM: MMU: improve invalid shadow root page handling
Harden kvm_mmu_zap_page() against invalid root pages that
had been shadowed from memslots that are gone.
Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ff7cf63..7f57da6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -930,14 +930,17 @@ static void kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
}
kvm_mmu_page_unlink_children(kvm, sp);
if (!sp->root_count) {
- if (!sp->role.metaphysical)
+ if (!sp->role.metaphysical && !sp->role.invalid)
unaccount_shadowed(kvm, sp->gfn);
hlist_del(&sp->hash_link);
kvm_mmu_free_page(kvm, sp);
} else {
+ int invalid = sp->role.invalid;
list_move(&sp->link, &kvm->arch.active_mmu_pages);
sp->role.invalid = 1;
kvm_reload_remote_mmus(kvm);
+ if (!sp->role.metaphysical && !invalid)
+ unaccount_shadowed(kvm, sp->gfn);
}
kvm_mmu_reset_last_pte_updated(kvm);
}
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction
On Thu, Jul 10, 2008 at 01:58:24PM -0500, Hollis Blanchard wrote:
> > This (and its friends) ought to be static inlines.
> >
> > On the other hand, don't the other arches have to flush their tlbs?
> > Xiantao/Hollis? So maybe this function needs to be renamed
> > kvm_flush_shadow() and implemented across the board.
>
> Agreed, I think that's the right approach.
Ok, here it is, Hollis and Xiantao can you fill in the blanks?
--
Flush the shadow mmu before removing regions to avoid stale entries.
Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index a4cf4a2..d10e35b 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1455,6 +1455,10 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
return 0;
}
+void kvm_arch_flush_shadow(struct kvm *kvm)
+{
+ return;
+}
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index b850d24..2c438a7 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -170,6 +170,11 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
return 0;
}
+void kvm_arch_flush_shadow(struct kvm *kvm)
+{
+ return;
+}
+
struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
{
struct kvm_vcpu *vcpu;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 399acf3..5612c00 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -675,6 +675,11 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
return 0;
}
+void kvm_arch_flush_shadow(struct kvm *kvm)
+{
+ return;
+}
+
gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
{
return gfn;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c9b8a04..dedb581 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4040,6 +4040,11 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
return 0;
}
+void kvm_arch_flush_shadow(struct kvm *kvm)
+{
+ kvm_mmu_zap_all(kvm);
+}
+
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
{
return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index fc685c5..3798097 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -168,6 +168,7 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem,
struct kvm_memory_slot old,
int user_alloc);
+void kvm_arch_flush_shadow(struct kvm *kvm);
gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn);
struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b90da0b..c459383 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -405,6 +405,9 @@ int __kvm_set_memory_region(struct kvm *kvm,
if (mem->slot >= kvm->nmemslots)
kvm->nmemslots = mem->slot + 1;
+ if (!npages)
+ kvm_arch_flush_shadow(kvm);
+
*memslot = new;
r = kvm_arch_set_memory_region(kvm, mem, old, user_alloc);
Re: KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction
On Thu, 2008-07-10 at 17:42 +0300, Avi Kivity wrote:
>
> > During RH6.2 graphical installation the following oops is triggered:
> >
> > BUG: unable to handle kernel NULL pointer dereference at
>
> > IP: [] :kvm:gfn_to_rmap+0x3e/0x61
> > Pid: 4559, comm: qemu-system-x86 Not tainted
> >
> > The problem is that KVM allows shadow pagetable entries that
> > point to a removed memslot to exist. In this case the cirrus vram
> > mapping was removed, and the NULL dereference happened during
> > kvm_set_memory_alias()'s zap_all_pages().
> >
> > So nuke all shadowed pages before memslot removal.
> >
> > Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
> >
> >
> > diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> > index a4cf4a2..76259da 100644
> > --- a/arch/ia64/kvm/kvm-ia64.c
> > +++ b/arch/ia64/kvm/kvm-ia64.c
> > @@ -1455,6 +1455,10 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
> > return 0;
> > }
> >
> > +int kvm_arch_destroy_memory_region(struct kvm *kvm, int slot)
> > +{
> > + return 0;
> > +}
> >
> >
>
> This (and its friends) ought to be static inlines.
>
> On the other hand, don't the other arches have to flush their tlbs?
> Xiantao/Hollis? So maybe this function needs to be renamed
> kvm_flush_shadow() and implemented across the board.
Agreed, I think that's the right approach.
--
Hollis Blanchard
IBM Linux Technology Center
Re: KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction
Marcelo Tosatti wrote:
On Mon, Jul 07, 2008 at 02:31:55PM -0300, Marcelo Tosatti wrote:
On Sun, Jul 06, 2008 at 12:15:56AM +0300, Avi Kivity wrote:
Marcelo Tosatti wrote:
On Sat, Jul 05, 2008 at 08:25:30PM +0300, Avi Kivity wrote:
@@ -1955,6 +1955,22 @@ void kvm_mmu_slot_remove_write_access(st
}
}
+int kvm_mmu_slot_has_shadowed_page(struct kvm *kvm, int slot)
+{
+ struct kvm_mmu_page *sp;
+ int ret = 0;
+
+ spin_lock(&kvm->mmu_lock);
+ list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
+ if (test_bit(slot, &sp->slot_bitmap)) {
+ ret = -EINVAL;
+ break;
+ }
+ }
+ spin_unlock(&kvm->mmu_lock);
+ return ret;
+}
+
I don't like the guest influencing host actions in this way. It's
just a guest.
But I think it's unneeded. kvm_mmu_zap_page() will mark a root
shadow page invalid and force all vcpus to reload it, so all that's
needed is to keep the mmu spinlock held while removing the slot.
You're still keeping a shadowed page around with sp->gfn pointing to
a non-existent memslot. The code generally makes the assumption that
gfn_to_memslot(gfn) on shadowed info will not fail.
kvm_mmu_zap_page -> unaccount_shadowed, for example.
The page has already been zapped, so we might as well
unaccount_shadowed() on the first run. It needs to be moved until after
the reload_remote_mmus() call, though.
Oops, previous patch was unaccounting multiple times for invalid pages.
This should be better:
During RH6.2 graphical installation the following oops is triggered:
BUG: unable to handle kernel NULL pointer dereference at
IP: [] :kvm:gfn_to_rmap+0x3e/0x61
Pid: 4559, comm: qemu-system-x86 Not tainted
The problem is that KVM allows shadow pagetable entries that
point to a removed memslot to exist. In this case the cirrus vram
mapping was removed, and the NULL dereference happened during
kvm_set_memory_alias()'s zap_all_pages().
So nuke all shadowed pages before memslot removal.
Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index a4cf4a2..76259da 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1455,6 +1455,10 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
return 0;
}
+int kvm_arch_destroy_memory_region(struct kvm *kvm, int slot)
+{
+ return 0;
+}
This (and its friends) ought to be static inlines.
On the other hand, don't the other arches have to flush their tlbs?
Xiantao/Hollis? So maybe this function needs to be renamed
kvm_flush_shadow() and implemented across the board.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b90da0b..5ef3a5e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -405,6 +405,12 @@ int __kvm_set_memory_region(struct kvm *kvm,
if (mem->slot >= kvm->nmemslots)
kvm->nmemslots = mem->slot + 1;
+ if (!npages) {
+ r = kvm_arch_destroy_memory_region(kvm, mem->slot);
+ if (r)
+ goto out_free;
+ }
+
Destructors should never fail, since there is no possible recovery. And
indeed you have 'return 0' in the actual implementation. So I think the
function better return void.
--
error compiling committee.c: too many arguments to function
Re: KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction
On Mon, Jul 07, 2008 at 02:31:55PM -0300, Marcelo Tosatti wrote:
> On Sun, Jul 06, 2008 at 12:15:56AM +0300, Avi Kivity wrote:
> > Marcelo Tosatti wrote:
> >> On Sat, Jul 05, 2008 at 08:25:30PM +0300, Avi Kivity wrote:
> >>
> @@ -1955,6 +1955,22 @@ void kvm_mmu_slot_remove_write_access(st
> }
> }
> +int kvm_mmu_slot_has_shadowed_page(struct kvm *kvm, int slot)
> +{
> +struct kvm_mmu_page *sp;
> +int ret = 0;
> +
> +spin_lock(&kvm->mmu_lock);
> +list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
> +if (test_bit(slot, &sp->slot_bitmap)) {
> +ret = -EINVAL;
> +break;
> +}
> +}
> +spin_unlock(&kvm->mmu_lock);
> +return ret;
> +}
> +
>
> >>> I don't like the guest influencing host actions in this way. It's
> >>> just a guest.
> >>>
> >>> But I think it's unneeded. kvm_mmu_zap_page() will mark a root
> >>> shadow page invalid and force all vcpus to reload it, so all that's
> >>> needed is to keep the mmu spinlock held while removing the slot.
> >>>
> >>
> >> You're still keeping a shadowed page around with sp->gfn pointing to
> >> a non-existent memslot. The code generally makes the assumption that
> >> gfn_to_memslot(gfn) on shadowed info will not fail.
> >>
> >> kvm_mmu_zap_page -> unaccount_shadowed, for example.
> >>
> >>
> >
> > The page has already been zapped, so we might as well
> > unaccount_shadowed() on the first run. It needs to be moved until after
> > the reload_remote_mmus() call, though.
Oops, previous patch was unaccounting multiple times for invalid pages.
This should be better:
During RH6.2 graphical installation the following oops is triggered:
BUG: unable to handle kernel NULL pointer dereference at
IP: [] :kvm:gfn_to_rmap+0x3e/0x61
Pid: 4559, comm: qemu-system-x86 Not tainted
The problem is that KVM allows shadow pagetable entries that
point to a removed memslot to exist. In this case the cirrus vram
mapping was removed, and the NULL dereference happened during
kvm_set_memory_alias()'s zap_all_pages().
So nuke all shadowed pages before memslot removal.
Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index a4cf4a2..76259da 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1455,6 +1455,10 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
return 0;
}
+int kvm_arch_destroy_memory_region(struct kvm *kvm, int slot)
+{
+ return 0;
+}
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index b850d24..07d69cf 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -170,6 +170,11 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
return 0;
}
+int kvm_arch_destroy_memory_region(struct kvm *kvm, int slot)
+{
+ return 0;
+}
+
struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
{
struct kvm_vcpu *vcpu;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 399acf3..201a7e1 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -675,6 +675,11 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
return 0;
}
+int kvm_arch_destroy_memory_region(struct kvm *kvm, int slot)
+{
+ return 0;
+}
+
gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
{
return gfn;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1fd8e3b..89cd1cf 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -930,14 +930,17 @@ static void kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
}
kvm_mmu_page_unlink_children(kvm, sp);
if (!sp->root_count) {
- if (!sp->role.metaphysical)
+ if (!sp->role.metaphysical && !sp->role.invalid)
unaccount_shadowed(kvm, sp->gfn);
hlist_del(&sp->hash_link);
kvm_mmu_free_page(kvm, sp);
} else {
+ int invalid = sp->role.invalid;
list_move(&sp->link, &kvm->arch.active_mmu_pages);
sp->role.invalid = 1;
kvm_reload_remote_mmus(kvm);
+ if (!sp->role.metaphysical && !invalid)
+ unaccount_shadowed(kvm, sp->gfn);
}
kvm_mmu_reset_last_pte_updated(kvm);
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5a83c3b..8815f1d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4040,6 +4040,12 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
return 0;
}
+int kvm_arch_destroy_memory_region(struct kvm *kvm, int slot)
+{
+ kvm_mmu_zap_all(kvm);
+ return 0;
+}
+
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
Re: KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction
On Sun, Jul 06, 2008 at 12:15:56AM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> On Sat, Jul 05, 2008 at 08:25:30PM +0300, Avi Kivity wrote:
>>
@@ -1955,6 +1955,22 @@ void kvm_mmu_slot_remove_write_access(st
}
}
+int kvm_mmu_slot_has_shadowed_page(struct kvm *kvm, int slot)
+{
+ struct kvm_mmu_page *sp;
+ int ret = 0;
+
+ spin_lock(&kvm->mmu_lock);
+ list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
+ if (test_bit(slot, &sp->slot_bitmap)) {
+ ret = -EINVAL;
+ break;
+ }
+ }
+ spin_unlock(&kvm->mmu_lock);
+ return ret;
+}
+
>>> I don't like the guest influencing host actions in this way. It's
>>> just a guest.
>>>
>>> But I think it's unneeded. kvm_mmu_zap_page() will mark a root
>>> shadow page invalid and force all vcpus to reload it, so all that's
>>> needed is to keep the mmu spinlock held while removing the slot.
>>>
>>
>> You're still keeping a shadowed page around with sp->gfn pointing to
>> a non-existent memslot. The code generally makes the assumption that
>> gfn_to_memslot(gfn) on shadowed info will not fail.
>>
>> kvm_mmu_zap_page -> unaccount_shadowed, for example.
>>
>>
>
> The page has already been zapped, so we might as well
> unaccount_shadowed() on the first run. It needs to be moved until after
> the reload_remote_mmus() call, though.
During RH6.2 graphical installation the following oops is triggered:
BUG: unable to handle kernel NULL pointer dereference at
IP: [] :kvm:gfn_to_rmap+0x3e/0x61
Pid: 4559, comm: qemu-system-x86 Not tainted
The problem is that KVM allows shadow pagetable entries that
point to a removed memslot to exist. In this case the cirrus vram
mapping was removed, and the NULL dereference happened during
kvm_set_memory_alias()'s zap_all_pages().
So nuke all shadowed pages before memslot removal.
Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index a4cf4a2..76259da 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1455,6 +1455,10 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
return 0;
}
+int kvm_arch_destroy_memory_region(struct kvm *kvm, int slot)
+{
+ return 0;
+}
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index b850d24..07d69cf 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -170,6 +170,11 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
return 0;
}
+int kvm_arch_destroy_memory_region(struct kvm *kvm, int slot)
+{
+ return 0;
+}
+
struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
{
struct kvm_vcpu *vcpu;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 399acf3..201a7e1 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -675,6 +675,11 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
return 0;
}
+int kvm_arch_destroy_memory_region(struct kvm *kvm, int slot)
+{
+ return 0;
+}
+
gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
{
return gfn;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1fd8e3b..1617b68 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -930,7 +930,7 @@ static void kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
}
kvm_mmu_page_unlink_children(kvm, sp);
if (!sp->root_count) {
- if (!sp->role.metaphysical)
+ if (!sp->role.metaphysical && !sp->role.invalid)
unaccount_shadowed(kvm, sp->gfn);
hlist_del(&sp->hash_link);
kvm_mmu_free_page(kvm, sp);
@@ -938,6 +938,8 @@ static void kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
list_move(&sp->link, &kvm->arch.active_mmu_pages);
sp->role.invalid = 1;
kvm_reload_remote_mmus(kvm);
+ if (!sp->role.metaphysical)
+ unaccount_shadowed(kvm, sp->gfn);
}
kvm_mmu_reset_last_pte_updated(kvm);
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5a83c3b..8815f1d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4040,6 +4040,12 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
return 0;
}
+int kvm_arch_destroy_memory_region(struct kvm *kvm, int slot)
+{
+ kvm_mmu_zap_all(kvm);
+ return 0;
+}
+
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
{
return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d220b49..0fc7ddc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -168,6 +168,7 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
Re: KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction
Marcelo Tosatti wrote:
On Sat, Jul 05, 2008 at 08:25:30PM +0300, Avi Kivity wrote:
@@ -1955,6 +1955,22 @@ void kvm_mmu_slot_remove_write_access(st
}
}
+int kvm_mmu_slot_has_shadowed_page(struct kvm *kvm, int slot)
+{
+ struct kvm_mmu_page *sp;
+ int ret = 0;
+
+ spin_lock(&kvm->mmu_lock);
+ list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
+ if (test_bit(slot, &sp->slot_bitmap)) {
+ ret = -EINVAL;
+ break;
+ }
+ }
+ spin_unlock(&kvm->mmu_lock);
+ return ret;
+}
+
I don't like the guest influencing host actions in this way. It's just
a guest.
But I think it's unneeded. kvm_mmu_zap_page() will mark a root shadow
page invalid and force all vcpus to reload it, so all that's needed is
to keep the mmu spinlock held while removing the slot.
You're still keeping a shadowed page around with sp->gfn pointing to
a non-existent memslot. The code generally makes the assumption that
gfn_to_memslot(gfn) on shadowed info will not fail.
kvm_mmu_zap_page -> unaccount_shadowed, for example.
The page has already been zapped, so we might as well
unaccount_shadowed() on the first run. It needs to be moved until after
the reload_remote_mmus() call, though.
The other option is to harden gfn_to_memslot() callers to handle
failure, is that saner?
I don't think so.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction
On Sat, Jul 05, 2008 at 08:25:30PM +0300, Avi Kivity wrote:
>> @@ -1955,6 +1955,22 @@ void kvm_mmu_slot_remove_write_access(st
>> }
>> }
>> +int kvm_mmu_slot_has_shadowed_page(struct kvm *kvm, int slot)
>> +{
>> +struct kvm_mmu_page *sp;
>> +int ret = 0;
>> +
>> +spin_lock(&kvm->mmu_lock);
>> +list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
>> +if (test_bit(slot, &sp->slot_bitmap)) {
>> +ret = -EINVAL;
>> +break;
>> +}
>> +}
>> +spin_unlock(&kvm->mmu_lock);
>> +return ret;
>> +}
>> +
>>
>
> I don't like the guest influencing host actions in this way. It's just
> a guest.
>
> But I think it's unneeded. kvm_mmu_zap_page() will mark a root shadow
> page invalid and force all vcpus to reload it, so all that's needed is
> to keep the mmu spinlock held while removing the slot.
You're still keeping a shadowed page around with sp->gfn pointing to
a non-existent memslot. The code generally makes the assumption that
gfn_to_memslot(gfn) on shadowed info will not fail.
kvm_mmu_zap_page -> unaccount_shadowed, for example.
The other option is to harden gfn_to_memslot() callers to handle
failure, is that saner?
Re: KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction
Marcelo Tosatti wrote:
During RH6.2 graphical installation the following oops is triggered:
BUG: unable to handle kernel NULL pointer dereference at
IP: [] :kvm:gfn_to_rmap+0x3e/0x61
Pid: 4559, comm: qemu-system-x86 Not tainted
RIP: 0010:[] [] :kvm:gfn_to_rmap+0x3e/0x61
@@ -1955,6 +1955,22 @@ void kvm_mmu_slot_remove_write_access(st
}
}
+int kvm_mmu_slot_has_shadowed_page(struct kvm *kvm, int slot)
+{
+ struct kvm_mmu_page *sp;
+ int ret = 0;
+
+ spin_lock(&kvm->mmu_lock);
+ list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
+ if (test_bit(slot, &sp->slot_bitmap)) {
+ ret = -EINVAL;
+ break;
+ }
+ }
+ spin_unlock(&kvm->mmu_lock);
+ return ret;
+}
+
I don't like the guest influencing host actions in this way. It's just
a guest.
But I think it's unneeded. kvm_mmu_zap_page() will mark a root shadow
page invalid and force all vcpus to reload it, so all that's needed is
to keep the mmu spinlock held while removing the slot.
--
Do not meddle in the internals of kernels, for they are subtle and quick to
panic.
KVM: MMU: nuke shadowed pgtable pages and pte's on memslot destruction
During RH6.2 graphical installation the following oops is triggered:
BUG: unable to handle kernel NULL pointer dereference at
IP: [] :kvm:gfn_to_rmap+0x3e/0x61
Pid: 4559, comm: qemu-system-x86 Not tainted
RIP: 0010:[] [] :kvm:gfn_to_rmap+0x3e/0x61
Process qemu-system-x86 (pid: 4559, threadinfo 81022092, task 81022e82b2f0)
Stack: 00217a4b 81021a48e9d8 81021a48fa80
a00bf240
810220ab82f0 81021a48e9d8 81021a48fa80 013b
81022288c000 a00bf3b7 0003aed3 81022288c000
Call Trace:
[] ? :kvm:rmap_remove+0xab/0x19d
[] ? :kvm:kvm_mmu_zap_page+0x85/0x26e
[] ? :kvm:kvm_mmu_zap_all+0x2b/0x46
[] ? :kvm:kvm_arch_vm_ioctl+0x262/0x575
[] ? :kvm:kvm_read_guest+0x3f/0x7d
[] ? :kvm:paging32_walk_addr+0xac/0x262
[] ? :kvm:gfn_to_hva+0x9/0x5d
[] ? :kvm:kvm_read_guest_page+0x11/0x46
The problem is that KVM allows shadow pagetable entries that
point to a removed memslot to exist. In this case the cirrus vram
mapping was removed, and the NULL dereference happened during
kvm_set_memory_alias()'s zap_all_pages().
Since a malicious guest could have a thread's root table inside the
to-be-removed memslot, also guarantee that this is not the case before
removal.
This behaviour will also be useful for memory hotplugging.
As a side note, RH 6.2 graphical installation still won't function (due
to cirrus emulation bugs).
Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>
Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -4040,6 +4040,12 @@ int kvm_arch_set_memory_region(struct kv
return 0;
}
+int kvm_arch_destroy_memory_region(struct kvm *kvm, int slot)
+{
+ kvm_mmu_zap_all(kvm);
+ return kvm_mmu_slot_has_shadowed_page(kvm, slot);
+}
+
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
{
return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE
Index: kvm/include/linux/kvm_host.h
===================================================================
--- kvm.orig/include/linux/kvm_host.h
+++ kvm/include/linux/kvm_host.h
@@ -168,6 +168,7 @@ int kvm_arch_set_memory_region(struct kv
struct kvm_userspace_memory_region *mem,
struct kvm_memory_slot old,
int user_alloc);
+int kvm_arch_destroy_memory_region(struct kvm *kvm, int slot);
gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn);
struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
Index: kvm/virt/kvm/kvm_main.c
===================================================================
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -405,6 +405,12 @@ int __kvm_set_memory_region(struct kvm *
if (mem->slot >= kvm->nmemslots)
kvm->nmemslots = mem->slot + 1;
+ if (!npages) {
+ r = kvm_arch_destroy_memory_region(kvm, mem->slot);
+ if (r)
+ goto out_free;
+ }
+
*memslot = new;
r = kvm_arch_set_memory_region(kvm, mem, old, user_alloc);
Index: kvm/arch/ia64/kvm/kvm-ia64.c
===================================================================
--- kvm.orig/arch/ia64/kvm/kvm-ia64.c
+++ kvm/arch/ia64/kvm/kvm-ia64.c
@@ -1455,6 +1455,10 @@ int kvm_arch_set_memory_region(struct kv
return 0;
}
+int kvm_arch_destroy_memory_region(struct kvm *kvm, int slot)
+{
+ return 0;
+}
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
Index: kvm/arch/powerpc/kvm/powerpc.c
===================================================================
--- kvm.orig/arch/powerpc/kvm/powerpc.c
+++ kvm/arch/powerpc/kvm/powerpc.c
@@ -170,6 +170,11 @@ int kvm_arch_set_memory_region(struct kv
return 0;
}
+int kvm_arch_destroy_memory_region(struct kvm *kvm, int slot)
+{
+ return 0;
+}
+
struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
{
struct kvm_vcpu *vcpu;
Index: kvm/arch/s390/kvm/kvm-s390.c
===================================================================
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -675,6 +675,11 @@ int kvm_arch_set_memory_region(struct kv
return 0;
}
+int kvm_arch_destroy_memory_region(struct kvm *kvm, int slot)
+{
+ return 0;
+}
+
gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn)
{
return gfn;
Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1955,6 +1955,22 @@ void kvm_mmu_slot_remove_write_access(st
}
}
+int kvm_mmu_slot_has_shadowed_page(struct kvm *kvm, int slot)
+{
+ struct kvm_mmu_page *sp;
+ int ret = 0;
+
+ spin_lock(&kvm->mmu_lock);
+ list_for_each_entry(sp, &kvm->arch.active_mmu_pages, link) {
+ if (test_bit(slot, &sp->slot_bitmap)) {
+ ret = -EINVAL;
+ break;
+ }
+ }
+ spin_unlock(&kvm->mmu_lock);
+ return ret;
+}