Re: A question about "CONFIG_KVM_DEVICE_ASSIGNMENT" configuration
On 9/10/2015 11:08 PM, Alex Williamson wrote: On Thu, 2015-09-10 at 18:22 +0800, Nan Xiao wrote: Hi all, When building kernel, it prompts "CONFIG_KVM_DEVICE_ASSIGNMENT" is "deprecated". But it is still used in kernel code. E.g.: "kvm_vm_ioctl_check_extension" function: { ... #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT case KVM_CAP_ASSIGN_DEV_IRQ: case KVM_CAP_PCI_2_3: #endif r = 1; break; ... #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT case KVM_CAP_IOMMU: r = iommu_present(_bus_type); break; #endif ... } If not configure this option, the following code will execute failed: ret = ioctl(dev, KVM_CHECK_EXTENSION, KVM_CAP_IOMMU); So does it mean to use KVM assigned device feature, the "CONFIG_KVM_DEVICE_ASSIGNMENT" is not "deprecated"? Legacy KVM device assignment is deprecated, it is fully replaced by Why do we remove this legacy assignment completely? Its always leading to this confusion :) Thanks Tiejun VFIO-based device assignment. The intention is to deprecate legacy KVM device assignment now, so all users can transition away from it and at some point remove it from the kernel. When using QEMU and libvirt, the default is already to use VFIO instead of legacy KVM device assignment. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [[PATCH 2/2] kvm: enable preemption to register/unregister preempt notifier
On 2015/7/3 19:23, Paolo Bonzini wrote: On 03/07/2015 10:56, Tiejun Chen wrote: After commit 1cde2930e154 (sched/preempt: Add static_key() to preempt_notifiers) is introduced, preempt_notifier_{register, unregister} always hold a mutex, jump_label_mutex. So in current case this shouldn't work further under the circumstance of disabled preemption, and its also safe since we're just handling a per-vcpu stuff with holding vcpu-mutex. Otherwise, some warning messages are posted like this, BUG: scheduling while atomic: qemu-system-x86/17177/0x0002 2 locks held by qemu-system-x86/17177: #0: (vcpu-mutex){+.+.+.}, at: [a035fb48] vcpu_load+0x28/0xf0 [kvm] #1: (jump_label_mutex){+.+.+.}, at: [81244b54] static_key_slow_inc+0xc4/0x140 Modules linked in: x86_pkg_temp_thermal kvm_intel kvm Preemption disabled at:[a035fd3e] kvm_vcpu_ioctl+0x7e/0xeb0 [kvm] Thanks for your work Tiejun. However, the original patch is crap. I've asked to revert it. Yeah, its better to revert that commit since finally this also trigger a bug 100671: vmwrite error in vmx_vcpu_run. Thanks Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 06/12] KVM: mark kvm-buses as empty once they were destroyed
On 2015/3/27 9:31, Marcelo Tosatti wrote: On Wed, Mar 25, 2015 at 05:09:13PM +, Marc Zyngier wrote: On 23/03/15 15:58, Andre Przywara wrote: In kvm_destroy_vm() we call kvm_io_bus_destroy() pretty early, especially before calling kvm_arch_destroy_vm(). To avoid unregistering devices from the already destroyed bus, let's mark the bus with NULL to let other users know it has been destroyed already. This avoids a crash on a VM shutdown with the VGIC using the kvm_io_bus later (the unregistering is in there to be able to roll back a faulting init). Signed-off-by: Andre Przywara andre.przyw...@arm.com That seems sensible, but I don't see why nobody else hits that. What are we doing differently? Otherwise, Reviewed-by: Marc Zyngier marc.zyng...@arm.com Paolo, Marcelo, can we have your Ack on this? Thanks, M. --- virt/kvm/kvm_main.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 8c7ab0b..6f164eb 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -604,8 +604,10 @@ static void kvm_destroy_vm(struct kvm *kvm) list_del(kvm-vm_list); spin_unlock(kvm_lock); kvm_free_irq_routing(kvm); - for (i = 0; i KVM_NR_BUSES; i++) + for (i = 0; i KVM_NR_BUSES; i++) { kvm_io_bus_destroy(kvm-buses[i]); + kvm-buses[i] = NULL; Could we fold this line into a common like, @@ -596,7 +597,6 @@ static void kvm_destroy_devices(struct kvm *kvm) static void kvm_destroy_vm(struct kvm *kvm) { - int i; struct mm_struct *mm = kvm-mm; kvm_arch_sync_events(kvm); @@ -604,8 +604,7 @@ static void kvm_destroy_vm(struct kvm *kvm) list_del(kvm-vm_list); spin_unlock(kvm_lock); kvm_free_irq_routing(kvm); - for (i = 0; i KVM_NR_BUSES; i++) - kvm_io_bus_destroy(kvm-buses[i]); + kvm_destroy_all_io_bus(kvm); kvm_coalesced_mmio_free(kvm); #if defined(CONFIG_MMU_NOTIFIER) defined(KVM_ARCH_WANT_MMU_NOTIFIER) mmu_notifier_unregister(kvm-mmu_notifier, kvm-mm); @@ -2943,6 +2942,16 @@ static void kvm_io_bus_destroy(struct kvm_io_bus *bus) kfree(bus); } +static void kvm_destroy_all_io_bus(struct kvm *kvm) +{ + int i; + + for (i = 0; i KVM_NR_BUSES; i++) { + kvm_io_bus_destroy(kvm-buses[i]); + kvm-buses[i] = NULL; + } +} + static inline int kvm_io_bus_cmp(const struct kvm_io_range *r1, const struct kvm_io_range *r2) { Thanks Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: fix to update memslots properly
On 2015/3/10 4:54, Marcelo Tosatti wrote: On Sat, Dec 27, 2014 at 09:41:45PM +0100, Paolo Bonzini wrote: diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f528343..6e52f3f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots, WARN_ON(mslots[i].id != id); if (!new-npages) { new-base_gfn = 0; + new-flags = 0; if (mslots[i].npages) slots-used_slots--; } else { This should not be necessary. The part of the mslots array that has base_gfn == npages == 0 is entirely unused, and such a slot can never be returned by search_memslots because this: if (gfn = memslots[slot].base_gfn gfn memslots[slot].base_gfn + memslots[slot].npages) can never be true. @@ -688,7 +689,9 @@ static void update_memslots(struct kvm_memslots *slots, i++; } while (i 0 - new-base_gfn mslots[i - 1].base_gfn) { + ((new-base_gfn mslots[i - 1].base_gfn) || + (!new-base_gfn +!mslots[i - 1].base_gfn !mslots[i - 1].npages))) { mslots[i] = mslots[i - 1]; slots-id_to_index[mslots[i].id] = i; i--; You should have explained _why_ this fixes the bug, and what invariant is not being respected, something like this: kvm: fix sorting of memslots with base_gfn == 0 Before commit 0e60b0799fed (kvm: change memslot sorting rule from size to GFN, 2014-12-01), the memslots' sorting key was npages, meaning that a valid memslot couldn't have its sorting key equal to zero. On the other hand, a valid memslot can have base_gfn == 0, and invalid memslots are identified by base_gfn == npages == 0. Because of this, commit 0e60b0799fed broke the invariant that invalid memslots are at the end of the mslots array. When a memslot with base_gfn == 0 was created, any invalid memslot before it were left in place. This suggests another fix. We can change the insertion to use a = comparison, as in your first patch. Alone it is not correct, but we only need to take some care and avoid breaking the case of deleting a memslot. It's enough to wrap the second loop (that you patched) with if (new-npages). In the new-npages == 0 case the first loop has already set i to the right value, and moving i back would be wrong: diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f5283438ee05..050974c051b5 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -687,11 +687,23 @@ static void update_memslots(struct kvm_memslots *slots, slots-id_to_index[mslots[i].id] = i; i++; } - while (i 0 - new-base_gfn mslots[i - 1].base_gfn) { - mslots[i] = mslots[i - 1]; - slots-id_to_index[mslots[i].id] = i; - i--; + + /* +* The = is needed when creating a slot with base_gfn == 0, +* so that it moves before all those with base_gfn == npages == 0. +* +* On the other hand, if new-npages is zero, the above loop has +* already left i pointing to the beginning of the empty part of +* mslots, and the = would move the hole backwards in this +* case---which is wrong. So skip the loop when deleting a slot. +*/ + if (new-npages) { + while (i 0 + new-base_gfn = mslots[i - 1].base_gfn) { + mslots[i] = mslots[i - 1]; + slots-id_to_index[mslots[i].id] = i; + i--; + } } mslots[i] = *new; Paolo Paolo, Can you include a proper changelog for this patch? But this is already applied long time ago... Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ÞATCH v2] kvmtool, mips: Support more than 256 MB guest memory
On 2015/1/13 18:19, Andreas Herrmann wrote: Two guest memory regions need to be defined and two mem= parameters need to be passed to guest kernel to support more than 256 MB. Cc: Chen, Tiejun tiejun.c...@intel.com Looks fine to me. Thanks Tiejun Signed-off-by: Andreas Herrmann andreas.herrm...@caviumnetworks.com --- tools/kvm/mips/include/kvm/kvm-arch.h | 10 + tools/kvm/mips/kvm.c | 36 +++-- 2 files changed, 40 insertions(+), 6 deletions(-) This version uses correct macros during calculation of memory parameters. Andreas diff --git a/tools/kvm/mips/include/kvm/kvm-arch.h b/tools/kvm/mips/include/kvm/kvm-arch.h index 7eadbf4..97bbf34 100644 --- a/tools/kvm/mips/include/kvm/kvm-arch.h +++ b/tools/kvm/mips/include/kvm/kvm-arch.h @@ -1,10 +1,20 @@ #ifndef KVM__KVM_ARCH_H #define KVM__KVM_ARCH_H + +/* + * Guest memory map is: + * 0x-0x0fff : System RAM + * 0x1000-0x1fff : I/O (defined by KVM_MMIO_START and KVM_MMIO_SIZE) + * 0x2000-...: System RAM + * See also kvm__init_ram(). + */ + #define KVM_MMIO_START0x1000 #define KVM_PCI_CFG_AREA KVM_MMIO_START #define KVM_PCI_MMIO_AREA (KVM_MMIO_START + 0x100) #define KVM_VIRTIO_MMIO_AREA (KVM_MMIO_START + 0x200) +#define KVM_MMIO_SIZE 0x1000 /* * Just for reference. This and the above corresponds to what's used diff --git a/tools/kvm/mips/kvm.c b/tools/kvm/mips/kvm.c index fc0428b..1925f38 100644 --- a/tools/kvm/mips/kvm.c +++ b/tools/kvm/mips/kvm.c @@ -22,11 +22,28 @@ void kvm__init_ram(struct kvm *kvm) u64 phys_start, phys_size; void*host_mem; - phys_start = 0; - phys_size = kvm-ram_size; - host_mem = kvm-ram_start; + if (kvm-ram_size = KVM_MMIO_START) { + /* one region for all memory */ + phys_start = 0; + phys_size = kvm-ram_size; + host_mem = kvm-ram_start; - kvm__register_mem(kvm, phys_start, phys_size, host_mem); + kvm__register_mem(kvm, phys_start, phys_size, host_mem); + } else { + /* one region for memory that fits below MMIO range */ + phys_start = 0; + phys_size = KVM_MMIO_START; + host_mem = kvm-ram_start; + + kvm__register_mem(kvm, phys_start, phys_size, host_mem); + + /* one region for rest of memory */ + phys_start = KVM_MMIO_START + KVM_MMIO_SIZE; + phys_size = kvm-ram_size - KVM_MMIO_START; + host_mem = kvm-ram_start + KVM_MMIO_START; + + kvm__register_mem(kvm, phys_start, phys_size, host_mem); + } } void kvm__arch_delete_ram(struct kvm *kvm) @@ -108,8 +125,15 @@ static void kvm__mips_install_cmdline(struct kvm *kvm) u64 argv_offset = argv_start; u64 argc = 0; - sprintf(p + cmdline_offset, mem=0x%llx@0 , -(unsigned long long)kvm-ram_size); + + if ((u64) kvm-ram_size = KVM_MMIO_START) + sprintf(p + cmdline_offset, mem=0x%llx@0 , + (unsigned long long)kvm-ram_size); + else + sprintf(p + cmdline_offset, mem=0x%llx@0 mem=0x%llx@0x%llx , + (unsigned long long)KVM_MMIO_START, + (unsigned long long)kvm-ram_size - KVM_MMIO_START, + (unsigned long long)(KVM_MMIO_START + KVM_MMIO_SIZE)); strcat(p + cmdline_offset, kvm-cfg.real_cmdline); /* maximum size is 2K */ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ÞATCH] kvmtool, mips: Support more than 256 MB guest memory
On 2015/1/6 21:15, Andreas Herrmann wrote: Two guest memory regions need to be defined and two mem= parameters need to be passed to guest kernel to support more than 256 MB. Signed-off-by: Andreas Herrmann andreas.herrm...@caviumnetworks.com --- tools/kvm/mips/include/kvm/kvm-arch.h | 10 + tools/kvm/mips/kvm.c | 36 +++-- 2 files changed, 40 insertions(+), 6 deletions(-) diff --git a/tools/kvm/mips/include/kvm/kvm-arch.h b/tools/kvm/mips/include/kvm/kvm-arch.h index 7eadbf4..97bbf34 100644 --- a/tools/kvm/mips/include/kvm/kvm-arch.h +++ b/tools/kvm/mips/include/kvm/kvm-arch.h @@ -1,10 +1,20 @@ #ifndef KVM__KVM_ARCH_H #define KVM__KVM_ARCH_H + +/* + * Guest memory map is: + * 0x-0x0fff : System RAM + * 0x1000-0x1fff : I/O (defined by KVM_MMIO_START and KVM_MMIO_SIZE) + * 0x2000-...: System RAM + * See also kvm__init_ram(). + */ + #define KVM_MMIO_START0x1000 #define KVM_PCI_CFG_AREA KVM_MMIO_START #define KVM_PCI_MMIO_AREA (KVM_MMIO_START + 0x100) #define KVM_VIRTIO_MMIO_AREA (KVM_MMIO_START + 0x200) +#define KVM_MMIO_SIZE 0x1000 /* * Just for reference. This and the above corresponds to what's used diff --git a/tools/kvm/mips/kvm.c b/tools/kvm/mips/kvm.c index fc0428b..10b441b 100644 --- a/tools/kvm/mips/kvm.c +++ b/tools/kvm/mips/kvm.c @@ -22,11 +22,28 @@ void kvm__init_ram(struct kvm *kvm) u64 phys_start, phys_size; void*host_mem; - phys_start = 0; - phys_size = kvm-ram_size; - host_mem = kvm-ram_start; + if (kvm-ram_size = KVM_MMIO_START) { + /* one region for all memory */ + phys_start = 0; + phys_size = kvm-ram_size; + host_mem = kvm-ram_start; - kvm__register_mem(kvm, phys_start, phys_size, host_mem); + kvm__register_mem(kvm, phys_start, phys_size, host_mem); + } else { + /* one region for memory that fits below MMIO range */ + phys_start = 0; + phys_size = KVM_MMIO_SIZE; Here phys_size = KVM_MMIO_START is better to make more sense. + host_mem = kvm-ram_start; + + kvm__register_mem(kvm, phys_start, phys_size, host_mem); + + /* one region for rest of memory */ + phys_start = KVM_MMIO_START + KVM_MMIO_SIZE; + phys_size = kvm-ram_size - 0x1000; and, phys_size = kvm-ram_size - KVM_MMIO_START. Tiejun + host_mem = kvm-ram_start + KVM_MMIO_SIZE; + + kvm__register_mem(kvm, phys_start, phys_size, host_mem); + } } void kvm__arch_delete_ram(struct kvm *kvm) @@ -108,8 +125,15 @@ static void kvm__mips_install_cmdline(struct kvm *kvm) u64 argv_offset = argv_start; u64 argc = 0; - sprintf(p + cmdline_offset, mem=0x%llx@0 , -(unsigned long long)kvm-ram_size); + + if ((u64) kvm-ram_size = KVM_MMIO_SIZE) + sprintf(p + cmdline_offset, mem=0x%llx@0 , + (unsigned long long)kvm-ram_size); + else + sprintf(p + cmdline_offset, mem=0x%llx@0 mem=0x%llx@0x%llx , + (unsigned long long)KVM_MMIO_START, + (unsigned long long)kvm-ram_size - KVM_MMIO_START, + (unsigned long long)(KVM_MMIO_START + KVM_MMIO_SIZE)); strcat(p + cmdline_offset, kvm-cfg.real_cmdline); /* maximum size is 2K */ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: fix to update memslots properly
On 2014/12/28 4:41, Paolo Bonzini wrote: diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f528343..6e52f3f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots, WARN_ON(mslots[i].id != id); if (!new-npages) { new-base_gfn = 0; + new-flags = 0; if (mslots[i].npages) slots-used_slots--; } else { This should not be necessary. The part of the mslots array that has base_gfn == npages == 0 is entirely unused, and such a slot can never be returned by search_memslots because this: if (gfn = memslots[slot].base_gfn gfn memslots[slot].base_gfn + memslots[slot].npages) can never be true. Yeah, but its really a little ugly to see some slots, base_gfn:npages:falgs = 0:0:(!0), to resort again when debug something inside of update_memslots(). @@ -688,7 +689,9 @@ static void update_memslots(struct kvm_memslots *slots, i++; } while (i 0 - new-base_gfn mslots[i - 1].base_gfn) { + ((new-base_gfn mslots[i - 1].base_gfn) || + (!new-base_gfn +!mslots[i - 1].base_gfn !mslots[i - 1].npages))) { mslots[i] = mslots[i - 1]; slots-id_to_index[mslots[i].id] = i; i--; You should have explained _why_ this fixes the bug, and what invariant Yeah. is not being respected, something like this: kvm: fix sorting of memslots with base_gfn == 0 Before commit 0e60b0799fed (kvm: change memslot sorting rule from size to GFN, 2014-12-01), the memslots' sorting key was npages, meaning that a valid memslot couldn't have its sorting key equal to zero. On the other hand, a valid memslot can have base_gfn == 0, and invalid memslots are identified by base_gfn == npages == 0. Because of this, commit 0e60b0799fed broke the invariant that invalid memslots are at the end of the mslots array. When a memslot with base_gfn == 0 was created, any invalid memslot before it were left in place. This suggests another fix. We can change the insertion to use a = comparison, as in your first patch. Alone it is not correct, but we only need to take some care and avoid breaking the case of deleting a memslot. It's enough to wrap the second loop (that you patched) with if (new-npages). In the new-npages == 0 case the first loop has already set i to the right value, and moving i back would be wrong: diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f5283438ee05..050974c051b5 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -687,11 +687,23 @@ static void update_memslots(struct kvm_memslots *slots, slots-id_to_index[mslots[i].id] = i; i++; } - while (i 0 - new-base_gfn mslots[i - 1].base_gfn) { - mslots[i] = mslots[i - 1]; - slots-id_to_index[mslots[i].id] = i; - i--; + + /* +* The = is needed when creating a slot with base_gfn == 0, +* so that it moves before all those with base_gfn == npages == 0. +* +* On the other hand, if new-npages is zero, the above loop has +* already left i pointing to the beginning of the empty part of +* mslots, and the = would move the hole backwards in this +* case---which is wrong. So skip the loop when deleting a slot. +*/ + if (new-npages) { + while (i 0 + new-base_gfn = mslots[i - 1].base_gfn) { + mslots[i] = mslots[i - 1]; + slots-id_to_index[mslots[i].id] = i; + i--; + } } mslots[i] = *new; This looks better. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] kvm_irqchip_assign_irqfd: just set irqfd in case of kvm_irqfds_enabled()
On 2014/12/27 22:52, Paolo Bonzini wrote: On 26/12/2014 18:59, Peter Maydell wrote: Mm, but once you're into such microoptimisations as this you really need to have a good justification for the effort, in the form of profiling measurements that indicate that this is a hot path. In this case that seems pretty unlikely, because I'd expect all the systems where we care about performance will support irqfds, so we won't be taking the early-exit code path anyhow. (And not supporting irqfds is leaving much more performance on the table than we could possibly be talking about in this function.) Also, it's even possible for a compiler to figure this out. All in all, I don't see any advantage to this patch... Indeed, its just a cleanup to make codes readable and comprehensible since oftentimes we don't initially write such a subsequent code just because we have this possibility to figure them out by the compiler, or others. And this is why I'm CCing Qemu trivial. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [bisected] KVM in 3.19-rc1 is completely broken
On 2014/12/25 8:55, Chen, Tiejun wrote: On 2014/12/25 1:11, Andy Lutomirski wrote: On Wed, Dec 24, 2014 at 12:23 AM, Chen, Tiejun tiejun.c...@intel.com wrote: On 2014/12/24 5:29, Andy Lutomirski wrote: On Tue, Dec 23, 2014 at 1:13 PM, Paolo Bonzini pbonz...@redhat.com wrote: I can reproduce it using the same steps on a Sandy Bridge laptop, with whatever QEMU is packaged in Fedora 21. I attached the config. I also submitted a virtme update for Fedora Rawhide and 21 (20 is still building) in case it helps. The build is here: http://koji.fedoraproject.org/koji/buildinfo?buildID=600732 The other reporter bisected it to 0e60b0799fedc495a5c57dbd669de3c10d72edd2. Can you try its parent? That's what I bisected it to. The parent works. Also, does it break with 3.18 host and 3.19-rc1 guest, or with 3.19-rc1 host and 3.18 guest? (Sorry I should do this myself but I'm a bit swamped due to vacation until Jan 6th). The breakage is with 3.17.7-something L0 and the same test kernel as L1 and L2. I think it breaks the same way with 3.19-rc1 as host and guest without any nesting, but that's awkward to test right now. Andy, Could you try this? Signed-off-by: Tiejun Chen tiejun.c...@intel.com I applied it by hand, and it survives extremely light testing. Tested-by: Andy Lutomirski l...@amacapital.net Looks good so thanks for your validation. I refine that I posted that fix in another thread since looks that will broken !next case. And I myself already run those test instructions you showed previously, now looks good. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
On 2014/12/25 8:52, Nadav Amit wrote: Although pop sreg updates RSP according to the operand size, only 2 bytes are read. The current behavior may result in incorrect #GP or #PF exceptions. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/emulate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index e5a84be..702da5e 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt) unsigned long selector; int rc; Looks we just should do similar thing to em_push_sreg(), unsigned long selector; int rc; + if (ctxt-op_bytes == 4) { + rsp_increment(ctxt, -2); + ctxt-op_bytes = 2; + } rc = emulate_pop(ctxt, selector, ctxt-op_bytes); if (rc != X86EMUL_CONTINUE) return rc; Right? Thanks Tiejun - rc = emulate_pop(ctxt, selector, ctxt-op_bytes); + rc = emulate_pop(ctxt, selector, 2); if (rc != X86EMUL_CONTINUE) return rc; if (ctxt-modrm_reg == VCPU_SREG_SS) ctxt-interruptibility = KVM_X86_SHADOW_INT_MOV_SS; + if (ctxt-op_bytes 2) + rsp_increment(ctxt, ctxt-op_bytes - 2); rc = load_segment_descriptor(ctxt, (u16)selector, seg); return rc; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: regression bisected; KVM: entry failed, hardware error 0x80000021
On 2014/12/25 18:52, Jamie Heilman wrote: Chen, Tiejun wrote: On 2014/12/24 19:02, Jamie Heilman wrote: Chen, Tiejun wrote: On 2014/12/23 15:26, Jamie Heilman wrote: Chen, Tiejun wrote: On 2014/12/23 9:50, Chen, Tiejun wrote: On 2014/12/22 17:23, Jamie Heilman wrote: KVM internal error. Suberror: 1 emulation failure EAX=000de494 EBX= ECX= EDX=0cfd ESI=0059 EDI= EBP= ESP=6fb4 EIP=000f15c1 EFL=00010016 [AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00c09300 DPL=0 DS [-WA] CS =0008 00c09b00 DPL=0 CS32 [-RA] SS =0010 00c09300 DPL=0 DS [-WA] DS =0010 00c09300 DPL=0 DS [-WA] FS =0010 00c09300 DPL=0 DS [-WA] GS =0010 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS32-busy GDT= 000f6be8 0037 IDT= 000f6c26 CR0=6011 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=e8 ae fc ff ff 89 f2 a8 10 89 d8 75 0a b9 41 15 ff ff ff d1 5b 5e c3 5b 5e e9 76 ff ff ff b0 11 e6 20 e6 a0 b0 08 e6 21 b0 70 e6 a1 b0 04 e6 21 b0 02 FWIW, I get the same thing with 34a1cd60d17 reverted. Maybe there are two bugs, maybe there's more to this first one. I can repro this So if my understanding is correct, this is probably another bug. And especially, I already saw the same log in another thread, Cleaning up the KVM clock. Maybe you can continue to `git bisect` to locate that bad commit. Looks just now Andy found that commit, 0e60b0799fedc495a5c57dbd669de3c10d72edd2 kvm: change memslot sorting rule from size to GFN, maybe you can try to revert this to try yours again. That doesn't revert cleanly for me, and I don't have much time to fiddle with it until the 24th---so checked out the commit before it (d4ae84a0), applied your patch, built, and yes, everything works fine at that point. I'll probably have time for another full bisection later, assuming things aren't ironed out already by then. 3.18.0-rc3-00120-gd4ae84a0 + vmx reorder msr writes patch = OK 3.18.0-rc3-00121-g0e60b07 + vmx reorder msr writes patch = emulation failure So that certainly points to 0e60b0799fedc495a5c57dbd669de3c10d72edd2 as well. Could you try this to fix your last error? Running qemu-system-x86_64 -machine pc,accel=kvm -nodefaults works, my real (headless) kvm guests work, but this new patch makes running qemu-system-x86_64 -machine pc,accel=kvm fail again, this time with Are you sure? From my test based on 3.19-rc1 that it owns top commit, aa39477b5692611b91ac9455ae588738852b3f60 just plus my previous patch, kvm: x86: vmx: reorder some msr writing I already can execute such a command successfully, qemu-system-x86_64 -machine pc,accel=kvm -m 2048 -smp 2 -hda ubuntu.img And your log below seems not to relate mem_slot issue we're discussing, I guess you need to update qemu as well. Yes, I'm sure. But I also found my new patch just work out Andy's next case, its really bringing a new issue in !next case. So I tried to refine that patch again as follows, This latest patch (again, after fixing all the whitespace so it actually Next time I guess I need to post that as a attached file :) applies), does the trick. Both qemu-system-x86_64 -machine pc,accel=kvm and qemu-system-x86_64 -machine pc,accel=kvm -nodefaults work for me now without any of the aforementioned warnings from the host. Sounds great and thanks for your test again. Tiejun Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- virt/kvm/kvm_main.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f528343..910bc48 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots, WARN_ON(mslots[i].id != id); if (!new-npages) { new-base_gfn = 0; + new-flags = 0; if (mslots[i].npages) slots-used_slots--; } else { @@ -688,7 +689,9 @@ static void update_memslots(struct kvm_memslots *slots, i++; } while (i 0 - new-base_gfn mslots[i - 1].base_gfn) { + ((new-base_gfn mslots[i - 1].base_gfn) || +(!new-base_gfn + !mslots[i - 1].base_gfn !mslots[i - 1].npages))) { mslots[i] = mslots[i - 1]; slots-id_to_index[mslots[i].id] = i; i--; Tiejun errors in the host to the tune of: [ cut here ] WARNING: CPU: 1 PID: 3901 at arch/x86/kvm/x86.c:6575 kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm]() Modules linked in: nfsv4 cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand
Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes
On 2014/12/25 17:55, Nadav Amit wrote: Tiejun tiejun.c...@intel.com wrote: On 2014/12/25 8:52, Nadav Amit wrote: Although pop sreg updates RSP according to the operand size, only 2 bytes are read. The current behavior may result in incorrect #GP or #PF exceptions. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/emulate.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index e5a84be..702da5e 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt) unsigned long selector; int rc; Looks we just should do similar thing to em_push_sreg(), unsigned long selector; int rc; + if (ctxt-op_bytes == 4) { + rsp_increment(ctxt, -2); + ctxt-op_bytes = 2; + } rc = emulate_pop(ctxt, selector, ctxt-op_bytes); if (rc != X86EMUL_CONTINUE) return rc; Right? I don’t think so. It seems the behaviour of push and pop is a bit different. For push: “If the source operand is a segment register (16 bits) and the operand size is 64-bits, a zero-extended value is pushed on the stack; if the operand size is 32-bits ... all recent Core and Atom processors perform a 16-bit move, leaving the upper portion of the stack location unmodified.” Therefore, for push in the case of op_bytes==8 we push zero-extended value. For pop the behaviour is not well-documented, but experimentally it appears only the first two bytes are accessed. I cannot see why it would be Maybe we can comment something here, like /* Just force 2 byte destination to already work well in most cases. */. different when opsize is 8, since it is not like the push case, where the segment register value was zero extended. Thanks for your explanation. If you feel strongly about it, I’ll create a unit test. Based on your description I think I can stand with you at this point. Tiejun Nadav -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [bisected] KVM in 3.19-rc1 is completely broken
On 2014/12/24 5:29, Andy Lutomirski wrote: On Tue, Dec 23, 2014 at 1:13 PM, Paolo Bonzini pbonz...@redhat.com wrote: I can reproduce it using the same steps on a Sandy Bridge laptop, with whatever QEMU is packaged in Fedora 21. I attached the config. I also submitted a virtme update for Fedora Rawhide and 21 (20 is still building) in case it helps. The build is here: http://koji.fedoraproject.org/koji/buildinfo?buildID=600732 The other reporter bisected it to 0e60b0799fedc495a5c57dbd669de3c10d72edd2. Can you try its parent? That's what I bisected it to. The parent works. Also, does it break with 3.18 host and 3.19-rc1 guest, or with 3.19-rc1 host and 3.18 guest? (Sorry I should do this myself but I'm a bit swamped due to vacation until Jan 6th). The breakage is with 3.17.7-something L0 and the same test kernel as L1 and L2. I think it breaks the same way with 3.19-rc1 as host and guest without any nesting, but that's awkward to test right now. Andy, Could you try this? Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- virt/kvm/kvm_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f528343..a2d928c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots, WARN_ON(mslots[i].id != id); if (!new-npages) { new-base_gfn = 0; + new-flags = 0; if (mslots[i].npages) slots-used_slots--; } else { @@ -688,7 +689,7 @@ static void update_memslots(struct kvm_memslots *slots, i++; } while (i 0 - new-base_gfn mslots[i - 1].base_gfn) { + new-base_gfn = mslots[i - 1].base_gfn) { mslots[i] = mslots[i - 1]; slots-id_to_index[mslots[i].id] = i; i--; -- 1.9.1 Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: regression bisected; KVM: entry failed, hardware error 0x80000021
On 2014/12/23 15:26, Jamie Heilman wrote: Chen, Tiejun wrote: On 2014/12/23 9:50, Chen, Tiejun wrote: On 2014/12/22 17:23, Jamie Heilman wrote: Chen, Tiejun wrote: On 2014/12/21 20:46, Jamie Heilman wrote: With v3.19-rc1 when I run qemu-system-x86_64 -machine pc,accel=kvm I get: KVM: entry failed, hardware error 0x8021 Looks some MSR writing issues such a failed entry. If you're running a guest on an Intel machine without unrestricted mode support, the failure can be most likely due to the guest entering an invalid state for Intel VT. For example, the guest maybe running in big real mode which is not supported on less recent Intel processors. EAX= EBX= ECX= EDX=0663 ESI= EDI= EBP= ESP= EIP=e05b EFL=00010002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 000f 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= IDT= CR0=6010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= And I don't see any obvious wrong as well. Any valuable info from dmesg? With the simple qemu command above, on 3.18.1 I see: kern.info: kvm: zapping shadow pages for mmio generation wraparound when I fire up a full guest that's actually useful I get: kern.info: kvm: zapping shadow pages for mmio generation wraparound kern.err: kvm [4073]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x On 3.18.0-rc3-00042-g34a1cd6 nothing appears in the dmesg, just the message I mention above to stderr. Same thing with a stock 3.19.0-rc1. Once I apply your patch the simple test command produces the same zapping shadow pages messages as 3.18.1, and a test guest of a Debian Jessie image (w/stock distro kernel) produces the same thing with disabled perfctr wrmsr message. However, it doesn't look like Sorry I'm not sure if I understood current status. Looks 3.19-rc1 my patch just fix that error above, KVM: entry failed, hardware error 0x8021 ... Right? I'm entirely out of the woods, because one of my other guest VMs with a custom kernel that works great under 3.18.1 now fails to run. Nothing in dmesg, but here's the stderr: But even you revert 34a1cd60d17 or just apply my patch, something else introduced between 3.18.1 and 3.19-rc1 led this error below, right? KVM internal error. Suberror: 1 emulation failure EAX=000de494 EBX= ECX= EDX=0cfd ESI=0059 EDI= EBP= ESP=6fb4 EIP=000f15c1 EFL=00010016 [AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00c09300 DPL=0 DS [-WA] CS =0008 00c09b00 DPL=0 CS32 [-RA] SS =0010 00c09300 DPL=0 DS [-WA] DS =0010 00c09300 DPL=0 DS [-WA] FS =0010 00c09300 DPL=0 DS [-WA] GS =0010 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS32-busy GDT= 000f6be8 0037 IDT= 000f6c26 CR0=6011 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=e8 ae fc ff ff 89 f2 a8 10 89 d8 75 0a b9 41 15 ff ff ff d1 5b 5e c3 5b 5e e9 76 ff ff ff b0 11 e6 20 e6 a0 b0 08 e6 21 b0 70 e6 a1 b0 04 e6 21 b0 02 FWIW, I get the same thing with 34a1cd60d17 reverted. Maybe there are two bugs, maybe there's more to this first one. I can repro this So if my understanding is correct, this is probably another bug. And especially, I already saw the same log in another thread, Cleaning up the KVM clock. Maybe you can continue to `git bisect` to locate that bad commit. Looks just now Andy found that commit, 0e60b0799fedc495a5c57dbd669de3c10d72edd2 kvm: change memslot sorting rule from size to GFN, maybe you can try to revert this to try yours again. That doesn't revert cleanly for me, and I don't have much time to fiddle with it until the 24th---so checked out the commit before it (d4ae84a0), applied your patch, built, and yes, everything works fine at that point. I'll probably have time for another full bisection later, assuming things aren't ironed out already by then. Could you try this to fix your last error? Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- virt/kvm/kvm_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f528343..a2d928c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots, WARN_ON
Re: [bisected] KVM in 3.19-rc1 is completely broken
On 2014/12/25 1:11, Andy Lutomirski wrote: On Wed, Dec 24, 2014 at 12:23 AM, Chen, Tiejun tiejun.c...@intel.com wrote: On 2014/12/24 5:29, Andy Lutomirski wrote: On Tue, Dec 23, 2014 at 1:13 PM, Paolo Bonzini pbonz...@redhat.com wrote: I can reproduce it using the same steps on a Sandy Bridge laptop, with whatever QEMU is packaged in Fedora 21. I attached the config. I also submitted a virtme update for Fedora Rawhide and 21 (20 is still building) in case it helps. The build is here: http://koji.fedoraproject.org/koji/buildinfo?buildID=600732 The other reporter bisected it to 0e60b0799fedc495a5c57dbd669de3c10d72edd2. Can you try its parent? That's what I bisected it to. The parent works. Also, does it break with 3.18 host and 3.19-rc1 guest, or with 3.19-rc1 host and 3.18 guest? (Sorry I should do this myself but I'm a bit swamped due to vacation until Jan 6th). The breakage is with 3.17.7-something L0 and the same test kernel as L1 and L2. I think it breaks the same way with 3.19-rc1 as host and guest without any nesting, but that's awkward to test right now. Andy, Could you try this? Signed-off-by: Tiejun Chen tiejun.c...@intel.com I applied it by hand, and it survives extremely light testing. Tested-by: Andy Lutomirski l...@amacapital.net Looks good so thanks for your validation. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: regression bisected; KVM: entry failed, hardware error 0x80000021
On 2014/12/24 19:02, Jamie Heilman wrote: Chen, Tiejun wrote: On 2014/12/23 15:26, Jamie Heilman wrote: Chen, Tiejun wrote: On 2014/12/23 9:50, Chen, Tiejun wrote: On 2014/12/22 17:23, Jamie Heilman wrote: KVM internal error. Suberror: 1 emulation failure EAX=000de494 EBX= ECX= EDX=0cfd ESI=0059 EDI= EBP= ESP=6fb4 EIP=000f15c1 EFL=00010016 [AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00c09300 DPL=0 DS [-WA] CS =0008 00c09b00 DPL=0 CS32 [-RA] SS =0010 00c09300 DPL=0 DS [-WA] DS =0010 00c09300 DPL=0 DS [-WA] FS =0010 00c09300 DPL=0 DS [-WA] GS =0010 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS32-busy GDT= 000f6be8 0037 IDT= 000f6c26 CR0=6011 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=e8 ae fc ff ff 89 f2 a8 10 89 d8 75 0a b9 41 15 ff ff ff d1 5b 5e c3 5b 5e e9 76 ff ff ff b0 11 e6 20 e6 a0 b0 08 e6 21 b0 70 e6 a1 b0 04 e6 21 b0 02 FWIW, I get the same thing with 34a1cd60d17 reverted. Maybe there are two bugs, maybe there's more to this first one. I can repro this So if my understanding is correct, this is probably another bug. And especially, I already saw the same log in another thread, Cleaning up the KVM clock. Maybe you can continue to `git bisect` to locate that bad commit. Looks just now Andy found that commit, 0e60b0799fedc495a5c57dbd669de3c10d72edd2 kvm: change memslot sorting rule from size to GFN, maybe you can try to revert this to try yours again. That doesn't revert cleanly for me, and I don't have much time to fiddle with it until the 24th---so checked out the commit before it (d4ae84a0), applied your patch, built, and yes, everything works fine at that point. I'll probably have time for another full bisection later, assuming things aren't ironed out already by then. 3.18.0-rc3-00120-gd4ae84a0 + vmx reorder msr writes patch = OK 3.18.0-rc3-00121-g0e60b07 + vmx reorder msr writes patch = emulation failure So that certainly points to 0e60b0799fedc495a5c57dbd669de3c10d72edd2 as well. Could you try this to fix your last error? Running qemu-system-x86_64 -machine pc,accel=kvm -nodefaults works, my real (headless) kvm guests work, but this new patch makes running qemu-system-x86_64 -machine pc,accel=kvm fail again, this time with Are you sure? From my test based on 3.19-rc1 that it owns top commit, aa39477b5692611b91ac9455ae588738852b3f60 just plus my previous patch, kvm: x86: vmx: reorder some msr writing I already can execute such a command successfully, qemu-system-x86_64 -machine pc,accel=kvm -m 2048 -smp 2 -hda ubuntu.img And your log below seems not to relate mem_slot issue we're discussing, I guess you need to update qemu as well. But I also found my new patch just work out Andy's next case, its really bringing a new issue in !next case. So I tried to refine that patch again as follows, Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- virt/kvm/kvm_main.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f528343..910bc48 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots, WARN_ON(mslots[i].id != id); if (!new-npages) { new-base_gfn = 0; + new-flags = 0; if (mslots[i].npages) slots-used_slots--; } else { @@ -688,7 +689,9 @@ static void update_memslots(struct kvm_memslots *slots, i++; } while (i 0 - new-base_gfn mslots[i - 1].base_gfn) { + ((new-base_gfn mslots[i - 1].base_gfn) || +(!new-base_gfn + !mslots[i - 1].base_gfn !mslots[i - 1].npages))) { mslots[i] = mslots[i - 1]; slots-id_to_index[mslots[i].id] = i; i--; Tiejun errors in the host to the tune of: [ cut here ] WARNING: CPU: 1 PID: 3901 at arch/x86/kvm/x86.c:6575 kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm]() Modules linked in: nfsv4 cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand cpufreq_conservative autofs4 fan nfsd auth_rpcgss nfs lockd grace fscache sunrpc bridge stp llc vhost_net tun vhost macvtap macvlan fuse cbc dm_crypt usb_storage snd_hda_codec_analog snd_hda_codec_generic kvm_intel kvm tg3 ptp pps_core sr_mod snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer snd sg dcdbas cdrom psmouse soundcore floppy evdev xfs dm_mod raid1 md_mod CPU: 1 PID: 3901 Comm: qemu-system-x86 Not tainted 3.19.0-rc1-00011-g53262d1-dirty #1 Hardware
Re: regression bisected; KVM: entry failed, hardware error 0x80000021
On 2014/12/24 19:11, Paolo Bonzini wrote: On 24/12/2014 12:02, Jamie Heilman wrote: Running qemu-system-x86_64 -machine pc,accel=kvm -nodefaults works, my real (headless) kvm guests work, but this new patch makes running qemu-system-x86_64 -machine pc,accel=kvm fail again, this time with errors in the host to the tune of: [ cut here ] WARNING: CPU: 1 PID: 3901 at arch/x86/kvm/x86.c:6575 kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm]() Modules linked in: nfsv4 cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand cpufreq_conservative autofs4 fan nfsd auth_rpcgss nfs lockd grace fscache sunrpc bridge stp llc vhost_net tun vhost macvtap macvlan fuse cbc dm_crypt usb_storage snd_hda_codec_analog snd_hda_codec_generic kvm_intel kvm tg3 ptp pps_core sr_mod snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer snd sg dcdbas cdrom psmouse soundcore floppy evdev xfs dm_mod raid1 md_mod CPU: 1 PID: 3901 Comm: qemu-system-x86 Not tainted 3.19.0-rc1-00011-g53262d1-dirty #1 Hardware name: Dell Inc. Precision WorkStation T3400 /0TP412, BIOS A14 04/30/2012 7e052328 8800c25ffcf8 813defbe 8800c25ffd38 8103b517 8800c25ffd28 a019bdec 8800caf1d000 8800c2774800 Call Trace: [813defbe] dump_stack+0x4c/0x6e [8103b517] warn_slowpath_common+0x97/0xb1 [a019bdec] ? kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm] [8103b60b] warn_slowpath_null+0x15/0x17 [a019bdec] kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm] [a02308b9] ? vmcs_load+0x20/0x62 [kvm_intel] [a0231e03] ? vmx_vcpu_load+0x140/0x16a [kvm_intel] [a0196ba3] ? kvm_arch_vcpu_load+0x15c/0x161 [kvm] [a018d8b1] kvm_vcpu_ioctl+0x189/0x4bd [kvm] [8104647a] ? do_sigtimedwait+0x12f/0x189 [810ea316] do_vfs_ioctl+0x370/0x436 [810f24f2] ? __fget+0x67/0x72 [810ea41b] SyS_ioctl+0x3f/0x5e [813e34d2] system_call_fastpath+0x12/0x17 ---[ end trace 46abac932fb3b4a1 ]--- [ cut here ] WARNING: CPU: 1 PID: 3901 at arch/x86/kvm/x86.c:6575 kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm]() Modules linked in: nfsv4 cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand cpufreq_conservative autofs4 fan nfsd auth_rpcgss nfs lockd grace fscache sunrpc bridge stp llc vhost_net tun vhost macvtap macvlan fuse cbc dm_crypt usb_storage snd_hda_codec_analog snd_hda_codec_generic kvm_intel kvm tg3 ptp pps_core sr_mod snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_timer snd sg dcdbas cdrom psmouse soundcore floppy evdev xfs dm_mod raid1 md_mod CPU: 1 PID: 3901 Comm: qemu-system-x86 Tainted: GW 3.19.0-rc1-00011-g53262d1-dirty #1 Hardware name: Dell Inc. Precision WorkStation T3400 /0TP412, BIOS A14 04/30/2012 7e052328 8800c25ffcf8 813defbe 8800c25ffd38 8103b517 8800c25ffd28 a019bdec 8800caf1d000 8800c2774800 Call Trace: [813defbe] dump_stack+0x4c/0x6e [8103b517] warn_slowpath_common+0x97/0xb1 [a019bdec] ? kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm] [8103b60b] warn_slowpath_null+0x15/0x17 [a019bdec] kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm] [a02308b9] ? vmcs_load+0x20/0x62 [kvm_intel] [a0231e03] ? vmx_vcpu_load+0x140/0x16a [kvm_intel] [a0196ba3] ? kvm_arch_vcpu_load+0x15c/0x161 [kvm] [a018d8b1] kvm_vcpu_ioctl+0x189/0x4bd [kvm] [8104647a] ? do_sigtimedwait+0x12f/0x189 [810ea316] do_vfs_ioctl+0x370/0x436 [810f24f2] ? __fget+0x67/0x72 [810ea41b] SyS_ioctl+0x3f/0x5e [813e34d2] system_call_fastpath+0x12/0x17 ---[ end trace 46abac932fb3b4a2 ]--- over and over and over ad nauseum, or until I kill the qemu command, it also eats a core's worth of cpu. Such a message above seems to be out of our mem_slot issue, I'm not 100% sure but actually I can run this case, qemu-system-x86_64 -machine pc,accel=kvm -m 2048 -smp 2 -hda ubuntu.img Just one patch, kvm: x86: vmx: reorder some msr writing, is needed here. So I guess you guy can try to pull your 3.19-rc1 + that patch, and also pull qemu. Yeah, I'm fairly sure that the second hunk of Tiejun's patch is not correct, but he's on the right track. I hope to post a fix today, else Yeah, looks that will broken !next case then I regenerate that again post into another email. Now at lease I myself can run Andy's next case and a normal case, qemu-system-x86_64 -machine pc,accel=kvm, at the same time. But if I'm missing something please correct me directly :) Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [bisected] KVM in 3.19-rc1 is completely broken
On 2014/12/23 9:38, Andy Lutomirski wrote: I can't get it to work at all. It fails with: How can we reproduce this? I just try the following command, qemu-system-x86_64 -machine pc,accel=kvm -m 2048 -smp 2 -hda ubuntu.img but at least I can boot that successfully and already live for a while. Here I installed ubuntu14.04 as a guest, and Linux commit: aa39477b5692611b91ac9455ae588738852b3f60 Qemu commit: 7e58e2ac7778cca3234c33387e49577bb7732714 So any specific Kconfigs need to be enabled? Tiejun KVM internal error. Suberror: 1 emulation failure EAX=000dee58 EBX= ECX= EDX=0cfd ESI=0059 EDI= EBP= ESP=6fc4 EIP=000f17f4 EFL=00010012 [A--] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00c09300 DPL=0 DS [-WA] CS =0008 00c09b00 DPL=0 CS32 [-RA] SS =0010 00c09300 DPL=0 DS [-WA] DS =0010 00c09300 DPL=0 DS [-WA] FS =0010 00c09300 DPL=0 DS [-WA] GS =0010 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS32-busy GDT= 000f6c58 0037 IDT= 000f6c96 CR0=6011 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=e8 75 fc ff ff 89 f2 a8 10 89 d8 75 0a b9 74 17 ff ff ff d1 5b 5e c3 5b 5e e9 76 ff ff ff 57 56 53 8b 35 38 65 0f 00 85 f6 0f 88 be 00 00 00 0f b7 f6 The problem started with: commit 0e60b0799fedc495a5c57dbd669de3c10d72edd2 Author: Igor Mammedov imamm...@redhat.com Date: Mon Dec 1 17:29:26 2014 + kvm: change memslot sorting rule from size to GFN it will allow to use binary search for GFN - memslot lookups, reducing lookup cost with large slots amount. Signed-off-by: Igor Mammedov imamm...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --Andy -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: x86: remove vmx_vm_has_apicv() outside of hwapic_isr_update()
On 2014/12/19 19:12, Paolo Bonzini wrote: On 19/12/2014 03:32, Chen, Tiejun wrote: Are you saying something below? if (enable_apicv) ... else { kvm_x86_ops-hwapic_irr_update = NULL; Yes. But this means we have to revise hadware_setup() to get 'kvm' inside, This would not even be possible, since hardware_setup() is only called once. Yeah. However, for the only caller of hwapic_isr_update (but presumably all of them, as is the case for hwapic_irr_update), you already know that There are two other callers to hwapic_isr_update(), apic_set_isr() and apic_clear_isr(), but I think they still work here. irqchip_in_kernel(kvm) is true. You are in kvm_apic_post_state_restore, which takes a kvm_lapic_state, and no lapic exists if !irqchip_in_kernel(kvm). (Yes, irqchip_in_kernel) is a bit weird and tests pic_irqchip(kvm) instead, but it's the same. It tests pic_irqchip(kvm) only because the LAPIC is per-cpu and irqchip_in_kernel takes a struct kvm). So it's possible to NULL out hwapic_isr_update in hardware_setup. It simply shouldn't happen that you call hwapic_isr_update without the in-kernel irqchip. The kernel knows nothing about ISR/IRR without the in-kernel irqchip. Thanks for your kind elaboration which always benefits me. What about this revision as follows? kvm: x86: vmx: NULL out hwapic_isr_update() in case of !enable_apicv In most cases calling hwapic_isr_update(), we always check if kvm_apic_vid_enabled() == 1, but actually, kvm_apic_vid_enabled() - kvm_x86_ops-vm_has_apicv() - vmx_vm_has_apicv() or '0' in svm case - return enable_apicv irqchip_in_kernel(kvm) So its a little cost to recall vmx_vm_has_apicv() inside hwapic_isr_update(), here just NULL out hwapic_isr_update() in case of !enable_apicv inside hardware_setup() then make all related stuffs follow this. Note we don't check this under that condition of irqchip_in_kernel() since we should make sure definitely any caller don't work without in-kernel irqchip. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- arch/x86/kvm/lapic.c | 7 --- arch/x86/kvm/vmx.c | 4 +--- 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 4f0c0b9..eb4cd5e 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -402,7 +402,7 @@ static inline void apic_set_isr(int vec, struct kvm_lapic *apic) * because the processor can modify ISR under the hood. Instead * just set SVI. */ - if (unlikely(kvm_apic_vid_enabled(vcpu-kvm))) + if (unlikely(kvm_x86_ops-hwapic_isr_update)) kvm_x86_ops-hwapic_isr_update(vcpu-kvm, vec); else { ++apic-isr_count; @@ -450,7 +450,7 @@ static inline void apic_clear_isr(int vec, struct kvm_lapic *apic) * on the other hand isr_count and highest_isr_cache are unused * and must be left alone. */ - if (unlikely(kvm_apic_vid_enabled(vcpu-kvm))) + if (unlikely(kvm_x86_ops-hwapic_isr_update)) kvm_x86_ops-hwapic_isr_update(vcpu-kvm, apic_find_highest_isr(apic)); else { @@ -1742,7 +1742,8 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu, if (kvm_x86_ops-hwapic_irr_update) kvm_x86_ops-hwapic_irr_update(vcpu, apic_find_highest_irr(apic)); - kvm_x86_ops-hwapic_isr_update(vcpu-kvm, apic_find_highest_isr(apic)); + if (kvm_x86_ops-hwapic_isr_update) + kvm_x86_ops-hwapic_isr_update(vcpu-kvm, apic_find_highest_isr(apic)); kvm_make_request(KVM_REQ_EVENT, vcpu); kvm_rtc_eoi_tracking_restore_one(vcpu); } diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 96c84a8..e378dff 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5895,6 +5895,7 @@ static __init int hardware_setup(void) kvm_x86_ops-update_cr8_intercept = NULL; else { kvm_x86_ops-hwapic_irr_update = NULL; + kvm_x86_ops-hwapic_isr_update = NULL; kvm_x86_ops-deliver_posted_interrupt = NULL; kvm_x86_ops-sync_pir_to_irr = vmx_sync_pir_to_irr_dummy; } @@ -7471,9 +7472,6 @@ static void vmx_hwapic_isr_update(struct kvm *kvm, int isr) u16 status; u8 old; - if (!vmx_vm_has_apicv(kvm)) - return; - if (isr == -1) isr = 0; -- 1.9.1 I can send out as a patch if we have on any objections. Thanks Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: regression bisected; KVM: entry failed, hardware error 0x80000021
On 2014/12/22 17:23, Jamie Heilman wrote: Chen, Tiejun wrote: On 2014/12/21 20:46, Jamie Heilman wrote: With v3.19-rc1 when I run qemu-system-x86_64 -machine pc,accel=kvm I get: KVM: entry failed, hardware error 0x8021 Looks some MSR writing issues such a failed entry. If you're running a guest on an Intel machine without unrestricted mode support, the failure can be most likely due to the guest entering an invalid state for Intel VT. For example, the guest maybe running in big real mode which is not supported on less recent Intel processors. EAX= EBX= ECX= EDX=0663 ESI= EDI= EBP= ESP= EIP=e05b EFL=00010002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 000f 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= IDT= CR0=6010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= And I don't see any obvious wrong as well. Any valuable info from dmesg? With the simple qemu command above, on 3.18.1 I see: kern.info: kvm: zapping shadow pages for mmio generation wraparound when I fire up a full guest that's actually useful I get: kern.info: kvm: zapping shadow pages for mmio generation wraparound kern.err: kvm [4073]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x On 3.18.0-rc3-00042-g34a1cd6 nothing appears in the dmesg, just the message I mention above to stderr. Same thing with a stock 3.19.0-rc1. Once I apply your patch the simple test command produces the same zapping shadow pages messages as 3.18.1, and a test guest of a Debian Jessie image (w/stock distro kernel) produces the same thing with disabled perfctr wrmsr message. However, it doesn't look like Sorry I'm not sure if I understood current status. Looks 3.19-rc1 my patch just fix that error above, KVM: entry failed, hardware error 0x8021 ... Right? I'm entirely out of the woods, because one of my other guest VMs with a custom kernel that works great under 3.18.1 now fails to run. Nothing in dmesg, but here's the stderr: But even you revert 34a1cd60d17 or just apply my patch, something else introduced between 3.18.1 and 3.19-rc1 led this error below, right? KVM internal error. Suberror: 1 emulation failure EAX=000de494 EBX= ECX= EDX=0cfd ESI=0059 EDI= EBP= ESP=6fb4 EIP=000f15c1 EFL=00010016 [AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00c09300 DPL=0 DS [-WA] CS =0008 00c09b00 DPL=0 CS32 [-RA] SS =0010 00c09300 DPL=0 DS [-WA] DS =0010 00c09300 DPL=0 DS [-WA] FS =0010 00c09300 DPL=0 DS [-WA] GS =0010 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS32-busy GDT= 000f6be8 0037 IDT= 000f6c26 CR0=6011 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=e8 ae fc ff ff 89 f2 a8 10 89 d8 75 0a b9 41 15 ff ff ff d1 5b 5e c3 5b 5e e9 76 ff ff ff b0 11 e6 20 e6 a0 b0 08 e6 21 b0 70 e6 a1 b0 04 e6 21 b0 02 FWIW, I get the same thing with 34a1cd60d17 reverted. Maybe there are two bugs, maybe there's more to this first one. I can repro this So if my understanding is correct, this is probably another bug. And especially, I already saw the same log in another thread, Cleaning up the KVM clock. Maybe you can continue to `git bisect` to locate that bad commit. Thanks Tiejun error with the command: qemu-system-x86_64 -machine pc,accel=kvm -nodefaults This is with QEMU emulator version 2.1.2 (Debian 1:2.1+dfsg-11), Copyright (c) 2003-2008 Fabrice Bellard The host system is: cpu family : 6 model : 23 model name : Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz stepping: 10 microcode : 0xa0b ... flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm tpr_shadow vnmi flexpriority I bisected this back to: commit 34a1cd60d17f62c1f077c1478a6c2ca8c3d17af4 Author: Tiejun Chen tiejun.c...@intel.com Date: Tue Oct 28 10:14:48 2014 +0800 kvm: x86: vmx: move some vmx setting from vmx_init() to hardware_setup() Instead of vmx_init(), actually it would make reasonable sense
Re: regression bisected; KVM: entry failed, hardware error 0x80000021
On 2014/12/23 9:50, Chen, Tiejun wrote: On 2014/12/22 17:23, Jamie Heilman wrote: Chen, Tiejun wrote: On 2014/12/21 20:46, Jamie Heilman wrote: With v3.19-rc1 when I run qemu-system-x86_64 -machine pc,accel=kvm I get: KVM: entry failed, hardware error 0x8021 Looks some MSR writing issues such a failed entry. If you're running a guest on an Intel machine without unrestricted mode support, the failure can be most likely due to the guest entering an invalid state for Intel VT. For example, the guest maybe running in big real mode which is not supported on less recent Intel processors. EAX= EBX= ECX= EDX=0663 ESI= EDI= EBP= ESP= EIP=e05b EFL=00010002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 000f 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= IDT= CR0=6010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= And I don't see any obvious wrong as well. Any valuable info from dmesg? With the simple qemu command above, on 3.18.1 I see: kern.info: kvm: zapping shadow pages for mmio generation wraparound when I fire up a full guest that's actually useful I get: kern.info: kvm: zapping shadow pages for mmio generation wraparound kern.err: kvm [4073]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x On 3.18.0-rc3-00042-g34a1cd6 nothing appears in the dmesg, just the message I mention above to stderr. Same thing with a stock 3.19.0-rc1. Once I apply your patch the simple test command produces the same zapping shadow pages messages as 3.18.1, and a test guest of a Debian Jessie image (w/stock distro kernel) produces the same thing with disabled perfctr wrmsr message. However, it doesn't look like Sorry I'm not sure if I understood current status. Looks 3.19-rc1 my patch just fix that error above, KVM: entry failed, hardware error 0x8021 ... Right? I'm entirely out of the woods, because one of my other guest VMs with a custom kernel that works great under 3.18.1 now fails to run. Nothing in dmesg, but here's the stderr: But even you revert 34a1cd60d17 or just apply my patch, something else introduced between 3.18.1 and 3.19-rc1 led this error below, right? KVM internal error. Suberror: 1 emulation failure EAX=000de494 EBX= ECX= EDX=0cfd ESI=0059 EDI= EBP= ESP=6fb4 EIP=000f15c1 EFL=00010016 [AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00c09300 DPL=0 DS [-WA] CS =0008 00c09b00 DPL=0 CS32 [-RA] SS =0010 00c09300 DPL=0 DS [-WA] DS =0010 00c09300 DPL=0 DS [-WA] FS =0010 00c09300 DPL=0 DS [-WA] GS =0010 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS32-busy GDT= 000f6be8 0037 IDT= 000f6c26 CR0=6011 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=e8 ae fc ff ff 89 f2 a8 10 89 d8 75 0a b9 41 15 ff ff ff d1 5b 5e c3 5b 5e e9 76 ff ff ff b0 11 e6 20 e6 a0 b0 08 e6 21 b0 70 e6 a1 b0 04 e6 21 b0 02 FWIW, I get the same thing with 34a1cd60d17 reverted. Maybe there are two bugs, maybe there's more to this first one. I can repro this So if my understanding is correct, this is probably another bug. And especially, I already saw the same log in another thread, Cleaning up the KVM clock. Maybe you can continue to `git bisect` to locate that bad commit. Looks just now Andy found that commit, 0e60b0799fedc495a5c57dbd669de3c10d72edd2 kvm: change memslot sorting rule from size to GFN, maybe you can try to revert this to try yours again. Thanks Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: regression bisected; KVM: entry failed, hardware error 0x80000021
On 2014/12/23 15:26, Jamie Heilman wrote: Chen, Tiejun wrote: On 2014/12/23 9:50, Chen, Tiejun wrote: On 2014/12/22 17:23, Jamie Heilman wrote: Chen, Tiejun wrote: On 2014/12/21 20:46, Jamie Heilman wrote: With v3.19-rc1 when I run qemu-system-x86_64 -machine pc,accel=kvm I get: KVM: entry failed, hardware error 0x8021 Looks some MSR writing issues such a failed entry. If you're running a guest on an Intel machine without unrestricted mode support, the failure can be most likely due to the guest entering an invalid state for Intel VT. For example, the guest maybe running in big real mode which is not supported on less recent Intel processors. EAX= EBX= ECX= EDX=0663 ESI= EDI= EBP= ESP= EIP=e05b EFL=00010002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 000f 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= IDT= CR0=6010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= And I don't see any obvious wrong as well. Any valuable info from dmesg? With the simple qemu command above, on 3.18.1 I see: kern.info: kvm: zapping shadow pages for mmio generation wraparound when I fire up a full guest that's actually useful I get: kern.info: kvm: zapping shadow pages for mmio generation wraparound kern.err: kvm [4073]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x On 3.18.0-rc3-00042-g34a1cd6 nothing appears in the dmesg, just the message I mention above to stderr. Same thing with a stock 3.19.0-rc1. Once I apply your patch the simple test command produces the same zapping shadow pages messages as 3.18.1, and a test guest of a Debian Jessie image (w/stock distro kernel) produces the same thing with disabled perfctr wrmsr message. However, it doesn't look like Sorry I'm not sure if I understood current status. Looks 3.19-rc1 my patch just fix that error above, KVM: entry failed, hardware error 0x8021 ... Right? I'm entirely out of the woods, because one of my other guest VMs with a custom kernel that works great under 3.18.1 now fails to run. Nothing in dmesg, but here's the stderr: But even you revert 34a1cd60d17 or just apply my patch, something else introduced between 3.18.1 and 3.19-rc1 led this error below, right? KVM internal error. Suberror: 1 emulation failure EAX=000de494 EBX= ECX= EDX=0cfd ESI=0059 EDI= EBP= ESP=6fb4 EIP=000f15c1 EFL=00010016 [AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00c09300 DPL=0 DS [-WA] CS =0008 00c09b00 DPL=0 CS32 [-RA] SS =0010 00c09300 DPL=0 DS [-WA] DS =0010 00c09300 DPL=0 DS [-WA] FS =0010 00c09300 DPL=0 DS [-WA] GS =0010 00c09300 DPL=0 DS [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS32-busy GDT= 000f6be8 0037 IDT= 000f6c26 CR0=6011 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= Code=e8 ae fc ff ff 89 f2 a8 10 89 d8 75 0a b9 41 15 ff ff ff d1 5b 5e c3 5b 5e e9 76 ff ff ff b0 11 e6 20 e6 a0 b0 08 e6 21 b0 70 e6 a1 b0 04 e6 21 b0 02 FWIW, I get the same thing with 34a1cd60d17 reverted. Maybe there are two bugs, maybe there's more to this first one. I can repro this So if my understanding is correct, this is probably another bug. And especially, I already saw the same log in another thread, Cleaning up the KVM clock. Maybe you can continue to `git bisect` to locate that bad commit. Looks just now Andy found that commit, 0e60b0799fedc495a5c57dbd669de3c10d72edd2 kvm: change memslot sorting rule from size to GFN, maybe you can try to revert this to try yours again. That doesn't revert cleanly for me, and I don't have much time to Yeah, I guess all associated commits should be reverted gradually. fiddle with it until the 24th---so checked out the commit before it (d4ae84a0), applied your patch, built, and yes, everything works fine Thanks for your test. I think I can submit this patch to fix one of yours problems and I'd like to add you as Reported-by Tested-by. Then we can step into another issue. And I'm trying to fetch 3.19-rc1 (because I'm always working on kvm/next.) to take a look at that but maybe Paolo is already going on that. Tiejun at that point. I'll probably have time for another full bisection later, assuming things aren't ironed out already
Re: regression bisected; KVM: entry failed, hardware error 0x80000021
On 2014/12/21 20:46, Jamie Heilman wrote: With v3.19-rc1 when I run qemu-system-x86_64 -machine pc,accel=kvm I get: KVM: entry failed, hardware error 0x8021 Looks some MSR writing issues such a failed entry. If you're running a guest on an Intel machine without unrestricted mode support, the failure can be most likely due to the guest entering an invalid state for Intel VT. For example, the guest maybe running in big real mode which is not supported on less recent Intel processors. EAX= EBX= ECX= EDX=0663 ESI= EDI= EBP= ESP= EIP=e05b EFL=00010002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = 9300 CS =f000 000f 9b00 SS = 9300 DS = 9300 FS = 9300 GS = 9300 LDT= 8200 TR = 8b00 GDT= IDT= CR0=6010 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= And I don't see any obvious wrong as well. Any valuable info from dmesg? Code=85 00 87 00 89 00 8b 00 00 00 86 00 88 00 8a 00 8c 00 00 90 2e 66 83 3e 30 6c 00 0f 85 e7 f2 31 c0 8e d0 66 bc 00 70 00 00 66 ba 31 2e 0f 00 e9 45 f1 This is with QEMU emulator version 2.1.2 (Debian 1:2.1+dfsg-11), Copyright (c) 2003-2008 Fabrice Bellard The host system is: cpu family : 6 model : 23 model name : Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz stepping: 10 microcode : 0xa0b ... flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm tpr_shadow vnmi flexpriority I bisected this back to: commit 34a1cd60d17f62c1f077c1478a6c2ca8c3d17af4 Author: Tiejun Chen tiejun.c...@intel.com Date: Tue Oct 28 10:14:48 2014 +0800 kvm: x86: vmx: move some vmx setting from vmx_init() to hardware_setup() Instead of vmx_init(), actually it would make reasonable sense to do anything specific to vmx hardware setting in vmx_x86_ops-hardware_setup(). This commit just reorders something but some MSR writing depend on previous status. Could you try this? diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index feb852b..96c84a8 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5840,49 +5840,6 @@ static __init int hardware_setup(void) memset(vmx_msr_bitmap_legacy, 0xff, PAGE_SIZE); memset(vmx_msr_bitmap_longmode, 0xff, PAGE_SIZE); - vmx_disable_intercept_for_msr(MSR_FS_BASE, false); - vmx_disable_intercept_for_msr(MSR_GS_BASE, false); - vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true); - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false); - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false); - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false); - vmx_disable_intercept_for_msr(MSR_IA32_BNDCFGS, true); - - memcpy(vmx_msr_bitmap_legacy_x2apic, - vmx_msr_bitmap_legacy, PAGE_SIZE); - memcpy(vmx_msr_bitmap_longmode_x2apic, - vmx_msr_bitmap_longmode, PAGE_SIZE); - - if (enable_apicv) { - for (msr = 0x800; msr = 0x8ff; msr++) - vmx_disable_intercept_msr_read_x2apic(msr); - - /* According SDM, in x2apic mode, the whole id reg is used. -* But in KVM, it only use the highest eight bits. Need to -* intercept it */ - vmx_enable_intercept_msr_read_x2apic(0x802); - /* TMCCT */ - vmx_enable_intercept_msr_read_x2apic(0x839); - /* TPR */ - vmx_disable_intercept_msr_write_x2apic(0x808); - /* EOI */ - vmx_disable_intercept_msr_write_x2apic(0x80b); - /* SELF-IPI */ - vmx_disable_intercept_msr_write_x2apic(0x83f); - } - - if (enable_ept) { - kvm_mmu_set_mask_ptes(0ull, - (enable_ept_ad_bits) ? VMX_EPT_ACCESS_BIT : 0ull, - (enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull, - 0ull, VMX_EPT_EXECUTABLE_MASK); - ept_set_mmio_spte_mask(); - kvm_enable_tdp(); - } else - kvm_disable_tdp(); - - update_ple_window_actual_max(); - if (setup_vmcs_config(vmcs_config) 0) { r = -EIO; goto out7; @@ -5945,6 +5902,49 @@ static __init int hardware_setup(void) if (nested) nested_vmx_setup_ctls_msrs(); +
Re: [PATCH] kvm: x86: remove vmx_vm_has_apicv() outside of hwapic_isr_update()
On 2014/12/1 19:43, Paolo Bonzini wrote: On 01/12/2014 10:28, Tiejun Chen wrote: In most cases calling hwapic_isr_update(), actually we always check if kvm_apic_vid_enabled() == 1, and also actually, kvm_apic_vid_enabled() - kvm_x86_ops-vm_has_apicv() - vmx_vm_has_apicv() or '0' in svm case So its unnecessary to recall this inside hwapic_isr_update(), here just remove vmx_vm_has_apicv() out and follow others. If you want to do this, please NULL out the function pointer instead, as KVM already does for hwapic_irr_update. Are you saying something below? if (enable_apicv) ... else { kvm_x86_ops-hwapic_irr_update = NULL; But there's a little bit difference to NULL out hwapic_isr_update(), static int vmx_vm_has_apicv(struct kvm *kvm) { return enable_apicv irqchip_in_kernel(kvm); } Yes, I can do something like this, static __init int hadware_setup(void) { ... if (enable_apicv) { ... if (!irqchip_in_kernel(kvm)) kvm_x86_ops-hwapic_isr_update = NULL; } else { ... kvm_x86_ops-hwapic_isr_update = NULL; But this means we have to revise hadware_setup() to get 'kvm' inside, then rebase other callers to hwapic_isr_update(), is it really good? Here what I will intend to do is trying to reduce some cost (reduplicate check) with a little code, so its may not be worth changing much more. Tiejun Paolo Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- arch/x86/kvm/lapic.c | 3 ++- arch/x86/kvm/vmx.c | 3 --- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index e0e5642..2ddc426 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1739,7 +1739,8 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu, if (kvm_x86_ops-hwapic_irr_update) kvm_x86_ops-hwapic_irr_update(vcpu, apic_find_highest_irr(apic)); - kvm_x86_ops-hwapic_isr_update(vcpu-kvm, apic_find_highest_isr(apic)); + if (kvm_apic_vid_enabled(vcpu-kvm)) + kvm_x86_ops-hwapic_isr_update(vcpu-kvm, apic_find_highest_isr(apic)); kvm_make_request(KVM_REQ_EVENT, vcpu); kvm_rtc_eoi_tracking_restore_one(vcpu); } diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 6a951d8..f0c16a9 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -7406,9 +7406,6 @@ static void vmx_hwapic_isr_update(struct kvm *kvm, int isr) u16 status; u8 old; - if (!vmx_vm_has_apicv(kvm)) - return; - if (isr == -1) isr = 0; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: iommu: Add cond_resched to legacy device assignment code
On 2014/12/16 23:47, Joerg Roedel wrote: From: Joerg Roedel jroe...@suse.de When assigning devices to large memory guests (=128GB guest memory in the failure case) the functions to create the IOMMU page-tables for the whole guest might run for a very long time. On non-preemptible kernels this might cause Soft-Lockup warnings. Fix these by adding a cond_resched() to the mapping and unmapping loops. Signed-off-by: Joerg Roedel jroe...@suse.de --- virt/kvm/iommu.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c index c1e6ae9..ac427e8 100644 --- a/virt/kvm/iommu.c +++ b/virt/kvm/iommu.c This file is already gone after one latest commit c274e03af705, kvm: x86: move assigned-dev.c and iommu.c to arch/x86/ is introduced, so you need to pull your tree firstly :) Tiejun @@ -137,7 +137,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot) gfn += page_size PAGE_SHIFT; - + cond_resched(); } return 0; @@ -311,6 +311,8 @@ static void kvm_iommu_put_pages(struct kvm *kvm, kvm_unpin_pages(kvm, pfn, unmap_pages); gfn += unmap_pages; + + cond_resched(); } } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: coalesced_mmio: remove one redundant check inside of coalesced_mmio_in_range()
On 2014/12/11 19:29, Paolo Bonzini wrote: On 11/12/2014 04:02, Tiejun Chen wrote: We already check 'len' above to make sure it already isn't negative here, so indeed, (addr + len addr) should never be happened. ... except if there is an overflow. Sorry, I'm confused. 'addr' is u64 and now 'len' would always be '=0', what's your a so-called overflow here? And we also have such a check below, (addr + len dev-zone.addr + dev-zone.size), so can this guarantee an overflow? Thanks Tiejun Paolo Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- virt/kvm/coalesced_mmio.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c index 00d8642..60f59cd 100644 --- a/virt/kvm/coalesced_mmio.c +++ b/virt/kvm/coalesced_mmio.c @@ -30,8 +30,6 @@ static int coalesced_mmio_in_range(struct kvm_coalesced_mmio_dev *dev, */ if (len 0) return 0; - if (addr + len addr) - return 0; if (addr dev-zone.addr) return 0; if (addr + len dev-zone.addr + dev-zone.size) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: coalesced_mmio: remove one redundant check inside of coalesced_mmio_in_range()
On 2014/12/12 9:02, Chen, Tiejun wrote: On 2014/12/11 19:29, Paolo Bonzini wrote: On 11/12/2014 04:02, Tiejun Chen wrote: We already check 'len' above to make sure it already isn't negative here, so indeed, (addr + len addr) should never be happened. ... except if there is an overflow. I think now I can understand what you mean. Thanks Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [question] what is virtio-1 device?
On 2014/11/27 9:26, Zhang Haoyu wrote: Hi, what is virtio-1 device? Are you mean subsystem device id is '1'? That should be a network card. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [question] what is virtio-1 device?
On 2014/11/27 9:46, Zhang Haoyu wrote: Hi, what is virtio-1 device? Are you mean subsystem device id is '1'? That should be a network card. In the mail qemu: towards virtio-1 host support, virtio: allow virtio-1 queue layout, dataplane: allow virtio-1 devices, etc., virtio-1 devices was mentioned, I don't know which devices it incorporates? That should mean they're carrying forward virtio based on latest Virtual I/O Device (VIRTIO) Version 1.0. Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: ia64: remove
On 2014/11/20 5:05, Paolo Bonzini wrote: KVM for ia64 has been marked as broken not just once, but twice even, and the last patch from the maintainer is now roughly 5 years old. Time for it to rest in piece. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- I think we also need to sync this in Documentation: Documentation: virtual: kvm: remove ia64 kvm/ia64 would be gone so just clean Documentation. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- Documentation/ia64/kvm.txt | 83 - Documentation/ia64/paravirt_ops.txt | 8 ++-- Documentation/virtual/kvm/api.txt | 45 ++-- 3 files changed, 26 insertions(+), 110 deletions(-) delete mode 100644 Documentation/ia64/kvm.txt diff --git a/Documentation/ia64/kvm.txt b/Documentation/ia64/kvm.txt deleted file mode 100644 index ffb5c80..000 --- a/Documentation/ia64/kvm.txt +++ /dev/null @@ -1,83 +0,0 @@ -Currently, kvm module is in EXPERIMENTAL stage on IA64. This means that -interfaces are not stable enough to use. So, please don't run critical -applications in virtual machine. -We will try our best to improve it in future versions! - - Guide: How to boot up guests on kvm/ia64 - -This guide is to describe how to enable kvm support for IA-64 systems. - -1. Get the kvm source from git.kernel.org. - Userspace source: - git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-userspace.git - Kernel Source: - git clone git://git.kernel.org/pub/scm/linux/kernel/git/xiantao/kvm-ia64.git - -2. Compile the source code. - 2.1 Compile userspace code: - (1)cd ./kvm-userspace - (2)./configure - (3)cd kernel - (4)make sync LINUX= $kernel_dir (kernel_dir is the directory of kernel source.) - (5)cd .. - (6)make qemu - (7)cd qemu; make install - - 2.2 Compile kernel source code: - (1) cd ./$kernel_dir - (2) Make menuconfig - (3) Enter into virtualization option, and choose kvm. - (4) make - (5) Once (4) done, make modules_install - (6) Make initrd, and use new kernel to reboot up host machine. - (7) Once (6) done, cd $kernel_dir/arch/ia64/kvm - (8) insmod kvm.ko; insmod kvm-intel.ko - -Note: For step 2, please make sure that host page size == TARGET_PAGE_SIZE of qemu, otherwise, may fail. - -3. Get Guest Firmware named as Flash.fd, and put it under right place: - (1) If you have the guest firmware (binary) released by Intel Corp for Xen, use it directly. - - (2) If you have no firmware at hand, Please download its source from - hg clone http://xenbits.xensource.com/ext/efi-vfirmware.hg - you can get the firmware's binary in the directory of efi-vfirmware.hg/binaries. - - (3) Rename the firmware you owned to Flash.fd, and copy it to /usr/local/share/qemu - -4. Boot up Linux or Windows guests: - 4.1 Create or install a image for guest boot. If you have xen experience, it should be easy. - - 4.2 Boot up guests use the following command. - /usr/local/bin/qemu-system-ia64 -smp xx -m 512 -hda $your_image - (xx is the number of virtual processors for the guest, now the maximum value is 4) - -5. Known possible issue on some platforms with old Firmware. - -In the event of strange host crash issues, try to solve it through either of the following ways: - -(1): Upgrade your Firmware to the latest one. - -(2): Applying the below patch to kernel source. -diff --git a/arch/ia64/kernel/pal.S b/arch/ia64/kernel/pal.S -index 0b53344..f02b0f7 100644 a/arch/ia64/kernel/pal.S -+++ b/arch/ia64/kernel/pal.S -@@ -84,7 +84,8 @@ GLOBAL_ENTRY(ia64_pal_call_static) - mov ar.pfs = loc1 - mov rp = loc0 - ;; -- srlz.d // serialize restoration of psr.l -+ srlz.i // serialize restoration of psr.l -+ ;; - br.ret.sptk.many b0 - END(ia64_pal_call_static) - -6. Bug report: - If you found any issues when use kvm/ia64, Please post the bug info to kvm-ia64-devel mailing list. - https://lists.sourceforge.net/lists/listinfo/kvm-ia64-devel/ - -Thanks for your interest! Let's work together, and make kvm/ia64 stronger and stronger! - - - Xiantao Zhang xiantao.zh...@intel.com - 2008.3.10 diff --git a/Documentation/ia64/paravirt_ops.txt b/Documentation/ia64/paravirt_ops.txt index 39ded02..3a1f36b 100644 --- a/Documentation/ia64/paravirt_ops.txt +++ b/Documentation/ia64/paravirt_ops.txt @@ -12,7 +12,7 @@ paravirt_ops (pv_ops in short) is a way for virtualization support of Linux kernel on x86. Several ways for virtualization support were proposed, paravirt_ops is the winner.
Re: [v2][PATCH] kvm: x86: mmio: fix setting the present bit of mmio spte
On 2014/11/17 19:40, Paolo Bonzini wrote: On 17/11/2014 12:31, Tiejun Chen wrote: In non-ept 64-bit of PAE case maxphyaddr may be 52bit as well, There is no such thing as 64-bit PAE. Definitely. On 32-bit PAE hosts, PTEs have bit 62 reserved, as in your patch: Do you mean just one reserved bit is fine enough in this case? Thanks Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH 1/2] kvm: x86: mmu: return zero if s e in rsvd_bits()
On 2014/11/17 17:22, Paolo Bonzini wrote: On 17/11/2014 02:34, Chen, Tiejun wrote: On 2014/11/14 18:06, Paolo Bonzini wrote: On 14/11/2014 10:31, Tiejun Chen wrote: In some real scenarios 'start' may not be less than 'end' like maxphyaddr = 52. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- arch/x86/kvm/mmu.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index bde8ee7..0e98b5e 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -58,6 +58,8 @@ static inline u64 rsvd_bits(int s, int e) { +if (unlikely(s e)) +return 0; return ((1ULL (e - s + 1)) - 1) s; } s == e + 1 is supported: (1ULL (e - (e + 1) + 1)) - 1) s == (1ULL (e - (e + 1) + 1)) - 1) s = (1ULL (e - e - 1) + 1)) - 1) s = (1ULL (-1) + 1)) - 1) s no, You're right since I'm seeing () wrongly. Sorry to bother you. Thanks Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH 1/2] kvm: x86: mmu: return zero if s e in rsvd_bits()
On 2014/11/14 18:06, Paolo Bonzini wrote: On 14/11/2014 10:31, Tiejun Chen wrote: In some real scenarios 'start' may not be less than 'end' like maxphyaddr = 52. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- arch/x86/kvm/mmu.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index bde8ee7..0e98b5e 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -58,6 +58,8 @@ static inline u64 rsvd_bits(int s, int e) { + if (unlikely(s e)) + return 0; return ((1ULL (e - s + 1)) - 1) s; } s == e + 1 is supported: (1ULL (e - (e + 1) + 1)) - 1) s == (1ULL (e - (e + 1) + 1)) - 1) s = (1ULL (e - e - 1) + 1)) - 1) s = (1ULL (-1) + 1)) - 1) s = (1ULL (0) - 1) s = (1ULL (- 1) s Am I missing something? Thanks Tiejun (1ULL 0) s == 0 Is there any case where s is even bigger? Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH 2/2] kvm: x86: mmio: fix setting the present bit of mmio spte
On 2014/11/14 18:11, Paolo Bonzini wrote: On 14/11/2014 10:31, Tiejun Chen wrote: In PAE case maxphyaddr may be 52bit as well, we also need to disable mmio page fault. Here we can check MMIO_SPTE_GEN_HIGH_SHIFT directly to determine if we should set the present bit, and bring a little cleanup. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/mmu.c | 23 +++ arch/x86/kvm/x86.c | 30 -- 3 files changed, 24 insertions(+), 30 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index dc932d3..667f2b6 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -809,6 +809,7 @@ void kvm_mmu_write_protect_pt_masked(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn_offset, unsigned long mask); void kvm_mmu_zap_all(struct kvm *kvm); +void kvm_set_mmio_spte_mask(void); void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm); unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm); void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages); diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index ac1c4de..8e4be36 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -295,6 +295,29 @@ static bool check_mmio_spte(struct kvm *kvm, u64 spte) return likely(kvm_gen == spte_gen); } +/* + * Set the reserved bits and the present bit of an paging-structure + * entry to generate page fault with PFER.RSV = 1. + */ +void kvm_set_mmio_spte_mask(void) +{ + u64 mask; + int maxphyaddr = boot_cpu_data.x86_phys_bits; + + /* Mask the reserved physical address bits. */ + mask = rsvd_bits(maxphyaddr, MMIO_SPTE_GEN_HIGH_SHIFT - 1); + + /* Magic bits are always reserved for 32bit host. */ + mask |= 0x3ull 62; This should be enough to trigger the page fault on PAE systems. The problem is specific to non-EPT 64-bit hosts, where the PTEs have no reserved bits beyond 51:MAXPHYADDR. On EPT we use WX- permissions to trigger EPT misconfig, on 32-bit systems we have bit 62. Thanks for your explanation. + /* Set the present bit to enable mmio page fault. */ + if (maxphyaddr MMIO_SPTE_GEN_HIGH_SHIFT) + mask = PT_PRESENT_MASK; Shouldn't this be |= anyway, instead of =? Yeah, just miss this. Thanks a lot, I will fix this in next revision. Thanks Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: reset RVI upon system reset
On 2014/11/5 15:39, Wang, Wei W wrote: On 05/11/2014 2:14, Tiejun Chen wrote: A bug was reported as follows: when running Windows 7 32-bit guests on qemu-kvm, sometimes the guests run into blue screen during reboot. The problem was that a guest's RVI was not cleared when it rebooted. This patch has fixed the problem. Signed-off-by: Wei Wang wei.w.w...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com Tested-by: Rongrong Liu rongrongx@intel.com, Da Chun ng...@qq.com --- arch/x86/kvm/lapic.c |3 +++ arch/x86/kvm/vmx.c | 12 ++-- 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 66dd173..6942742 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1712,6 +1712,9 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu, apic-isr_count = kvm_apic_vid_enabled(vcpu-kvm) ? 1 : count_vectors(apic-regs + APIC_ISR); apic-highest_isr_cache = -1; + if (kvm_x86_ops-hwapic_irr_update) + kvm_x86_ops-hwapic_irr_update(vcpu, + apic_find_highest_irr(apic)); Could we pass 0 directly? Because looks we just need to clear RVI. kvm_x86_ops-hwapic_irr_update(vcpu, 0); I think this already makes sense based on your description. Thanks Tiejun No. This is a restore function, and we cannot assume that the callers always need to reset to the initial state. Okay. Maybe I'm confused by the following change. Wei kvm_x86_ops-hwapic_isr_update(vcpu-kvm, apic_find_highest_isr(apic)); kvm_make_request(KVM_REQ_EVENT, vcpu); kvm_rtc_eoi_tracking_restore_one(vcpu); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index fe4d2f4..d632548 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -7292,19 +7292,19 @@ static void vmx_set_rvi(int vector) static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr) { if (max_irr == -1) + max_irr = 0; + + if (!is_guest_mode(vcpu)) { + vmx_set_rvi(max_irr); return; + } /* * If a vmexit is needed, vmx_check_nested_events handles it. */ - if (is_guest_mode(vcpu) nested_exit_on_intr(vcpu)) + if ((is_guest_mode(vcpu) nested_exit_on_intr(vcpu)) || max_irr == +0) Its really not readable to modify max_irr as 0 and check that here, and especially when you read the original comment. So what about this? diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 0cd99d8..bc4558b 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -7280,6 +7280,9 @@ static void vmx_set_rvi(int vector) u16 status; u8 old; + if (vector == -1) + vector = 0; + status = vmcs_read16(GUEST_INTR_STATUS); old = (u8)status 0xff; if ((u8)vector != old) { @@ -7291,9 +7294,6 @@ static void vmx_set_rvi(int vector) static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr) { - if (max_irr == -1) - return; - /* * If a vmexit is needed, vmx_check_nested_events handles it. */ @@ -7305,6 +7305,9 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr) return; } + if (max_irr == -1) + return; + /* * Fall back to pre-APICv interrupt injection since L2 * is run without virtual interrupt delivery. Thanks Tiejun return; - if (!is_guest_mode(vcpu)) { - vmx_set_rvi(max_irr); - return; - } - /* * Fall back to pre-APICv interrupt injection since L2 * is run without virtual interrupt delivery. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: reset RVI upon system reset
On 2014/11/5 16:50, Wang, Wei W wrote: On 05/11/2014 4:07, Tiejun Chen wrote: A bug was reported as follows: when running Windows 7 32-bit guests on qemu-kvm, sometimes the guests run into blue screen during reboot. The problem was that a guest's RVI was not cleared when it rebooted. This patch has fixed the problem. Signed-off-by: Wei Wang wei.w.w...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com Tested-by: Rongrong Liu rongrongx@intel.com, Da Chun ng...@qq.com --- arch/x86/kvm/lapic.c |3 +++ arch/x86/kvm/vmx.c | 12 ++-- 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 66dd173..6942742 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1712,6 +1712,9 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu, apic-isr_count = kvm_apic_vid_enabled(vcpu-kvm) ? 1 : count_vectors(apic-regs + APIC_ISR); apic-highest_isr_cache = -1; + if (kvm_x86_ops-hwapic_irr_update) + kvm_x86_ops-hwapic_irr_update(vcpu, + apic_find_highest_irr(apic)); Could we pass 0 directly? Because looks we just need to clear RVI. kvm_x86_ops-hwapic_irr_update(vcpu, 0); I think this already makes sense based on your description. Thanks Tiejun No. This is a restore function, and we cannot assume that the callers always need to reset to the initial state. Okay. Maybe I'm confused by the following change. Wei kvm_x86_ops-hwapic_isr_update(vcpu-kvm, apic_find_highest_isr(apic)); kvm_make_request(KVM_REQ_EVENT, vcpu); kvm_rtc_eoi_tracking_restore_one(vcpu); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index fe4d2f4..d632548 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -7292,19 +7292,19 @@ static void vmx_set_rvi(int vector) static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr) { if (max_irr == -1) + max_irr = 0; + + if (!is_guest_mode(vcpu)) { + vmx_set_rvi(max_irr); return; + } /* * If a vmexit is needed, vmx_check_nested_events handles it. */ - if (is_guest_mode(vcpu) nested_exit_on_intr(vcpu)) + if ((is_guest_mode(vcpu) nested_exit_on_intr(vcpu)) || max_irr == +0) Its really not readable to modify max_irr as 0 and check that here, and especially when you read the original comment. So what about this? I think both are ok. If we zero max_irr in vmx_set_rvi(), we still need this check: if ((is_guest_mode(vcpu) nested_exit_on_intr(vcpu)) || max_irr == -1) No, I don't think we need to add this. Let's see if Paolo has any comments, then send out a second version. Wei diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 0cd99d8..bc4558b 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -7280,6 +7280,9 @@ static void vmx_set_rvi(int vector) u16 status; u8 old; + if (vector == -1) + vector = 0; + status = vmcs_read16(GUEST_INTR_STATUS); old = (u8)status 0xff; if ((u8)vector != old) { @@ -7291,9 +7294,6 @@ static void vmx_set_rvi(int vector) static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr) { - if (max_irr == -1) - return; - /* * If a vmexit is needed, vmx_check_nested_events handles it. */ @@ -7305,6 +7305,9 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr) return; } + if (max_irr == -1) + return; + Did you see this? Tiejun /* * Fall back to pre-APICv interrupt injection since L2 * is run without virtual interrupt delivery. Thanks Tiejun return; - if (!is_guest_mode(vcpu)) { - vmx_set_rvi(max_irr); - return; - } - /* * Fall back to pre-APICv interrupt injection since L2 * is run without virtual interrupt delivery. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: x86: lapic: remove one redundant judging condition
On 2014/11/5 18:22, Paolo Bonzini wrote: On 05/11/2014 10:03, Tiejun Chen wrote: Finally we always return highest_irr so its unnecessary to return -1 after check if highest_irr == -1. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- arch/x86/kvm/lapic.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 5f574b4..e6a7eb6 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1638,8 +1638,7 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) apic_update_ppr(apic); highest_irr = apic_find_highest_irr(apic); - if ((highest_irr == -1) || - ((highest_irr 0xF0) = kvm_apic_get_reg(apic, APIC_PROCPRI))) + if ((highest_irr 0xF0) = kvm_apic_get_reg(apic, APIC_PROCPRI)) return -1; return highest_irr; } I think the code is clearer without this change. The two returns mean: - return -1: no interrupt to inject - return highest_irr: inject this interrupt With IRR equal to all zeroes (highest_irr = -1), your patch would make the if always false (current PPR is low, can inject the interrupt), but computing highest_irr 0xF0 would make no sense if highest_irr == -1. Yeah, you're right so here is just a little confusion to read. Actually what this code is doing looks like, @@ -1638,7 +1638,7 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) apic_update_ppr(apic); highest_irr = apic_find_highest_irr(apic); - if ((highest_irr == -1) || + if ((highest_irr != -1) ((highest_irr 0xF0) = kvm_apic_get_reg(apic, APIC_PROCPRI))) return -1; return highest_irr; But it's really no big deal so we can keep the original alive. Thanks Tiejun To put it another way, imagine the code looked like this: static inline int int_prio(int vector) { WARN_ON(vector == -1); return vector 0xF0; } ... apic_update_ppr(apic); highest_irr = apic_find_highest_irr(apic); if (highest_irr == -1 || int_prio(highest_irr) = kvm_apic_get_reg(apic, APIC_PROCPRI)) return -1; return highest_irr; Then removing the check on highest_irr == -1 would trigger a warning. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: x86: vmx: avoid returning bool to distinguish success from error
On 2014/11/5 1:33, Paolo Bonzini wrote: Return a negative error code instead, and WARN() when we should be covering the entire 2-bit space of vmcs_field_type's return value. For increased robustness, add a BUILD_BUG_ON checking the range of vmcs_field_to_offset. Suggested-by: Tiejun Chen tiejun.c...@intel.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- arch/x86/kvm/vmx.c | 51 +++ 1 file changed, 31 insertions(+), 20 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 12f6f9a8dd8d..0ee148f1687f 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -720,12 +720,15 @@ static const unsigned short vmcs_field_to_offset_table[] = { FIELD(HOST_RSP, host_rsp), FIELD(HOST_RIP, host_rip), }; -static const int max_vmcs_field = ARRAY_SIZE(vmcs_field_to_offset_table); static inline short vmcs_field_to_offset(unsigned long field) { - if (field = max_vmcs_field || vmcs_field_to_offset_table[field] == 0) - return -1; + BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table) SHRT_MAX); + + if (field = ARRAY_SIZE(vmcs_field_to_offset_table) || + vmcs_field_to_offset_table[field] == 0) + return -ENOENT; + return vmcs_field_to_offset_table[field]; } @@ -6492,58 +6495,60 @@ static inline int vmcs_field_readonly(unsigned long field) * some of the bits we return here (e.g., on 32-bit guests, only 32 bits of * 64-bit fields are to be returned). */ -static inline bool vmcs12_read_any(struct kvm_vcpu *vcpu, - unsigned long field, u64 *ret) +static inline int vmcs12_read_any(struct kvm_vcpu *vcpu, + unsigned long field, u64 *ret) { short offset = vmcs_field_to_offset(field); char *p; if (offset 0) - return 0; + return offset; p = ((char *)(get_vmcs12(vcpu))) + offset; switch (vmcs_field_type(field)) { case VMCS_FIELD_TYPE_NATURAL_WIDTH: *ret = *((natural_width *)p); - return 1; + return 0; case VMCS_FIELD_TYPE_U16: *ret = *((u16 *)p); - return 1; + return 0; case VMCS_FIELD_TYPE_U32: *ret = *((u32 *)p); - return 1; + return 0; case VMCS_FIELD_TYPE_U64: *ret = *((u64 *)p); - return 1; + return 0; default: - return 0; /* can never happen. */ + WARN_ON(1); + return -ENOENT; } } -static inline bool vmcs12_write_any(struct kvm_vcpu *vcpu, - unsigned long field, u64 field_value){ +static inline int vmcs12_write_any(struct kvm_vcpu *vcpu, + unsigned long field, u64 field_value){ short offset = vmcs_field_to_offset(field); char *p = ((char *) get_vmcs12(vcpu)) + offset; if (offset 0) - return false; + return offset; switch (vmcs_field_type(field)) { case VMCS_FIELD_TYPE_U16: *(u16 *)p = field_value; - return true; + return 0; case VMCS_FIELD_TYPE_U32: *(u32 *)p = field_value; - return true; + return 0; case VMCS_FIELD_TYPE_U64: *(u64 *)p = field_value; - return true; + return 0; case VMCS_FIELD_TYPE_NATURAL_WIDTH: *(natural_width *)p = field_value; - return true; + return 0; default: - return false; /* can never happen. */ + WARN_ON(1); + return -ENOENT; } } @@ -6576,6 +6581,9 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx) case VMCS_FIELD_TYPE_NATURAL_WIDTH: field_value = vmcs_readl(field); break; + default: + WARN_ON(1); + continue; 'continue' versus 'break'? Thanks Tiejun } vmcs12_write_any(vmx-vcpu, field, field_value); } @@ -6621,6 +6629,9 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx) case VMCS_FIELD_TYPE_NATURAL_WIDTH: vmcs_writel(field, (long)field_value); break; + default: + WARN_ON(1); + break; } } } @@ -6659,7 +6670,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu) /* Decode instruction info and find the field to read */ field = kvm_register_readl(vcpu, (((vmx_instruction_info) 28) 0xf)); /* Read the field,
Re: [PATCH] kvm: x86: vmx: avoid returning bool to distinguish success from error
On 2014/11/5 9:43, Chen, Tiejun wrote: On 2014/11/5 1:33, Paolo Bonzini wrote: Return a negative error code instead, and WARN() when we should be covering the entire 2-bit space of vmcs_field_type's return value. For increased robustness, add a BUILD_BUG_ON checking the range of vmcs_field_to_offset. Suggested-by: Tiejun Chen tiejun.c...@intel.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- arch/x86/kvm/vmx.c | 51 [snip] @@ -6576,6 +6581,9 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx) case VMCS_FIELD_TYPE_NATURAL_WIDTH: field_value = vmcs_readl(field); break; +default: +WARN_ON(1); +continue; 'continue' versus 'break'? Thanks Tiejun } vmcs12_write_any(vmx-vcpu, field, field_value); } @@ -6621,6 +6629,9 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx) case VMCS_FIELD_TYPE_NATURAL_WIDTH: vmcs_writel(field, (long)field_value); break; +default: +WARN_ON(1); +break; } } } @@ -6659,7 +6670,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu) /* Decode instruction info and find the field to read */ field = kvm_register_readl(vcpu, (((vmx_instruction_info) 28) 0xf)); /* Read the field, zero-extended to a u64 field_value */ -if (!vmcs12_read_any(vcpu, field, field_value)) { +if (vmcs12_read_any(vcpu, field, field_value) 0) { nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT); skip_emulated_instruction(vcpu); return 1; Looks we're missing another place, --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6601,7 +6601,7 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu) return 1; } - if (!vmcs12_write_any(vcpu, field, field_value)) { + if (vmcs12_write_any(vcpu, field, field_value) 0) { nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT); skip_emulated_instruction(vcpu); return 1; Thanks Tiejun -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: x86: vmx: return 'bool' type from vmcs12_read_any()
On 2014/11/5 1:34, Paolo Bonzini wrote: On 31/10/2014 07:45, Tiejun Chen wrote: Return value should be as bool type as this function declaration, static inline bool vmcs12_read_any(). Actually, bool return values are in general a bad idea if you mean success/fail, especially if you can use POSIX error codes such as in this case ENOENT. Yeah. I've sent a patch that changes the return value to int. Cool! I just have two minimal comments inline. Thanks Tiejun Paolo Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- arch/x86/kvm/vmx.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 9fa1f46..9d44d6e 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6499,25 +6499,25 @@ static inline bool vmcs12_read_any(struct kvm_vcpu *vcpu, char *p; if (offset 0) - return 0; + return false; p = ((char *)(get_vmcs12(vcpu))) + offset; switch (vmcs_field_type(field)) { case VMCS_FIELD_TYPE_NATURAL_WIDTH: *ret = *((natural_width *)p); - return 1; + return true; case VMCS_FIELD_TYPE_U16: *ret = *((u16 *)p); - return 1; + return true; case VMCS_FIELD_TYPE_U32: *ret = *((u32 *)p); - return 1; + return true; case VMCS_FIELD_TYPE_U64: *ret = *((u64 *)p); - return 1; + return true; default: - return 0; /* can never happen. */ + return false; /* can never happen. */ } } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: reset RVI upon system reset
On 2014/11/5 10:53, Wei Wang wrote: A bug was reported as follows: when running Windows 7 32-bit guests on qemu-kvm, sometimes the guests run into blue screen during reboot. The problem was that a guest's RVI was not cleared when it rebooted. This patch has fixed the problem. Signed-off-by: Wei Wang wei.w.w...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com Tested-by: Rongrong Liu rongrongx@intel.com, Da Chun ng...@qq.com --- arch/x86/kvm/lapic.c |3 +++ arch/x86/kvm/vmx.c | 12 ++-- 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 66dd173..6942742 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1712,6 +1712,9 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu, apic-isr_count = kvm_apic_vid_enabled(vcpu-kvm) ? 1 : count_vectors(apic-regs + APIC_ISR); apic-highest_isr_cache = -1; + if (kvm_x86_ops-hwapic_irr_update) + kvm_x86_ops-hwapic_irr_update(vcpu, + apic_find_highest_irr(apic)); Could we pass 0 directly? Because looks we just need to clear RVI. kvm_x86_ops-hwapic_irr_update(vcpu, 0); I think this already makes sense based on your description. Thanks Tiejun kvm_x86_ops-hwapic_isr_update(vcpu-kvm, apic_find_highest_isr(apic)); kvm_make_request(KVM_REQ_EVENT, vcpu); kvm_rtc_eoi_tracking_restore_one(vcpu); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index fe4d2f4..d632548 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -7292,19 +7292,19 @@ static void vmx_set_rvi(int vector) static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr) { if (max_irr == -1) + max_irr = 0; + + if (!is_guest_mode(vcpu)) { + vmx_set_rvi(max_irr); return; + } /* * If a vmexit is needed, vmx_check_nested_events handles it. */ - if (is_guest_mode(vcpu) nested_exit_on_intr(vcpu)) + if ((is_guest_mode(vcpu) nested_exit_on_intr(vcpu)) || max_irr == 0) return; - if (!is_guest_mode(vcpu)) { - vmx_set_rvi(max_irr); - return; - } - /* * Fall back to pre-APICv interrupt injection since L2 * is run without virtual interrupt delivery. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] KVM: x86: fix access memslots w/o hold srcu read lock
On 2014/10/31 14:26, Wanpeng Li wrote: The srcu read lock must be held while accessing memslots (e.g. when using gfn_to_* functions), however, commit c24ae0dcd3e8 (kvm: x86: Unpin and remove kvm_arch-apic_access_page) call gfn_to_page() in kvm_vcpu_reload_apic_access_page() w/o hold it in vmx_vcpu_reset() path which leads to suspicious rcu_dereference_check() usage warning. This patch fix it by holding srcu read lock when call gfn_to_page() in vmx_vcpu_reset() path. [ INFO: suspicious RCU usage. ] 3.18.0-rc2-test2+ #70 Not tainted --- include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by qemu-system-x86/2371: #0: (vcpu-mutex){+.+...}, at: [a037d800] vcpu_load+0x20/0xd0 [kvm] stack backtrace: CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70 Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013 0001 880209983ca8 816f514f 8802099b8990 880209983cd8 810bd687 000fee00 880208a2c000 880208a1 88020ef50040 880209983d08 Call Trace: [816f514f] dump_stack+0x4e/0x71 [810bd687] lockdep_rcu_suspicious+0xe7/0x120 [a037d055] gfn_to_memslot+0xd5/0xe0 [kvm] [a03807d3] __gfn_to_pfn+0x33/0x60 [kvm] [a0380885] gfn_to_page+0x25/0x90 [kvm] [a038aeec] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm] [a08f0a9c] vmx_vcpu_reset+0x20c/0x460 [kvm_intel] [a039ab8e] kvm_vcpu_reset+0x15e/0x1b0 [kvm] [a039ac0c] kvm_arch_vcpu_setup+0x2c/0x50 [kvm] [a037f7e0] kvm_vm_ioctl+0x1d0/0x780 [kvm] [810bc664] ? __lock_is_held+0x54/0x80 [812231f0] do_vfs_ioctl+0x300/0x520 [8122ee45] ? __fget+0x5/0x250 [8122f0fa] ? __fget_light+0x2a/0xe0 [81223491] SyS_ioctl+0x81/0xa0 [816fed6d] system_call_fastpath+0x16/0x1b Reported-by: Takashi Iwai ti...@suse.de Reported-by: Alexei Starovoitov alexei.starovoi...@gmail.com Suggested-by: Tiejun Chen tiejun.c...@intel.com Reviewed-by: Tiejun Chen tiejun.c...@intel.com Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com --- arch/x86/kvm/vmx.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index a0f78db..bd9be01 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4512,6 +4512,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); struct msr_data apic_base_msr; + int idx; vmx-rmode.vm86_active = 0; @@ -4579,7 +4580,9 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu) vmcs_write32(TPR_THRESHOLD, 0); } + idx = srcu_read_lock(vcpu-kvm-srcu); kvm_vcpu_reload_apic_access_page(vcpu); + srcu_read_unlock(vcpu-kvm-srcu, idx); if (vmx_vm_has_apicv(vcpu-kvm)) memset(vmx-pi_desc, 0, sizeof(struct pi_desc)); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: fix access memslots w/o hold srcu read lock
On 2014/10/31 12:33, Wanpeng Li wrote: The srcu read lock must be held while accessing memslots (e.g. when using gfn_to_* functions), however, commit c24ae0dcd3e8 (kvm: x86: Unpin and remove kvm_arch-apic_access_page) call gfn_to_page() in kvm_vcpu_reload_apic_access_page() w/o hold it which leads to suspicious rcu_dereference_check() usage warning. This patch fix it by holding srcu read lock when call gfn_to_page() in kvm_vcpu_reload_apic_access_page() function. [ INFO: suspicious RCU usage. ] 3.18.0-rc2-test2+ #70 Not tainted --- include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by qemu-system-x86/2371: #0: (vcpu-mutex){+.+...}, at: [a037d800] vcpu_load+0x20/0xd0 [kvm] stack backtrace: CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70 Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013 0001 880209983ca8 816f514f 8802099b8990 880209983cd8 810bd687 000fee00 880208a2c000 880208a1 88020ef50040 880209983d08 Call Trace: [816f514f] dump_stack+0x4e/0x71 [810bd687] lockdep_rcu_suspicious+0xe7/0x120 [a037d055] gfn_to_memslot+0xd5/0xe0 [kvm] [a03807d3] __gfn_to_pfn+0x33/0x60 [kvm] [a0380885] gfn_to_page+0x25/0x90 [kvm] [a038aeec] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm] [a08f0a9c] vmx_vcpu_reset+0x20c/0x460 [kvm_intel] [a039ab8e] kvm_vcpu_reset+0x15e/0x1b0 [kvm] [a039ac0c] kvm_arch_vcpu_setup+0x2c/0x50 [kvm] [a037f7e0] kvm_vm_ioctl+0x1d0/0x780 [kvm] [810bc664] ? __lock_is_held+0x54/0x80 [812231f0] do_vfs_ioctl+0x300/0x520 [8122ee45] ? __fget+0x5/0x250 [8122f0fa] ? __fget_light+0x2a/0xe0 [81223491] SyS_ioctl+0x81/0xa0 [816fed6d] system_call_fastpath+0x16/0x1b Reported-by: Takashi Iwai ti...@suse.de Reported-by: Alexei Starovoitov alexei.starovoi...@gmail.com Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com --- arch/x86/kvm/x86.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0033df3..2d97329 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6059,6 +6059,7 @@ static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu) void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu) { struct page *page = NULL; + int idx; if (!irqchip_in_kernel(vcpu-kvm)) return; @@ -6066,7 +6067,9 @@ void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu) if (!kvm_x86_ops-set_apic_access_page_addr) return; + idx = srcu_read_lock(vcpu-kvm-srcu); There's another scenario that we already hold srcu before call kvm_vcpu_reload_apic_access_page(), __vcpu_run() | + vcpu-srcu_idx = srcu_read_lock(kvm-srcu); + r = vcpu_enter_guest(vcpu); | + kvm_vcpu_reload_apic_access_page(vcpu); So according to backtrace I think we should fix as follows: kvm: x86: vmx: hold kvm-srcu while reload apic access page kvm_vcpu_reload_apic_access_page() needs to access memslots via gfn_to_page(), so its necessary to hold kvm-srcu. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- arch/x86/kvm/vmx.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index b25a588..9fa1f46 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4442,6 +4442,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); struct msr_data apic_base_msr; + int idx; vmx-rmode.vm86_active = 0; @@ -4509,7 +4510,9 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu) vmcs_write32(TPR_THRESHOLD, 0); } + idx = srcu_read_lock(vcpu-kvm-srcu); kvm_vcpu_reload_apic_access_page(vcpu); + srcu_read_unlock(vcpu-kvm-srcu, idx); if (vmx_vm_has_apicv(vcpu-kvm)) memset(vmx-pi_desc, 0, sizeof(struct pi_desc)); -- 1.9.1 Thanks Tiejun page = gfn_to_page(vcpu-kvm, APIC_DEFAULT_PHYS_BASE PAGE_SHIFT); + srcu_read_unlock(vcpu-kvm-srcu, idx); kvm_x86_ops-set_apic_access_page_addr(vcpu, page_to_phys(page)); /* -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [x86, kvm] WARNING: at arch/x86/kernel/pvclock.c:182 pvclock_init_vsyscall()
On 2014/9/30 15:59, Fengguang Wu wrote: Greetings, 0day kernel testing robot got the below dmesg and the first bad commit is commit 3dc4f7cfb7441e5e0fed3a02fc81cdaabd28300a Author: Marcelo Tosatti mtosa...@redhat.com AuthorDate: Tue Nov 27 23:28:56 2012 -0200 Commit: Marcelo Tosatti mtosa...@redhat.com CommitDate: Tue Nov 27 23:29:10 2012 -0200 x86: kvm guest: pvclock vsyscall support Hook into generic pvclock vsyscall code, with the aim to allow userspace to have visibility into pvclock data. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com +--++++ | | 71056ae22d | 3dc4f7cfb7 | d778df51c0 | +--++++ | boot_successes | 141| 0 | 0 | | boot_failures| 0 | 47 | 11 | | WARNING:at_arch/x86/kernel/pvclock.c:pvclock_init_vsyscall() | 0 | 47 | 11 | | backtrace:pvclock_init_vsyscall | 0 | 47 | 11 | | backtrace:warn_slowpath_null | 0 | 47 | 11 | | backtrace:kvm_setup_vsyscall_timeinfo| 0 | 47 | 11 | | backtrace:kvm_guest_init | 0 | 47 | 11 | | Kernel_panic-not_syncing:Attempted_to_kill_init_exitcode=| 0 | 0 | 7 | | BUG:kernel_boot_hang | 0 | 0 | 4 | +--++++ [0.00] mapped IOAPIC to ff5f9000 (fec0) [0.00] nr_irqs_gsi: 40 [0.00] [ cut here ] [0.00] WARNING: at arch/x86/kernel/pvclock.c:182 pvclock_init_vsyscall+0x41/0x8e() [0.00] Hardware name: Standard PC (i440FX + PIIX, 1996) [0.00] Modules linked in: [0.00] Pid: 0, comm: swapper Not tainted 3.7.0-rc3-00112-g3dc4f7c #1 Looks you're working with old kernel, so its worth trying the latest again. Tiejun [0.00] Call Trace: [0.00] [8104f750] warn_slowpath_common+0x70/0xa0 [0.00] [8104f83a] warn_slowpath_null+0x1a/0x20 [0.00] [8219a6af] pvclock_init_vsyscall+0x41/0x8e [0.00] [8219a623] kvm_setup_vsyscall_timeinfo+0x48/0x78 [0.00] [8219a397] kvm_guest_init+0x98/0xe1 [0.00] [82190eb8] setup_arch+0xa9b/0xb10 [0.00] [818eeae8] ? printk+0x4f/0x57 [0.00] [8218e856] start_kernel+0x93/0x388 [0.00] [8218e120] ? early_idt_handlers+0x120/0x120 [0.00] [8218e2b4] x86_64_start_reservations+0xb0/0xb3 [0.00] [8218e3b9] x86_64_start_kernel+0x102/0x10f [0.00] ---[ end trace c9f7d63dd24af7e7 ]--- [0.00] KVM setup async PF for cpu 0 git bisect start v3.8 v3.7 -- git bisect bad 8d91a42e54eebc43f4d8f6064751ccba73528275 # 21:30 0- 5 Merge tag 'omap-late-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc git bisect bad 770b6cb4d21fb3e3df2a7a51e186a3c14db1ec30 # 21:35 0- 1 ARM: OMAP: Fix drivers to depend on omap for internal devices git bisect good 9977d9b379cb77e0f67bd6f4563618106e58e11d # 21:40 30+ 0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal git bisect bad e777d192ffb9f2929d547a2f8a5f65b7db7a9552 # 21:44 4- 26 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi git bisect good 8b0cab14951fbf8126795ab301835a8f8126a988 # 22:00 30+ 0 Merge tag 'regulator-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator git bisect good c7708fac5a878d6e0f2de0aa19f9749cff4f707f # 22:11 30+ 0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux git bisect bad e05a1c6397a73d09389e033b6b2c25c954d2177c # 22:17 0- 1 Merge tag 'ktest-v3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest git bisect good 896ea17d3da5f44b2625c9cda9874d7dfe447393 # 22:36 30+ 0 Merge tag 'stable/for-linus-3.8-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen git bisect bad 66cdd0ceaf65a18996f561b770eedde1d123b019 # 22:41 3- 3 Merge tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm git bisect good 8455d79e2163997e479931b8d5b7e60a92cd2b86 # 23:04 30+ 0 KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run git bisect bad
Re: [PATCH v2] KVM: x86: sync old tmr with ioapic to update
On 2014/8/27 22:05, Wei Wang wrote: kvm_ioapic_scan_entry() needs to update tmr. The previous lapic tmr value (old_tmr) needs to sync with ioapic to get an accurate updated tmr value before the updating work. Tested-by: Rongrong Liu rongrongx@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com Signed-off-by: Wei Wang wei.w.w...@intel.com --- arch/x86/kvm/lapic.c | 19 +-- arch/x86/kvm/x86.c |2 +- 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 08e8a89..8c1162d 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -518,10 +518,25 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu) void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr) { struct kvm_lapic *apic = vcpu-arch.apic; + u32 irr; + u32 isr; + u32 old_tmr, new_tmr; int i; - for (i = 0; i 8; i++) - apic_set_reg(apic, APIC_TMR + 0x10 * i, tmr[i]); + /* +* The updated tmr value comes from level-triggerd interrupts that s/level-triggerd/level-triggered +* have already been delieverd to lapic and new coming ones which s/delieverd/delivered +* are pending in ioapic. According to the x86 spec, tmr is valid +* when irr or isr is set. +*/ + for (i = 0; i 8; i++) { + irr = kvm_apic_get_reg(apic, APIC_IRR + 0x10 * i); + isr = kvm_apic_get_reg(apic, APIC_ISR + 0x10 * i); + old_tmr = kvm_apic_get_reg(apic, APIC_TMR + 0x10 * i); + new_tmr = (~(irr | isr) tmr[i]) + | ((irr | isr) old_tmr); + apic_set_reg(apic, APIC_TMR + 0x10 * i, new_tmr); + } } static void apic_update_ppr(struct kvm_lapic *apic) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5f5edb6..d401684 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5991,8 +5991,8 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu) memset(tmr, 0, 32); kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr); - kvm_x86_ops-load_eoi_exitmap(vcpu, eoi_exit_bitmap); kvm_apic_update_tmr(vcpu, tmr); + kvm_x86_ops-load_eoi_exitmap(vcpu, eoi_exit_bitmap); } /* -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: keep EOI exit bitmap accurate before loading it.
On 2014/8/27 0:27, Wei Wang wrote: Guest may mask the IOAPIC entry before issue EOI. In such case, EOI will not be intercepted by hypervisor, since the corrensponding Looks this is always missed :) s/corrensponding/corresponding Tiejun bit in eoi exit bitmap is not set after the masking of IOAPIC entry. The solution here is to OR EOI_exit_bitmap with tmr. Tested-by: Rongrong Liu rongrongx@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com Signed-off-by: Wei Wang wei.w.w...@intel.com --- arch/x86/kvm/lapic.c |9 + arch/x86/kvm/lapic.h |2 ++ arch/x86/kvm/x86.c |1 + virt/kvm/ioapic.c|6 +++--- 4 files changed, 15 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 93c2e93..759d24e 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -533,6 +533,15 @@ void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr) } } +void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, + u32 *tmr) +{ + u32 i; + + for (i = 0; i 8; i++) + *((u32 *)eoi_exit_bitmap + i) |= tmr[i]; +} + static void apic_update_ppr(struct kvm_lapic *apic) { u32 tpr, isrv, ppr, old_ppr; diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 6a11845..eda7be7 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -55,6 +55,8 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu); void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr); void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir); +void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, + u32 *tmr); int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d401684..4042bc0 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5992,6 +5992,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu) kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr); kvm_apic_update_tmr(vcpu, tmr); + kvm_apic_update_eoi_exitmap(vcpu, eoi_exit_bitmap, tmr); kvm_x86_ops-load_eoi_exitmap(vcpu, eoi_exit_bitmap); } diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index e8ce34c..ea5f697 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -254,9 +254,9 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, spin_lock(ioapic-lock); for (index = 0; index IOAPIC_NUM_PINS; index++) { e = ioapic-redirtbl[index]; - if (e-fields.trig_mode == IOAPIC_LEVEL_TRIG || - kvm_irq_has_notifier(ioapic-kvm, KVM_IRQCHIP_IOAPIC, index) || - index == RTC_GSI) { + if (!e-fields.mask e-fields.trig_mode == IOAPIC_LEVEL_TRIG + || kvm_irq_has_notifier(ioapic-kvm, KVM_IRQCHIP_IOAPIC, + index) || index == RTC_GSI) { if (kvm_apic_match_dest(vcpu, NULL, 0, e-fields.dest_id, e-fields.dest_mode)) { __set_bit(e-fields.vector, -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] KVM: x86: check ISR and TMR to construct eoi exit bitmap
On 2014/8/14 3:16, Wei Wang wrote: From: Yang Zhang yang.z.zh...@intel.com Guest may mask the IOAPIC entry before issue EOI. In such case, EOI will not be intercepted by hypervisor due to the corrensponding s/corrensponding/corresponding bit in eoi exit bitmap is not setting. The solution is to check ISR + TMR to construct the EOI exit bitmap. This patch is a better fixing for the issue that commit 0f6c0a740b tries to solve. Tested-by: Alex Williamson alex.william...@redhat.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com Signed-off-by: Wei Wang wei.w.w...@intel.com --- arch/x86/kvm/lapic.c | 17 + arch/x86/kvm/lapic.h |2 ++ arch/x86/kvm/x86.c |9 + virt/kvm/ioapic.c|7 --- 4 files changed, 32 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 08e8a89..0ed4bcb 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -515,6 +515,23 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu) __clear_bit(KVM_APIC_PV_EOI_PENDING, vcpu-arch.apic_attention); } +void kvm_apic_zap_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, + u32 *tmr) +{ + u32 i, reg_off, intr_in_service; + struct kvm_lapic *apic = vcpu-arch.apic; + + for (i = 0; i 8; i++) { + reg_off = 0x10 * i; + intr_in_service = apic_read_reg(apic, APIC_ISR + reg_off) + kvm_apic_get_reg(apic, APIC_TMR + reg_off); + if (intr_in_service) { + *((u32 *)eoi_exit_bitmap + i) |= intr_in_service; + tmr[i] |= intr_in_service; + } + } +} + void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr) { struct kvm_lapic *apic = vcpu-arch.apic; diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 6a11845..4ee3d70 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -53,6 +53,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value); u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu); void kvm_apic_set_version(struct kvm_vcpu *vcpu); +void kvm_apic_zap_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, + u32 *tmr); void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr); void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir); int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 204422d..755b556 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6005,6 +6005,15 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu) memset(tmr, 0, 32); kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr); + /* +* Guest may mask the IOAPIC entry before issue EOI. In such case, +* EOI will not be intercepted by hypervisor due to the corrensponding s/corrensponding/corresponding +* bit in eoi exit bitmap is not setting. +* +* The solution is to check ISR + TMR to construct the EOI exit bitmap. +*/ + kvm_apic_zap_eoi_exitmap(vcpu, eoi_exit_bitmap, tmr); + kvm_x86_ops-load_eoi_exitmap(vcpu, eoi_exit_bitmap); kvm_apic_update_tmr(vcpu, tmr); } diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index e8ce34c..2458a1d 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -254,9 +254,10 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, spin_lock(ioapic-lock); for (index = 0; index IOAPIC_NUM_PINS; index++) { e = ioapic-redirtbl[index]; - if (e-fields.trig_mode == IOAPIC_LEVEL_TRIG || - kvm_irq_has_notifier(ioapic-kvm, KVM_IRQCHIP_IOAPIC, index) || - index == RTC_GSI) { + if (!e-fields.mask + (e-fields.trig_mode == IOAPIC_LEVEL_TRIG || +kvm_irq_has_notifier(ioapic-kvm, KVM_IRQCHIP_IOAPIC, +index) || index == RTC_GSI)) { if (kvm_apic_match_dest(vcpu, NULL, 0, e-fields.dest_id, e-fields.dest_mode)) { __set_bit(e-fields.vector, -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: check ISR and TMR to construct eoi exit bitmap
On 2014/8/8 14:02, Yang Zhang wrote: From: Yang Zhang yang.z.zh...@intel.com Guest may mask the IOAPIC entry before issue EOI. In such case, EOI will not be intercepted by hypervisor due to the corrensponding bit in eoi exit bitmap is not setting. The solution is to check ISR + TMR to construct the EOI exit bitmap. This patch is a better fixing for the issue that commit 0f6c0a740b tries to solve. Signed-off-by: Yang Zhang yang.z.zh...@intel.com --- arch/x86/kvm/lapic.c | 22 ++ arch/x86/kvm/lapic.h |2 ++ arch/x86/kvm/x86.c |9 + virt/kvm/ioapic.c|7 --- 4 files changed, 37 insertions(+), 3 deletions(-) hi, alex This patch is not tested since i don't have the environment to do it. Can you help to test it? thanks. diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 08e8a89..d2f9a6e 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -71,6 +71,11 @@ #define VEC_POS(v) ((v) (32 - 1)) #define REG_POS(v) (((v) 5) 4) +static inline u32 apic_read_reg(struct kvm_lapic *apic, int reg_off) +{ + return *((u32 *) (apic-regs + reg_off)); +} + I think we already define the same in the arch/x86/kvm/lapic.h file, static inline u32 kvm_apic_get_reg(struct kvm_lapic *apic, int reg_off) { return *((u32 *) (apic-regs + reg_off)); } Thanks Tiejun static inline void apic_set_reg(struct kvm_lapic *apic, int reg_off, u32 val) { *((u32 *) (apic-regs + reg_off)) = val; @@ -515,6 +520,23 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu) __clear_bit(KVM_APIC_PV_EOI_PENDING, vcpu-arch.apic_attention); } +void kvm_apic_zap_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, + u32 *tmr) +{ + u32 i, reg_off, intr_in_service[8]; + struct kvm_lapic *apic = vcpu-arch.apic; + + for (i = 0; i 8; i++) { + reg_off = 0x10 * i; + intr_in_service[i] = apic_read_reg(apic, APIC_ISR + reg_off) + apic_read_reg(apic, APIC_TMR + reg_off); + if (intr_in_service[i]) { + *((u32 *)eoi_exit_bitmap + i) |= intr_in_service[i]; + tmr[i] |= intr_in_service[i]; + } + } +} + void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr) { struct kvm_lapic *apic = vcpu-arch.apic; diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 6a11845..4ee3d70 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -53,6 +53,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value); u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu); void kvm_apic_set_version(struct kvm_vcpu *vcpu); +void kvm_apic_zap_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, + u32 *tmr); void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr); void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir); int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 204422d..755b556 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6005,6 +6005,15 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu) memset(tmr, 0, 32); kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr); + /* +* Guest may mask the IOAPIC entry before issue EOI. In such case, +* EOI will not be intercepted by hypervisor due to the corrensponding +* bit in eoi exit bitmap is not setting. +* +* The solution is to check ISR + TMR to construct the EOI exit bitmap. +*/ + kvm_apic_zap_eoi_exitmap(vcpu, eoi_exit_bitmap, tmr); + kvm_x86_ops-load_eoi_exitmap(vcpu, eoi_exit_bitmap); kvm_apic_update_tmr(vcpu, tmr); } diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index e8ce34c..2458a1d 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -254,9 +254,10 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, spin_lock(ioapic-lock); for (index = 0; index IOAPIC_NUM_PINS; index++) { e = ioapic-redirtbl[index]; - if (e-fields.trig_mode == IOAPIC_LEVEL_TRIG || - kvm_irq_has_notifier(ioapic-kvm, KVM_IRQCHIP_IOAPIC, index) || - index == RTC_GSI) { + if (!e-fields.mask + (e-fields.trig_mode == IOAPIC_LEVEL_TRIG || +kvm_irq_has_notifier(ioapic-kvm, KVM_IRQCHIP_IOAPIC, +index) || index == RTC_GSI)) { if (kvm_apic_match_dest(vcpu, NULL, 0, e-fields.dest_id, e-fields.dest_mode)) { __set_bit(e-fields.vector, -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
-Original Message- From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org] Sent: Thursday, May 09, 2013 8:38 PM To: Chen, Tiejun Cc: Bhushan Bharat-R65777; Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc-...@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts On Thu, 2013-05-09 at 17:44 +0800, tiejun.chen wrote: Actually in the case GS=1 even if EE=0, EXT/DEC/DBELL still occur as I recall. Only if directed to the hypervisor. Yes, this is our current booehv design with EPCR[EXTGS] = 0. Case 1) - Local_irq_disable() will set soft_enabled = 0 - Now Externel interrupt happens, there we set PACA_IRQ_EE in irq_happened, Also clears EE in SRR1 and rfi. So interrupts are hard disabled. No more other interrupt gated by MSR.EE can happen. Looks like the idea here is to not let a device keep on inserting interrupt till the interrupt condition on device is cleared, right? The external interrupt line is level sensitive normally, so we have to mask MSR:EE in that case or the interrupt would keep re-occurring (note that FSL has this concept of auto-acked interrupts via the on die MPIC for which you can potentially use PACA_IRQ_EE_EDGE instead and avoid having to mask MSR:EE). I don't understand the interrupt condition on device is cleared here. Well, the external interrupt line will remain asserted until the device (via the PIC) stops interrupting the processor :-) Yes, the device keeps on interrupting the interrupt until the device clear its interrupted condition. Either because we mask at the PIC level (or raise the priority) or because the condition goes away. I think regardless if you clear the device interrupt status, the system still receive a pending interrupt once EE or GS = 1. Why ? Depends on your PIC actually but if it's a sane one, it won't remember a level interrupt that is gone. An edge interrupt is a different matter and is latched. But the interrupt controller like MPIC should leave this irq as pending if we don't ACK/EOI this irq at all if we just clear the device, right? Some MPIC implementations tend to generate a spurrious IRQ in the case of level IRQs going away. IE. they still remember an event occurred and interrupt the processor, but on IACK return the spurious vector. However that isn't guaranteed to Yes, this is just what I mean. No matter if this vector is still valid, the external interrupt exception always occur once EE = 1 again. be the case and it is perfectly ok (and a good idea) for the PIC to just stop asserting the external interrupt line if the original device interrupt goes away (and it's configured as level sensitive). I don't remember MPIC can work with this way. So I'd like to look the manual again :) Tiejun-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html