Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 11/20/14 00:32, Mario Smarduch wrote: Hi Laszlo, couple observations. I'm wondering if access from qemu and guest won't result in mixed memory attributes and if that's acceptable to the CPU. Normally this would be a problem I think (Jon raised the topic of live migration). However, for flash programming specifically, I think the guest's access pattern ensures that we'll see things OK. When the guest issues the first write access, the memslot is deleted, and everything is forwarded to qemu, both reads and writes. In response qemu modifies the array that *otherwise* backs the flash. These modifications by qemu end up in the dcache mostly. When the guest is done programming, it writes a special command (read array mode) at which point the memslot is recreated (as read-only) and flushed / set up for flushing during demand paging. So from the emulated flash POV, the memslot either doesn't exist at all (and then qemu serves all accesses just fine), or it exists r/o, at which point qemu (host userspace) will have stopped writing to it, and will have set it up for flushing before and during guest read accesses. Also is if you update memory from qemu you may break dirty page logging/migration. Very probably. Jon said the same thing. Unless there is some other way you keep track. Of course it may not be applicable in your case (i.e. flash unused after boot). The flash *is* used after boot, because the UEFI runtime variable services *are* exercised by the guest kernel. However those use the same access pattern (it's the same set of UEFI services just called by a different client). *Uncoordinated* access from guest and host in parallel will be a big problem; but we're not that far yet, and we need to get the flash problem sorted, so that we can at least boot and work on the basic stuff. The flash programming dance happens to provide coordination; the flash mode changes (which are equivalent to the teardown and the recreation of the memslot) can be considered barriers. I hope this is acceptable for the time being... Thanks Laszlo - Mario On 11/17/2014 07:49 AM, Laszlo Ersek wrote: On 11/17/14 16:29, Paolo Bonzini wrote: On 17/11/2014 15:58, Ard Biesheuvel wrote: Readonly memslots are often used to implement emulation of ROMs and NOR flashes, in which case the guest may legally map these regions as uncached. To deal with the incoherency associated with uncached guest mappings, treat all readonly memslots as incoherent, and ensure that pages that belong to regions tagged as such are flushed to DRAM before being passed to the guest. On x86, the processor combines the cacheability values from the two levels of page tables. Is there no way to do the same on ARM? Combining occurs on ARMv8 too. The Stage1 (guest) mapping is very strict (Device non-Gathering, non-Reordering, no Early Write Acknowledgement -- for EFI_MEMORY_UC), which basically overrides the Stage2 (very lax host) memory attributes. When qemu writes, as part of emulating the flash programming commands, to the RAMBlock that *otherwise* backs the flash range (as a r/o memslot), those writes (from host userspace) tend to end up in dcache. But, when the guest flips back the flash to romd mode, and tries to read back the values from the flash as plain ROM, the dcache is completely bypassed due to the strict stage1 mapping, and the guest goes directly to DRAM. Where qemu's earlier writes are not yet / necessarily visible. Please see my original patch (which was incomplete) in the attachment, it has a very verbose commit message. Anyway, I'll let others explain; they can word it better than I can :) FWIW, Series Reviewed-by: Laszlo Ersek ler...@redhat.com I ported this series to a 3.17.0+ based kernel, and tested it. It works fine. The ROM-like view of the NOR flash now reflects the previously programmed contents. Series Tested-by: Laszlo Ersek ler...@redhat.com Thanks! Laszlo ___ kvmarm mailing list kvm...@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH v1 0/2] Define some VFIO interfaces for VT-d Posted-Interrupts
VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt. With VT-d Posted-Interrupts enabled, external interrupts from direct-assigned devices can be delivered to guests without VMM intervention when guest is running in non-root mode. You can find the VT-d Posted-Interrtups Spec. in the following URL: http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html This patch set does the following things: - Define a new VFIO group KVM_DEV_VFIO_INTERRUPT and it's attributes KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ. Qemu can use this interface to configure VT-d PI when guest updates the interrupt configuration (MSI/MSI-X configuration). - Define a new VFIO API: vfio_msi_get_irq(), which can be used by KVM to get the host irq of the assigned devices. Then KVM can update the associated IRTE for VT-d PI. Feng Wu (2): vfio: Add new interrupt group for VFIO vfio: Add VFIO API vfio_msi_get_irq Documentation/virtual/kvm/devices/vfio.txt |8 drivers/vfio/pci/vfio_pci.c| 10 ++ include/linux/vfio.h |2 ++ include/uapi/linux/kvm.h | 14 ++ 4 files changed, 34 insertions(+), 0 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH v1 2/2] vfio: Add VFIO API vfio_msi_get_irq
This API returns the host irq for the MSI/MSI-X interrrupts. Signed-off-by: Feng Wu feng...@intel.com --- drivers/vfio/pci/vfio_pci.c | 10 ++ include/linux/vfio.h|2 ++ 2 files changed, 12 insertions(+), 0 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 9558da3..4fb9828 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -1009,6 +1009,16 @@ put_devs: kfree(devs.devices); } +unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, bool msix) +{ + struct vfio_pci_device *vdev = + (struct vfio_pci_device *)vfio_device_data(device); + struct pci_dev *pdev = vdev-pdev; + + return msix ? vdev-msix[vector].vector : pdev-irq + vector; +} +EXPORT_SYMBOL_GPL(vfio_msi_get_irq); + static void __exit vfio_pci_cleanup(void) { pci_unregister_driver(vfio_pci_driver); diff --git a/include/linux/vfio.h b/include/linux/vfio.h index d320411..007ca55 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -92,6 +92,8 @@ extern void vfio_unregister_iommu_driver( /* * External user API */ +extern unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, + bool msix); extern struct vfio_group *vfio_group_get_external_user(struct file *filep); extern void vfio_group_put_external_user(struct vfio_group *group); extern int vfio_external_user_iommu_id(struct vfio_group *group); -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH v1 1/2] vfio: Add new interrupt group for VFIO
Add new group KVM_DEV_VFIO_INTERRUPT and command KVM_DEV_VFIO_DEVIE_POSTING_IRQ related to it. This is used for VT-d Posted-Interrupts setup. Signed-off-by: Feng Wu feng...@intel.com --- Documentation/virtual/kvm/devices/vfio.txt |8 include/uapi/linux/kvm.h | 14 ++ 2 files changed, 22 insertions(+), 0 deletions(-) diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt index ef51740..bd99176 100644 --- a/Documentation/virtual/kvm/devices/vfio.txt +++ b/Documentation/virtual/kvm/devices/vfio.txt @@ -13,6 +13,7 @@ VFIO-group is held by KVM. Groups: KVM_DEV_VFIO_GROUP + KVM_DEV_VFIO_INTERRUPT KVM_DEV_VFIO_GROUP attributes: KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking @@ -20,3 +21,10 @@ KVM_DEV_VFIO_GROUP attributes: For each, kvm_device_attr.addr points to an int32_t file descriptor for the VFIO group. + +KVM_DEV_VFIO_INTERRUPT attributes: + KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ: Set up the interrupt configuration for +VT-d Posted-Interrrupts + +For each, kvm_device_attr.addr points to struct kvm_posted_intr, which +include the needed information for VT-d Posted-Interrupts setup. diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 6076882..5544fcc 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -943,9 +943,23 @@ struct kvm_device_attr { __u64 addr; /* userspace address of attr data */ }; +struct virq_info { + __u32 index; /* index of the msi/msix entry */ + int virq; /* virq of the interrupt */ +}; + +struct kvm_posted_intr { + __u32 fd; /* file descriptor of the VFIO device */ + __u32 count; + boolmsix; + struct virq_info virq_info[0]; +}; + #define KVM_DEV_VFIO_GROUP1 #define KVM_DEV_VFIO_GROUP_ADD 1 #define KVM_DEV_VFIO_GROUP_DEL 2 +#define KVM_DEV_VFIO_INTERRUPT2 +#define KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ 1 enum kvm_device_type { KVM_DEV_TYPE_FSL_MPIC_20= 1, -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
Suresh E. Warrier warr...@linux.vnet.ibm.com writes: This patch adds trace points in the guest entry and exit code and also for exceptions handled by the host in kernel mode - hypercalls and page faults. The new events are added to /sys/kernel/debug/tracing/events under a new subsystem called kvm_hv. /* Set this explicitly in case thread 0 doesn't have a vcpu */ @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) vc-vcore_state = VCORE_RUNNING; preempt_disable(); + + trace_kvmppc_run_core(vc, 0); + spin_unlock(vc-lock); Do we really want to call tracepoint with spin lock held ? Is that a good thing to do ?. -aneesh -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/1] vfio: put off the allocation of minor in vfio_create_group
The next code fragment list_for_each_entry is not depend on minor. With this patch, the free of minor in list_for_each_entry can be reduced, and there is no functional change. Signed-off-by: Zhen Lei thunder.leiz...@huawei.com --- drivers/vfio/vfio.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index f018d8d..737eb468 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -225,22 +225,21 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group) mutex_lock(vfio.group_lock); - minor = vfio_alloc_group_minor(group); - if (minor 0) { - vfio_group_unlock_and_free(group); - return ERR_PTR(minor); - } - /* Did we race creating this group? */ list_for_each_entry(tmp, vfio.group_list, vfio_next) { if (tmp-iommu_group == iommu_group) { vfio_group_get(tmp); - vfio_free_group_minor(minor); vfio_group_unlock_and_free(group); return tmp; } } + minor = vfio_alloc_group_minor(group); + if (minor 0) { + vfio_group_unlock_and_free(group); + return ERR_PTR(minor); + } + dev = device_create(vfio.class, NULL, MKDEV(MAJOR(vfio.group_devt), minor), group, %d, iommu_group_id(iommu_group)); -- 1.8.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds
Am 10.11.2014 um 22:07 schrieb Linus Torvalds: [...] So before blacklisting any compilers, let's first see if (a) we can actually make it a real rule that we only use ACCESS_ONCE on scalars (b) we can somehow enforce this with a compiler warning/error for mis-uses For example, the attached patch works for some cases, but shows how we use ACCESS_ONCE() on pointers to pte_t's etc, so it doesn't come even close to compiling the whole kernel. But I wonder how painful that would be to change.. The places where it complains are actually somewhat debatable to begin with, like: - handle_pte_fault(.. pte_t *pte ..): entry = ACCESS_ONCE(*pte); and the thing is, pte is actually possibly an 8-byte entity on x86-32, and that ACCESS_ONCE() fundamentally will be two 32-byte reads. So there is a very valid argument for saying well, you shouldn't do that, then, and that we might be better off cleaning up our ACCESS_ONCE() uses, than to just blindly blacklist compilers. NOTE! I'm not at all advocating the attached patch. I'm sending it out white-space damaged on purpose, it's more of a hey, something like this might be the direction we want to go in, with the spinlock.h part of the patch also acting as an example of the kind of changes the ACCESS_ONCE() only works on scalars rule would require. So I tried to see if I can come up with some results on how often this problem happens... [...] diff --git a/include/linux/compiler.h b/include/linux/compiler.h index d5ad7b1118fc..63e82f1dfc1a 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -378,7 +378,11 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect); * use is to mediate communication between process-level code and irq/NMI * handlers, all running on the same CPU. */ -#define ACCESS_ONCE(x) (*(volatile typeof(x) *)(x)) +#define get_scalar_volatile_pointer(x) ({ \ + typeof(x) *__p = (x); \ + volatile typeof(x) *__vp = __p; \ + (void)(long)*__p; __vp; }) +#define ACCESS_ONCE(x) (*get_scalar_volatile_pointer(x)) ..and just took this patch. On s390 is pretty much clean with allyesconfig In fact with the siif lock changed only the pte/pmd cases you mentioned trigger a compile error: mm/memory.c: In function 'handle_pte_fault': mm/memory.c:3203:2: error: aggregate value used where an integer was expected entry = ACCESS_ONCE(*pte); mm/rmap.c: In function 'mm_find_pmd': mm/rmap.c:584:2: error: aggregate value used where an integer was expected pmde = ACCESS_ONCE(*pmd); Here a barrier() might be a good solution as well, I guess. On x86 allyesconfig its almost the same. - we need your spinlock changes (well, something different to make it compile) - we need to fix pmd and pte - we have gup_get_pte in arch/x86/mm/gup.c getting a ptep So It looks like we could make a change to ACCESS_ONCE. Would something like CONFIG_ARCH_SCALAR_ACCESS_ONCE be a good start? This would boil down to Patch1: Provide stricter ACCESS_ONCE if CONFIG_ARCH_SCALAR_ACCESS_ONCE is set + docu update + comments Patch2: Change mm/* to barriers Patch3: Change x86 locks Patch4: Change x86 gup Patch4: Enable CONFIG_ARCH_SCALAR_ACCESS_ONCE for s390x and x86 Makes sense? Christian -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
On 19.11.14 22:54, Suresh E. Warrier wrote: On 11/14/2014 04:56 AM, Alexander Graf wrote: Am 14.11.2014 um 00:29 schrieb Suresh E. Warrier warr...@linux.vnet.ibm.com: This patch adds trace points in the guest entry and exit code and also for exceptions handled by the host in kernel mode - hypercalls and page faults. The new events are added to /sys/kernel/debug/tracing/events under a new subsystem called kvm_hv. Acked-by: Paul Mackerras pau...@samba.org Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 12 +- arch/powerpc/kvm/book3s_hv.c| 19 ++ arch/powerpc/kvm/trace_hv.h | 497 3 files changed, 525 insertions(+), 3 deletions(-) create mode 100644 arch/powerpc/kvm/trace_hv.h diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 70feb7b..20cbad1 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -38,6 +38,7 @@ #include asm/cputable.h #include book3s_hv_cma.h +#include trace_hv.h /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */ #define MAX_LPID_97063 @@ -627,6 +628,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, gfn = gpa PAGE_SHIFT; memslot = gfn_to_memslot(kvm, gfn); +trace_kvm_page_fault_enter(vcpu, hpte, memslot, ea, dsisr); + /* No memslot means it's an emulated MMIO region */ if (!memslot || (memslot-flags KVM_MEMSLOT_INVALID)) return kvmppc_hv_emulate_mmio(run, vcpu, gpa, ea, @@ -639,6 +642,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, mmu_seq = kvm-mmu_notifier_seq; smp_rmb(); +ret = -EFAULT; is_io = 0; pfn = 0; page = NULL; @@ -662,7 +666,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, } up_read(current-mm-mmap_sem); if (!pfn) -return -EFAULT; +goto out_put; } else { page = pages[0]; if (PageHuge(page)) { @@ -690,14 +694,14 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, pfn = page_to_pfn(page); } -ret = -EFAULT; if (psize pte_size) goto out_put; /* Check WIMG vs. the actual page we're accessing */ if (!hpte_cache_flags_ok(r, is_io)) { if (is_io) -return -EFAULT; +goto out_put; + /* * Allow guest to map emulated device memory as * uncacheable, but actually make it cacheable. @@ -753,6 +757,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, SetPageDirty(page); out_put: +trace_kvm_page_fault_exit(vcpu, hpte, ret); + if (page) { /* * We drop pages[0] here, not page because page might diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 69d4085..5143d17 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -57,6 +57,9 @@ #include book3s.h +#define CREATE_TRACE_POINTS +#include trace_hv.h + /* #define EXIT_DEBUG */ /* #define EXIT_DEBUG_SIMPLE */ /* #define EXIT_DEBUG_INT */ @@ -1679,6 +1682,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) list_for_each_entry(vcpu, vc-runnable_threads, arch.run_list) { kvmppc_start_thread(vcpu); kvmppc_create_dtl_entry(vcpu, vc); +trace_kvm_guest_enter(vcpu); } /* Set this explicitly in case thread 0 doesn't have a vcpu */ @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) vc-vcore_state = VCORE_RUNNING; preempt_disable(); + +trace_kvmppc_run_core(vc, 0); + spin_unlock(vc-lock); kvm_guest_enter(); @@ -1732,6 +1739,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) kvmppc_core_pending_dec(vcpu)) kvmppc_core_dequeue_dec(vcpu); +trace_kvm_guest_exit(vcpu); + ret = RESUME_GUEST; if (vcpu-arch.trap) ret = kvmppc_handle_exit_hv(vcpu-arch.kvm_run, vcpu, @@ -1757,6 +1766,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) wake_up(vcpu-arch.cpu_run); } } + +trace_kvmppc_run_core(vc, 1); } /* @@ -1783,11 +1794,13 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) prepare_to_wait(vc-wq, wait, TASK_INTERRUPTIBLE); vc-vcore_state = VCORE_SLEEPING; +trace_kvmppc_vcore_blocked(vc, 0); spin_unlock(vc-lock); schedule(); finish_wait(vc-wq, wait); spin_lock(vc-lock); vc-vcore_state = VCORE_INACTIVE; +trace_kvmppc_vcore_blocked(vc, 1); } static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) @@ -1796,6 +1809,8 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) struct kvmppc_vcore *vc; struct kvm_vcpu *v, *vn; +
Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
On 20.11.14 11:40, Aneesh Kumar K.V wrote: Suresh E. Warrier warr...@linux.vnet.ibm.com writes: This patch adds trace points in the guest entry and exit code and also for exceptions handled by the host in kernel mode - hypercalls and page faults. The new events are added to /sys/kernel/debug/tracing/events under a new subsystem called kvm_hv. /* Set this explicitly in case thread 0 doesn't have a vcpu */ @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) vc-vcore_state = VCORE_RUNNING; preempt_disable(); + +trace_kvmppc_run_core(vc, 0); + spin_unlock(vc-lock); Do we really want to call tracepoint with spin lock held ? Is that a good thing to do ?. I thought it was safe to call tracepoints inside of spin lock regions? Steve? Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: ia64: remove
2014-11-19 22:05+0100, Paolo Bonzini: KVM for ia64 has been marked as broken not just once, but twice even, and the last patch from the maintainer is now roughly 5 years old. Time for it to rest in piece. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- Nice, if only every diffstat was like that! I propose another removal. (The reasoning below wasn't confirmed with ia64 compiler. I'd remove the ioctls even if they worked.) ---8--- KVM: remove buggy ia64 specific ioctls IA64 is no longer present so new applications shouldn't use them. The main problem is that they most likely didn't work even before, because we have misused ioctl #define KVM_SET_GUEST_DEBUG _IOW(KVMIO, 0x9b, struct kvm_guest_debug) #define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *) as struct kvm_guest_debug { __u32 control; __u32 pad; struct kvm_guest_debug_arch arch; }; and struct kvm_guest_debug_arch { }; mean that sizeof(struct kvm_guest_debug) == sizeof(void *) == 8 thus KVM_SET_GUEST_DEBUG == KVM_IA64_VCPU_SET_STACK and KVM_SET_GUEST_DEBUG is handled before KVM_IA64_VCPU_SET_STACK. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- include/uapi/linux/kvm.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 6d59e5b..a37fd12 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1099,9 +1099,6 @@ struct kvm_s390_ucas_mapping { #define KVM_X86_SETUP_MCE _IOW(KVMIO, 0x9c, __u64) #define KVM_X86_GET_MCE_CAP_SUPPORTED _IOR(KVMIO, 0x9d, __u64) #define KVM_X86_SET_MCE _IOW(KVMIO, 0x9e, struct kvm_x86_mce) -/* IA64 stack access */ -#define KVM_IA64_VCPU_GET_STACK _IOR(KVMIO, 0x9a, void *) -#define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *) /* Available with KVM_CAP_VCPU_EVENTS */ #define KVM_GET_VCPU_EVENTS _IOR(KVMIO, 0x9f, struct kvm_vcpu_events) #define KVM_SET_VCPU_EVENTS _IOW(KVMIO, 0xa0, struct kvm_vcpu_events) -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/
ia64 does not need them anymore. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- arch/x86/include/asm/kvm_host.h | 16 arch/x86/kvm/Makefile | 5 ++--- {virt = arch/x86}/kvm/ioapic.c | 0 {virt = arch/x86}/kvm/ioapic.h | 1 - {virt = arch/x86}/kvm/irq_comm.c | 4 ++-- arch/x86/kvm/x86.c| 1 + include/linux/kvm_host.h | 22 -- virt/kvm/eventfd.c| 7 --- virt/kvm/kvm_main.c | 3 --- 9 files changed, 29 insertions(+), 30 deletions(-) rename {virt = arch/x86}/kvm/ioapic.c (100%) rename {virt = arch/x86}/kvm/ioapic.h (98%) rename {virt = arch/x86}/kvm/irq_comm.c (98%) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 769db36a3001..76ff3e2d8fd2 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -603,6 +603,9 @@ struct kvm_arch { struct kvm_xen_hvm_config xen_hvm_config; + /* reads protected by irq_srcu, writes by irq_lock */ + struct hlist_head mask_notifier_list; + /* fields used by HYPER-V emulation */ u64 hv_guest_os_id; u64 hv_hypercall; @@ -819,6 +822,19 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa, const void *val, int bytes); u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn); +struct kvm_irq_mask_notifier { + void (*func)(struct kvm_irq_mask_notifier *kimn, bool masked); + int irq; + struct hlist_node link; +}; + +void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq, + struct kvm_irq_mask_notifier *kimn); +void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq, + struct kvm_irq_mask_notifier *kimn); +void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin, +bool mask); + extern bool tdp_enabled; u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 25d22b2d6509..ee1cd92b03be 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -7,14 +7,13 @@ CFLAGS_vmx.o := -I. KVM := ../../../virt/kvm -kvm-y += $(KVM)/kvm_main.o $(KVM)/ioapic.o \ - $(KVM)/coalesced_mmio.o $(KVM)/irq_comm.o \ +kvm-y += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \ $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)+= $(KVM)/assigned-dev.o $(KVM)/iommu.o kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \ - i8254.o cpuid.o pmu.o + i8254.o ioapic.o irq_comm.o cpuid.o pmu.o kvm-intel-y+= vmx.o kvm-amd-y += svm.o diff --git a/virt/kvm/ioapic.c b/arch/x86/kvm/ioapic.c similarity index 100% rename from virt/kvm/ioapic.c rename to arch/x86/kvm/ioapic.c diff --git a/virt/kvm/ioapic.h b/arch/x86/kvm/ioapic.h similarity index 98% rename from virt/kvm/ioapic.h rename to arch/x86/kvm/ioapic.h index dc3baa3a538f..deac8d509f2a 100644 --- a/virt/kvm/ioapic.h +++ b/arch/x86/kvm/ioapic.h @@ -96,7 +96,6 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, struct kvm_lapic_irq *irq, unsigned long *dest_map); int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state); int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state); -void kvm_vcpu_request_scan_ioapic(struct kvm *kvm); void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap, u32 *tmr); diff --git a/virt/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c similarity index 98% rename from virt/kvm/irq_comm.c rename to arch/x86/kvm/irq_comm.c index 1345bde064f5..e9c135b639aa 100644 --- a/virt/kvm/irq_comm.c +++ b/arch/x86/kvm/irq_comm.c @@ -234,7 +234,7 @@ void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq, { mutex_lock(kvm-irq_lock); kimn-irq = irq; - hlist_add_head_rcu(kimn-link, kvm-mask_notifier_list); + hlist_add_head_rcu(kimn-link, kvm-arch.mask_notifier_list); mutex_unlock(kvm-irq_lock); } @@ -256,7 +256,7 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin, idx = srcu_read_lock(kvm-irq_srcu); gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin); if (gsi != -1) - hlist_for_each_entry_rcu(kimn, kvm-mask_notifier_list, link) + hlist_for_each_entry_rcu(kimn, kvm-arch.mask_notifier_list, link) if (kimn-irq == gsi) kimn-func(kimn, mask); srcu_read_unlock(kvm-irq_srcu, idx); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a8f53a6960fd..5337039427c8 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@
Re: [PATCH] KVM: ia64: remove
On 20/11/2014 14:38, Radim Krčmář wrote: I propose another removal. (The reasoning below wasn't confirmed with ia64 compiler. I'd remove the ioctls even if they worked.) Checked and applied, thanks. Paolo ---8--- KVM: remove buggy ia64 specific ioctls IA64 is no longer present so new applications shouldn't use them. The main problem is that they most likely didn't work even before, because we have misused ioctl #define KVM_SET_GUEST_DEBUG _IOW(KVMIO, 0x9b, struct kvm_guest_debug) #define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *) as struct kvm_guest_debug { __u32 control; __u32 pad; struct kvm_guest_debug_arch arch; }; and struct kvm_guest_debug_arch { }; mean that sizeof(struct kvm_guest_debug) == sizeof(void *) == 8 thus KVM_SET_GUEST_DEBUG == KVM_IA64_VCPU_SET_STACK and KVM_SET_GUEST_DEBUG is handled before KVM_IA64_VCPU_SET_STACK. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- include/uapi/linux/kvm.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 6d59e5b..a37fd12 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1099,9 +1099,6 @@ struct kvm_s390_ucas_mapping { #define KVM_X86_SETUP_MCE _IOW(KVMIO, 0x9c, __u64) #define KVM_X86_GET_MCE_CAP_SUPPORTED _IOR(KVMIO, 0x9d, __u64) #define KVM_X86_SET_MCE _IOW(KVMIO, 0x9e, struct kvm_x86_mce) -/* IA64 stack access */ -#define KVM_IA64_VCPU_GET_STACK _IOR(KVMIO, 0x9a, void *) -#define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *) /* Available with KVM_CAP_VCPU_EVENTS */ #define KVM_GET_VCPU_EVENTS _IOR(KVMIO, 0x9f, struct kvm_vcpu_events) #define KVM_SET_VCPU_EVENTS _IOW(KVMIO, 0xa0, struct kvm_vcpu_events) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
On Thu, 20 Nov 2014 13:10:12 +0100 Alexander Graf ag...@suse.de wrote: On 20.11.14 11:40, Aneesh Kumar K.V wrote: Suresh E. Warrier warr...@linux.vnet.ibm.com writes: This patch adds trace points in the guest entry and exit code and also for exceptions handled by the host in kernel mode - hypercalls and page faults. The new events are added to /sys/kernel/debug/tracing/events under a new subsystem called kvm_hv. /* Set this explicitly in case thread 0 doesn't have a vcpu */ @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) vc-vcore_state = VCORE_RUNNING; preempt_disable(); + + trace_kvmppc_run_core(vc, 0); + spin_unlock(vc-lock); Do we really want to call tracepoint with spin lock held ? Is that a good thing to do ?. I thought it was safe to call tracepoints inside of spin lock regions? Steve? There's tracepoints in the guts of the scheduler where rq lock is held. Don't worry about it. The tracing system is lockless. -- Steve -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v9 01/19] vfio/platform: initial skeleton of VFIO support for platform devices
On Wed, Nov 12, 2014 at 5:49 PM, Alex Williamson alex.william...@redhat.com wrote: On Wed, 2014-11-12 at 11:05 +0100, Eric Auger wrote: Hi Antonios, On 10/27/2014 07:07 PM, Antonios Motakis wrote: This patch forms the common skeleton code for platform devices support with VFIO. This will include the core functionality of VFIO_PLATFORM, however binding to the device and discovering the device resources will be done with the help of a separate file where any Linux platform bus specific code will reside. This will allow us to implement support for also discovering AMBA devices and their resources, but still reuse a large part of the VFIO_PLATFORM implementation. Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com --- drivers/vfio/platform/vfio_platform_common.c | 126 ++ drivers/vfio/platform/vfio_platform_private.h | 36 2 files changed, 162 insertions(+) create mode 100644 drivers/vfio/platform/vfio_platform_common.c create mode 100644 drivers/vfio/platform/vfio_platform_private.h diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c new file mode 100644 index 000..e0fdbc8 --- /dev/null +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -0,0 +1,126 @@ +/* + * Copyright (C) 2013 - Virtual Open Systems + * Author: Antonios Motakis a.mota...@virtualopensystems.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include linux/device.h +#include linux/interrupt.h +#include linux/iommu.h +#include linux/module.h +#include linux/mutex.h +#include linux/notifier.h +#include linux/pm_runtime.h +#include linux/slab.h +#include linux/types.h +#include linux/uaccess.h +#include linux/vfio.h +#include linux/io.h not sure at that state all the above includes are needed. + +#include vfio_platform_private.h + +static void vfio_platform_release(void *device_data) +{ + module_put(THIS_MODULE); +} + +static int vfio_platform_open(void *device_data) +{ + if (!try_module_get(THIS_MODULE)) + return -ENODEV; + + return 0; +} + +static long vfio_platform_ioctl(void *device_data, + unsigned int cmd, unsigned long arg) a minor style comment/question that applies on all the series. Shouldn't subsequent argument lines rather be aligned with the first char after (, as done in PCI code? It's also my preferred style to indent to just after the open paren on wrapped lines where possible, but I don't think there are hard rules in CodingStyle or checkpatch that enforce this, so I often let it slide. Thanks, You're right that there are no hard coding style rules for this, but I also like this style so I'll apply it more consistently. Alex +{ + if (cmd == VFIO_DEVICE_GET_INFO) + return -EINVAL; + + else if (cmd == VFIO_DEVICE_GET_REGION_INFO) + return -EINVAL; + + else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) + return -EINVAL; + + else if (cmd == VFIO_DEVICE_SET_IRQS) + return -EINVAL; + + else if (cmd == VFIO_DEVICE_RESET) + return -EINVAL; + + return -ENOTTY; +} + +static ssize_t vfio_platform_read(void *device_data, char __user *buf, + size_t count, loff_t *ppos) +{ + return -EINVAL; +} + +static ssize_t vfio_platform_write(void *device_data, const char __user *buf, + size_t count, loff_t *ppos) +{ + return -EINVAL; +} + +static int vfio_platform_mmap(void *device_data, struct vm_area_struct *vma) +{ + return -EINVAL; +} + +static const struct vfio_device_ops vfio_platform_ops = { + .name = vfio-platform, + .open = vfio_platform_open, + .release= vfio_platform_release, + .ioctl = vfio_platform_ioctl, + .read = vfio_platform_read, + .write = vfio_platform_write, + .mmap = vfio_platform_mmap, +}; + +int vfio_platform_probe_common(struct vfio_platform_device *vdev, + struct device *dev) +{ + struct iommu_group *group; + int ret; + + if (!vdev) + return -EINVAL; + + group = iommu_group_get(dev); + if (!group) { + pr_err(VFIO: No
Re: [PATCH v9 06/19] vfio/platform: return info for bound device
On Wed, Nov 12, 2014 at 5:36 PM, Alex Williamson alex.william...@redhat.com wrote: On Wed, 2014-11-12 at 11:32 +0100, Eric Auger wrote: On 10/27/2014 07:07 PM, Antonios Motakis wrote: A VFIO userspace driver will start by opening the VFIO device that corresponds to an IOMMU group, and will use the ioctl interface to get the basic device info, such as number of memory regions and interrupts, and their properties. This patch enables the VFIO_DEVICE_GET_INFO ioctl call. Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com --- drivers/vfio/platform/vfio_platform_common.c | 23 --- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index e0fdbc8..cb20526 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -43,10 +43,27 @@ static int vfio_platform_open(void *device_data) static long vfio_platform_ioctl(void *device_data, unsigned int cmd, unsigned long arg) { - if (cmd == VFIO_DEVICE_GET_INFO) - return -EINVAL; + struct vfio_platform_device *vdev = device_data; + unsigned long minsz; + + if (cmd == VFIO_DEVICE_GET_INFO) { + struct vfio_device_info info; + + minsz = offsetofend(struct vfio_device_info, num_irqs); + + if (copy_from_user(info, (void __user *)arg, minsz)) + return -EFAULT; + + if (info.argsz minsz) + return -EINVAL; + + info.flags = vdev-flags; + info.num_regions = 0; + info.num_irqs = 0; Seems a bit weird to me to enable the modality but returning zeroed values. Shouldn't we put that patch after VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO ones? I actually like how Antonios has started from a base framework, exposing a device but none of the resources and then incrementally adds each component. It's also a good showcase of the VFIO ABI that we can do things like this. Thanks, I also agree with Alex with this. But of course I'm not married with any particular splitting style, in case we decide to change this. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig
On Thu, Nov 13, 2014 at 9:05 AM, Hongbo Zhang hongbo.zh...@linaro.org wrote: On 12 November 2014 17:57, Antonios Motakis a.mota...@virtualopensystems.com wrote: Hello Hongbo, On Wed, Nov 12, 2014 at 10:52 AM, Hongbo Zhang hongbo.zh...@linaro.org wrote: On 28 October 2014 02:07, Antonios Motakis a.mota...@virtualopensystems.com wrote: Enable building the VFIO PLATFORM driver that allows to use Linux platform devices with VFIO. Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com --- drivers/vfio/Kconfig | 1 + drivers/vfio/Makefile | 1 + drivers/vfio/platform/Kconfig | 9 + drivers/vfio/platform/Makefile | 4 4 files changed, 15 insertions(+) create mode 100644 drivers/vfio/platform/Kconfig create mode 100644 drivers/vfio/platform/Makefile diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index a0abe04..962fb80 100644 --- a/drivers/vfio/Kconfig +++ b/drivers/vfio/Kconfig @@ -27,3 +27,4 @@ menuconfig VFIO If you don't know what to do here, say N. source drivers/vfio/pci/Kconfig +source drivers/vfio/platform/Kconfig diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index 0b035b1..dadf0ca 100644 --- a/drivers/vfio/Makefile +++ b/drivers/vfio/Makefile @@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o obj-$(CONFIG_VFIO_PCI) += pci/ +obj-$(CONFIG_VFIO_PLATFORM) += platform/ diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig new file mode 100644 index 000..c51af17 --- /dev/null +++ b/drivers/vfio/platform/Kconfig @@ -0,0 +1,9 @@ +config VFIO_PLATFORM + tristate VFIO support for platform devices + depends on VFIO EVENTFD ARM Hi Antonios, Is this only for ARM? how about X86 and PowerPC? On Freescale's PowerPC platform, the IOMMU is called PAMU (Peripheral Access Management Unit), and I am trying to use this VFIO framework on it. In principle it should be working on any platform with such devices; as long as you have a VFIO IOMMU driver for the PAMU (on ARM we use VFIO PLATFORM for the device, with VFIO IOMMU TYPE1 for the IOMMU). Antonios, As far as you know, on which ARM platform can I apply your patches directly? My purpose is to apply you patches[1], and then implement a user space driver to evaluate the performance. In principle, if your target has a working IOMMU in front of a platform or AMBA device, then you should be able to use this. In practice, I have tested this on various fast models, and less extensively on Arndale in the past. Linaro as far as I know has tested this series on Calxeda devices with an xgmac NIC. [1] It is better without manually merging conflicts/dependencies etc, I am vfio-platform user, not a iommu expert. So if you have a suitable IOMMU driver for your target, feel free to test it, and let us know of the results. + help + Support for platform devices with VFIO. This is required to make + use of platform devices present on the system using the VFIO + framework. + + If you don't know what to do here, say N. diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile new file mode 100644 index 000..279862b --- /dev/null +++ b/drivers/vfio/platform/Makefile @@ -0,0 +1,4 @@ + +vfio-platform-y := vfio_platform.o vfio_platform_common.o + +obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o -- 2.1.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v9 06/19] vfio/platform: return info for bound device
On 11/20/2014 03:10 PM, Antonios Motakis wrote: On Wed, Nov 12, 2014 at 5:36 PM, Alex Williamson alex.william...@redhat.com wrote: On Wed, 2014-11-12 at 11:32 +0100, Eric Auger wrote: On 10/27/2014 07:07 PM, Antonios Motakis wrote: A VFIO userspace driver will start by opening the VFIO device that corresponds to an IOMMU group, and will use the ioctl interface to get the basic device info, such as number of memory regions and interrupts, and their properties. This patch enables the VFIO_DEVICE_GET_INFO ioctl call. Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com --- drivers/vfio/platform/vfio_platform_common.c | 23 --- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index e0fdbc8..cb20526 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -43,10 +43,27 @@ static int vfio_platform_open(void *device_data) static long vfio_platform_ioctl(void *device_data, unsigned int cmd, unsigned long arg) { - if (cmd == VFIO_DEVICE_GET_INFO) - return -EINVAL; + struct vfio_platform_device *vdev = device_data; + unsigned long minsz; + + if (cmd == VFIO_DEVICE_GET_INFO) { + struct vfio_device_info info; + + minsz = offsetofend(struct vfio_device_info, num_irqs); + + if (copy_from_user(info, (void __user *)arg, minsz)) + return -EFAULT; + + if (info.argsz minsz) + return -EINVAL; + + info.flags = vdev-flags; + info.num_regions = 0; + info.num_irqs = 0; Seems a bit weird to me to enable the modality but returning zeroed values. Shouldn't we put that patch after VFIO_DEVICE_GET_REGION_INFO and VFIO_DEVICE_GET_IRQ_INFO ones? I actually like how Antonios has started from a base framework, exposing a device but none of the resources and then incrementally adds each component. It's also a good showcase of the VFIO ABI that we can do things like this. Thanks, I also agree with Alex with this. But of course I'm not married with any particular splitting style, in case we decide to change this. Hi Antonios, please keep as is. I also learn each day about splitting style ;-) Best Regards Eric Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support
On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall christoffer.d...@linaro.org wrote: On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote: Hi All, I have second thoughts about rebasing KVM PMU patches to Marc's irq-forwarding patches. The PMU IRQs (when virtualized by KVM) are not exactly forwarded IRQs because they are shared between Host and Guest. Scenario1 - We might have perf running on Host and no KVM guest running. In this scenario, we wont get interrupts on Host because the kvm_pmu_hyp_init() (similar to the function kvm_timer_hyp_init() of Marc's IRQ-forwarding implementation) has put all host PMU IRQs in forwarding mode. The only way solve this problem is to not set forwarding mode for PMU IRQs in kvm_pmu_hyp_init() and instead have special routines to turn on and turn off the forwarding mode of PMU IRQs. These routines will be called from kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ forwarding state. Scenario2 - We might have perf running on Host and Guest simultaneously which means it is quite likely that PMU HW trigger IRQ meant for Host between ret = kvm_call_hyp(__kvm_vcpu_run, vcpu); and kvm_pmu_sync_hwstate(vcpu); (similar to timer sync routine of Marc's patchset which is called before local_irq_enable()). In this scenario, the updated kvm_pmu_sync_hwstate(vcpu) will accidentally forward IRQ meant for Host to Guest unless we put additional checks to inspect VCPU PMU state. Am I missing any detail about IRQ forwarding for above scenarios? Hi Anup, Hi Christoffer, I briefly discussed this with Marc. What I don't understand is how it would be possible to get an interrupt for the host while running the guest? The rationale behind my question is that whenever you're running the guest, the PMU should be programmed exclusively with guest state, and since the PMU is per core, any interrupts should be for the guest, where it would always be pending. Yes, thats right PMU is programmed exclusively for guest when guest is running and for host when host is running. Let us assume a situation (Scenario2 mentioned previously) where both host and guest are using PMU. When the guest is running we come back to host mode due to variety of reasons (stage2 fault, guest IO, regular host interrupt, host interrupt meant for guest, ) which means we will return from the ret = kvm_call_hyp(__kvm_vcpu_run, vcpu); statement in the kvm_arch_vcpu_ioctl_run() function with local IRQs disabled. At this point we would have restored back host PMU context and any PMU counter used by host can trigger PMU overflow interrup for host. Now we will be having kvm_pmu_sync_hwstate(vcpu); in the kvm_arch_vcpu_ioctl_run() function (similar to the kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset) which will try to detect PMU irq forwarding state in GIC hence it can accidentally discover PMU irq pending for guest while this PMU irq is actually meant for host. This above mentioned situation does not happen for timer because virtual timer interrupts are exclusively used for guest. The exclusive use of virtual timer interrupt for guest ensures that the function kvm_timer_sync_hwstate() will always see correct state of virtual timer IRQ from GIC. When migrating a VM with a pending PMU interrupt away for a CPU core, we also capture the active state (the forwarding patches already handle this), and obviously the PMU state along with it. Yes, the migration of PMU state and PMU interrupt state is quite clear. Does this address your concern? I hope above description give you idea about the concern raised by me. -Christoffer Regards, Anup -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] vfio: put off the allocation of minor in vfio_create_group
On Thu, 2014-11-20 at 19:25 +0800, Zhen Lei wrote: The next code fragment list_for_each_entry is not depend on minor. With this patch, the free of minor in list_for_each_entry can be reduced, and there is no functional change. A reasonable micro-optimization, but I'm curious if you're actually seeing some measurable overhead from this. It seems like we'd need to have multiple devices, all within the same IOMMU group, all probed by vfio-pci at the same time to exercise the race condition. Thanks, Alex Signed-off-by: Zhen Lei thunder.leiz...@huawei.com --- drivers/vfio/vfio.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index f018d8d..737eb468 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -225,22 +225,21 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group) mutex_lock(vfio.group_lock); - minor = vfio_alloc_group_minor(group); - if (minor 0) { - vfio_group_unlock_and_free(group); - return ERR_PTR(minor); - } - /* Did we race creating this group? */ list_for_each_entry(tmp, vfio.group_list, vfio_next) { if (tmp-iommu_group == iommu_group) { vfio_group_get(tmp); - vfio_free_group_minor(minor); vfio_group_unlock_and_free(group); return tmp; } } + minor = vfio_alloc_group_minor(group); + if (minor 0) { + vfio_group_unlock_and_free(group); + return ERR_PTR(minor); + } + dev = device_create(vfio.class, NULL, MKDEV(MAJOR(vfio.group_devt), minor), group, %d, iommu_group_id(iommu_group)); -- 1.8.0 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v1 1/2] vfio: Add new interrupt group for VFIO
On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote: Add new group KVM_DEV_VFIO_INTERRUPT and command KVM_DEV_VFIO_DEVIE_POSTING_IRQ related to it. This is used for VT-d Posted-Interrupts setup. Eric proposed an interface for ARM forwarded interrupts[1] using group KVM_DEV_VFIO_DEVICE with attributes KVM_DEV_VFIO_DEVICE_ASSIGN_IRQ and KVM_DEV_VFIO_DEVICE_DEASSIGN_IRQ. Why are we proposing yet another group and attributes here? Why can't we re-use the ones Eric proposes? [1] https://lkml.org/lkml/2014/8/25/258 Signed-off-by: Feng Wu feng...@intel.com --- Documentation/virtual/kvm/devices/vfio.txt |8 include/uapi/linux/kvm.h | 14 ++ 2 files changed, 22 insertions(+), 0 deletions(-) diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt index ef51740..bd99176 100644 --- a/Documentation/virtual/kvm/devices/vfio.txt +++ b/Documentation/virtual/kvm/devices/vfio.txt @@ -13,6 +13,7 @@ VFIO-group is held by KVM. Groups: KVM_DEV_VFIO_GROUP + KVM_DEV_VFIO_INTERRUPT KVM_DEV_VFIO_GROUP attributes: KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking @@ -20,3 +21,10 @@ KVM_DEV_VFIO_GROUP attributes: For each, kvm_device_attr.addr points to an int32_t file descriptor for the VFIO group. + +KVM_DEV_VFIO_INTERRUPT attributes: + KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ: Set up the interrupt configuration for +VT-d Posted-Interrrupts + +For each, kvm_device_attr.addr points to struct kvm_posted_intr, which +include the needed information for VT-d Posted-Interrupts setup. diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 6076882..5544fcc 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -943,9 +943,23 @@ struct kvm_device_attr { __u64 addr; /* userspace address of attr data */ }; +struct virq_info { + __u32 index; /* index of the msi/msix entry */ + int virq; /* virq of the interrupt */ +}; + +struct kvm_posted_intr { + __u32 fd; /* file descriptor of the VFIO device */ + __u32 count; + boolmsix; Note that MSI-X (as opposed to MSI) is a PCI concept. Being a VFIO interface this should operate at VFIO IRQ index and sub-index. + struct virq_info virq_info[0]; +}; + #define KVM_DEV_VFIO_GROUP 1 #define KVM_DEV_VFIO_GROUP_ADD 1 #define KVM_DEV_VFIO_GROUP_DEL 2 +#define KVM_DEV_VFIO_INTERRUPT 2 +#define KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ 1 enum kvm_device_type { KVM_DEV_TYPE_FSL_MPIC_20= 1, -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v1 1/2] vfio: Add new interrupt group for VFIO
On 11/20/2014 04:53 PM, Alex Williamson wrote: On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote: Add new group KVM_DEV_VFIO_INTERRUPT and command KVM_DEV_VFIO_DEVIE_POSTING_IRQ related to it. This is used for VT-d Posted-Interrupts setup. Eric proposed an interface for ARM forwarded interrupts[1] using group KVM_DEV_VFIO_DEVICE with attributes KVM_DEV_VFIO_DEVICE_ASSIGN_IRQ and KVM_DEV_VFIO_DEVICE_DEASSIGN_IRQ. Why are we proposing yet another group and attributes here? Why can't we re-use the ones Eric proposes? Hi Alex, Feng I share your point of view about the KVM_DEV_VFIO_DEVICE group. For the attribute (renamed KVM_DEV_VFIO_DEVICE_FORWARD_IRQ in RFC v2) the issue is I specify the kvm_device_attr.addr points to a kvm_arch_forwarded_irq struct. Feng needs another struct - kvm_posted_intr -. An alternative is to merge both structs if it makes sense. Best Regards Eric [1] https://lkml.org/lkml/2014/8/25/258 Signed-off-by: Feng Wu feng...@intel.com --- Documentation/virtual/kvm/devices/vfio.txt |8 include/uapi/linux/kvm.h | 14 ++ 2 files changed, 22 insertions(+), 0 deletions(-) diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt index ef51740..bd99176 100644 --- a/Documentation/virtual/kvm/devices/vfio.txt +++ b/Documentation/virtual/kvm/devices/vfio.txt @@ -13,6 +13,7 @@ VFIO-group is held by KVM. Groups: KVM_DEV_VFIO_GROUP + KVM_DEV_VFIO_INTERRUPT KVM_DEV_VFIO_GROUP attributes: KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking @@ -20,3 +21,10 @@ KVM_DEV_VFIO_GROUP attributes: For each, kvm_device_attr.addr points to an int32_t file descriptor for the VFIO group. + +KVM_DEV_VFIO_INTERRUPT attributes: + KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ: Set up the interrupt configuration for +VT-d Posted-Interrrupts + +For each, kvm_device_attr.addr points to struct kvm_posted_intr, which +include the needed information for VT-d Posted-Interrupts setup. diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 6076882..5544fcc 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -943,9 +943,23 @@ struct kvm_device_attr { __u64 addr; /* userspace address of attr data */ }; +struct virq_info { +__u32 index; /* index of the msi/msix entry */ +int virq; /* virq of the interrupt */ +}; + +struct kvm_posted_intr { +__u32 fd; /* file descriptor of the VFIO device */ +__u32 count; +boolmsix; Note that MSI-X (as opposed to MSI) is a PCI concept. Being a VFIO interface this should operate at VFIO IRQ index and sub-index. +struct virq_info virq_info[0]; +}; + #define KVM_DEV_VFIO_GROUP 1 #define KVM_DEV_VFIO_GROUP_ADD1 #define KVM_DEV_VFIO_GROUP_DEL2 +#define KVM_DEV_VFIO_INTERRUPT 2 +#define KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ1 enum kvm_device_type { KVM_DEV_TYPE_FSL_MPIC_20= 1, -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v1 2/2] vfio: Add VFIO API vfio_msi_get_irq
On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote: This API returns the host irq for the MSI/MSI-X interrrupts. Signed-off-by: Feng Wu feng...@intel.com --- drivers/vfio/pci/vfio_pci.c | 10 ++ include/linux/vfio.h|2 ++ 2 files changed, 12 insertions(+), 0 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 9558da3..4fb9828 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -1009,6 +1009,16 @@ put_devs: kfree(devs.devices); } +unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, bool msix) +{ + struct vfio_pci_device *vdev = + (struct vfio_pci_device *)vfio_device_data(device); + struct pci_dev *pdev = vdev-pdev; + + return msix ? vdev-msix[vector].vector : pdev-irq + vector; +} +EXPORT_SYMBOL_GPL(vfio_msi_get_irq); + It's difficult to see what you're going to do with this, but it's making the assumption that the vfio_device is a vfio_pci_device, which is the wrong starting point. Eric proposed[1] some VFIO external user interface extensions, that have since been refined in discussion that allows the external user (KVM in this case) access to the struct device where it can get the IRQ vectors itself. We either need to follow that same path for posted interrupts, teaching KVM about struct pci_device in order to extract the IRQ info, or we need to create another vfio external user interface extension that can return IRQ info regardless of the device type. Thanks, Alex [1] https://lkml.org/lkml/2014/8/25/257 static void __exit vfio_pci_cleanup(void) { pci_unregister_driver(vfio_pci_driver); diff --git a/include/linux/vfio.h b/include/linux/vfio.h index d320411..007ca55 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -92,6 +92,8 @@ extern void vfio_unregister_iommu_driver( /* * External user API */ +extern unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, + bool msix); extern struct vfio_group *vfio_group_get_external_user(struct file *filep); extern void vfio_group_put_external_user(struct vfio_group *group); extern int vfio_external_user_iommu_id(struct vfio_group *group); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v1 2/2] vfio: Add VFIO API vfio_msi_get_irq
On 11/20/2014 05:05 PM, Alex Williamson wrote: On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote: This API returns the host irq for the MSI/MSI-X interrrupts. Signed-off-by: Feng Wu feng...@intel.com --- drivers/vfio/pci/vfio_pci.c | 10 ++ include/linux/vfio.h|2 ++ 2 files changed, 12 insertions(+), 0 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 9558da3..4fb9828 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -1009,6 +1009,16 @@ put_devs: kfree(devs.devices); } +unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, bool msix) +{ +struct vfio_pci_device *vdev = +(struct vfio_pci_device *)vfio_device_data(device); +struct pci_dev *pdev = vdev-pdev; + +return msix ? vdev-msix[vector].vector : pdev-irq + vector; +} +EXPORT_SYMBOL_GPL(vfio_msi_get_irq); + It's difficult to see what you're going to do with this, but it's making the assumption that the vfio_device is a vfio_pci_device, which is the wrong starting point. Eric proposed[1] some VFIO external user interface extensions, that have since been refined in discussion that allows the external user (KVM in this case) access to the struct device where it can get the IRQ vectors itself. We either need to follow that same path for posted interrupts, teaching KVM about struct pci_device in order to extract the IRQ info, or we need to create another vfio external user interface extension that can return IRQ info regardless of the device type. Thanks, Alex [1] https://lkml.org/lkml/2014/8/25/257 Here is the v2: https://lkml.org/lkml/2014/9/1/347 Best Regards Eric http://lwn.net/Articles/610087/ static void __exit vfio_pci_cleanup(void) { pci_unregister_driver(vfio_pci_driver); diff --git a/include/linux/vfio.h b/include/linux/vfio.h index d320411..007ca55 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -92,6 +92,8 @@ extern void vfio_unregister_iommu_driver( /* * External user API */ +extern unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, + bool msix); extern struct vfio_group *vfio_group_get_external_user(struct file *filep); extern void vfio_group_put_external_user(struct vfio_group *group); extern int vfio_external_user_iommu_id(struct vfio_group *group); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM causes #GP on XRSTORS
Fenghua, I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears to get a #GP when it is trying to load the guest FPU. One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu, but I am not sure it is the only problem. Was KVM ever tested with XRSTORS? Thanks, Nadav -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can I make this work… (Foundation for accessibility project)
On 11/18/2014 9:57 AM, Eric S. Johansson wrote: That's great to know. I will spin up a version of Windows 7 and give it a try given that I'm not looking at it, I can strip it down to the barest user interface elements and improve performance significantly. I tried it and it took me approximately 10 to 12 hours to install Windows 7 twice and I didn't even finish installing the last time. Here's what happened. The first time I installed it, it was a naïve install. Took all the defaults just set up the ISO and let the install run. Then I installed all the updates. Hours went by and it kind of came up and ran but then I tried to install the virt I/O drivers and the Windows installation lost its mind. Did some reading on how to make performance better and on using the virtio drivers in windows. So I start of the second install, same size disk 25 GB, same amount of RAM, 1 GB and installed the ethernet, disk and balloon drivers at the right time. I also changed the cache to none, I/O something to native and I think that's about it. Anyway, that was not really any improvement. It's still was incredibly slow and this time it was complaining about running out of memory and packages install never finished. Just kept going and going going. iptraf reported network io ranging from 3kbit to 100kbit range when the updates were running. I'm accustomed to lesser performance on virtual machines. That's the hazard of a running on old and slow laptop (dell e6400 (2.2ghz core duo, 8gb ram)[1]) but even virtual box is not this slow. So what am I doing wrong? It would be nice to use a slow machine like this as many handcrips don't have a whole lot of resources for buying newer/faster machines. On the other hand, many of them use desktops and work from one place whereas someone like me is all over the map (quite literally). --- eric [1] Part of the reason I don't bother upgrading machines all that often is because it no matter how fast the CPU runs or how much memory I have, Windows always runs about the same speed. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: Add missing HPTE unlock
On 05.11.14 02:21, Paul Mackerras wrote: From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com In kvm_test_clear_dirty(), if we find an invalid HPTE we move on to the next HPTE without unlocking the invalid one. In fact we should never find an invalid and unlocked HPTE in the rmap chain, but for robustness we should unlock it. This adds the missing unlock. Reported-by: Benjamin Herrenschmidt b...@kernel.crashing.org Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Signed-off-by: Paul Mackerras pau...@samba.org Thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: ptes are big endian
On 03.11.14 16:35, Cédric Le Goater wrote: When being restored from qemu, the kvm_get_htab_header are in native endian, but the ptes are big endian. This patch fixes restore on a KVM LE host. Qemu also needs a fix for this : http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg8.html Signed-off-by: Cédric Le Goater c...@fr.ibm.com Cc: Paul Mackerras pau...@samba.org Cc: Alexey Kardashevskiy a...@ozlabs.ru Cc: Gregory Kurz gk...@linux.vnet.ibm.com --- Tested on 3.17-rc7 with LE and BE host. arch/powerpc/kvm/book3s_64_mmu_hv.c |2 ++ 1 file changed, 2 insertions(+) Index: linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c === --- linux-3.18-hv.git.orig/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -1542,6 +1542,8 @@ static ssize_t kvm_htab_write(struct fil err = -EFAULT; if (__get_user(v, lbuf) || __get_user(r, lbuf + 1)) goto out; + v = be64_to_cpu(v); + r = be64_to_cpu(r); This will trigger warnings with sparse. Please introduce new be64 variables that you do get_user on and that you then use as source for v and r. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/17] RFC: userfault v2
Hi, On Fri, Oct 31, 2014 at 12:39:32PM -0700, Peter Feiner wrote: On Fri, Oct 31, 2014 at 11:29:49AM +0800, zhanghailiang wrote: Agreed, but for doing live memory snapshot (VM is running when do snapsphot), we have to do this (block the write action), because we have to save the page before it is dirtied by writing action. This is the difference, compared to pre-copy migration. Ah ha, I understand the difference now. I suppose that you have considered doing a traditional pre-copy migration (that is, passes over memory saving dirty pages, followed by a pause and a final dump of remaining dirty pages) to a file. Your approach has the advantage of having the VM pause time bounded by the time it takes to handle the userfault and do the write, as opposed to pre-copy migration which has a pause time bounded by the time it takes to do the final dump of dirty pages, which, in the worst case, is the time it takes to dump all of the guest memory! It sounds really similar issue as live migration, one can implement a precopy live snapshot, or a precopy+postcopy live snapshot or a pure postcopy live snapshot. The decision on the amount of precopy done before engaging postcopy (zero passes, 1 pass, or more passes) would have similar tradeoffs too, except instead of having to re-transmit the re-dirtied pages over the wire, it would need to overwrite them to disk. The more precopy passes, the longer it takes for the live snapshotting process to finish and the more I/O there will be (for live migration it'd be network bandwidth usage instead of amount of I/O), but the shortest the postcopy runtime will be (and the shorter postcopy runtime is, the fewer userfaults will end up triggering on writes, in turn reducing the slowdown and the artificial fault latency introduced to the guest runtime). But the more precopy passes the more overwriting will happen during the longer precopy stage and the more overall load there will be for the host (the otherwise idle part of the host). For the postcopy live snapshot the wrprotect faults are quite equivalent to the not-present faults of postcopy live migration logic. You could use the old fork dump trick. Given that the guest's memory is backed by private VMA (as of a year ago when I last looked, is always the case for QEMU), you can have the kernel do the write protection for you. Essentially, you fork Qemu and, in the child process, dump the guest memory then exit. If the parent (including the guest) writes to guest memory, then it will fault and the kernel will copy the page. The fork dump approach will give you the best performance w.r.t. guest pause times (i.e., just pausing for the COW fault handler), but it does have the distinct disadvantage of potentially using 2x the guest memory (i.e., if the parent process races ahead and writes to all of the pages before you finish the dump). To mitigate memory copying, you could madvise MADV_DONTNEED the child memory as you copy it. This is a very good point. fork must be evaluated first because it literally already provides you a readonly memory snapshot of the guest memory. The memory cons mentioned above could lead to both -ENOMEM of too many guests runs live snapshots at the same time in the same host, unless overcommit_memory is set to 1 (0 by default). Even then if too many live snapshots are running in parallel you could hit the OOM killer if there are just a bit too many faults at the same time, or you could hit heavy swapping which isn't ideal either. In fact the -ENOMEM avoidance (with qemu failing) is one of the two critical reasons why qemu always set the guest memory as MADV_DONTFORK. But that's not the only reason. To use the fork() trick you'd need to undo the MADV_DONTFORK first but that would open another problem: there's a race condition between fork() O_DIRECT and 4k hardblocksize of virtio-blk. If there's any read() syscall with O_DIRECT with len=512 while fork() is running (think if the aio runs in parallel with the live snapshot thread that forks the child to dump the snapshot) and if the guest writes with the CPU to any 512 fragment of the same page that is the destination buffer of the write(len=512) (on two different 512bytes area of the same guest page) the O_DIRECT write will get lost. So to use fork we'd need to fix this longstanding race (I tried but in the end we declared it an userland issue because it's not exploitable to bypass permissions or corrupt kernel or unrelated memory). Or you'd need to add locking between the dataplane/aio threads and the live snapshot thread to ensure no direct-io I/O is ever in-flight while fork runs. The O_DIRECT however would only help if it's qemu TCG, if it's KVM it's not even enough to stop O_DIRECT reads. KVM would use gup(write=1) from the async-pf all the time... and then the shadow pagetables would go out of sync (it won't destabilize the host of course, but the guest memory would be corrupt then and guest would
Re: [PATCH 2/5] KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
On 03.11.14 05:51, Paul Mackerras wrote: From: Mahesh Salgaonkar mah...@linux.vnet.ibm.com When we get an HMI (hypervisor maintenance interrupt) while in a guest, we see that guest enters into paused state. The reason is, in kvmppc_handle_exit_hv it falls through default path and returns to host instead of resuming guest. This causes guest to enter into paused state. HMI is a hypervisor only interrupt and it is safe to resume the guest since the host has handled it already. This patch adds a switch case to resume the guest. Without this patch we see guest entering into paused state with following console messages: [ 3003.329351] Severe Hypervisor Maintenance interrupt [Recovered] [ 3003.329356] Error detail: Timer facility experienced an error [ 3003.329359]HMER: 0840 [ 3003.329360]TFMR: 4a12000980a84000 [ 3003.329366] vcpu c007c35094c0 (40): [ 3003.329368] pc = c00c2ba0 msr = 80009032 trap = e60 [ 3003.329370] r 0 = c021ddc0 r16 = 0046 [ 3003.329372] r 1 = c0007a02bbd0 r17 = 327d5d98 [ 3003.329375] r 2 = c10980b8 r18 = 1fc9a0b0 [ 3003.329377] r 3 = c142d6b8 r19 = c142d6b8 [ 3003.329379] r 4 = 0002 r20 = [ 3003.329381] r 5 = c524a110 r21 = [ 3003.329383] r 6 = 0001 r22 = [ 3003.329386] r 7 = r23 = c524a110 [ 3003.329388] r 8 = r24 = 0001 [ 3003.329391] r 9 = 0001 r25 = c0007c31da38 [ 3003.329393] r10 = c14280b8 r26 = 0002 [ 3003.329395] r11 = 746f6f6c2f68656c r27 = c524a110 [ 3003.329397] r12 = 28004484 r28 = c0007c31da38 [ 3003.329399] r13 = cfe01400 r29 = 0002 [ 3003.329401] r14 = 0046 r30 = c3011e00 [ 3003.329403] r15 = ffba r31 = 0002 [ 3003.329404] ctr = c041a670 lr = c0272520 [ 3003.329405] srr0 = c007e8d8 srr1 = 90001002 [ 3003.329406] sprg0 = sprg1 = cfe01400 [ 3003.329407] sprg2 = cfe01400 sprg3 = 0005 [ 3003.329408] cr = 48004482 xer = 2000 dsisr = 4200 [ 3003.329409] dar = 010015020048 [ 3003.329410] fault dar = 010015020048 dsisr = 4200 [ 3003.329411] SLB (8 entries): [ 3003.329412] ESID = c800 VSID = 40016e7779000510 [ 3003.329413] ESID = d801 VSID = 400142add1000510 [ 3003.329414] ESID = f804 VSID = 4000eb1a81000510 [ 3003.329415] ESID = 1f00080b VSID = 40004fda0a000d90 [ 3003.329416] ESID = 3f00080c VSID = 400039f536000d90 [ 3003.329417] ESID = 180d VSID = 0001251b35150d90 [ 3003.329417] ESID = 0100080e VSID = 4001e4609d90 [ 3003.329418] ESID = d8000819 VSID = 40013d349c000400 [ 3003.329419] lpcr = c04881847001 sdr1 = 001b1906 last_inst = [ 3003.329421] trap=0xe60 | pc=0xc00c2ba0 | msr=0x80009032 [ 3003.329524] Severe Hypervisor Maintenance interrupt [Recovered] [ 3003.329526] Error detail: Timer facility experienced an error [ 3003.329527]HMER: 0840 [ 3003.329527]TFMR: 4a12000980a94000 [ 3006.359786] Severe Hypervisor Maintenance interrupt [Recovered] [ 3006.359792] Error detail: Timer facility experienced an error [ 3006.359795]HMER: 0840 [ 3006.359797]TFMR: 4a12000980a84000 IdName State 2 guest2 running 3 guest3 paused 4 guest4 running Signed-off-by: Mahesh Salgaonkar mah...@linux.vnet.ibm.com Signed-off-by: Paul Mackerras pau...@samba.org Do we need this for PR running on bare metal as well? Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Exposing host debug capabilities to userspace
Hi, I've almost finished the ARMv8 guest debug support but I have one problem left to solve. userspace needs to know how many hardware debug registers are available for GDB to use. This information is available from the ID_AA64DFR0_EL1 register. Currently I abuse GET_ONE_REG to fetch it's value however semantically this is poor as it's API is for getting guest state not host state and they could theoretically have different values. So far the options I've examined are: * KVM ioctl GET_ONE_REG(ID_AA64DFR0_EL1) As explained above, abusing a guest state API for host configuration. * ptrace(PTRACE_GETREGSET, NT_ARM_HW_WATCH) This is used by GDB to access the host details in debug-monitors. However the ptrace API really wants you to attach to a process before calling PTRACE_GETREGSET. Currently I've tried attaching to the thread_id of the vCPU but this fails with EPERM, I suspect because attaching to your own threads likely upsets the kernel. * KVM ioctl KVM_GET_DEBUGREGS This is currently x86 only and looks like it's more aimed at debug registers than capability stuff. Also I'm not sure what the state of this ioctl is compared to KVM_SET_GUEST_DEBUG. Do these APIs overlap or is one an older deprecated x86 only API? * Export the information via sysfs I suppose the correct canonical non-subsystem specific way to make this information available it to expose the data in some sort of sysfs node? However I don't see any existing sysfs structure for the CPU. * Expand /proc/cpuinfo I suspect adding extra text to be badly parsed by userspace is just horrid and unacceptable behaviour ;-) * Add another KVM ioctl? This would have the downside of being specific to KVM and of course proliferating the API space again. I'm open to any suggestions and look forward to your valued feedback ;-) -- Alex Bennée -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
On 03.11.14 05:52, Paul Mackerras wrote: From: Suresh E. Warrier warr...@linux.vnet.ibm.com The kvmppc_vcore_blocked() code does not check for the wait condition after putting the process on the wait queue. This means that it is possible for an external interrupt to become pending, but the vcpu to remain asleep until the next decrementer interrupt. The fix is to make one last check for pending exceptions and ceded state before calling schedule(). Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com Signed-off-by: Paul Mackerras pau...@samba.org I don't understand the race you're fixing here. Can you please explain it? Alex --- arch/powerpc/kvm/book3s_hv.c | 20 1 file changed, 20 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index cd7e030..1a7a281 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1828,9 +1828,29 @@ static void kvmppc_wait_for_exec(struct kvm_vcpu *vcpu, int wait_state) */ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) { + struct kvm_vcpu *vcpu; + int do_sleep = 1; + DEFINE_WAIT(wait); prepare_to_wait(vc-wq, wait, TASK_INTERRUPTIBLE); + + /* + * Check one last time for pending exceptions and ceded state after + * we put ourselves on the wait queue + */ + list_for_each_entry(vcpu, vc-runnable_threads, arch.run_list) { + if (vcpu-arch.pending_exceptions || !vcpu-arch.ceded) { + do_sleep = 0; + break; + } + } + + if (!do_sleep) { + finish_wait(vc-wq, wait); + return; + } + vc-vcore_state = VCORE_SLEEPING; spin_unlock(vc-lock); schedule(); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] Some fixes for HV KVM on PPC
On 03.11.14 05:51, Paul Mackerras wrote: Here are fixes for five bugs which were found in the testing of our PowerKVM product. The bugs range from guest performance issues to guest crashes and memory corruption. Please apply. Thanks, applied patches 1-4 to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/17] RFC: userfault v2
Hi, On Thu, Nov 20, 2014 at 10:54:29AM +0800, zhanghailiang wrote: Yes, you are right. This is what i really want, bypass all non-present faults and only track strict wrprotect faults. ;) So, do you plan to support that in the userfault API? Yes I think it's good idea to support wrprotect/COW faults too. I just wanted to understand if there was any other reason why you needed only wrprotect faults, because the non-present faults didn't look like a big performance concern if they triggered in addition to wrprotect faults, but it's certainly ok to optimize them away so it's fully optimal. All it takes to differentiate the behavior should be one more bit during registration so you can select non-present, wrprotect faults or both. postcopy live migration would select only non-present faults, postcopy live snapshot would select only wrprotect faults, anything like distributed shared memory supporting shared readonly access and exclusive write access, would select both flags. I just sent an (unfortunately) longish but way more detailed email about live snapshotting with userfaultfd but I just wanted to give a shorter answer here too :). Thanks, Andrea -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 11/20/2014 12:08 AM, Laszlo Ersek wrote: On 11/20/14 00:32, Mario Smarduch wrote: Hi Laszlo, couple observations. I'm wondering if access from qemu and guest won't result in mixed memory attributes and if that's acceptable to the CPU. Normally this would be a problem I think (Jon raised the topic of live migration). However, for flash programming specifically, I think the guest's access pattern ensures that we'll see things OK. When the guest issues the first write access, the memslot is deleted, and everything is forwarded to qemu, both reads and writes. In response qemu modifies the array that *otherwise* backs the flash. These modifications by qemu end up in the dcache mostly. When the guest is done programming, it writes a special command (read array mode) at which point the memslot is recreated (as read-only) and flushed / set up for flushing during demand paging. So from the emulated flash POV, the memslot either doesn't exist at all (and then qemu serves all accesses just fine), or it exists r/o, at which point qemu (host userspace) will have stopped writing to it, and will have set it up for flushing before and during guest read accesses. I think beyond consistency, there should be no double mappings with conflicting attributes at any time or CPU state is undefined. At least that's what I recall for cases where identity mapping was cacheble and user mmapp'ed regions uncacheable. Side effects like CPU hardstop or victim invalidate of dirty cache line. With virtualization extensions maybe behavior is different. I guess if you're not seeing lock ups or crashes then it appears to work :) Probably more senior folks in ARM community are in better position to address this, but I thought I raise a flag. Also is if you update memory from qemu you may break dirty page logging/migration. Very probably. Jon said the same thing. Unless there is some other way you keep track. Of course it may not be applicable in your case (i.e. flash unused after boot). The flash *is* used after boot, because the UEFI runtime variable services *are* exercised by the guest kernel. However those use the same access pattern (it's the same set of UEFI services just called by a different client). *Uncoordinated* access from guest and host in parallel will be a big problem; but we're not that far yet, and we need to get the flash problem sorted, so that we can at least boot and work on the basic stuff. The flash programming dance happens to provide coordination; the flash mode changes (which are equivalent to the teardown and the recreation of the memslot) can be considered barriers. I hope this is acceptable for the time being... Yeah I understand you have a more imediatte requirement to support, migration isssue is more fyi. Thanks for the details helps to understand the context. - Mario Thanks Laszlo - Mario On 11/17/2014 07:49 AM, Laszlo Ersek wrote: On 11/17/14 16:29, Paolo Bonzini wrote: On 17/11/2014 15:58, Ard Biesheuvel wrote: Readonly memslots are often used to implement emulation of ROMs and NOR flashes, in which case the guest may legally map these regions as uncached. To deal with the incoherency associated with uncached guest mappings, treat all readonly memslots as incoherent, and ensure that pages that belong to regions tagged as such are flushed to DRAM before being passed to the guest. On x86, the processor combines the cacheability values from the two levels of page tables. Is there no way to do the same on ARM? Combining occurs on ARMv8 too. The Stage1 (guest) mapping is very strict (Device non-Gathering, non-Reordering, no Early Write Acknowledgement -- for EFI_MEMORY_UC), which basically overrides the Stage2 (very lax host) memory attributes. When qemu writes, as part of emulating the flash programming commands, to the RAMBlock that *otherwise* backs the flash range (as a r/o memslot), those writes (from host userspace) tend to end up in dcache. But, when the guest flips back the flash to romd mode, and tries to read back the values from the flash as plain ROM, the dcache is completely bypassed due to the strict stage1 mapping, and the guest goes directly to DRAM. Where qemu's earlier writes are not yet / necessarily visible. Please see my original patch (which was incomplete) in the attachment, it has a very verbose commit message. Anyway, I'll let others explain; they can word it better than I can :) FWIW, Series Reviewed-by: Laszlo Ersek ler...@redhat.com I ported this series to a 3.17.0+ based kernel, and tested it. It works fine. The ROM-like view of the NOR flash now reflects the previously programmed contents. Series Tested-by: Laszlo Ersek ler...@redhat.com Thanks! Laszlo ___ kvmarm mailing list kvm...@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm -- To unsubscribe from this list: send the
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 20 November 2014 18:35, Mario Smarduch m.smard...@samsung.com wrote: I think beyond consistency, there should be no double mappings with conflicting attributes at any time or CPU state is undefined. The situation is not so bleak as this. See section B2.9 Mismatched memory attributes in the ARMv8 ARM ARM (DDI0487A.d), which lays out in some detail what the results of mismatched attributes are (generally, you lose ordering or coherency guarantees you might have hoped to have). They're not pretty, but it's not as bad as completely UNPREDICTABLE behaviour. thanks -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 11/20/2014 10:40 AM, Peter Maydell wrote: On 20 November 2014 18:35, Mario Smarduch m.smard...@samsung.com wrote: I think beyond consistency, there should be no double mappings with conflicting attributes at any time or CPU state is undefined. The situation is not so bleak as this. See section B2.9 Mismatched memory attributes in the ARMv8 ARM ARM (DDI0487A.d), which lays out in some detail what the results of mismatched attributes are (generally, you lose ordering or coherency guarantees you might have hoped to have). They're not pretty, but it's not as bad as completely UNPREDICTABLE behaviour. thanks -- PMM Hi Peter, thanks for digging that up, quite a list to navigate but it does provide detailed guidance. - Mario -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
On 11/20/2014 11:36 AM, Alexander Graf wrote: On 03.11.14 05:52, Paul Mackerras wrote: From: Suresh E. Warrier warr...@linux.vnet.ibm.com The kvmppc_vcore_blocked() code does not check for the wait condition after putting the process on the wait queue. This means that it is possible for an external interrupt to become pending, but the vcpu to remain asleep until the next decrementer interrupt. The fix is to make one last check for pending exceptions and ceded state before calling schedule(). Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com Signed-off-by: Paul Mackerras pau...@samba.org I don't understand the race you're fixing here. Can you please explain it? When a virtual interrupt needs to be delivered to the guest, and the virtual ICS state for the interrupt and virtual ICP state for the VCPU allow for the VCPU to be immediately interrupted, we 1. Set the BOOK3S_INTERRUPT_EXTERNAL_LEVEL bit in pending_exceptions. 2. Call kvmppc_fast_vcpu_kick_hv(), which checks the wait queue at vcpu-wq to wake the VCPU up. The caller of kvmppc_vcore_blocked() does the check for pending exceptions, but there is a race condition here and we do need to check again after the VCPU is put on the wait queue. -suresh Alex --- arch/powerpc/kvm/book3s_hv.c | 20 1 file changed, 20 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index cd7e030..1a7a281 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1828,9 +1828,29 @@ static void kvmppc_wait_for_exec(struct kvm_vcpu *vcpu, int wait_state) */ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) { +struct kvm_vcpu *vcpu; +int do_sleep = 1; + DEFINE_WAIT(wait); prepare_to_wait(vc-wq, wait, TASK_INTERRUPTIBLE); + +/* + * Check one last time for pending exceptions and ceded state after + * we put ourselves on the wait queue + */ +list_for_each_entry(vcpu, vc-runnable_threads, arch.run_list) { +if (vcpu-arch.pending_exceptions || !vcpu-arch.ceded) { +do_sleep = 0; +break; +} +} + +if (!do_sleep) { +finish_wait(vc-wq, wait); +return; +} + vc-vcore_state = VCORE_SLEEPING; spin_unlock(vc-lock); schedule(); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 11/20/2014 01:40 PM, Peter Maydell wrote: On 20 November 2014 18:35, Mario Smarduch m.smard...@samsung.com wrote: I think beyond consistency, there should be no double mappings with conflicting attributes at any time or CPU state is undefined. The situation is not so bleak as this. See section B2.9 Mismatched memory attributes in the ARMv8 ARM ARM (DDI0487A.d), which lays out in some detail what the results of mismatched attributes are (generally, you lose ordering or coherency guarantees you might have hoped to have). They're not pretty, but it's not as bad as completely UNPREDICTABLE behaviour. Quick side note that I did raise exactly this issue with the ARM Architecture team several years ago (that of missmatched memory attributes between a guest and hypervisor) and it is a known concern. I'm personally concerned about a couple of things that I won't go into here but will followup on what the longer term plan might be. Jon. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 20 November 2014 19:49, Jon Masters j...@redhat.com wrote: On 11/20/2014 01:40 PM, Peter Maydell wrote: The situation is not so bleak as this. See section B2.9 Mismatched memory attributes in the ARMv8 ARM ARM (DDI0487A.d), which lays out in some detail what the results of mismatched attributes are (generally, you lose ordering or coherency guarantees you might have hoped to have). They're not pretty, but it's not as bad as completely UNPREDICTABLE behaviour. Quick side note that I did raise exactly this issue with the ARM Architecture team several years ago (that of missmatched memory attributes between a guest and hypervisor) and it is a known concern. I think in practice for a well-behaved guest we can arrange that everything is fine (roughly, the guest has to treat DMA-capable devices as doing coherent-dma, which we can tell them to do via DT bindings or ACPI[*], plus the special case handling we already have for bootup), and naughty guests will only confuse themselves. But I need to think a bit more about it (and we should probably write down how it works somewhere :-)). [*] We should be able to emulate non-coherent-DMA devices but would need an extra API from KVM so userspace can do clean dcache to point of coherency. And in practice I'm not sure we need to emulate those devices... -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds
On Thu, Nov 20, 2014 at 3:39 AM, Christian Borntraeger borntrae...@de.ibm.com wrote: So It looks like we could make a change to ACCESS_ONCE. Would something like CONFIG_ARCH_SCALAR_ACCESS_ONCE be a good start? No, if it's just a handful of places to be fixed, let's not add config options for broken cases. This would boil down to Patch1: Provide stricter ACCESS_ONCE if CONFIG_ARCH_SCALAR_ACCESS_ONCE is set + docu update + comments Patch2: Change mm/* to barriers Patch3: Change x86 locks Patch4: Change x86 gup Patch4: Enable CONFIG_ARCH_SCALAR_ACCESS_ONCE for s390x and x86 Just do patches 2-4 first, and then patch 1 unconditionally. Obviously you'd need to spread the word on linux-arch to see how bad it is for other cases, but if other architectures are at all like x86 and s390, and just require a few trivial patches, let's not make this some config option. Linus -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 11/20/14 21:10, Peter Maydell wrote: On 20 November 2014 19:49, Jon Masters j...@redhat.com wrote: On 11/20/2014 01:40 PM, Peter Maydell wrote: The situation is not so bleak as this. See section B2.9 Mismatched memory attributes in the ARMv8 ARM ARM (DDI0487A.d), which lays out in some detail what the results of mismatched attributes are (generally, you lose ordering or coherency guarantees you might have hoped to have). They're not pretty, but it's not as bad as completely UNPREDICTABLE behaviour. Quick side note that I did raise exactly this issue with the ARM Architecture team several years ago (that of missmatched memory attributes between a guest and hypervisor) and it is a known concern. I think in practice for a well-behaved guest we can arrange that everything is fine (roughly, the guest has to treat DMA-capable devices as doing coherent-dma, which we can tell them to do via DT bindings or ACPI[*], plus the special case handling we already have for bootup), and naughty guests will only confuse themselves. But I need to think a bit more about it (and we should probably write down how it works somewhere :-)). [*] We should be able to emulate non-coherent-DMA devices but would need an extra API from KVM so userspace can do clean dcache to point of coherency. And in practice I'm not sure we need to emulate those devices... This basically means that virtio transfers should just use normal memory in the guest (treating virtio transfers as coherent DMA), right? We're actually doing that in the ArmVirtualizationQemu.dsc build of edk2, and Things Work Great (TM) in guests on the Mustang. Thanks Laszlo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can I make this work… (Foundation for accessibility project)
On 20/11/2014 17:28, Eric S. Johansson wrote: I'm accustomed to lesser performance on virtual machines. That's the hazard of a running on old and slow laptop (dell e6400 (2.2ghz core duo, 8gb ram)[1]) but even virtual box is not this slow. So what am I doing wrong? It would be nice to use a slow machine like this as many handcrips don't have a whole lot of resources for buying newer/faster machines. On the other hand, many of them use desktops and work from one place whereas someone like me is all over the map (quite literally). How did you start the virtual machine? Perhaps you're not using KVM but emulation? I have a fast machine but slow disk (a NAS on 100 MBit ethernet) and I can do about 15 automated installations in less than 6 hours. Are you using libvirt or directly invoking QEMU? Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots
On 20 November 2014 21:13, Laszlo Ersek ler...@redhat.com wrote: On 11/20/14 21:10, Peter Maydell wrote: I think in practice for a well-behaved guest we can arrange that everything is fine (roughly, the guest has to treat DMA-capable devices as doing coherent-dma, which we can tell them to do via DT bindings or ACPI[*], plus the special case handling we already have for bootup), and naughty guests will only confuse themselves. But I need to think a bit more about it (and we should probably write down how it works somewhere :-)). This basically means that virtio transfers should just use normal memory in the guest (treating virtio transfers as coherent DMA), right? Normal *cacheable*, but yes. -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: KVM causes #GP on XRSTORS
From: Nadav Amit [mailto:nadav.a...@gmail.com] Sent: Thursday, November 20, 2014 8:34 AM To: Paolo Bonzini; Yu, Fenghua Cc: kvm list Subject: KVM causes #GP on XRSTORS Fenghua, I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears to get a #GP when it is trying to load the guest FPU. One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu, but I am not sure it is the only problem. Was KVM ever tested with XRSTORS? In dmesg, do you see xsave: . using compacted form? If you use kernel noxsaves, does KVM work fine? Thanks. -Fenghua -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: can I make this work… (Foundation for accessibility project)
On 11/20/2014 4:48 PM, Paolo Bonzini wrote: On 20/11/2014 17:28, Eric S. Johansson wrote: I'm accustomed to lesser performance on virtual machines. That's the hazard of a running on old and slow laptop (dell e6400 (2.2ghz core duo, 8gb ram)[1]) but even virtual box is not this slow. So what am I doing wrong? It would be nice to use a slow machine like this as many handcrips don't have a whole lot of resources for buying newer/faster machines. On the other hand, many of them use desktops and work from one place whereas someone like me is all over the map (quite literally). How did you start the virtual machine? Perhaps you're not using KVM but emulation? I have a fast machine but slow disk (a NAS on 100 MBit ethernet) and I can do about 15 automated installations in less than 6 hours. Are you using libvirt or directly invoking QEMU? I was using one of the GUIs ( less hand stress than trying to assemble a commandline). Unfortunately I'm in Windows 8 right now because I'm writing. I'm very sure the GUI was http://virt-manager.org/ I tried a different one but it kept telling me I only had QEMU I thought silly program, that can't be right. Someday I will not argue with software or small electronic boxes. They don't care who wins and they are much more stubborn than I am. I'll be able to run some tests in about 2 to 3 hours after I finish this document. Let me know what I should look at? on a side note, a pointer to an automated install process would be wonderful. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] KVM: PPC: Book3S HV: ptes are big endian
When being restored from qemu, the kvm_get_htab_header are in native endian, but the ptes are big endian. This patch fixes restore on a KVM LE host. Qemu also needs a fix for this : http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg8.html Signed-off-by: Cédric Le Goater c...@fr.ibm.com Cc: Paul Mackerras pau...@samba.org Cc: Alexey Kardashevskiy a...@ozlabs.ru Cc: Gregory Kurz gk...@linux.vnet.ibm.com --- Tested on 3.18-rc5 with LE and BE host. v2: add be64 local variables to be friendly with sparse arch/powerpc/kvm/book3s_64_mmu_hv.c |8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) Index: linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c === --- linux-3.18-hv.git.orig/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -1539,9 +1539,15 @@ static ssize_t kvm_htab_write(struct fil hptp = (__be64 *)(kvm-arch.hpt_virt + (i * HPTE_SIZE)); lbuf = (unsigned long __user *)buf; for (j = 0; j hdr.n_valid; ++j) { + __be64 hpte_v; + __be64 hpte_r; + err = -EFAULT; - if (__get_user(v, lbuf) || __get_user(r, lbuf + 1)) + if (__get_user(hpte_v, lbuf) || + __get_user(hpte_r, lbuf + 1)) goto out; + v = be64_to_cpu(hpte_v); + r = be64_to_cpu(hpte_r); err = -EINVAL; if (!(v HPTE_V_VALID)) goto out; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM causes #GP on XRSTORS
Hi Nadav, On Thu, Nov 20, 2014 at 06:34:04PM +0200, Nadav Amit wrote: Fenghua, I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears to get a #GP when it is trying to load the guest FPU. One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu, but I am not sure it is the only problem. Was KVM ever tested with XRSTORS? Current kvm and qemu use standard format and xsaves/xrstors should use compact format, in addition, vmx is still not enabled for xsaves/xrstors in kvm. Regards, Wanpeng Li Thanks, Nadav -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM causes #GP on XRSTORS
On Nov 21, 2014, at 02:10, Wanpeng Li wanpeng...@linux.intel.com wrote: Hi Nadav, On Thu, Nov 20, 2014 at 06:34:04PM +0200, Nadav Amit wrote: Fenghua, I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears to get a #GP when it is trying to load the guest FPU. One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu, but I am not sure it is the only problem. Was KVM ever tested with XRSTORS? Current kvm and qemu use standard format and xsaves/xrstors should use compact format, in addition, vmx is still not enabled for xsaves/xrstors in kvm. Sorry, but I don’t quite understand. Should KVM work on a machine that supports xsaves/xrstors? I am not referring to whether KVM exposes the capability to the guest, but to whether KVM should work at all. Thanks, Nadav -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] vfio: put off the allocation of minor in vfio_create_group
On 2014/11/20 23:37, Alex Williamson wrote: On Thu, 2014-11-20 at 19:25 +0800, Zhen Lei wrote: The next code fragment list_for_each_entry is not depend on minor. With this patch, the free of minor in list_for_each_entry can be reduced, and there is no functional change. A reasonable micro-optimization, but I'm curious if you're actually seeing some measurable overhead from this. It seems like we'd need to Oh, I just found this by code review. It's good to put pure check first and resource allocation later, if they are order independent. Isn't it? I think it will not improve performance but only save a little code space and looks feel better. have multiple devices, all within the same IOMMU group, all probed by vfio-pci at the same time to exercise the race condition. Thanks, I think vfio_create_group can not be invoked frequently. So, the performance of this function is not a major consideration, we don't care slightly performance change. Alex Signed-off-by: Zhen Lei thunder.leiz...@huawei.com --- drivers/vfio/vfio.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index f018d8d..737eb468 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -225,22 +225,21 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group) mutex_lock(vfio.group_lock); -minor = vfio_alloc_group_minor(group); -if (minor 0) { -vfio_group_unlock_and_free(group); -return ERR_PTR(minor); -} - /* Did we race creating this group? */ list_for_each_entry(tmp, vfio.group_list, vfio_next) { if (tmp-iommu_group == iommu_group) { vfio_group_get(tmp); -vfio_free_group_minor(minor); vfio_group_unlock_and_free(group); return tmp; } } +minor = vfio_alloc_group_minor(group); +if (minor 0) { +vfio_group_unlock_and_free(group); +return ERR_PTR(minor); +} + dev = device_create(vfio.class, NULL, MKDEV(MAJOR(vfio.group_devt), minor), group, %d, iommu_group_id(iommu_group)); -- 1.8.0 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ . -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] KVM: x86: fix access memslots w/o hold srcu read lock
Hi Tang, On Fri, Nov 21, 2014 at 02:41:57PM +0800, Tang Chen wrote: Hi Wanpeng, Sorry, it is about this problem again. I booted 3.18.0-rc2, without Paolo's patch. lockdep and RCU debug were all opened. Then I started a qemu vm with the following options: /usr/libexec/qemu-kvm -hda rhel7.0ga-x64.qcow2 -m 512M -cpu host,-x2apic -serial stdio I added printk() in kvm_vcpu_reload_apic_access_page(), and it was printed out. So I think I can confirm that I have run into kvm_vcpu_reload_apic_access_page(). But I still didn't see any warning. Is there anything else I should do ? Would you please share your qemu command with me ? I test it on the other guy's Ivytown and take advantage of the qemu command line which he used, so I forget the accurate command line which used that day. Paolo also reproduce the bug, Paolo, ping. Regards, Wanpeng Li Thanks. :) On 11/14/2014 07:39 AM, Wanpeng Li wrote: Hi Tang, On Tue, Nov 11, 2014 at 01:35:29PM +0800, Tang Chen wrote: Hi Wanpeng, Sorry for the late. I think I have totally missed this thread. I opened lockdep and RCU debug, and tried on 3.18-rc1. But I didn't get the warning. I also opened lockdep and RCU debug, and tried 3.18.0-rc2 on a Ivy bridge, the warning will be triggered after run qemu immediately. There is no need to try any hotplug related stuff. In addition, Paolo's patch is merged upstream to fix this. commit a73896cb5bbdce672945745db8224352a689f580 Author: Paolo Bonzini pbonz...@redhat.com Date: Sun Nov 2 07:54:30 2014 +0100 KVM: vmx: defer load of APIC access page address during reset Regards, Wanpeng Li My steps are: 1. Use numactl to bind a qemu process to node1. 2. Offline all node1 memory. And the qemu process is still running. Would you please tell me how did you reproduce it ? Thanks. On 11/02/2014 03:07 PM, Wanpeng Li wrote: The srcu read lock must be held while accessing memslots (e.g. when using gfn_to_* functions), however, commit c24ae0dcd3e8 (kvm: x86: Unpin and remove kvm_arch-apic_access_page) call gfn_to_page() in kvm_vcpu_reload_apic_access_page() w/o hold it in vmx_vcpu_reset() path which leads to suspicious rcu_dereference_check() usage warning. This patch fix it by holding srcu read lock in all kvm_vcpu_reset() call path. [ INFO: suspicious RCU usage. ] 3.18.0-rc2-test2+ #70 Not tainted --- include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by qemu-system-x86/2371: #0: (vcpu-mutex){+.+...}, at: [a037d800] vcpu_load+0x20/0xd0 [kvm] stack backtrace: CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70 Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013 0001 880209983ca8 816f514f 8802099b8990 880209983cd8 810bd687 000fee00 880208a2c000 880208a1 88020ef50040 880209983d08 Call Trace: [816f514f] dump_stack+0x4e/0x71 [810bd687] lockdep_rcu_suspicious+0xe7/0x120 [a037d055] gfn_to_memslot+0xd5/0xe0 [kvm] [a03807d3] __gfn_to_pfn+0x33/0x60 [kvm] [a0380885] gfn_to_page+0x25/0x90 [kvm] [a038aeec] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm] [a08f0a9c] vmx_vcpu_reset+0x20c/0x460 [kvm_intel] [a039ab8e] kvm_vcpu_reset+0x15e/0x1b0 [kvm] [a039ac0c] kvm_arch_vcpu_setup+0x2c/0x50 [kvm] [a037f7e0] kvm_vm_ioctl+0x1d0/0x780 [kvm] [810bc664] ? __lock_is_held+0x54/0x80 [812231f0] do_vfs_ioctl+0x300/0x520 [8122ee45] ? __fget+0x5/0x250 [8122f0fa] ? __fget_light+0x2a/0xe0 [81223491] SyS_ioctl+0x81/0xa0 [816fed6d] system_call_fastpath+0x16/0x1b Reported-by: Takashi Iwai ti...@suse.de Reported-by: Alexei Starovoitov alexei.starovoi...@gmail.com Suggested-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com --- v3 - v4: * bypass the problem altoghter by kvm_make_request v2 - v3: * take care all vmx_vcpu_reset call path v1 - v2: * just fix hold the srcu read lock in vmx_vcpu_reset path arch/x86/kvm/vmx.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index a0f78db..3e556c6 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4579,7 +4579,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu) vmcs_write32(TPR_THRESHOLD, 0); } - kvm_vcpu_reload_apic_access_page(vcpu); + kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu); if (vmx_vm_has_apicv(vcpu-kvm)) memset(vmx-pi_desc, 0, sizeof(struct pi_desc)); . -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/17] RFC: userfault v2
On 2014/11/21 1:38, Andrea Arcangeli wrote: Hi, On Thu, Nov 20, 2014 at 10:54:29AM +0800, zhanghailiang wrote: Yes, you are right. This is what i really want, bypass all non-present faults and only track strict wrprotect faults. ;) So, do you plan to support that in the userfault API? Yes I think it's good idea to support wrprotect/COW faults too. Great! Then i can expect your patches. ;) I just wanted to understand if there was any other reason why you needed only wrprotect faults, because the non-present faults didn't look like a big performance concern if they triggered in addition to wrprotect faults, but it's certainly ok to optimize them away so it's fully optimal. Er, you have got the answer, no special, it's only for optimality. All it takes to differentiate the behavior should be one more bit during registration so you can select non-present, wrprotect faults or both. postcopy live migration would select only non-present faults, postcopy live snapshot would select only wrprotect faults, anything like distributed shared memory supporting shared readonly access and exclusive write access, would select both flags. It is really flexible in this way. I just sent an (unfortunately) longish but way more detailed email about live snapshotting with userfaultfd but I just wanted to give a shorter answer here too :). Thanks for your explanation, and your patience. It is really useful, now i know more details about why 'fork dump live snapshot' scenario is not acceptable. Thanks :) Thanks, Andrea . -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
Suresh E. Warrier warr...@linux.vnet.ibm.com writes: This patch adds trace points in the guest entry and exit code and also for exceptions handled by the host in kernel mode - hypercalls and page faults. The new events are added to /sys/kernel/debug/tracing/events under a new subsystem called kvm_hv. /* Set this explicitly in case thread 0 doesn't have a vcpu */ @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) vc-vcore_state = VCORE_RUNNING; preempt_disable(); + + trace_kvmppc_run_core(vc, 0); + spin_unlock(vc-lock); Do we really want to call tracepoint with spin lock held ? Is that a good thing to do ?. -aneesh -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
On 20.11.14 11:40, Aneesh Kumar K.V wrote: Suresh E. Warrier warr...@linux.vnet.ibm.com writes: This patch adds trace points in the guest entry and exit code and also for exceptions handled by the host in kernel mode - hypercalls and page faults. The new events are added to /sys/kernel/debug/tracing/events under a new subsystem called kvm_hv. /* Set this explicitly in case thread 0 doesn't have a vcpu */ @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) vc-vcore_state = VCORE_RUNNING; preempt_disable(); + +trace_kvmppc_run_core(vc, 0); + spin_unlock(vc-lock); Do we really want to call tracepoint with spin lock held ? Is that a good thing to do ?. I thought it was safe to call tracepoints inside of spin lock regions? Steve? Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
On Thu, 20 Nov 2014 13:10:12 +0100 Alexander Graf ag...@suse.de wrote: On 20.11.14 11:40, Aneesh Kumar K.V wrote: Suresh E. Warrier warr...@linux.vnet.ibm.com writes: This patch adds trace points in the guest entry and exit code and also for exceptions handled by the host in kernel mode - hypercalls and page faults. The new events are added to /sys/kernel/debug/tracing/events under a new subsystem called kvm_hv. /* Set this explicitly in case thread 0 doesn't have a vcpu */ @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc) vc-vcore_state = VCORE_RUNNING; preempt_disable(); + + trace_kvmppc_run_core(vc, 0); + spin_unlock(vc-lock); Do we really want to call tracepoint with spin lock held ? Is that a good thing to do ?. I thought it was safe to call tracepoints inside of spin lock regions? Steve? There's tracepoints in the guts of the scheduler where rq lock is held. Don't worry about it. The tracing system is lockless. -- Steve -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: Add missing HPTE unlock
On 05.11.14 02:21, Paul Mackerras wrote: From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com In kvm_test_clear_dirty(), if we find an invalid HPTE we move on to the next HPTE without unlocking the invalid one. In fact we should never find an invalid and unlocked HPTE in the rmap chain, but for robustness we should unlock it. This adds the missing unlock. Reported-by: Benjamin Herrenschmidt b...@kernel.crashing.org Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Signed-off-by: Paul Mackerras pau...@samba.org Thanks, applied to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: Book3S HV: ptes are big endian
On 03.11.14 16:35, Cédric Le Goater wrote: When being restored from qemu, the kvm_get_htab_header are in native endian, but the ptes are big endian. This patch fixes restore on a KVM LE host. Qemu also needs a fix for this : http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg8.html Signed-off-by: Cédric Le Goater c...@fr.ibm.com Cc: Paul Mackerras pau...@samba.org Cc: Alexey Kardashevskiy a...@ozlabs.ru Cc: Gregory Kurz gk...@linux.vnet.ibm.com --- Tested on 3.17-rc7 with LE and BE host. arch/powerpc/kvm/book3s_64_mmu_hv.c |2 ++ 1 file changed, 2 insertions(+) Index: linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c === --- linux-3.18-hv.git.orig/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -1542,6 +1542,8 @@ static ssize_t kvm_htab_write(struct fil err = -EFAULT; if (__get_user(v, lbuf) || __get_user(r, lbuf + 1)) goto out; + v = be64_to_cpu(v); + r = be64_to_cpu(r); This will trigger warnings with sparse. Please introduce new be64 variables that you do get_user on and that you then use as source for v and r. Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
On 03.11.14 05:51, Paul Mackerras wrote: From: Mahesh Salgaonkar mah...@linux.vnet.ibm.com When we get an HMI (hypervisor maintenance interrupt) while in a guest, we see that guest enters into paused state. The reason is, in kvmppc_handle_exit_hv it falls through default path and returns to host instead of resuming guest. This causes guest to enter into paused state. HMI is a hypervisor only interrupt and it is safe to resume the guest since the host has handled it already. This patch adds a switch case to resume the guest. Without this patch we see guest entering into paused state with following console messages: [ 3003.329351] Severe Hypervisor Maintenance interrupt [Recovered] [ 3003.329356] Error detail: Timer facility experienced an error [ 3003.329359]HMER: 0840 [ 3003.329360]TFMR: 4a12000980a84000 [ 3003.329366] vcpu c007c35094c0 (40): [ 3003.329368] pc = c00c2ba0 msr = 80009032 trap = e60 [ 3003.329370] r 0 = c021ddc0 r16 = 0046 [ 3003.329372] r 1 = c0007a02bbd0 r17 = 327d5d98 [ 3003.329375] r 2 = c10980b8 r18 = 1fc9a0b0 [ 3003.329377] r 3 = c142d6b8 r19 = c142d6b8 [ 3003.329379] r 4 = 0002 r20 = [ 3003.329381] r 5 = c524a110 r21 = [ 3003.329383] r 6 = 0001 r22 = [ 3003.329386] r 7 = r23 = c524a110 [ 3003.329388] r 8 = r24 = 0001 [ 3003.329391] r 9 = 0001 r25 = c0007c31da38 [ 3003.329393] r10 = c14280b8 r26 = 0002 [ 3003.329395] r11 = 746f6f6c2f68656c r27 = c524a110 [ 3003.329397] r12 = 28004484 r28 = c0007c31da38 [ 3003.329399] r13 = cfe01400 r29 = 0002 [ 3003.329401] r14 = 0046 r30 = c3011e00 [ 3003.329403] r15 = ffba r31 = 0002 [ 3003.329404] ctr = c041a670 lr = c0272520 [ 3003.329405] srr0 = c007e8d8 srr1 = 90001002 [ 3003.329406] sprg0 = sprg1 = cfe01400 [ 3003.329407] sprg2 = cfe01400 sprg3 = 0005 [ 3003.329408] cr = 48004482 xer = 2000 dsisr = 4200 [ 3003.329409] dar = 010015020048 [ 3003.329410] fault dar = 010015020048 dsisr = 4200 [ 3003.329411] SLB (8 entries): [ 3003.329412] ESID = c800 VSID = 40016e7779000510 [ 3003.329413] ESID = d801 VSID = 400142add1000510 [ 3003.329414] ESID = f804 VSID = 4000eb1a81000510 [ 3003.329415] ESID = 1f00080b VSID = 40004fda0a000d90 [ 3003.329416] ESID = 3f00080c VSID = 400039f536000d90 [ 3003.329417] ESID = 180d VSID = 0001251b35150d90 [ 3003.329417] ESID = 0100080e VSID = 4001e4609d90 [ 3003.329418] ESID = d8000819 VSID = 40013d349c000400 [ 3003.329419] lpcr = c04881847001 sdr1 = 001b1906 last_inst = [ 3003.329421] trap=0xe60 | pc=0xc00c2ba0 | msr=0x80009032 [ 3003.329524] Severe Hypervisor Maintenance interrupt [Recovered] [ 3003.329526] Error detail: Timer facility experienced an error [ 3003.329527]HMER: 0840 [ 3003.329527]TFMR: 4a12000980a94000 [ 3006.359786] Severe Hypervisor Maintenance interrupt [Recovered] [ 3006.359792] Error detail: Timer facility experienced an error [ 3006.359795]HMER: 0840 [ 3006.359797]TFMR: 4a12000980a84000 IdName State 2 guest2 running 3 guest3 paused 4 guest4 running Signed-off-by: Mahesh Salgaonkar mah...@linux.vnet.ibm.com Signed-off-by: Paul Mackerras pau...@samba.org Do we need this for PR running on bare metal as well? Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] Some fixes for HV KVM on PPC
On 03.11.14 05:51, Paul Mackerras wrote: Here are fixes for five bugs which were found in the testing of our PowerKVM product. The bugs range from guest performance issues to guest crashes and memory corruption. Please apply. Thanks, applied patches 1-4 to kvm-ppc-queue. Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
On 03.11.14 05:52, Paul Mackerras wrote: From: Suresh E. Warrier warr...@linux.vnet.ibm.com The kvmppc_vcore_blocked() code does not check for the wait condition after putting the process on the wait queue. This means that it is possible for an external interrupt to become pending, but the vcpu to remain asleep until the next decrementer interrupt. The fix is to make one last check for pending exceptions and ceded state before calling schedule(). Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com Signed-off-by: Paul Mackerras pau...@samba.org I don't understand the race you're fixing here. Can you please explain it? Alex --- arch/powerpc/kvm/book3s_hv.c | 20 1 file changed, 20 insertions(+) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index cd7e030..1a7a281 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -1828,9 +1828,29 @@ static void kvmppc_wait_for_exec(struct kvm_vcpu *vcpu, int wait_state) */ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) { + struct kvm_vcpu *vcpu; + int do_sleep = 1; + DEFINE_WAIT(wait); prepare_to_wait(vc-wq, wait, TASK_INTERRUPTIBLE); + + /* + * Check one last time for pending exceptions and ceded state after + * we put ourselves on the wait queue + */ + list_for_each_entry(vcpu, vc-runnable_threads, arch.run_list) { + if (vcpu-arch.pending_exceptions || !vcpu-arch.ceded) { + do_sleep = 0; + break; + } + } + + if (!do_sleep) { + finish_wait(vc-wq, wait); + return; + } + vc-vcore_state = VCORE_SLEEPING; spin_unlock(vc-lock); schedule(); -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] KVM: PPC: Book3S HV: ptes are big endian
When being restored from qemu, the kvm_get_htab_header are in native endian, but the ptes are big endian. This patch fixes restore on a KVM LE host. Qemu also needs a fix for this : http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg8.html Signed-off-by: Cédric Le Goater c...@fr.ibm.com Cc: Paul Mackerras pau...@samba.org Cc: Alexey Kardashevskiy a...@ozlabs.ru Cc: Gregory Kurz gk...@linux.vnet.ibm.com --- Tested on 3.18-rc5 with LE and BE host. v2: add be64 local variables to be friendly with sparse arch/powerpc/kvm/book3s_64_mmu_hv.c |8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) Index: linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c === --- linux-3.18-hv.git.orig/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -1539,9 +1539,15 @@ static ssize_t kvm_htab_write(struct fil hptp = (__be64 *)(kvm-arch.hpt_virt + (i * HPTE_SIZE)); lbuf = (unsigned long __user *)buf; for (j = 0; j hdr.n_valid; ++j) { + __be64 hpte_v; + __be64 hpte_r; + err = -EFAULT; - if (__get_user(v, lbuf) || __get_user(r, lbuf + 1)) + if (__get_user(hpte_v, lbuf) || + __get_user(hpte_r, lbuf + 1)) goto out; + v = be64_to_cpu(hpte_v); + r = be64_to_cpu(hpte_r); err = -EINVAL; if (!(v HPTE_V_VALID)) goto out; -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html