Re: [PATCH 4/4] kvm: Implement PEBS virtualization
On 06/24/2014 07:45 PM, Marcelo Tosatti wrote:
> On Sun, Jun 22, 2014 at 09:02:25PM +0200, Andi Kleen wrote:
>>> First, it's not sufficient to pin the debug store area, you also have to pin the guest page tables that are used to map the debug store. But even if you do that, as soon as the guest fork()s, it will create a new pgd which the host will be free to swap out. The processor can then attempt a PEBS store to an unmapped address which will fail, even though the guest is configured correctly.
>>
>> That's a good point. You're right of course.
>>
>> The only way I can think around it would be to intercept CR3 writes while PEBS is active and always pin all the table pages leading to the PEBS buffer. That's slow, but should be only needed while PEBS is running.
>>
>> -Andi
>
> Suppose that can be done separately from the pinned spte patchset. And it requires accounting into mlock limits as well, as noted.
>
> One set of pagetables per pinned virtual address leading down to the last translations is sufficient per-vcpu.

Or 4, and use the CR3 exit filter to prevent vmexits among the last 4 LRU CR3 values.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
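The "last 4 LRU CR3 values" idea in the message above can be illustrated with a small bookkeeping structure. This is only a sketch under stated assumptions: the names `cr3_lru`, `cr3_lru_touch` and `NR_CR3_SLOTS` are hypothetical and do not come from KVM, and the sketch tracks which CR3 values are recent without doing any pinning or programming any hardware filter.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CR3_SLOTS 4  /* hypothetical: "the last 4 LRU CR3 values" */

/* Hypothetical per-vcpu record of recently used guest CR3 values. */
struct cr3_lru {
    uint64_t value[NR_CR3_SLOTS];  /* value[0] is the most recently used */
    int used;                      /* number of valid entries */
};

/*
 * Record a guest CR3 write.
 * Returns 1 if cr3 was already tracked (its page tables would already
 * be pinned), 0 if it is new (the least recently used entry is dropped
 * when the table is full, and the caller would have to pin the tables
 * for the new value).
 */
static int cr3_lru_touch(struct cr3_lru *c, uint64_t cr3)
{
    int i, pos, hit = 0;

    assert(c);

    for (i = 0; i < c->used; i++) {
        if (c->value[i] == cr3) {
            hit = 1;
            break;
        }
    }

    if (hit)
        pos = i;                    /* move the existing entry to the front */
    else if (c->used < NR_CR3_SLOTS)
        pos = c->used++;            /* table not full yet: grow by one */
    else
        pos = NR_CR3_SLOTS - 1;     /* full: overwrite the LRU entry */

    /* Shift entries [0, pos) down by one slot and put cr3 in front. */
    for (i = pos; i > 0; i--)
        c->value[i] = c->value[i - 1];
    c->value[0] = cr3;

    return hit;
}
```

A guest that alternates between a handful of address spaces would then only take the slow pinning path when `cr3_lru_touch()` returns 0.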
Re: [PATCH v3 2/9] MIPS: KVM: Use KVM internal logger
Hi Deng-Cheng,

On 24/06/14 18:31, Deng-Cheng Zhu wrote:
> @@ -2213,8 +2209,8 @@ enum emulation_result kvm_mips_check_privilege(unsigned long cause,
>                           * address error exception to the guest
>                           */
>                          if (badvaddr >= (unsigned long) KVM_GUEST_KSEG0) {
> -                                printk("%s: LD MISS @ %#lx\n", __func__,
> -                                       badvaddr);
> +                                kvm_err("%s: LD MISS @ %#lx\n", __func__,
> +                                        badvaddr);

This should probably be kvm_debug since it isn't fatal to the whole VM (the exception gets passed on to the guest kernel to handle), otherwise guest userland could maliciously spam the host log by repeatedly trying to access beyond the TE useg. Same goes for the other printks in this function.

It probably was only useful to sanity check that userland wasn't trying to access memory that would be accessible on a normal MIPS core but isn't with the TE segment layout.

Otherwise this patch looks okay to me.

Cheers
James
Re: [PATCH v3 3/9] MIPS: KVM: Simplify functions by removing redundancy
On 24/06/14 18:31, Deng-Cheng Zhu wrote:
> From: Deng-Cheng Zhu dengcheng@imgtec.com
>
> No logic changes inside.
>
> Signed-off-by: Deng-Cheng Zhu dengcheng@imgtec.com

I'm indifferent to many of the changes, but still,

Reviewed-by: James Hogan james.ho...@imgtec.com

Thanks
James

> ---
> Changes:
> v3 - v2:
> o Add err removal in kvm_arch_commit_memory_region().
> o Revert the changes to kvm_arch_vm_ioctl().
>
>  arch/mips/include/asm/kvm_host.h  |  2 +-
>  arch/mips/kvm/kvm_mips.c          | 18 --
>  arch/mips/kvm/kvm_mips_commpage.c |  2 --
>  arch/mips/kvm/kvm_mips_emul.c     | 34 +++---
>  arch/mips/kvm/kvm_mips_stats.c    |  4 +---
>  5 files changed, 17 insertions(+), 43 deletions(-)
>
> diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
> index 3f813f2..7a3fc67 100644
> --- a/arch/mips/include/asm/kvm_host.h
> +++ b/arch/mips/include/asm/kvm_host.h
> @@ -764,7 +764,7 @@ extern int kvm_mips_trans_mtc0(uint32_t inst, uint32_t *opc,
>                                 struct kvm_vcpu *vcpu);
>
>  /* Misc */
> -extern int kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
> +extern void kvm_mips_dump_stats(struct kvm_vcpu *vcpu);
>  extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
>
> diff --git a/arch/mips/kvm/kvm_mips.c b/arch/mips/kvm/kvm_mips.c
> index bdca619..cabcac0a 100644
> --- a/arch/mips/kvm/kvm_mips.c
> +++ b/arch/mips/kvm/kvm_mips.c
> @@ -97,9 +97,7 @@ void kvm_arch_hardware_unsetup(void)
>
>  void kvm_arch_check_processor_compat(void *rtn)
>  {
> -        int *r = (int *)rtn;
> -        *r = 0;
> -        return;
> +        *(int *)rtn = 0;
>  }
>
>  static void kvm_mips_init_tlbs(struct kvm *kvm)
> @@ -225,7 +223,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>                                     enum kvm_mr_change change)
>  {
>          unsigned long npages = 0;
> -        int i, err = 0;
> +        int i;
>
>          kvm_debug("%s: kvm: %p slot: %d, GPA: %llx, size: %llx, QVA: %llx\n",
>                    __func__, kvm, mem->slot, mem->guest_phys_addr,
> @@ -243,8 +241,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>
>                          if (!kvm->arch.guest_pmap) {
>                                  kvm_err("Failed to allocate guest PMAP");
> -                                err = -ENOMEM;
> -                                goto out;
> +                                return;
>                          }
>
>                          kvm_debug("Allocated space for Guest PMAP Table (%ld pages) @ %p\n",
> @@ -255,8 +252,6 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>                                  kvm->arch.guest_pmap[i] = KVM_INVALID_PAGE;
>                  }
>          }
> -out:
> -        return;
>  }
>
>  void kvm_arch_flush_shadow_all(struct kvm *kvm)
> @@ -845,16 +840,12 @@ long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
>
>  int kvm_arch_init(void *opaque)
>  {
> -        int ret;
> -
>          if (kvm_mips_callbacks) {
>                  kvm_err("kvm: module already exists\n");
>                  return -EEXIST;
>          }
>
> -        ret = kvm_mips_emulation_init(&kvm_mips_callbacks);
> -
> -        return ret;
> +        return kvm_mips_emulation_init(&kvm_mips_callbacks);
>  }
>
>  void kvm_arch_exit(void)
> @@ -1008,7 +999,6 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>
>  void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>  {
> -        return;
>  }
>
>  int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
> diff --git a/arch/mips/kvm/kvm_mips_commpage.c b/arch/mips/kvm/kvm_mips_commpage.c
> index ab7096e..4b5612b 100644
> --- a/arch/mips/kvm/kvm_mips_commpage.c
> +++ b/arch/mips/kvm/kvm_mips_commpage.c
> @@ -33,6 +33,4 @@ void kvm_mips_commpage_init(struct kvm_vcpu *vcpu)
>          /* Specific init values for fields */
>          vcpu->arch.cop0 = &page->cop0;
>          memset(vcpu->arch.cop0, 0, sizeof(struct mips_coproc));
> -
> -        return;
>  }
> diff --git a/arch/mips/kvm/kvm_mips_emul.c b/arch/mips/kvm/kvm_mips_emul.c
> index 262ce3e..e5862bc 100644
> --- a/arch/mips/kvm/kvm_mips_emul.c
> +++ b/arch/mips/kvm/kvm_mips_emul.c
> @@ -761,8 +761,6 @@ enum emulation_result kvm_mips_emul_eret(struct kvm_vcpu *vcpu)
>
>  enum emulation_result kvm_mips_emul_wait(struct kvm_vcpu *vcpu)
>  {
> -        enum emulation_result er = EMULATE_DONE;
> -
>          kvm_debug("[%#lx] !!!WAIT!!! (%#lx)\n", vcpu->arch.pc,
>                    vcpu->arch.pending_exceptions);
>
> @@ -782,7 +780,7 @@ enum emulation_result kvm_mips_emul_wait(struct kvm_vcpu *vcpu)
>                  }
>          }
>
> -        return er;
> +        return EMULATE_DONE;
>  }
>
>  /*
> @@ -792,11 +790,10 @@ enum emulation_result kvm_mips_emul_wait(struct kvm_vcpu *vcpu)
>  enum emulation_result kvm_mips_emul_tlbr(struct kvm_vcpu *vcpu)
>  {
>          struct mips_coproc *cop0 = vcpu->arch.cop0;
> -        enum emulation_result er = EMULATE_FAIL;
>          uint32_t pc = vcpu->arch.pc;
>
>          kvm_err("[%#x] COP0_TLBR [%ld]\n", pc, kvm_read_c0_guest_index(cop0));
> -        return er;
> +        return EMULATE_FAIL;
>  }
>
>  /* Write Guest TLB Entry @ Index */
> @@ -804,7 +801,6
Re: [PATCH v3 4/9] MIPS: KVM: Remove unneeded volatile
On 24/06/14 18:31, Deng-Cheng Zhu wrote:
> From: Deng-Cheng Zhu dengcheng@imgtec.com
>
> The keyword volatile for idx in the TLB functions is unnecessary.
>
> Signed-off-by: Deng-Cheng Zhu dengcheng@imgtec.com

Reviewed-by: James Hogan james.ho...@imgtec.com

Cheers
James

> ---
>  arch/mips/kvm/kvm_tlb.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/mips/kvm/kvm_tlb.c b/arch/mips/kvm/kvm_tlb.c
> index 29a5bdb..bbcd822 100644
> --- a/arch/mips/kvm/kvm_tlb.c
> +++ b/arch/mips/kvm/kvm_tlb.c
> @@ -201,7 +201,7 @@ int kvm_mips_host_tlb_write(struct kvm_vcpu *vcpu, unsigned long entryhi,
>  {
>          unsigned long flags;
>          unsigned long old_entryhi;
> -        volatile int idx;
> +        int idx;
>
>          local_irq_save(flags);
>
> @@ -426,7 +426,7 @@ EXPORT_SYMBOL(kvm_mips_guest_tlb_lookup);
>  int kvm_mips_host_tlb_lookup(struct kvm_vcpu *vcpu, unsigned long vaddr)
>  {
>          unsigned long old_entryhi, flags;
> -        volatile int idx;
> +        int idx;
>
>          local_irq_save(flags);
Re: [PATCH v3 5/9] MIPS: KVM: Rename files to remove the prefix kvm_ and kvm_mips_
On 24/06/14 18:31, Deng-Cheng Zhu wrote:
> From: Deng-Cheng Zhu dengcheng@imgtec.com
>
> Since all the files are in arch/mips/kvm/, there's no need of the prefixes "kvm_" and "kvm_mips_".
>
> Signed-off-by: Deng-Cheng Zhu dengcheng@imgtec.com

Thanks for this cleanup! (hopefully with git's help it won't make backporting patches a pain).

Reviewed-by: James Hogan james.ho...@imgtec.com

Cheers
James
Re: [PATCH v3 6/9] MIPS: KVM: Restore correct value for WIRED at TLB uninit
On 24/06/14 18:31, Deng-Cheng Zhu wrote:
> From: Deng-Cheng Zhu dengcheng@imgtec.com
>
> At TLB initialization, the commpage TLB entry is reserved on top of the existing WIRED entries (the number not necessarily be 0).
>
> Signed-off-by: Deng-Cheng Zhu dengcheng@imgtec.com
> ---
>  arch/mips/kvm/mips.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index 27250ee..3d53d34 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -170,7 +170,7 @@ void kvm_arch_sync_events(struct kvm *kvm)
>  static void kvm_mips_uninit_tlbs(void *arg)
>  {
>          /* Restore wired count */
> -        write_c0_wired(0);
> +        write_c0_wired(read_c0_wired() - 1);
>          mtc0_tlbw_hazard();
>          /* Clear out all the TLBs */
>          kvm_local_flush_tlb_all();

kvm_local_flush_tlb_all blasts all the entries away regardless of wired, so I don't think this is an improvement.

I suspect to really be safe/correct in the presence of other dynamic users of wired it would have to either manage arbitrary allocation/deallocation of per-cpu tlb entries correctly from a single place, or abandon the use of wired altogether.

Cheers
James
Re: [PATCH v3 7/9] MIPS: KVM: Fix memory leak on VCPU
On 24/06/14 18:31, Deng-Cheng Zhu wrote:
> From: Deng-Cheng Zhu dengcheng@imgtec.com
>
> kvm_arch_vcpu_free() is called in 2 code paths:
>
> 1) kvm_vm_ioctl()
>        kvm_vm_ioctl_create_vcpu()
>            kvm_arch_vcpu_destroy()
>                kvm_arch_vcpu_free()
> 2) kvm_put_kvm()
>        kvm_destroy_vm()
>            kvm_arch_destroy_vm()
>                kvm_mips_free_vcpus()
>                    kvm_arch_vcpu_free()
>
> Neither of the paths handles VCPU free. We need to do it in kvm_arch_vcpu_free() corresponding to the memory allocation in kvm_arch_vcpu_create().
>
> Signed-off-by: Deng-Cheng Zhu dengcheng@imgtec.com

Reviewed-by: James Hogan james.ho...@imgtec.com

Maybe worth adding Cc: sta...@vger.kernel.org and moving this to the beginning of the patchset to avoid conflicts.

Cheers
James
Re: [PATCH v3 8/9] MIPS: KVM: Skip memory cleaning in kvm_mips_commpage_init()
On 24/06/14 18:31, Deng-Cheng Zhu wrote:
> From: Deng-Cheng Zhu dengcheng@imgtec.com
>
> The commpage is allocated using kzalloc(), so there's no need of cleaning the memory of the kvm_mips_commpage struct and its internal mips_coproc.
>
> Signed-off-by: Deng-Cheng Zhu dengcheng@imgtec.com

Reviewed-by: James Hogan james.ho...@imgtec.com

Cheers
James
[PULL] vhost: cleanups and fixes
The following changes since commit a497c3ba1d97fc69c1e78e7b96435ba8c2cb42ee:

  Linux 3.16-rc2 (2014-06-21 19:02:54 -1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to 68404441557d8db5ac853379a4fb9c1adedea4fd:

  vhost-scsi: don't open-code kvfree (2014-06-23 09:22:48 +0300)

vhost: infrastructure fixes for 3.16

Two cleanup patches removing code duplication that got introduced by changes in rc1. Not fixing crashes, but I'd rather not carry the duplicate code until the next merge window.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

Michael S. Tsirkin (1):
      vhost-scsi: don't open-code kvfree

Romain Francoise (1):
      vhost-net: don't open-code kvfree

 drivers/vhost/net.c  | 12 ++--
 drivers/vhost/scsi.c | 12 ++--
 2 files changed, 4 insertions(+), 20 deletions(-)
[patch added to the 3.12 stable tree] MIPS: KVM: Allocate at least 16KB for exception handlers
From: James Hogan james.ho...@imgtec.com

This patch has been added to the 3.12 stable tree. If you have any objections, please let us know.

===

commit 7006e2dfda9adfa40251093604db76d7e44263b3 upstream.

Each MIPS KVM guest has its own copy of the KVM exception vector. This contains the TLB refill exception handler at offset 0x000, the general exception handler at offset 0x180, and interrupt exception handlers at offset 0x200 in case Cause_IV=1. A common handler is copied to offset 0x2000 and offset 0x3000 is used for temporarily storing k1 during entry from guest.

However the amount of memory allocated for this purpose is calculated as 0x200 rounded up to the next page boundary, which is insufficient if 4KB pages are in use. This can lead to the common handler at offset 0x2000 being overwritten and infinitely recursive exceptions on the next exit from the guest.

Increase the minimum size from 0x200 to 0x4000 to cover the full use of the page.

Signed-off-by: James Hogan james.ho...@imgtec.com
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Gleb Natapov g...@kernel.org
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle r...@linux-mips.org
Cc: linux-m...@linux-mips.org
Cc: Sanjay Lal sanj...@kymasys.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Jiri Slaby jsl...@suse.cz
---
 arch/mips/kvm/kvm_mips.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/kvm/kvm_mips.c b/arch/mips/kvm/kvm_mips.c
index a7b044536de4..b31153969946 100644
--- a/arch/mips/kvm/kvm_mips.c
+++ b/arch/mips/kvm/kvm_mips.c
@@ -303,7 +303,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
         if (cpu_has_veic || cpu_has_vint) {
                 size = 0x200 + VECTORSPACING * 64;
         } else {
-                size = 0x200;
+                size = 0x4000;
         }

         /* Save Linux EBASE */
--
2.0.0
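The arithmetic behind the commit message above can be checked with a short sketch. The helper names and the two offset macros are assumptions made for illustration (only the numeric offsets 0x2000 and 0x3000 and the sizes 0x200 and 0x4000 come from the patch); this is not the kernel's allocation code.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets taken from the commit message; the macro names are made up. */
#define COMMON_HANDLER_OFFSET 0x2000u  /* common handler is copied here */
#define K1_SCRATCH_OFFSET     0x3000u  /* k1 is stored here on guest entry */

/* Round size up to a multiple of page_size (must be a power of two). */
static size_t round_up_to_page(size_t size, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}

/*
 * Returns 1 if an allocation request of `size` bytes, once rounded up
 * to whole pages, extends past the start of the k1 scratch slot (and
 * therefore also past the common handler), 0 otherwise.
 */
static int vector_area_fits(size_t size, size_t page_size)
{
    return round_up_to_page(size, page_size) > K1_SCRATCH_OFFSET;
}
```

With 4KB pages, `round_up_to_page(0x200, 0x1000)` is 0x1000, which ends before `COMMON_HANDLER_OFFSET`, so the old size does not fit; with 16KB pages the same request rounds up to 0x4000 and happens to fit, which matches the commit message's remark that the problem shows up when 4KB pages are in use.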
Re: [PATCH V2] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
On 15.06.14 20:47, Aneesh Kumar K.V wrote:
> With guests supporting Multiple page size per segment (MPSS), hpte_page_size returns the actual page size used. Add a new function to return base page size and use that to compare against the page size calculated from SLB. Without this patch a hpte lookup can fail since we are comparing wrong page size in kvmppc_hv_find_lock_hpte.
>
> Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

Thanks, applied to for-3.16.

Alex
Re: [PATCH v4 01/19] remove unused files
On 09.06.14 10:11, Andrew Jones wrote:
> On Fri, Jun 06, 2014 at 08:37:26PM +0200, Christoffer Dall wrote:
>> On Thu, Apr 10, 2014 at 06:56:42PM +0200, Andrew Jones wrote:
>>> There are several unused files, primarily because powerpc is an unused arch. The exceptions are config-ia64.mak, which is also an unused arch file, lib/fwcfg.c, lib/panic.c, x86/print.h and x86/run-kvm-unit-tests, which are just unused. Remove them all in order to tidy things up.
>>>
>>> Signed-off-by: Andrew Jones drjo...@redhat.com
>>
>> Sounds reasonable enough for me, but you probably want an acked-by from the people who actually know if they should care about these files or not.
>
> Agreed. Alex? Paolo?

We haven't managed to revive the test cases in all the years, so yeah :(

Acked-by: Alexander Graf ag...@suse.de

Alex
Re: [PATCH v3 7/9] MIPS: KVM: Fix memory leak on VCPU
On 25/06/2014 11:28, James Hogan wrote:
> On 24/06/14 18:31, Deng-Cheng Zhu wrote:
>> From: Deng-Cheng Zhu dengcheng@imgtec.com
>>
>> kvm_arch_vcpu_free() is called in 2 code paths:
>>
>> 1) kvm_vm_ioctl()
>>        kvm_vm_ioctl_create_vcpu()
>>            kvm_arch_vcpu_destroy()
>>                kvm_arch_vcpu_free()
>> 2) kvm_put_kvm()
>>        kvm_destroy_vm()
>>            kvm_arch_destroy_vm()
>>                kvm_mips_free_vcpus()
>>                    kvm_arch_vcpu_free()
>>
>> Neither of the paths handles VCPU free. We need to do it in kvm_arch_vcpu_free() corresponding to the memory allocation in kvm_arch_vcpu_create().
>>
>> Signed-off-by: Deng-Cheng Zhu dengcheng@imgtec.com
>
> Reviewed-by: James Hogan james.ho...@imgtec.com
>
> Maybe worth adding Cc: sta...@vger.kernel.org and moving this to the beginning of the patchset to avoid conflicts.
>
> Cheers
> James

I've queued this for 3.16. It applies cleanly apart from the filename change.

Paolo
Re: [PATCH v3 0/9] MIPS: KVM: Bugfixes and cleanups
On 24/06/2014 19:31, Deng-Cheng Zhu wrote:
> The patches are pretty straightforward.
>
> Changes:
> v3 - v2:
> o In patch #2, change the use of kvm_[err|info|debug].
> o In patch #3, add err removal in kvm_arch_commit_memory_region().
> o In patch #3, revert the changes to kvm_arch_vm_ioctl().
> o In patch #7, drop the merge of kvm_arch_vcpu_free() and pointer nullification.
> o Add patch #9.
> v2 - v1:
> o In patch #1, don't change the opening comment mark for kernel-doc comments.
> o In patch #1, to make long lines more readable, use local variables / macros.
> o In patch #1, slight format adjustments are made.
> o Use -M flag to generate patches (detect renames).
> o Add patch #8.
>
> Deng-Cheng Zhu (8):
>   MIPS: KVM: Reformat code and comments
>   MIPS: KVM: Use KVM internal logger
>   MIPS: KVM: Simplify functions by removing redundancy
>   MIPS: KVM: Remove unneeded volatile
>   MIPS: KVM: Rename files to remove the prefix "kvm_" and "kvm_mips_"
>   MIPS: KVM: Restore correct value for WIRED at TLB uninit
>   MIPS: KVM: Fix memory leak on VCPU
>   MIPS: KVM: Skip memory cleaning in kvm_mips_commpage_init()
>
> James Hogan (1):
>   MIPS: KVM: Remove dead code of TLB index error in kvm_mips_emul_tlbwr()
>
>  arch/mips/include/asm/kvm_host.h                  |  12 +-
>  arch/mips/include/asm/r4kcache.h                  |   3 +
>  arch/mips/kvm/Makefile                            |   8 +-
>  arch/mips/kvm/{kvm_cb.c => callback.c}            |   0
>  arch/mips/kvm/commpage.c                          |  33 ++
>  arch/mips/kvm/commpage.h                          |  24 +
>  arch/mips/kvm/{kvm_mips_dyntrans.c => dyntrans.c} |  40 +-
>  arch/mips/kvm/{kvm_mips_emul.c => emulate.c}      | 539 +++---
>  arch/mips/kvm/{kvm_mips_int.c => interrupt.c}     |  47 +-
>  arch/mips/kvm/{kvm_mips_int.h => interrupt.h}     |  22 +-
>  arch/mips/kvm/kvm_mips_comm.h                     |  23 -
>  arch/mips/kvm/kvm_mips_commpage.c                 |  37 --
>  arch/mips/kvm/kvm_mips_opcode.h                   |  24 -
>  arch/mips/kvm/{kvm_locore.S => locore.S}          |  55 ++-
>  arch/mips/kvm/{kvm_mips.c => mips.c}              | 227 +
>  arch/mips/kvm/opcode.h                            |  22 +
>  arch/mips/kvm/{kvm_mips_stats.c => stats.c}       |  28 +-
>  arch/mips/kvm/{kvm_tlb.c => tlb.c}                | 258 +--
>  arch/mips/kvm/trace.h                             |  18 +-
>  arch/mips/kvm/{kvm_trap_emul.c => trap_emul.c}    | 109 +++--
>  20 files changed, 750 insertions(+), 779 deletions(-)
>  rename arch/mips/kvm/{kvm_cb.c => callback.c} (100%)
>  create mode 100644 arch/mips/kvm/commpage.c
>  create mode 100644 arch/mips/kvm/commpage.h
>  rename arch/mips/kvm/{kvm_mips_dyntrans.c => dyntrans.c} (79%)
>  rename arch/mips/kvm/{kvm_mips_emul.c => emulate.c} (83%)
>  rename arch/mips/kvm/{kvm_mips_int.c => interrupt.c} (85%)
>  rename arch/mips/kvm/{kvm_mips_int.h => interrupt.h} (74%)
>  delete mode 100644 arch/mips/kvm/kvm_mips_comm.h
>  delete mode 100644 arch/mips/kvm/kvm_mips_commpage.c
>  delete mode 100644 arch/mips/kvm/kvm_mips_opcode.h
>  rename arch/mips/kvm/{kvm_locore.S => locore.S} (93%)
>  rename arch/mips/kvm/{kvm_mips.c => mips.c} (83%)
>  create mode 100644 arch/mips/kvm/opcode.h
>  rename arch/mips/kvm/{kvm_mips_stats.c => stats.c} (63%)
>  rename arch/mips/kvm/{kvm_tlb.c => tlb.c} (78%)
>  rename arch/mips/kvm/{kvm_trap_emul.c => trap_emul.c} (83%)

I'll wait for v4 of these patches since James still had a few comments.

Paolo
[PATCH] arch: x86: kvm: x86.c: Cleaning up variable is set more than once
A struct member variable is set to the same value more than once.

This was found using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist rickard_strandqv...@spectrumdigital.se
---
 arch/x86/kvm/x86.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..0f48eb7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4730,7 +4730,6 @@ static void emulator_set_segment(struct x86_emulate_ctxt *ctxt, u16 selector,
         if (desc->g)
                 var.limit = (var.limit << 12) | 0xfff;
         var.type = desc->type;
-        var.present = desc->p;
         var.dpl = desc->dpl;
         var.db = desc->d;
         var.s = desc->s;
--
1.7.10.4
Re: [PATCH] arch: x86: kvm: x86.c: Cleaning up variable is set more than once
On 25/06/2014 14:25, Rickard Strandqvist wrote:
> A struct member variable is set to the same value more than once.
>
> This was found using a static code analysis program called cppcheck.
>
> Signed-off-by: Rickard Strandqvist rickard_strandqv...@spectrumdigital.se
> ---
>  arch/x86/kvm/x86.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f32a025..0f48eb7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4730,7 +4730,6 @@ static void emulator_set_segment(struct x86_emulate_ctxt *ctxt, u16 selector,
>          if (desc->g)
>                  var.limit = (var.limit << 12) | 0xfff;
>          var.type = desc->type;
> -        var.present = desc->p;
>          var.dpl = desc->dpl;
>          var.db = desc->d;
>          var.s = desc->s;

Thanks, applying this patch.

Paolo
Re: [PATCH v3 -next 0/9] CMA: generalize CMA reserved area management code
Hello,

On 2014-06-18 22:51, Andrew Morton wrote:
> On Tue, 17 Jun 2014 10:25:07 +0900 Joonsoo Kim iamjoonsoo@lge.com wrote:
>>>> v2:
>>>> - Although this patchset looks very different with v1, the end result, that is, mm/cma.c is same with v1's one. So I carry Ack to patch 6-7.
>>>>
>>>> This patchset is based on linux-next 20140610.
>>>
>>> Thanks for taking care of this. I will test it with my setup and if everything goes well, I will take it to my -next tree. If any branch is required for anyone to continue his works on top of those patches, let me know, I will also prepare it.
>>
>> Hello,
>>
>> I'm glad to hear that. :) But, there is one concern. As you already know, I am preparing further patches (Aggressively allocate the pages on CMA reserved memory). It may be highly related to MM branch and also slightly depends on this CMA changes. In this case, what is the best strategy to merge this patchset? IMHO, Andrew's tree is more appropriate branch. If there is no issue in this case, I am willing to develop further patches based on your tree.
>
> That's probably easier. Marek, I'll merge these into -mm (and hence -next and git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git) and shall hold them pending your review/ack/test/etc, OK?

Ok. I've tested them and they work fine. I'm sorry that you had to wait for me for a few days. You can now add:

Acked-and-tested-by: Marek Szyprowski m.szyprow...@samsung.com

I've also rebased my pending patches onto this set (I will send them soon).

The question is now if you want to keep the discussed patches in your -mm tree, or should I take them to my -next branch. If you like to keep them, I assume you will also take the patches which depend on the discussed changes.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
no handler for some reasons to cause vmexit
Hi all,

Some causes of VM exits (e.g. LGDT, INVPCID) have no corresponding handler in KVM. In general, what does the KVM hypervisor do in that case? Does it do nothing and simply reschedule the next VM entry? From the guest's point of view no state has changed, right?

Many thanks.

Thx,
Xuekun
kvm perf question
Hi all,

I started a VM with nothing running in it, then used “perf stat” to collect some data. The interesting thing is that the number of “kvm_apic” events is greater than the number of “kvm_exit” events. My understanding is that “kvm:kvm_exit” counts VM exits, while “kvm_apic” counts VM exits due to APIC accesses. Is my understanding right? If so, under what conditions could the number of “kvm_apic” events be greater than the number of “kvm_exit” events?

[root@centos_ivy ~]# perf stat -a -e 'kvm:kvm_exit' -e 'kvm:kvm_apic' -e kvm:kvm_apic_ipi sleep 1s

 Performance counter stats for 'sleep 1s':

            47,251 kvm:kvm_exit        [100.00%]
            52,650 kvm:kvm_apic        [100.00%]
             4,519 kvm:kvm_apic_ipi

       1.001805327 seconds time elapsed

My configuration is: ivybridge-EP, CentOS, kernel 3.15.0.

Many thanks.

Thx,
Xuekun
Re: [PATCH v2 9/9] arm64: KVM: vgic: deal with GIC sub-page alignment
On 25/06/14 15:56, Joel Schopp wrote:
> On 06/24/2014 05:28 PM, Peter Maydell wrote:
>> On 24 June 2014 20:28, Joel Schopp joel.sch...@amd.com wrote:
>>> On 06/19/2014 04:21 AM, Marc Zyngier wrote:
>>>> The GIC CPU interface is always 4k aligned. If the host is using 64k pages, it is critical to place the guest's GICC interface at the same relative alignment as the host's GICV. Failure to do so results in an impossibility for the guest to deal with interrupts.
>>>>
>>>> Add a KVM_DEV_ARM_VGIC_GRP_ADDR_OFFSET attribute for the VGIC, allowing userspace to retrieve the GICV offset in a page. It becomes then trivial to adjust the GICC base address for the guest.
>>>
>>> Does this mean there is a corresponding patch for qemu?
>>
>> Not as far as I know. It's a bit awkward on the QEMU end because we really want to provide the guest a consistent memory map regardless of the host CPU. So at best we'd probably use it to say "sorry, can't run on this CPU/host kernel".
>
> I think most arm64 servers are going to run with 64k pages. It seems like a major problem to have qemu not work on these systems.

How many of them will be with the GICC *not* 64kB aligned?

>> (That said, if you think you can make QEMU usefully use the information and want to write a QEMU patch I'm not averse to the idea.)
>
> I'll have to think about this approach some more, but I'm not opposed to doing the work if I thought it was the right thing to do.
>
>> kvmtool is probably better placed to take advantage of it since it takes more of a "deal with what the host provides you" philosophy.
>
> kvmtool is fun as a play toy, but in the real world nobody is building clouds using kvmtool, they use kvm with qemu.

A play toy? Hmmm. Do you realise that most of KVM on arm64 has been written using this play toy?

	M.
--
Jazz is not dead. It just smells funny...
Re: [PATCH v2 9/9] arm64: KVM: vgic: deal with GIC sub-page alignment
On 06/25/2014 10:00 AM, Marc Zyngier wrote:
> On 25/06/14 15:56, Joel Schopp wrote:
>> On 06/24/2014 05:28 PM, Peter Maydell wrote:
>>> On 24 June 2014 20:28, Joel Schopp joel.sch...@amd.com wrote:
>>>> On 06/19/2014 04:21 AM, Marc Zyngier wrote:
>>>>> The GIC CPU interface is always 4k aligned. If the host is using 64k pages, it is critical to place the guest's GICC interface at the same relative alignment as the host's GICV. Failure to do so results in an impossibility for the guest to deal with interrupts.
>>>>>
>>>>> Add a KVM_DEV_ARM_VGIC_GRP_ADDR_OFFSET attribute for the VGIC, allowing userspace to retrieve the GICV offset in a page. It becomes then trivial to adjust the GICC base address for the guest.
>>>>
>>>> Does this mean there is a corresponding patch for qemu?
>>>
>>> Not as far as I know. It's a bit awkward on the QEMU end because we really want to provide the guest a consistent memory map regardless of the host CPU. So at best we'd probably use it to say "sorry, can't run on this CPU/host kernel".
>>
>> I think most arm64 servers are going to run with 64k pages. It seems like a major problem to have qemu not work on these systems.
>
> How many of them will be with the GICC *not* 64kB aligned?

If I'm reading the Server Base System Architecture v2.2 Appendix F correctly, all of them. Here's the relevant quote: "In a 64KB translation granule system this means that GICC needs to have its base at 4KB below a 64KB boundary."

>>> (That said, if you think you can make QEMU usefully use the information and want to write a QEMU patch I'm not averse to the idea.)
>>
>> I'll have to think about this approach some more, but I'm not opposed to doing the work if I thought it was the right thing to do.
>>
>>> kvmtool is probably better placed to take advantage of it since it takes more of a "deal with what the host provides you" philosophy.
>>
>> kvmtool is fun as a play toy, but in the real world nobody is building clouds using kvmtool, they use kvm with qemu.
>
> A play toy? Hmmm. Do you realise that most of KVM on arm64 has been written using this play toy?

I meant no insult. I really like kvmtool. I'm just saying that the eventual end users of these systems will want to run qemu and not kvmtool.
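The "same relative alignment" requirement discussed in this thread comes down to preserving the offset of the GIC CPU interface within a page. The sketch below shows only that arithmetic; the function names and every address used with them are hypothetical examples, and the sketch does not use the KVM_DEV_ARM_VGIC_GRP_ADDR_OFFSET attribute or any real KVM interface.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of addr within a page of page_size bytes (a power of two). */
static uint64_t offset_in_page(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & (page_size - 1);
}

/*
 * Pick a guest GICC base inside the guest page starting at
 * guest_page_base so that it sits at the same in-page offset as the
 * host's GICV. guest_page_base must itself be page aligned.
 */
static uint64_t guest_gicc_base(uint64_t guest_page_base,
                                uint64_t host_gicv_base,
                                uint64_t page_size)
{
    assert(offset_in_page(guest_page_base, page_size) == 0);
    return guest_page_base + offset_in_page(host_gicv_base, page_size);
}
```

For a made-up host GICV at 0x2c02f000 (4KB below a 64KB boundary, as in the SBSA quote) and 64KB pages, the in-page offset is 0xf000, so a guest page at 0x08010000 would give a GICC base of 0x0801f000. With 4KB pages the offset is always zero and no adjustment is needed.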
Re: [PATCH v2 9/9] arm64: KVM: vgic: deal with GIC sub-page alignment
On 06/24/2014 05:28 PM, Peter Maydell wrote:
> On 24 June 2014 20:28, Joel Schopp joel.sch...@amd.com wrote:
>> On 06/19/2014 04:21 AM, Marc Zyngier wrote:
>>> The GIC CPU interface is always 4k aligned. If the host is using 64k pages, it is critical to place the guest's GICC interface at the same relative alignment as the host's GICV. Failure to do so results in an impossibility for the guest to deal with interrupts.
>>>
>>> Add a KVM_DEV_ARM_VGIC_GRP_ADDR_OFFSET attribute for the VGIC, allowing userspace to retrieve the GICV offset in a page. It becomes then trivial to adjust the GICC base address for the guest.
>>
>> Does this mean there is a corresponding patch for qemu?
>
> Not as far as I know. It's a bit awkward on the QEMU end because we really want to provide the guest a consistent memory map regardless of the host CPU. So at best we'd probably use it to say "sorry, can't run on this CPU/host kernel".

I think most arm64 servers are going to run with 64k pages. It seems like a major problem to have qemu not work on these systems.

> (That said, if you think you can make QEMU usefully use the information and want to write a QEMU patch I'm not averse to the idea.)

I'll have to think about this approach some more, but I'm not opposed to doing the work if I thought it was the right thing to do.

> kvmtool is probably better placed to take advantage of it since it takes more of a "deal with what the host provides you" philosophy.

kvmtool is fun as a play toy, but in the real world nobody is building clouds using kvmtool, they use kvm with qemu.

> thanks
> -- PMM
__schedule #DF splat
Hi guys, so I'm looking at this splat below when booting current linus+tip/master in a kvm guest. Initially I thought this is something related to the PARAVIRT gunk but it happens with and without it. So, from what I can see, we first #DF and then lockdep fires a deadlock warning. That I can understand but what I can't understand is why we #DF with this RIP: [2.744062] RIP: 0010:[816139df] [816139df] __schedule+0x28f/0xab0 disassembling this points to /* * Since the runqueue lock will be released by the next * task (which is an invalid locking op but in the case * of the scheduler it's an obvious special-case), so we * do an early lockdep release here: */ #ifndef __ARCH_WANT_UNLOCKED_CTXSW spin_release(rq-lock.dep_map, 1, _THIS_IP_); #endif this call in context_switch() (provided this RIP is correct, of course). (btw, various dumps at the end of this mail with the faulting marker). And that's lock_release() in lockdep.c. What's also interesting is that we have two __schedule calls on the stack before #DF: [2.744062] [816139ce] ? __schedule+0x27e/0xab0 [2.744062] [816139df] ? __schedule+0x28f/0xab0 The show_stack_log_lvl() I'm attributing to the userspace stack not being mapped while we're trying to walk it (we do have a %cr3 write shortly before the RIP we're faulting at) which is another snafu and shouldn't happen, i.e., we should detect that and not walk it or whatever... Anyway, this is what I can see - any and all suggestions on how to debug this further are appreciated. More info available upon request. Thanks. 
[1.932807] devtmpfs: mounted [1.938324] Freeing unused kernel memory: 2872K (819ad000 - 81c7b000) [2.450824] udevd[814]: starting version 175 [2.743648] PANIC: double fault, error_code: 0x0 [2.743657] [2.744062] == [2.744062] [ INFO: possible circular locking dependency detected ] [2.744062] 3.16.0-rc2+ #2 Not tainted [2.744062] --- [2.744062] vmmouse_detect/957 is trying to acquire lock: [2.744062] ((console_sem).lock){-.}, at: [81092dcd] down_trylock+0x1d/0x50 [2.744062] [2.744062] but task is already holding lock: [2.744062] (rq-lock){-.-.-.}, at: [8161382f] __schedule+0xdf/0xab0 [2.744062] [2.744062] which lock already depends on the new lock. [2.744062] [2.744062] [2.744062] the existing dependency chain (in reverse order) is: [2.744062] - #2 (rq-lock){-.-.-.}: [2.744062][8109c0d9] lock_acquire+0xb9/0x200 [2.744062][81619111] _raw_spin_lock+0x41/0x80 [2.744062][8108090b] wake_up_new_task+0xbb/0x290 [2.744062][8104e847] do_fork+0x147/0x770 [2.744062][8104ee96] kernel_thread+0x26/0x30 [2.744062][8160e282] rest_init+0x22/0x140 [2.744062][81b82e3e] start_kernel+0x408/0x415 [2.744062][81b82463] x86_64_start_reservations+0x2a/0x2c [2.744062][81b8255b] x86_64_start_kernel+0xf6/0xf9 [2.744062] - #1 (p-pi_lock){-.-.-.}: [2.744062][8109c0d9] lock_acquire+0xb9/0x200 [2.744062][81619333] _raw_spin_lock_irqsave+0x53/0x90 [2.744062][810803b1] try_to_wake_up+0x31/0x450 [2.744062][810807f3] wake_up_process+0x23/0x40 [2.744062][816177ff] __up.isra.0+0x1f/0x30 [2.744062][81092fc1] up+0x41/0x50 [2.744062][810ac7b8] console_unlock+0x258/0x490 [2.744062][810acc81] vprintk_emit+0x291/0x610 [2.744062][8161185c] printk+0x4f/0x57 [2.744062][81486ad1] input_register_device+0x401/0x4d0 [2.744062][814909b4] atkbd_connect+0x2b4/0x2e0 [2.744062][81481a3b] serio_connect_driver+0x3b/0x60 [2.744062][81481a80] serio_driver_probe+0x20/0x30 [2.744062][813cd8e5] really_probe+0x75/0x230 [2.744062][813cdbc1] __driver_attach+0xb1/0xc0 [2.744062][813cb97b] bus_for_each_dev+0x6b/0xb0 
[2.744062][813cd43e] driver_attach+0x1e/0x20 [2.744062][81482ded] serio_handle_event+0x14d/0x1f0 [2.744062][8106c9d7] process_one_work+0x1c7/0x680 [2.744062][8106d77b] worker_thread+0x6b/0x540 [2.744062][81072ec8] kthread+0x108/0x120 [2.744062][8161a3ac] ret_from_fork+0x7c/0xb0 [2.744062] - #0 ((console_sem).lock){-.}: [2.744062][8109b564] __lock_acquire+0x1f14/0x2290 [2.744062][8109c0d9] lock_acquire+0xb9/0x200 [2.744062]
Re: [PATCH 1/2] docs: update ivshmem device spec
Hello Claudio, Sorry for the delay. I am a bit short on time and will be offline for a week starting tonight. I agree there are points that must be more clearly described (and I agree that ivshmem code will most likely have to be cleaned up after this). Restructuring the documentation with a optional section is a good idea too. I will work on this at my return. Anyway, thanks for the review. -- David Marchand On 06/23/2014 04:18 PM, Claudio Fontana wrote: Hi, we were reading through this quickly today, and these are some of the questions that we think can came up when reading this. Answers to some of these questions we think we have figured out, but I think it's important to put this information into the documentation. I will quote the file in its entirety, and insert some questions inline. Device Specification for Inter-VM shared memory device -- The Inter-VM shared memory device is designed to share a region of memory to userspace in multiple virtual guests. What does to userspace mean in this context? The userspace of the host, or the userspace in the guest? What about The Inter-VM shared memory device is designed to share a memory region (created on the host via the POSIX shared memory API) between multiple QEMU processes running different guests. In order for all guests to be able to pick up the shared memory area, it is modeled by QEMU as a PCI device exposing said memory to the guest as a PCI BAR. Whether in those guests the memory region is used in kernel space or userspace, or there is even any meaning for those terms is guest-dependent I would think (I think of an OSv here, where the application and kernel execute at the same privilege level and in the same address space). The memory region does not belong to any guest, but is a POSIX memory object on the host. Ok that's clear. One thing I would ask is, but I don't know if it makes sense to mention here, is who creates this memory object on the host? 
I understand in some cases it's the contributed server (what you provide in contrib/), in some cases it's the user of this device who has to write some server code for that, but is it true that also the qemu process itself can create this memory object on its own, without any external process needed? Is this the use case for host-guest only? Optionally, the device may support sending interrupts to other guests sharing the same memory region. This opens a lot of questions here which are partly answered later (If I understand correctly, not only interrupts are involved, but a complete communication protocol involving registers in BAR0), but what about staying a bit general here, like Optionally, the device may also provide a communication mechanism between guests sharing the same memory region. More details about that in the section 'OPTIONAL ivshmem guest to guest communication protocol'. Thinking out loud, I wonder if this communication mechanism should be part of this device in QEMU, or it should be provided at another layer.. The Inter-VM PCI device --- *BARs* The device supports three BARs. BAR0 is a 1 Kbyte MMIO region to support registers. BAR1 is used for MSI-X when it is enabled in the device. BAR2 is used to map the shared memory object from the host. The size of BAR2 is specified when the guest is started and must be a power of 2 in size. Are BAR0 and BAR1 optional? That's what I would think by reading the whole, but I'm still not sure. Am I forced to map BAR0 and BAR1 anyway? I don't think so, but.. If so, can we separate the explanation into the base shared memory feature, and a separate section which explains the OPTIONAL communication mechanism, and the OPTIONAL MSI-X BAR? For example, say that I am a potential ivshmem user (which I am), and I am interested in the shared memory but I want to use my own communication mechanism and protocol between guests, can we make it so that I don't have to wonder whether some of the info I read applies or not? 
The solution to that I think is to put all the OPTIONAL parts into separate sections. *Registers* Ok, so this should I think go into one such OPTIONAL sections. The device currently supports 4 registers of 32-bits each. Registers are used for synchronization between guests sharing the same memory object when interrupts are supported (this requires using the shared memory server). So use of BAR0 goes together with interrupts, and goes together with the shared memory server (is it the one contributed in contrib/?) The server assigns each VM an ID number and sends this ID number to the QEMU process when the guest starts. enum ivshmem_registers { IntrMask = 0, IntrStatus = 4, IVPosition = 8, Doorbell = 12 }; The first two registers are the interrupt mask and status registers. Mask and status are only used with pin-based interrupts. They are unused with MSI interrupts.
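A minimal sketch of how a guest driver might address the four BAR0 registers listed above, assuming BAR0 is already mapped as an array of 32-bit words. The accessor names, the `bar0` pointer and `bar2_size_valid()` are hypothetical and not part of the spec or of QEMU; only the register offsets and the power-of-two rule for BAR2 come from the quoted text.

```c
#include <assert.h>
#include <stdint.h>

/* Register byte offsets in BAR0, as listed in the quoted spec. */
enum ivshmem_registers {
    IntrMask   = 0,
    IntrStatus = 4,
    IVPosition = 8,
    Doorbell   = 12
};

/* Hypothetical accessor: read one 32-bit register from a mapped BAR0. */
static uint32_t ivshmem_read(volatile uint32_t *bar0,
                             enum ivshmem_registers reg)
{
    assert(reg % sizeof(uint32_t) == 0);    /* registers are 32-bit aligned */
    return bar0[reg / sizeof(uint32_t)];
}

/* Hypothetical accessor: write one 32-bit register in a mapped BAR0. */
static void ivshmem_write(volatile uint32_t *bar0,
                          enum ivshmem_registers reg, uint32_t val)
{
    assert(reg % sizeof(uint32_t) == 0);
    bar0[reg / sizeof(uint32_t)] = val;
}

/* The spec text requires the BAR2 size to be a power of two. */
static int bar2_size_valid(uint64_t size)
{
    return size != 0 && (size & (size - 1)) == 0;
}
```

A user interested only in the shared memory in BAR2, as discussed in the review, would need none of the register accessors; that is one argument for moving them into an OPTIONAL section.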
[Bug 25332] When a VM is rebooted, assigned devices do not get RESET ...
https://bugzilla.kernel.org/show_bug.cgi?id=25332

xerofo...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xerofo...@gmail.com

--- Comment #3 from xerofo...@gmail.com ---
Please test against a newer kernel to see if it's fixed.
Thanks
Nick

--
You are receiving this mail because:
You are watching the assignee of the bug.
[Bug 40542] overflow/panic on KVM hipervizor
https://bugzilla.kernel.org/show_bug.cgi?id=40542

xerofo...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xerofo...@gmail.com

--- Comment #14 from xerofo...@gmail.com ---
This bug is outdated, please test against a newer kernel.
Cheers
Nick
[Bug 42082] 3.1.0-rc2 block related lockdep report.
https://bugzilla.kernel.org/show_bug.cgi?id=42082

xerofo...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xerofo...@gmail.com

--- Comment #1 from xerofo...@gmail.com ---
Please test this bug against a newer kernel to see if it's fixed.
Cheers
Nick
Re: [PATCH v2 9/9] arm64: KVM: vgic: deal with GIC sub-page alignment
On 25 June 2014 15:56, Joel Schopp joel.sch...@amd.com wrote:
> On 06/24/2014 05:28 PM, Peter Maydell wrote:
>> On 24 June 2014 20:28, Joel Schopp joel.sch...@amd.com wrote:
>>> Does this mean there is a corresponding patch for qemu?
>>
>> Not as far as I know. It's a bit awkward on the QEMU end because we really want to provide the guest a consistent memory map regardless of the host CPU. So at best we'd probably use it to say "sorry, can't run on this CPU/host kernel".
>
> I think most arm64 servers are going to run with 64k pages. It seems like a major problem to have qemu not work on these systems.

QEMU should already work fine on servers with 64K pages; you just need to have the host offset of the GICV within the 64K page and the guest offset of the GICC within the 64K page be the same (and at the moment both must also be zero, which I believe is true for all of them at the moment except possibly the AEM model; counterexamples welcome). Disclaimer: I haven't personally tested this, but on the other hand I don't think anybody's reported it as not working either.

Notice that we don't care at all about the host's GICC offset, because it's the GICV we're going to use as the guest GICC.

That said, yes, QEMU ought really to be able to provide support for "use what the host provides", in the same way that we support -cpu host to mean 'virtualize whatever CPU the host has'. It's just a little awkward because you're working against the grain of some of QEMU's design; but it ought to be usable for things like the virt machine model.

For the cases where QEMU is being used to emulate specific hardware to the guest (which we don't do right now because we don't model any 64 bit boards other than virt), we could use this ioctl to say "can't run this guest on this host"; this is basically diagnosing a case in the same class as "can't run a guest with a GICv2 if your host's GICv3 doesn't implement v2 compatibility mode".

thanks
-- PMM
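The placement rule described in this message can be sketched as a small check. This is only an illustration under stated assumptions: `offset_in_page()` and `gicc_placement_ok()` are hypothetical names, not KVM or QEMU functions, and the sketch checks only that the two in-page offsets are equal, not the stricter "both must be zero" condition mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of an address inside its page. */
static uint64_t offset_in_page(uint64_t addr, uint64_t page_size)
{
    /* The mask trick below only works for power-of-two page sizes. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & (page_size - 1);
}

/*
 * The guest GICC is backed by the host GICV, so both must sit at the
 * same offset inside their respective pages. The host GICC address
 * plays no part in the check.
 */
static bool gicc_placement_ok(uint64_t host_gicv, uint64_t guest_gicc,
                              uint64_t page_size)
{
    return offset_in_page(host_gicv, page_size) ==
           offset_in_page(guest_gicc, page_size);
}
```

With 4K host pages every 4K-aligned pair of addresses passes, which is why the problem only shows up on 64K-page hosts.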
[PATCH 1/2] vfio-pci: Fix MSI/X debug code
Use the correct MSI message function for debug info.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 hw/misc/vfio.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index 7437c2e..6fbd47e 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -641,9 +641,9 @@ static void vfio_msi_interrupt(void *opaque)
     MSIMessage msg;

     if (vdev->interrupt == VFIO_INT_MSIX) {
-        msg = msi_get_message(&vdev->pdev, nr);
-    } else if (vdev->interrupt == VFIO_INT_MSI) {
         msg = msix_get_message(&vdev->pdev, nr);
+    } else if (vdev->interrupt == VFIO_INT_MSI) {
+        msg = msi_get_message(&vdev->pdev, nr);
     } else {
         abort();
     }
[PATCH 0/2] vfio-pci: MSI-X fixes
One debug-only and one pretty significant performance fix for older guests. I'd like to do a pull request for these prior to the 2.1 hard freeze, let me know if there are any objections.

Thanks,
Alex

---

Alex Williamson (2):
      vfio-pci: Fix MSI-X masking performance
      vfio-pci: Fix MSI/X debug code

 hw/misc/vfio.c |  237 +++-
 1 file changed, 133 insertions(+), 104 deletions(-)
[PATCH 2/2] vfio-pci: Fix MSI-X masking performance
There are still old guests out there that over-exercise MSI-X masking. The current code completely sets up and tears down an MSI-X vector on the use and release callbacks. While this is functional, it can slow an old guest to a crawl. We can easily skip the KVM parts of this so that we keep the MSI route and irqfd setup. We do however need to switch VFIO to trigger a different eventfd while masked. Actually, we have the option of continuing to use -1 to disable the trigger, but by using another EventNotifier we can allow the MSI-X core to emulate pending bits and re-fire the vector once unmasked. MSI code gets updated as well to use the same setup and teardown structures and functions.

Prior to this change, an igbvf assigned to a RHEL5 guest gets about 20Mbps and 50 transactions/s with netperf (remote or VF-PF). With this change, we get line rate and 3k transactions/s remote or 2Gbps and 6k+ transactions/s to the PF. No significant change is expected for newer guests with better-behaved MSI-X support.
Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 hw/misc/vfio.c |  233 +++-
 1 file changed, 131 insertions(+), 102 deletions(-)

diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
index 6fbd47e..8965e01 100644
--- a/hw/misc/vfio.c
+++ b/hw/misc/vfio.c
@@ -120,6 +120,7 @@ typedef struct VFIOINTx {

 typedef struct VFIOMSIVector {
     EventNotifier interrupt; /* eventfd triggered on interrupt */
+    EventNotifier kvm_interrupt; /* eventfd triggered for KVM irqfd bypass */
     struct VFIODevice *vdev; /* back pointer to device */
     MSIMessage msg; /* cache the MSI message so we know when it changes */
     int virq; /* KVM irqchip route for QEMU bypass */
@@ -681,10 +682,11 @@ static int vfio_enable_vectors(VFIODevice *vdev, bool msix)
     for (i = 0; i < vdev->nr_vectors; i++) {
         if (!vdev->msi_vectors[i].use) {
             fds[i] = -1;
-            continue;
+        } else if (vdev->msi_vectors[i].virq >= 0) {
+            fds[i] = event_notifier_get_fd(&vdev->msi_vectors[i].kvm_interrupt);
+        } else {
+            fds[i] = event_notifier_get_fd(&vdev->msi_vectors[i].interrupt);
         }
-
-        fds[i] = event_notifier_get_fd(&vdev->msi_vectors[i].interrupt);
     }

     ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, irq_set);
@@ -694,6 +696,52 @@ static int vfio_enable_vectors(VFIODevice *vdev, bool msix)
     return ret;
 }

+static void vfio_add_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage *msg,
+                                  bool msix)
+{
+    int virq;
+
+    if ((msix && !VFIO_ALLOW_KVM_MSIX) ||
+        (!msix && !VFIO_ALLOW_KVM_MSI) || !msg) {
+        return;
+    }
+
+    if (event_notifier_init(&vector->kvm_interrupt, 0)) {
+        return;
+    }
+
+    virq = kvm_irqchip_add_msi_route(kvm_state, *msg);
+    if (virq < 0) {
+        event_notifier_cleanup(&vector->kvm_interrupt);
+        return;
+    }
+
+    if (kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->kvm_interrupt,
+                                       NULL, virq) < 0) {
+        kvm_irqchip_release_virq(kvm_state, virq);
+        event_notifier_cleanup(&vector->kvm_interrupt);
+        return;
+    }
+
+    vector->msg = *msg;
+    vector->virq = virq;
+}
+
+static void vfio_remove_kvm_msi_virq(VFIOMSIVector *vector)
+{
+    kvm_irqchip_remove_irqfd_notifier(kvm_state, &vector->kvm_interrupt,
+                                      vector->virq);
+    kvm_irqchip_release_virq(kvm_state, vector->virq);
+    vector->virq = -1;
+    event_notifier_cleanup(&vector->kvm_interrupt);
+}
+
+static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg)
+{
+    kvm_irqchip_update_msi_route(kvm_state, vector->virq, msg);
+    vector->msg = msg;
+}
+
 static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
                                    MSIMessage *msg, IOHandler *handler)
 {
@@ -706,30 +754,32 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
             vdev->host.function, nr);

     vector = &vdev->msi_vectors[nr];
-    vector->vdev = vdev;
-    vector->use = true;
-
-    msix_vector_use(pdev, nr);

-    if (event_notifier_init(&vector->interrupt, 0)) {
-        error_report("vfio: Error: event_notifier_init failed");
+    if (!vector->use) {
+        vector->vdev = vdev;
+        vector->virq = -1;
+        if (event_notifier_init(&vector->interrupt, 0)) {
+            error_report("vfio: Error: event_notifier_init failed");
+        }
+        vector->use = true;
+        msix_vector_use(pdev, nr);
     }

+    qemu_set_fd_handler(event_notifier_get_fd(&vector->interrupt),
+                        handler, NULL, vector);
+
     /*
      * Attempt to enable route through KVM irqchip,
      * default to userspace handling if unavailable.
      */
-    vector->virq = msg && VFIO_ALLOW_KVM_MSIX ?
-                   kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
-    if (vector->virq < 0 ||
-
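The eventfd selection in the vfio_enable_vectors() hunk above can be restated as a standalone sketch. The struct below is a simplified, hypothetical stand-in for VFIOMSIVector that holds plain file descriptors instead of EventNotifier objects; it is meant to illustrate the three cases, not to reproduce the QEMU code.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for QEMU's VFIOMSIVector (assumption, not the real type). */
struct msi_vector_sketch {
    bool use;              /* vector has been set up by the guest */
    int virq;              /* KVM irqchip route, or -1 if there is none */
    int interrupt_fd;      /* eventfd serviced by QEMU in userspace */
    int kvm_interrupt_fd;  /* eventfd attached to a KVM irqfd */
};

/* Pick the fd that VFIO should signal for this vector. */
static int vector_trigger_fd(const struct msi_vector_sketch *v)
{
    assert(v != 0);
    if (!v->use) {
        return -1;                   /* unused vector: trigger disabled */
    }
    if (v->virq >= 0) {
        return v->kvm_interrupt_fd;  /* KVM route exists: bypass QEMU */
    }
    return v->interrupt_fd;          /* fall back to userspace handling */
}
```

Keeping two eventfds per vector is what lets a masked vector be pointed back at the userspace fd without tearing down the KVM route, which is the cost the commit message sets out to avoid.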
Re: [PATCH v2 9/9] arm64: KVM: vgic: deal with GIC sub-page alignment
On 06/25/2014 12:34 PM, Peter Maydell wrote:
> On 25 June 2014 15:56, Joel Schopp joel.sch...@amd.com wrote:
>> On 06/24/2014 05:28 PM, Peter Maydell wrote:
>>> On 24 June 2014 20:28, Joel Schopp joel.sch...@amd.com wrote:
>>>> Does this mean there is a corresponding patch for qemu?
>>>
>>> Not as far as I know. It's a bit awkward on the QEMU end because we really want to provide the guest a consistent memory map regardless of the host CPU. So at best we'd probably use it to say "sorry, can't run on this CPU/host kernel".
>>
>> I think most arm64 servers are going to run with 64k pages. It seems like a major problem to have qemu not work on these systems.
>
> QEMU should already work fine on servers with 64K pages; you just need to have the host offset of the GICV within the 64K page and the guest offset of the GICC within the 64K page be the same (and at the moment both must also be zero, which I believe is true for all of them at the moment except possibly the AEM model; counterexamples welcome). Disclaimer: I haven't personally tested this, but on the other hand I don't think anybody's reported it as not working either.

It doesn't work for me. Maybe I'm doing something wrong, but I can't see what. I am unique in that I'm running a gic-400 (gicv2m) on aarch64 hardware with 64k pages. I'm also unique in that my hardware maps each 4K gic entry to a 64K page (aliasing each 4k of gic 16 times in a 64K page, ie the gic virtual ic is at 0xe114 and 0xe1141000 and 0xe1142000, etc). This is inline with appendix F of the server base system architecture. This is inconvenient when the size is 0x2000 (8K). As a result all the offsets in the device tree entries are to the last 4K in the page so that an 8K read will read the last 4k from one page and the first 4k from the next and actually get 8k of the gic.

gic: interrupt-controller@e1101000 {
    compatible = "arm,gic-400";
    #interrupt-cells = <3>;
    #address-cells = <0>;
    interrupt-controller;
    msi-controller;
    reg = <0x0 0xe111 0 0x1000>, /* gic dist */
          <0x0 0xe112f000 0 0x2000>, /* gic cpu */
          <0x0 0xe114f000 0 0x2000>, /* gic virtual ic*/
          <0x0 0xe116f000 0 0x2000>, /* gic virtual cpu*/
          <0x0 0xe118 0 0x1000>; /* gic msi */
    interrupts = <1 8 0xf04>;
};

My concern here is that if userspace is going to look at 8k starting at the beginning of the page, guest offset 0 in your terminology, (say 0xe114) instead of starting at the last 4k of the page, offset 0xf000 (say 0xe114f000) it is going to get the second 4k wrong by reading 0xe1141000 instead of 0xe115.
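The aliasing concern above can be modelled with a little address arithmetic. This is a sketch under stated assumptions: it models a window in which each 64K page holds 16 aliases of one 4K register frame, and the names `GIC_FRAME_SIZE`, `HOST_PAGE_SIZE`, `frame_index()`, `frame_offset()` and `covers_two_frames()` are all hypothetical. Offsets are relative to the start of the window, not real board addresses.

```c
#include <assert.h>
#include <stdint.h>

#define GIC_FRAME_SIZE UINT64_C(0x1000)   /* one 4K GIC register frame */
#define HOST_PAGE_SIZE UINT64_C(0x10000)  /* 64K page = 16 aliases of a frame */

/* Which distinct 4K frame does a byte offset into the window reach? */
static uint64_t frame_index(uint64_t window_off)
{
    return window_off / HOST_PAGE_SIZE;
}

/* Offset inside that frame; every 4K slot of a page aliases the same frame. */
static uint64_t frame_offset(uint64_t window_off)
{
    return window_off % GIC_FRAME_SIZE;
}

/* Does an 8K region starting at `start` reach two different frames? */
static int covers_two_frames(uint64_t start)
{
    assert(frame_offset(start) == 0);     /* region must be 4K aligned */
    return frame_index(start) != frame_index(start + GIC_FRAME_SIZE);
}
```

In this model an 8K region starting at offset 0xf000 spans two frames, while one starting at offset 0 reads the same frame twice, which matches the failure described in the message.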
Re: [PATCH v3 -next 0/9] CMA: generalize CMA reserved area management code
On Wed, 25 Jun 2014 14:33:56 +0200 Marek Szyprowski m.szyprow...@samsung.com wrote:
>> That's probably easier. Marek, I'll merge these into -mm (and hence -next and git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git) and shall hold them pending you review/ack/test/etc, OK?
>
> Ok. I've tested them and they work fine. I'm sorry that you had to wait for me for a few days. You can now add:
>
> Acked-and-tested-by: Marek Szyprowski m.szyprow...@samsung.com

Thanks.

> I've also rebased my pending patches onto this set (I will send them soon). The question is now if you want to keep the discussed patches in your -mm tree, or should I take them to my -next branch. If you like to keep them, I assume you will also take the patches which depends on the discussed changes.

Yup, that works.
Re: __schedule #DF splat
On Wed, Jun 25, 2014 at 05:32:28PM +0200, Borislav Petkov wrote: Hi guys, so I'm looking at this splat below when booting current linus+tip/master in a kvm guest. Initially I thought this is something related to the PARAVIRT gunk but it happens with and without it. Ok, here's a cleaner splat. I went and rebuilt qemu to latest master from today to rule out some breakage there but it still fires. Paolo, any ideas why would kvm+qemu trigger a #DF in the guest? I guess I should dust off my old kvm/qemu #DF debugging patch I had somewhere... I did try to avoid the invalid stack issue by doing: --- diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c index 1abcb50b48ae..dd8e0eec071e 100644 --- a/arch/x86/kernel/dumpstack_64.c +++ b/arch/x86/kernel/dumpstack_64.c @@ -286,7 +286,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs, } if (i ((i % STACKSLOTS_PER_LINE) == 0)) pr_cont(\n); - pr_cont( %016lx, *stack++); + pr_cont( %016lx, (((unsigned long)stack = 0x7fffUL) ? -1 : *stack++)); touch_nmi_watchdog(); } preempt_enable(); --- but that didn't work either - see second splat at the end. 
[2.704184] PANIC: double fault, error_code: 0x0 [2.708132] CPU: 1 PID: 959 Comm: vmmouse_detect Not tainted 3.15.0+ #7 [2.708132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [2.708132] task: 880079c78000 ti: 880079c74000 task.ti: 880079c74000 [2.708132] RIP: 0010:[8161130f] [8161130f] __schedule+0x28f/0xab0 [2.708132] RSP: 002b:7fff99e51100 EFLAGS: 00013082 [2.708132] RAX: 7b206000 RBX: 88007b526f80 RCX: 0028 [2.708132] RDX: 816112fe RSI: 0001 RDI: 88007c5d3c58 [2.708132] RBP: 7fff99e511f0 R08: R09: [2.708132] R10: 0001 R11: 0019 R12: 88007c5d3c40 [2.708132] R13: 880079c84e40 R14: R15: 880079c78000 [2.708132] FS: 7ff252c6d700() GS:88007c40() knlGS: [2.708132] CS: 0010 DS: ES: CR0: 80050033 [2.708132] CR2: 7fff99e510f8 CR3: 7b206000 CR4: 06e0 [2.708132] Stack: [2.708132] BUG: unable to handle kernel paging request at 7fff99e51100 [2.708132] IP: [81005bbc] show_stack_log_lvl+0x11c/0x1d0 [2.708132] PGD 7b20d067 PUD 0 [2.708132] Oops: [#1] PREEMPT SMP [2.708132] Modules linked in: [2.708132] CPU: 1 PID: 959 Comm: vmmouse_detect Not tainted 3.15.0+ #7 [2.708132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [2.708132] task: 880079c78000 ti: 880079c74000 task.ti: 880079c74000 [2.708132] RIP: 0010:[81005bbc] [81005bbc] show_stack_log_lvl+0x11c/0x1d0 [2.708132] RSP: 002b:88007c405e58 EFLAGS: 00013046 [2.708132] RAX: 7fff99e51108 RBX: RCX: 88007c403fc0 [2.708132] RDX: 7fff99e51100 RSI: 88007c40 RDI: 81846aba [2.708132] RBP: 88007c405ea8 R08: 88007c3fffc0 R09: [2.708132] R10: 7c40 R11: R12: 88007c405f58 [2.708132] R13: R14: 818136fc R15: [2.708132] FS: 7ff252c6d700() GS:88007c40() knlGS: [2.708132] CS: 0010 DS: ES: CR0: 80050033 [2.708132] CR2: 7fff99e51100 CR3: 7b206000 CR4: 06e0 [2.708132] Stack: [2.708132] 0008 88007c405eb8 88007c405e70 7b206000 [2.708132] 7fff99e51100 88007c405f58 7fff99e51100 0040 
[2.708132] 0ac0 880079c78000 88007c405f08 81005d10 [2.708132] Call Trace: [2.708132] #DF [2.708132] [81005d10] show_regs+0xa0/0x280 [2.708132] [8103d143] df_debug+0x23/0x40 [2.708132] [81003b6d] do_double_fault+0x5d/0x80 [2.708132] [816194c7] double_fault+0x27/0x30 [2.708132] [816112fe] ? __schedule+0x27e/0xab0 [2.708132] [8161130f] ? __schedule+0x28f/0xab0 [2.708132] EOE [2.708132] UNK Code: 7a ff ff ff 0f 1f 00 e8 93 80 00 00 eb a5 48 39 ca 0f 84 8d 00 00 00 45 85 ff 0f 1f 44 00 00 74 06 41 f6 c7 03 74 55 48 8d 42 08 48 8b 32 48 c7 c7 f4 36 81 81 4c 89 45 b8 48 89 4d c0 41 ff c7 [2.708132] RIP [81005bbc] show_stack_log_lvl+0x11c/0x1d0 [2.708132] RSP 88007c405e58 [2.708132] CR2: 7fff99e51100 [2.708132] ---[ end trace 749cd02c31c493a0 ]--- [2.708132] note: vmmouse_detect[959] exited with
Re: [PATCH v2 9/9] arm64: KVM: vgic: deal with GIC sub-page alignment
On 25 June 2014 20:34, Joel Schopp joel.sch...@amd.com wrote:
> It doesn't work for me. Maybe I'm doing something wrong, but I can't see what. I am unique in that I'm running a gic-400 (gicv2m) on aarch64 hardware with 64k pages. I'm also unique in that my hardware maps each 4K gic entry to a 64K page (aliasing each 4k of gic 16 times in a 64K page, ie the gic virtual ic is at 0xe114 and 0xe1141000 and 0xe1142000, etc). This is inline with appendix F of the server base system architecture. This is inconvenient when the size is 0x2000 (8K). As a result all the offsets in the device tree entries are to the last 4K in the page so that an 8K read will read the last 4k from one page and the first 4k from the next and actually get 8k of the gic.
>
> gic: interrupt-controller@e1101000 {
>     compatible = "arm,gic-400";
>     #interrupt-cells = <3>;
>     #address-cells = <0>;
>     interrupt-controller;
>     msi-controller;
>     reg = <0x0 0xe111 0 0x1000>, /* gic dist */
>           <0x0 0xe112f000 0 0x2000>, /* gic cpu */
>           <0x0 0xe114f000 0 0x2000>, /* gic virtual ic*/
>           <0x0 0xe116f000 0 0x2000>, /* gic virtual cpu*/
>           <0x0 0xe118 0 0x1000>; /* gic msi */

Right, this is the oddball case we don't yet support for 64K pages (though as you say it is a permitted configuration per the SBSA).

>     interrupts = <1 8 0xf04>;
> };
>
> My concern here is that if userspace is going to look at 8k starting at the beginning of the page, guest offset 0 in your terminology, (say 0xe114) instead of starting at the last 4k of the page, offset 0xf000 (say 0xe114f000) it is going to get the second 4k wrong by reading 0xe1141000 instead of 0xe115.

Userspace doesn't actually look at anything in the GICC. It just asks the kernel to put the guest GICC (ie the mapping of the host GICV) at a particular base address which happens to be a multiple of 64K. In this case if the host kernel is using 64K pages then the KVM kernel code ought to say "sorry, can't do that" when we tell it the base address. (That is, it's impossible to give the guest a VM where the GICC it sees is at a 64K boundary on your hardware and host kernel config, and hopefully we report that in a not totally opaque fashion.)

If you hack QEMU's memory map for the virt board so instead of

    [VIRT_GIC_CPU] = { 0x801, 0x1 },

we have

    [VIRT_GIC_CPU] = { 0x801f000, 0x2000 },

does it work? If QEMU supported this VGIC_GRP_ADDR_OFFSET query then all it would do would be to change that offset and size. It would be good to know if there are other problems beyond that...

(Conveniently, Linux guests won't currently try to look at the second 4K page of their GICC...)

thanks
-- PMM
Re: [PATCH 0/3] Prepare for in-kernel VFIO DMA operations acceleration
On 06.06.14 02:20, Alexey Kardashevskiy wrote:
> On 06/05/2014 09:57 PM, Alexander Graf wrote:
>> On 05.06.14 09:25, Alexey Kardashevskiy wrote:
>>> This reserves 2 capability numbers. This implements an extended version of KVM_CREATE_SPAPR_TCE_64 ioctl. Please advise how to proceed with these patches as I suspect that first two should go via Paolo's tree while the last one via Alex Graf's tree (correct?).
>>
>> They would just go via my tree, but only be actually allocated (read: mergable to qemu) when they hit Paolo's tree. In fact, I don't think it makes sense to split them off at all.
>
> So? Are these patches going anywhere? Thanks.

So? Are you going to address the comments?

Alex
Re: [PATCH v2 9/9] arm64: KVM: vgic: deal with GIC sub-page alignment
On 06/25/2014 03:45 PM, Peter Maydell wrote:
> On 25 June 2014 20:34, Joel Schopp <joel.sch...@amd.com> wrote:
>> It doesn't work for me. Maybe I'm doing something wrong, but I can't see what. I am unique in that I'm running a gic-400 (gicv2m) on aarch64 hardware with 64k pages. I'm also unique in that my hardware maps each 4K gic entry to a 64K page (aliasing each 4k of gic 16 times in a 64K page, ie the gic virtual ic is at 0xe114 and 0xe1141000 and 0xe1142000, etc). This is in line with appendix F of the server base system architecture. This is inconvenient when the size is 0x2000 (8K). As a result all the offsets in the device tree entries are to the last 4K in the page so that an 8K read will read the last 4k from one page and the first 4k from the next and actually get 8k of the gic.
>>
>> gic: interrupt-controller@e1101000 {
>>         compatible = "arm,gic-400";
>>         #interrupt-cells = <3>;
>>         #address-cells = <0>;
>>         interrupt-controller;
>>         msi-controller;
>>         reg = <0x0 0xe111 0 0x1000>,     /* gic dist */
>>               <0x0 0xe112f000 0 0x2000>, /* gic cpu */
>>               <0x0 0xe114f000 0 0x2000>, /* gic virtual ic */
>>               <0x0 0xe116f000 0 0x2000>, /* gic virtual cpu */
>>               <0x0 0xe118 0 0x1000>;     /* gic msi */
>
> Right, this is the oddball case we don't yet support for 64K pages (though as you say it is a permitted configuration per the SBSA).

At least I know I'm not going crazy.

>>         interrupts = <1 8 0xf04>;
>> };
>>
>> My concern here is that if userspace is going to look at 8k starting at the beginning of the page, guest offset 0 in your terminology, (say 0xe114) instead of starting at the last 4k of the page, offset 0xf000 (say 0xe114f000) it is going to get the second 4k wrong by reading 0xe1141000 instead of 0xe115.
>
> Userspace doesn't actually look at anything in the GICC. It just asks the kernel to put the guest GICC (ie the mapping of the host GICV) at a particular base address which happens to be a multiple of 64K. In this case if the host kernel is using 64K pages then the KVM kernel code ought to say "sorry, can't do that" when we tell it the base address. (That is, it's impossible to give the guest a VM where the GICC it sees is at a 64K boundary on your hardware and host kernel config, and hopefully we report that in a not totally opaque fashion.)

The errors I'm seeing look like:

from qemu:
error: kvm run failed Bad address
Aborted (core dumped)

from kvm:
[ 7931.722965] kvm [1208]: Unsupported fault status: EC=0x20 DFCS=0x14

from kvmtool:
from lkvm (kvmtool):
Warning: /extra/rootfs/boot/Image is not a bzImage. Trying to load it as a flat binary...
Info: Loaded kernel to 0x8008 (10212384 bytes)
Info: Placing fdt at 0x8fe0 - 0x8fff
Info: virtio-mmio.devices=0x200@0x1:36
KVM_RUN failed: Bad address

> If you hack QEMU's memory map for the virt board so instead of
> [VIRT_GIC_CPU] = { 0x801, 0x1 },
> we have
> [VIRT_GIC_CPU] = { 0x801f000, 0x2000 },

No change in result, not to say that this wouldn't work if some other unknown problem were fixed.

> does it work? If QEMU supported this VGIC_GRP_ADDR_OFFSET query then all it would do would be to change that offset and size. It would be good to know if there are other problems beyond that...
>
> (Conveniently, Linux guests won't currently try to look at the second 4K page of their GICC...)

That's handy.
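The aliasing arithmetic discussed in this message can be sketched in C. This is only a minimal model, not kernel or QEMU code: it assumes, as described above, that every 64K page aliases a single 4K GIC frame 16 times, and the helper names and the example base address 0x40000000 are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRAME_SIZE 0x1000u   /* one 4K GIC register frame */
#define PAGE_64K   0x10000u  /* host page size; one frame is aliased across it */

/* Index of the hardware frame an address resolves to, assuming every
 * 64K page aliases exactly one 4K frame (hypothetical model). */
static uint64_t gic_frame_index(uint64_t addr)
{
    return addr / PAGE_64K;
}

/* Does an 8K window starting at 'base' cover two distinct frames? */
static bool window_covers_two_frames(uint64_t base)
{
    assert(base % FRAME_SIZE == 0);  /* windows start on a 4K boundary */
    return gic_frame_index(base) != gic_frame_index(base + FRAME_SIZE);
}
```

Under these assumptions an 8K window that starts at page offset 0xf000 reaches into the next frame, while one that starts at offset 0 sees the same frame twice, which is the failure mode raised above.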
Re: [PATCH 0/3] New PAPR hypercall plus individual hypercall enables, v3
On 02.06.14 03:02, Paul Mackerras wrote:
> This patch series adds a way for userspace to control which sPAPR hypercalls get handled by kernel handlers vs. being sent up to userspace, and then adds an implementation of a new hypercall, H_SET_MODE.
>
> This version updates the documentation in api.txt as requested. The series is against the queue branch of the kvm tree. I would like these patches to go into 3.16 if possible.

Thanks, applied to kvm-ppc-queue. I don't think there's a bug fix in here that would warrant them in 3.16 still :).

Alex
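The per-hypercall enable idea in the cover letter can be illustrated with a small user-space model. This is a sketch only and not the series' implementation: the table layout, the bound MAX_HCALL, the assumption that hypercall numbers are multiples of 4, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_HCALL   0x400u               /* hypothetical upper bound */
#define HCALL_SLOTS (MAX_HCALL / 4 + 1)  /* assumes numbers are multiples of 4 */

struct toy_vm {
    uint8_t enabled[HCALL_SLOTS];  /* 1 = handle in kernel, 0 = send to userspace */
};

enum hcall_route { ROUTE_KERNEL, ROUTE_USERSPACE };

/* Userspace-controlled switch for a single hypercall number. */
static void toy_enable_hcall(struct toy_vm *vm, unsigned hcall, bool enable)
{
    assert(hcall % 4 == 0 && hcall <= MAX_HCALL);
    vm->enabled[hcall / 4] = enable ? 1 : 0;
}

/* Decide where a hypercall goes; unknown numbers fall through to userspace. */
static enum hcall_route toy_route_hcall(const struct toy_vm *vm, unsigned hcall)
{
    if (hcall % 4 != 0 || hcall > MAX_HCALL)
        return ROUTE_USERSPACE;
    return vm->enabled[hcall / 4] ? ROUTE_KERNEL : ROUTE_USERSPACE;
}
```

In this model everything defaults to userspace and each hypercall is switched individually, so enabling one number has no effect on its neighbours.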
[PULL 16/19] target-i386: block migration and savevm if invariant tsc is exposed
From: Marcelo Tosatti <mtosa...@redhat.com>

Invariant TSC documentation mentions that "invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states".

This is not the case if migration to a host with different TSC frequency is allowed, or if savevm is performed. So block migration/savevm.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
Reviewed-by: Eduardo Habkost <ehabk...@redhat.com>
Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>
Reviewed-by: Juan Quintela <quint...@redhat.com>
[AF+mtosatti: Updated error message]
Signed-off-by: Andreas Färber <afaer...@suse.de>
---
 target-i386/cpu-qom.h |  2 +-
 target-i386/kvm.c     | 15 +++++++++++++++
 target-i386/machine.c |  2 +-
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
index ff3a5de..71a1b97 100644
--- a/target-i386/cpu-qom.h
+++ b/target-i386/cpu-qom.h
@@ -121,7 +121,7 @@ static inline X86CPU *x86_env_get_cpu(CPUX86State *env)
 #define ENV_OFFSET offsetof(X86CPU, env)

 #ifndef CONFIG_USER_ONLY
-extern const struct VMStateDescription vmstate_x86_cpu;
+extern struct VMStateDescription vmstate_x86_cpu;
 #endif

 /**
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 4bf0ac9..097fe11 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -35,6 +35,8 @@
 #include "exec/ioport.h"
 #include <asm/hyperv.h>
 #include "hw/pci/pci.h"
+#include "migration/migration.h"
+#include "qapi/qmp/qerror.h"

 //#define DEBUG_KVM

@@ -448,6 +450,8 @@ static bool hyperv_enabled(X86CPU *cpu)
             cpu->hyperv_relaxed_timing);
 }

+static Error *invtsc_mig_blocker;
+
 #define KVM_MAX_CPUID_ENTRIES  100

 int kvm_arch_init_vcpu(CPUState *cs)
@@ -705,6 +709,17 @@ int kvm_arch_init_vcpu(CPUState *cs)
                                   !!(c->ecx & CPUID_EXT_SMX);
     }

+    c = cpuid_find_entry(&cpuid_data.cpuid, 0x80000007, 0);
+    if (c && (c->edx & 1<<8) && invtsc_mig_blocker == NULL) {
+        /* for migration */
+        error_setg(&invtsc_mig_blocker,
+                   "State blocked by non-migratable CPU device"
+                   " (invtsc flag)");
+        migrate_add_blocker(invtsc_mig_blocker);
+        /* for savevm */
+        vmstate_x86_cpu.unmigratable = 1;
+    }
+
     cpuid_data.cpuid.padding = 0;
     r = kvm_vcpu_ioctl(cs, KVM_SET_CPUID2, &cpuid_data);
     if (r) {
diff --git a/target-i386/machine.c b/target-i386/machine.c
index b8dcd2f..16d2f6a 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -603,7 +603,7 @@ static const VMStateDescription vmstate_msr_hyperv_time = {
     }
 };

-const VMStateDescription vmstate_x86_cpu = {
+VMStateDescription vmstate_x86_cpu = {
     .name = "cpu",
     .version_id = 12,
     .minimum_version_id = 3,
--
1.8.4.5
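The shape of the check added to kvm_arch_init_vcpu() can be modelled outside QEMU. The sketch below is an assumption-laden toy rather than QEMU code: it takes the EDX value of CPUID leaf 0x80000007 as a plain integer, treats bit 8 as the invariant TSC flag, and uses hypothetical toy_* names in place of the real Error and migration-blocker machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVTSC_BIT (1u << 8)  /* CPUID.80000007H:EDX bit 8, invariant TSC */

static bool edx_has_invtsc(uint32_t edx)
{
    return (edx & INVTSC_BIT) != 0;
}

/* Hypothetical stand-ins for the single, process-wide migration blocker. */
static const char *toy_blocker;   /* NULL until a blocker is installed */
static int toy_blocker_count;     /* how many times one was registered */

/* Called once per vCPU; installs the blocker at most once overall. */
static void toy_init_vcpu(uint32_t edx)
{
    if (edx_has_invtsc(edx) && toy_blocker == NULL) {
        toy_blocker = "State blocked by non-migratable CPU device (invtsc flag)";
        toy_blocker_count++;
    }
    assert(toy_blocker_count <= 1);
}
```

The NULL test mirrors the patch's guard: with several vCPUs all exposing invtsc, only the first one registers the blocker.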
Re: [PATCH 0/3] New PAPR hypercall plus individual hypercall enables, v3
On Wed, Jun 25, 2014 at 11:46:10PM +0200, Alexander Graf wrote:
> On 02.06.14 03:02, Paul Mackerras wrote:
>> This patch series adds a way for userspace to control which sPAPR hypercalls get handled by kernel handlers vs. being sent up to userspace, and then adds an implementation of a new hypercall, H_SET_MODE.
>>
>> This version updates the documentation in api.txt as requested. The series is against the queue branch of the kvm tree. I would like these patches to go into 3.16 if possible.
>
> Thanks, applied to kvm-ppc-queue. I don't think there's a bug fix in here that would warrant them in 3.16 still :).

I agree. It would be good to get a stable assignment of the number for KVM_CAP_PPC_ENABLE_HCALL so we can start getting the qemu patches upstream, though.

Paul.
[PATCH 2/2 v2] ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping
From: Kim Phillips <kim.phill...@linaro.org>

A userspace process can map device MMIO memory via VFIO or /dev/mem, e.g., for platform device passthrough support in QEMU.

During early development, we found the PAGE_S2 memory type being used for MMIO mappings. This patch corrects that by using the more strongly ordered memory type for device MMIO mappings: PAGE_S2_DEVICE.

Signed-off-by: Kim Phillips <kim.phill...@linaro.org>
Acked-by: Christoffer Dall <christoffer.d...@linaro.org>
---
Hi, here's a v2, upon request:
- rebased onto today's mainline ToT
- mmu.o-build tested only (ToT build doesn't complete)
- made commit text less terse
- added Christoffer's ack

Cheers,

Kim

 arch/arm/kvm/mmu.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 16f8049..69af021 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -748,6 +748,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
 	struct vm_area_struct *vma;
 	pfn_t pfn;
+	pgprot_t mem_type = PAGE_S2;

 	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
 	if (fault_status == FSC_PERM && !write_fault) {
@@ -798,6 +799,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (is_error_pfn(pfn))
 		return -EFAULT;

+	if (kvm_is_mmio_pfn(pfn))
+		mem_type = PAGE_S2_DEVICE;
+
 	spin_lock(&kvm->mmu_lock);
 	if (mmu_notifier_retry(kvm, mmu_seq))
 		goto out_unlock;
@@ -805,7 +809,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);

 	if (hugetlb) {
-		pmd_t new_pmd = pfn_pmd(pfn, PAGE_S2);
+		pmd_t new_pmd = pfn_pmd(pfn, mem_type);
 		new_pmd = pmd_mkhuge(new_pmd);
 		if (writable) {
 			kvm_set_s2pmd_writable(&new_pmd);
@@ -814,13 +818,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE);
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
 	} else {
-		pte_t new_pte = pfn_pte(pfn, PAGE_S2);
+		pte_t new_pte = pfn_pte(pfn, mem_type);
 		if (writable) {
 			kvm_set_s2pte_writable(&new_pte);
 			kvm_set_pfn_dirty(pfn);
 		}
 		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE);
-		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
+		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte,
+				     mem_type == PAGE_S2_DEVICE);
 	}

--
2.0.0
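The decision the patch adds to user_mem_abort() boils down to one selection followed by one derived flag. The following is a stand-alone sketch of that logic only, with hypothetical toy_* types standing in for pgprot_t and the stage-2 helpers; it assumes the last argument of stage2_set_pte() is a boolean that is now true exactly for device mappings.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PAGE_S2 and PAGE_S2_DEVICE. */
enum toy_mem_type { TOY_S2_NORMAL, TOY_S2_DEVICE };

struct toy_mapping {
    enum toy_mem_type type;  /* memory type used to build the pte/pmd */
    bool device_flag;        /* value passed as the final boolean argument */
    bool writable;
};

/* Pick the stage-2 memory type the way the patch does. */
static struct toy_mapping toy_user_mem_abort(bool pfn_is_mmio, bool writable)
{
    enum toy_mem_type mem_type = TOY_S2_NORMAL;  /* default, as before the patch */
    struct toy_mapping m;

    if (pfn_is_mmio)
        mem_type = TOY_S2_DEVICE;  /* more strongly ordered type for MMIO */

    m.type = mem_type;
    m.device_flag = (mem_type == TOY_S2_DEVICE);
    m.writable = writable;
    assert(m.device_flag == pfn_is_mmio);
    return m;
}
```

Keeping the type in one local variable means the huge-page and small-page paths cannot disagree about it, which seems to be the point of introducing mem_type.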
Re: [PATCH] driver core: platform: add device binding path 'driver_override'
On Mon, 2 Jun 2014 21:28:42 -0700 Greg KH <gre...@linuxfoundation.org> wrote:
> On Mon, Jun 02, 2014 at 07:42:58PM -0500, Kim Phillips wrote:
>> You are the platform driver core maintainer: can you apply this to your driver-core tree now?
>
> Yes, I will after this merge window ends, it's too late for 3.16-rc1 with the window opening up a week early, sorry.

How about now? fwiw, I just checked: it still applies cleanly.

Thanks,

Kim
Re: [PATCH] Bugfix: powerpc/eeh: Create eeh sysfs entry in post_init()
On Wed, Jun 25, 2014 at 03:33:12PM +1000, Gavin Shan wrote:
> On Tue, Jun 24, 2014 at 11:32:07PM -0400, Mike Qiu wrote:
>
> [ cc Richard ]
>
>> Eeh sysfs entry created must be after EEH_ENABLED been set in eeh_subsystem_flags. In PowerNV platform, it try to create sysfs entry before EEH_ENABLED been set, when boot up. So nothing will be created for eeh in sysfs.
>
> Could you please make the commit log more clear? :-)
>
> I guess the issue is introduced by commit 2213fb1 ("powerpc/eeh: Skip eeh sysfs when eeh is disabled"). The commit checks EEH is enabled while creating PCI device EEH sysfs files. If not, the sysfs files won't be created. That's to avoid warning reported during PCI hotplug.
>
> The problem you're reporting (if I understand completely): You don't see the sysfs files after the system boots up. If it's the case, you probably need following changes in arch/powerpc/platforms/powernv/pci.c::pnv_pci_ioda_fixup(). Could you have a try with it?
>
>  #ifdef CONFIG_EEH
>  	eeh_probe_mode_set(EEH_PROBE_MODE_DEV);
> -	eeh_addr_cache_build();
>  	eeh_init();
> +	eeh_addr_cache_build();
>  #endif

I think this is a more proper fix.

BTW, I have one confusion in this mode set.

eeh_init() -> eeh_ops->dev_probe() -> powernv_eeh_dev_probe() -> eeh_set_enable(true) -> here the eeh is marked enabled

We can see this flag would be set for each pci_dev. So is it possible to make this set only once?

> Eventually PowerNV/pSeries have same function call sequence:
>
> - Set EEH probe mode
> - Doing probe (with device node or PCI device)
> - Build address cache.
>
>> Signed-off-by: Mike Qiu <qiud...@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/powernv/eeh-ioda.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
>> index 8ad0c5b..5f95581 100644
>> --- a/arch/powerpc/platforms/powernv/eeh-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
>> @@ -136,6 +136,9 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
>>  	struct pnv_phb *phb = hose->private_data;
>>  	int ret;
>>
>> +	/* Creat sysfs after EEH_ENABLED been set */
>> +	eeh_add_sysfs_files(hose->bus);
>> +
>>  	/* Register OPAL event notifier */
>>  	if (!ioda_eeh_nb_init) {
>>  		ret = opal_notifier_register(&ioda_eeh_nb);
>
> Thanks,
> Gavin

--
Richard Yang
Help you, Help me

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Bugfix: powerpc/eeh: Create eeh sysfs entry in post_init()
On Wed, Jun 25, 2014 at 02:23:53PM +0800, Wei Yang wrote:
> On Wed, Jun 25, 2014 at 03:33:12PM +1000, Gavin Shan wrote:
>> On Tue, Jun 24, 2014 at 11:32:07PM -0400, Mike Qiu wrote:
>>
>> [ cc Richard ]
>>
>>> Eeh sysfs entry created must be after EEH_ENABLED been set in eeh_subsystem_flags. In PowerNV platform, it try to create sysfs entry before EEH_ENABLED been set, when boot up. So nothing will be created for eeh in sysfs.
>>
>> Could you please make the commit log more clear? :-)
>>
>> I guess the issue is introduced by commit 2213fb1 ("powerpc/eeh: Skip eeh sysfs when eeh is disabled"). The commit checks EEH is enabled while creating PCI device EEH sysfs files. If not, the sysfs files won't be created. That's to avoid warning reported during PCI hotplug.
>>
>> The problem you're reporting (if I understand completely): You don't see the sysfs files after the system boots up. If it's the case, you probably need following changes in arch/powerpc/platforms/powernv/pci.c::pnv_pci_ioda_fixup(). Could you have a try with it?
>>
>>  #ifdef CONFIG_EEH
>>  	eeh_probe_mode_set(EEH_PROBE_MODE_DEV);
>> -	eeh_addr_cache_build();
>>  	eeh_init();
>> +	eeh_addr_cache_build();
>>  #endif
>
> I think this is a more proper fix.
>
> BTW, I have one confusion in this mode set.
>
> eeh_init() -> eeh_ops->dev_probe() -> powernv_eeh_dev_probe() -> eeh_set_enable(true) -> here the eeh is marked enabled
>
> We can see this flag would be set for each pci_dev. So is it possible to make this set only once?

It shouldn't be a problem because there might not have PCI devices supporting EEH in the guest. All PCI devices are emulated.

>> Eventually PowerNV/pSeries have same function call sequence:
>>
>> - Set EEH probe mode
>> - Doing probe (with device node or PCI device)
>> - Build address cache.
>>
>>> Signed-off-by: Mike Qiu <qiud...@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/platforms/powernv/eeh-ioda.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c
>>> index 8ad0c5b..5f95581 100644
>>> --- a/arch/powerpc/platforms/powernv/eeh-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c
>>> @@ -136,6 +136,9 @@ static int ioda_eeh_post_init(struct pci_controller *hose)
>>>  	struct pnv_phb *phb = hose->private_data;
>>>  	int ret;
>>>
>>> +	/* Creat sysfs after EEH_ENABLED been set */
>>> +	eeh_add_sysfs_files(hose->bus);
>>> +
>>>  	/* Register OPAL event notifier */
>>>  	if (!ioda_eeh_nb_init) {
>>>  		ret = opal_notifier_register(&ioda_eeh_nb);

Thanks,
Gavin
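The ordering problem discussed in this thread can be shown with a toy model. It is a sketch under stated assumptions, not the powerpc code: it assumes, as the proposed reordering implies, that the per-device sysfs files are created during the address-cache build and are skipped while the enabled flag is still clear. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static bool toy_eeh_enabled;  /* stands in for the EEH_ENABLED flag */
static int toy_sysfs_files;   /* number of per-device sysfs entries created */

/* Probing marks EEH as enabled. */
static void toy_eeh_init(void)
{
    toy_eeh_enabled = true;
}

/* Creates one sysfs entry per device, but only if EEH is already enabled. */
static void toy_addr_cache_build(int ndevs)
{
    assert(ndevs >= 0);
    for (int i = 0; i < ndevs; i++) {
        if (!toy_eeh_enabled)
            continue;  /* the "skip sysfs when disabled" check */
        toy_sysfs_files++;
    }
}

/* Run the two steps in either order and report how many files exist. */
static int toy_boot(bool init_before_cache_build, int ndevs)
{
    toy_eeh_enabled = false;
    toy_sysfs_files = 0;
    if (init_before_cache_build) {
        toy_eeh_init();
        toy_addr_cache_build(ndevs);
    } else {
        toy_addr_cache_build(ndevs);
        toy_eeh_init();
    }
    return toy_sysfs_files;
}
```

With the cache build first, the model creates no entries at all; with eeh_init() first it creates one per device, which matches the symptom and the suggested reordering.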
Re: [PATCH V2] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
On 15.06.14 20:47, Aneesh Kumar K.V wrote:
> With guests supporting Multiple page size per segment (MPSS), hpte_page_size returns the actual page size used. Add a new function to return base page size and use that to compare against the page size calculated from SLB. Without this patch a hpte lookup can fail since we are comparing wrong page size in kvmppc_hv_find_lock_hpte.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>

Thanks, applied to for-3.16.

Alex
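Why the comparison has to use the base page size can be seen in a small model. This is an illustrative sketch, not the kvmppc code: the struct, the function names and the example of a 64K base page backed by a 16M actual page are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one hashed page table entry. */
struct toy_hpte {
    unsigned base_shift;    /* log2 of the segment's base page size */
    unsigned actual_shift;  /* log2 of the page size actually mapped */
};

/* Comparison against the actual page size: wrong under MPSS. */
static bool match_by_actual(const struct toy_hpte *h, unsigned slb_shift)
{
    return h->actual_shift == slb_shift;
}

/* Comparison against the base page size, which is what the SLB encodes. */
static bool match_by_base(const struct toy_hpte *h, unsigned slb_shift)
{
    assert(h->base_shift <= h->actual_shift);
    return h->base_shift == slb_shift;
}
```

For an entry with a 64K base and a 16M actual page, an SLB-derived size of 64K matches only the base-size comparison; when base and actual sizes are equal, both comparisons agree, so the difference only shows up with MPSS.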
Re: [PATCH v3 -next 0/9] CMA: generalize CMA reserved area management code
Hello,

On 2014-06-18 22:51, Andrew Morton wrote:
> On Tue, 17 Jun 2014 10:25:07 +0900 Joonsoo Kim <iamjoonsoo@lge.com> wrote:
>>>> v2:
>>>> - Although this patchset looks very different with v1, the end result, that is, mm/cma.c is same with v1's one. So I carry Ack to patch 6-7.
>>>>
>>>> This patchset is based on linux-next 20140610.
>>>
>>> Thanks for taking care of this. I will test it with my setup and if everything goes well, I will take it to my -next tree. If any branch is required for anyone to continue his works on top of those patches, let me know, I will also prepare it.
>>
>> Hello,
>>
>> I'm glad to hear that. :) But, there is one concern. As you already know, I am preparing further patches (Aggressively allocate the pages on CMA reserved memory). It may be highly related to MM branch and also slightly depends on this CMA changes. In this case, what is the best strategy to merge this patchset? IMHO, Andrew's tree is more appropriate branch. If there is no issue in this case, I am willing to develop further patches based on your tree.
>
> That's probably easier. Marek, I'll merge these into -mm (and hence -next and git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git) and shall hold them pending your review/ack/test/etc, OK?

Ok. I've tested them and they work fine. I'm sorry that you had to wait for me for a few days. You can now add:

Acked-and-tested-by: Marek Szyprowski <m.szyprow...@samsung.com>

I've also rebased my pending patches onto this set (I will send them soon).

The question is now if you want to keep the discussed patches in your -mm tree, or should I take them to my -next branch. If you like to keep them, I assume you will also take the patches which depend on the discussed changes.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Re: [PATCH v3 -next 0/9] CMA: generalize CMA reserved area management code
On Wed, 25 Jun 2014 14:33:56 +0200 Marek Szyprowski <m.szyprow...@samsung.com> wrote:
>> That's probably easier. Marek, I'll merge these into -mm (and hence -next and git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git) and shall hold them pending your review/ack/test/etc, OK?
>
> Ok. I've tested them and they work fine. I'm sorry that you had to wait for me for a few days. You can now add:
>
> Acked-and-tested-by: Marek Szyprowski <m.szyprow...@samsung.com>

Thanks.

> I've also rebased my pending patches onto this set (I will send them soon).
>
> The question is now if you want to keep the discussed patches in your -mm tree, or should I take them to my -next branch. If you like to keep them, I assume you will also take the patches which depend on the discussed changes.

Yup, that works.
Re: [PATCH 0/3] Prepare for in-kernel VFIO DMA operations acceleration
On 06.06.14 02:20, Alexey Kardashevskiy wrote:
> On 06/05/2014 09:57 PM, Alexander Graf wrote:
>> On 05.06.14 09:25, Alexey Kardashevskiy wrote:
>>> This reserves 2 capability numbers.
>>>
>>> This implements an extended version of KVM_CREATE_SPAPR_TCE_64 ioctl.
>>>
>>> Please advise how to proceed with these patches as I suspect that first two should go via Paolo's tree while the last one via Alex Graf's tree (correct?).
>>
>> They would just go via my tree, but only be actually allocated (read: mergable to qemu) when they hit Paolo's tree.
>>
>> In fact, I don't think it makes sense to split them off at all.
>
> So? Are these patches going anywhere? Thanks.

So? Are you going to address the comments?

Alex
Re: [PATCH 0/3] Prepare for in-kernel VFIO DMA operations acceleration
On 06/26/2014 07:12 AM, Alexander Graf wrote:
> On 06.06.14 02:20, Alexey Kardashevskiy wrote:
>> On 06/05/2014 09:57 PM, Alexander Graf wrote:
>>> On 05.06.14 09:25, Alexey Kardashevskiy wrote:
>>>> This reserves 2 capability numbers.
>>>>
>>>> This implements an extended version of KVM_CREATE_SPAPR_TCE_64 ioctl.
>>>>
>>>> Please advise how to proceed with these patches as I suspect that first two should go via Paolo's tree while the last one via Alex Graf's tree (correct?).
>>>
>>> They would just go via my tree, but only be actually allocated (read: mergable to qemu) when they hit Paolo's tree.
>>>
>>> In fact, I don't think it makes sense to split them off at all.
>>
>> So? Are these patches going anywhere? Thanks.
>
> So? Are you going to address the comments?

Sorry, I cannot find here anything to fix. Ben asked some questions, I answered and there were no objections. What do I miss this time?...

--
Alexey
Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection
Gavin Shan <gws...@linux.vnet.ibm.com> writes:
> On Mon, Jun 23, 2014 at 04:36:44PM +1000, Michael Neuling wrote:
>> On Mon, 2014-06-23 at 12:14 +1000, Gavin Shan wrote:
>>> The patch implements one OPAL firmware sysfs file to support PCI error injection: /sys/firmware/opal/errinjct, which will be used like the way described as follows.
>>>
>>> According to PAPR spec, there are 3 RTAS calls related to error injection:
>>>
>>> ibm,open-errinjct: allocate token prior to doing error injection.
>>> ibm,close-errinjct: release the token allocated from ibm,open-errinjct.
>>> ibm,errinjct: do error injection.
>>>
>>> Sysfs file /sys/firmware/opal/errinjct accepts strings that have fixed format ei_token For now, we only support 32-bits and 64-bits PCI error injection and they should have following strings written to /sys/firmware/opal/errinjct as follows. We don't have corresponding sysfs files for ibm,open-errinjct and ibm,close-errinjct, which means that we rely on userland to maintain the token by itself.
>>
>> This sounds cool.
>>
>> Can you document the sysfs interface in Documentation/powerpc?
>
> Yeah, Documentation/powerpc/eeh-pci-error-recovery.txt needs update as Ben suggested. It's something in my list :-)

It should probably also/instead be in Documentation/ABI/(testing|stable)/sysfs-firmware-opal-errinjct as this seems to be where sysfs bits get documented.

Also, considering that we're specifically looking at PCI error injection, should the sysfs name be /sys/firmware/opal/pci-error-inject instead?
Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection
Gavin Shan <gws...@linux.vnet.ibm.com> writes:
> +static struct kobj_attribute errinjct_attr =
> +	__ATTR(errinjct, 0600, NULL, errinjct_store);

May also be good to have a read method that either lists current injected errors? I guess it depends on if they're one time errors or persistent errors too.