buildbot failure in kvm on next-i386
The Buildbot has detected a new failure on builder next-i386 while building kvm.
Full details are available at: http://buildbot.b1-systems.de/kvm/builders/next-i386/builds/1260

Buildbot URL: http://buildbot.b1-systems.de/kvm/
Buildslave for this Build: b1_kvm_1
Build Reason: The Nightly scheduler named 'nightly_next' triggered this build
Build Source Stamp: [branch next] HEAD
Blamelist:

BUILD FAILED: failed compile

sincerely,
-The Buildbot
Re: Using virtio for inter-VM communication
On 2014-06-13 02:47, Rusty Russell wrote:
Jan Kiszka jan.kis...@siemens.com writes:
On 2014-06-12 04:27, Rusty Russell wrote:
Henning Schild henning.sch...@siemens.com writes:

It was also never implemented, and remains a thought experiment. However, implementing it in lguest should be fairly easy.

The reason why a trusted helper, i.e. additional logic in the hypervisor, is not our favorite solution is that we'd like to keep the hypervisor as small as possible. I wouldn't exclude such an approach categorically, but we have to weigh the costs (lines of code, additional hypervisor interface) carefully against the gain (existing specifications and guest driver infrastructure).

Reasonable, but I think you'll find it is about the minimal implementation in practice. Unfortunately, I don't have time during the next 6 months to implement it myself :(

Back to VIRTIO_F_RING_SHMEM_ADDR (which you once brought up in an MCA working group discussion): What speaks against introducing an alternative encoding of addresses inside virtio data structures? The idea of this flag was to replace guest-physical addresses with offsets into a shared memory region associated with or part of a virtio device.

We would also need a way of defining the shared memory region. But that's not the problem. What if such a feature is not accepted by the guest? How do you fall back?

Depends on the hypervisor and its scope, but it should be quite straightforward: full-featured ones like KVM could fall back to slow copying, specialized ones like Jailhouse would clear FEATURES_OK if the guest driver does not accept it (because there would be no ring walking or copying code in Jailhouse), thus refusing to activate the device. That would be absolutely fine for application domains of specialized hypervisors (often embedded, customized guests etc.).
The shared memory regions could be exposed as BARs (PCI) or additional address ranges (device tree) and addressed in the redefined guest address fields via some region index and offset.

We don't add features which unmake the standard.

That would preserve zero-copy capabilities (as long as you can work against the shared mem directly, e.g. doing DMA from a physical NIC or storage device into it) and keep the hypervisor out of the loop.

This seems ill thought out. How will you program a NIC via the virtio protocol without a hypervisor? And how will you make it safe? You'll need an IOMMU. But if you have an IOMMU you don't need shared memory.

Scenarios behind this are things like driver VMs: You pass through the physical hardware to a driver guest that talks to the hardware and relays data via one or more virtual channels to other VMs. This confines a certain set of security and stability risks to the driver VM.

Is it too invasive to existing infrastructure or does it have some other pitfalls?

You'll have to convince every vendor to implement your addition to the standard. Which is easier than inventing a completely new system, but it's not quite virtio.

It would be an optional addition, a feature all three sides (host and the communicating guests) would have to agree on. I think we would only have to agree on extending the spec to enable this - after demonstrating it via an implementation, of course.

Thanks,
Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
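A minimal sketch of the "region index and offset" addressing discussed in this message, assuming a hypothetical layout in which the top 8 bits of a 64-bit address field select a shared memory region and the remaining 56 bits are an offset into it. Neither this split nor the names demo_encode and demo_resolve come from the virtio specification or from any patch in this thread; they only illustrate how a receiver might translate such a field, with bounds checking, instead of treating it as a guest-physical address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical split of a 64-bit address field: 8-bit region, 56-bit offset. */
#define DEMO_REGION_BITS 8
#define DEMO_OFFSET_BITS (64 - DEMO_REGION_BITS)
#define DEMO_OFFSET_MASK ((UINT64_C(1) << DEMO_OFFSET_BITS) - 1)

/* One shared memory region as mapped by the receiving side. */
struct demo_region {
	uint8_t *base;
	uint64_t len;
};

/* Pack a region index and an offset into one address field. */
static uint64_t demo_encode(unsigned int region, uint64_t offset)
{
	return ((uint64_t)region << DEMO_OFFSET_BITS) |
	       (offset & DEMO_OFFSET_MASK);
}

/*
 * Translate an encoded address into a local pointer. Returns NULL if the
 * region index is unknown or [offset, offset + len) leaves the region.
 */
static void *demo_resolve(const struct demo_region *regions, size_t nregions,
			  uint64_t addr, uint64_t len)
{
	uint64_t idx = addr >> DEMO_OFFSET_BITS;
	uint64_t off = addr & DEMO_OFFSET_MASK;

	if (idx >= nregions)
		return NULL;
	if (off > regions[idx].len || len > regions[idx].len - off)
		return NULL;
	return regions[idx].base + off;
}
```

Rejecting anything outside the region is what keeps a peer from pointing the receiver at memory that was never shared.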
[PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
With guests supporting multiple page sizes per segment (MPSS), hpte_page_size returns the actual page size used. Add a new function to return the base page size and use that to compare against the page size calculated from the SLB.

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 19 +--
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |  2 +-
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 34422be566ce..3d0f3fb9c6b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -202,8 +202,10 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	return rb;
 }
 
-static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l,
+					     bool is_base_size)
 {
+
 	int size, a_psize;
 	/* Look at the 8 bit LP value */
 	unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
@@ -218,14 +220,27 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
 				continue;
 
 			a_psize = __hpte_actual_psize(lp, size);
-			if (a_psize != -1)
+			if (a_psize != -1) {
+				if (is_base_size)
+					return 1ul << mmu_psize_defs[size].shift;
 				return 1ul << mmu_psize_defs[a_psize].shift;
+			}
 		}
 
 	}
 	return 0;
 }
 
+static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+{
+	return __hpte_page_size(h, l, 0);
+}
+
+static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long l)
+{
+	return __hpte_page_size(h, l, 1);
+}
+
 static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
 {
 	return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index f53cf2eae36a..7ff45ed27c65 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1567,7 +1567,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 				goto out;
 			}
 			if (!rma_setup && is_vrma_hpte(v)) {
-				unsigned long psize = hpte_page_size(v, r);
+				unsigned long psize = hpte_base_page_size(v, r);
 				unsigned long senc = slb_pgsize_encoding(psize);
 				unsigned long lpcr;
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 87624ab5ba82..c6aca75b8376 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -839,7 +839,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 			 * to check against the actual page size.
 			 */
 			if ((v & valid) && (v & mask) == val &&
-			    hpte_page_size(v, r) == (1ul << pshift))
+			    hpte_base_page_size(v, r) == (1ul << pshift))
 				/* Return with the HPTE still locked */
 				return (hash << 3) + (i >> 1);
 
--
1.9.1
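A small stand-alone sketch of the distinction the patch description draws, assuming a simplified model in which each entry simply records two shifts. The struct demo_hpte and both helpers are hypothetical and do not decode real HPTE bits; they only show why a check against the page size derived from the SLB would want the base size: with MPSS an entry in a 64K segment may be backed by a 16M page, so comparing the actual size would reject it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a hashed page table entry. */
struct demo_hpte {
	unsigned int base_shift;	/* page size of the segment, log2 */
	unsigned int actual_shift;	/* page size really backing it, log2 */
};

/* Mirrors the shape of __hpte_page_size(): one helper, two answers. */
static unsigned long demo_page_size(const struct demo_hpte *h, bool want_base)
{
	return 1ul << (want_base ? h->base_shift : h->actual_shift);
}

/* The segment (SLB) only knows the base size, so compare against that. */
static bool demo_matches_segment(const struct demo_hpte *h,
				 unsigned int slb_pshift)
{
	return demo_page_size(h, true) == (1ul << slb_pshift);
}
```

For an entry with a 64K base and a 16M actual size, demo_matches_segment() accepts a 64K segment, whereas a comparison on the actual size would not.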
Re: Using virtio for inter-VM communication
On 13/06/2014 08:23, Jan Kiszka wrote:

That would preserve zero-copy capabilities (as long as you can work against the shared mem directly, e.g. doing DMA from a physical NIC or storage device into it) and keep the hypervisor out of the loop.

This seems ill thought out. How will you program a NIC via the virtio protocol without a hypervisor? And how will you make it safe? You'll need an IOMMU. But if you have an IOMMU you don't need shared memory.

Scenarios behind this are things like driver VMs: You pass through the physical hardware to a driver guest that talks to the hardware and relays data via one or more virtual channels to other VMs. This confines a certain set of security and stability risks to the driver VM.

I think implementing Xen hypercalls in jailhouse for grant table and event channels would actually make a lot of sense. The Xen implementation is 2.5kLOC and I think it should be possible to compact it noticeably, especially if you limit yourself to 64-bit guests.

It should also be almost enough to run Xen PVH guests as jailhouse partitions.

If later Xen starts to support virtio, you will get that for free.

Paolo
Re: [Qemu-devel] Why I advise against using ivshmem
Some dropped quoted text restored.

Vincent JARDIN vincent.jar...@6wind.com writes:

Markus, see inline (I am not on all mailing list, please, keep the cc list).

Sure!

The reasons for my dislike range from practical to philosophical. My practical concerns include:

1. ivshmem code needs work, but has no maintainer

See David's contributions: http://patchwork.ozlabs.org/patch/358750/

We're grateful for David's patch for qemu-char.c, but this isn't ivshmem maintenance, yet.

- Error handling is generally poor. For instance, device_add ivshmem kills your guest instantly.
- More subjectively, I don't trust the code to be robust against abuse by our own guest, or the other guests sharing the memory. Convincing me would take a code audit.
- MAINTAINERS doesn't cover ivshmem.c.
- The last non-trivial commit that isn't obviously part of some tree-wide infrastructure or cleanup work is from September 2012 (commit c08ba66).

2. There is no libvirt support

One can use qemu without libvirt.

You asked me for my reasons for disliking ivshmem. This is one. Sure, I can drink my water through a straw while standing on one foot, but that doesn't mean I have to like it. And me not liking it doesn't mean the next guy shouldn't like it. To each their own.

3. Out-of-tree server program required for full functionality

Interrupts require a shared memory server running in the host (see docs/specs/ivshmem_device_spec.txt). It doesn't tell where to find one. The initial commit 6cbf4c8 points to www.gitorious.org/nahanni. That repository's last commit is from September 2012. He's dead, Jim. ivshmem_device_spec.txt is silent on what the server is supposed to do.

We have the source code, it provides the documentation to write our own better server program.

Good for you. Not good enough for the QEMU community. QEMU features relying on out-of-tree software to be useful are fine, as long as said out-of-tree software is readily available to QEMU developers and users.
Free software with a community around it and packaged in major distros qualifies. If you haven't got that, talk to us to find out whether what you've got qualifies, and if not, what you'd have to do to make it qualify.

Back when we accepted ivshmem, the out-of-tree parts it needs were well below the "community and packaged" bar. But folks interested in it talked to us, and the fact that it's in shows that QEMU maintainers decided what they had then was enough. Unfortunately, we now have considerably less: Nahanni appears to be dead.

An apparently dead git repository you can study is not enough. The fact that you hold an improved reimplementation privately is immaterial. So is the (plausible) claim that others could also create a reimplementation.

If this server requires privileges: I don't trust it without an audit.

4. Out-of-tree kernel uio driver required

No, it is optional.

Good to know. Would you be willing to send a patch to ivshmem_device_spec.txt clarifying that?

The device is intended to be used with the provided UIO driver (ivshmem_device_spec.txt again). As far as I can tell, the provided UIO driver is the one in the dead Nahanni repo. By now, you should be expecting this: I don't trust that one either.

These concerns are all fixable, but it'll take serious work, and time. Something like:

* Find a maintainer for the device model

I guess we can find one in the DPDK.org community.

* Review and fix its code
* Get the required kernel module upstream

Which module? uio? It is not required.

* Get all the required parts outside QEMU packaged in major distros, or absorbed into QEMU

Red Hat did disable it. Why? It is there in QEMU.

Up to now, I've been wearing my QEMU hat. Let me exchange it for my Red one for a bit. We (Red Hat) don't just package and ship metric tons of random free software. We package and ship useful free software we can support for many, many years.
Sometimes, we find that we have to focus serious development resources on making something useful supportable (Paolo mentioned qcow2). We obviously can't focus on everything, though.

Anyway, ivshmem didn't make the cut for RHEL-7.0. Sorry if that inconveniences you. To get it into RHEL, you need to show it's both useful and supportable. Building a community around it would go a long way towards that.

If you want to discuss this in more detail with us, you may want to try communication channels provided by your RHEL subscription in addition to the QEMU development mailing list. Don't be shy, you're paying for it!

As always, I'm speaking for myself, not my employer.

Okay, wearing my QEMU hat again.

In short, create a viable community around ivshmem, either within the QEMU community, or separately but cooperating.

At least, DPDK.org community is a community using it.

Using something isn't the same as maintaining something. But it's a necessary
Re: [Qemu-devel] Why I advise against using ivshmem
(+merging with Paolo's email because of overlaps)

see inline (I am not on all mailing list, please, keep the cc list).

1. ivshmem code needs work, but has no maintainer

See David's contributions: http://patchwork.ozlabs.org/patch/358750/

We're grateful for David's patch for qemu-char.c, but this isn't ivshmem maintenance, yet.

Others can come (doc), see below.

2. There is no libvirt support

One can use qemu without libvirt.

You asked me for my reasons for disliking ivshmem. This is one. Sure, I can drink my water through a straw while standing on one foot, but that doesn't mean I have to like it. And me not liking it doesn't mean the next guy shouldn't like it. To each their own.

I like using qemu without libvirt; libvirt is not part of qemu. Let's avoid trolling about it ;)

Back when we accepted ivshmem, the out-of-tree parts it needs were well below the "community and packaged" bar. But folks interested in it talked to us, and the fact that it's in shows that QEMU maintainers decided what they had then was enough. Unfortunately, we now have considerably less: Nahanni appears to be dead.

Agreed, and too bad it is dead. We should leave Nahanni dead since ivshmem is a QEMU topic now, see below. Does it make sense?

An apparently dead git repository you can study is not enough. The fact that you hold an improved reimplementation privately is immaterial. So is the (plausible) claim that others could also create a reimplementation.

Got the point. What about a patch to docs/specs/ivshmem_device_spec.txt that improves it? I can make qemu's ivshmem better:
- keep explaining memnic for instance,
- explain how to write other ivshmem.
Does it help?

4. Out-of-tree kernel uio driver required

No, it is optional.

Good to know. Would you be willing to send a patch to ivshmem_device_spec.txt clarifying that?

Got the point, yes.

* Get all the required parts outside QEMU packaged in major distros, or absorbed into QEMU

Red Hat did disable it. Why? It is there in QEMU.
Up to now, I've been wearing my QEMU hat. Let me exchange it for my Red one for a bit. We (Red Hat) don't just package and ship metric tons of random free software. We package and ship useful free software we can support for many, many years. Sometimes, we find that we have to focus serious development resources on making something useful supportable (Paolo mentioned qcow2). We obviously can't focus on everything, though.

Good open technology should rule. ivshmem has use cases. And I do agree with you: it is like the phoenix, it has to be re-explained/documented to come back to life. I was not aware that the QEMU community was missing ivshmem contributors (my bad, I did not check MAINTAINERS).

Anyway, ivshmem didn't make the cut for RHEL-7.0. Sorry if that inconveniences you. To get it into RHEL, you need to show it's both useful and supportable. Building a community around it would go a long way towards that.

Understood.

If you want to discuss this in more detail with us, you may want to try communication channels provided by your RHEL subscription in addition to the QEMU development mailing list. Don't be shy, you're paying for it!

Done. I was focusing on DPDK.org and ignorant of QEMU's status, thinking Red Hat was covering it. How is one to know which parts of an open source project are and are not included by Red Hat? Sales are ignorant about it ;). Red Hat randomly disables some files at compilation (for some good reasons I guess, but without a public rationale, or I am missing something).

Feel free to open this PR to anyone: https://bugzilla.redhat.com/show_bug.cgi?id=1088332

In short, create a viable community around ivshmem, either within the QEMU community, or separately but cooperating.

At least, DPDK.org community is a community using it.

Using something isn't the same as maintaining something. But it's a necessary first step.

Understood, after David's patch, documentation will come.
(now Paolo's email since there were some overlaps)

Markus especially referred to parts *outside* QEMU: the server, the uio driver, etc. These out-of-tree, non-packaged parts of ivshmem are one of the reasons why Red Hat has disabled ivshmem in RHEL7.

You made the right choices, these out-of-tree packages are not required. You can use QEMU's ivshmem without any of the out-of-tree packages. The out-of-tree packages are just some examples of using ivshmem.

He also listed many others. Basically for parts of QEMU that are not of high quality, we either fix them (this is for example what we did for qcow2) or disable them. Not just ivshmem suffered this fate, for example many network cards, sound cards, SCSI storage adapters.

David (cc) and I are working on making it better based on the issues that are found.

Now, vhost-user is in the process of being merged for 2.1. Compared to the DPDK solution:

now, you cannot compare vhost-user to DPDK/ivshmem; both should exist because
Re: [Qemu-devel] Why I advise against using ivshmem
Nahanni's poor current development coupled with virtIO's promising expansion was what encouraged us to explore virtIO-serial [1] for inter-virtual machine communication. Though virtIO-serial as it is isn't helpful for inter-VM communication, some work is needed for this purpose and this is exactly what we (I and two of my fellow classmates) accomplished. We haven't published it yet since we do need to polish yet for upstreaming it and are planning do it in near future. [1]: http://fedoraproject.org/wiki/Features/VirtioSerial On Fri, Jun 13, 2014 at 2:56 PM, Vincent JARDIN vincent.jar...@6wind.com wrote: (+merging with Paolo's email because of overlaps) see inline (I am not on all mailing list, please, keep the cc list). 1. ivshmem code needs work, but has no maintainer See David's contributions: http://patchwork.ozlabs.org/patch/358750/ We're grateful for David's patch for qemu-char.c, but this isn't ivshmem maintenance, yet. others can come (doc), see below. 2. There is no libvirt support One can use qemu without libvivrt. You asked me for my reasons for disliking ivshmem. This is one. Sure, I can drink my water through a straw while standing on one foot, but that doesn't mean I have to like it. And me not liking it doesn't mean the next guy shouldn't like it. To each their own. I like using qemu without libvirt, libvirt is not part of qemu. Let's avoid trolling about it ;) Back when we accepted ivshmem, the out-of-tree parts it needs were well below the community packaged bar. But folks interested in it talked to us, and the fact that it's in shows that QEMU maintainers decided what they had then was enough. Unfortunately, we now have considerably less: Nahanni appears to be dead. agree and to bad it is dead. We should let Nahanni dead since ivshmem is a QEMU topic now, see below. Does it make sense? An apparently dead git repository you can study is not enough. The fact that you hold an improved reimplementation privately is immaterial. 
So is the (plausible) claim that others could also create a reimplementation. Got the point. What's about a patch to docs/specs/ivshmem_device_spec.txt that improves it? I can make qemu's ivshmem better: - keep explaining memnic for instance, - explain how to write other ivshmem. does it help? 4. Out-of-tree kernel uio driver required No, it is optional. Good to know. Would you be willing to send a patch to ivshmem_device_spec.txt clarifying that? got the point, yes, * Get all the required parts outside QEMU packaged in major distros, or absorbed into QEMU Redhat did disable it. why? it is there in QEMU. Up to now, I've been wearing my QEMU hat. Let me exchange it for my Red one for a bit. We (Red Hat) don't just package ship metric tons of random free software. We package ship useful free software we can support for many, many years. Sometimes, we find that we have to focus serious development resources on making something useful supportable (Paolo mentioned qcow2). We obviously can't focus on everything, though. Good open technology should rule. ivshmem has use cases. And I go agree with you, it is like the phoenix, it has to be re-explained/documented to be back to life. I was not aware that the QEMU community was missing ivshmem contributors (my bad I did not check MAINTAINERS). Anyway, ivshmem didn't make the cut for RHEL-7.0. Sorry if that inconveniences you. To get it into RHEL, you need to show it's both useful and supportable. Building a community around it would go a long way towards that. understood. If you want to discuss this in more detail with us, you may want to try communication channels provided by your RHEL subscription in addition to the QEMU development mailing list. Don't be shy, you're paying for it! done. I was focusing on DPDK.org and ignorant of QEMU's status, thinking Redhat was covering it. How to know which part of an opensource software are and are not included into Redhat. Sales are ignorant about it ;). 
Redhat randomly disables some files at compilation (for some good reasons I guess, but not public rationals or I am missing something). Feel free to open this PR to anyone: https://bugzilla.redhat.com/show_bug.cgi?id=1088332 In short, create a viable community around ivshmem, either within the QEMU community, or separately but cooperating. At least, DPDK.org community is a community using it. Using something isn't the same as maintaining something. But it's a necessary first step. understood, after David's patch, documentation will come. (now Paolo's email since there were some overlaps) Markus especially referred to parts *outside* QEMU: the server, the uio driver, etc. These out-of-tree, non-packaged parts of ivshmem are one of the reasons why Red Hat has disabled ivshmem in RHEL7. You made the right choices, these out-of-tree packages are not
mips: Accidental removal of paravirt_cpus_done?
Hi Ralf,

It seems you accidentally assimilated an (unwanted?) kvm change in my patch:

On Tue, Jun 10, 2014 at 3:31 AM, Linux Kernel Mailing List linux-ker...@vger.kernel.org wrote:
Gitweb: http://git.kernel.org/linus/;a=commit;h=5e888e8fb55cf3da870b85d04fef6bfe0d57c974
Commit: 5e888e8fb55cf3da870b85d04fef6bfe0d57c974
Parent: a1eace4ba53546bc7a6670b1c380cd5c1287ae8b
Refname: refs/heads/master
Author: Geert Uytterhoeven ge...@linux-m68k.org
AuthorDate: Tue Apr 22 12:51:13 2014 +0200
Committer: Ralf Baechle r...@linux-mips.org
CommitDate: Mon Jun 2 16:34:41 2014 +0200

mips: Update the email address of Geert Uytterhoeven

All my Sony addresses are defunct.

Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org
Cc: linux-m...@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/6817/
Signed-off-by: Ralf Baechle r...@linux-mips.org
---
 arch/mips/include/asm/nile4.h     | 2 +-
 arch/mips/paravirt/paravirt-smp.c | 5 -
 arch/mips/pci/ops-pmcmsp.c        | 2 +-
 arch/mips/pci/ops-tx3927.c        | 2 +-
 4 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/mips/paravirt/paravirt-smp.c b/arch/mips/paravirt/paravirt-smp.c
index 73a123e..0164b0c 100644
--- a/arch/mips/paravirt/paravirt-smp.c
+++ b/arch/mips/paravirt/paravirt-smp.c
@@ -99,10 +99,6 @@ static void paravirt_smp_finish(void)
 	local_irq_enable();
 }
 
-static void paravirt_cpus_done(void)
-{
-}
-
 static void paravirt_boot_secondary(int cpu, struct task_struct *idle)
 {
 	paravirt_smp_gp[cpu] = (unsigned long)task_thread_info(idle);
@@ -141,7 +137,6 @@ struct plat_smp_ops paravirt_smp_ops = {
 	.send_ipi_mask = paravirt_send_ipi_mask,
 	.init_secondary = paravirt_init_secondary,
 	.smp_finish = paravirt_smp_finish,
-	.cpus_done = paravirt_cpus_done,
 	.boot_secondary = paravirt_boot_secondary,
 	.smp_setup = paravirt_smp_setup,
 	.prepare_cpus = paravirt_prepare_cpus,

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds
Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
On 13.06.14 09:23, Aneesh Kumar K.V wrote:

With guest supporting Multiple page size per segment (MPSS), hpte_page_size returns actual page size used. Add a new function to return base page size and use that to compare against the page size calculated from SLB

Why? What does this fix? Is this a bug fix, an enhancement? Don't describe only what you do, but also why you do it.

Alex

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 19 +--
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |  2 +-
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 34422be566ce..3d0f3fb9c6b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -202,8 +202,10 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	return rb;
 }
 
-static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l,
+					     bool is_base_size)
 {
+
 	int size, a_psize;
 	/* Look at the 8 bit LP value */
 	unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
@@ -218,14 +220,27 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
 				continue;
 
 			a_psize = __hpte_actual_psize(lp, size);
-			if (a_psize != -1)
+			if (a_psize != -1) {
+				if (is_base_size)
+					return 1ul << mmu_psize_defs[size].shift;
 				return 1ul << mmu_psize_defs[a_psize].shift;
+			}
 		}
 
 	}
 	return 0;
 }
 
+static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+{
+	return __hpte_page_size(h, l, 0);
+}
+
+static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long l)
+{
+	return __hpte_page_size(h, l, 1);
+}
+
 static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
 {
 	return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index f53cf2eae36a..7ff45ed27c65 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1567,7 +1567,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 				goto out;
 			}
 			if (!rma_setup && is_vrma_hpte(v)) {
-				unsigned long psize = hpte_page_size(v, r);
+				unsigned long psize = hpte_base_page_size(v, r);
 				unsigned long senc = slb_pgsize_encoding(psize);
 				unsigned long lpcr;
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 87624ab5ba82..c6aca75b8376 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -839,7 +839,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 			 * to check against the actual page size.
 			 */
 			if ((v & valid) && (v & mask) == val &&
-			    hpte_page_size(v, r) == (1ul << pshift))
+			    hpte_base_page_size(v, r) == (1ul << pshift))
 				/* Return with the HPTE still locked */
 				return (hash << 3) + (i >> 1);
Re: [Qemu-devel] Why I advise against using ivshmem
On 13/06/2014 11:26, Vincent JARDIN wrote:

Markus especially referred to parts *outside* QEMU: the server, the uio driver, etc. These out-of-tree, non-packaged parts of ivshmem are one of the reasons why Red Hat has disabled ivshmem in RHEL7.

You made the right choices, these out-of-tree packages are not required. You can use QEMU's ivshmem without any of the out-of-tree packages. The out-of-tree packages are just some examples of using ivshmem.

Fine, however Red Hat would also need a way to test ivshmem code, with proper quality assurance (that also benefits upstream, of course). With ivshmem this is not possible without the out-of-tree packages.

Disabling all the unwanted devices is a lot of work and thankless too (you only get complaints, in fact!). But we prefer to ship only what we know we can test, support and improve. We do not want customers' bug reports to languish because they are using code that cannot really be fixed.

Note that we do take into account community contributions in choosing which new code can be supported. For example most work on VMDK images was done by Fam when he was a student, libiscsi is mostly the work of Peter Lieven, and so on; both of them are supported in RHEL. These people did/do a great job, and we were happy to embrace those features!

Now, putting back my QEMU hat...

He also listed many others. Basically for parts of QEMU that are not of high quality, we either fix them (this is for example what we did for qcow2) or disable them. Not just ivshmem suffered this fate, for example many network cards, sound cards, SCSI storage adapters.

I and David (cc) are working on making it better based on the issues that are found.

Now, vhost-user is in the process of being merged for 2.1. Compared to the DPDK solution:

now, you cannot compare vhost-user to DPDK/ivshmem; both should exist because they have different scope and use cases.
It is like comparing two different models of IPC:
- vhost-user: networking use case specific

Not necessarily. First and foremost, vhost-user defines an API for communication between QEMU and the host, including:

* file descriptor passing for the shared memory file
* mapping offsets in shared memory to physical memory addresses in the guests
* passing dirty memory information back and forth, so that migration is not prevented
* sending interrupts to a device
* setting up ring buffers in the shared memory

None of these is virtio specific, except the last (even then, you could repurpose the messages to pass the address of the whole shared memory area, instead of the vrings only).

Yes, the only front-end for vhost-user, right now, is a network device. But it is possible to connect vhost-scsi to vhost-user as well, it is possible to develop a vhost-serial as well, and it is possible to only use the RPC and develop arbitrary shared-memory based tools using this API. It's just that no one has done it yet.

Also, vhost-user is documented! See here: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg00581.html

The only part of ivshmem that vhost doesn't include is the n-way inter-guest doorbell. This is the part that requires a server and uio driver. vhost only supports host-guest and guest-host doorbells.

* it doesn't require hugetlbfs (which only enabled shared memory by chance in older QEMU releases, that was never documented)

ivshmem does not require hugetlbfs. It is optional.

* it doesn't require the kernel driver from the DPDK sample

ivshmem does not require the DPDK kernel driver. See memnic's PMD: http://dpdk.org/browse/memnic/tree/pmd/pmd_memnic.c

You're right, I was confusing memnic and the vhost example in DPDK.

Paolo
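The first item in Paolo's list, file descriptor passing, can be illustrated with the general Linux mechanism for handing an open descriptor to another process: SCM_RIGHTS ancillary data on an AF_UNIX socket. The sketch below shows only that mechanism; the helper names demo_send_fd and demo_recv_fd are hypothetical and this is not the vhost-user message format, which is defined in the document linked above.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union demo_ctl {
	struct cmsghdr hdr;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Send fd over a connected AF_UNIX socket. Returns 0 on success. */
static int demo_send_fd(int sock, int fd)
{
	char byte = 0;	/* at least one byte of real data must accompany it */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union demo_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by demo_send_fd(). Returns the new fd or -1. */
static int demo_recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union demo_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver gets its own descriptor number referring to the same open file, so it can mmap() a shared memory file that the sender created.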
Re: [Qemu-devel] Why I advise against using ivshmem
Hello,

On 06/13/2014 11:26 AM, Vincent JARDIN wrote:

ivshmem does not require hugetlbfs. It is optional.

* it doesn't require ivshmem (it does require shared memory, which will also be added to 2.1)

Right, hugetlbfs is not required. POSIX shared memory or tmpfs can be used instead. For instance, to use /dev/shm/foobar:

qemu-system-x86_64 -enable-kvm -cpu host [...] \
    -device ivshmem,size=16,shm=foobar

Regards,
Olivier
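For the host side of this, a minimal sketch of how a process could map the same file-backed shared memory is shown below. The helper name `map_shared_file` is hypothetical, and the path and size are whatever the caller chooses (for the command line above the path would be /dev/shm/foobar); nothing here is taken from QEMU or the ivshmem tools.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `size` bytes of the file at `path` as shared memory, creating the
 * file and setting its length if needed. Returns NULL on failure.
 */
void *map_shared_file(const char *path, size_t size)
{
	int fd = open(path, O_RDWR | O_CREAT, 0600);
	void *p;

	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)size) != 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd); /* the mapping stays valid after the descriptor is closed */
	return p == MAP_FAILED ? NULL : p;
}
```

Because the mapping is MAP_SHARED, every process that maps the same file sees the same bytes, which is the property the guest device relies on.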
Re: mips: Accidental removal of paravirt_cpus_done?
On Fri, Jun 13, 2014 at 12:02:30PM +0200, Geert Uytterhoeven wrote:

Hi Ralf,

It seems you accidentally assimilated an (unwanted?) kvm change in my patch:

Hi Geert,

Actually this change was wanted. After Ralf informed me about a compile error in linux-next I sent him an update for one of my mips-paravirt patches. Unfortunately that ended up in your (unrelated) patch.

Andreas

On Tue, Jun 10, 2014 at 3:31 AM, Linux Kernel Mailing List linux-ker...@vger.kernel.org wrote:

Gitweb: http://git.kernel.org/linus/;a=commit;h=5e888e8fb55cf3da870b85d04fef6bfe0d57c974
Commit: 5e888e8fb55cf3da870b85d04fef6bfe0d57c974
Parent: a1eace4ba53546bc7a6670b1c380cd5c1287ae8b
Refname: refs/heads/master
Author: Geert Uytterhoeven ge...@linux-m68k.org
AuthorDate: Tue Apr 22 12:51:13 2014 +0200
Committer: Ralf Baechle r...@linux-mips.org
CommitDate: Mon Jun 2 16:34:41 2014 +0200

mips: Update the email address of Geert Uytterhoeven

All my Sony addresses are defunct.

Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org
Cc: linux-m...@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/6817/
Signed-off-by: Ralf Baechle r...@linux-mips.org
---
 arch/mips/include/asm/nile4.h     | 2 +-
 arch/mips/paravirt/paravirt-smp.c | 5 -----
 arch/mips/pci/ops-pmcmsp.c        | 2 +-
 arch/mips/pci/ops-tx3927.c        | 2 +-
 4 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/mips/paravirt/paravirt-smp.c b/arch/mips/paravirt/paravirt-smp.c
index 73a123e..0164b0c 100644
--- a/arch/mips/paravirt/paravirt-smp.c
+++ b/arch/mips/paravirt/paravirt-smp.c
@@ -99,10 +99,6 @@ static void paravirt_smp_finish(void)
 	local_irq_enable();
 }
 
-static void paravirt_cpus_done(void)
-{
-}
-
 static void paravirt_boot_secondary(int cpu, struct task_struct *idle)
 {
 	paravirt_smp_gp[cpu] = (unsigned long)task_thread_info(idle);
@@ -141,7 +137,6 @@ struct plat_smp_ops paravirt_smp_ops = {
 	.send_ipi_mask = paravirt_send_ipi_mask,
 	.init_secondary = paravirt_init_secondary,
 	.smp_finish = paravirt_smp_finish,
-	.cpus_done = paravirt_cpus_done,
 	.boot_secondary = paravirt_boot_secondary,
 	.smp_setup = paravirt_smp_setup,
 	.prepare_cpus = paravirt_prepare_cpus,

Gr{oetje,eeting}s,
Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say programmer or something like that.
    -- Linus Torvalds
Re: [PATCH v1] vhost: avoid large order allocations
On Tue, 13 May 2014 18:15:27 +0300 Michael S. Tsirkin m...@redhat.com wrote:

On Tue, May 13, 2014 at 04:29:58PM +0200, Romain Francoise wrote:

Michael S. Tsirkin m...@redhat.com writes:

Please don't do this, extra indirection hurts performance. Instead, please change vhost_net_open and scsi to allocate the whole structure with vmalloc if kmalloc fails, along the lines of 74d332c13b2148ae934ea94dac1745ae92efe8e5

Back in January 2013, you didn't seem to think it was a good idea: https://lkml.org/lkml/2013/1/23/492

Hmm true, and Dave thought the structure's too large.

I'll have to do some benchmarks to see what the effect of Michael's patch is, performance-wise.

If it's too expensive I can pick up your patch, no need to repost.

Hi Michael, do you have any update in this case for us?

Michael
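The "allocate the whole structure with vmalloc if kmalloc fails" suggestion can be sketched in userspace as a try-then-fall-back allocator. This is only an analogy under stated assumptions: `contig_alloc` and `fallback_alloc` are hypothetical stand-ins for kmalloc() and vmalloc(), and `CONTIG_LIMIT` is an invented threshold that simulates a failing large-order allocation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Largest request the "contiguous" allocator serves in this sketch (invented value). */
#define CONTIG_LIMIT 4096

struct alloc_result {
	void *ptr;
	bool used_fallback; /* true if the fallback allocator served the request */
};

/* Stand-in for kmalloc(): refuses large requests, as a high-order allocation might. */
static void *contig_alloc(size_t size)
{
	return size <= CONTIG_LIMIT ? malloc(size) : NULL;
}

/* Stand-in for vmalloc(): not subject to the contiguity limit. */
static void *fallback_alloc(size_t size)
{
	return malloc(size);
}

/* Try the cheap allocator first and fall back only when it fails. */
struct alloc_result alloc_with_fallback(size_t size)
{
	struct alloc_result res = { contig_alloc(size), false };

	if (!res.ptr) {
		res.ptr = fallback_alloc(size);
		res.used_fallback = res.ptr != NULL;
	}
	return res;
}

/* In the kernel the free path must match the allocator that was used. */
void free_with_fallback(struct alloc_result res)
{
	free(res.ptr);
}
```

The structure stays a single allocation either way, so users of it pay no extra pointer indirection; only the allocation and free paths need to know which allocator was used.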
[Bug 54521] nVMX: accurately emulate VMXON region
https://bugzilla.kernel.org/show_bug.cgi?id=54521

Paolo Bonzini bonz...@gnu.org changed:

           What    |Removed |Added
----------------------------------------
         Status    |NEW     |RESOLVED
             CC    |        |bonz...@gnu.org
 Kernel Version    |        |3.16
     Resolution    |---     |CODE_FIX

--
You are receiving this mail because:
You are watching the assignee of the bug.
[Bug 53601] nVMX meta-bug
https://bugzilla.kernel.org/show_bug.cgi?id=53601

Bug 53601 depends on bug 54521, which changed state.

Bug 54521 Summary: nVMX: accurately emulate VMXON region
https://bugzilla.kernel.org/show_bug.cgi?id=54521

           What    |Removed |Added
----------------------------------------
         Status    |NEW     |RESOLVED
     Resolution    |---     |CODE_FIX

--
You are receiving this mail because:
You are watching the assignee of the bug.
Re: [Qemu-devel] Why I advise against using ivshmem
Fine, however Red Hat would also need a way to test ivshmem code, with proper quality assurance (that also benefits upstream, of course). With ivshmem this is not possible without the out-of-tree packages.

You did not reply to my question: how to get the list of things that are/will be disabled by Redhat?

About Redhat's QA, I do not care. About Qemu's QA, I do care ;)

I guess we can combine both. What about something like: tests/virtio-net-test.c # qtest_add_func( is a nop) but for ivshmem test/ivshmem-test.c ? would it have any value?

If not, what do you use at Redhat to test Qemu?

now, you cannot compare vhost-user to DPDK/ivshmem; both should exist because they have different scope and use cases. It is like comparing two different models of IPC:

I do repeat this use case that you had removed because vhost-user does not solve it yet:
- ivshmem - framework to be generic to have shared memory for many use cases (HPC, in-memory-database, a network too like memnic).
- vhost-user - networking use case specific

Not necessarily. First and foremost, vhost-user defines an API for communication between QEMU and the host, including:
* file descriptor passing for the shared memory file
* mapping offsets in shared memory to physical memory addresses in the guests
* passing dirty memory information back and forth, so that migration is not prevented
* sending interrupts to a device
* setting up ring buffers in the shared memory

Yes, I do agree that it is promising. And of course some tests are here: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg00584.html for some of the bullets you are listing (not all yet).

Also, vhost-user is documented! See here: https://lists.gnu.org/archive/html/qemu-devel/2014-03/msg00581.html

as I told you, we'll send a contribution with ivshmem's documentation.

The only part of ivshmem that vhost doesn't include is the n-way inter-guest doorbell. This is the part that requires a server and uio driver. vhost only supports host-guest and guest-host doorbells.

agree: both will need it: vhost and ivshmem require a doorbell for VM2VM, but then we'll have a security issue to be managed by Qemu for vhost and ivshmem. I'll be pleased to contribute on it for ivshmem through another thread than this one.

ivshmem does not require DPDK kernel driver. see memnic's PMD: http://dpdk.org/browse/memnic/tree/pmd/pmd_memnic.c

You're right, I was confusing memnic and the vhost example in DPDK.

Definitely, it proves a lack of documentation. You are welcome. Olivier did explain it:

ivshmem does not require hugetlbfs. It is optional.

* it doesn't require ivshmem (it does require shared memory, which will also be added to 2.1)

Right, hugetlbfs is not required. POSIX shared memory or tmpfs can be used instead. For instance, to use /dev/shm/foobar:

qemu-system-x86_64 -enable-kvm -cpu host [...] \
    -device ivshmem,size=16,shm=foobar

Best regards,
Vincent
Re: mips: Accidental removal of paravirt_cpus_done?
On Fri, Jun 13, 2014 at 12:02:30PM +0200, Geert Uytterhoeven wrote:

It seems you accidentally assimilated an (unwanted?) kvm change in my patch:

I must have accidentally done a git commit --amend with the wrong patch on top, sorry about that. The change itself was intentional.

Ralf
Re: [Qemu-devel] Why I advise against using ivshmem
On 13/06/2014 15:41, Vincent JARDIN wrote:

Fine, however Red Hat would also need a way to test ivshmem code, with proper quality assurance (that also benefits upstream, of course). With ivshmem this is not possible without the out-of-tree packages.

You did not reply to my question: how to get the list of things that are/will be disabled by Redhat?

I don't know exactly what the answer is, and this is probably not the right list to discuss it. I guess there are partnership programs with Red Hat that I don't know the details of, but these are more for management folks and not really for developers.

ivshmem in particular was disabled even in RHEL7 beta, so you could have found out about this in December and opened a bug in Bugzilla about it.

I guess we can combine both. What's about something like: tests/virtio-net-test.c # qtest_add_func( is a nop) but for ivshmem test/ivshmem-test.c ? would it have any values?

The first things to do are:

1) try to understand if there is any value in a simplified shared memory device with no interrupts (and thus no eventfd or uio dependencies, not even optionally). You are not using them because DPDK only does polling and basically reserves a core for the NIC code. If so, this would be a very simple device, just 100 or so lines of code. We could get this in upstream, and it would likely be enabled in RHEL too.

2) if not, get the server and uio driver merged into the QEMU tree, and document the protocol in docs/specs/ivshmem_device_spec.txt. It doesn't matter if the code comes from the Nahanni repository or from your own implementation. Also start fixing bugs such as the ones that Markus reported (removing all exit() invocations).

Writing testcases using the qtest framework would also be useful, but first of all it is important to make ivshmem easier to use.

If not, what do you use at Redhat to test Qemu?
We do integration testing using autotest/virt-test (QEMU and KVM developers for upstream use it too) and also some manual functional tests. Contributing ivshmem tests to virt-test would also be helpful in demonstrating your interest in maintaining ivshmem. The repository and documentation are at https://github.com/autotest/virt-test/ (a bit Fedora-centric).

I do repeat this use case that you had removed because vhost-user does not solve it yet:
- ivshmem - framework to be generic to have shared memory for many use cases (HPC, in-memory-database, a network too like memnic).

Right, ivshmem is better for guest-to-guest. vhost-user is not restricted to networking, but it is indeed more focused on guest-to-host. ivshmem is usable for guest-to-host, but I would still prefer some hybrid that uses vhost-like messages to pass the shared memory fds to the external program.

Paolo
Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
Alexander Graf ag...@suse.de writes:

On 13.06.14 09:23, Aneesh Kumar K.V wrote:

With guest supporting Multiple page size per segment (MPSS), hpte_page_size returns actual page size used. Add a new function to return base page size and use that to compare against the page size calculated from SLB

Why? What does this fix? Is this a bug fix, an enhancement? Don't describe only what you do, but also why you do it.

This could result in page fault failures (unhandled page fault): even though we have a valid hpte entry mapping a 16MB page, kvmppc_hv_find_lock_hpte will fail and return -1, because we were comparing the actual page size against the page size calculated from the SLB bits. I did not observe a failure in practice; the bug was found during code audit. That could be because with THP we have guest ram backed by hugetlbfs and we always find the page in the host linux page table. That will result in do_h_enter always inserting an HPTE_V_VALID entry, and hence we might not really end up calling kvmppc_hv_find_lock_hpte.

-aneesh
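The mismatch described above can be illustrated with a small standalone sketch. It assumes an example MPSS mapping (a 16MB page inside a segment whose base page size is 64K); the struct `hpte_sizes` and both helper names are hypothetical simplifications and not the kernel's data structures.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of an HPTE: only the two page-size shifts matter here. */
struct hpte_sizes {
	unsigned int base_shift;   /* base page size of the segment, e.g. 16 for 64K */
	unsigned int actual_shift; /* actual page size of the mapping, e.g. 24 for 16M */
};

/* Check as described for the old code: actual page size vs. the size derived from the SLB. */
bool match_on_actual_size(const struct hpte_sizes *h, unsigned int slb_pshift)
{
	return (1ul << h->actual_shift) == (1ul << slb_pshift);
}

/* Check as described for the patch: base page size vs. the size derived from the SLB. */
bool match_on_base_size(const struct hpte_sizes *h, unsigned int slb_pshift)
{
	return (1ul << h->base_shift) == (1ul << slb_pshift);
}
```

With base and actual size equal the two checks agree; they only diverge for MPSS mappings, which is consistent with the bug being found by audit rather than by an observed failure.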
RE: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, June 12, 2014 8:05 PM To: Caraman Mihai Claudiu-B02008 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc- d...@lists.ozlabs.org; Wood Scott-B07421 Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule On 06/12/2014 04:00 PM, Mihai Caraman wrote: On vcpu schedule, the condition checked for tlb pollution is too tight. The tlb entries of one vcpu are polluted when a different vcpu from the same partition runs in-between. Relax the current tlb invalidation condition taking into account the lpid. Signed-off-by: Mihai Caraman mihai.caraman at freescale.com Your mailer is broken? :) This really should be an @. I think this should work. Scott, please ack. Alex, you were right. I screwed up the patch description by inverting relax and tight terms :) It should have been more like this: KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule On vcpu schedule, the condition checked for tlb pollution is too loose. The tlb entries of a vcpu are polluted (vs stale) only when a different vcpu within the same logical partition runs in-between. Optimize the tlb invalidation condition taking into account the lpid. -Mike -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
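A minimal sketch of the condition in the corrected description: TLB entries of a vcpu count as polluted only if a different vcpu of the same logical partition ran on this physical cpu in between. All names here (`pcpu_state`, `last_vcpu_of_lpid`, `MAX_LPIDS`) are assumptions for illustration, not the code of the patch, and the separate "stale" case (the vcpu itself having run on another cpu) is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LPIDS 64 /* hypothetical limit for this sketch */

struct vcpu {
	int id;
	unsigned int lpid; /* logical partition id, assumed to be below MAX_LPIDS */
};

/* Per physical cpu: the last vcpu that ran here, tracked separately for each lpid. */
struct pcpu_state {
	const struct vcpu *last_vcpu_of_lpid[MAX_LPIDS];
};

/*
 * Called when `v` is scheduled on the physical cpu described by `pc`.
 * Returns true if TLB entries tagged with v's lpid must be invalidated,
 * i.e. v was not the last vcpu of its partition to run on this cpu.
 */
bool vcpu_load_needs_tlb_invalidate(struct pcpu_state *pc, const struct vcpu *v)
{
	if (pc->last_vcpu_of_lpid[v->lpid] == v)
		return false;
	pc->last_vcpu_of_lpid[v->lpid] = v;
	return true;
}
```

A vcpu from another partition running in between does not force an invalidation here, because its entries carry a different lpid tag.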
Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
On 13.06.14 16:28, Aneesh Kumar K.V wrote:

Alexander Graf ag...@suse.de writes:

On 13.06.14 09:23, Aneesh Kumar K.V wrote:

With guest supporting Multiple page size per segment (MPSS), hpte_page_size returns actual page size used. Add a new function to return base page size and use that to compare against the page size calculated from SLB

Why? What does this fix? Is this a bug fix, an enhancement? Don't describe only what you do, but also why you do it.

This could result in page fault failures (unhandled page fault): even though we have a valid hpte entry mapping a 16MB page, kvmppc_hv_find_lock_hpte will fail and return -1, because we were comparing the actual page size against the page size calculated from the SLB bits. I did not observe a failure in practice; the bug was found during code audit. That could be because with THP we have guest ram backed by hugetlbfs and we always find the page in the host linux page table. That will result in do_h_enter always inserting an HPTE_V_VALID entry, and hence we might not really end up calling kvmppc_hv_find_lock_hpte.

So why do we need to override to base page size for the VRMA region?

Also I think you want to change the comment above the line in find_lock_hpte you're changing.

Alex
Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule
On 13.06.14 16:43, mihai.cara...@freescale.com wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, June 12, 2014 8:05 PM To: Caraman Mihai Claudiu-B02008 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc- d...@lists.ozlabs.org; Wood Scott-B07421 Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule On 06/12/2014 04:00 PM, Mihai Caraman wrote: On vcpu schedule, the condition checked for tlb pollution is too tight. The tlb entries of one vcpu are polluted when a different vcpu from the same partition runs in-between. Relax the current tlb invalidation condition taking into account the lpid. Signed-off-by: Mihai Caraman mihai.caraman at freescale.com Your mailer is broken? :) This really should be an @. I think this should work. Scott, please ack. Alex, you were right. I screwed up the patch description by inverting relax and tight terms :) It should have been more like this: KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule On vcpu schedule, the condition checked for tlb pollution is too loose. The tlb entries of a vcpu are polluted (vs stale) only when a different vcpu within the same logical partition runs in-between. Optimize the tlb invalidation condition taking into account the lpid. Can't we give every vcpu its own lpid? Or don't we trap on global invalidates? Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
Alexander Graf ag...@suse.de writes:

On 13.06.14 16:28, Aneesh Kumar K.V wrote:

Alexander Graf ag...@suse.de writes:

On 13.06.14 09:23, Aneesh Kumar K.V wrote:

With guest supporting Multiple page size per segment (MPSS), hpte_page_size returns actual page size used. Add a new function to return base page size and use that to compare against the page size calculated from SLB

Why? What does this fix? Is this a bug fix, an enhancement? Don't describe only what you do, but also why you do it.

This could result in page fault failures (unhandled page fault): even though we have a valid hpte entry mapping a 16MB page, kvmppc_hv_find_lock_hpte will fail and return -1, because we were comparing the actual page size against the page size calculated from the SLB bits. I did not observe a failure in practice; the bug was found during code audit. That could be because with THP we have guest ram backed by hugetlbfs and we always find the page in the host linux page table. That will result in do_h_enter always inserting an HPTE_V_VALID entry, and hence we might not really end up calling kvmppc_hv_find_lock_hpte.

So why do we need to override to base page size for the VRMA region?

The slb encoding should be derived from the base page size.

Also I think you want to change the comment above the line in find_lock_hpte you're changing.

Will do that.

-aneesh
Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule
On Fri, 2014-06-13 at 16:55 +0200, Alexander Graf wrote: On 13.06.14 16:43, mihai.cara...@freescale.com wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, June 12, 2014 8:05 PM To: Caraman Mihai Claudiu-B02008 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc- d...@lists.ozlabs.org; Wood Scott-B07421 Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule On 06/12/2014 04:00 PM, Mihai Caraman wrote: On vcpu schedule, the condition checked for tlb pollution is too tight. The tlb entries of one vcpu are polluted when a different vcpu from the same partition runs in-between. Relax the current tlb invalidation condition taking into account the lpid. Can you quantify the performance improvement from this? We've had bugs in this area before, so let's make sure it's worth it before making this more complicated. Signed-off-by: Mihai Caraman mihai.caraman at freescale.com Your mailer is broken? :) This really should be an @. I think this should work. Scott, please ack. Alex, you were right. I screwed up the patch description by inverting relax and tight terms :) It should have been more like this: KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule On vcpu schedule, the condition checked for tlb pollution is too loose. The tlb entries of a vcpu are polluted (vs stale) only when a different vcpu within the same logical partition runs in-between. Optimize the tlb invalidation condition taking into account the lpid. Can't we give every vcpu its own lpid? Or don't we trap on global invalidates? That would significantly increase the odds of exhausting LPIDs, especially on large chips like t4240 with similarly large VMs. If we were to do that, the LPIDs would need to be dynamically assigned (like PIDs), and should probably be a separate numberspace per physical core. 
-Scott
Re: [PATCH 4/4 v3] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
On Thu, 2014-06-12 at 18:04 +0200, Alexander Graf wrote:

On 06/02/2014 05:50 PM, Mihai Caraman wrote:

On book3e, KVM uses the dedicated load external pid (lwepx) instruction to read the guest's last instruction on the exit path. lwepx exceptions (DTLB_MISS, DSI and LRAT), generated by loading a guest address, need to be handled by KVM. These exceptions are generated in a substituted guest translation context (EPLC[EGS] = 1) from host context (MSR[GS] = 0).

Currently, KVM hooks only interrupts generated from guest context (MSR[GS] = 1), doing minimal checks on the fast path to avoid host performance degradation. lwepx exceptions originate from host state (MSR[GS] = 0), which implies additional checks in the DO_KVM macro (beside the current MSR[GS] = 1) by looking at the Exception Syndrome Register (ESR[EPID]) and the External PID Load Context Register (EPLC[EGS]). Doing this on each Data TLB miss exception is obviously too intrusive for the host.

Read the guest's last instruction from kvmppc_load_last_inst() by searching for the physical address and kmapping it. This addresses the TODO for TLB eviction and execute-but-not-read entries, and allows us to get rid of lwepx until we are able to handle failures.

A simple stress benchmark shows a 1% sys performance degradation compared with the previous approach (lwepx without failure handling):

time for i in `seq 1 1`; do /bin/echo > /dev/null; done

real    0m 8.85s
user    0m 4.34s
sys     0m 4.48s

vs

real    0m 8.84s
user    0m 4.36s
sys     0m 4.44s

An alternative solution, to handle lwepx exceptions in KVM, is to temporarily hijack the interrupt vector from the host. Some cores share host IVOR registers between hardware threads, which is the case for FSL e6500, which imposes additional synchronization logic for this solution to work. This optimized solution can be developed later on top of this patch.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v3:
 - reworked patch description
 - use unaltered kmap addr for kunmap
 - get last instruction before being preempted

v2:
 - reworked patch description
 - used pr_* functions
 - addressed cosmetic feedback

 arch/powerpc/kvm/booke.c              | 32
 arch/powerpc/kvm/bookehv_interrupts.S | 37
 arch/powerpc/kvm/e500_mmu_host.c      | 93
 3 files changed, 134 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 34a42b9..4ef52a8 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -880,6 +880,8 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	int r = RESUME_HOST;
 	int s;
 	int idx;
+	u32 last_inst = KVM_INST_FETCH_FAILED;
+	enum emulation_result emulated = EMULATE_DONE;
 
 	/* update before a new last_exit_type is rewritten */
 	kvmppc_update_timing_stats(vcpu);
@@ -887,6 +889,15 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	/* restart interrupts if they were meant for the host */
 	kvmppc_restart_interrupt(vcpu, exit_nr);
 
+	/*
+	 * get last instruction before being preempted
+	 * TODO: for e6500 check also BOOKE_INTERRUPT_LRAT_ERROR ESR_DATA
+	 */
+	if (exit_nr == BOOKE_INTERRUPT_DATA_STORAGE ||
+	    exit_nr == BOOKE_INTERRUPT_DTLB_MISS ||
+	    exit_nr == BOOKE_INTERRUPT_HV_PRIV)

Please make this a switch() - that's easier to read.

+		emulated = kvmppc_get_last_inst(vcpu, false, &last_inst);
+
 	local_irq_enable();
 
 	trace_kvm_exit(exit_nr, vcpu);
@@ -895,6 +906,26 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	run->exit_reason = KVM_EXIT_UNKNOWN;
 	run->ready_for_interrupt_injection = 1;
 
+	switch (emulated) {
+	case EMULATE_AGAIN:
+		r = RESUME_GUEST;
+		goto out;
+
+	case EMULATE_FAIL:
+		pr_debug("%s: emulation at %lx failed (%08x)\n",
+			 __func__, vcpu->arch.pc, last_inst);
+		/* For debugging, encode the failing instruction and
+		 * report it to userspace. */
+		run->hw.hardware_exit_reason = ~0ULL << 32;
+		run->hw.hardware_exit_reason |= last_inst;
+		kvmppc_core_queue_program(vcpu, ESR_PIL);
+		r = RESUME_HOST;
+		goto out;
+
+	default:
+		break;
+	}

I think you can just put this into a function.

Scott, I think the patch overall looks quite good. Can you please check as well and if you agree give it your reviewed-by?

Mike, when Scott gives you a reviewed-by, please include it for the next version.

Alex

+
 	switch (exit_nr) {
 	case BOOKE_INTERRUPT_MACHINE_CHECK:
 		printk("MACHINE CHECK: %lx\n", mfspr(SPRN_MCSR));
@@ -1184,6 +1215,7 @@ int
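The two review comments (use a switch(), and move the failure handling into a function) could look roughly like the sketch below. The enum values and both function names are placeholders invented for this example; the real constants are the BOOKE_INTERRUPT_* definitions in the kernel, whose values are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder exit numbers for illustration only; not the kernel's values. */
enum sketch_exit_nr {
	SKETCH_EXIT_DATA_STORAGE,
	SKETCH_EXIT_DTLB_MISS,
	SKETCH_EXIT_HV_PRIV,
	SKETCH_EXIT_OTHER,
};

/* switch() form of the three-way "if": should the last instruction be fetched? */
bool exit_needs_last_inst(int exit_nr)
{
	switch (exit_nr) {
	case SKETCH_EXIT_DATA_STORAGE:
	case SKETCH_EXIT_DTLB_MISS:
	case SKETCH_EXIT_HV_PRIV:
		return true;
	default:
		return false;
	}
}

/*
 * Encode a failing instruction for reporting: the upper 32 bits are all
 * ones, the lower 32 bits carry the instruction word.
 */
uint64_t encode_failed_inst(uint32_t last_inst)
{
	uint64_t reason = ~0ULL << 32;

	reason |= last_inst;
	return reason;
}
```

Keeping the exit-number test in one helper means a later case, such as the LRAT error mentioned in the TODO, would be a one-line addition.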
[PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
With guest supporting Multiple page size per segment (MPSS), hpte_page_size returns actual page size used. Add a new function to return base page size and use that to compare against the page size calculated from SLB

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 19 +++++++++++++++++--
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |  2 +-
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 34422be566ce..3d0f3fb9c6b6 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -202,8 +202,10 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	return rb;
 }
 
-static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l,
+					     bool is_base_size)
 {
+
 	int size, a_psize;
 	/* Look at the 8 bit LP value */
 	unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
@@ -218,14 +220,27 @@ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
 				continue;
 
 			a_psize = __hpte_actual_psize(lp, size);
-			if (a_psize != -1)
+			if (a_psize != -1) {
+				if (is_base_size)
+					return 1ul << mmu_psize_defs[size].shift;
 				return 1ul << mmu_psize_defs[a_psize].shift;
+			}
 		}
 
 	}
 	return 0;
 }
 
+static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+{
+	return __hpte_page_size(h, l, 0);
+}
+
+static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long l)
+{
+	return __hpte_page_size(h, l, 1);
+}
+
 static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
 {
 	return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index f53cf2eae36a..7ff45ed27c65 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1567,7 +1567,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 				goto out;
 			}
 			if (!rma_setup && is_vrma_hpte(v)) {
-				unsigned long psize = hpte_page_size(v, r);
+				unsigned long psize = hpte_base_page_size(v, r);
 				unsigned long senc = slb_pgsize_encoding(psize);
 				unsigned long lpcr;
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 87624ab5ba82..c6aca75b8376 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -839,7 +839,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 			 * to check against the actual page size.
 			 */
 			if ((v & valid) && (v & mask) == val &&
-			    hpte_page_size(v, r) == (1ul << pshift))
+			    hpte_base_page_size(v, r) == (1ul << pshift))
 				/* Return with the HPTE still locked */
 				return (hash << 3) + (i >> 1);
 
--
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
On 13.06.14 09:23, Aneesh Kumar K.V wrote:

With guest supporting Multiple page size per segment (MPSS), hpte_page_size returns actual page size used. Add a new function to return base page size and use that to compare against the page size calculated from SLB

Why? What does this fix? Is this a bug fix, an enhancement? Don't describe only what you do, but also why you do it.

Alex

Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 19 +++++++++++++++++--
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |  2 +-
 3 files changed, 19 insertions(+), 4 deletions(-)

[...]
Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
Alexander Graf <ag...@suse.de> writes:

> On 13.06.14 09:23, Aneesh Kumar K.V wrote:
> > With guest supporting Multiple page size per segment (MPSS),
> > hpte_page_size returns actual page size used. Add a new function to
> > return base page size and use that to compare against the page size
> > calculated from SLB
>
> Why? What does this fix? Is this a bug fix, an enhancement? Don't
> describe only what you do, but also why you do it.

This could result in page fault failures (unhandled page fault): even
though we have a valid hpte entry mapping a 16MB page, we were comparing
the actual page size against the page size calculated from the SLB bits,
so kvmppc_hv_find_lock_hpte will fail and return -1. I did not observe a
failure in practice; the bug was found during code audit. That could be
because with THP we have guest ram backed by hugetlbfs and we always
find the page in the host linux page table. That will result in
do_h_enter always inserting an HPTE_V_VALID entry, and hence we might
not really end up calling kvmppc_hv_find_lock_hpte.

-aneesh
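The mismatch described above can be illustrated with a small stand-alone
sketch. It is not the kernel code: struct toy_hpte and the toy_* helpers
are hypothetical, and the 64K base segment is an assumed example (the
thread only mentions the 16MB actual page). The sketch only shows why
comparing the actual page size against the SLB-derived size fails for an
MPSS mapping, while comparing the base page size succeeds.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified HPTE. A real HPTE encodes both sizes in its
 * LP bits; here they are stored directly as shifts.
 */
struct toy_hpte {
	unsigned int base_shift;	/* base page size of the segment */
	unsigned int actual_shift;	/* actual page size of this mapping */
};

/* Mirrors the idea of __hpte_page_size(h, l, is_base_size) in the patch. */
static unsigned long toy_hpte_page_size(const struct toy_hpte *h,
					bool is_base_size)
{
	return 1ul << (is_base_size ? h->base_shift : h->actual_shift);
}

/*
 * pshift is the page shift derived from the SLB entry, which describes
 * the segment's base page size.
 */
static bool toy_hpte_matches_slb(const struct toy_hpte *h,
				 unsigned int pshift, bool compare_base)
{
	return toy_hpte_page_size(h, compare_base) == (1ul << pshift);
}
```

With a 16MB page in an assumed 64K-base segment (base_shift = 16,
actual_shift = 24) and pshift = 16, the actual-size comparison rejects a
valid entry and the base-size comparison accepts it.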
RE: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule
> -----Original Message-----
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, June 12, 2014 8:05 PM
> To: Caraman Mihai Claudiu-B02008
> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org; Wood Scott-B07421
> Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition
> on vcpu schedule
>
> On 06/12/2014 04:00 PM, Mihai Caraman wrote:
> > On vcpu schedule, the condition checked for tlb pollution is too tight.
> > The tlb entries of one vcpu are polluted when a different vcpu from the
> > same partition runs in-between. Relax the current tlb invalidation
> > condition taking into account the lpid.
> >
> > Signed-off-by: Mihai Caraman <mihai.caraman at freescale.com>
>
> Your mailer is broken? :)
> This really should be an @.
>
> I think this should work. Scott, please ack.

Alex, you were right. I screwed up the patch description by inverting
relax and tight terms :) It should have been more like this:

KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule

On vcpu schedule, the condition checked for tlb pollution is too loose.
The tlb entries of a vcpu are polluted (vs stale) only when a different
vcpu within the same logical partition runs in-between. Optimize the tlb
invalidation condition taking into account the lpid.

-Mike
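The corrected description can be read as the following condition. This
is only a sketch under stated assumptions: the patch itself is not shown
in this message, and struct toy_vcpu, last_vcpu_of_lpid and
toy_vcpu_load are hypothetical names. Each physical cpu remembers the
last vcpu it ran for every lpid, and a vcpu's entries are invalidated
only if it changed cpu or a different vcpu of the same lpid ran on this
cpu in between.

```c
#include <stdbool.h>

#define TOY_NR_CPUS	4
#define TOY_NR_LPIDS	8

/* Hypothetical vcpu descriptor, for illustration only. */
struct toy_vcpu {
	int id;		/* unique vcpu id, must be > 0 (0 means "none") */
	int lpid;	/* logical partition the vcpu belongs to */
	int last_cpu;	/* physical cpu it last ran on, -1 if never */
};

/* Last vcpu id seen on each physical cpu, tracked per lpid. */
static int last_vcpu_of_lpid[TOY_NR_CPUS][TOY_NR_LPIDS];

/*
 * Called when a vcpu is scheduled on a physical cpu. Returns true when
 * the vcpu's TLB entries on that cpu have to be invalidated.
 */
static bool toy_vcpu_load(struct toy_vcpu *v, int cpu)
{
	bool invalidate = v->last_cpu != cpu ||
			  last_vcpu_of_lpid[cpu][v->lpid] != v->id;

	last_vcpu_of_lpid[cpu][v->lpid] = v->id;
	v->last_cpu = cpu;
	return invalidate;
}
```

A vcpu from another partition running in between does not trigger an
invalidation in this sketch, because only the slot for the vcpu's own
lpid is compared.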
Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
On 13.06.14 16:28, Aneesh Kumar K.V wrote:
> Alexander Graf <ag...@suse.de> writes:
>
> > On 13.06.14 09:23, Aneesh Kumar K.V wrote:
> > > With guest supporting Multiple page size per segment (MPSS),
> > > hpte_page_size returns actual page size used. Add a new function to
> > > return base page size and use that to compare against the page size
> > > calculated from SLB
> >
> > Why? What does this fix? Is this a bug fix, an enhancement? Don't
> > describe only what you do, but also why you do it.
>
> This could result in page fault failures (unhandled page fault): even
> though we have a valid hpte entry mapping a 16MB page, we were comparing
> the actual page size against the page size calculated from the SLB bits,
> so kvmppc_hv_find_lock_hpte will fail and return -1. I did not observe a
> failure in practice; the bug was found during code audit. That could be
> because with THP we have guest ram backed by hugetlbfs and we always
> find the page in the host linux page table. That will result in
> do_h_enter always inserting an HPTE_V_VALID entry, and hence we might
> not really end up calling kvmppc_hv_find_lock_hpte.

So why do we need to override to base page size for the VRMA region?
Also I think you want to change the comment above the line in
find_lock_hpte you're changing.

Alex
Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule
On 13.06.14 16:43, mihai.cara...@freescale.com wrote:
> > -----Original Message-----
> > From: Alexander Graf [mailto:ag...@suse.de]
> > Sent: Thursday, June 12, 2014 8:05 PM
> > To: Caraman Mihai Claudiu-B02008
> > Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-
> > d...@lists.ozlabs.org; Wood Scott-B07421
> > Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition
> > on vcpu schedule
> >
> > On 06/12/2014 04:00 PM, Mihai Caraman wrote:
> > > On vcpu schedule, the condition checked for tlb pollution is too tight.
> > > The tlb entries of one vcpu are polluted when a different vcpu from the
> > > same partition runs in-between. Relax the current tlb invalidation
> > > condition taking into account the lpid.
> > >
> > > Signed-off-by: Mihai Caraman <mihai.caraman at freescale.com>
> >
> > Your mailer is broken? :)
> > This really should be an @.
> >
> > I think this should work. Scott, please ack.
>
> Alex, you were right. I screwed up the patch description by inverting
> relax and tight terms :) It should have been more like this:
>
> KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
>
> On vcpu schedule, the condition checked for tlb pollution is too loose.
> The tlb entries of a vcpu are polluted (vs stale) only when a different
> vcpu within the same logical partition runs in-between. Optimize the tlb
> invalidation condition taking into account the lpid.

Can't we give every vcpu its own lpid? Or don't we trap on global
invalidates?

Alex
Re: [PATCH] KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value
Alexander Graf <ag...@suse.de> writes:

> On 13.06.14 16:28, Aneesh Kumar K.V wrote:
> > Alexander Graf <ag...@suse.de> writes:
> >
> > > On 13.06.14 09:23, Aneesh Kumar K.V wrote:
> > > > With guest supporting Multiple page size per segment (MPSS),
> > > > hpte_page_size returns actual page size used. Add a new function to
> > > > return base page size and use that to compare against the page size
> > > > calculated from SLB
> > >
> > > Why? What does this fix? Is this a bug fix, an enhancement? Don't
> > > describe only what you do, but also why you do it.
> >
> > This could result in page fault failures (unhandled page fault): even
> > though we have a valid hpte entry mapping a 16MB page, we were comparing
> > the actual page size against the page size calculated from the SLB bits,
> > so kvmppc_hv_find_lock_hpte will fail and return -1. I did not observe a
> > failure in practice; the bug was found during code audit. That could be
> > because with THP we have guest ram backed by hugetlbfs and we always
> > find the page in the host linux page table. That will result in
> > do_h_enter always inserting an HPTE_V_VALID entry, and hence we might
> > not really end up calling kvmppc_hv_find_lock_hpte.
>
> So why do we need to override to base page size for the VRMA region?

slb encoding should be derived based on base page size.

> Also I think you want to change the comment above the line in
> find_lock_hpte you're changing.

Will do that.

-aneesh
Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition on vcpu schedule
On Fri, 2014-06-13 at 16:55 +0200, Alexander Graf wrote:
> On 13.06.14 16:43, mihai.cara...@freescale.com wrote:
> > > -----Original Message-----
> > > From: Alexander Graf [mailto:ag...@suse.de]
> > > Sent: Thursday, June 12, 2014 8:05 PM
> > > To: Caraman Mihai Claudiu-B02008
> > > Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; linuxppc-
> > > d...@lists.ozlabs.org; Wood Scott-B07421
> > > Subject: Re: [PATCH] KVM: PPC: e500mc: Relax tlb invalidation condition
> > > on vcpu schedule
> > >
> > > On 06/12/2014 04:00 PM, Mihai Caraman wrote:
> > > > On vcpu schedule, the condition checked for tlb pollution is too tight.
> > > > The tlb entries of one vcpu are polluted when a different vcpu from the
> > > > same partition runs in-between. Relax the current tlb invalidation
> > > > condition taking into account the lpid.

Can you quantify the performance improvement from this? We've had bugs
in this area before, so let's make sure it's worth it before making this
more complicated.

> > > > Signed-off-by: Mihai Caraman <mihai.caraman at freescale.com>
> > >
> > > Your mailer is broken? :)
> > > This really should be an @.
> > >
> > > I think this should work. Scott, please ack.
> >
> > Alex, you were right. I screwed up the patch description by inverting
> > relax and tight terms :) It should have been more like this:
> >
> > KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule
> >
> > On vcpu schedule, the condition checked for tlb pollution is too loose.
> > The tlb entries of a vcpu are polluted (vs stale) only when a different
> > vcpu within the same logical partition runs in-between. Optimize the tlb
> > invalidation condition taking into account the lpid.
>
> Can't we give every vcpu its own lpid? Or don't we trap on global
> invalidates?

That would significantly increase the odds of exhausting LPIDs,
especially on large chips like t4240 with similarly large VMs. If we
were to do that, the LPIDs would need to be dynamically assigned (like
PIDs), and should probably be a separate numberspace per physical core.

-Scott
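To make the "dynamically assigned, separate numberspace per physical
core" idea concrete, here is a rough sketch. Nothing in it comes from
the thread or from the kernel: struct toy_core, toy_get_lpid, the
generation counter and the table sizes are all assumptions made for
illustration. Each core hands out LPIDs on demand and recycles the whole
numberspace when it runs out, at which point a real implementation would
have to flush that core's guest TLB entries.

```c
#include <string.h>

#define TOY_NR_LPIDS	4	/* LPID 0 is treated as reserved here */
#define TOY_MAX_VCPUS	16

/* Hypothetical per-core LPID numberspace. */
struct toy_core {
	int next_lpid;				/* next unused LPID */
	unsigned int generation;		/* bumped on every recycle */
	unsigned int flushes;			/* number of recycles so far */
	int lpid_of[TOY_MAX_VCPUS];		/* LPID assigned to each vcpu */
	unsigned int gen_of[TOY_MAX_VCPUS];	/* generation of that LPID */
};

static void toy_core_init(struct toy_core *c)
{
	memset(c, 0, sizeof(*c));
	c->next_lpid = 1;
	c->generation = 1;	/* gen_of[] == 0 never matches */
}

/* Return the LPID for a vcpu on this core, assigning one if needed. */
static int toy_get_lpid(struct toy_core *c, int vcpu)
{
	if (c->gen_of[vcpu] == c->generation)
		return c->lpid_of[vcpu];

	if (c->next_lpid == TOY_NR_LPIDS) {
		/* Numberspace exhausted: recycle it (TLB flush point). */
		c->generation++;
		c->next_lpid = 1;
		c->flushes++;
	}

	c->lpid_of[vcpu] = c->next_lpid++;
	c->gen_of[vcpu] = c->generation;
	return c->lpid_of[vcpu];
}
```

The generation counter lets stale assignments be detected lazily instead
of walking every vcpu when the numberspace is recycled.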
Re: [PATCH 4/4 v3] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation
On Thu, 2014-06-12 at 18:04 +0200, Alexander Graf wrote:
> On 06/02/2014 05:50 PM, Mihai Caraman wrote:
> > On book3e, KVM uses the dedicated load external pid (lwepx) instruction
> > to read the guest's last instruction on the exit path. lwepx exceptions
> > (DTLB_MISS, DSI and LRAT), generated by loading a guest address, need to
> > be handled by KVM. These exceptions are generated in a substituted guest
> > translation context (EPLC[EGS] = 1) from host context (MSR[GS] = 0).
> >
> > Currently, KVM hooks only interrupts generated from guest context
> > (MSR[GS] = 1), doing minimal checks on the fast path to avoid host
> > performance degradation. lwepx exceptions originate from host state
> > (MSR[GS] = 0), which implies additional checks in the DO_KVM macro
> > (besides the current MSR[GS] = 1) by looking at the Exception Syndrome
> > Register (ESR[EPID]) and the External PID Load Context Register
> > (EPLC[EGS]). Doing this on each Data TLB miss exception is obviously too
> > intrusive for the host.
> >
> > Read the guest's last instruction from kvmppc_load_last_inst() by
> > searching for the physical address and kmap it. This addresses the TODO
> > for TLB eviction and execute-but-not-read entries, and allows us to get
> > rid of lwepx until we are able to handle failures.
> >
> > A simple stress benchmark shows a 1% sys performance degradation
> > compared with the previous approach (lwepx without failure handling):
> >
> > time for i in `seq 1 1`; do /bin/echo > /dev/null; done
> >
> > real    0m 8.85s
> > user    0m 4.34s
> > sys     0m 4.48s
> >
> > vs
> >
> > real    0m 8.84s
> > user    0m 4.36s
> > sys     0m 4.44s
> >
> > An alternative solution, to handle lwepx exceptions in KVM, is to
> > temporarily hijack the interrupt vector from the host. Some cores share
> > host IVOR registers between hardware threads, which is the case for FSL
> > e6500, which imposes additional synchronization logic for this solution
> > to work. This optimized solution can be developed later on top of this
> > patch.
> >
> > Signed-off-by: Mihai Caraman <mihai.cara...@freescale.com>
> > ---
> > v3:
> >  - reworked patch description
> >  - use unaltered kmap addr for kunmap
> >  - get last instruction before being preempted
> >
> > v2:
> >  - reworked patch description
> >  - used pr_* functions
> >  - addressed cosmetic feedback
> >
> >  arch/powerpc/kvm/booke.c              | 32
> >  arch/powerpc/kvm/bookehv_interrupts.S | 37 --
> >  arch/powerpc/kvm/e500_mmu_host.c      | 93 +++
> >  3 files changed, 134 insertions(+), 28 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> > index 34a42b9..4ef52a8 100644
> > --- a/arch/powerpc/kvm/booke.c
> > +++ b/arch/powerpc/kvm/booke.c
> > @@ -880,6 +880,8 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
> >  	int r = RESUME_HOST;
> >  	int s;
> >  	int idx;
> > +	u32 last_inst = KVM_INST_FETCH_FAILED;
> > +	enum emulation_result emulated = EMULATE_DONE;
> >  
> >  	/* update before a new last_exit_type is rewritten */
> >  	kvmppc_update_timing_stats(vcpu);
> > @@ -887,6 +889,15 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
> >  	/* restart interrupts if they were meant for the host */
> >  	kvmppc_restart_interrupt(vcpu, exit_nr);
> >  
> > +	/*
> > +	 * get last instruction before beeing preempted
> > +	 * TODO: for e6500 check also BOOKE_INTERRUPT_LRAT_ERROR & ESR_DATA
> > +	 */
> > +	if (exit_nr == BOOKE_INTERRUPT_DATA_STORAGE ||
> > +	    exit_nr == BOOKE_INTERRUPT_DTLB_MISS ||
> > +	    exit_nr == BOOKE_INTERRUPT_HV_PRIV)
>
> Please make this a switch() - that's easier to read.
>
> > +		emulated = kvmppc_get_last_inst(vcpu, false, &last_inst);
> > +
> >  	local_irq_enable();
> >  
> >  	trace_kvm_exit(exit_nr, vcpu);
> > @@ -895,6 +906,26 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
> >  	run->exit_reason = KVM_EXIT_UNKNOWN;
> >  	run->ready_for_interrupt_injection = 1;
> >  
> > +	switch (emulated) {
> > +	case EMULATE_AGAIN:
> > +		r = RESUME_GUEST;
> > +		goto out;
> > +
> > +	case EMULATE_FAIL:
> > +		pr_debug("%s: emulation at %lx failed (%08x)\n",
> > +		       __func__, vcpu->arch.pc, last_inst);
> > +		/* For debugging, encode the failing instruction and
> > +		 * report it to userspace. */
> > +		run->hw.hardware_exit_reason = ~0ULL << 32;
> > +		run->hw.hardware_exit_reason |= last_inst;
> > +		kvmppc_core_queue_program(vcpu, ESR_PIL);
> > +		r = RESUME_HOST;
> > +		goto out;
> > +
> > +	default:
> > +		break;
> > +	}
>
> I think you can just put this into a function.
>
> Scott, I think the patch overall looks quite good. Can you please check
> as well and if you agree give it your reviewed-by?
>
> Mike, when Scott gives you a reviewed-by, please include it for the next
> version.
>
> Alex
>
> > +
> >  	switch (exit_nr) {
> >  	case BOOKE_INTERRUPT_MACHINE_CHECK:
> >  		printk("MACHINE CHECK: %lx\n", mfspr(SPRN_MCSR));
> > @@ -1184,6 +1215,7 @@ int