Re: [RFC] virtio: Support releasing lock during kick
On Fri, Jun 25, 2010 at 4:09 AM, Rusty Russell <ru...@rustcorp.com.au> wrote:
> On Thu, 24 Jun 2010 03:00:30 pm Stefan Hajnoczi wrote:
>> On Wed, Jun 23, 2010 at 11:12 PM, Anthony Liguori <anth...@codemonkey.ws> wrote:
>>> Shouldn't it be possible to just drop the lock before invoking
>>> virtqueue_kick() and reacquire it afterwards?  There's nothing in that
>>> virtqueue_kick() path that the lock is protecting AFAICT.
>>
>> No, that would lead to a race condition because vq->num_added is
>> modified by both virtqueue_add_buf_gfp() and virtqueue_kick().  Without
>> a lock held during virtqueue_kick() another vcpu could add bufs while
>> vq->num_added is used and cleared by virtqueue_kick():
>
> Right, this dovetails with another proposed change (was it Michael?)
> where we would update the avail idx inside add_buf, rather than waiting
> until kick.  This means a barrier inside add_buf, but that's probably
> fine.
>
> If we do that, then we don't need a lock on virtqueue_kick.

That would be nice; we could push the change up into just virtio-blk.

I did wonder if virtio-net can take advantage of unlocked kick, too, but
haven't investigated yet.  The virtio-net kick in start_xmit() happens
with the netdev _xmit_lock held.  Any ideas?

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
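Stefan's race can be made concrete with a small user-space model (struct and function names here are illustrative, not the real drivers/virtio code). If virtqueue_kick() runs without the lock, an add_buf() that slips in between kick's read and clear of num_added is silently discarded:

```c
/* Toy model of the virtqueue bookkeeping under discussion.
 * num_added counts buffers queued since the last kick; kick() reads it
 * to advance the published avail index, then clears it. */
struct vq_model {
    int avail_idx;   /* index published to the host */
    int num_added;   /* buffers added since last kick */
};

static void add_buf(struct vq_model *vq)
{
    vq->num_added++;
}

/* The interleaving Stefan describes, once the lock is dropped around
 * virtqueue_kick(): vcpu0's kick reads num_added, vcpu1 adds a buffer,
 * vcpu0 clears num_added -- vcpu1's buffer is never published. */
static int buffers_lost_by_unlocked_kick(void)
{
    struct vq_model vq = {0, 0};
    int added = 0;

    add_buf(&vq); added++;           /* vcpu0, under lock */
    int snapshot = vq.num_added;     /* vcpu0: kick() reads num_added == 1 */
    add_buf(&vq); added++;           /* vcpu1: concurrent, lock not held */
    vq.avail_idx += snapshot;        /* vcpu0: publishes only one buffer */
    vq.num_added = 0;                /* vcpu0: clears, discarding vcpu1's */

    return added - vq.avail_idx;     /* buffers added but never published */
}
```

The model plays the interleaving out sequentially; on real SMP hardware the same window exists whenever kick's read-modify-clear of num_added is not serialized against add_buf.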
RE: qemu fail to parse command line with -pcidevice 00:19.0
Thanks, Mark.

-----Original Message-----
From: Markus Armbruster [mailto:arm...@redhat.com]
Sent: June 25, 2010 12:58
To: Hao, Xudong
Cc: qemu-de...@nongnu.org; aligu...@us.ibm.com; kvm@vger.kernel.org
Subject: Re: qemu fail to parse command line with -pcidevice 00:19.0

Hao, Xudong <xudong@intel.com> writes:

>> Work-around: -device pci-assign,host=00:19.1
>
> OK, this new way works when creating a guest with static assignment.
> But how to hot add a PCI device to the guest?  The old hot add command
> "pci_add pci_addr=auto host host=00:19.0" has the same parse error.

Command line's -device becomes monitor's device_add:

    device_add pci-assign,host=00:19.1

> BTW: if we use "-net none" in the qemu command, the guest can not be
> created and no error is printed.  Do you have a plan to fix this parse
> issue?

Separate issue.  Fix posted:

    Subject: [Qemu-devel] [PATCH] net: Fix VM start with '-net none'
    Date: Tue, 15 Jun 2010 13:30:39 +0530
    Message-Id: 22a96312232a0458fc04268b79d17828c824df42.1276588830.git.amit.s...@redhat.com

You could have found this yourself :)
Re: qemu fail to parse command line with -pcidevice 00:19.0
(2010/06/24 15:08), Markus Armbruster wrote:
> Note to qemu-devel: this issue is qemu-kvm only.
>
> Hao, Xudong <xudong@intel.com> writes:
>
>> When assigning one PCI device, qemu fails to parse the command line:
>>
>>     qemu-system_x86 -smp 2 -m 1024 -hda /path/to/img -pcidevice host=00:19.0
>>
>> Error:
>>
>>     qemu-system-x86_64: Parameter 'id' expects an identifier
>>     Identifiers consist of letters, digits, '-', '.', '_', starting with a letter.
>>     pcidevice argument parse error; please check the help text for usage
>>     Could not add assigned device host=00:19.0
>>
>> https://bugs.launchpad.net/qemu/+bug/597932
>>
>> This issue is caused by qemu-kvm commit b560a9ab9be06afcbb78b3791ab836dad208a239.
>
> The bug is in add_assigned_device():
>
>     r = get_param_value(id, sizeof(id), "id", arg);
>     if (!r)
>         r = get_param_value(id, sizeof(id), "name", arg);
>     if (!r)
>         r = get_param_value(id, sizeof(id), "host", arg);
>
> We end up with invalid ID 00:19.0.
...

Are there any strong reasons why we cannot use ':' in the identifier?

Thanks,
H.Seto
Re: Graphical virtualisation management system
On Thu, Jun 24, 2010 at 02:01:52PM -0500, Javier Guerra Giraldez wrote:
> On Thu, Jun 24, 2010 at 1:32 PM, Freddie Cash <fjwc...@gmail.com> wrote:
>> * virt-manager which requires X and seems to be more desktop-oriented;
>
> don't know about the others, but virt-manager runs only on the admin
> station.  on the VM hosts you run only libvirtd, which doesn't need X

While it can connect to remote systems, it seems totally unusable for that
to me.  For one thing, working over higher-latency links like DSL or even
transatlantic links seems to be almost impossible.

Second, I still haven't figured out how to install and manage a system
using the serial console with KVM, which certainly contributes to the
complete lack of usability above.
[PATCH] KVM: VMX: Execute WBINVD to keep data consistency with assigned devices
Some guest device drivers may leverage Non-Snoop I/O, and explicitly
WBINVD or CLFLUSH a RAM region.  Since migration may occur before the
WBINVD or CLFLUSH, we need to maintain data consistency either by:
1: flushing the cache (wbinvd) when the guest is scheduled out, if there
   is no wbinvd exit, or
2: executing wbinvd on all dirty physical CPUs when the guest's wbinvd
   exits.

For wbinvd-VMExit-capable processors, we issue IPIs to all physical CPUs
to do wbinvd, for we can't easily tell which physical CPUs are dirty.

Signed-off-by: Yaozu (Eddie) Dong <eddie.d...@intel.com>
Signed-off-by: Sheng Yang <sh...@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |    3 +++
 arch/x86/kvm/emulate.c          |    5 -
 arch/x86/kvm/svm.c              |    6 ++
 arch/x86/kvm/vmx.c              |   27 ++-
 arch/x86/kvm/x86.c              |    6 ++
 5 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a57cdea..1c392c9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -514,6 +514,8 @@ struct kvm_x86_ops {
 	void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry);
 
+	void (*execute_wbinvd)(struct kvm_vcpu *vcpu);
+
 	const struct trace_print_flags *exit_reasons_str;
 };
@@ -571,6 +573,7 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
 int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address);
 int emulate_clts(struct kvm_vcpu *vcpu);
+int emulate_wbinvd(struct kvm_vcpu *vcpu);
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
 int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index abb8cec..085dcb7 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3138,8 +3138,11 @@ twobyte_insn:
 		emulate_clts(ctxt->vcpu);
 		c->dst.type = OP_NONE;
 		break;
-	case 0x08: /* invd */
 	case 0x09: /* wbinvd */
+		emulate_wbinvd(ctxt->vcpu);
+		c->dst.type = OP_NONE;
+		break;
+	case 0x08: /* invd */
 	case 0x0d: /* GrpP (prefetch) */
 	case 0x18: /* Grp16 (prefetch/nop) */
 		c->dst.type = OP_NONE;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 587b99d..6929da1 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3424,6 +3424,10 @@ static bool svm_rdtscp_supported(void)
 	return false;
 }
 
+static void svm_execute_wbinvd(struct kvm_vcpu *vcpu)
+{
+}
+
 static void svm_fpu_deactivate(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -3508,6 +3512,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.rdtscp_supported = svm_rdtscp_supported,
 
 	.set_supported_cpuid = svm_set_supported_cpuid,
+
+	.execute_wbinvd = svm_execute_wbinvd,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e565689..063002c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -412,6 +412,12 @@ static inline bool cpu_has_virtual_nmis(void)
 	return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline bool cpu_has_wbinvd_exit(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_WBINVD_EXITING;
+}
+
 static inline bool report_flexpriority(void)
 {
 	return flexpriority_enabled;
@@ -874,6 +880,11 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx)
 	preempt_enable();
 }
 
+static void wbinvd_ipi(void *opaque)
+{
+	wbinvd();
+}
+
 /*
  * Switches to specified vcpu, until a matching vcpu_put(), but assumes
  * vcpu mutex is already taken.
@@ -905,6 +916,12 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 			       &per_cpu(vcpus_on_cpu, cpu));
 		local_irq_enable();
 
+	/* Issue WBINVD in case guest has executed it */
+	if (!cpu_has_wbinvd_exit() && vcpu->kvm->arch.iommu_domain &&
+	    vcpu->cpu != -1)
+		smp_call_function_single(vcpu->cpu,
+					 wbinvd_ipi, NULL, 1);
+
 	vcpu->cpu = cpu;
 	/*
 	 * Linux uses per-cpu TSS and GDT, so set these when switching
@@ -3397,10 +3414,16 @@ static int handle_invlpg(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static void vmx_execute_wbinvd(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->kvm->arch.iommu_domain)
+		smp_call_function(wbinvd_ipi, NULL, 1);
+}
+
 static int handle_wbinvd(struct kvm_vcpu *vcpu)
 {
 	skip_emulated_instruction(vcpu);
-	/* TODO: Add support for VT-d/pass-through device */
+	vmx_execute_wbinvd(vcpu);
 	return 1;
 }
@@ -4350,6 +4373,8 @@ static struct
Re: qemu fail to parse command line with -pcidevice 00:19.0
Hidetoshi Seto <seto.hideto...@jp.fujitsu.com> writes:

> (2010/06/24 15:08), Markus Armbruster wrote:
>> Note to qemu-devel: this issue is qemu-kvm only.
>>
>> Hao, Xudong <xudong@intel.com> writes:
>>
>>> When assigning one PCI device, qemu fails to parse the command line:
>>>
>>>     qemu-system_x86 -smp 2 -m 1024 -hda /path/to/img -pcidevice host=00:19.0
>>>
>>> Error:
>>>
>>>     qemu-system-x86_64: Parameter 'id' expects an identifier
>>>     Identifiers consist of letters, digits, '-', '.', '_', starting with a letter.
>>>     pcidevice argument parse error; please check the help text for usage
>>>     Could not add assigned device host=00:19.0
>>>
>>> https://bugs.launchpad.net/qemu/+bug/597932
>>>
>>> This issue is caused by qemu-kvm commit b560a9ab9be06afcbb78b3791ab836dad208a239.
>>
>> The bug is in add_assigned_device():
>>
>>     r = get_param_value(id, sizeof(id), "id", arg);
>>     if (!r)
>>         r = get_param_value(id, sizeof(id), "name", arg);
>>     if (!r)
>>         r = get_param_value(id, sizeof(id), "host", arg);
>>
>> We end up with invalid ID 00:19.0.
...
> Are there any strong reasons why we cannot use ':' in the identifier?

Paul Brook (cc'ed) objected.
Re: [PATCH] KVM: VMX: Execute WBINVD to keep data consistency with assigned devices
Sheng Yang wrote:
> Some guest device drivers may leverage Non-Snoop I/O, and explicitly
> WBINVD or CLFLUSH a RAM region.  Since migration may occur before the
> WBINVD or CLFLUSH, we need to maintain data consistency either by:
> 1: flushing the cache (wbinvd) when the guest is scheduled out, if there
>    is no wbinvd exit, or
> 2: executing wbinvd on all dirty physical CPUs when the guest's wbinvd
>    exits.
>
> For wbinvd-VMExit-capable processors, we issue IPIs to all physical CPUs
> to do wbinvd, for we can't easily tell which physical CPUs are dirty.

wbinvd is a heavy weapon in the hands of a guest.  Even if it is limited
to pass-through scenarios, do we really need to bother all physical host
CPUs with potential multi-millisecond stalls?  Think of VMs only running
on a subset of CPUs (e.g. to isolate latency sources).

I would suggest to track the physical CPU usage of VCPUs between two
wbinvd requests and only send the wbinvd IPI to that set.

Also, I think the code is still too much vmx-focused.  Only the trapping
should be vendor-specific, the rest generic.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
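Jan's suggestion can be sketched in user space with a plain bitmask standing in for a kernel cpumask (all names here are hypothetical, not the eventual KVM implementation): remember every physical CPU the VCPU has run on since the last guest WBINVD, and on the next WBINVD flush only that set instead of broadcasting:

```c
#include <stdint.h>

/* Model of tracking which physical CPUs a VCPU has touched between two
 * guest WBINVD requests; bit n set means CPU n may hold dirty cache
 * lines for this guest.  A real version would use cpumask_t. */
struct vcpu_model {
    uint64_t dirty_cpus;
};

/* Called whenever the VCPU is loaded onto a physical CPU. */
static void vcpu_load(struct vcpu_model *v, int cpu)
{
    v->dirty_cpus |= 1ULL << cpu;    /* remember where we ran */
}

/* On a guest WBINVD exit: return the mask of CPUs that would receive
 * the wbinvd IPI, then reset the tracking -- only these CPUs can have
 * dirty lines, so the rest of the machine is left alone. */
static uint64_t guest_wbinvd(struct vcpu_model *v)
{
    uint64_t targets = v->dirty_cpus;
    v->dirty_cpus = 0;               /* caches flushed, start over */
    return targets;
}
```

The win is that a VM pinned to, say, two CPUs stalls only those two, not the latency-isolated cores Jan mentions.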
Re: Graphical virtualisation management system
On 06/25/10 09:05, Christoph Hellwig wrote:
> While it can connect to remote systems it seems totally unusable for that
> to me.  For one thing working over higher-latency links like DSL or even
> transatlantic links seems to be almost impossible.

Works, but is quite slow indeed.  Also virt-manager's remote host support
works OK for a small number of hosts, but if you want to manage dozens of
them it becomes unusable.

> Second I still haven't figured out how to install and manage a system
> using the serial console with KVM, which certainly contributes to the
> complete lack of usability above.

Serial console support doesn't work for remote connections.  Dunno whether
that is a restriction of virt-manager or the underlying libvirt.

cheers,
Gerd
Re: Graphical virtualisation management system
On Fri, Jun 25, 2010 at 11:07:26AM +0200, Gerd Hoffmann wrote:
> On 06/25/10 09:05, Christoph Hellwig wrote:
>> Second I still haven't figured out how to install and manage a system
>> using the serial console with KVM, which certainly contributes to the
>> complete lack of usability above.
>
> Serial console support doesn't work for remote connections.  Dunno
> whether that is a restriction of virt-manager or the underlying libvirt.

libvirt, kvm, virt-manager -- arguably all of them :-)

We really need to either tunnel the character device backend streams over
VNC, or add a remote streams access API to libvirt, or virt-manager could
do an ssh tunnel.  VNC tunnelling is what I'd really like to do because
that gives a solution that can work with even normal VNC clients like
Vinagre.

Daniel
--
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org   -o-   http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: Graphical virtualisation management system
Hi,

> We want to move to a multi-tiered, SAN-based virtualisation setup, but
> can't find a VM management tool that handles both KVM and Xen (we have
> some old Opteron hardware that doesn't support SVM), and does not
> require Linux from end-to-end.
>
> For example, we want to run FreeBSD + ZFS on our storage servers,
> exporting storage via iSCSI (or NFS).  We want to run a minimal
> Debian/Ubuntu install on the VM hosts (just to boot and run the
> management agents), with all of the VMs getting their storage via
> iSCSI.  With a separate box acting as the management system.
> Preferably with a web-based management GUI, but that's more of a
> nice-to-have than a hard requirement.
>
> So far, I've looked at:
> * oVirt which requires Fedora/CentOS/RedHat on everything;

NFS/iSCSI being hosted on non-Linux shouldn't be a problem I think; at
least the underlying libvirt handles this just fine and I can't see a
reason why oVirt shouldn't (don't know oVirt in detail, although I've
played with it a bit a while ago).

To manage the hosts oVirt wants to have some oVirt bits running on them.
Porting them to Debian should be possible.  But as the stuff interacts
with the distro bootup scripts it is most likely noticeably more work
than just compile+install.

cheers,
Gerd
Re: Graphical virtualisation management system
On Fri, Jun 25, 2010 at 03:05:42AM -0400, Christoph Hellwig wrote:
> While it can connect to remote systems it seems totally unusable for that
> to me.  For one thing working over higher-latency links like DSL or even
> transatlantic links seems to be almost impossible.

It is fair to say that virt-manager is not really targeted at high-latency
WAN scenarios.  It is really aimed at small-scale local LAN deployments
with 5-20 hosts maximum.  For a serious WAN deployment you can't use the
hub-and-spoke synchronous RPC architecture; instead you need an
asynchronous message bus, which is where something like oVirt or RHEV is
best.  So I'd agree that you shouldn't use virt-manager across high-latency
DSL or transatlantic links; just use it in your local home or office LAN.

Regards,
Daniel
--
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org   -o-   http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: [PATCH 5/5] KVM test: Make it possible to run VMs without NICs
On 06/25/2010 02:33 AM, Lucas Meneghel Rodrigues wrote:
> For unittesting, for example, it is desirable that we run the VM with
> the bare minimum number of parameters.  This fix allows that.
>
> Signed-off-by: Lucas Meneghel Rodrigues <l...@redhat.com>
> ---
>  client/tests/kvm/kvm_vm.py |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
> index 7b1fc05..3c01fa0 100755
> --- a/client/tests/kvm/kvm_vm.py
> +++ b/client/tests/kvm/kvm_vm.py
> @@ -118,8 +118,9 @@ class VM:
>          self.root_dir = root_dir
>          self.address_cache = address_cache
>          self.netdev_id = []
> -        for nic in params.get("nics").split():
> -            self.netdev_id.append(kvm_utils.generate_random_id())
> +        if params.get("nics"):
> +            for nic in params.get("nics").split():

That's exactly what kvm_utils.get_sub_dict_names() does.  It may be a
long name for something so simple, but it's used everywhere in
kvm-autotest.

> +                self.netdev_id.append(kvm_utils.generate_random_id())

I think the 3 lines above belong in VM.create(), not VM.__init__(),
because VM params are routinely changed in calls to VM.create().  If the
code stays in __init__() the changed params will not affect
self.netdev_id.  A good place for it would be near the code that handles
-redir.

>          # Find a unique identifier for this VM
>          while True:
[PATCH] sched: export sched_set/getaffinity (was Re: [PATCH 3/3] vhost: apply cpumask and cgroup to vhost pollers)
On Thu, Jun 24, 2010 at 03:45:51PM -0700, Sridhar Samudrala wrote:
> On Thu, 2010-06-24 at 11:11 +0300, Michael S. Tsirkin wrote:
>> On Sun, May 30, 2010 at 10:25:01PM +0200, Tejun Heo wrote:
>>> Apply the cpumask and cgroup of the initializing task to the created
>>> vhost poller.
>>>
>>> Based on Sridhar Samudrala's patch.
>>>
>>> Cc: Michael S. Tsirkin <m...@redhat.com>
>>> Cc: Sridhar Samudrala <samudrala.srid...@gmail.com>
>>
>> I wanted to apply this, but modpost fails:
>> ERROR: "sched_setaffinity" [drivers/vhost/vhost_net.ko] undefined!
>> ERROR: "sched_getaffinity" [drivers/vhost/vhost_net.ko] undefined!
>> Did you try building as a module?
>
> In my original implementation, I had these calls in workqueue.c.  Now
> that these are moved to vhost.c, which can be built as a module, these
> symbols need to be exported.  The following patch fixes the build issue
> with vhost as a module.
>
> Signed-off-by: Sridhar Samudrala <s...@us.ibm.com>

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

Works for me.  To simplify dependencies, I'd like to queue this together
with the vhost patches through net-next.  Ack to this?
diff --git a/kernel/sched.c b/kernel/sched.c
index 3c2a54f..15a0c6f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4837,6 +4837,7 @@ out_put_task:
 	put_online_cpus();
 	return retval;
 }
+EXPORT_SYMBOL_GPL(sched_setaffinity);
 
 static int get_user_cpu_mask(unsigned long __user *user_mask_ptr, unsigned len,
 			     struct cpumask *new_mask)
@@ -4900,6 +4901,7 @@ out_unlock:
 	return retval;
 }
+EXPORT_SYMBOL_GPL(sched_getaffinity);
 
 /**
  * sys_sched_getaffinity - get the cpu affinity of a process
---
 drivers/vhost/vhost.c | 36 +++-
 1 file changed, 31 insertions(+), 5 deletions(-)

Index: work/drivers/vhost/vhost.c
===================================================================
--- work.orig/drivers/vhost/vhost.c
+++ work/drivers/vhost/vhost.c
@@ -23,6 +23,7 @@
 #include <linux/highmem.h>
 #include <linux/slab.h>
 #include <linux/kthread.h>
+#include <linux/cgroup.h>
 
 #include <linux/net.h>
 #include <linux/if_packet.h>
@@ -176,12 +177,30 @@ repeat:
 long vhost_dev_init(struct vhost_dev *dev,
 		    struct vhost_virtqueue *vqs, int nvqs)
 {
-	struct task_struct *poller;
-	int i;
+	struct task_struct *poller = NULL;
+	cpumask_var_t mask;
+	int i, ret = -ENOMEM;
+
+	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+		goto out;
 
 	poller = kthread_create(vhost_poller, dev, "vhost-%d", current->pid);
-	if (IS_ERR(poller))
-		return PTR_ERR(poller);
+	if (IS_ERR(poller)) {
+		ret = PTR_ERR(poller);
+		goto out;
+	}
+
+	ret = sched_getaffinity(current->pid, mask);
+	if (ret)
+		goto out;
+
+	ret = sched_setaffinity(poller->pid, mask);
+	if (ret)
+		goto out;
+
+	ret = cgroup_attach_task_current_cg(poller);
+	if (ret)
+		goto out;
 
 	dev->vqs = vqs;
 	dev->nvqs = nvqs;
@@ -202,7 +221,14 @@ long vhost_dev_init(struct vhost_dev *de
 		vhost_poll_init(&dev->vqs[i].poll,
 				dev->vqs[i].handle_kick, POLLIN, dev);
 	}
-	return 0;
+
+	wake_up_process(poller);	/* avoid contributing to loadavg */
+	ret = 0;
+out:
+	if (ret)
+		kthread_stop(poller);
+	free_cpumask_var(mask);
+	return ret;
 }
 
 /* Caller should have device mutex */
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
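The pattern the patch implements in the kernel, copying the creator's CPU affinity onto a newly spawned worker, has a user-space analogue with the glibc scheduling and pthread APIs. This is an illustrative sketch of the same idea, not the vhost code itself (pthread_attr_setaffinity_np is a GNU extension):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static void *worker(void *arg)
{
    /* A real poller would loop servicing work here. */
    return NULL;
}

/* Create a thread and give it the caller's current CPU affinity,
 * mirroring what vhost_dev_init() does for its poller kthread. */
static int spawn_with_inherited_affinity(pthread_t *tid)
{
    cpu_set_t mask;
    pthread_attr_t attr;
    int ret;

    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask))   /* 0 = this task */
        return -1;

    pthread_attr_init(&attr);
    /* Hand the mask to the new thread before it starts running. */
    pthread_attr_setaffinity_np(&attr, sizeof(mask), &mask);
    ret = pthread_create(tid, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
    return ret;
}
```

In the kernel the equivalent calls are sched_getaffinity()/sched_setaffinity() on the kthread's pid, which is exactly why the patch has to export those symbols for a modular vhost.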
Re: [PATCH] KVM: VMX: Execute WBINVD to keep data consistency with assigned devices
On Friday 25 June 2010 16:54:19 Jan Kiszka wrote:
> wbinvd is a heavy weapon in the hands of a guest.  Even if it is limited
> to pass-through scenarios, do we really need to bother all physical host
> CPUs with potential multi-millisecond stalls?  Think of VMs only running
> on a subset of CPUs (e.g. to isolate latency sources).
>
> I would suggest to track the physical CPU usage of VCPUs between two
> wbinvd requests and only send the wbinvd IPI to that set.

OK, I would try to make it more specific (and complex)...

> Also, I think the code is still too much vmx-focused.  Only the trapping
> should be vendor-specific, the rest generic.

OK, I would consider it.

--
regards
Yang, Sheng
Re: UIO interrupts being lost
On Thu, Jun 24, 2010 at 05:43:15PM -0600, Cam Macdonell wrote:
> Hi Michael,
>
> I'm trying to write a uio driver for my shared memory device for KVM,
> and I'm running into a situation where several interrupts in quick
> succession are not all triggering the callback function in my kernel
> UIO driver, say 2 out of 5.  My driver does not set the Interrupt
> Disable bit and, if it helps, I'm using MSI-X interrupts.
>
> Even without the interrupt disable bit set, is there still a window
> where successive interrupts can be lost if they arrive too quickly?
>
> Thanks,
> Cam

Yes, I think so: if an interrupt is delivered while the ISR is running,
it gets queued, but a second one gets lost.  A queueing mechanism is
necessary to avoid losing information; e.g. virtio implements exactly
that.  Why don't you reuse virtio for signalling?  If I understand what
Anthony said correctly, he objected to the specific implementation, not
to the idea of reusing the virtio spec and code.

--
MST
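The general defence against coalesced edges is to make the interrupt path only increment a counter and let the consumer process the delta, so back-to-back interrupts accumulate instead of vanishing (this is also the idea behind the event count UIO exposes through read()). A toy model of that accounting, with illustrative names:

```c
#include <stdint.h>

/* Model of counter-based interrupt accounting: the "ISR" only bumps a
 * counter, and the consumer processes the delta since its last look,
 * so interrupts that arrive while the consumer is busy are counted
 * rather than lost. */
struct evt_model {
    uint32_t irq_count;   /* incremented once per interrupt */
    uint32_t seen;        /* consumer's last observed value */
};

static void isr(struct evt_model *e)
{
    e->irq_count++;       /* cheap, never drops information */
}

/* Returns how many interrupts fired since the last call.  Unsigned
 * subtraction keeps this correct across counter wraparound. */
static uint32_t consume(struct evt_model *e)
{
    uint32_t pending = e->irq_count - e->seen;
    e->seen = e->irq_count;
    return pending;
}
```

What a counter cannot preserve is per-interrupt payload; if each interrupt carries data, you need a real queue (descriptor ring), which is the virtio suggestion above.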
Re: [RFC] virtio: Support releasing lock during kick
On Fri, Jun 25, 2010 at 12:39:21PM +0930, Rusty Russell wrote:
> On Thu, 24 Jun 2010 03:00:30 pm Stefan Hajnoczi wrote:
>> On Wed, Jun 23, 2010 at 11:12 PM, Anthony Liguori <anth...@codemonkey.ws> wrote:
>>> Shouldn't it be possible to just drop the lock before invoking
>>> virtqueue_kick() and reacquire it afterwards?  There's nothing in that
>>> virtqueue_kick() path that the lock is protecting AFAICT.
>>
>> No, that would lead to a race condition because vq->num_added is
>> modified by both virtqueue_add_buf_gfp() and virtqueue_kick().  Without
>> a lock held during virtqueue_kick() another vcpu could add bufs while
>> vq->num_added is used and cleared by virtqueue_kick():
>
> Right, this dovetails with another proposed change (was it Michael?)
> where we would update the avail idx inside add_buf, rather than waiting
> until kick.  This means a barrier inside add_buf, but that's probably
> fine.
>
> If we do that, then we don't need a lock on virtqueue_kick.
> Michael, thoughts?

Maybe not even that: I think we could just do virtio_wmb() in add, and
keep the mb() in kick.  What I'm a bit worried about is contention on
the cacheline including index and flags: the more we write to that line,
the worse it gets.  So we need to test the performance impact of this
change; I didn't find time to do this yet, as I am trying to finalize
the used index publishing patches.  Any takers?  Do we see a performance
improvement after making kick lockless?

> Thanks,
> Rusty.
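Michael's variant, a write barrier in add and the full barrier kept in kick, could look roughly like this against a simplified ring. This is a sketch of the idea, not the actual virtio_ring.c patch; the GCC atomic fences stand in for virtio_wmb()/virtio_mb():

```c
#include <stdint.h>

#define QSZ 256

/* Simplified guest-side view of the avail ring. */
struct ring_model {
    uint16_t ring[QSZ];            /* descriptor heads visible to the host */
    volatile uint16_t avail_idx;   /* published index */
};

/* Publish each buffer immediately: write the ring entry, then a write
 * barrier, then bump avail_idx.  No num_added state survives past
 * add_buf, so kick no longer needs the queue lock to protect it. */
static void add_buf(struct ring_model *r, uint16_t head)
{
    r->ring[r->avail_idx % QSZ] = head;
    __atomic_thread_fence(__ATOMIC_RELEASE);   /* stand-in for virtio_wmb() */
    r->avail_idx++;
}

/* kick is reduced to the notification.  A full barrier orders our index
 * writes against reading the host's notification-suppression flag,
 * which is elided in this model. */
static void kick(struct ring_model *r)
{
    __atomic_thread_fence(__ATOMIC_SEQ_CST);   /* stand-in for virtio_mb() */
    /* notify(r) would go here */
    (void)r;
}
```

The cost Michael flags is visible here: avail_idx is now written once per buffer instead of once per batch, so the shared cacheline holding the index bounces more often under load.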
Re: [PATCH 0/3][RFC] NUMA: add host side pinning
Jes Sorensen wrote:
> On 06/24/10 13:34, Andre Przywara wrote:
>> Avi Kivity wrote:
>>> On 06/24/2010 01:58 PM, Andre Przywara wrote:
>>> Non-anonymous memory doesn't work well with ksm and transparent
>>> hugepages.  Is it possible to use anonymous memory rather than file
>>> backed?
>>
>> I'd prefer non-file backed, too.  But that is how the current huge
>> pages implementation is done.  We could use MAP_HUGETLB and declare
>> NUMA _and_ huge pages as 2.6.32+ only.  Unfortunately I didn't find an
>> easy way to detect the presence of the MAP_HUGETLB flag.  If the kernel
>> does not support it, it seems that mmap silently ignores it and uses
>> 4KB pages instead.
>
> Bit behind on the mailing list, but I think this looks very promising.
> I really think it makes more sense to make QEMU aware of the NUMA setup
> as well, rather than relying on numactl to do the work outside.
>
> One thing you need to consider is what happens with migration once a
> user specifies -numa.  IMHO it is acceptable to simply disable migration
> for the given guest.

Is that really a problem?  You create the guest on the target with a NUMA
setup specific to the target machine.  That could mean that you pin
multiple guest nodes to the same host node, but that shouldn't break
anything, right?  The guest part can (and should be!) migrated along with
all the other device state.  I think this is still missing from the
current implementation.

> Cheers,
> Jes
>
> PS: Are you planning on submitting anything to Linux Plumbers Conference
> about this? :)

Yes, I was planning to submit a proposal, as I saw NUMA mentioned in the
topics list.  AFAIK the deadline is July 19th, right?  That gives me
another week after my vacation (for which I leave in a few minutes).

Regards,
Andre.

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
On Fri, Jun 25, 2010 at 09:03:46AM +0800, Dong, Eddie wrote:
> Herbert Xu wrote:
>> On Wed, Jun 23, 2010 at 06:05:41PM +0800, Dong, Eddie wrote:
>>> I mean once the frontend side driver posts the buffers to the backend
>>> driver, the backend driver will immediately use those buffers to
>>> compose skbs or gro_frags and post them to the assigned host NIC
>>> driver as receive buffers.  In that case, if the backend driver
>>> receives a packet from the NIC that requires a copy, it may be unable
>>> to find an additional free guest buffer because all of them are
>>> already used by the NIC driver.  We have to reserve some guest buffers
>>> for the possible copy, even if the buffer address is not identified by
>>> the original skb :(
>>
>> OK I see what you mean.  Can you tell me how Xiaohui's previous
>> patch-set deals with this problem?
>
> In the current patch, each SKB for the assigned device (SRIOV VF or NIC
> or a complete queue pair) uses a buffer from the guest, so it eliminates
> copying completely in software and requires hardware to do so.  If we
> can have an additional place to store the buffer per skb (which may
> cause a copy later on), we can do the copy later on or re-post the
> buffer to the assigned NIC driver later on.  But that may be not very
> clean either :(
>
> BTW, some hardware may require a certain level of packet copying, such
> as for broadcast packets in very old VMDq devices, which is not
> addressed in Xiaohui's previous patch yet.  We may address this by
> implementing an additional virtqueue between guest and host for the
> slow path (broadcast packets only here), with additional complexity in
> the FE/BE drivers.
>
> Thx, Eddie

The guest posts a large number of buffers to the host.  The host can use
them any way it wants to, and in any order; for example, it can reserve
half the buffers for the copy.  This might waste some memory if buffers
are used only partially, but let's worry about this later.
--
MST
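Michael's suggestion, holding back part of the guest-posted pool for the copy path, can be modelled as a split free list (names and the half-and-half policy are illustrative only):

```c
#define POSTED 8

/* Model of partitioning guest-posted receive buffers: some go to the
 * NIC for zero-copy DMA, the rest are held back for packets that must
 * be copied (e.g. broadcasts on hardware that can't steer them). */
struct pool_model {
    int nic_free;    /* buffers handed to the NIC driver */
    int copy_free;   /* buffers reserved for the copy path */
};

static void pool_init(struct pool_model *p, int posted)
{
    p->copy_free = posted / 2;        /* example policy: reserve half */
    p->nic_free = posted - p->copy_free;
}

/* Take a buffer for the copy path; returns 0 on success, -1 when the
 * reservation is exhausted.  Without the reservation, every buffer
 * could already be pinned under the NIC and the copy would be stuck. */
static int take_copy_buf(struct pool_model *p)
{
    if (p->copy_free == 0)
        return -1;
    p->copy_free--;
    return 0;
}
```

This is exactly the trade-off noted in the thread: the reservation guarantees forward progress for the slow path at the cost of shrinking the pool available for zero-copy receive.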
[PATCH v2 1/10] KVM: MMU: fix writable sync sp mapping
While we sync an unsync sp, we may map a spte writable.  This is
dangerous if one unsync sp's mapped gfn is another unsync page's gfn.
For example, take two unsync pages SP1, SP2 where:

    SP1.pte[0] = P
    SP2.gfn's pfn = P
    [so SP1.pte[0] maps to SP2.gfn's pfn]

First, we unsync SP2: it will be write-protected; since SP1.pte[0] maps
to this page, it will be marked read-only.  Then, we unsync SP1:
SP1.pte[0] may be marked writable.  Now, we can write SP2.gfn through
the SP1.pte[0] mapping.

This bug corrupts the guest's page tables.  Fix it by marking the
mapping read-only if the mapped gfn has a shadow page.

Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
 arch/x86/kvm/mmu.c | 14 --
 1 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 045a0f9..556a798 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1810,11 +1810,14 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 	bool need_unsync = false;
 
 	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) {
+		if (!can_unsync)
+			return 1;
+
 		if (s->role.level != PT_PAGE_TABLE_LEVEL)
 			return 1;
 
 		if (!need_unsync && !s->unsync) {
-			if (!can_unsync || !oos_shadow)
+			if (!oos_shadow)
 				return 1;
 			need_unsync = true;
 		}
@@ -1877,15 +1880,6 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (!tdp_enabled && !(pte_access & ACC_WRITE_MASK))
 		spte &= ~PT_USER_MASK;
 
-	/*
-	 * Optimization: for pte sync, if spte was writable the hash
-	 * lookup is unnecessary (and expensive). Write protection
-	 * is responsibility of mmu_get_page / kvm_sync_page.
-	 * Same reasoning can be applied to dirty page accounting.
-	 */
-	if (!can_unsync && is_writable_pte(*sptep))
-		goto set_pte;
-
 	if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
 		pgprintk("%s: found shadow page for %lx, marking ro\n",
 			 __func__, gfn);
--
1.6.1.2
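The rule the patch enforces, never map a gfn writable during sync while that gfn still backs a shadow page, regardless of what the old spte said, can be modelled in miniature. This is a heavy simplification of mmu_need_write_protect() with hypothetical helpers, meant only to show the decision, not KVM's actual data structures:

```c
#include <stdbool.h>

#define MAX_SP 16

/* Toy shadow-page table: the set of gfns that currently back a
 * shadow page (i.e. are themselves guest page tables). */
static unsigned long shadow_gfns[MAX_SP];
static int nr_sp;

static bool gfn_has_shadow_page(unsigned long gfn)
{
    for (int i = 0; i < nr_sp; i++)
        if (shadow_gfns[i] == gfn)
            return true;
    return false;
}

/* Post-fix behaviour: during sync (can_unsync == false), any gfn that
 * still has a shadow page must be mapped read-only -- even if the old
 * spte was writable.  The removed "spte was writable" shortcut is what
 * let SP1 map SP2's gfn writable in the scenario above. */
static bool need_write_protect(unsigned long gfn, bool can_unsync)
{
    return gfn_has_shadow_page(gfn) && !can_unsync;
}
```

In the real code the can_unsync==true case may unsync the page instead of write-protecting it; that branch is elided here.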
[PATCH v2 2/10] KVM: MMU: fix conflict access permissions in direct sp
In non-direct mapping, we mark an sp as 'direct' when it maps a guest large page, but its access is encoded from the upper page-structure entries only, not including the last mapping; this can cause an access conflict. For example, with this mapping:

         / PDE1 [W] \
  P [W] -            - LPA
         \ PDE2 [R] /

P has two children, PDE1 and PDE2, and both PDE1 and PDE2 map the same large page (LPA). P's access is RW, PDE1's access is RW, PDE2's access is RO (considering only read/write permissions here).

When the guest accesses through PDE1, we create a direct sp for LPA. The sp's access comes from P, i.e. W, so we mark the ptes in this sp writable. Then the guest accesses through PDE2: we find LPA's shadow page — the same one as PDE1's — and mark the ptes RO. So when the guest next writes through PDE1, an incorrect #PF occurs.

Fix this by encoding the last mapping's access into the direct shadow page. This also cleans up the code that directly reads the last level's dirty flag.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/paging_tmpl.h | 9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 37c26cb..e46eb8a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -306,6 +306,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	gfn_t table_gfn;
 	int r;
 	int level;
+	bool dirty = is_dirty_gpte(gw->ptes[gw->level-1]);
 	pt_element_t curr_pte;
 	struct kvm_shadow_walk_iterator iterator;
 
@@ -319,8 +320,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			mmu_set_spte(vcpu, sptep, access,
 				     gw->pte_access & access,
 				     user_fault, write_fault,
-				     is_dirty_gpte(gw->ptes[gw->level-1]),
-				     ptwrite, level,
+				     dirty, ptwrite, level,
 				     gw->gfn, pfn, false, true);
 			break;
 		}
@@ -335,10 +335,11 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		}
 
 		if (level <= gw->level) {
-			int delta = level - gw->level + 1;
 			direct = 1;
-			if (!is_dirty_gpte(gw->ptes[level - delta]))
+			if (!dirty)
 				access &= ~ACC_WRITE_MASK;
+			access &= gw->pte_access;
+
 			/*
 			 * It is a large guest pages backed by small host pages,
 			 * So we set @direct(@sp->role.direct)=1, and set
-- 
1.6.1.2
[PATCH v2 4/10] KVM: MMU: fix forgotten TLB flush for all vcpus
After removing a rmap, we should flush every vcpu's TLB.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c | 2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0412ba4..f151540 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1933,6 +1933,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			pgprintk("hfn old %lx new %lx\n",
 				 spte_to_pfn(*sptep), pfn);
 			rmap_remove(vcpu->kvm, sptep);
+			__set_spte(sptep, shadow_trap_nonpresent_pte);
+			kvm_flush_remote_tlbs(vcpu->kvm);
 		} else
 			was_rmapped = 1;
 	}
-- 
1.6.1.2
[PATCH v2 6/10] KVM: MMU: introduce gfn_to_hva_many() function
This function not only returns the gfn's hva but also the number of pages remaining in the slot starting at @gfn. It will be used by a later patch.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      | 13 ++++++++++++-
 2 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 515fefd..8f7af32 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -289,6 +289,7 @@ void kvm_disable_largepages(void);
 void kvm_arch_flush_shadow(struct kvm *kvm);
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
+unsigned long gfn_to_hva_many(struct kvm *kvm, gfn_t gfn, int *entry);
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
 void kvm_release_page_clean(struct page *page);
 void kvm_release_page_dirty(struct page *page);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 60bb3d5..a007889 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -913,15 +913,26 @@ static unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
 	return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE;
 }
 
-unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
+unsigned long gfn_to_hva_many(struct kvm *kvm, gfn_t gfn, int *entry)
 {
 	struct kvm_memory_slot *slot;
 
 	slot = gfn_to_memslot(kvm, gfn);
+
 	if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
 		return bad_hva();
+
+	if (entry)
+		*entry = slot->npages - (gfn - slot->base_gfn);
+
 	return gfn_to_hva_memslot(slot, gfn);
 }
+EXPORT_SYMBOL_GPL(gfn_to_hva_many);
+
+unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
+{
+	return gfn_to_hva_many(kvm, gfn, NULL);
+}
 EXPORT_SYMBOL_GPL(gfn_to_hva);
 
 static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic)
-- 
1.6.1.2
[PATCH v2 3/10] KVM: MMU: fix direct sp's access corruption
Consider using small pages to back a guest large-page mapping: if the mapping is writable but the dirty flag is not set, we find the read-only direct sp and set up the mapping; then, when a write #PF occurs, we mark this mapping writable in the read-only direct sp. Now other, genuinely read-only mappings can happily write through it without a #PF. This can break the guest's COW.

Fix this by re-installing the mapping when the write #PF occurs.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c         |  3 ++-
 arch/x86/kvm/paging_tmpl.h | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 556a798..0412ba4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -153,7 +153,8 @@ module_param(oos_shadow, bool, 0644);
 #define CREATE_TRACE_POINTS
 #include "mmutrace.h"
 
-#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
+#define SPTE_HOST_WRITEABLE	(1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
+#define SPTE_NO_DIRTY		(2ULL << PT_FIRST_AVAIL_BITS_SHIFT)
 
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index e46eb8a..fdba751 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -325,6 +325,20 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			break;
 		}
 
+		if (*sptep & SPTE_NO_DIRTY) {
+			struct kvm_mmu_page *child;
+
+			WARN_ON(level != gw->level);
+			WARN_ON(!is_shadow_present_pte(*sptep));
+			if (dirty) {
+				child = page_header(*sptep & PT64_BASE_ADDR_MASK);
+				mmu_page_remove_parent_pte(child, sptep);
+				__set_spte(sptep, shadow_trap_nonpresent_pte);
+				kvm_flush_remote_tlbs(vcpu->kvm);
+			}
+		}
+
 		if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
 			continue;
 
@@ -365,6 +379,10 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			}
 		}
 
+		if (level == gw->level && !dirty &&
+		    access & gw->pte_access & ACC_WRITE_MASK)
+			spte |= SPTE_NO_DIRTY;
+
 		spte = __pa(sp->spt)
 			| PT_PRESENT_MASK | PT_ACCESSED_MASK
 			| PT_WRITABLE_MASK | PT_USER_MASK;
-- 
1.6.1.2
[PATCH v2 5/10] KVM: MMU: introduce gfn_to_pfn_atomic() function
Introduce gfn_to_pfn_atomic(), it's the fast path and can used in atomic context, the later patch will use it Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/mm/gup.c|2 ++ include/linux/kvm_host.h |1 + virt/kvm/kvm_main.c | 32 +--- 3 files changed, 28 insertions(+), 7 deletions(-) diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c index 738e659..0c9034b 100644 --- a/arch/x86/mm/gup.c +++ b/arch/x86/mm/gup.c @@ -6,6 +6,7 @@ */ #include linux/sched.h #include linux/mm.h +#include linux/module.h #include linux/vmstat.h #include linux/highmem.h @@ -274,6 +275,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write, return nr; } +EXPORT_SYMBOL_GPL(__get_user_pages_fast); /** * get_user_pages_fast() - pin user pages in memory diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 9289d1a..515fefd 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -295,6 +295,7 @@ void kvm_release_page_dirty(struct page *page); void kvm_set_page_dirty(struct page *page); void kvm_set_page_accessed(struct page *page); +pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn); pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn); pfn_t gfn_to_pfn_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 885d3f5..60bb3d5 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -924,19 +924,25 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn) } EXPORT_SYMBOL_GPL(gfn_to_hva); -static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr) +static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic) { struct page *page[1]; int npages; pfn_t pfn; - might_sleep(); - - npages = get_user_pages_fast(addr, 1, 1, page); + if (atomic) + npages = __get_user_pages_fast(addr, 1, 1, page); + else { + might_sleep(); + npages = get_user_pages_fast(addr, 1, 1, page); + } if (unlikely(npages != 1)) { struct vm_area_struct *vma; + if (atomic) + goto 
return_bad_page; + down_read(current-mm-mmap_sem); if (is_hwpoison_address(addr)) { up_read(current-mm-mmap_sem); @@ -949,6 +955,7 @@ static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr) if (vma == NULL || addr vma-vm_start || !(vma-vm_flags VM_PFNMAP)) { up_read(current-mm-mmap_sem); +return_bad_page: get_page(bad_page); return page_to_pfn(bad_page); } @@ -962,7 +969,7 @@ static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr) return pfn; } -pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) +pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic) { unsigned long addr; @@ -972,7 +979,18 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) return page_to_pfn(bad_page); } - return hva_to_pfn(kvm, addr); + return hva_to_pfn(kvm, addr, atomic); +} + +pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn) +{ + return __gfn_to_pfn(kvm, gfn, true); +} +EXPORT_SYMBOL_GPL(gfn_to_pfn_atomic); + +pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) +{ + return __gfn_to_pfn(kvm, gfn, false); } EXPORT_SYMBOL_GPL(gfn_to_pfn); @@ -980,7 +998,7 @@ pfn_t gfn_to_pfn_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn) { unsigned long addr = gfn_to_hva_memslot(slot, gfn); - return hva_to_pfn(kvm, addr); + return hva_to_pfn(kvm, addr, false); } struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn) -- 1.6.1.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 7/10] KVM: MMU: introduce mmu_topup_memory_cache_atomic()
Introduce mmu_topup_memory_cache_atomic(), which supports topping up a memory cache in atomic context.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c | 29 +++++++++++++++++++++++++----
 1 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f151540..6c0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -291,15 +291,16 @@ static void __set_spte(u64 *sptep, u64 spte)
 #endif
 }
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
-				  struct kmem_cache *base_cache, int min)
+static int __mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
+				    struct kmem_cache *base_cache, int min,
+				    int max, gfp_t flags)
 {
 	void *obj;
 
 	if (cache->nobjs >= min)
 		return 0;
-	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-		obj = kmem_cache_zalloc(base_cache, GFP_KERNEL);
+	while (cache->nobjs < max) {
+		obj = kmem_cache_zalloc(base_cache, flags);
 		if (!obj)
 			return -ENOMEM;
 		cache->objects[cache->nobjs++] = obj;
@@ -307,6 +308,26 @@ static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
 	return 0;
 }
 
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
+				  struct kmem_cache *base_cache, int min)
+{
+	return __mmu_topup_memory_cache(cache, base_cache, min,
+					ARRAY_SIZE(cache->objects), GFP_KERNEL);
+}
+
+static int mmu_topup_memory_cache_atomic(struct kvm_mmu_memory_cache *cache,
+					 struct kmem_cache *base_cache, int min)
+{
+	return __mmu_topup_memory_cache(cache, base_cache, min, min,
+					GFP_ATOMIC);
+}
+
+static int pte_prefetch_topup_memory_cache(struct kvm_vcpu *vcpu, int num)
+{
+	return mmu_topup_memory_cache_atomic(&vcpu->arch.mmu_rmap_desc_cache,
+					     rmap_desc_cache, num);
+}
+
 static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc,
 				  struct kmem_cache *cache)
-- 
1.6.1.2
[PATCH v2 8/10] KVM: MMU: prefetch ptes when intercepting guest #PF
Support prefetch ptes when intercept guest #PF, avoid to #PF by later access If we meet any failure in the prefetch path, we will exit it and not try other ptes to avoid become heavy path Note: this speculative will mark page become dirty but it not really accessed, the same issue is in other speculative paths like invlpg, pte write, fortunately, it just affect host memory management. After Avi's patchset named [PATCH v2 1/4] KVM: MMU: Introduce drop_spte() merged, we will easily fix it. Will do it in the future. Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/kvm/mmu.c | 69 + arch/x86/kvm/paging_tmpl.h | 74 2 files changed, 143 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 6c0..b2ad723 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -89,6 +89,8 @@ module_param(oos_shadow, bool, 0644); } #endif +#define PTE_PREFETCH_NUM 16 + #define PT_FIRST_AVAIL_BITS_SHIFT 9 #define PT64_SECOND_AVAIL_BITS_SHIFT 52 @@ -1998,6 +2000,72 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu) { } +static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, + struct kvm_mmu_page *sp, + u64 *start, u64 *end) +{ + gfn_t gfn; + struct page *pages[PTE_PREFETCH_NUM]; + + if (pte_prefetch_topup_memory_cache(vcpu, end - start)) + return -1; + + gfn = sp-gfn + start - sp-spt; + while (start end) { + unsigned long addr; + int entry, j, ret; + + addr = gfn_to_hva_many(vcpu-kvm, gfn, entry); + if (kvm_is_error_hva(addr)) + return -1; + + entry = min(entry, (int)(end - start)); + ret = __get_user_pages_fast(addr, entry, 1, pages); + if (ret = 0) + return -1; + + for (j = 0; j ret; j++, gfn++, start++) + mmu_set_spte(vcpu, start, ACC_ALL, +sp-role.access, 0, 0, 1, NULL, +sp-role.level, gfn, +page_to_pfn(pages[j]), true, false); + + if (ret entry) + return -1; + } + return 0; +} + +static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) +{ + struct kvm_mmu_page *sp; + u64 *start = NULL; + int index, i, max; + + 
sp = page_header(__pa(sptep)); + WARN_ON(!sp-role.direct); + + if (sp-role.level PT_PAGE_TABLE_LEVEL) + return; + + index = sptep - sp-spt; + i = index ~(PTE_PREFETCH_NUM - 1); + max = index | (PTE_PREFETCH_NUM - 1); + + for (; i max; i++) { + u64 *spte = sp-spt + i; + + if (*spte != shadow_trap_nonpresent_pte || spte == sptep) { + if (!start) + continue; + if (direct_pte_prefetch_many(vcpu, sp, start, spte) 0) + break; + start = NULL; + } else if (!start) + start = spte; + } +} + static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, int level, gfn_t gfn, pfn_t pfn) { @@ -2012,6 +2080,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, 0, write, 1, pt_write, level, gfn, pfn, false, true); ++vcpu-stat.pf_fixed; + direct_pte_prefetch(vcpu, iterator.sptep); break; } diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index fdba751..134f031 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -291,6 +291,79 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, gpte_to_gfn(gpte), pfn, true, true); } +static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, u64 *sptep) +{ + struct kvm_mmu_page *sp; + pt_element_t gptep[PTE_PREFETCH_NUM]; + gpa_t first_pte_gpa; + int offset = 0, index, i, j, max; + + sp = page_header(__pa(sptep)); + index = sptep - sp-spt; + + if (sp-role.level PT_PAGE_TABLE_LEVEL) + return; + + if (sp-role.direct) + return direct_pte_prefetch(vcpu, sptep); + + index = sptep - sp-spt; + i = index ~(PTE_PREFETCH_NUM - 1); + max = index | (PTE_PREFETCH_NUM - 1); + + if (PTTYPE == 32) + offset = sp-role.quadrant PT64_LEVEL_BITS; + + first_pte_gpa = gfn_to_gpa(sp-gfn) + + (offset + i) * sizeof(pt_element_t); + + if
[PATCH v2 10/10] KVM: MMU: trace pte prefetch
Trace pte prefetch, it can help us to improve the prefetch's performance Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/kvm/mmu.c | 42 +- arch/x86/kvm/mmutrace.h| 33 + arch/x86/kvm/paging_tmpl.h | 29 ++--- 3 files changed, 88 insertions(+), 16 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index b2ad723..bcf4626 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -91,6 +91,12 @@ module_param(oos_shadow, bool, 0644); #define PTE_PREFETCH_NUM 16 +#define PREFETCH_SUCCESS 0 +#define PREFETCH_ERR_GFN2PFN 1 +#define PREFETCH_ERR_ALLOC_MEM 2 +#define PREFETCH_ERR_RSVD_BITS_SET 3 +#define PREFETCH_ERR_MMIO 4 + #define PT_FIRST_AVAIL_BITS_SHIFT 9 #define PT64_SECOND_AVAIL_BITS_SHIFT 52 @@ -2002,13 +2008,16 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu) static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, - u64 *start, u64 *end) + u64 *start, u64 *end, u64 address) { gfn_t gfn; struct page *pages[PTE_PREFETCH_NUM]; - if (pte_prefetch_topup_memory_cache(vcpu, end - start)) + if (pte_prefetch_topup_memory_cache(vcpu, end - start)) { + trace_pte_prefetch(true, address, 0, + PREFETCH_ERR_ALLOC_MEM); return -1; + } gfn = sp-gfn + start - sp-spt; while (start end) { @@ -2016,27 +2025,40 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, int entry, j, ret; addr = gfn_to_hva_many(vcpu-kvm, gfn, entry); - if (kvm_is_error_hva(addr)) + if (kvm_is_error_hva(addr)) { + trace_pte_prefetch(true, address, 0, + PREFETCH_ERR_MMIO); return -1; + } entry = min(entry, (int)(end - start)); ret = __get_user_pages_fast(addr, entry, 1, pages); - if (ret = 0) + if (ret = 0) { + trace_pte_prefetch(true, address, 0, + PREFETCH_ERR_GFN2PFN); return -1; + } - for (j = 0; j ret; j++, gfn++, start++) + for (j = 0; j ret; j++, gfn++, start++) { + trace_pte_prefetch(true, address, 0, + PREFETCH_SUCCESS); mmu_set_spte(vcpu, start, ACC_ALL, sp-role.access, 0, 0, 1, NULL, sp-role.level, gfn, 
page_to_pfn(pages[j]), true, false); + } - if (ret entry) + if (ret entry) { + trace_pte_prefetch(true, address, 0, + PREFETCH_ERR_GFN2PFN); return -1; + } } return 0; } -static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) +static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep, + u64 addr) { struct kvm_mmu_page *sp; u64 *start = NULL; @@ -2058,7 +2080,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) if (*spte != shadow_trap_nonpresent_pte || spte == sptep) { if (!start) continue; - if (direct_pte_prefetch_many(vcpu, sp, start, spte) 0) + if (direct_pte_prefetch_many(vcpu, sp, start, + spte, addr) 0) break; start = NULL; } else if (!start) @@ -2080,7 +2103,8 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, 0, write, 1, pt_write, level, gfn, pfn, false, true); ++vcpu-stat.pf_fixed; - direct_pte_prefetch(vcpu, iterator.sptep); + direct_pte_prefetch(vcpu, iterator.sptep, + gfn PAGE_SHIFT); break; } diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h index 3aab0f0..c07b6a6 100644 --- a/arch/x86/kvm/mmutrace.h +++ b/arch/x86/kvm/mmutrace.h @@ -195,6 +195,39 @@ DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_prepare_zap_page, TP_ARGS(sp) ); + +#define pte_prefetch_err \ + {PREFETCH_SUCCESS, SUCCESS
[ANNOUNCE] kvm-kmod-2.6.35-rc3
No pending KVM patches for upcoming 2.6.35, so let's give it a try in the form of a release candidate.

Major KVM changes since kvm-kmod-2.6.34:
- lots of x86 emulator fixes and improvements
- timekeeping (kvm-clock) improvements
- SVM: nesting correctness and performance improvements
- tons of clean-ups and smaller fixes

kvm-kmod changes:
- expand relative kernel paths

You can download this version from
http://downloads.sourceforge.net/project/kvm/kvm-kmod/2.6.35-rc3/kvm-kmod-2.6.35-rc3.tar.bz2
[PATCH v2] KVM: VMX: Execute WBINVD to keep data consistency with assigned devices
Some guest device driver may leverage the Non-Snoop I/O, and explicitly WBINVD or CLFLUSH to a RAM space. Since migration may occur before WBINVD or CLFLUSH, we need to maintain data consistency either by: 1: flushing cache (wbinvd) when the guest is scheduled out if there is no wbinvd exit, or 2: execute wbinvd on all dirty physical CPUs when guest wbinvd exits. Signed-off-by: Yaozu (Eddie) Dong eddie.d...@intel.com Signed-off-by: Sheng Yang sh...@linux.intel.com --- Jan- I've check if we can make it more generic. But the logic here heavily depends on if processor have WBINVD exit feature, and the common part with SVM is no more than 10 lines, all in the branch of if statement. So I think it's fine to keep them there. Maybe wbinvd_ipi() can be moved, but it's somehow strange for KVM scope. Any suggestion to make this wrap function more clean? I hope we have an marco can do that... arch/x86/include/asm/kvm_host.h |3 ++ arch/x86/kvm/emulate.c |5 +++- arch/x86/kvm/svm.c |6 + arch/x86/kvm/vmx.c | 45 ++- arch/x86/kvm/x86.c |6 + 5 files changed, 63 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a57cdea..1c392c9 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -514,6 +514,8 @@ struct kvm_x86_ops { void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry); + void (*execute_wbinvd)(struct kvm_vcpu *vcpu); + const struct trace_print_flags *exit_reasons_str; }; @@ -571,6 +573,7 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu); int kvm_emulate_halt(struct kvm_vcpu *vcpu); int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address); int emulate_clts(struct kvm_vcpu *vcpu); +int emulate_wbinvd(struct kvm_vcpu *vcpu); void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index abb8cec..085dcb7 100644 --- 
a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3138,8 +3138,11 @@ twobyte_insn: emulate_clts(ctxt-vcpu); c-dst.type = OP_NONE; break; - case 0x08: /* invd */ case 0x09: /* wbinvd */ + emulate_wbinvd(ctxt-vcpu); + c-dst.type = OP_NONE; + break; + case 0x08: /* invd */ case 0x0d: /* GrpP (prefetch) */ case 0x18: /* Grp16 (prefetch/nop) */ c-dst.type = OP_NONE; diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 587b99d..6929da1 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3424,6 +3424,10 @@ static bool svm_rdtscp_supported(void) return false; } +static void svm_execute_wbinvd(struct kvm_vcpu *vcpu) +{ +} + static void svm_fpu_deactivate(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); @@ -3508,6 +3512,8 @@ static struct kvm_x86_ops svm_x86_ops = { .rdtscp_supported = svm_rdtscp_supported, .set_supported_cpuid = svm_set_supported_cpuid, + + .execute_wbinvd = svm_execute_wbinvd, }; static int __init svm_init(void) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index e565689..fd6c7e6 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -29,6 +29,7 @@ #include linux/ftrace_event.h #include linux/slab.h #include linux/tboot.h +#include linux/cpumask.h #include kvm_cache_regs.h #include x86.h @@ -170,6 +171,8 @@ struct vcpu_vmx { u32 exit_reason; bool rdtscp_enabled; + + cpumask_t wbinvd_dirty_mask; }; static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu) @@ -412,6 +415,12 @@ static inline bool cpu_has_virtual_nmis(void) return vmcs_config.pin_based_exec_ctrl PIN_BASED_VIRTUAL_NMIS; } +static inline bool cpu_has_vmx_wbinvd_exit(void) +{ + return vmcs_config.cpu_based_2nd_exec_ctrl + SECONDARY_EXEC_WBINVD_EXITING; +} + static inline bool report_flexpriority(void) { return flexpriority_enabled; @@ -874,6 +883,11 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx) preempt_enable(); } +static void wbinvd_ipi(void *opaque) +{ + wbinvd(); +} + /* * Switches to specified vcpu, until a matching vcpu_put(), but 
assumes * vcpu mutex is already taken. @@ -905,6 +919,15 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) per_cpu(vcpus_on_cpu, cpu)); local_irq_enable(); + /* Address WBINVD may be executed by guest */ + if (vcpu-kvm-arch.iommu_domain) { + if (cpu_has_vmx_wbinvd_exit()) + cpu_set(cpu, vmx-wbinvd_dirty_mask); + else if (vcpu-cpu
[ kvm-Bugs-2001121 ] Windows 2003 x64 - SESSION5_INITIALIZATION_FAILED
Bugs item #2001121, was opened at 2008-06-23 21:09 Message generated for change (Comment added) made by jessorensen You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2001121group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: intel Group: None Status: Closed Resolution: Fixed Priority: 5 Private: No Submitted By: Andreas 'ac0v' Specht (ac0v) Assigned to: Nobody/Anonymous (nobody) Summary: Windows 2003 x64 - SESSION5_INITIALIZATION_FAILED Initial Comment: Host Machine: CPU:2x Intel(R) Xeon(R) CPU E5405 @ 2.00GHz Kernel: Linux version 2.6.25-gentoo-r4 Arch: x86_64 KVM:tried kvm-69 and kvm-70 Guest System: tried Windows 2003 x64 and Windows 2003 x64 with slipstreamed Service Pack 2 Hi, I get a BSoD (see attachment) while installing Windows 2003 x64 which contains the error message SESSION5_INITIALIZATION_FAILED Serial log is empty. I start my KVM via this command: kvm -hda /dev/lvg1/sap-test -boot d -cdrom /srv/install/iso/windows/2003-server-x64.iso -vnc :4 -m 3048 -smp 4 -daemonize Using -no-kvm or the -no-kvm-pit switch doesn't help and shows only the message Setup is starting Windows. The -no-kvm-irqchip switch has no effect (same BSoD). Any Ideas? Regards, Andreas 'ac0v' Specht -- Comment By: Jes Sorensen (jessorensen) Date: 2010-06-25 15:32 Message: Windows 2003 x64 r2 installs and boots just fine with a 2.6.32 kernel and qemu-kvm based on 0.12.1, smp 4, 3072MB. The problem seems to be have been resolved in some of the emulator updates that went in since you tried. If you do see this problem again, please open a new bug in launchpad. Closing. Jes -- Comment By: MaSc82 (masc82) Date: 2009-01-09 16:02 Message: The issue persists with kvm-82 modules. Neither win2003 x64 r2 CD nor installed system will boot, failing with BSOD SESSION5_INITIALIZATION_FAILED. 
Had to revert to older 2.6.28 modules having block virtio disabled again :( -- Comment By: MaSc82 (masc82) Date: 2008-12-25 17:35 Message: Updated to 2.6.28 including kvm modules, which seem to work very well with kvm81, at the same time supporting win2003 x64, so all mentioned issues are resolved for me, but only when using the kvm modules of linux kernel 2.6.28. -- Comment By: MaSc82 (masc82) Date: 2008-12-22 16:58 Message: I've got the same issue with kvm-81 and Linux version 2.6.27-gentoo-r7. The problem does not occur when using the kvm modules coming with the kernel, but these (probably older?) modules still have bugs with smp and block device virtio (temporary freeze).. Can anyone shed some light on this, please? -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2001121group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] KVM: VMX: Execute WBINVD to keep data consistency with assigned devices
Sheng Yang wrote: Some guest device driver may leverage the Non-Snoop I/O, and explicitly WBINVD or CLFLUSH to a RAM space. Since migration may occur before WBINVD or CLFLUSH, we need to maintain data consistency either by: 1: flushing cache (wbinvd) when the guest is scheduled out if there is no wbinvd exit, or 2: execute wbinvd on all dirty physical CPUs when guest wbinvd exits. Signed-off-by: Yaozu (Eddie) Dong eddie.d...@intel.com Signed-off-by: Sheng Yang sh...@linux.intel.com --- Jan- I've check if we can make it more generic. But the logic here heavily depends on if processor have WBINVD exit feature, and the common part with SVM is no more than 10 lines, all in the branch of if statement. AFAIK, all AMD processors with SVM support have wbinvd trapping. So you can simply move the VMX part which deals with cpu_has_vmx_wbinvd_exit into generic services to call them from SVM as well. Or is wbinvd emulation for device pass-through an Intel-only issue? Joerg? So I think it's fine to keep them there. Maybe wbinvd_ipi() can be moved, but it's somehow strange for KVM scope. Any suggestion to make this wrap function more clean? I hope we have an marco can do that... 
arch/x86/include/asm/kvm_host.h |3 ++ arch/x86/kvm/emulate.c |5 +++- arch/x86/kvm/svm.c |6 + arch/x86/kvm/vmx.c | 45 ++- arch/x86/kvm/x86.c |6 + 5 files changed, 63 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a57cdea..1c392c9 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -514,6 +514,8 @@ struct kvm_x86_ops { void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry); + void (*execute_wbinvd)(struct kvm_vcpu *vcpu); + const struct trace_print_flags *exit_reasons_str; }; @@ -571,6 +573,7 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu); int kvm_emulate_halt(struct kvm_vcpu *vcpu); int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address); int emulate_clts(struct kvm_vcpu *vcpu); +int emulate_wbinvd(struct kvm_vcpu *vcpu); void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index abb8cec..085dcb7 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3138,8 +3138,11 @@ twobyte_insn: emulate_clts(ctxt-vcpu); c-dst.type = OP_NONE; break; - case 0x08: /* invd */ case 0x09: /* wbinvd */ + emulate_wbinvd(ctxt-vcpu); + c-dst.type = OP_NONE; + break; + case 0x08: /* invd */ case 0x0d: /* GrpP (prefetch) */ case 0x18: /* Grp16 (prefetch/nop) */ c-dst.type = OP_NONE; diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 587b99d..6929da1 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3424,6 +3424,10 @@ static bool svm_rdtscp_supported(void) return false; } +static void svm_execute_wbinvd(struct kvm_vcpu *vcpu) +{ +} + static void svm_fpu_deactivate(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); @@ -3508,6 +3512,8 @@ static struct kvm_x86_ops svm_x86_ops = { .rdtscp_supported = svm_rdtscp_supported, .set_supported_cpuid = svm_set_supported_cpuid, + 
+ .execute_wbinvd = svm_execute_wbinvd, }; static int __init svm_init(void) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index e565689..fd6c7e6 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -29,6 +29,7 @@ #include linux/ftrace_event.h #include linux/slab.h #include linux/tboot.h +#include linux/cpumask.h #include kvm_cache_regs.h #include x86.h @@ -170,6 +171,8 @@ struct vcpu_vmx { u32 exit_reason; bool rdtscp_enabled; + + cpumask_t wbinvd_dirty_mask; }; static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu) @@ -412,6 +415,12 @@ static inline bool cpu_has_virtual_nmis(void) return vmcs_config.pin_based_exec_ctrl PIN_BASED_VIRTUAL_NMIS; } +static inline bool cpu_has_vmx_wbinvd_exit(void) +{ + return vmcs_config.cpu_based_2nd_exec_ctrl + SECONDARY_EXEC_WBINVD_EXITING; +} + static inline bool report_flexpriority(void) { return flexpriority_enabled; @@ -874,6 +883,11 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx) preempt_enable(); } +static void wbinvd_ipi(void *opaque) +{ + wbinvd(); +} + /* * Switches to specified vcpu, until a matching vcpu_put(), but assumes * vcpu mutex is already taken. @@ -905,6 +919,15 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu,
[ kvm-Bugs-1949429 ] Windows XP 2003 - 64-bit Editions may FAIL during setup
Bugs item #1949429, was opened at 2008-04-23 09:40
Message generated for change (Comment added) made by jessorensen
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1949429&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: Windows XP 2003 - 64-bit Editions may FAIL during setup

Initial Comment:
Windows XP 2003 - 64-bit Editions may FAIL during setup. The guest OS gets
stuck during the second-stage setup (graphical stage) and proceeds nowhere.
I must kill the VM manually and restart setup from scratch.

Reproducible: Sometimes.

It applies to all KVM-60 series (from KVM-60 up to KVM-67) on Intel. Other
KVM versions below and above may be affected as well. I do not have any
debug output, because it is hard to reproduce.

-Alexey Technologov, 23.04.2008.

--

Comment By: Jes Sorensen (jessorensen)
Date: 2010-06-25 15:34

Message:
Hi,

Are you still seeing this, or can we close the bug? I just ran a 2003x64
install test here and encountered no problems, but your report states it
only happens sometimes?

Thanks,
Jes

--

Comment By: Technologov (technologov)
Date: 2008-08-03 10:38

Message:
Logged In: YES
user_id=1839746
Originator: YES

Still happens with KVM-71.

-Alexey, 3.8.2008.

--

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1949429&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2034672 ] guest: BUG: soft lockup - CPU#0 stuck for 41s!
Bugs item #2034672, was opened at 2008-08-01 08:22 Message generated for change (Comment added) made by jessorensen You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2034672group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Rafal Wijata (ravpl) Assigned to: Nobody/Anonymous (nobody) Summary: guest: BUG: soft lockup - CPU#0 stuck for 41s! Initial Comment: host: kvm71, 64bit 2.6.25.11-60.fc8, 8Gram, 2*E5420(8cores), 3ware raid10 guest: 64bit 2.6.18-92.1.6.el5, 5Gram, 6cpus, hdd on raw file. I know this bug happens even in non-virtual machines(browsing internet shows that clearly), but inside kvm I'm getting excessive rate of this bug (under load, even few times a hour) An example can be found at end of this message. The record was something over 500 seconds !! Now, I suspect it has something to do with the network or net driver. There's almost always either swapper or network service in the backtrace. But I cannot confirm for surely. BUG: soft lockup - CPU#0 stuck for 41s! 
[events/0:20] CPU 0: Modules linked in: nfsd exportfs auth_rpcgss ipv6 xfrm_nalgo crypto_api autofs4 nfs lockd fscache nfs_acl sunrpc dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac lp floppy loop ide_cd parport_pc i2c_piix4 serio_raw parport cdrom i2c_core e1000 pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd Pid: 20, comm: events/0 Not tainted 2.6.18-92.1.6.el5 #1 RIP: 0010:[80011ec7] [80011ec7] __do_softirq+0x53/0xd6 RSP: 0018:80418f60 EFLAGS: 0206 RAX: 0002 RBX: 803b6f80 RCX: 0380 RDX: 81015f9e7fd8 RSI: 0280 RDI: 81015f9d97a0 RBP: 80418ee0 R08: 0001 R09: 810080bf5000 R10: 0046 R11: 0246 R12: 8005dc8e R13: 0002 R14: 80077090 R15: 80418ee0 FS: () GS:8039f000() knlGS: CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b CR2: 00300c203080 CR3: 00015df0a000 CR4: 06e0 Call Trace: IRQ [8005e2fc] call_softirq+0x1c/0x28 [8006c6e4] do_softirq+0x2c/0x85 [8005dc8e] apic_timer_interrupt+0x66/0x6c EOI [80064af8] _spin_unlock_irqrestore+0x8/0x9 [880fdc61] :e1000:e1000_update_stats+0x5f6/0x5fd [88101ed5] :e1000:e1000_watchdog_task+0x535/0x65a [8004cea9] run_workqueue+0x94/0xe4 [800497be] worker_thread+0x0/0x122 [800498ae] worker_thread+0xf0/0x122 [8008ad76] default_wake_function+0x0/0xe [8003253d] kthread+0xfe/0x132 [8005dfb1] child_rip+0xa/0x11 [8003243f] kthread+0x0/0x132 [8005dfa7] child_rip+0x0/0x11 BUG: soft lockup - CPU#2 stuck for 17s! 
[swapper:0] CPU 2: Modules linked in: nfsd exportfs auth_rpcgss ipv6 xfrm_nalgo crypto_api autofs4 nfs lockd fscache nfs_acl sunrpc dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac lp floppy loop ide_cd parport_pc i2c_piix4 serio_raw parport cdrom i2c_core e1000 pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd Pid: 0, comm: swapper Not tainted 2.6.18-92.1.6.el5 #1 RIP: 0010:[8006aed7] [8006aed7] default_idle+0x29/0x50 RSP: 0018:810104e63ef0 EFLAGS: 0246 RAX: RBX: 0002 RCX: RDX: RSI: 0001 RDI: 802e6658 RBP: 810104e1d270 R08: 810104e62000 R09: 003e R10: 810104f64038 R11: R12: 0c51b3f5 R13: 3434e623bb62 R14: 81015f9db7e0 R15: 810104e1d080 FS: () GS:810104e1cec0() knlGS: CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b CR2: 2b3a3b647230 CR3: 00201000 CR4: 06e0 Call Trace: [80048b1d] cpu_idle+0x95/0xb8 [800767da] start_secondary+0x45a/0x469 -- Comment By: Jes Sorensen (jessorensen) Date: 2010-06-25 15:44 Message: Hi, Looking through old bugs. Do you still see this problem or can we close the bug? I believe a lot of these problems have been fixed in more recent KVM, but if you could let us know that would be great. Thanks, Jes -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2034672group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-1817779 ] KVM crash with Windows XP guest because of ACPI
Bugs item #1817779, was opened at 2007-10-22 13:02 Message generated for change (Settings changed) made by jessorensen You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=1817779group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Closed Resolution: Fixed Priority: 5 Private: No Submitted By: Technologov (technologov) Assigned to: Nobody/Anonymous (nobody) Summary: KVM crash with Windows XP guest because of ACPI Initial Comment: Host: Fedora7, 64-bit, Intel CPU, KVM-48. When I start Windows XP guest, that was installed with ACPI enabled, without ACPI in KVM, KVM crashes. The command is: [alex...@pink-intel ~]$ ./qemu-kvm -hda /isos/disks-vm/alexeye/WindowsXP-Pro.vmdk -m 512 -no-acpi With -no-kvm it stucks, but not crashes. The same crash happens with -no-acpi -no-kvm-irqchip parameters. -Technologov -- Comment By: Jes Sorensen (jessorensen) Date: 2010-06-25 15:55 Message: With recent KVM kernel 2.6.32 and qemu-kvm 0.12.1, an XP guest installed with ACPI no longer takes down qemu-kvm when booted with the -no-acpi flag. As expected Windows refuses to boot and offers safe mode and then bails, since too many system parameters is changed, but qemu-kvm survives it fine. Closing Jes -- Comment By: Jes Sorensen (jessorensen) Date: 2010-06-11 11:12 Message: Hi, Looking through old bugs - please let us know if this still happens with recent QEMU/KVM. If not, lets close this bug. Thanks, Jes -- Comment By: argoo (argoo) Date: 2007-10-26 19:41 Message: Logged In: YES user_id=865799 Originator: NO I recommend following this workaround... http://kvm.qumranet.com/kvmwiki/Windows_ACPI_Workaround -- Comment By: Technologov (technologov) Date: 2007-10-22 15:01 Message: Logged In: YES user_id=1839746 Originator: YES Attached stack with unhandled vm exit. 
-- Comment By: Technologov (technologov) Date: 2007-10-22 13:18 Message: Logged In: YES user_id=1839746 Originator: YES Attached stack with unhandled vm exit. -- Comment By: Technologov (technologov) Date: 2007-10-22 13:03 Message: Logged In: YES user_id=1839746 Originator: YES File Added: KVM48-VMX64-WindowsXP-no-acpi.txt -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=1817779group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] virtio: Support releasing lock during kick
On Fri, Jun 25, 2010 at 01:43:17PM +0300, Michael S. Tsirkin wrote:
On Fri, Jun 25, 2010 at 12:39:21PM +0930, Rusty Russell wrote:
On Thu, 24 Jun 2010 03:00:30 pm Stefan Hajnoczi wrote:
On Wed, Jun 23, 2010 at 11:12 PM, Anthony Liguori anth...@codemonkey.ws wrote:

Shouldn't it be possible to just drop the lock before invoking
virtqueue_kick() and reacquire it afterwards? There's nothing in that
virtqueue_kick() path that the lock is protecting AFAICT.

No, that would lead to a race condition because vq->num_added is modified
by both virtqueue_add_buf_gfp() and virtqueue_kick(). Without a lock held
during virtqueue_kick() another vcpu could add bufs while vq->num_added is
used and cleared by virtqueue_kick():

Right, this dovetails with another proposed change (was it Michael?) where
we would update the avail idx inside add_buf, rather than waiting until
kick. This means a barrier inside add_buf, but that's probably fine. If we
do that, then we don't need a lock on virtqueue_kick. Michael, thoughts?

Maybe not even that: I think we could just do virtio_wmb() in add, and keep
the mb() in kick. What I'm a bit worried about is contention on the
cacheline including index and flags: the more we write to that line, the
worse it gets. So need to test performance impact of this change: I didn't
find time to do this yet, as I am trying to finalize the used index
publishing patches. Any takers?

Do we see performance improvement after making kick lockless?

There was no guest CPU reduction or I/O throughput increase with my patch
when running 4 dd iflag=direct bs=4k if=/dev/vdb of=/dev/null processes.
However, the lock_stat numbers above show clear improvement of the lock
hold/wait times. I was hoping to see guest CPU utilization go down and I/O
throughput go up, so there is still investigation to do with my patch in
isolation.
Although I'd like to try it later, putting my patch on top of your avail idx work is too early because it will be harder to reason about the performance with both patches present at the same time. Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] virtio: Support releasing lock during kick
On Fri, Jun 25, 2010 at 04:31:44PM +0100, Stefan Hajnoczi wrote:
On Fri, Jun 25, 2010 at 01:43:17PM +0300, Michael S. Tsirkin wrote:
On Fri, Jun 25, 2010 at 12:39:21PM +0930, Rusty Russell wrote:
On Thu, 24 Jun 2010 03:00:30 pm Stefan Hajnoczi wrote:
On Wed, Jun 23, 2010 at 11:12 PM, Anthony Liguori anth...@codemonkey.ws wrote:

Shouldn't it be possible to just drop the lock before invoking
virtqueue_kick() and reacquire it afterwards? There's nothing in that
virtqueue_kick() path that the lock is protecting AFAICT.

No, that would lead to a race condition because vq->num_added is modified
by both virtqueue_add_buf_gfp() and virtqueue_kick(). Without a lock held
during virtqueue_kick() another vcpu could add bufs while vq->num_added is
used and cleared by virtqueue_kick():

Right, this dovetails with another proposed change (was it Michael?) where
we would update the avail idx inside add_buf, rather than waiting until
kick. This means a barrier inside add_buf, but that's probably fine. If we
do that, then we don't need a lock on virtqueue_kick. Michael, thoughts?

Maybe not even that: I think we could just do virtio_wmb() in add, and keep
the mb() in kick. What I'm a bit worried about is contention on the
cacheline including index and flags: the more we write to that line, the
worse it gets. So need to test performance impact of this change: I didn't
find time to do this yet, as I am trying to finalize the used index
publishing patches. Any takers?

Do we see performance improvement after making kick lockless?

There was no guest CPU reduction or I/O throughput increase with my patch
when running 4 dd iflag=direct bs=4k if=/dev/vdb of=/dev/null processes.
However, the lock_stat numbers above show clear improvement of the lock
hold/wait times. I was hoping to see guest CPU utilization go down and I/O
throughput go up, so there is still investigation to do with my patch in
isolation.
Although I'd like to try it later, putting my patch on top of your avail idx work is too early because it will be harder to reason about the performance with both patches present at the same time. Stefan What about host CPU utilization? Also, are you using PARAVIRT_SPINLOCKS? -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] virtio: Support releasing lock during kick
On Fri, Jun 25, 2010 at 06:32:20PM +0300, Michael S. Tsirkin wrote:
On Fri, Jun 25, 2010 at 04:31:44PM +0100, Stefan Hajnoczi wrote:
On Fri, Jun 25, 2010 at 01:43:17PM +0300, Michael S. Tsirkin wrote:
On Fri, Jun 25, 2010 at 12:39:21PM +0930, Rusty Russell wrote:
On Thu, 24 Jun 2010 03:00:30 pm Stefan Hajnoczi wrote:
On Wed, Jun 23, 2010 at 11:12 PM, Anthony Liguori anth...@codemonkey.ws wrote:

Shouldn't it be possible to just drop the lock before invoking
virtqueue_kick() and reacquire it afterwards? There's nothing in that
virtqueue_kick() path that the lock is protecting AFAICT.

No, that would lead to a race condition because vq->num_added is modified
by both virtqueue_add_buf_gfp() and virtqueue_kick(). Without a lock held
during virtqueue_kick() another vcpu could add bufs while vq->num_added is
used and cleared by virtqueue_kick():

Right, this dovetails with another proposed change (was it Michael?) where
we would update the avail idx inside add_buf, rather than waiting until
kick. This means a barrier inside add_buf, but that's probably fine. If we
do that, then we don't need a lock on virtqueue_kick. Michael, thoughts?

Maybe not even that: I think we could just do virtio_wmb() in add, and keep
the mb() in kick. What I'm a bit worried about is contention on the
cacheline including index and flags: the more we write to that line, the
worse it gets. So need to test performance impact of this change: I didn't
find time to do this yet, as I am trying to finalize the used index
publishing patches. Any takers?

Do we see performance improvement after making kick lockless?

There was no guest CPU reduction or I/O throughput increase with my patch
when running 4 dd iflag=direct bs=4k if=/dev/vdb of=/dev/null processes.
However, the lock_stat numbers above show clear improvement of the lock
hold/wait times. I was hoping to see guest CPU utilization go down and I/O
throughput go up, so there is still investigation to do with my patch in
isolation.
Although I'd like to try it later, putting my patch on top of your avail idx work is too early because it will be harder to reason about the performance with both patches present at the same time. Stefan What about host CPU utilization? There is data available for host CPU utilization, I need to dig it up. Also, are you using PARAVIRT_SPINLOCKS? No. I haven't found much documentation on paravirt spinlocks other than the commit that introduced them: commit 8efcbab674de2bee45a2e4cdf97de16b8e609ac8 Author: Jeremy Fitzhardinge jer...@goop.org Date: Mon Jul 7 12:07:51 2008 -0700 paravirt: introduce a lock-byte spinlock implementation PARAVIRT_SPINLOCKS is not set in the config I use, probably because of the associated performance issue that causes distros to build without them: commit b4ecc126991b30fe5f9a59dfacda046aeac124b2 Author: Jeremy Fitzhardinge jer...@goop.org Date: Wed May 13 17:16:55 2009 -0700 x86: Fix performance regression caused by paravirt_ops on native kernels I would expect performance results to be smoother with PARAVIRT_SPINLOCKS for the guest kernel. I will add it for future runs, thanks for pointing it out. Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2063072 ] compiling problem with tcg_ctx
Bugs item #2063072, was opened at 2008-08-20 23:29 Message generated for change (Comment added) made by jessorensen You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2063072group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: qemu Group: None Status: Closed Resolution: Works For Me Priority: 5 Private: No Submitted By: Jana Delego (janado) Assigned to: Anthony Liguori (aliguori) Summary: compiling problem with tcg_ctx Initial Comment: When compiling kvm using the --disable-cpu-emulation flag on a 64 bit Intel Ubuntu, the compiler aborts with error undefined reference to tcg_ctx, This problem exists since kvm-70. -- Comment By: Jes Sorensen (jessorensen) Date: 2010-06-25 18:10 Message: upstream qemu-kvm builds and boots fine with --disable-cpu-emulation now. Closing Jes -- Comment By: Avi Kivity (avik) Date: 2008-10-02 16:05 Message: Well, it would be nice to support --disable-cpu-emulation, for example if you're worried about tcg security holes or tcg performance. -- Comment By: Anthony Liguori (aliguori) Date: 2008-09-29 15:56 Message: --disable-cpu-emulation should not be used with x86. It only exists as an ugly hack because ia64 doesn't support TCG. -- Comment By: Shen Okinudo (okinu) Date: 2008-09-29 03:37 Message: This bug persists in kvm-76 -- Comment By: Marshal Newrock (freedombi) Date: 2008-09-02 01:40 Message: Logged In: YES user_id=2201280 Originator: NO This seems to work with kvm-74. The patch allowed compilation, and the guest appears to be running well. -- Comment By: Amit Shah (amitshah) Date: 2008-08-29 11:59 Message: Logged In: YES user_id=201894 Originator: NO I'm not sure if this will make qemu work properly, but it fixes the build (also attached). Can you confirm if this works? 
commit 244cafe6688940c25c81b31aa223c9e24656806e
Author: Amit Shah amit.s...@qumranet.com
Date:   Fri Aug 29 15:20:14 2008 +0530

    KVM: QEMU: Fix userspace build with --disable-cpu-emulation

    I'm not sure this will work properly, but fixes the build.
    ppc might need something like this as well

    Signed-off-by: Amit Shah amit.s...@qumranet.com

diff --git a/qemu/target-i386/fake-exec.c b/qemu/target-i386/fake-exec.c
index 737286d..552089b 100644
--- a/qemu/target-i386/fake-exec.c
+++ b/qemu/target-i386/fake-exec.c
@@ -12,6 +12,13 @@
  */
 #include "exec.h"
 #include "cpu.h"
+#include "tcg.h"
+
+/* code generation context */
+TCGContext tcg_ctx;
+
+uint16_t gen_opc_buf[OPC_BUF_SIZE];
+TCGArg gen_opparam_buf[OPPARAM_BUF_SIZE];
 
 int code_copy_enabled = 0;
 
@@ -45,10 +52,6 @@ int cpu_x86_gen_code(CPUState *env, TranslationBlock *tb, int *gen_code_size_ptr
     return 0;
 }
 
-void flush_icache_range(unsigned long start, unsigned long stop)
-{
-}
-
 void optimize_flags_init(void)
 {
 }

File Added: 0001-KVM-QEMU-Fix-userspace-build-with-disable-cpu-em.patch

--

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2063072&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-1949429 ] Windows XP 2003 - 64-bit Editions may FAIL during setup
Bugs item #1949429, was opened at 2008-04-23 10:40
Message generated for change (Comment added) made by technologov
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1949429&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
Status: Closed
Resolution: Works For Me
Priority: 7
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: Windows XP 2003 - 64-bit Editions may FAIL during setup

Initial Comment:
Windows XP 2003 - 64-bit Editions may FAIL during setup. The guest OS gets
stuck during the second-stage setup (graphical stage) and proceeds nowhere.
I must kill the VM manually and restart setup from scratch.

Reproducible: Sometimes.

It applies to all KVM-60 series (from KVM-60 up to KVM-67) on Intel. Other
KVM versions below and above may be affected as well. I do not have any
debug output, because it is hard to reproduce.

-Alexey Technologov, 23.04.2008.

--

Comment By: Technologov (technologov)
Date: 2010-06-25 19:27

Message:
Nope, I can't reproduce this anymore. Running on RHEL-5.5 (and its default
KVM shipped with the distro).

--

Comment By: Jes Sorensen (jessorensen)
Date: 2010-06-25 16:34

Message:
Hi,

Are you still seeing this, or can we close the bug? I just ran a 2003x64
install test here and encountered no problems, but your report states it
only happens sometimes?

Thanks,
Jes

--

Comment By: Technologov (technologov)
Date: 2008-08-03 11:38

Message:
Logged In: YES
user_id=1839746
Originator: YES

Still happens with KVM-71.

-Alexey, 3.8.2008.

--

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1949429&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-1900228 ] Time on guest slows down sometimes...
Bugs item #1900228, was opened at 2008-02-23 10:26 Message generated for change (Comment added) made by glommer You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=1900228group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: amd Group: None Status: Open Resolution: None Priority: 6 Private: No Submitted By: stevie1024 (stevie1024) Assigned to: Nobody/Anonymous (nobody) Summary: Time on guest slows down sometimes... Initial Comment: I run kvm version 60 on an linux 2.6.24 kernel on an AMD 64 processor (see below for details). I installed 2 guest machines, one linux and one windows (XP) and started them both. The clocks on both guests are sometimes slowed down. If e.g. I play some youtube clips on the Windows guest, the clock of the Windows guest starts lagging about 15% (time is about 15% slower than host, or real, time). If I cause some load on the linux guest, e.g. with 'tar jcvf test /usr', the clock of this guest also runs about 5% slower. And there's a message in the client syslog saying 'warning: many lost ticks, Your time source seems to be instable or some driver is hogging interupts'. Running e.g. 'tar jcvf test /usr' on the host doesn't have any influence on the guest clocks. I think this is quite a serious bug, as 'ntp' can't compensate for this instability. If I understand correctly ntp can only compensate for clocks that are too slow or too fast by a reasonably fixed (low) percentage. 
/proc/cpuinfo:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 95
model name	: AMD Athlon(tm) Processor LE-1600
stepping	: 3
cpu MHz		: 2204.998
cache size	: 1024 KB
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow up rep_good pni cx16 lahf_lm svm extapic cr8_legacy
bogomips	: 4413.24
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

uname -a:
Linux bigbird 2.6.24-1-amd64 #1 SMP Mon Feb 11 13:47:43 UTC 2008 x86_64 GNU/Linux

clients were started with:

kvm -m 1024 -usb -hda wxp_test.img -net nic,vlan=0 -net user,vlan=0,hostname=wxp-test -boot c -vnc :0

and

kvm -m 512 -usb -hda debian_server.img -net nic,vlan=0 -net user,vlan=0,hostname=debian-server -vnc :1

--

Comment By: Glauber de Oliveira Costa (glommer)
Date: 2010-06-25 17:01

Message:
Especially for Windows guests, the option -rtc-td-hack may be helpful. For
Linux guests, if you can run a new enough guest, using a guest with
kvmclock can keep the clock in sync.

--

Comment By: stevie1024 (stevie1024)
Date: 2008-02-23 10:32

Message:
Logged In: YES
user_id=2017347
Originator: YES

I forgot to mention: I think bug 1826080 is related to this, but I think my
description is more general.

--

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1900228&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/4] KVM: cleanup: remove kvm_get_dirty_log()
On 23.06.2010, at 08:01, Takuya Yoshikawa wrote: kvm_get_dirty_log() is a helper function for kvm_vm_ioctl_get_dirty_log() which is currently used by ia64 and ppc and the following is what it is doing: - sanity checks - bitmap scan to check if the slot is dirty - copy_to_user() Considering the fact that x86 is not using this anymore and sanity checks must be done before kvm_ia64_sync_dirty_log(), we can say that this is not working for code sharing effectively. So we just remove this. This patch plus 4/4 broke dirty bitmap updating on PPC. I didn't get around to track down why, but I figured you should now. Is there any way to get you a PPC development box? A simple G4 or G5 should be 200$ on ebay by now :). Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM in remote server with bridge in a single ethernet interface
Hi,

I have only one ethernet port in a remote server (eth0). I have a public
address x.x.x.164 netmask 255.255.255.240 gw x.x.x.161 and want to use the
next available IP address (x.x.x.165 netmask 255.255.255.240 gw x.x.x.161)
in my guest OS. Is it possible to achieve this with brctl?

I created a file called ifcfg-br0 with:

DEVICE=br0
TYPE=Bridge
BOOTPROTO=none
BROADCAST=x.x.x.175
HWADDR=xx:xx:xx:xx:xx:xx
IPADDR=x.x.x.164
NETMASK=255.255.255.240
NETWORK=x.x.x.160
ONBOOT=yes
GATEWAY=x.x.x.161

then replaced ifcfg-eth0 with:

DEVICE=eth0
BRIDGE=br0
#BOOTPROTO=none
ONBOOT=yes

then rebooted. After that I was still connected to my remote server, but
problems began when I assigned the x.x.x.165 IP address to the guest OS
with virt-manager to begin installation: I lost the remote connection.
Maybe I missed something to avoid losing the connection? I'm still
receiving pings from x.x.x.165, but x.x.x.164 is not responding.

Here is my config:

Distro: Centos 5.5 x64
Linux v2.noc.com.mx 2.6.18-194.3.1.el5 #1 SMP Thu May 13 13:08:30 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

kvm-83-164.el5_5.9
kvm-qemu-img-83-164.el5_5.9
kmod-kvm-83-164.el5_5.9
etherboot-zroms-kvm-5.4.4-13.el5.centos

eth0      Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx
          inet addr:x.x.x.164  Bcast:x.x.x.175  Mask:255.255.255.240
          inet6 addr: fe80::225:90ff:fe04:7874/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7140395 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2491842 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:9632227753 (8.9 GiB)  TX bytes:226154906 (215.6 MiB)
          Memory:fb5e-fb60

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:109391 errors:0 dropped:0 overruns:0 frame:0
          TX packets:109391 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:46719292 (44.5 MiB)  TX bytes:46719292 (44.5 MiB)

virbr0    Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:357 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:110402 (107.8 KiB)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
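For reference, the usual single-NIC bridge layout keeps the host IP on br0 only (eth0 carries no address), and disabling the bridge forwarding delay avoids the period after bridge creation during which traffic can be black-holed. The sketch below is a generic, untested recipe, not a verified fix for this exact host: the device names and addresses mirror the report, and the commands assume bridge-utils and iproute are available.

```shell
# Equivalent manual steps (run from a console that does not depend on
# eth0, e.g. serial/IPMI, since connectivity drops while reconfiguring):

brctl addbr br0
brctl setfd br0 0                  # skip the forwarding delay
brctl addif br0 eth0

ip addr flush dev eth0             # eth0 must NOT keep the host IP
ip addr add x.x.x.164/28 dev br0   # /28 == netmask 255.255.255.240
ip link set eth0 up
ip link set br0 up
ip route add default via x.x.x.161
```

With this layout the guest's tap device is simply added to br0 and can use x.x.x.165 directly; if the host still stops responding on x.x.x.164, a common culprit is the upstream switch caching the old MAC, which is why pinning the bridge MAC to eth0's (as the HWADDR line in ifcfg-br0 attempts) is often recommended.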
[PATCH] KVM: PPC: Book3S_32 MMU debug compile fixes
Due to previous changes, the Book3S_32 guest MMU code didn't compile
properly when enabling debugging. This patch repairs the broken code
paths, making it possible to define DEBUG_MMU and friends again.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_32_mmu.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 3292d76..079760b 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -104,7 +104,7 @@ static hva_t kvmppc_mmu_book3s_32_get_pteg(struct kvmppc_vcpu_book3s *vcpu_book3
 	pteg = (vcpu_book3s->sdr1 & 0xffff0000) | hash;
 
 	dprintk("MMU: pc=0x%lx eaddr=0x%lx sdr1=0x%llx pteg=0x%x vsid=0x%x\n",
-		vcpu_book3s->vcpu.arch.pc, eaddr, vcpu_book3s->sdr1, pteg,
+		kvmppc_get_pc(&vcpu_book3s->vcpu), eaddr, vcpu_book3s->sdr1, pteg,
 		sre->vsid);
 
 	r = gfn_to_hva(vcpu_book3s->vcpu.kvm, pteg >> PAGE_SHIFT);
@@ -269,7 +269,7 @@ no_page_found:
 		dprintk_pte("KVM MMU: No PTE found (sdr1=0x%llx ptegp=0x%lx)\n",
 			    to_book3s(vcpu)->sdr1, ptegp);
 		for (i=0; i<16; i+=2) {
-			dprintk_pte("   %02d: 0x%x - 0x%x (0x%llx)\n",
+			dprintk_pte("   %02d: 0x%x - 0x%x (0x%x)\n",
 				    i, pteg[i], pteg[i+1], ptem);
 		}
 	}
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[PATCH] Faster MMU lookups for Book3s
Book3s suffered from my really bad shadow MMU implementation so far. So I
finally got around to implementing a combined hash and list mechanism that
allows for much faster lookup of mapped pages.

To show that it really is faster, I tried to run simple process spawning
code inside the guest with and without these patches:

[without]

debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done

real    0m20.235s
user    0m10.418s
sys     0m9.766s

[with]

debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done

real    0m14.659s
user    0m8.967s
sys     0m5.688s

So as you can see, performance improved significantly.

Alexander Graf (2):
  KVM: PPC: Add generic hpte management functions
  KVM: PPC: Make use of hash based Shadow MMU

 arch/powerpc/include/asm/kvm_book3s.h |    7 +
 arch/powerpc/include/asm/kvm_host.h   |   18 ++-
 arch/powerpc/kvm/Makefile             |    2 +
 arch/powerpc/kvm/book3s_32_mmu_host.c |  104 ++---
 arch/powerpc/kvm/book3s_64_mmu_host.c |   98 +---
 arch/powerpc/kvm/book3s_mmu_hpte.c    |  286 +++++
 6 files changed, 327 insertions(+), 188 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_mmu_hpte.c

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: PPC: Make use of hash based Shadow MMU
We just introduced generic functions to handle shadow pages on PPC. This patch makes the respective backends make use of them, getting rid of a lot of duplicate code along the way. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_book3s.h |7 ++ arch/powerpc/include/asm/kvm_host.h | 18 +- arch/powerpc/kvm/Makefile |2 + arch/powerpc/kvm/book3s_32_mmu_host.c | 104 +++- arch/powerpc/kvm/book3s_64_mmu_host.c | 98 ++ 5 files changed, 41 insertions(+), 188 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 4e99559..a96e405 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -115,6 +115,13 @@ extern void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu); extern int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte); extern int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr); extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu); + +extern void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte); +extern struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu); +extern void kvmppc_mmu_hpte_destroy(struct kvm_vcpu *vcpu); +extern int kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu); +extern void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte); + extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data); extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data); extern void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec); diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 0c9ad86..895eb63 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -38,7 +38,13 @@ #define KVM_NR_PAGE_SIZES 1 #define KVM_PAGES_PER_HPAGE(x) (1UL31) -#define HPTEG_CACHE_NUM 1024 +#define HPTEG_CACHE_NUM(1 15) +#define 
HPTEG_HASH_BITS_PTE13 +#define HPTEG_HASH_BITS_VPTE 13 +#define HPTEG_HASH_BITS_VPTE_LONG 5 +#define HPTEG_HASH_NUM_PTE (1 HPTEG_HASH_BITS_PTE) +#define HPTEG_HASH_NUM_VPTE(1 HPTEG_HASH_BITS_VPTE) +#define HPTEG_HASH_NUM_VPTE_LONG (1 HPTEG_HASH_BITS_VPTE_LONG) struct kvm; struct kvm_run; @@ -151,6 +157,9 @@ struct kvmppc_mmu { }; struct hpte_cache { + struct list_head list_pte; + struct list_head list_vpte; + struct list_head list_vpte_long; u64 host_va; u64 pfn; ulong slot; @@ -282,8 +291,11 @@ struct kvm_vcpu_arch { unsigned long pending_exceptions; #ifdef CONFIG_PPC_BOOK3S - struct hpte_cache hpte_cache[HPTEG_CACHE_NUM]; - int hpte_cache_offset; + struct kmem_cache *hpte_cache; + struct list_head hpte_hash_pte[HPTEG_HASH_NUM_PTE]; + struct list_head hpte_hash_vpte[HPTEG_HASH_NUM_VPTE]; + struct list_head hpte_hash_vpte_long[HPTEG_HASH_NUM_VPTE_LONG]; + int hpte_cache_count; #endif }; diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile index ff43606..d45c818 100644 --- a/arch/powerpc/kvm/Makefile +++ b/arch/powerpc/kvm/Makefile @@ -45,6 +45,7 @@ kvm-book3s_64-objs := \ book3s.o \ book3s_emulate.o \ book3s_interrupts.o \ + book3s_mmu_hpte.o \ book3s_64_mmu_host.o \ book3s_64_mmu.o \ book3s_32_mmu.o @@ -57,6 +58,7 @@ kvm-book3s_32-objs := \ book3s.o \ book3s_emulate.o \ book3s_interrupts.o \ + book3s_mmu_hpte.o \ book3s_32_mmu_host.o \ book3s_32_mmu.o kvm-objs-$(CONFIG_KVM_BOOK3S_32) := $(kvm-book3s_32-objs) diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c index 904f5ac..0b51ef8 100644 --- a/arch/powerpc/kvm/book3s_32_mmu_host.c +++ b/arch/powerpc/kvm/book3s_32_mmu_host.c @@ -58,105 +58,19 @@ static ulong htab; static u32 htabmask; -static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte) +void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte) { volatile u32 *pteg; - dprintk_mmu(KVM: Flushing SPTE: 0x%llx (0x%llx) - 0x%llx\n, - pte-pte.eaddr, pte-pte.vpage, 
pte-host_va); - + /* Remove from host HTAB */ pteg = (u32*)pte-slot; - pteg[0] = 0; + + /* And make sure it's gone from the TLB too */ asm volatile (sync); asm volatile (tlbie %0 : : r (pte-pte.eaddr) : memory); asm volatile (sync); asm volatile (tlbsync); - - pte-host_va = 0; - - if (pte-pte.may_write) - kvm_release_pfn_dirty(pte-pfn); - else - kvm_release_pfn_clean(pte-pfn); -} - -void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong guest_ea, ulong
[PATCH] KVM: PPC: Add generic hpte management functions
Currently the shadow paging code keeps an array of entries it knows about. Whenever the guest invalidates an entry, we loop through that entry, trying to invalidate matching parts. While this is a really simple implementation, it is probably the most ineffective one possible. So instead, let's keep an array of lists around that are indexed by a hash. This way each PTE can be added by 4 list_add, removed by 4 list_del invocations and the search only needs to loop through entries that share the same hash. This patch implements said lookup and exports generic functions that both the 32-bit and 64-bit backend can use. Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - remove hpte_all list - lookup all using vpte_long lists - decrease size of vpte_long hash - fix missing brackets --- arch/powerpc/kvm/book3s_mmu_hpte.c | 286 1 files changed, 286 insertions(+), 0 deletions(-) create mode 100644 arch/powerpc/kvm/book3s_mmu_hpte.c diff --git a/arch/powerpc/kvm/book3s_mmu_hpte.c b/arch/powerpc/kvm/book3s_mmu_hpte.c new file mode 100644 index 000..5826e61 --- /dev/null +++ b/arch/powerpc/kvm/book3s_mmu_hpte.c @@ -0,0 +1,286 @@ +/* + * Copyright (C) 2010 SUSE Linux Products GmbH. All rights reserved. + * + * Authors: + * Alexander Graf ag...@suse.de + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. 
+ */ + +#include linux/kvm_host.h +#include linux/hash.h +#include linux/slab.h + +#include asm/kvm_ppc.h +#include asm/kvm_book3s.h +#include asm/machdep.h +#include asm/mmu_context.h +#include asm/hw_irq.h + +#define PTE_SIZE 12 + +/* #define DEBUG_MMU */ + +#ifdef DEBUG_MMU +#define dprintk_mmu(a, ...) printk(KERN_INFO a, __VA_ARGS__) +#else +#define dprintk_mmu(a, ...) do { } while(0) +#endif + +static inline u64 kvmppc_mmu_hash_pte(u64 eaddr) { + return hash_64(eaddr PTE_SIZE, HPTEG_HASH_BITS_PTE); +} + +static inline u64 kvmppc_mmu_hash_vpte(u64 vpage) { + return hash_64(vpage 0xfULL, HPTEG_HASH_BITS_VPTE); +} + +static inline u64 kvmppc_mmu_hash_vpte_long(u64 vpage) { + return hash_64((vpage 0xff000ULL) 12, + HPTEG_HASH_BITS_VPTE_LONG); +} + +void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte) +{ + u64 index; + + /* Add to ePTE list */ + index = kvmppc_mmu_hash_pte(pte-pte.eaddr); + list_add(pte-list_pte, vcpu-arch.hpte_hash_pte[index]); + + /* Add to vPTE list */ + index = kvmppc_mmu_hash_vpte(pte-pte.vpage); + list_add(pte-list_vpte, vcpu-arch.hpte_hash_vpte[index]); + + /* Add to vPTE_long list */ + index = kvmppc_mmu_hash_vpte_long(pte-pte.vpage); + list_add(pte-list_vpte_long, vcpu-arch.hpte_hash_vpte_long[index]); +} + +static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte) +{ + dprintk_mmu(KVM: Flushing SPT: 0x%lx (0x%llx) - 0x%llx\n, + pte-pte.eaddr, pte-pte.vpage, pte-host_va); + + /* Different for 32 and 64 bit */ + kvmppc_mmu_invalidate_pte(vcpu, pte); + + if (pte-pte.may_write) + kvm_release_pfn_dirty(pte-pfn); + else + kvm_release_pfn_clean(pte-pfn); + + list_del(pte-list_pte); + list_del(pte-list_vpte); + list_del(pte-list_vpte_long); + + kmem_cache_free(vcpu-arch.hpte_cache, pte); +} + +static void kvmppc_mmu_pte_flush_all(struct kvm_vcpu *vcpu) +{ + struct hpte_cache *pte, *tmp; + int i; + + for (i = 0; i HPTEG_HASH_NUM_VPTE_LONG; i++) { + struct list_head *list = vcpu-arch.hpte_hash_vpte_long[i]; 
+ + list_for_each_entry_safe(pte, tmp, list, list_vpte_long) { + /* Jump over the helper entry */ + if (pte-list_vpte_long == list) + continue; + + invalidate_pte(vcpu, pte); + } + } +} + +void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong guest_ea, ulong ea_mask) +{ + u64 i; + + dprintk_mmu(KVM: Flushing %d Shadow PTEs: 0x%lx 0x%lx\n, + vcpu-arch.hpte_cache_count, guest_ea, ea_mask); + + guest_ea = ea_mask; + + switch (ea_mask) { + case ~0xfffUL: + { + struct list_head *list; + struct hpte_cache *pte, *tmp; + +
Re: [PATCH] KVM: PPC: Add generic hpte management functions
On 26.06.2010, at 01:16, Alexander Graf wrote: Currently the shadow paging code keeps an array of entries it knows about. Whenever the guest invalidates an entry, we loop through that entry, trying to invalidate matching parts. While this is a really simple implementation, it is probably the most ineffective one possible. So instead, let's keep an array of lists around that are indexed by a hash. This way each PTE can be added by 4 list_add, removed by 4 list_del invocations and the search only needs to loop through entries that share the same hash. This patch implements said lookup and exports generic functions that both the 32-bit and 64-bit backend can use. Yikes - I forgot -n. This is patch 1/2. Alex
[PATCH 01/26] KVM: PPC: Introduce shared page
For transparent variable sharing between the hypervisor and guest, I introduce a shared page. This shared page will contain all the registers the guest can read and write safely without exiting guest context. This patch only implements the stubs required for the basic structure of the shared page. The actual register moving follows. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |2 ++ arch/powerpc/include/asm/kvm_para.h |5 + arch/powerpc/kernel/asm-offsets.c |1 + arch/powerpc/kvm/44x.c |7 +++ arch/powerpc/kvm/book3s.c |7 +++ arch/powerpc/kvm/e500.c |7 +++ 6 files changed, 29 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 895eb63..bca9391 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -25,6 +25,7 @@ #include linux/interrupt.h #include linux/types.h #include linux/kvm_types.h +#include linux/kvm_para.h #include asm/kvm_asm.h #define KVM_MAX_VCPUS 1 @@ -289,6 +290,7 @@ struct kvm_vcpu_arch { struct tasklet_struct tasklet; u64 dec_jiffies; unsigned long pending_exceptions; + struct kvm_vcpu_arch_shared *shared; #ifdef CONFIG_PPC_BOOK3S struct kmem_cache *hpte_cache; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 2d48f6a..1485ba8 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -20,6 +20,11 @@ #ifndef __POWERPC_KVM_PARA_H__ #define __POWERPC_KVM_PARA_H__ +#include linux/types.h + +struct kvm_vcpu_arch_shared { +}; + #ifdef __KERNEL__ static inline int kvm_para_available(void) diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 496cc5b..944f593 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -400,6 +400,7 @@ int main(void) DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6)); DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7)); 
DEFINE(VCPU_SHADOW_PID, offsetof(struct kvm_vcpu, arch.shadow_pid)); + DEFINE(VCPU_SHARED, offsetof(struct kvm_vcpu, arch.shared)); /* book3s */ #ifdef CONFIG_PPC_BOOK3S diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c index 73c0a3f..e7b1f3f 100644 --- a/arch/powerpc/kvm/44x.c +++ b/arch/powerpc/kvm/44x.c @@ -123,8 +123,14 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto free_vcpu; + vcpu-arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO); + if (!vcpu-arch.shared) + goto uninit_vcpu; + return vcpu; +uninit_vcpu: + kvm_vcpu_uninit(vcpu); free_vcpu: kmem_cache_free(kvm_vcpu_cache, vcpu_44x); out: @@ -135,6 +141,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu); + free_page((unsigned long)vcpu-arch.shared); kvm_vcpu_uninit(vcpu); kmem_cache_free(kvm_vcpu_cache, vcpu_44x); } diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 884d4a5..ba79b35 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1247,6 +1247,10 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto free_shadow_vcpu; + vcpu-arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO); + if (!vcpu-arch.shared) + goto uninit_vcpu; + vcpu-arch.host_retip = kvm_return_point; vcpu-arch.host_msr = mfmsr(); #ifdef CONFIG_PPC_BOOK3S_64 @@ -1277,6 +1281,8 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) return vcpu; +uninit_vcpu: + kvm_vcpu_uninit(vcpu); free_shadow_vcpu: kfree(vcpu_book3s-shadow_vcpu); free_vcpu: @@ -1289,6 +1295,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu); + free_page((unsigned long)vcpu-arch.shared); kvm_vcpu_uninit(vcpu); kfree(vcpu_book3s-shadow_vcpu); vfree(vcpu_book3s); diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c index e8a00b0..71750f2 100644 --- a/arch/powerpc/kvm/e500.c +++ 
b/arch/powerpc/kvm/e500.c @@ -117,8 +117,14 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto uninit_vcpu; + vcpu-arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO); + if (!vcpu-arch.shared) + goto uninit_tlb; + return vcpu; +uninit_tlb: + kvmppc_e500_tlb_uninit(vcpu_e500); uninit_vcpu: kvm_vcpu_uninit(vcpu); free_vcpu: @@ -131,6 +137,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_e500 *vcpu_e500 =
[PATCH 09/26] KVM: PPC: Add PV guest scratch registers
While running in hooked code we need to store register contents out because we must not clobber any registers. So let's add some fields to the shared page we can just happily write to. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index d1fe9ae..edf8f83 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,9 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 scratch1; + __u64 scratch2; + __u64 scratch3; __u64 critical; /* Guest may not get interrupts if == r1 */ __u64 sprg0; __u64 sprg1; -- 1.6.0.2
[PATCH 10/26] KVM: PPC: Tell guest about pending interrupts
When the guest turns on interrupts again, it needs to know if we have an interrupt pending for it. Because if so, it should rather get out of guest context and get the interrupt. So we introduce a new field in the shared page that we use to tell the guest that there's a pending interrupt lying around. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c |7 +++ arch/powerpc/kvm/booke.c|7 +++ 3 files changed, 15 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index edf8f83..c7305d7 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -36,6 +36,7 @@ struct kvm_vcpu_arch_shared { __u64 dar; __u64 msr; __u32 dsisr; + __u32 int_pending; /* Tells the guest if we have an interrupt */ }; #define KVM_PVR_PARA 0x4b564d3f /* KVM? */ diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index f0e8047..e76c950 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -334,6 +334,7 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority) void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) { unsigned long *pending = vcpu-arch.pending_exceptions; + unsigned long old_pending = vcpu-arch.pending_exceptions; unsigned int priority; #ifdef EXIT_DEBUG @@ -353,6 +354,12 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) BITS_PER_BYTE * sizeof(*pending), priority + 1); } + + /* Tell the guest about our interrupt status */ + if (*pending) + vcpu-arch.shared-int_pending = 1; + else if (old_pending) + vcpu-arch.shared-int_pending = 0; } void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 485f8fa..2229df9 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -221,6 +221,7 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, void 
kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) { unsigned long *pending = vcpu-arch.pending_exceptions; + unsigned long old_pending = vcpu-arch.pending_exceptions; unsigned int priority; priority = __ffs(*pending); @@ -232,6 +233,12 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) BITS_PER_BYTE * sizeof(*pending), priority + 1); } + + /* Tell the guest about our interrupt status */ + if (*pending) + vcpu-arch.shared-int_pending = 1; + else if (old_pending) + vcpu-arch.shared-int_pending = 0; } /** -- 1.6.0.2
[PATCH 08/26] KVM: PPC: Add PV guest critical sections
When running in hooked code we need a way to disable interrupts without clobbering any interrupts or exiting out to the hypervisor. To achieve this, we have an additional critical field in the shared page. If that field is equal to the r1 register of the guest, it tells the hypervisor that we're in such a critical section and thus may not receive any interrupts. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c | 15 +-- arch/powerpc/kvm/booke.c| 12 3 files changed, 26 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index eaab306..d1fe9ae 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,7 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 critical; /* Guest may not get interrupts if == r1 */ __u64 sprg0; __u64 sprg1; __u64 sprg2; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index e8001c5..f0e8047 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -251,14 +251,25 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority) int deliver = 1; int vec = 0; ulong flags = 0ULL; + ulong crit_raw = vcpu-arch.shared-critical; + ulong crit_r1 = kvmppc_get_gpr(vcpu, 1); + bool crit; + + /* Truncate crit indicators in 32 bit mode */ + if (!(vcpu-arch.shared-msr MSR_SF)) { + crit_raw = 0x; + crit_r1 = 0x; + } + + crit = (crit_raw == crit_r1); switch (priority) { case BOOK3S_IRQPRIO_DECREMENTER: - deliver = vcpu-arch.shared-msr MSR_EE; + deliver = (vcpu-arch.shared-msr MSR_EE) !crit; vec = BOOK3S_INTERRUPT_DECREMENTER; break; case BOOK3S_IRQPRIO_EXTERNAL: - deliver = vcpu-arch.shared-msr MSR_EE; + deliver = (vcpu-arch.shared-msr MSR_EE) !crit; vec = BOOK3S_INTERRUPT_EXTERNAL; break; case BOOK3S_IRQPRIO_SYSTEM_RESET: diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index e7d1216..485f8fa 100644 --- 
a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -147,6 +147,17 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, int allowed = 0; ulong uninitialized_var(msr_mask); bool update_esr = false, update_dear = false; + ulong crit_raw = vcpu-arch.shared-critical; + ulong crit_r1 = kvmppc_get_gpr(vcpu, 1); + bool crit; + + /* Truncate crit indicators in 32 bit mode */ + if (!(vcpu-arch.shared-msr MSR_SF)) { + crit_raw = 0x; + crit_r1 = 0x; + } + + crit = (crit_raw == crit_r1); switch (priority) { case BOOKE_IRQPRIO_DTLB_MISS: @@ -181,6 +192,7 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, case BOOKE_IRQPRIO_DECREMENTER: case BOOKE_IRQPRIO_FIT: allowed = vcpu-arch.shared-msr MSR_EE; + allowed = allowed !crit; msr_mask = MSR_CE|MSR_ME|MSR_DE; break; case BOOKE_IRQPRIO_DEBUG: -- 1.6.0.2
[PATCH 06/26] KVM: PPC: Convert SPRG[0-4] to shared page
When in kernel mode there are 4 additional registers available that are simple data storage. Instead of exiting to the hypervisor to read and write those, we can just share them with the guest using the page. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |4 arch/powerpc/include/asm/kvm_para.h |4 arch/powerpc/kvm/book3s.c | 16 arch/powerpc/kvm/booke.c| 16 arch/powerpc/kvm/emulate.c | 24 5 files changed, 36 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 6bcf62f..83c45ea 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -216,10 +216,6 @@ struct kvm_vcpu_arch { ulong guest_owned_ext; #endif u32 mmucr; - ulong sprg0; - ulong sprg1; - ulong sprg2; - ulong sprg3; ulong sprg4; ulong sprg5; ulong sprg6; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index d7fc6c2..e402999 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,10 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 sprg0; + __u64 sprg1; + __u64 sprg2; + __u64 sprg3; __u64 srr0; __u64 srr1; __u64 dar; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index b144697..5a6f055 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1062,10 +1062,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-srr0 = vcpu-arch.shared-srr0; regs-srr1 = vcpu-arch.shared-srr1; regs-pid = vcpu-arch.pid; - regs-sprg0 = vcpu-arch.sprg0; - regs-sprg1 = vcpu-arch.sprg1; - regs-sprg2 = vcpu-arch.sprg2; - regs-sprg3 = vcpu-arch.sprg3; + regs-sprg0 = vcpu-arch.shared-sprg0; + regs-sprg1 = vcpu-arch.shared-sprg1; + regs-sprg2 = vcpu-arch.shared-sprg2; + regs-sprg3 = vcpu-arch.shared-sprg3; regs-sprg5 = vcpu-arch.sprg4; regs-sprg6 = 
vcpu-arch.sprg5; regs-sprg7 = vcpu-arch.sprg6; @@ -1088,10 +1088,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_msr(vcpu, regs-msr); vcpu-arch.shared-srr0 = regs-srr0; vcpu-arch.shared-srr1 = regs-srr1; - vcpu-arch.sprg0 = regs-sprg0; - vcpu-arch.sprg1 = regs-sprg1; - vcpu-arch.sprg2 = regs-sprg2; - vcpu-arch.sprg3 = regs-sprg3; + vcpu-arch.shared-sprg0 = regs-sprg0; + vcpu-arch.shared-sprg1 = regs-sprg1; + vcpu-arch.shared-sprg2 = regs-sprg2; + vcpu-arch.shared-sprg3 = regs-sprg3; vcpu-arch.sprg5 = regs-sprg4; vcpu-arch.sprg6 = regs-sprg5; vcpu-arch.sprg7 = regs-sprg6; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 8b546fe..984c461 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -495,10 +495,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-srr0 = vcpu-arch.shared-srr0; regs-srr1 = vcpu-arch.shared-srr1; regs-pid = vcpu-arch.pid; - regs-sprg0 = vcpu-arch.sprg0; - regs-sprg1 = vcpu-arch.sprg1; - regs-sprg2 = vcpu-arch.sprg2; - regs-sprg3 = vcpu-arch.sprg3; + regs-sprg0 = vcpu-arch.shared-sprg0; + regs-sprg1 = vcpu-arch.shared-sprg1; + regs-sprg2 = vcpu-arch.shared-sprg2; + regs-sprg3 = vcpu-arch.shared-sprg3; regs-sprg5 = vcpu-arch.sprg4; regs-sprg6 = vcpu-arch.sprg5; regs-sprg7 = vcpu-arch.sprg6; @@ -521,10 +521,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_msr(vcpu, regs-msr); vcpu-arch.shared-srr0 = regs-srr0; vcpu-arch.shared-srr1 = regs-srr1; - vcpu-arch.sprg0 = regs-sprg0; - vcpu-arch.sprg1 = regs-sprg1; - vcpu-arch.sprg2 = regs-sprg2; - vcpu-arch.sprg3 = regs-sprg3; + vcpu-arch.shared-sprg0 = regs-sprg0; + vcpu-arch.shared-sprg1 = regs-sprg1; + vcpu-arch.shared-sprg2 = regs-sprg2; + vcpu-arch.shared-sprg3 = regs-sprg3; vcpu-arch.sprg5 = regs-sprg4; vcpu-arch.sprg6 = regs-sprg5; vcpu-arch.sprg7 = regs-sprg6; diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c 
index ad0fa4f..454869b 100644 --- a/arch/powerpc/kvm/emulate.c +++ b/arch/powerpc/kvm/emulate.c @@ -263,13 +263,17 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) kvmppc_set_gpr(vcpu, rt, get_tb()); break; case SPRN_SPRG0: - kvmppc_set_gpr(vcpu, rt,
[PATCH 26/26] KVM: PPC: Add Documentation about PV interface
We just introduced a new PV interface that screams for documentation. So here it is - a shiny new and awesome text file describing the internal works of the PPC KVM paravirtual interface. Signed-off-by: Alexander Graf ag...@suse.de --- Documentation/kvm/ppc-pv.txt | 164 ++ 1 files changed, 164 insertions(+), 0 deletions(-) create mode 100644 Documentation/kvm/ppc-pv.txt diff --git a/Documentation/kvm/ppc-pv.txt b/Documentation/kvm/ppc-pv.txt new file mode 100644 index 000..7cbcd51 --- /dev/null +++ b/Documentation/kvm/ppc-pv.txt @@ -0,0 +1,164 @@ +The PPC KVM paravirtual interface += + +The basic execution principle by which KVM on PowerPC works is to run all kernel +space code in PR=1 which is user space. This way we trap all privileged +instructions and can emulate them accordingly. + +Unfortunately that is also the downfall. There are quite some privileged +instructions that needlessly return us to the hypervisor even though they +could be handled differently. + +This is what the PPC PV interface helps with. It takes privileged instructions +and transforms them into unprivileged ones with some help from the hypervisor. +This cuts down virtualization costs by about 50% on some of my benchmarks. + +The code for that interface can be found in arch/powerpc/kernel/kvm* + +Querying for existence +== + +To find out if we're running on KVM or not, we overlay the PVR register. Usually +the PVR register contains an id that identifies your CPU type. If, however, you +pass KVM_PVR_PARA in the register that you want the PVR result in, the register +still contains KVM_PVR_PARA after the mfpvr call. + + LOAD_REG_IMM(r5, KVM_PVR_PARA) + mfpvr r5 + [r5 still contains KVM_PVR_PARA] + +Once determined to run under a PV capable KVM, you can now use hypercalls as +described below. 
+ +PPC hypercalls +== + +The only viable ways to reliably get from guest context to host context are: + + 1) Call an invalid instruction + 2) Call the sc instruction with a parameter to sc + 3) Call the sc instruction with parameters in GPRs + +Method 1 is always a bad idea. Invalid instructions can be replaced later on +by valid instructions, rendering the interface broken. + +Method 2 also has drawbacks. If the parameter to sc is != 0 the spec is +rather unclear whether the sc is targeted at the hypervisor or at the +supervisor. It would also require that we read the syscall-issuing instruction +every time a syscall is issued, slowing down guest syscalls. + +Method 3 is what KVM uses. We pass magic constants (KVM_SC_MAGIC_R3 and +KVM_SC_MAGIC_R4) in r3 and r4 respectively. If a syscall instruction with these +magic values arrives from the guest's kernel mode, we take the syscall as a +hypercall. + +The parameters are as follows: + + r3 KVM_SC_MAGIC_R3 + r4 KVM_SC_MAGIC_R4 + r5 Hypercall number + r6 First parameter + r7 Second parameter + r8 Third parameter + r9 Fourth parameter + +Hypercall definitions are shared in generic code, so the same hypercall numbers +apply for x86 and powerpc alike. + +The magic page +== + +To enable communication between the hypervisor and guest there is a new shared +page that contains parts of supervisor-visible register state. The guest can +map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE. + +With this hypercall issued the guest always gets the magic page mapped at the +desired location in effective and physical address space. For now, we always +map the page to -4096. This way we can access it using absolute load and store +functions. The following instruction reads the first field of the magic page: + + ld rX, -4096(0) + +The interface is designed to be extensible should there later be a need to add +additional registers to the magic page.
If you add fields to the magic page, +also define a new hypercall feature to indicate that the host can give you more +registers. Make use of them only if the host supports the additional features. + +The magic page has the following layout as described in +arch/powerpc/include/asm/kvm_para.h: + +struct kvm_vcpu_arch_shared { + __u64 scratch1; + __u64 scratch2; + __u64 scratch3; + __u64 critical; /* Guest may not get interrupts if == r1 */ + __u64 sprg0; + __u64 sprg1; + __u64 sprg2; + __u64 sprg3; + __u64 srr0; + __u64 srr1; + __u64 dar; + __u64 msr; + __u32 dsisr; + __u32 int_pending; /* Tells the guest if we have an interrupt */ +}; + +Additions to the page must only occur at the end. Struct fields are always 32 +bit aligned. + +Patched instructions + + +The ld and std instructions are transformed to lwz and stw instructions
[PATCH 21/26] KVM: PPC: Introduce kvm_tmp framework
We will soon require more sophisticated methods to replace single instructions with multiple instructions. We do that by branching to a memory region where we write the replacement code for the instruction. This region needs to be within 32 MB of the patched instruction, because that's the furthest we can jump with immediate branches. So we keep 1 MB of free space around in bss. After init is done we can just tell the mm system that the unused pages are free, but until then we have enough space to fit all our code in.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kernel/kvm.c | 41 +++++--
 1 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index b091f94..7e8fe6f 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -64,6 +64,8 @@
 #define KVM_INST_TLBSYNC	0x7c00046c
 
 static bool kvm_patching_worked = true;
+static char kvm_tmp[1024 * 1024];
+static int kvm_tmp_index;
 
 static void kvm_patch_ins_ld(u32 *inst, long addr, u32 rt)
 {
@@ -98,6 +100,23 @@ static void kvm_patch_ins_nop(u32 *inst)
 	*inst = KVM_INST_NOP;
 }
 
+static u32 *kvm_alloc(int len)
+{
+	u32 *p;
+
+	if ((kvm_tmp_index + len) > ARRAY_SIZE(kvm_tmp)) {
+		printk(KERN_ERR "KVM: No more space (%d + %d)\n",
+		       kvm_tmp_index, len);
+		kvm_patching_worked = false;
+		return NULL;
+	}
+
+	p = (void*)&kvm_tmp[kvm_tmp_index];
+	kvm_tmp_index += len;
+
+	return p;
+}
+
 static void kvm_map_magic_page(void *data)
 {
 	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
@@ -197,12 +216,27 @@ static void kvm_use_magic_page(void)
 		kvm_check_ins(p);
 }
 
+static void kvm_free_tmp(void)
+{
+	unsigned long start, end;
+
+	start = (ulong)&kvm_tmp[kvm_tmp_index + (PAGE_SIZE - 1)] & PAGE_MASK;
+	end = (ulong)&kvm_tmp[ARRAY_SIZE(kvm_tmp)] & PAGE_MASK;
+
+	/* Free the tmp space we don't need */
+	for (; start < end; start += PAGE_SIZE) {
+		ClearPageReserved(virt_to_page(start));
+		init_page_count(virt_to_page(start));
+		free_page(start);
+		totalram_pages++;
+	}
+}
+
 static int __init kvm_guest_init(void)
 {
-	char *p;
-
 	if (!kvm_para_available())
-		return 0;
+		goto free_tmp;
 
 	if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE))
 		kvm_use_magic_page();
@@ -210,6 +244,9 @@ static int __init kvm_guest_init(void)
 	printk(KERN_INFO "KVM: Live patching for a fast VM %s\n",
 	       kvm_patching_worked ? "worked" : "failed");
 
+free_tmp:
+	kvm_free_tmp();
+
 	return 0;
 }
--
1.6.0.2
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 15/26] KVM: PPC: Expose magic page support to guest
Now that we have the shared page in place and the MMU code knows about the magic page, we can expose that capability to the guest! Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h |2 ++ arch/powerpc/kvm/powerpc.c | 11 +++ 2 files changed, 13 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index c7305d7..9f8efa4 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -43,6 +43,8 @@ struct kvm_vcpu_arch_shared { #define KVM_SC_MAGIC_R30x4b564d52 /* KVMR */ #define KVM_SC_MAGIC_R40x554c455a /* ULEZ */ +#define KVM_FEATURE_MAGIC_PAGE 1 + #ifdef __KERNEL__ static inline int kvm_para_available(void) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index fe7a1c8..1d28a81 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -60,8 +60,19 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu) } switch (nr) { + case KVM_HC_PPC_MAP_MAGIC_PAGE: + { + vcpu-arch.magic_page_pa = param1; + vcpu-arch.magic_page_ea = param2; + + r = 0; + break; + } case KVM_HC_FEATURES: r = 0; +#if !defined(CONFIG_KVM_440) /* XXX missing bits on 440 */ + r |= (1 KVM_FEATURE_MAGIC_PAGE); +#endif break; default: r = -KVM_ENOSYS; -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 14/26] KVM: PPC: Magic Page BookE support
As we now have Book3s support for the magic page, we also need BookE to join in on the party. This patch implements generic magic page logic for BookE and specific TLB logic for e500. I didn't have any 440 around, so I didn't dare to blindly try and write up broken code. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/booke.c| 29 + arch/powerpc/kvm/e500_tlb.c | 19 +-- 2 files changed, 46 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 2229df9..7957aa4 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -241,6 +241,31 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) vcpu-arch.shared-int_pending = 0; } +/* Check if a DTLB miss was on the magic page. Returns !0 if so. */ +int kvmppc_dtlb_magic_page(struct kvm_vcpu *vcpu, ulong eaddr) +{ + ulong mp_ea = vcpu-arch.magic_page_ea; + ulong gpaddr = vcpu-arch.magic_page_pa; + int gtlb_index = 11 | (1 16); /* Random number in TLB1 */ + + /* Check for existence of magic page */ + if(likely(!mp_ea)) + return 0; + + /* Check if we're on the magic page */ + if(likely((eaddr 12) != (mp_ea 12))) + return 0; + + /* Don't map in user mode */ + if(vcpu-arch.shared-msr MSR_PR) + return 0; + + kvmppc_mmu_map(vcpu, vcpu-arch.magic_page_ea, gpaddr, gtlb_index); + kvmppc_account_exit(vcpu, DTLB_VIRT_MISS_EXITS); + + return 1; +} + /** * kvmppc_handle_exit * @@ -308,6 +333,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, r = RESUME_HOST; break; case EMULATE_FAIL: + case EMULATE_DO_MMIO: /* XXX Deliver Program interrupt to guest. */ printk(KERN_CRIT %s: emulation at %lx failed (%08x)\n, __func__, vcpu-arch.pc, vcpu-arch.last_inst); @@ -377,6 +403,9 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, gpa_t gpaddr; gfn_t gfn; + if (kvmppc_dtlb_magic_page(vcpu, eaddr)) + break; + /* Check the guest TLB. 
*/ gtlb_index = kvmppc_mmu_dtlb_index(vcpu, eaddr); if (gtlb_index 0) { diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c index 66845a5..f5582ca 100644 --- a/arch/powerpc/kvm/e500_tlb.c +++ b/arch/powerpc/kvm/e500_tlb.c @@ -295,9 +295,22 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, struct page *new_page; struct tlbe *stlbe; hpa_t hpaddr; + u32 mas2 = gtlbe-mas2; + u32 mas3 = gtlbe-mas3; stlbe = vcpu_e500-shadow_tlb[tlbsel][esel]; + if ((vcpu_e500-vcpu.arch.magic_page_ea) + ((vcpu_e500-vcpu.arch.magic_page_pa PAGE_SHIFT) == gfn) + !(vcpu_e500-vcpu.arch.shared-msr MSR_PR)) { + mas2 = 0; + mas3 = E500_TLB_SUPER_PERM_MASK; + hpaddr = virt_to_phys(vcpu_e500-vcpu.arch.shared); + new_page = pfn_to_page(hpaddr PAGE_SHIFT); + get_page(new_page); + goto mapped; + } + /* Get reference to new page. */ new_page = gfn_to_page(vcpu_e500-vcpu.kvm, gfn); if (is_error_page(new_page)) { @@ -305,6 +318,8 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, kvm_release_page_clean(new_page); return; } + +mapped: hpaddr = page_to_phys(new_page); /* Drop reference to old page. */ @@ -316,10 +331,10 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, stlbe-mas1 = MAS1_TSIZE(BOOK3E_PAGESZ_4K) | MAS1_TID(get_tlb_tid(gtlbe)) | MAS1_TS | MAS1_VALID; stlbe-mas2 = (gvaddr MAS2_EPN) - | e500_shadow_mas2_attrib(gtlbe-mas2, + | e500_shadow_mas2_attrib(mas2, vcpu_e500-vcpu.arch.shared-msr MSR_PR); stlbe-mas3 = (hpaddr MAS3_RPN) - | e500_shadow_mas3_attrib(gtlbe-mas3, + | e500_shadow_mas3_attrib(mas3, vcpu_e500-vcpu.arch.shared-msr MSR_PR); stlbe-mas7 = (hpaddr 32) MAS7_RPN; -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 25/26] KVM: PPC: PV wrteei
On BookE the preferred way to write the EE bit is the wrteei instruction. It already encodes the EE bit in the instruction. So in order to get BookE some speedups as well, let's also PV'nize thati instruction. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 50 arch/powerpc/kernel/kvm_emul.S | 41 2 files changed, 91 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 3557bc8..85e2163 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -66,6 +66,9 @@ #define KVM_INST_MTMSRD_L1 0x7c010164 #define KVM_INST_MTMSR 0x7c000124 +#define KVM_INST_WRTEEI_0 0x7c000146 +#define KVM_INST_WRTEEI_1 0x7c008146 + static bool kvm_patching_worked = true; static char kvm_tmp[1024 * 1024]; static int kvm_tmp_index; @@ -200,6 +203,47 @@ static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt) *inst = KVM_INST_B | (distance_start KVM_INST_B_MASK); } +#ifdef CONFIG_BOOKE + +extern u32 kvm_emulate_wrteei_branch_offs; +extern u32 kvm_emulate_wrteei_ee_offs; +extern u32 kvm_emulate_wrteei_len; +extern u32 kvm_emulate_wrteei[]; + +static void kvm_patch_ins_wrteei(u32 *inst) +{ + u32 *p; + int distance_start; + int distance_end; + ulong next_inst; + + p = kvm_alloc(kvm_emulate_wrteei_len * 4); + if (!p) + return; + + /* Find out where we are and put everything there */ + distance_start = (ulong)p - (ulong)inst; + next_inst = ((ulong)inst + 4); + distance_end = next_inst - (ulong)p[kvm_emulate_wrteei_branch_offs]; + + /* Make sure we only write valid b instructions */ + if (distance_start KVM_INST_B_MAX) { + kvm_patching_worked = false; + return; + } + + /* Modify the chunk to fit the invocation */ + memcpy(p, kvm_emulate_wrteei, kvm_emulate_wrteei_len * 4); + p[kvm_emulate_wrteei_branch_offs] |= distance_end KVM_INST_B_MASK; + p[kvm_emulate_wrteei_ee_offs] |= (*inst MSR_EE); + flush_icache_range((ulong)p, (ulong)p + kvm_emulate_wrteei_len * 4); + + /* Patch the invocation */ + *inst = KVM_INST_B | 
(distance_start KVM_INST_B_MASK); +} + +#endif + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -289,6 +333,12 @@ static void kvm_check_ins(u32 *inst) } switch (_inst) { +#ifdef CONFIG_BOOKE + case KVM_INST_WRTEEI_0: + case KVM_INST_WRTEEI_1: + kvm_patch_ins_wrteei(inst); + break; +#endif } flush_icache_range((ulong)inst, (ulong)inst + 4); diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index ccf5a42..b79b9de 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -194,3 +194,44 @@ kvm_emulate_mtmsr_orig_ins_offs: .global kvm_emulate_mtmsr_len kvm_emulate_mtmsr_len: .long (kvm_emulate_mtmsr_end - kvm_emulate_mtmsr) / 4 + + + +.global kvm_emulate_wrteei +kvm_emulate_wrteei: + + SCRATCH_SAVE + + /* Fetch old MSR in r31 */ + LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + /* Remove MSR_EE from old MSR */ + li r30, 0 + ori r30, r30, MSR_EE + andcr31, r31, r30 + + /* OR new MSR_EE onto the old MSR */ +kvm_emulate_wrteei_ee: + ori r31, r31, 0 + + /* Write new MSR value back */ + STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + SCRATCH_RESTORE + + /* Go back to caller */ +kvm_emulate_wrteei_branch: + b . +kvm_emulate_wrteei_end: + +.global kvm_emulate_wrteei_branch_offs +kvm_emulate_wrteei_branch_offs: + .long (kvm_emulate_wrteei_branch - kvm_emulate_wrteei) / 4 + +.global kvm_emulate_wrteei_ee_offs +kvm_emulate_wrteei_ee_offs: + .long (kvm_emulate_wrteei_ee - kvm_emulate_wrteei) / 4 + +.global kvm_emulate_wrteei_len +kvm_emulate_wrteei_len: + .long (kvm_emulate_wrteei_end - kvm_emulate_wrteei) / 4 -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 11/26] KVM: PPC: Make RMO a define
On PowerPC it's very normal to not support all of the physical RAM in real mode. To check if we're matching on the shared page or not, we need to know the limits so we can restrain ourselves to that range. So let's make it a define instead of open-coding it. And while at it, let's also increase it. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |2 ++ arch/powerpc/kvm/book3s.c |4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 83c45ea..e35c1ac 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -47,6 +47,8 @@ #define HPTEG_HASH_NUM_VPTE(1 HPTEG_HASH_BITS_VPTE) #define HPTEG_HASH_NUM_VPTE_LONG (1 HPTEG_HASH_BITS_VPTE_LONG) +#define KVM_RMO0x0fffULL + struct kvm; struct kvm_run; struct kvm_vcpu; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index e76c950..2f55aa5 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -462,7 +462,7 @@ static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data, r = vcpu-arch.mmu.xlate(vcpu, eaddr, pte, data); } else { pte-eaddr = eaddr; - pte-raddr = eaddr 0x; + pte-raddr = eaddr KVM_RMO; pte-vpage = VSID_REAL | eaddr 12; pte-may_read = true; pte-may_write = true; @@ -576,7 +576,7 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, pte.may_execute = true; pte.may_read = true; pte.may_write = true; - pte.raddr = eaddr 0x; + pte.raddr = eaddr KVM_RMO; pte.eaddr = eaddr; pte.vpage = eaddr 12; } -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 20/26] KVM: PPC: PV tlbsync to nop
With our current MMU scheme we don't need to know about the tlbsync instruction. So we can just nop it out.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kernel/kvm.c | 12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index b165b20..b091f94 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -61,6 +61,8 @@
 #define KVM_INST_MTSPR_DAR	0x7c1303a6
 #define KVM_INST_MTSPR_DSISR	0x7c1203a6
 
+#define KVM_INST_TLBSYNC	0x7c00046c
+
 static bool kvm_patching_worked = true;
 
 static void kvm_patch_ins_ld(u32 *inst, long addr, u32 rt)
@@ -91,6 +93,11 @@ static void kvm_patch_ins_stw(u32 *inst, long addr, u32 rt)
 	*inst = KVM_INST_STW | rt | (addr & 0x0000fffc);
 }
 
+static void kvm_patch_ins_nop(u32 *inst)
+{
+	*inst = KVM_INST_NOP;
+}
+
 static void kvm_map_magic_page(void *data)
 {
 	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
@@ -159,6 +166,11 @@ static void kvm_check_ins(u32 *inst)
 	case KVM_INST_MTSPR_DSISR:
 		kvm_patch_ins_stw(inst, magic_var(dsisr), inst_rt);
 		break;
+
+	/* Nops */
+	case KVM_INST_TLBSYNC:
+		kvm_patch_ins_nop(inst);
+		break;
 	}
 
 	switch (_inst) {
--
1.6.0.2
[PATCH 24/26] KVM: PPC: PV mtmsrd L=0 and mtmsr
There is also a form of mtmsr where all bits need to be addressed. While the PPC64 Linux kernel behaves resonably well here, the PPC32 one never uses the L=1 form but does mtmsr even for simple things like only changing EE. So we need to hook into that one as well and check for a mask of bits that we deem safe to change from within guest context. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 51 arch/powerpc/kernel/kvm_emul.S | 84 2 files changed, 135 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 71153d0..3557bc8 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -62,7 +62,9 @@ #define KVM_INST_MTSPR_DSISR 0x7c1203a6 #define KVM_INST_TLBSYNC 0x7c00046c +#define KVM_INST_MTMSRD_L0 0x7c000164 #define KVM_INST_MTMSRD_L1 0x7c010164 +#define KVM_INST_MTMSR 0x7c000124 static bool kvm_patching_worked = true; static char kvm_tmp[1024 * 1024]; @@ -155,6 +157,49 @@ static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt) *inst = KVM_INST_B | (distance_start KVM_INST_B_MASK); } +extern u32 kvm_emulate_mtmsr_branch_offs; +extern u32 kvm_emulate_mtmsr_reg1_offs; +extern u32 kvm_emulate_mtmsr_reg2_offs; +extern u32 kvm_emulate_mtmsr_reg3_offs; +extern u32 kvm_emulate_mtmsr_orig_ins_offs; +extern u32 kvm_emulate_mtmsr_len; +extern u32 kvm_emulate_mtmsr[]; + +static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt) +{ + u32 *p; + int distance_start; + int distance_end; + ulong next_inst; + + p = kvm_alloc(kvm_emulate_mtmsr_len * 4); + if (!p) + return; + + /* Find out where we are and put everything there */ + distance_start = (ulong)p - (ulong)inst; + next_inst = ((ulong)inst + 4); + distance_end = next_inst - (ulong)p[kvm_emulate_mtmsr_branch_offs]; + + /* Make sure we only write valid b instructions */ + if (distance_start KVM_INST_B_MAX) { + kvm_patching_worked = false; + return; + } + + /* Modify the chunk to fit the invocation */ + memcpy(p, kvm_emulate_mtmsr, 
kvm_emulate_mtmsr_len * 4); + p[kvm_emulate_mtmsr_branch_offs] |= distance_end KVM_INST_B_MASK; + p[kvm_emulate_mtmsr_reg1_offs] |= rt; + p[kvm_emulate_mtmsr_reg2_offs] |= rt; + p[kvm_emulate_mtmsr_reg3_offs] |= rt; + p[kvm_emulate_mtmsr_orig_ins_offs] = *inst; + flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsr_len * 4); + + /* Patch the invocation */ + *inst = KVM_INST_B | (distance_start KVM_INST_B_MASK); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -235,6 +280,12 @@ static void kvm_check_ins(u32 *inst) if (get_rt(inst_rt) 30) kvm_patch_ins_mtmsrd(inst, inst_rt); break; + case KVM_INST_MTMSR: + case KVM_INST_MTMSRD_L0: + /* We use r30 and r31 during the hook */ + if (get_rt(inst_rt) 30) + kvm_patch_ins_mtmsr(inst, inst_rt); + break; } switch (_inst) { diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index 25e6683..ccf5a42 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -110,3 +110,87 @@ kvm_emulate_mtmsrd_reg_offs: .global kvm_emulate_mtmsrd_len kvm_emulate_mtmsrd_len: .long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4 + + +#define MSR_SAFE_BITS (MSR_EE | MSR_CE | MSR_ME | MSR_RI) +#define MSR_CRITICAL_BITS ~MSR_SAFE_BITS + +.global kvm_emulate_mtmsr +kvm_emulate_mtmsr: + + SCRATCH_SAVE + + /* Fetch old MSR in r31 */ + LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + /* Find the changed bits between old and new MSR */ +kvm_emulate_mtmsr_reg1: + xor r31, r0, r31 + + /* Check if we need to really do mtmsr */ + LOAD_REG_IMMEDIATE(r30, MSR_CRITICAL_BITS) + and.r31, r31, r30 + + /* No critical bits changed? Maybe we can stay in the guest. 
*/ + beq maybe_stay_in_guest + +do_mtmsr: + + SCRATCH_RESTORE + + /* Just fire off the mtmsr if it's critical */ +kvm_emulate_mtmsr_orig_ins: + mtmsr r0 + + b kvm_emulate_mtmsr_branch + +maybe_stay_in_guest: + + /* Check if we have to fetch an interrupt */ + lwz r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0) + cmpwi r31, 0 + beq+no_mtmsr + + /* Check if we may trigger an interrupt */ +kvm_emulate_mtmsr_reg2: + andi. r31, r0, MSR_EE + beq no_mtmsr + + b do_mtmsr + +no_mtmsr: + + /* Put MSR into magic page because we don't call mtmsr */ +kvm_emulate_mtmsr_reg3: + STL64(r0, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + SCRATCH_RESTORE + + /* Go back to
[PATCH 18/26] KVM: PPC: KVM PV guest stubs
We will soon start and replace instructions from the text section with other, paravirtualized versions. To ease the readability of those patches I split out the generic looping and magic page mapping code out. This patch still only contains stubs. But at least it loops through the text section :). Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 59 + 1 files changed, 59 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 2d8dd73..d873bc6 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -32,3 +32,62 @@ #define KVM_MAGIC_PAGE (-4096L) #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x) +static bool kvm_patching_worked = true; + +static void kvm_map_magic_page(void *data) +{ + kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, + KVM_MAGIC_PAGE, /* Physical Address */ + KVM_MAGIC_PAGE); /* Effective Address */ +} + +static void kvm_check_ins(u32 *inst) +{ + u32 _inst = *inst; + u32 inst_no_rt = _inst ~KVM_MASK_RT; + u32 inst_rt = _inst KVM_MASK_RT; + + switch (inst_no_rt) { + } + + switch (_inst) { + } + + flush_icache_range((ulong)inst, (ulong)inst + 4); +} + +static void kvm_use_magic_page(void) +{ + u32 *p; + u32 *start, *end; + + /* Tell the host to map the magic page to -4096 on all CPUs */ + + on_each_cpu(kvm_map_magic_page, NULL, 1); + + /* Now loop through all code and find instructions */ + + start = (void*)_stext; + end = (void*)_etext; + + for (p = start; p end; p++) + kvm_check_ins(p); +} + +static int __init kvm_guest_init(void) +{ + char *p; + + if (!kvm_para_available()) + return 0; + + if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE)) + kvm_use_magic_page(); + + printk(KERN_INFO KVM: Live patching for a fast VM %s\n, +kvm_patching_worked ? 
worked : failed); + + return 0; +} + +postcore_initcall(kvm_guest_init); -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 12/26] KVM: PPC: First magic page steps
We will be introducing a method to project the shared page into guest context. As soon as we're talking about this coupling, the shared page is called the "magic page". This patch introduces simple defines, so the follow-up patches are easier to read.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_host.h |    2 ++
 include/linux/kvm_para.h            |    1 +
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index e35c1ac..5f8c214 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -285,6 +285,8 @@ struct kvm_vcpu_arch {
 	u64 dec_jiffies;
 	unsigned long pending_exceptions;
 	struct kvm_vcpu_arch_shared *shared;
+	unsigned long magic_page_pa; /* phys addr to map the magic page to */
+	unsigned long magic_page_ea; /* effect. addr to map the magic page to */
 
 #ifdef CONFIG_PPC_BOOK3S
 	struct kmem_cache *hpte_cache;
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 3b8080e..ac2015a 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -18,6 +18,7 @@
 #define KVM_HC_VAPIC_POLL_IRQ		1
 #define KVM_HC_MMU_OP			2
 #define KVM_HC_FEATURES			3
+#define KVM_HC_PPC_MAP_MAGIC_PAGE	4
 
 /*
  * hypercalls use architecture specific
--
1.6.0.2
[PATCH 23/26] KVM: PPC: PV mtmsrd L=1
The PowerPC ISA has a special instruction for mtmsr that only changes the EE and RI bits, namely the L=1 form. Since that one is reasonably often occuring and simple to implement, let's go with this first. Writing EE=0 is always just a store. Doing EE=1 also requires us to check for pending interrupts and if necessary exit back to the hypervisor. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 45 arch/powerpc/kernel/kvm_emul.S | 56 2 files changed, 101 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 7e8fe6f..71153d0 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -62,6 +62,7 @@ #define KVM_INST_MTSPR_DSISR 0x7c1203a6 #define KVM_INST_TLBSYNC 0x7c00046c +#define KVM_INST_MTMSRD_L1 0x7c010164 static bool kvm_patching_worked = true; static char kvm_tmp[1024 * 1024]; @@ -117,6 +118,43 @@ static u32 *kvm_alloc(int len) return p; } +extern u32 kvm_emulate_mtmsrd_branch_offs; +extern u32 kvm_emulate_mtmsrd_reg_offs; +extern u32 kvm_emulate_mtmsrd_len; +extern u32 kvm_emulate_mtmsrd[]; + +static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt) +{ + u32 *p; + int distance_start; + int distance_end; + ulong next_inst; + + p = kvm_alloc(kvm_emulate_mtmsrd_len * 4); + if (!p) + return; + + /* Find out where we are and put everything there */ + distance_start = (ulong)p - (ulong)inst; + next_inst = ((ulong)inst + 4); + distance_end = next_inst - (ulong)p[kvm_emulate_mtmsrd_branch_offs]; + + /* Make sure we only write valid b instructions */ + if (distance_start KVM_INST_B_MAX) { + kvm_patching_worked = false; + return; + } + + /* Modify the chunk to fit the invocation */ + memcpy(p, kvm_emulate_mtmsrd, kvm_emulate_mtmsrd_len * 4); + p[kvm_emulate_mtmsrd_branch_offs] |= distance_end KVM_INST_B_MASK; + p[kvm_emulate_mtmsrd_reg_offs] |= rt; + flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsrd_len * 4); + + /* Patch the invocation */ + *inst = KVM_INST_B | 
(distance_start KVM_INST_B_MASK); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -190,6 +228,13 @@ static void kvm_check_ins(u32 *inst) case KVM_INST_TLBSYNC: kvm_patch_ins_nop(inst); break; + + /* Rewrites */ + case KVM_INST_MTMSRD_L1: + /* We use r30 and r31 during the hook */ + if (get_rt(inst_rt) 30) + kvm_patch_ins_mtmsrd(inst, inst_rt); + break; } switch (_inst) { diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index 7da835a..25e6683 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -54,3 +54,59 @@ /* Disable critical section. We are critical if \ shared-critical == r1 and r2 is always != r1 */ \ STL64(r2, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0); + +.global kvm_emulate_mtmsrd +kvm_emulate_mtmsrd: + + SCRATCH_SAVE + + /* Put MSR ~(MSR_EE|MSR_RI) in r31 */ + LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + lis r30, (~(MSR_EE | MSR_RI))@h + ori r30, r30, (~(MSR_EE | MSR_RI))@l + and r31, r31, r30 + + /* OR the register's (MSR_EE|MSR_RI) on MSR */ +kvm_emulate_mtmsrd_reg: + andi. r30, r0, (MSR_EE|MSR_RI) + or r31, r31, r30 + + /* Put MSR back into magic page */ + STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + /* Check if we have to fetch an interrupt */ + lwz r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0) + cmpwi r31, 0 + beq+no_check + + /* Check if we may trigger an interrupt */ + andi. r30, r30, MSR_EE + beq no_check + + SCRATCH_RESTORE + + /* Nag hypervisor */ + tlbsync + + b kvm_emulate_mtmsrd_branch + +no_check: + + SCRATCH_RESTORE + + /* Go back to caller */ +kvm_emulate_mtmsrd_branch: + b . 
+kvm_emulate_mtmsrd_end: + +.global kvm_emulate_mtmsrd_branch_offs +kvm_emulate_mtmsrd_branch_offs: + .long (kvm_emulate_mtmsrd_branch - kvm_emulate_mtmsrd) / 4 + +.global kvm_emulate_mtmsrd_reg_offs +kvm_emulate_mtmsrd_reg_offs: + .long (kvm_emulate_mtmsrd_reg - kvm_emulate_mtmsrd) / 4 + +.global kvm_emulate_mtmsrd_len +kvm_emulate_mtmsrd_len: + .long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4 -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 17/26] KVM: PPC: Generic KVM PV guest support
We have all the hypervisor pieces in place now, but the guest parts are still missing. This patch implements basic awareness of KVM when running Linux as guest. It doesn't do anything with it yet though. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/Makefile |2 ++ arch/powerpc/kernel/asm-offsets.c | 15 +++ arch/powerpc/kernel/kvm.c | 34 ++ arch/powerpc/kernel/kvm_emul.S| 27 +++ arch/powerpc/platforms/Kconfig| 10 ++ 5 files changed, 88 insertions(+), 0 deletions(-) create mode 100644 arch/powerpc/kernel/kvm.c create mode 100644 arch/powerpc/kernel/kvm_emul.S diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 58d0572..2d7eb9e 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -125,6 +125,8 @@ ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC),) obj-y += ppc_save_regs.o endif +obj-$(CONFIG_KVM_GUEST) += kvm.o kvm_emul.o + # Disable GCOV in odd or sensitive code GCOV_PROFILE_prom_init.o := n GCOV_PROFILE_ftrace.o := n diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index a55d47e..e3e740b 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -465,6 +465,21 @@ int main(void) DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr)); #endif /* CONFIG_PPC_BOOK3S */ #endif + +#ifdef CONFIG_KVM_GUEST + DEFINE(KVM_MAGIC_SCRATCH1, offsetof(struct kvm_vcpu_arch_shared, + scratch1)); + DEFINE(KVM_MAGIC_SCRATCH2, offsetof(struct kvm_vcpu_arch_shared, + scratch2)); + DEFINE(KVM_MAGIC_SCRATCH3, offsetof(struct kvm_vcpu_arch_shared, + scratch3)); + DEFINE(KVM_MAGIC_INT, offsetof(struct kvm_vcpu_arch_shared, + int_pending)); + DEFINE(KVM_MAGIC_MSR, offsetof(struct kvm_vcpu_arch_shared, msr)); + DEFINE(KVM_MAGIC_CRITICAL, offsetof(struct kvm_vcpu_arch_shared, + critical)); +#endif + #ifdef CONFIG_44x DEFINE(PGD_T_LOG2, PGD_T_LOG2); DEFINE(PTE_T_LOG2, PTE_T_LOG2); diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c new file mode 
100644 index 000..2d8dd73 --- /dev/null +++ b/arch/powerpc/kernel/kvm.c @@ -0,0 +1,34 @@ +/* + * Copyright (C) 2010 SUSE Linux Products GmbH. All rights reserved. + * + * Authors: + * Alexander Graf ag...@suse.de + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +#include linux/kvm_host.h +#include linux/init.h +#include linux/kvm_para.h +#include linux/slab.h + +#include asm/reg.h +#include asm/kvm_ppc.h +#include asm/sections.h +#include asm/cacheflush.h +#include asm/disassemble.h + +#define KVM_MAGIC_PAGE (-4096L) +#define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x) + diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S new file mode 100644 index 000..c7b9fc9 --- /dev/null +++ b/arch/powerpc/kernel/kvm_emul.S @@ -0,0 +1,27 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright SUSE Linux Products GmbH 2010 + * + * Authors: Alexander Graf ag...@suse.de + */ + +#include asm/ppc_asm.h +#include asm/kvm_asm.h +#include asm/reg.h +#include asm/page.h +#include asm/asm-offsets.h + +#define KVM_MAGIC_PAGE (-4096) + diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig index d1663db..1744349 100644 --- a/arch/powerpc/platforms/Kconfig +++ b/arch/powerpc/platforms/Kconfig @@ -21,6 +21,16 @@ source
[PATCH 19/26] KVM: PPC: PV instructions to loads and stores
Some instructions can simply be replaced by load and store instructions to or from the magic page. This patch replaces often called instructions that fall into the above category. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 111 + 1 files changed, 111 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index d873bc6..b165b20 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -32,8 +32,65 @@ #define KVM_MAGIC_PAGE (-4096L) #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x) +#define KVM_INST_LWZ 0x8000 +#define KVM_INST_STW 0x9000 +#define KVM_INST_LD0xe800 +#define KVM_INST_STD 0xf800 +#define KVM_INST_NOP 0x6000 +#define KVM_INST_B 0x4800 +#define KVM_INST_B_MASK0x03ff +#define KVM_INST_B_MAX 0x01ff + +#define KVM_MASK_RT0x03e0 +#define KVM_INST_MFMSR 0x7ca6 +#define KVM_INST_MFSPR_SPRG0 0x7c1042a6 +#define KVM_INST_MFSPR_SPRG1 0x7c1142a6 +#define KVM_INST_MFSPR_SPRG2 0x7c1242a6 +#define KVM_INST_MFSPR_SPRG3 0x7c1342a6 +#define KVM_INST_MFSPR_SRR00x7c1a02a6 +#define KVM_INST_MFSPR_SRR10x7c1b02a6 +#define KVM_INST_MFSPR_DAR 0x7c1302a6 +#define KVM_INST_MFSPR_DSISR 0x7c1202a6 + +#define KVM_INST_MTSPR_SPRG0 0x7c1043a6 +#define KVM_INST_MTSPR_SPRG1 0x7c1143a6 +#define KVM_INST_MTSPR_SPRG2 0x7c1243a6 +#define KVM_INST_MTSPR_SPRG3 0x7c1343a6 +#define KVM_INST_MTSPR_SRR00x7c1a03a6 +#define KVM_INST_MTSPR_SRR10x7c1b03a6 +#define KVM_INST_MTSPR_DAR 0x7c1303a6 +#define KVM_INST_MTSPR_DSISR 0x7c1203a6 + static bool kvm_patching_worked = true; +static void kvm_patch_ins_ld(u32 *inst, long addr, u32 rt) +{ +#ifdef CONFIG_64BIT + *inst = KVM_INST_LD | rt | (addr 0xfffc); +#else + *inst = KVM_INST_LWZ | rt | ((addr + 4) 0xfffc); +#endif +} + +static void kvm_patch_ins_lwz(u32 *inst, long addr, u32 rt) +{ + *inst = KVM_INST_LWZ | rt | (addr 0x); +} + +static void kvm_patch_ins_std(u32 *inst, long addr, u32 rt) +{ +#ifdef CONFIG_64BIT + *inst = 
KVM_INST_STD | rt | (addr 0xfffc); +#else + *inst = KVM_INST_STW | rt | ((addr + 4) 0xfffc); +#endif +} + +static void kvm_patch_ins_stw(u32 *inst, long addr, u32 rt) +{ + *inst = KVM_INST_STW | rt | (addr 0xfffc); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -48,6 +105,60 @@ static void kvm_check_ins(u32 *inst) u32 inst_rt = _inst KVM_MASK_RT; switch (inst_no_rt) { + /* Loads */ + case KVM_INST_MFMSR: + kvm_patch_ins_ld(inst, magic_var(msr), inst_rt); + break; + case KVM_INST_MFSPR_SPRG0: + kvm_patch_ins_ld(inst, magic_var(sprg0), inst_rt); + break; + case KVM_INST_MFSPR_SPRG1: + kvm_patch_ins_ld(inst, magic_var(sprg1), inst_rt); + break; + case KVM_INST_MFSPR_SPRG2: + kvm_patch_ins_ld(inst, magic_var(sprg2), inst_rt); + break; + case KVM_INST_MFSPR_SPRG3: + kvm_patch_ins_ld(inst, magic_var(sprg3), inst_rt); + break; + case KVM_INST_MFSPR_SRR0: + kvm_patch_ins_ld(inst, magic_var(srr0), inst_rt); + break; + case KVM_INST_MFSPR_SRR1: + kvm_patch_ins_ld(inst, magic_var(srr1), inst_rt); + break; + case KVM_INST_MFSPR_DAR: + kvm_patch_ins_ld(inst, magic_var(dar), inst_rt); + break; + case KVM_INST_MFSPR_DSISR: + kvm_patch_ins_lwz(inst, magic_var(dsisr), inst_rt); + break; + + /* Stores */ + case KVM_INST_MTSPR_SPRG0: + kvm_patch_ins_std(inst, magic_var(sprg0), inst_rt); + break; + case KVM_INST_MTSPR_SPRG1: + kvm_patch_ins_std(inst, magic_var(sprg1), inst_rt); + break; + case KVM_INST_MTSPR_SPRG2: + kvm_patch_ins_std(inst, magic_var(sprg2), inst_rt); + break; + case KVM_INST_MTSPR_SPRG3: + kvm_patch_ins_std(inst, magic_var(sprg3), inst_rt); + break; + case KVM_INST_MTSPR_SRR0: + kvm_patch_ins_std(inst, magic_var(srr0), inst_rt); + break; + case KVM_INST_MTSPR_SRR1: + kvm_patch_ins_std(inst, magic_var(srr1), inst_rt); + break; + case KVM_INST_MTSPR_DAR: + kvm_patch_ins_std(inst, magic_var(dar), inst_rt); + break; + case KVM_INST_MTSPR_DSISR: + kvm_patch_ins_stw(inst, magic_var(dsisr), inst_rt); + break; } switch (_inst) 
{
--
1.6.0.2
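The patching logic in the hunk above can be sketched in plain user-space C: mask the RT field out of the instruction word, match the remainder against a known privileged opcode, then splice RT into an ordinary load from the magic page. The mfmsr and lwz encodings below are standard PowerPC, but the `MAGIC_MSR_OFFS` offset is a made-up illustration, not the real shared-page layout.

```c
#include <assert.h>
#include <stdint.h>

/* Standard PowerPC encodings; constant names mirror the patch. */
#define KVM_INST_MFMSR  0x7c0000a6u   /* mfmsr rt                      */
#define KVM_INST_LWZ    0x80000000u   /* lwz rt, d(r0)                 */
#define KVM_MASK_RT     0x03e00000u   /* the 5-bit RT field, bits 6-10 */
#define KVM_MAGIC_PAGE  (-4096l)      /* magic page at top of EA space */

/* Hypothetical offset of the msr field inside the shared page. */
#define MAGIC_MSR_OFFS  0x10

static void kvm_patch_ins_lwz(uint32_t *inst, long addr, uint32_t rt)
{
	/* Keep RT, swap the opcode for a load from (addr)(r0). The low
	 * 16 bits of the negative magic address become the signed
	 * displacement; base register 0 means "literal zero" on PPC. */
	*inst = KVM_INST_LWZ | rt | (addr & 0xfffc);
}

static uint32_t check_ins(uint32_t inst)
{
	uint32_t inst_no_rt = inst & ~KVM_MASK_RT;
	uint32_t inst_rt = inst & KVM_MASK_RT;

	if (inst_no_rt == KVM_INST_MFMSR)
		kvm_patch_ins_lwz(&inst, KVM_MAGIC_PAGE + MAGIC_MSR_OFFS,
				  inst_rt);
	return inst;
}
```

So `mfmsr r5` (0x7ca000a6) becomes `lwz r5, -4080(0)`, while anything else passes through untouched.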
[PATCH 22/26] KVM: PPC: PV assembler helpers
When we hook an instruction we need to make sure we don't clobber any of
the registers at that point. So we write them out to scratch space in the
magic page. To make sure we don't fall into a race with another piece of
hooked code, we need to disable interrupts.

To make the later patches and the code in general easier to read, let's
introduce a set of defines that save and restore r30, r31 and cr. Let's
also define some helpers to read the lower 32 bits of a 64 bit field on
32 bit systems.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kernel/kvm_emul.S |   29 +
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
index c7b9fc9..7da835a 100644
--- a/arch/powerpc/kernel/kvm_emul.S
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -25,3 +25,32 @@

 #define KVM_MAGIC_PAGE		(-4096)

+#ifdef CONFIG_64BIT
+#define LL64(reg, offs, reg2)	ld	reg, (offs)(reg2)
+#define STL64(reg, offs, reg2)	std	reg, (offs)(reg2)
+#else
+#define LL64(reg, offs, reg2)	lwz	reg, (offs + 4)(reg2)
+#define STL64(reg, offs, reg2)	stw	reg, (offs + 4)(reg2)
+#endif
+
+#define SCRATCH_SAVE							\
+	/* Enable critical section. We are critical if			\
+	   shared->critical == r1 */					\
+	STL64(r1, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0);		\
+									\
+	/* Save state */						\
+	PPC_STL	r31, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH1)(0);		\
+	PPC_STL	r30, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH2)(0);		\
+	mfcr	r31;							\
+	stw	r31, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH3)(0);
+
+#define SCRATCH_RESTORE							\
+	/* Restore state */						\
+	PPC_LL	r31, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH1)(0);		\
+	lwz	r30, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH3)(0);		\
+	mtcr	r30;							\
+	PPC_LL	r30, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH2)(0);		\
+									\
+	/* Disable critical section. We are critical if			\
+	   shared->critical == r1 and r2 is always != r1 */		\
+	STL64(r2, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0);
--
1.6.0.2
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
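The `LL64`/`STL64` helpers in this patch lean on big-endian layout: on a 32-bit guest, the low word of a 64-bit shared-page field sits at `offs + 4`. A small host-side C sketch of that assumption (byte-order handling is hand-rolled here purely for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Lay out a 64-bit value big-endian in a byte buffer, the way a
 * 64-bit PPC std would place it in the magic page. */
static void store64_be(uint8_t *buf, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		buf[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* What a 32-bit guest's "lwz reg, (offs + 4)(reg2)" sees: the four
 * bytes starting at offset 4 are exactly the low 32 bits. */
static uint32_t load_low_word_be(const uint8_t *buf)
{
	return ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
	       ((uint32_t)buf[6] << 8)  |  (uint32_t)buf[7];
}

/* Round-trip helper used for checking the claim. */
static uint32_t low_word_roundtrip(uint64_t v)
{
	uint8_t buf[8];

	store64_be(buf, v);
	return load_low_word_be(buf);
}
```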
[PATCH 13/26] KVM: PPC: Magic Page Book3s support
We need to override EA as well as PA lookups for the magic page. When the guest tells us to project it, the magic page overrides any guest mappings. In order to reflect that, we need to hook into all the MMU layers of KVM to force map the magic page if necessary. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/book3s.c |7 +++ arch/powerpc/kvm/book3s_32_mmu.c | 16 arch/powerpc/kvm/book3s_32_mmu_host.c | 12 arch/powerpc/kvm/book3s_64_mmu.c | 30 +- arch/powerpc/kvm/book3s_64_mmu_host.c | 12 5 files changed, 76 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 2f55aa5..6ce7fa1 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -551,6 +551,13 @@ mmio: static int kvmppc_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn) { + ulong mp_pa = vcpu-arch.magic_page_pa; + + if (unlikely(mp_pa) + unlikely((mp_pa KVM_RMO) PAGE_SHIFT == gfn)) { + return 1; + } + return kvm_is_visible_gfn(vcpu-kvm, gfn); } diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index 41130c8..d2bd1a6 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -281,8 +281,24 @@ static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data) { int r; + ulong mp_ea = vcpu-arch.magic_page_ea; pte-eaddr = eaddr; + + /* Magic page override */ + if (unlikely(mp_ea) + unlikely((eaddr ~0xfffULL) == (mp_ea ~0xfffULL)) + !(vcpu-arch.shared-msr MSR_PR)) { + pte-vpage = kvmppc_mmu_book3s_32_ea_to_vp(vcpu, eaddr, data); + pte-raddr = vcpu-arch.magic_page_pa | (pte-raddr 0xfff); + pte-raddr = KVM_RMO; + pte-may_execute = true; + pte-may_read = true; + pte-may_write = true; + + return 0; + } + r = kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, pte, data); if (r 0) r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte, data, true); diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c index 
67b8c38..658d3e0 100644 --- a/arch/powerpc/kvm/book3s_32_mmu_host.c +++ b/arch/powerpc/kvm/book3s_32_mmu_host.c @@ -145,6 +145,16 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte) bool primary = false; bool evict = false; struct hpte_cache *pte; + ulong mp_pa = vcpu-arch.magic_page_pa; + + /* Magic page override */ + if (unlikely(mp_pa) + unlikely((orig_pte-raddr ~0xfffUL KVM_RMO) == +(mp_pa ~0xfffUL KVM_RMO))) { + hpaddr = (pfn_t)virt_to_phys(vcpu-arch.shared); + get_page(pfn_to_page(hpaddr PAGE_SHIFT)); + goto mapped; + } /* Get host physical address for gpa */ hpaddr = gfn_to_pfn(vcpu-kvm, orig_pte-raddr PAGE_SHIFT); @@ -155,6 +165,8 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte) } hpaddr = PAGE_SHIFT; +mapped: + /* and write the mapping ea - hpa into the pt */ vcpu-arch.mmu.esid_to_vsid(vcpu, orig_pte-eaddr SID_SHIFT, vsid); map = find_sid_vsid(vcpu, vsid); diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c index 58aa840..4a2e5fc 100644 --- a/arch/powerpc/kvm/book3s_64_mmu.c +++ b/arch/powerpc/kvm/book3s_64_mmu.c @@ -163,6 +163,22 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, bool found = false; bool perm_err = false; int second = 0; + ulong mp_ea = vcpu-arch.magic_page_ea; + + /* Magic page override */ + if (unlikely(mp_ea) + unlikely((eaddr ~0xfffULL) == (mp_ea ~0xfffULL)) + !(vcpu-arch.shared-msr MSR_PR)) { + gpte-eaddr = eaddr; + gpte-vpage = kvmppc_mmu_book3s_64_ea_to_vp(vcpu, eaddr, data); + gpte-raddr = vcpu-arch.magic_page_pa | (gpte-raddr 0xfff); + gpte-raddr = KVM_RMO; + gpte-may_execute = true; + gpte-may_read = true; + gpte-may_write = true; + + return 0; + } slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, eaddr); if (!slbe) @@ -445,6 +461,7 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid, ulong ea = esid SID_SHIFT; struct kvmppc_slb *slb; u64 gvsid = esid; + ulong mp_ea = 
vcpu-arch.magic_page_ea; if (vcpu-arch.shared-msr (MSR_DR|MSR_IR)) { slb = kvmppc_mmu_book3s_64_find_slbe(to_book3s(vcpu), ea); @@ -464,7 +481,7 @@ static int
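The magic page override condition used in both the 32-bit and 64-bit xlate paths boils down to a page-granular address compare, gated on a magic EA being configured and the guest running in supervisor mode. Sketched standalone in C (MSR_PR's bit position is the architectural one; the addresses in the checks are made up):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_PR (1ull << 14)	/* problem state (user mode) bit */

/* Magic page override check, as in kvmppc_mmu_book3s_*_xlate():
 * fire only when a magic EA is set, the access lands in that 4k
 * page, and the guest is not in user mode. */
static bool magic_page_hit(uint64_t eaddr, uint64_t mp_ea, uint64_t msr)
{
	return mp_ea &&
	       (eaddr & ~0xfffull) == (mp_ea & ~0xfffull) &&
	       !(msr & MSR_PR);
}
```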
[PATCH 07/26] KVM: PPC: Implement hypervisor interface
To communicate with KVM directly we need to plumb some sort of interface between the guest and KVM. Usually those interfaces use hypercalls. This hypercall implementation is described in the last patch of the series in a special documentation file. Please read that for further information. This patch implements stubs to handle KVM PPC hypercalls on the host and guest side alike. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h | 100 ++- arch/powerpc/include/asm/kvm_ppc.h |1 + arch/powerpc/kvm/book3s.c | 10 +++- arch/powerpc/kvm/booke.c| 11 - arch/powerpc/kvm/emulate.c | 11 - arch/powerpc/kvm/powerpc.c | 28 ++ include/linux/kvm_para.h|1 + 7 files changed, 156 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index e402999..eaab306 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -34,16 +34,112 @@ struct kvm_vcpu_arch_shared { __u32 dsisr; }; +#define KVM_PVR_PARA 0x4b564d3f /* KVM? 
*/ +#define KVM_SC_MAGIC_R30x4b564d52 /* KVMR */ +#define KVM_SC_MAGIC_R40x554c455a /* ULEZ */ + #ifdef __KERNEL__ static inline int kvm_para_available(void) { - return 0; + unsigned long pvr = KVM_PVR_PARA; + + asm volatile(mfpvr %0 : =r(pvr) : 0(pvr)); + return pvr == KVM_PVR_PARA; +} + +static inline long kvm_hypercall0(unsigned int nr) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr) +: memory); + + return r3; } +static inline long kvm_hypercall1(unsigned int nr, unsigned long p1) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + unsigned long register _p1 asm(r6) = p1; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr), r(_p1) +: memory); + + return r3; +} + +static inline long kvm_hypercall2(unsigned int nr, unsigned long p1, + unsigned long p2) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + unsigned long register _p1 asm(r6) = p1; + unsigned long register _p2 asm(r7) = p2; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr), r(_p1), r(_p2) +: memory); + + return r3; +} + +static inline long kvm_hypercall3(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + unsigned long register _p1 asm(r6) = p1; + unsigned long register _p2 asm(r7) = p2; + unsigned long register _p3 asm(r8) = p3; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr), r(_p1), r(_p2), r(_p3) +: memory); + + return r3; +} + +static inline long kvm_hypercall4(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3, + unsigned 
long p4) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + unsigned long register _p1 asm(r6) = p1; + unsigned long register _p2 asm(r7) = p2; + unsigned long register _p3 asm(r8) = p3; + unsigned long register _p4 asm(r9) = p4; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr), r(_p1), r(_p2), r(_p3), + r(_p4) +: memory); + + return r3; +} + + static inline unsigned int kvm_arch_para_features(void) { - return 0; + if (!kvm_para_available()) + return 0; + + return kvm_hypercall0(KVM_HC_FEATURES); } #endif /* __KERNEL__ */ diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 18d139e..ecb3bc7 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -107,6 +107,7 @@ extern int kvmppc_booke_init(void); extern void kvmppc_booke_exit(void); extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu); +extern int kvmppc_kvm_pv(struct kvm_vcpu *vcpu);
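The magic values the hypercall ABI loads into r3 and r4 are just ASCII tags, which makes the "KVM?"/"KVMR"/"ULEZ" comments in the header easy to verify with a few lines of C:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KVM_PVR_PARA	0x4b564d3f	/* "KVM?" */
#define KVM_SC_MAGIC_R3	0x4b564d52	/* "KVMR" */
#define KVM_SC_MAGIC_R4	0x554c455a	/* "ULEZ" */

/* Decode a 32-bit magic into its four ASCII bytes, MSB first. */
static void magic_to_ascii(uint32_t magic, char out[5])
{
	out[0] = (char)(magic >> 24);
	out[1] = (char)(magic >> 16);
	out[2] = (char)(magic >> 8);
	out[3] = (char)magic;
	out[4] = '\0';
}

static int magic_is(uint32_t magic, const char *tag)
{
	char s[5];

	magic_to_ascii(magic, s);
	return strcmp(s, tag) == 0;
}
```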
[PATCH 03/26] KVM: PPC: Convert DSISR to shared page
The DSISR register contains information about a data page fault. It is fully read/write from inside the guest context and we don't need to worry about interacting based on writes of this register. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_book3s.h|1 - arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c| 11 ++- arch/powerpc/kvm/book3s_emulate.c|6 +++--- arch/powerpc/kvm/book3s_paired_singles.c |2 +- 5 files changed, 11 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index a96e405..4f29caa 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -85,7 +85,6 @@ struct kvmppc_vcpu_book3s { u64 hid[6]; u64 gqr[8]; int slb_nr; - u32 dsisr; u64 sdr1; u64 hior; u64 msr_mask; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index a17dc52..9f7565b 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -24,6 +24,7 @@ struct kvm_vcpu_arch_shared { __u64 msr; + __u32 dsisr; }; #ifdef __KERNEL__ diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 3dd3003..57fd73e 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -595,15 +595,16 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, if (page_found == -ENOENT) { /* Page not found in guest PTE entries */ vcpu-arch.dear = kvmppc_get_fault_dar(vcpu); - to_book3s(vcpu)-dsisr = to_svcpu(vcpu)-fault_dsisr; + vcpu-arch.shared-dsisr = to_svcpu(vcpu)-fault_dsisr; vcpu-arch.shared-msr |= (to_svcpu(vcpu)-shadow_srr1 0xf800ULL); kvmppc_book3s_queue_irqprio(vcpu, vec); } else if (page_found == -EPERM) { /* Storage protection */ vcpu-arch.dear = kvmppc_get_fault_dar(vcpu); - to_book3s(vcpu)-dsisr = to_svcpu(vcpu)-fault_dsisr ~DSISR_NOHPTE; - to_book3s(vcpu)-dsisr |= 
DSISR_PROTFAULT; + vcpu-arch.shared-dsisr = + to_svcpu(vcpu)-fault_dsisr ~DSISR_NOHPTE; + vcpu-arch.shared-dsisr |= DSISR_PROTFAULT; vcpu-arch.shared-msr |= (to_svcpu(vcpu)-shadow_srr1 0xf800ULL); kvmppc_book3s_queue_irqprio(vcpu, vec); @@ -867,7 +868,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, r = kvmppc_handle_pagefault(run, vcpu, dar, exit_nr); } else { vcpu-arch.dear = dar; - to_book3s(vcpu)-dsisr = to_svcpu(vcpu)-fault_dsisr; + vcpu-arch.shared-dsisr = to_svcpu(vcpu)-fault_dsisr; kvmppc_book3s_queue_irqprio(vcpu, exit_nr); kvmppc_mmu_pte_flush(vcpu, vcpu-arch.dear, ~0xFFFUL); r = RESUME_GUEST; @@ -994,7 +995,7 @@ program_interrupt: } case BOOK3S_INTERRUPT_ALIGNMENT: if (kvmppc_read_inst(vcpu) == EMULATE_DONE) { - to_book3s(vcpu)-dsisr = kvmppc_alignment_dsisr(vcpu, + vcpu-arch.shared-dsisr = kvmppc_alignment_dsisr(vcpu, kvmppc_get_last_inst(vcpu)); vcpu-arch.dear = kvmppc_alignment_dar(vcpu, kvmppc_get_last_inst(vcpu)); diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 35d3c16..9982ff1 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -221,7 +221,7 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, else if (r == -EPERM) dsisr |= DSISR_PROTFAULT; - to_book3s(vcpu)-dsisr = dsisr; + vcpu-arch.shared-dsisr = dsisr; to_svcpu(vcpu)-fault_dsisr = dsisr; kvmppc_book3s_queue_irqprio(vcpu, @@ -327,7 +327,7 @@ int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs) to_book3s(vcpu)-sdr1 = spr_val; break; case SPRN_DSISR: - to_book3s(vcpu)-dsisr = spr_val; + vcpu-arch.shared-dsisr = spr_val; break; case SPRN_DAR: vcpu-arch.dear = spr_val; @@ -440,7 +440,7 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt) kvmppc_set_gpr(vcpu, rt, to_book3s(vcpu)-sdr1);
[PATCH 00/26] KVM PPC PV framework
On PPC we run PR=0 (kernel mode) code in PR=1 (user mode) and don't use
the hypervisor extensions.

While that is all great to show that virtualization is possible, there
are quite a few cases where the emulation overhead of privileged
instructions is killing performance.

This patchset tackles exactly that issue. It introduces a paravirtual
framework with which KVM and Linux share a page to exchange register
state. That way we don't have to switch to the hypervisor just to change
the value of a privileged register.

To prove my point, I ran the same test I did for the MMU optimizations
against the PV framework. Here are the results:

[without]

debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done

real    0m14.659s
user    0m8.967s
sys     0m5.688s

[with]

debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done

real    0m7.557s
user    0m4.121s
sys     0m3.426s

So this is a significant performance improvement! I'm quite happy how
fast this whole thing becomes :)

I tried to take all comments I've heard from people so far about such a
PV framework into account. In case you told me something before that is
a no-go and I still did it, please just tell me again.

Now go and have fun with fast VMs on PPC! Get yourself a G5 on ebay and
start experiencing the power yourself. - heh

Alexander Graf (26):
  KVM: PPC: Introduce shared page
  KVM: PPC: Convert MSR to shared page
  KVM: PPC: Convert DSISR to shared page
  KVM: PPC: Convert DAR to shared page.
  KVM: PPC: Convert SRR0 and SRR1 to shared page
  KVM: PPC: Convert SPRG[0-4] to shared page
  KVM: PPC: Implement hypervisor interface
  KVM: PPC: Add PV guest critical sections
  KVM: PPC: Add PV guest scratch registers
  KVM: PPC: Tell guest about pending interrupts
  KVM: PPC: Make RMO a define
  KVM: PPC: First magic page steps
  KVM: PPC: Magic Page Book3s support
  KVM: PPC: Magic Page BookE support
  KVM: PPC: Expose magic page support to guest
  KVM: Move kvm_guest_init out of generic code
  KVM: PPC: Generic KVM PV guest support
  KVM: PPC: KVM PV guest stubs
  KVM: PPC: PV instructions to loads and stores
  KVM: PPC: PV tlbsync to nop
  KVM: PPC: Introduce kvm_tmp framework
  KVM: PPC: PV assembler helpers
  KVM: PPC: PV mtmsrd L=1
  KVM: PPC: PV mtmsrd L=0 and mtmsr
  KVM: PPC: PV wrteei
  KVM: PPC: Add Documentation about PV interface

 Documentation/kvm/ppc-pv.txt             |  164
 arch/powerpc/include/asm/kvm_book3s.h    |    1 -
 arch/powerpc/include/asm/kvm_host.h      |   14 +-
 arch/powerpc/include/asm/kvm_para.h      |  121 +-
 arch/powerpc/include/asm/kvm_ppc.h       |    1 +
 arch/powerpc/kernel/Makefile             |    2 +
 arch/powerpc/kernel/asm-offsets.c        |   18 ++-
 arch/powerpc/kernel/kvm.c                |  399 ++
 arch/powerpc/kernel/kvm_emul.S           |  237 ++
 arch/powerpc/kvm/44x.c                   |    7 +
 arch/powerpc/kvm/44x_tlb.c               |    8 +-
 arch/powerpc/kvm/book3s.c                |  162 -
 arch/powerpc/kvm/book3s_32_mmu.c         |   28 ++-
 arch/powerpc/kvm/book3s_32_mmu_host.c    |   16 +-
 arch/powerpc/kvm/book3s_64_mmu.c         |   42 +++-
 arch/powerpc/kvm/book3s_64_mmu_host.c    |   16 +-
 arch/powerpc/kvm/book3s_emulate.c        |   25 +-
 arch/powerpc/kvm/book3s_paired_singles.c |   11 +-
 arch/powerpc/kvm/booke.c                 |  110 +++--
 arch/powerpc/kvm/booke.h                 |    6 +-
 arch/powerpc/kvm/booke_emulate.c         |   14 +-
 arch/powerpc/kvm/booke_interrupts.S      |    3 +-
 arch/powerpc/kvm/e500.c                  |    7 +
 arch/powerpc/kvm/e500_tlb.c              |   31 ++-
 arch/powerpc/kvm/e500_tlb.h              |    2 +-
 arch/powerpc/kvm/emulate.c               |   47 +++-
 arch/powerpc/kvm/powerpc.c               |   42 +++-
 arch/powerpc/platforms/Kconfig           |   10 +
 arch/x86/include/asm/kvm_para.h          |    6 +
 include/linux/kvm_para.h                 |    7 +-
 30 files changed, 1383 insertions(+), 174 deletions(-)
 create mode 100644 Documentation/kvm/ppc-pv.txt
 create mode 100644 arch/powerpc/kernel/kvm.c
 create mode 100644 arch/powerpc/kernel/kvm_emul.S
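For the record, the quoted wall-clock numbers work out to roughly a 1.9x speedup (14.659s down to 7.557s), with sys time dropping by about 40%. A trivial integer check of that arithmetic:

```c
#include <assert.h>

/* Speedup of after vs. before, times 100, from times in milliseconds.
 * Integer math only, so the result is truncated (193 means 1.93x). */
static int speedup_x100(int before_ms, int after_ms)
{
	return (before_ms * 100) / after_ms;
}
```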
[PATCH 05/26] KVM: PPC: Convert SRR0 and SRR1 to shared page
The SRR0 and SRR1 registers contain cached values of the PC and MSR respectively. They get written to by the hypervisor when an interrupt occurs or directly by the kernel. They are also used to tell the rfi(d) instruction where to jump to. Because it only gets touched on defined events that, it's very simple to share with the guest. Hypervisor and guest both have full r/w access. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |2 -- arch/powerpc/include/asm/kvm_para.h |2 ++ arch/powerpc/kvm/book3s.c | 12 ++-- arch/powerpc/kvm/book3s_emulate.c |4 ++-- arch/powerpc/kvm/booke.c| 15 --- arch/powerpc/kvm/booke_emulate.c|4 ++-- arch/powerpc/kvm/emulate.c | 12 7 files changed, 28 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 108dabc..6bcf62f 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -224,8 +224,6 @@ struct kvm_vcpu_arch { ulong sprg5; ulong sprg6; ulong sprg7; - ulong srr0; - ulong srr1; ulong csrr0; ulong csrr1; ulong dsrr0; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index ec72a1c..d7fc6c2 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,8 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 srr0; + __u64 srr1; __u64 dar; __u64 msr; __u32 dsisr; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 245bd2d..b144697 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -162,8 +162,8 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr) void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags) { - vcpu-arch.srr0 = kvmppc_get_pc(vcpu); - vcpu-arch.srr1 = vcpu-arch.shared-msr | flags; + vcpu-arch.shared-srr0 = kvmppc_get_pc(vcpu); + vcpu-arch.shared-srr1 = vcpu-arch.shared-msr | 
flags; kvmppc_set_pc(vcpu, to_book3s(vcpu)-hior + vec); vcpu-arch.mmu.reset_msr(vcpu); } @@ -1059,8 +1059,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-lr = kvmppc_get_lr(vcpu); regs-xer = kvmppc_get_xer(vcpu); regs-msr = vcpu-arch.shared-msr; - regs-srr0 = vcpu-arch.srr0; - regs-srr1 = vcpu-arch.srr1; + regs-srr0 = vcpu-arch.shared-srr0; + regs-srr1 = vcpu-arch.shared-srr1; regs-pid = vcpu-arch.pid; regs-sprg0 = vcpu-arch.sprg0; regs-sprg1 = vcpu-arch.sprg1; @@ -1086,8 +1086,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_lr(vcpu, regs-lr); kvmppc_set_xer(vcpu, regs-xer); kvmppc_set_msr(vcpu, regs-msr); - vcpu-arch.srr0 = regs-srr0; - vcpu-arch.srr1 = regs-srr1; + vcpu-arch.shared-srr0 = regs-srr0; + vcpu-arch.shared-srr1 = regs-srr1; vcpu-arch.sprg0 = regs-sprg0; vcpu-arch.sprg1 = regs-sprg1; vcpu-arch.sprg2 = regs-sprg2; diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index c147864..f333cb4 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -73,8 +73,8 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, switch (get_xop(inst)) { case OP_19_XOP_RFID: case OP_19_XOP_RFI: - kvmppc_set_pc(vcpu, vcpu-arch.srr0); - kvmppc_set_msr(vcpu, vcpu-arch.srr1); + kvmppc_set_pc(vcpu, vcpu-arch.shared-srr0); + kvmppc_set_msr(vcpu, vcpu-arch.shared-srr1); *advance = 0; break; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 5844bcf..8b546fe 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -64,7 +64,8 @@ void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu) printk(pc: %08lx msr: %08llx\n, vcpu-arch.pc, vcpu-arch.shared-msr); printk(lr: %08lx ctr: %08lx\n, vcpu-arch.lr, vcpu-arch.ctr); - printk(srr0: %08lx srr1: %08lx\n, vcpu-arch.srr0, vcpu-arch.srr1); + printk(srr0: %08llx srr1: %08llx\n, vcpu-arch.shared-srr0, + vcpu-arch.shared-srr1); printk(exceptions: 
%08lx\n, vcpu-arch.pending_exceptions); @@ -189,8 +190,8 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, } if (allowed) { - vcpu-arch.srr0 = vcpu-arch.pc; - vcpu-arch.srr1 = vcpu-arch.shared-msr; + vcpu-arch.shared-srr0 = vcpu-arch.pc; +
[PATCH 02/26] KVM: PPC: Convert MSR to shared page
One of the most obvious registers to share with the guest directly is the MSR. The MSR contains the interrupts enabled flag which the guest has to toggle in critical sections. So in order to bring the overhead of interrupt en- and disabling down, let's put msr into the shared page. Keep in mind that even though you can fully read its contents, writing to it doesn't always update all state. There are a few safe fields that don't require hypervisor interaction. See the guest implementation that follows later for reference. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |1 - arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kernel/asm-offsets.c|2 +- arch/powerpc/kvm/44x_tlb.c |8 ++-- arch/powerpc/kvm/book3s.c| 65 -- arch/powerpc/kvm/book3s_32_mmu.c | 12 +++--- arch/powerpc/kvm/book3s_32_mmu_host.c|4 +- arch/powerpc/kvm/book3s_64_mmu.c | 12 +++--- arch/powerpc/kvm/book3s_64_mmu_host.c|4 +- arch/powerpc/kvm/book3s_emulate.c|9 ++-- arch/powerpc/kvm/book3s_paired_singles.c |7 ++- arch/powerpc/kvm/booke.c | 20 +- arch/powerpc/kvm/booke.h |6 +- arch/powerpc/kvm/booke_emulate.c |6 +- arch/powerpc/kvm/booke_interrupts.S |3 +- arch/powerpc/kvm/e500_tlb.c | 12 +++--- arch/powerpc/kvm/e500_tlb.h |2 +- arch/powerpc/kvm/powerpc.c |3 +- 18 files changed, 93 insertions(+), 84 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index bca9391..249c242 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -210,7 +210,6 @@ struct kvm_vcpu_arch { u32 cr; #endif - ulong msr; #ifdef CONFIG_PPC_BOOK3S ulong shadow_msr; ulong hflags; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 1485ba8..a17dc52 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,7 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 msr; }; #ifdef __KERNEL__ diff --git 
a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 944f593..a55d47e 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -394,13 +394,13 @@ int main(void) DEFINE(VCPU_HOST_STACK, offsetof(struct kvm_vcpu, arch.host_stack)); DEFINE(VCPU_HOST_PID, offsetof(struct kvm_vcpu, arch.host_pid)); DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, arch.gpr)); - DEFINE(VCPU_MSR, offsetof(struct kvm_vcpu, arch.msr)); DEFINE(VCPU_SPRG4, offsetof(struct kvm_vcpu, arch.sprg4)); DEFINE(VCPU_SPRG5, offsetof(struct kvm_vcpu, arch.sprg5)); DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6)); DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7)); DEFINE(VCPU_SHADOW_PID, offsetof(struct kvm_vcpu, arch.shadow_pid)); DEFINE(VCPU_SHARED, offsetof(struct kvm_vcpu, arch.shared)); + DEFINE(VCPU_SHARED_MSR, offsetof(struct kvm_vcpu_arch_shared, msr)); /* book3s */ #ifdef CONFIG_PPC_BOOK3S diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c index 8123125..4cbbca7 100644 --- a/arch/powerpc/kvm/44x_tlb.c +++ b/arch/powerpc/kvm/44x_tlb.c @@ -221,14 +221,14 @@ gpa_t kvmppc_mmu_xlate(struct kvm_vcpu *vcpu, unsigned int gtlb_index, int kvmppc_mmu_itlb_index(struct kvm_vcpu *vcpu, gva_t eaddr) { - unsigned int as = !!(vcpu-arch.msr MSR_IS); + unsigned int as = !!(vcpu-arch.shared-msr MSR_IS); return kvmppc_44x_tlb_index(vcpu, eaddr, vcpu-arch.pid, as); } int kvmppc_mmu_dtlb_index(struct kvm_vcpu *vcpu, gva_t eaddr) { - unsigned int as = !!(vcpu-arch.msr MSR_DS); + unsigned int as = !!(vcpu-arch.shared-msr MSR_DS); return kvmppc_44x_tlb_index(vcpu, eaddr, vcpu-arch.pid, as); } @@ -353,7 +353,7 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gpa_t gpaddr, stlbe.word1 = (hpaddr 0xfc00) | ((hpaddr 32) 0xf); stlbe.word2 = kvmppc_44x_tlb_shadow_attrib(flags, - vcpu-arch.msr MSR_PR); + vcpu-arch.shared-msr MSR_PR); stlbe.tid = !(asid 0xff); /* Keep track of the reference so we can properly release it later. 
*/ @@ -422,7 +422,7 @@ static int tlbe_is_host_safe(const struct kvm_vcpu *vcpu, /* Does it match current guest AS? */ /* XXX what about IS != DS? */ - if (get_tlb_ts(tlbe) != !!(vcpu-arch.msr MSR_IS)) + if (get_tlb_ts(tlbe) != !!(vcpu-arch.shared-msr MSR_IS)) return 0; gpa = get_tlb_raddr(tlbe); diff --git
[PATCH 04/26] KVM: PPC: Convert DAR to shared page.
The DAR register contains the address a data page fault occured at. This register behaves pretty much like a simple data storage register that gets written to on data faults. There is no hypervisor interaction required on read or write. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |1 - arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c| 14 +++--- arch/powerpc/kvm/book3s_emulate.c|6 +++--- arch/powerpc/kvm/book3s_paired_singles.c |2 +- arch/powerpc/kvm/booke.c |2 +- arch/powerpc/kvm/booke_emulate.c |4 ++-- 7 files changed, 15 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 249c242..108dabc 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -230,7 +230,6 @@ struct kvm_vcpu_arch { ulong csrr1; ulong dsrr0; ulong dsrr1; - ulong dear; ulong esr; u32 dec; u32 decar; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 9f7565b..ec72a1c 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,7 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 dar; __u64 msr; __u32 dsisr; }; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 57fd73e..245bd2d 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -594,14 +594,14 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, if (page_found == -ENOENT) { /* Page not found in guest PTE entries */ - vcpu-arch.dear = kvmppc_get_fault_dar(vcpu); + vcpu-arch.shared-dar = kvmppc_get_fault_dar(vcpu); vcpu-arch.shared-dsisr = to_svcpu(vcpu)-fault_dsisr; vcpu-arch.shared-msr |= (to_svcpu(vcpu)-shadow_srr1 0xf800ULL); kvmppc_book3s_queue_irqprio(vcpu, vec); } else if (page_found == -EPERM) { /* Storage protection */ - vcpu-arch.dear = 
kvmppc_get_fault_dar(vcpu); + vcpu-arch.shared-dar = kvmppc_get_fault_dar(vcpu); vcpu-arch.shared-dsisr = to_svcpu(vcpu)-fault_dsisr ~DSISR_NOHPTE; vcpu-arch.shared-dsisr |= DSISR_PROTFAULT; @@ -610,7 +610,7 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, kvmppc_book3s_queue_irqprio(vcpu, vec); } else if (page_found == -EINVAL) { /* Page not found in guest SLB */ - vcpu-arch.dear = kvmppc_get_fault_dar(vcpu); + vcpu-arch.shared-dar = kvmppc_get_fault_dar(vcpu); kvmppc_book3s_queue_irqprio(vcpu, vec + 0x80); } else if (!is_mmio kvmppc_visible_gfn(vcpu, pte.raddr PAGE_SHIFT)) { @@ -867,17 +867,17 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, if (to_svcpu(vcpu)-fault_dsisr DSISR_NOHPTE) { r = kvmppc_handle_pagefault(run, vcpu, dar, exit_nr); } else { - vcpu-arch.dear = dar; + vcpu-arch.shared-dar = dar; vcpu-arch.shared-dsisr = to_svcpu(vcpu)-fault_dsisr; kvmppc_book3s_queue_irqprio(vcpu, exit_nr); - kvmppc_mmu_pte_flush(vcpu, vcpu-arch.dear, ~0xFFFUL); + kvmppc_mmu_pte_flush(vcpu, dar, ~0xFFFUL); r = RESUME_GUEST; } break; } case BOOK3S_INTERRUPT_DATA_SEGMENT: if (kvmppc_mmu_map_segment(vcpu, kvmppc_get_fault_dar(vcpu)) 0) { - vcpu-arch.dear = kvmppc_get_fault_dar(vcpu); + vcpu-arch.shared-dar = kvmppc_get_fault_dar(vcpu); kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DATA_SEGMENT); } @@ -997,7 +997,7 @@ program_interrupt: if (kvmppc_read_inst(vcpu) == EMULATE_DONE) { vcpu-arch.shared-dsisr = kvmppc_alignment_dsisr(vcpu, kvmppc_get_last_inst(vcpu)); - vcpu-arch.dear = kvmppc_alignment_dar(vcpu, + vcpu-arch.shared-dar = kvmppc_alignment_dar(vcpu, kvmppc_get_last_inst(vcpu)); kvmppc_book3s_queue_irqprio(vcpu, exit_nr); } diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 9982ff1..c147864 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++
Re: [PATCH v2 3/4] KVM: cleanup: remove kvm_get_dirty_log()
On Fri, 25 Jun 2010 21:25:57 +0200 Alexander Graf ag...@suse.de wrote:

> This patch plus 4/4 broke dirty bitmap updating on PPC. I didn't get
> around to track down why, but I figured you should know. Is there any
> way to get you a PPC development box? A simple G4 or G5 should be 200$
> on ebay by now :).

I'm sorry, I thought this change was just a trivial code transformation
and that testing on x86 would be enough: but not actually. The reason is
probably around the ordering of copy_to_user() and the newly introduced
clear_user() for the clean slot.

> Alex
Re: [PATCH v2 3/4] KVM: cleanup: remove kvm_get_dirty_log()
> This patch plus 4/4 broke dirty bitmap updating on PPC. I didn't get
> around to track down why, but I figured you should know. Is there any
> way to get you a PPC development box? A simple G4 or G5 should be 200$
> on ebay by now :).

A simple G4 or G5, thanks for the info, I'll buy one. I hope I can
contribute a bit from there to kvm-ppc :).

> Alex
Where is the entry point of hypercalls in kvm
Hello, I am trying to understand the virtio mechanism in Linux. I read that the kick function notifies the host side about newly published buffers. I am looking especially at virtio_net. Once a packet is ready for transmission, the kick function is called. Where does it go from here? Which code contains the backend driver of virtio? Where is the code in the hypervisor that this kick ends up in? Thank you. Thanks, Bala
Re: [PATCH] kvm/ppc: fix build warning
On 06/25/2010 12:42 AM, Alexander Graf wrote: On 24.06.2010, at 21:44, Denis Kirjanov wrote: Fix build warning: arch/powerpc/kvm/book3s_64_mmu.c: In function 'kvmppc_mmu_book3s_64_esid_to_vsid': arch/powerpc/kvm/book3s_64_mmu.c:446: warning: 'slb' may be used uninitialized in this function Signed-off-by: Denis Kirjanov dkirja...@kernel.org Are you sure this isn't a broken compiler? I don't see where it could be used uninitialized. I'm using gcc version 4.3.4 (Gentoo 4.3.4 p1.1, pie-10.1.5). The slb pointer is initialized inside a conditional branch and used later in the case MSR_DR|MSR_IR. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm/ppc: fix build warning
On 25.06.2010, at 11:02, Denis Kirjanov wrote: On 06/25/2010 12:42 AM, Alexander Graf wrote: On 24.06.2010, at 21:44, Denis Kirjanov wrote: Fix build warning: arch/powerpc/kvm/book3s_64_mmu.c: In function 'kvmppc_mmu_book3s_64_esid_to_vsid': arch/powerpc/kvm/book3s_64_mmu.c:446: warning: 'slb' may be used uninitialized in this function Signed-off-by: Denis Kirjanov dkirja...@kernel.org Are you sure this isn't a broken compiler? I don't see where it could be used uninitialized. I'm using gcc version 4.3.4 (Gentoo 4.3.4 p1.1, pie-10.1.5). The slb pointer is initialized inside a conditional branch and used later in the case MSR_DR|MSR_IR. Oh, I'm apparently looking at completely different code. The same function in git://git.kernel.org/pub/scm/virt/kvm/kvm.git is good. Which tree did you use? Alex
Re: [PATCH] kvm/ppc: fix build warning
On 06/25/2010 01:02 PM, Denis Kirjanov wrote: On 06/25/2010 12:42 AM, Alexander Graf wrote: On 24.06.2010, at 21:44, Denis Kirjanov wrote: Fix build warning: arch/powerpc/kvm/book3s_64_mmu.c: In function 'kvmppc_mmu_book3s_64_esid_to_vsid': arch/powerpc/kvm/book3s_64_mmu.c:446: warning: 'slb' may be used uninitialized in this function Signed-off-by: Denis Kirjanov dkirja...@kernel.org Are you sure this isn't a broken compiler? I don't see where it could be used uninitialized. I'm using gcc version 4.3.4 (Gentoo 4.3.4 p1.1, pie-10.1.5). The slb pointer is initialized inside a conditional branch and used later in the case MSR_DR|MSR_IR. This is based on the linux-next tree (-next-20100623).
Re: [PATCH v2 3/4] KVM: cleanup: remove kvm_get_dirty_log()
On 23.06.2010, at 08:01, Takuya Yoshikawa wrote: kvm_get_dirty_log() is a helper function for kvm_vm_ioctl_get_dirty_log() which is currently used by ia64 and ppc, and the following is what it does: - sanity checks - a bitmap scan to check if the slot is dirty - copy_to_user() Considering the fact that x86 is not using this anymore and sanity checks must be done before kvm_ia64_sync_dirty_log(), it is not effective for code sharing. So we just remove it. This patch plus 4/4 broke dirty bitmap updating on PPC. I didn't get around to tracking down why, but I figured you should know. Is there any way to get you a PPC development box? A simple G4 or G5 should be $200 on ebay by now :). Alex
[PATCH] KVM: PPC: Book3S_32 MMU debug compile fixes
Due to previous changes, the Book3S_32 guest MMU code didn't compile properly when enabling debugging. This patch repairs the broken code paths, making it possible to define DEBUG_MMU and friends again. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/book3s_32_mmu.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index 3292d76..079760b 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -104,7 +104,7 @@ static hva_t kvmppc_mmu_book3s_32_get_pteg(struct kvmppc_vcpu_book3s *vcpu_book3 pteg = (vcpu_book3s-sdr1 0x) | hash; dprintk(MMU: pc=0x%lx eaddr=0x%lx sdr1=0x%llx pteg=0x%x vsid=0x%x\n, - vcpu_book3s-vcpu.arch.pc, eaddr, vcpu_book3s-sdr1, pteg, + kvmppc_get_pc(vcpu_book3s-vcpu), eaddr, vcpu_book3s-sdr1, pteg, sre-vsid); r = gfn_to_hva(vcpu_book3s-vcpu.kvm, pteg PAGE_SHIFT); @@ -269,7 +269,7 @@ no_page_found: dprintk_pte(KVM MMU: No PTE found (sdr1=0x%llx ptegp=0x%lx)\n, to_book3s(vcpu)-sdr1, ptegp); for (i=0; i16; i+=2) { - dprintk_pte( %02d: 0x%x - 0x%x (0x%llx)\n, + dprintk_pte( %02d: 0x%x - 0x%x (0x%x)\n, i, pteg[i], pteg[i+1], ptem); } } -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: PPC: Make use of hash based Shadow MMU
We just introduced generic functions to handle shadow pages on PPC. This patch makes the respective backends make use of them, getting rid of a lot of duplicate code along the way. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_book3s.h |7 ++ arch/powerpc/include/asm/kvm_host.h | 18 +- arch/powerpc/kvm/Makefile |2 + arch/powerpc/kvm/book3s_32_mmu_host.c | 104 +++- arch/powerpc/kvm/book3s_64_mmu_host.c | 98 ++ 5 files changed, 41 insertions(+), 188 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 4e99559..a96e405 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -115,6 +115,13 @@ extern void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu); extern int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte); extern int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr); extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu); + +extern void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte); +extern struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu); +extern void kvmppc_mmu_hpte_destroy(struct kvm_vcpu *vcpu); +extern int kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu); +extern void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte); + extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data); extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data); extern void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec); diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 0c9ad86..895eb63 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -38,7 +38,13 @@ #define KVM_NR_PAGE_SIZES 1 #define KVM_PAGES_PER_HPAGE(x) (1UL31) -#define HPTEG_CACHE_NUM 1024 +#define HPTEG_CACHE_NUM(1 15) +#define 
HPTEG_HASH_BITS_PTE13 +#define HPTEG_HASH_BITS_VPTE 13 +#define HPTEG_HASH_BITS_VPTE_LONG 5 +#define HPTEG_HASH_NUM_PTE (1 HPTEG_HASH_BITS_PTE) +#define HPTEG_HASH_NUM_VPTE(1 HPTEG_HASH_BITS_VPTE) +#define HPTEG_HASH_NUM_VPTE_LONG (1 HPTEG_HASH_BITS_VPTE_LONG) struct kvm; struct kvm_run; @@ -151,6 +157,9 @@ struct kvmppc_mmu { }; struct hpte_cache { + struct list_head list_pte; + struct list_head list_vpte; + struct list_head list_vpte_long; u64 host_va; u64 pfn; ulong slot; @@ -282,8 +291,11 @@ struct kvm_vcpu_arch { unsigned long pending_exceptions; #ifdef CONFIG_PPC_BOOK3S - struct hpte_cache hpte_cache[HPTEG_CACHE_NUM]; - int hpte_cache_offset; + struct kmem_cache *hpte_cache; + struct list_head hpte_hash_pte[HPTEG_HASH_NUM_PTE]; + struct list_head hpte_hash_vpte[HPTEG_HASH_NUM_VPTE]; + struct list_head hpte_hash_vpte_long[HPTEG_HASH_NUM_VPTE_LONG]; + int hpte_cache_count; #endif }; diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile index ff43606..d45c818 100644 --- a/arch/powerpc/kvm/Makefile +++ b/arch/powerpc/kvm/Makefile @@ -45,6 +45,7 @@ kvm-book3s_64-objs := \ book3s.o \ book3s_emulate.o \ book3s_interrupts.o \ + book3s_mmu_hpte.o \ book3s_64_mmu_host.o \ book3s_64_mmu.o \ book3s_32_mmu.o @@ -57,6 +58,7 @@ kvm-book3s_32-objs := \ book3s.o \ book3s_emulate.o \ book3s_interrupts.o \ + book3s_mmu_hpte.o \ book3s_32_mmu_host.o \ book3s_32_mmu.o kvm-objs-$(CONFIG_KVM_BOOK3S_32) := $(kvm-book3s_32-objs) diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c index 904f5ac..0b51ef8 100644 --- a/arch/powerpc/kvm/book3s_32_mmu_host.c +++ b/arch/powerpc/kvm/book3s_32_mmu_host.c @@ -58,105 +58,19 @@ static ulong htab; static u32 htabmask; -static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte) +void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte) { volatile u32 *pteg; - dprintk_mmu(KVM: Flushing SPTE: 0x%llx (0x%llx) - 0x%llx\n, - pte-pte.eaddr, pte-pte.vpage, 
pte-host_va); - + /* Remove from host HTAB */ pteg = (u32*)pte-slot; - pteg[0] = 0; + + /* And make sure it's gone from the TLB too */ asm volatile (sync); asm volatile (tlbie %0 : : r (pte-pte.eaddr) : memory); asm volatile (sync); asm volatile (tlbsync); - - pte-host_va = 0; - - if (pte-pte.may_write) - kvm_release_pfn_dirty(pte-pfn); - else - kvm_release_pfn_clean(pte-pfn); -} - -void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong guest_ea, ulong
Re: [PATCH] KVM: PPC: Add generic hpte management functions
On 26.06.2010, at 01:16, Alexander Graf wrote: Currently the shadow paging code keeps an array of entries it knows about. Whenever the guest invalidates an entry, we loop through the whole array, trying to invalidate matching parts. While this is a really simple implementation, it is probably the most inefficient one possible. So instead, let's keep an array of lists around that are indexed by a hash. This way each PTE can be added by 4 list_add calls, removed by 4 list_del invocations, and the search only needs to loop through entries that share the same hash. This patch implements said lookup and exports generic functions that both the 32-bit and 64-bit backend can use. Yikes - I forgot -n. This is patch 1/2. Alex
[PATCH 10/26] KVM: PPC: Tell guest about pending interrupts
When the guest turns on interrupts again, it needs to know if we have an interrupt pending for it. Because if so, it should rather get out of guest context and get the interrupt. So we introduce a new field in the shared page that we use to tell the guest that there's a pending interrupt lying around. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c |7 +++ arch/powerpc/kvm/booke.c|7 +++ 3 files changed, 15 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index edf8f83..c7305d7 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -36,6 +36,7 @@ struct kvm_vcpu_arch_shared { __u64 dar; __u64 msr; __u32 dsisr; + __u32 int_pending; /* Tells the guest if we have an interrupt */ }; #define KVM_PVR_PARA 0x4b564d3f /* KVM? */ diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index f0e8047..e76c950 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -334,6 +334,7 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority) void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) { unsigned long *pending = vcpu-arch.pending_exceptions; + unsigned long old_pending = vcpu-arch.pending_exceptions; unsigned int priority; #ifdef EXIT_DEBUG @@ -353,6 +354,12 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) BITS_PER_BYTE * sizeof(*pending), priority + 1); } + + /* Tell the guest about our interrupt status */ + if (*pending) + vcpu-arch.shared-int_pending = 1; + else if (old_pending) + vcpu-arch.shared-int_pending = 0; } void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 485f8fa..2229df9 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -221,6 +221,7 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, void 
kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) { unsigned long *pending = vcpu-arch.pending_exceptions; + unsigned long old_pending = vcpu-arch.pending_exceptions; unsigned int priority; priority = __ffs(*pending); @@ -232,6 +233,12 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) BITS_PER_BYTE * sizeof(*pending), priority + 1); } + + /* Tell the guest about our interrupt status */ + if (*pending) + vcpu-arch.shared-int_pending = 1; + else if (old_pending) + vcpu-arch.shared-int_pending = 0; } /** -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 06/26] KVM: PPC: Convert SPRG[0-4] to shared page
When in kernel mode there are 4 additional registers available that are simple data storage. Instead of exiting to the hypervisor to read and write those, we can just share them with the guest using the page. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |4 arch/powerpc/include/asm/kvm_para.h |4 arch/powerpc/kvm/book3s.c | 16 arch/powerpc/kvm/booke.c| 16 arch/powerpc/kvm/emulate.c | 24 5 files changed, 36 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 6bcf62f..83c45ea 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -216,10 +216,6 @@ struct kvm_vcpu_arch { ulong guest_owned_ext; #endif u32 mmucr; - ulong sprg0; - ulong sprg1; - ulong sprg2; - ulong sprg3; ulong sprg4; ulong sprg5; ulong sprg6; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index d7fc6c2..e402999 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,10 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 sprg0; + __u64 sprg1; + __u64 sprg2; + __u64 sprg3; __u64 srr0; __u64 srr1; __u64 dar; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index b144697..5a6f055 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1062,10 +1062,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-srr0 = vcpu-arch.shared-srr0; regs-srr1 = vcpu-arch.shared-srr1; regs-pid = vcpu-arch.pid; - regs-sprg0 = vcpu-arch.sprg0; - regs-sprg1 = vcpu-arch.sprg1; - regs-sprg2 = vcpu-arch.sprg2; - regs-sprg3 = vcpu-arch.sprg3; + regs-sprg0 = vcpu-arch.shared-sprg0; + regs-sprg1 = vcpu-arch.shared-sprg1; + regs-sprg2 = vcpu-arch.shared-sprg2; + regs-sprg3 = vcpu-arch.shared-sprg3; regs-sprg5 = vcpu-arch.sprg4; regs-sprg6 = 
vcpu-arch.sprg5; regs-sprg7 = vcpu-arch.sprg6; @@ -1088,10 +1088,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_msr(vcpu, regs-msr); vcpu-arch.shared-srr0 = regs-srr0; vcpu-arch.shared-srr1 = regs-srr1; - vcpu-arch.sprg0 = regs-sprg0; - vcpu-arch.sprg1 = regs-sprg1; - vcpu-arch.sprg2 = regs-sprg2; - vcpu-arch.sprg3 = regs-sprg3; + vcpu-arch.shared-sprg0 = regs-sprg0; + vcpu-arch.shared-sprg1 = regs-sprg1; + vcpu-arch.shared-sprg2 = regs-sprg2; + vcpu-arch.shared-sprg3 = regs-sprg3; vcpu-arch.sprg5 = regs-sprg4; vcpu-arch.sprg6 = regs-sprg5; vcpu-arch.sprg7 = regs-sprg6; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 8b546fe..984c461 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -495,10 +495,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-srr0 = vcpu-arch.shared-srr0; regs-srr1 = vcpu-arch.shared-srr1; regs-pid = vcpu-arch.pid; - regs-sprg0 = vcpu-arch.sprg0; - regs-sprg1 = vcpu-arch.sprg1; - regs-sprg2 = vcpu-arch.sprg2; - regs-sprg3 = vcpu-arch.sprg3; + regs-sprg0 = vcpu-arch.shared-sprg0; + regs-sprg1 = vcpu-arch.shared-sprg1; + regs-sprg2 = vcpu-arch.shared-sprg2; + regs-sprg3 = vcpu-arch.shared-sprg3; regs-sprg5 = vcpu-arch.sprg4; regs-sprg6 = vcpu-arch.sprg5; regs-sprg7 = vcpu-arch.sprg6; @@ -521,10 +521,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_msr(vcpu, regs-msr); vcpu-arch.shared-srr0 = regs-srr0; vcpu-arch.shared-srr1 = regs-srr1; - vcpu-arch.sprg0 = regs-sprg0; - vcpu-arch.sprg1 = regs-sprg1; - vcpu-arch.sprg2 = regs-sprg2; - vcpu-arch.sprg3 = regs-sprg3; + vcpu-arch.shared-sprg0 = regs-sprg0; + vcpu-arch.shared-sprg1 = regs-sprg1; + vcpu-arch.shared-sprg2 = regs-sprg2; + vcpu-arch.shared-sprg3 = regs-sprg3; vcpu-arch.sprg5 = regs-sprg4; vcpu-arch.sprg6 = regs-sprg5; vcpu-arch.sprg7 = regs-sprg6; diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c 
index ad0fa4f..454869b 100644 --- a/arch/powerpc/kvm/emulate.c +++ b/arch/powerpc/kvm/emulate.c @@ -263,13 +263,17 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) kvmppc_set_gpr(vcpu, rt, get_tb()); break; case SPRN_SPRG0: - kvmppc_set_gpr(vcpu, rt,
[PATCH 01/26] KVM: PPC: Introduce shared page
For transparent variable sharing between the hypervisor and guest, I introduce a shared page. This shared page will contain all the registers the guest can read and write safely without exiting guest context. This patch only implements the stubs required for the basic structure of the shared page. The actual register moving follows. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |2 ++ arch/powerpc/include/asm/kvm_para.h |5 + arch/powerpc/kernel/asm-offsets.c |1 + arch/powerpc/kvm/44x.c |7 +++ arch/powerpc/kvm/book3s.c |7 +++ arch/powerpc/kvm/e500.c |7 +++ 6 files changed, 29 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 895eb63..bca9391 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -25,6 +25,7 @@ #include linux/interrupt.h #include linux/types.h #include linux/kvm_types.h +#include linux/kvm_para.h #include asm/kvm_asm.h #define KVM_MAX_VCPUS 1 @@ -289,6 +290,7 @@ struct kvm_vcpu_arch { struct tasklet_struct tasklet; u64 dec_jiffies; unsigned long pending_exceptions; + struct kvm_vcpu_arch_shared *shared; #ifdef CONFIG_PPC_BOOK3S struct kmem_cache *hpte_cache; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 2d48f6a..1485ba8 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -20,6 +20,11 @@ #ifndef __POWERPC_KVM_PARA_H__ #define __POWERPC_KVM_PARA_H__ +#include linux/types.h + +struct kvm_vcpu_arch_shared { +}; + #ifdef __KERNEL__ static inline int kvm_para_available(void) diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 496cc5b..944f593 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -400,6 +400,7 @@ int main(void) DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6)); DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7)); 
DEFINE(VCPU_SHADOW_PID, offsetof(struct kvm_vcpu, arch.shadow_pid)); + DEFINE(VCPU_SHARED, offsetof(struct kvm_vcpu, arch.shared)); /* book3s */ #ifdef CONFIG_PPC_BOOK3S diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c index 73c0a3f..e7b1f3f 100644 --- a/arch/powerpc/kvm/44x.c +++ b/arch/powerpc/kvm/44x.c @@ -123,8 +123,14 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto free_vcpu; + vcpu-arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO); + if (!vcpu-arch.shared) + goto uninit_vcpu; + return vcpu; +uninit_vcpu: + kvm_vcpu_uninit(vcpu); free_vcpu: kmem_cache_free(kvm_vcpu_cache, vcpu_44x); out: @@ -135,6 +141,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu); + free_page((unsigned long)vcpu-arch.shared); kvm_vcpu_uninit(vcpu); kmem_cache_free(kvm_vcpu_cache, vcpu_44x); } diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 884d4a5..ba79b35 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1247,6 +1247,10 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto free_shadow_vcpu; + vcpu-arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO); + if (!vcpu-arch.shared) + goto uninit_vcpu; + vcpu-arch.host_retip = kvm_return_point; vcpu-arch.host_msr = mfmsr(); #ifdef CONFIG_PPC_BOOK3S_64 @@ -1277,6 +1281,8 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) return vcpu; +uninit_vcpu: + kvm_vcpu_uninit(vcpu); free_shadow_vcpu: kfree(vcpu_book3s-shadow_vcpu); free_vcpu: @@ -1289,6 +1295,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu); + free_page((unsigned long)vcpu-arch.shared); kvm_vcpu_uninit(vcpu); kfree(vcpu_book3s-shadow_vcpu); vfree(vcpu_book3s); diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c index e8a00b0..71750f2 100644 --- a/arch/powerpc/kvm/e500.c +++ 
b/arch/powerpc/kvm/e500.c @@ -117,8 +117,14 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto uninit_vcpu; + vcpu-arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO); + if (!vcpu-arch.shared) + goto uninit_tlb; + return vcpu; +uninit_tlb: + kvmppc_e500_tlb_uninit(vcpu_e500); uninit_vcpu: kvm_vcpu_uninit(vcpu); free_vcpu: @@ -131,6 +137,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_e500 *vcpu_e500 =
[PATCH 09/26] KVM: PPC: Add PV guest scratch registers
While running in hooked code we need to stash register contents away, because we must not clobber any registers. So let's add some fields to the shared page that we can happily write to. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index d1fe9ae..edf8f83 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,9 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 scratch1; + __u64 scratch2; + __u64 scratch3; __u64 critical; /* Guest may not get interrupts if == r1 */ __u64 sprg0; __u64 sprg1; -- 1.6.0.2
[PATCH 26/26] KVM: PPC: Add Documentation about PV interface
We just introduced a new PV interface that screams for documentation. So here it is - a shiny new and awesome text file describing the internal works of the PPC KVM paravirtual interface. Signed-off-by: Alexander Graf ag...@suse.de --- Documentation/kvm/ppc-pv.txt | 164 ++ 1 files changed, 164 insertions(+), 0 deletions(-) create mode 100644 Documentation/kvm/ppc-pv.txt diff --git a/Documentation/kvm/ppc-pv.txt b/Documentation/kvm/ppc-pv.txt new file mode 100644 index 000..7cbcd51 --- /dev/null +++ b/Documentation/kvm/ppc-pv.txt @@ -0,0 +1,164 @@ +The PPC KVM paravirtual interface += + +The basic execution principle by which KVM on PowerPC works is to run all kernel +space code in PR=1 which is user space. This way we trap all privileged +instructions and can emulate them accordingly. + +Unfortunately that is also the downfall. There are quite some privileged +instructions that needlessly return us to the hypervisor even though they +could be handled differently. + +This is what the PPC PV interface helps with. It takes privileged instructions +and transforms them into unprivileged ones with some help from the hypervisor. +This cuts down virtualization costs by about 50% on some of my benchmarks. + +The code for that interface can be found in arch/powerpc/kernel/kvm* + +Querying for existence +== + +To find out if we're running on KVM or not, we overlay the PVR register. Usually +the PVR register contains an id that identifies your CPU type. If, however, you +pass KVM_PVR_PARA in the register that you want the PVR result in, the register +still contains KVM_PVR_PARA after the mfpvr call. + + LOAD_REG_IMM(r5, KVM_PVR_PARA) + mfpvr r5 + [r5 still contains KVM_PVR_PARA] + +Once determined to run under a PV capable KVM, you can now use hypercalls as +described below. 
+ +PPC hypercalls +== + +The only viable ways to reliably get from guest context to host context are: + + 1) Call an invalid instruction + 2) Call the sc instruction with a parameter to sc + 3) Call the sc instruction with parameters in GPRs + +Method 1 is always a bad idea. Invalid instructions can be replaced later on +by valid instructions, rendering the interface broken. + +Method 2 also has downfalls. If the parameter to sc is != 0 the spec is +rather unclear if the sc is targeted directly for the hypervisor or the +supervisor. It would also require that we read the syscall issuing instruction +every time a syscall is issued, slowing down guest syscalls. + +Method 3 is what KVM uses. We pass magic constants (KVM_SC_MAGIC_R3 and +KVM_SC_MAGIC_R4) in r3 and r4 respectively. If a syscall instruction with these +magic values arrives from the guest's kernel mode, we take the syscall as a +hypercall. + +The parameters are as follows: + + r3 KVM_SC_MAGIC_R3 + r4 KVM_SC_MAGIC_R4 + r5 Hypercall number + r6 First parameter + r7 Second parameter + r8 Third parameter + r9 Fourth parameter + +Hypercall definitions are shared in generic code, so the same hypercall numbers +apply for x86 and powerpc alike. + +The magic page +== + +To enable communication between the hypervisor and guest there is a new shared +page that contains parts of supervisor visible register state. The guest can +map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE. + +With this hypercall issued the guest always gets the magic page mapped at the +desired location in effective and physical address space. For now, we always +map the page to -4096. This way we can access it using absolute load and store +functions. The following instruction reads the first field of the magic page: + + ld rX, -4096(0) + +The interface is designed to be extensible should there be need later to add +additional registers to the magic page. 
If you add fields to the magic page, +also define a new hypercall feature to indicate that the host can give you more +registers. Only if the host supports the additional features, make use of them. + +The magic page has the following layout as described in +arch/powerpc/include/asm/kvm_para.h: + +struct kvm_vcpu_arch_shared { + __u64 scratch1; + __u64 scratch2; + __u64 scratch3; + __u64 critical; /* Guest may not get interrupts if == r1 */ + __u64 sprg0; + __u64 sprg1; + __u64 sprg2; + __u64 sprg3; + __u64 srr0; + __u64 srr1; + __u64 dar; + __u64 msr; + __u32 dsisr; + __u32 int_pending; /* Tells the guest if we have an interrupt */ +}; + +Additions to the page must only occur at the end. Struct fields are always 32 +bit aligned. + +Patched instructions + + +The ld and std instructions are transformed to lwz and stw instructions
[PATCH 14/26] KVM: PPC: Magic Page BookE support
As we now have Book3s support for the magic page, we also need BookE to join in on the party. This patch implements generic magic page logic for BookE and specific TLB logic for e500. I didn't have any 440 around, so I didn't dare to blindly try and write up broken code. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/booke.c| 29 + arch/powerpc/kvm/e500_tlb.c | 19 +-- 2 files changed, 46 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 2229df9..7957aa4 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -241,6 +241,31 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) vcpu-arch.shared-int_pending = 0; } +/* Check if a DTLB miss was on the magic page. Returns !0 if so. */ +int kvmppc_dtlb_magic_page(struct kvm_vcpu *vcpu, ulong eaddr) +{ + ulong mp_ea = vcpu-arch.magic_page_ea; + ulong gpaddr = vcpu-arch.magic_page_pa; + int gtlb_index = 11 | (1 16); /* Random number in TLB1 */ + + /* Check for existence of magic page */ + if(likely(!mp_ea)) + return 0; + + /* Check if we're on the magic page */ + if(likely((eaddr 12) != (mp_ea 12))) + return 0; + + /* Don't map in user mode */ + if(vcpu-arch.shared-msr MSR_PR) + return 0; + + kvmppc_mmu_map(vcpu, vcpu-arch.magic_page_ea, gpaddr, gtlb_index); + kvmppc_account_exit(vcpu, DTLB_VIRT_MISS_EXITS); + + return 1; +} + /** * kvmppc_handle_exit * @@ -308,6 +333,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, r = RESUME_HOST; break; case EMULATE_FAIL: + case EMULATE_DO_MMIO: /* XXX Deliver Program interrupt to guest. */ printk(KERN_CRIT %s: emulation at %lx failed (%08x)\n, __func__, vcpu-arch.pc, vcpu-arch.last_inst); @@ -377,6 +403,9 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, gpa_t gpaddr; gfn_t gfn; + if (kvmppc_dtlb_magic_page(vcpu, eaddr)) + break; + /* Check the guest TLB. 
*/ gtlb_index = kvmppc_mmu_dtlb_index(vcpu, eaddr); if (gtlb_index 0) { diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c index 66845a5..f5582ca 100644 --- a/arch/powerpc/kvm/e500_tlb.c +++ b/arch/powerpc/kvm/e500_tlb.c @@ -295,9 +295,22 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, struct page *new_page; struct tlbe *stlbe; hpa_t hpaddr; + u32 mas2 = gtlbe-mas2; + u32 mas3 = gtlbe-mas3; stlbe = vcpu_e500-shadow_tlb[tlbsel][esel]; + if ((vcpu_e500-vcpu.arch.magic_page_ea) + ((vcpu_e500-vcpu.arch.magic_page_pa PAGE_SHIFT) == gfn) + !(vcpu_e500-vcpu.arch.shared-msr MSR_PR)) { + mas2 = 0; + mas3 = E500_TLB_SUPER_PERM_MASK; + hpaddr = virt_to_phys(vcpu_e500-vcpu.arch.shared); + new_page = pfn_to_page(hpaddr PAGE_SHIFT); + get_page(new_page); + goto mapped; + } + /* Get reference to new page. */ new_page = gfn_to_page(vcpu_e500-vcpu.kvm, gfn); if (is_error_page(new_page)) { @@ -305,6 +318,8 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, kvm_release_page_clean(new_page); return; } + +mapped: hpaddr = page_to_phys(new_page); /* Drop reference to old page. */ @@ -316,10 +331,10 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, stlbe-mas1 = MAS1_TSIZE(BOOK3E_PAGESZ_4K) | MAS1_TID(get_tlb_tid(gtlbe)) | MAS1_TS | MAS1_VALID; stlbe-mas2 = (gvaddr MAS2_EPN) - | e500_shadow_mas2_attrib(gtlbe-mas2, + | e500_shadow_mas2_attrib(mas2, vcpu_e500-vcpu.arch.shared-msr MSR_PR); stlbe-mas3 = (hpaddr MAS3_RPN) - | e500_shadow_mas3_attrib(gtlbe-mas3, + | e500_shadow_mas3_attrib(mas3, vcpu_e500-vcpu.arch.shared-msr MSR_PR); stlbe-mas7 = (hpaddr 32) MAS7_RPN; -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 23/26] KVM: PPC: PV mtmsrd L=1
The PowerPC ISA has a special instruction for mtmsr that only changes the EE and RI bits, namely the L=1 form. Since that one is reasonably often occuring and simple to implement, let's go with this first. Writing EE=0 is always just a store. Doing EE=1 also requires us to check for pending interrupts and if necessary exit back to the hypervisor. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 45 arch/powerpc/kernel/kvm_emul.S | 56 2 files changed, 101 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 7e8fe6f..71153d0 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -62,6 +62,7 @@ #define KVM_INST_MTSPR_DSISR 0x7c1203a6 #define KVM_INST_TLBSYNC 0x7c00046c +#define KVM_INST_MTMSRD_L1 0x7c010164 static bool kvm_patching_worked = true; static char kvm_tmp[1024 * 1024]; @@ -117,6 +118,43 @@ static u32 *kvm_alloc(int len) return p; } +extern u32 kvm_emulate_mtmsrd_branch_offs; +extern u32 kvm_emulate_mtmsrd_reg_offs; +extern u32 kvm_emulate_mtmsrd_len; +extern u32 kvm_emulate_mtmsrd[]; + +static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt) +{ + u32 *p; + int distance_start; + int distance_end; + ulong next_inst; + + p = kvm_alloc(kvm_emulate_mtmsrd_len * 4); + if (!p) + return; + + /* Find out where we are and put everything there */ + distance_start = (ulong)p - (ulong)inst; + next_inst = ((ulong)inst + 4); + distance_end = next_inst - (ulong)p[kvm_emulate_mtmsrd_branch_offs]; + + /* Make sure we only write valid b instructions */ + if (distance_start KVM_INST_B_MAX) { + kvm_patching_worked = false; + return; + } + + /* Modify the chunk to fit the invocation */ + memcpy(p, kvm_emulate_mtmsrd, kvm_emulate_mtmsrd_len * 4); + p[kvm_emulate_mtmsrd_branch_offs] |= distance_end KVM_INST_B_MASK; + p[kvm_emulate_mtmsrd_reg_offs] |= rt; + flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsrd_len * 4); + + /* Patch the invocation */ + *inst = KVM_INST_B | 
(distance_start KVM_INST_B_MASK); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -190,6 +228,13 @@ static void kvm_check_ins(u32 *inst) case KVM_INST_TLBSYNC: kvm_patch_ins_nop(inst); break; + + /* Rewrites */ + case KVM_INST_MTMSRD_L1: + /* We use r30 and r31 during the hook */ + if (get_rt(inst_rt) 30) + kvm_patch_ins_mtmsrd(inst, inst_rt); + break; } switch (_inst) { diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index 7da835a..25e6683 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -54,3 +54,59 @@ /* Disable critical section. We are critical if \ shared-critical == r1 and r2 is always != r1 */ \ STL64(r2, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0); + +.global kvm_emulate_mtmsrd +kvm_emulate_mtmsrd: + + SCRATCH_SAVE + + /* Put MSR ~(MSR_EE|MSR_RI) in r31 */ + LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + lis r30, (~(MSR_EE | MSR_RI))@h + ori r30, r30, (~(MSR_EE | MSR_RI))@l + and r31, r31, r30 + + /* OR the register's (MSR_EE|MSR_RI) on MSR */ +kvm_emulate_mtmsrd_reg: + andi. r30, r0, (MSR_EE|MSR_RI) + or r31, r31, r30 + + /* Put MSR back into magic page */ + STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + /* Check if we have to fetch an interrupt */ + lwz r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0) + cmpwi r31, 0 + beq+no_check + + /* Check if we may trigger an interrupt */ + andi. r30, r30, MSR_EE + beq no_check + + SCRATCH_RESTORE + + /* Nag hypervisor */ + tlbsync + + b kvm_emulate_mtmsrd_branch + +no_check: + + SCRATCH_RESTORE + + /* Go back to caller */ +kvm_emulate_mtmsrd_branch: + b . 
+kvm_emulate_mtmsrd_end: + +.global kvm_emulate_mtmsrd_branch_offs +kvm_emulate_mtmsrd_branch_offs: + .long (kvm_emulate_mtmsrd_branch - kvm_emulate_mtmsrd) / 4 + +.global kvm_emulate_mtmsrd_reg_offs +kvm_emulate_mtmsrd_reg_offs: + .long (kvm_emulate_mtmsrd_reg - kvm_emulate_mtmsrd) / 4 + +.global kvm_emulate_mtmsrd_len +kvm_emulate_mtmsrd_len: + .long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4 -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 19/26] KVM: PPC: PV instructions to loads and stores
Some instructions can simply be replaced by load and store instructions to or from the magic page. This patch replaces often called instructions that fall into the above category. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 111 + 1 files changed, 111 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index d873bc6..b165b20 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -32,8 +32,65 @@ #define KVM_MAGIC_PAGE (-4096L) #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x) +#define KVM_INST_LWZ 0x8000 +#define KVM_INST_STW 0x9000 +#define KVM_INST_LD0xe800 +#define KVM_INST_STD 0xf800 +#define KVM_INST_NOP 0x6000 +#define KVM_INST_B 0x4800 +#define KVM_INST_B_MASK0x03ff +#define KVM_INST_B_MAX 0x01ff + +#define KVM_MASK_RT0x03e0 +#define KVM_INST_MFMSR 0x7ca6 +#define KVM_INST_MFSPR_SPRG0 0x7c1042a6 +#define KVM_INST_MFSPR_SPRG1 0x7c1142a6 +#define KVM_INST_MFSPR_SPRG2 0x7c1242a6 +#define KVM_INST_MFSPR_SPRG3 0x7c1342a6 +#define KVM_INST_MFSPR_SRR00x7c1a02a6 +#define KVM_INST_MFSPR_SRR10x7c1b02a6 +#define KVM_INST_MFSPR_DAR 0x7c1302a6 +#define KVM_INST_MFSPR_DSISR 0x7c1202a6 + +#define KVM_INST_MTSPR_SPRG0 0x7c1043a6 +#define KVM_INST_MTSPR_SPRG1 0x7c1143a6 +#define KVM_INST_MTSPR_SPRG2 0x7c1243a6 +#define KVM_INST_MTSPR_SPRG3 0x7c1343a6 +#define KVM_INST_MTSPR_SRR00x7c1a03a6 +#define KVM_INST_MTSPR_SRR10x7c1b03a6 +#define KVM_INST_MTSPR_DAR 0x7c1303a6 +#define KVM_INST_MTSPR_DSISR 0x7c1203a6 + static bool kvm_patching_worked = true; +static void kvm_patch_ins_ld(u32 *inst, long addr, u32 rt) +{ +#ifdef CONFIG_64BIT + *inst = KVM_INST_LD | rt | (addr 0xfffc); +#else + *inst = KVM_INST_LWZ | rt | ((addr + 4) 0xfffc); +#endif +} + +static void kvm_patch_ins_lwz(u32 *inst, long addr, u32 rt) +{ + *inst = KVM_INST_LWZ | rt | (addr 0x); +} + +static void kvm_patch_ins_std(u32 *inst, long addr, u32 rt) +{ +#ifdef CONFIG_64BIT + *inst = 
KVM_INST_STD | rt | (addr 0xfffc); +#else + *inst = KVM_INST_STW | rt | ((addr + 4) 0xfffc); +#endif +} + +static void kvm_patch_ins_stw(u32 *inst, long addr, u32 rt) +{ + *inst = KVM_INST_STW | rt | (addr 0xfffc); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -48,6 +105,60 @@ static void kvm_check_ins(u32 *inst) u32 inst_rt = _inst KVM_MASK_RT; switch (inst_no_rt) { + /* Loads */ + case KVM_INST_MFMSR: + kvm_patch_ins_ld(inst, magic_var(msr), inst_rt); + break; + case KVM_INST_MFSPR_SPRG0: + kvm_patch_ins_ld(inst, magic_var(sprg0), inst_rt); + break; + case KVM_INST_MFSPR_SPRG1: + kvm_patch_ins_ld(inst, magic_var(sprg1), inst_rt); + break; + case KVM_INST_MFSPR_SPRG2: + kvm_patch_ins_ld(inst, magic_var(sprg2), inst_rt); + break; + case KVM_INST_MFSPR_SPRG3: + kvm_patch_ins_ld(inst, magic_var(sprg3), inst_rt); + break; + case KVM_INST_MFSPR_SRR0: + kvm_patch_ins_ld(inst, magic_var(srr0), inst_rt); + break; + case KVM_INST_MFSPR_SRR1: + kvm_patch_ins_ld(inst, magic_var(srr1), inst_rt); + break; + case KVM_INST_MFSPR_DAR: + kvm_patch_ins_ld(inst, magic_var(dar), inst_rt); + break; + case KVM_INST_MFSPR_DSISR: + kvm_patch_ins_lwz(inst, magic_var(dsisr), inst_rt); + break; + + /* Stores */ + case KVM_INST_MTSPR_SPRG0: + kvm_patch_ins_std(inst, magic_var(sprg0), inst_rt); + break; + case KVM_INST_MTSPR_SPRG1: + kvm_patch_ins_std(inst, magic_var(sprg1), inst_rt); + break; + case KVM_INST_MTSPR_SPRG2: + kvm_patch_ins_std(inst, magic_var(sprg2), inst_rt); + break; + case KVM_INST_MTSPR_SPRG3: + kvm_patch_ins_std(inst, magic_var(sprg3), inst_rt); + break; + case KVM_INST_MTSPR_SRR0: + kvm_patch_ins_std(inst, magic_var(srr0), inst_rt); + break; + case KVM_INST_MTSPR_SRR1: + kvm_patch_ins_std(inst, magic_var(srr1), inst_rt); + break; + case KVM_INST_MTSPR_DAR: + kvm_patch_ins_std(inst, magic_var(dar), inst_rt); + break; + case KVM_INST_MTSPR_DSISR: + kvm_patch_ins_stw(inst, magic_var(dsisr), inst_rt); + break; } switch (_inst) 
{ -- 1.6.0.2
[PATCH 17/26] KVM: PPC: Generic KVM PV guest support
We have all the hypervisor pieces in place now, but the guest parts are still missing. This patch implements basic awareness of KVM when running Linux as guest. It doesn't do anything with it yet though. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/Makefile |2 ++ arch/powerpc/kernel/asm-offsets.c | 15 +++ arch/powerpc/kernel/kvm.c | 34 ++ arch/powerpc/kernel/kvm_emul.S| 27 +++ arch/powerpc/platforms/Kconfig| 10 ++ 5 files changed, 88 insertions(+), 0 deletions(-) create mode 100644 arch/powerpc/kernel/kvm.c create mode 100644 arch/powerpc/kernel/kvm_emul.S diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 58d0572..2d7eb9e 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -125,6 +125,8 @@ ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC),) obj-y += ppc_save_regs.o endif +obj-$(CONFIG_KVM_GUEST) += kvm.o kvm_emul.o + # Disable GCOV in odd or sensitive code GCOV_PROFILE_prom_init.o := n GCOV_PROFILE_ftrace.o := n diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index a55d47e..e3e740b 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -465,6 +465,21 @@ int main(void) DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr)); #endif /* CONFIG_PPC_BOOK3S */ #endif + +#ifdef CONFIG_KVM_GUEST + DEFINE(KVM_MAGIC_SCRATCH1, offsetof(struct kvm_vcpu_arch_shared, + scratch1)); + DEFINE(KVM_MAGIC_SCRATCH2, offsetof(struct kvm_vcpu_arch_shared, + scratch2)); + DEFINE(KVM_MAGIC_SCRATCH3, offsetof(struct kvm_vcpu_arch_shared, + scratch3)); + DEFINE(KVM_MAGIC_INT, offsetof(struct kvm_vcpu_arch_shared, + int_pending)); + DEFINE(KVM_MAGIC_MSR, offsetof(struct kvm_vcpu_arch_shared, msr)); + DEFINE(KVM_MAGIC_CRITICAL, offsetof(struct kvm_vcpu_arch_shared, + critical)); +#endif + #ifdef CONFIG_44x DEFINE(PGD_T_LOG2, PGD_T_LOG2); DEFINE(PTE_T_LOG2, PTE_T_LOG2); diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c new file mode 
100644 index 000..2d8dd73 --- /dev/null +++ b/arch/powerpc/kernel/kvm.c @@ -0,0 +1,34 @@ +/* + * Copyright (C) 2010 SUSE Linux Products GmbH. All rights reserved. + * + * Authors: + * Alexander Graf ag...@suse.de + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +#include linux/kvm_host.h +#include linux/init.h +#include linux/kvm_para.h +#include linux/slab.h + +#include asm/reg.h +#include asm/kvm_ppc.h +#include asm/sections.h +#include asm/cacheflush.h +#include asm/disassemble.h + +#define KVM_MAGIC_PAGE (-4096L) +#define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x) + diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S new file mode 100644 index 000..c7b9fc9 --- /dev/null +++ b/arch/powerpc/kernel/kvm_emul.S @@ -0,0 +1,27 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright SUSE Linux Products GmbH 2010 + * + * Authors: Alexander Graf ag...@suse.de + */ + +#include asm/ppc_asm.h +#include asm/kvm_asm.h +#include asm/reg.h +#include asm/page.h +#include asm/asm-offsets.h + +#define KVM_MAGIC_PAGE (-4096) + diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig index d1663db..1744349 100644 --- a/arch/powerpc/platforms/Kconfig +++ b/arch/powerpc/platforms/Kconfig @@ -21,6 +21,16 @@ source
[PATCH 13/26] KVM: PPC: Magic Page Book3s support
We need to override EA as well as PA lookups for the magic page. When the guest tells us to project it, the magic page overrides any guest mappings. In order to reflect that, we need to hook into all the MMU layers of KVM to force map the magic page if necessary. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/book3s.c |7 +++ arch/powerpc/kvm/book3s_32_mmu.c | 16 arch/powerpc/kvm/book3s_32_mmu_host.c | 12 arch/powerpc/kvm/book3s_64_mmu.c | 30 +- arch/powerpc/kvm/book3s_64_mmu_host.c | 12 5 files changed, 76 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 2f55aa5..6ce7fa1 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -551,6 +551,13 @@ mmio: static int kvmppc_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn) { + ulong mp_pa = vcpu-arch.magic_page_pa; + + if (unlikely(mp_pa) + unlikely((mp_pa KVM_RMO) PAGE_SHIFT == gfn)) { + return 1; + } + return kvm_is_visible_gfn(vcpu-kvm, gfn); } diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index 41130c8..d2bd1a6 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -281,8 +281,24 @@ static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data) { int r; + ulong mp_ea = vcpu-arch.magic_page_ea; pte-eaddr = eaddr; + + /* Magic page override */ + if (unlikely(mp_ea) + unlikely((eaddr ~0xfffULL) == (mp_ea ~0xfffULL)) + !(vcpu-arch.shared-msr MSR_PR)) { + pte-vpage = kvmppc_mmu_book3s_32_ea_to_vp(vcpu, eaddr, data); + pte-raddr = vcpu-arch.magic_page_pa | (pte-raddr 0xfff); + pte-raddr = KVM_RMO; + pte-may_execute = true; + pte-may_read = true; + pte-may_write = true; + + return 0; + } + r = kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, pte, data); if (r 0) r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte, data, true); diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c index 
67b8c38..658d3e0 100644 --- a/arch/powerpc/kvm/book3s_32_mmu_host.c +++ b/arch/powerpc/kvm/book3s_32_mmu_host.c @@ -145,6 +145,16 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte) bool primary = false; bool evict = false; struct hpte_cache *pte; + ulong mp_pa = vcpu-arch.magic_page_pa; + + /* Magic page override */ + if (unlikely(mp_pa) + unlikely((orig_pte-raddr ~0xfffUL KVM_RMO) == +(mp_pa ~0xfffUL KVM_RMO))) { + hpaddr = (pfn_t)virt_to_phys(vcpu-arch.shared); + get_page(pfn_to_page(hpaddr PAGE_SHIFT)); + goto mapped; + } /* Get host physical address for gpa */ hpaddr = gfn_to_pfn(vcpu-kvm, orig_pte-raddr PAGE_SHIFT); @@ -155,6 +165,8 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte) } hpaddr = PAGE_SHIFT; +mapped: + /* and write the mapping ea - hpa into the pt */ vcpu-arch.mmu.esid_to_vsid(vcpu, orig_pte-eaddr SID_SHIFT, vsid); map = find_sid_vsid(vcpu, vsid); diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c index 58aa840..4a2e5fc 100644 --- a/arch/powerpc/kvm/book3s_64_mmu.c +++ b/arch/powerpc/kvm/book3s_64_mmu.c @@ -163,6 +163,22 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, bool found = false; bool perm_err = false; int second = 0; + ulong mp_ea = vcpu-arch.magic_page_ea; + + /* Magic page override */ + if (unlikely(mp_ea) + unlikely((eaddr ~0xfffULL) == (mp_ea ~0xfffULL)) + !(vcpu-arch.shared-msr MSR_PR)) { + gpte-eaddr = eaddr; + gpte-vpage = kvmppc_mmu_book3s_64_ea_to_vp(vcpu, eaddr, data); + gpte-raddr = vcpu-arch.magic_page_pa | (gpte-raddr 0xfff); + gpte-raddr = KVM_RMO; + gpte-may_execute = true; + gpte-may_read = true; + gpte-may_write = true; + + return 0; + } slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, eaddr); if (!slbe) @@ -445,6 +461,7 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid, ulong ea = esid SID_SHIFT; struct kvmppc_slb *slb; u64 gvsid = esid; + ulong mp_ea = 
vcpu-arch.magic_page_ea; if (vcpu-arch.shared-msr (MSR_DR|MSR_IR)) { slb = kvmppc_mmu_book3s_64_find_slbe(to_book3s(vcpu), ea); @@ -464,7 +481,7 @@ static int
[PATCH 15/26] KVM: PPC: Expose magic page support to guest
Now that we have the shared page in place and the MMU code knows about the magic page, we can expose that capability to the guest! Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h |2 ++ arch/powerpc/kvm/powerpc.c | 11 +++ 2 files changed, 13 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index c7305d7..9f8efa4 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -43,6 +43,8 @@ struct kvm_vcpu_arch_shared { #define KVM_SC_MAGIC_R30x4b564d52 /* KVMR */ #define KVM_SC_MAGIC_R40x554c455a /* ULEZ */ +#define KVM_FEATURE_MAGIC_PAGE 1 + #ifdef __KERNEL__ static inline int kvm_para_available(void) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index fe7a1c8..1d28a81 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -60,8 +60,19 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu) } switch (nr) { + case KVM_HC_PPC_MAP_MAGIC_PAGE: + { + vcpu-arch.magic_page_pa = param1; + vcpu-arch.magic_page_ea = param2; + + r = 0; + break; + } case KVM_HC_FEATURES: r = 0; +#if !defined(CONFIG_KVM_440) /* XXX missing bits on 440 */ + r |= (1 KVM_FEATURE_MAGIC_PAGE); +#endif break; default: r = -KVM_ENOSYS; -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 07/26] KVM: PPC: Implement hypervisor interface
To communicate with KVM directly we need to plumb some sort of interface between the guest and KVM. Usually those interfaces use hypercalls. This hypercall implementation is described in the last patch of the series in a special documentation file. Please read that for further information. This patch implements stubs to handle KVM PPC hypercalls on the host and guest side alike. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h | 100 ++- arch/powerpc/include/asm/kvm_ppc.h |1 + arch/powerpc/kvm/book3s.c | 10 +++- arch/powerpc/kvm/booke.c| 11 - arch/powerpc/kvm/emulate.c | 11 - arch/powerpc/kvm/powerpc.c | 28 ++ include/linux/kvm_para.h|1 + 7 files changed, 156 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index e402999..eaab306 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -34,16 +34,112 @@ struct kvm_vcpu_arch_shared { __u32 dsisr; }; +#define KVM_PVR_PARA 0x4b564d3f /* KVM? 
*/ +#define KVM_SC_MAGIC_R30x4b564d52 /* KVMR */ +#define KVM_SC_MAGIC_R40x554c455a /* ULEZ */ + #ifdef __KERNEL__ static inline int kvm_para_available(void) { - return 0; + unsigned long pvr = KVM_PVR_PARA; + + asm volatile(mfpvr %0 : =r(pvr) : 0(pvr)); + return pvr == KVM_PVR_PARA; +} + +static inline long kvm_hypercall0(unsigned int nr) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr) +: memory); + + return r3; } +static inline long kvm_hypercall1(unsigned int nr, unsigned long p1) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + unsigned long register _p1 asm(r6) = p1; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr), r(_p1) +: memory); + + return r3; +} + +static inline long kvm_hypercall2(unsigned int nr, unsigned long p1, + unsigned long p2) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + unsigned long register _p1 asm(r6) = p1; + unsigned long register _p2 asm(r7) = p2; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr), r(_p1), r(_p2) +: memory); + + return r3; +} + +static inline long kvm_hypercall3(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + unsigned long register _p1 asm(r6) = p1; + unsigned long register _p2 asm(r7) = p2; + unsigned long register _p3 asm(r8) = p3; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr), r(_p1), r(_p2), r(_p3) +: memory); + + return r3; +} + +static inline long kvm_hypercall4(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3, + unsigned 
long p4) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + unsigned long register _p1 asm(r6) = p1; + unsigned long register _p2 asm(r7) = p2; + unsigned long register _p3 asm(r8) = p3; + unsigned long register _p4 asm(r9) = p4; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr), r(_p1), r(_p2), r(_p3), + r(_p4) +: memory); + + return r3; +} + + static inline unsigned int kvm_arch_para_features(void) { - return 0; + if (!kvm_para_available()) + return 0; + + return kvm_hypercall0(KVM_HC_FEATURES); } #endif /* __KERNEL__ */ diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 18d139e..ecb3bc7 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -107,6 +107,7 @@ extern int kvmppc_booke_init(void); extern void kvmppc_booke_exit(void); extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu); +extern int kvmppc_kvm_pv(struct kvm_vcpu *vcpu);
[PATCH 25/26] KVM: PPC: PV wrteei
On BookE the preferred way to write the EE bit is the wrteei instruction. It already encodes the EE bit in the instruction. So in order to get BookE some speedups as well, let's also PV'nize thati instruction. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 50 arch/powerpc/kernel/kvm_emul.S | 41 2 files changed, 91 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 3557bc8..85e2163 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -66,6 +66,9 @@ #define KVM_INST_MTMSRD_L1 0x7c010164 #define KVM_INST_MTMSR 0x7c000124 +#define KVM_INST_WRTEEI_0 0x7c000146 +#define KVM_INST_WRTEEI_1 0x7c008146 + static bool kvm_patching_worked = true; static char kvm_tmp[1024 * 1024]; static int kvm_tmp_index; @@ -200,6 +203,47 @@ static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt) *inst = KVM_INST_B | (distance_start KVM_INST_B_MASK); } +#ifdef CONFIG_BOOKE + +extern u32 kvm_emulate_wrteei_branch_offs; +extern u32 kvm_emulate_wrteei_ee_offs; +extern u32 kvm_emulate_wrteei_len; +extern u32 kvm_emulate_wrteei[]; + +static void kvm_patch_ins_wrteei(u32 *inst) +{ + u32 *p; + int distance_start; + int distance_end; + ulong next_inst; + + p = kvm_alloc(kvm_emulate_wrteei_len * 4); + if (!p) + return; + + /* Find out where we are and put everything there */ + distance_start = (ulong)p - (ulong)inst; + next_inst = ((ulong)inst + 4); + distance_end = next_inst - (ulong)p[kvm_emulate_wrteei_branch_offs]; + + /* Make sure we only write valid b instructions */ + if (distance_start KVM_INST_B_MAX) { + kvm_patching_worked = false; + return; + } + + /* Modify the chunk to fit the invocation */ + memcpy(p, kvm_emulate_wrteei, kvm_emulate_wrteei_len * 4); + p[kvm_emulate_wrteei_branch_offs] |= distance_end KVM_INST_B_MASK; + p[kvm_emulate_wrteei_ee_offs] |= (*inst MSR_EE); + flush_icache_range((ulong)p, (ulong)p + kvm_emulate_wrteei_len * 4); + + /* Patch the invocation */ + *inst = KVM_INST_B | 
(distance_start KVM_INST_B_MASK); +} + +#endif + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -289,6 +333,12 @@ static void kvm_check_ins(u32 *inst) } switch (_inst) { +#ifdef CONFIG_BOOKE + case KVM_INST_WRTEEI_0: + case KVM_INST_WRTEEI_1: + kvm_patch_ins_wrteei(inst); + break; +#endif } flush_icache_range((ulong)inst, (ulong)inst + 4); diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index ccf5a42..b79b9de 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -194,3 +194,44 @@ kvm_emulate_mtmsr_orig_ins_offs: .global kvm_emulate_mtmsr_len kvm_emulate_mtmsr_len: .long (kvm_emulate_mtmsr_end - kvm_emulate_mtmsr) / 4 + + + +.global kvm_emulate_wrteei +kvm_emulate_wrteei: + + SCRATCH_SAVE + + /* Fetch old MSR in r31 */ + LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + /* Remove MSR_EE from old MSR */ + li r30, 0 + ori r30, r30, MSR_EE + andcr31, r31, r30 + + /* OR new MSR_EE onto the old MSR */ +kvm_emulate_wrteei_ee: + ori r31, r31, 0 + + /* Write new MSR value back */ + STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + SCRATCH_RESTORE + + /* Go back to caller */ +kvm_emulate_wrteei_branch: + b . +kvm_emulate_wrteei_end: + +.global kvm_emulate_wrteei_branch_offs +kvm_emulate_wrteei_branch_offs: + .long (kvm_emulate_wrteei_branch - kvm_emulate_wrteei) / 4 + +.global kvm_emulate_wrteei_ee_offs +kvm_emulate_wrteei_ee_offs: + .long (kvm_emulate_wrteei_ee - kvm_emulate_wrteei) / 4 + +.global kvm_emulate_wrteei_len +kvm_emulate_wrteei_len: + .long (kvm_emulate_wrteei_end - kvm_emulate_wrteei) / 4 -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/26] KVM: PPC: Convert SRR0 and SRR1 to shared page
The SRR0 and SRR1 registers contain cached values of the PC and MSR respectively. They get written to by the hypervisor when an interrupt occurs or directly by the kernel. They are also used to tell the rfi(d) instruction where to jump to. Because it only gets touched on defined events that, it's very simple to share with the guest. Hypervisor and guest both have full r/w access. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |2 -- arch/powerpc/include/asm/kvm_para.h |2 ++ arch/powerpc/kvm/book3s.c | 12 ++-- arch/powerpc/kvm/book3s_emulate.c |4 ++-- arch/powerpc/kvm/booke.c| 15 --- arch/powerpc/kvm/booke_emulate.c|4 ++-- arch/powerpc/kvm/emulate.c | 12 7 files changed, 28 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 108dabc..6bcf62f 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -224,8 +224,6 @@ struct kvm_vcpu_arch { ulong sprg5; ulong sprg6; ulong sprg7; - ulong srr0; - ulong srr1; ulong csrr0; ulong csrr1; ulong dsrr0; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index ec72a1c..d7fc6c2 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,8 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 srr0; + __u64 srr1; __u64 dar; __u64 msr; __u32 dsisr; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 245bd2d..b144697 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -162,8 +162,8 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr) void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags) { - vcpu-arch.srr0 = kvmppc_get_pc(vcpu); - vcpu-arch.srr1 = vcpu-arch.shared-msr | flags; + vcpu-arch.shared-srr0 = kvmppc_get_pc(vcpu); + vcpu-arch.shared-srr1 = vcpu-arch.shared-msr | 
flags; kvmppc_set_pc(vcpu, to_book3s(vcpu)-hior + vec); vcpu-arch.mmu.reset_msr(vcpu); } @@ -1059,8 +1059,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-lr = kvmppc_get_lr(vcpu); regs-xer = kvmppc_get_xer(vcpu); regs-msr = vcpu-arch.shared-msr; - regs-srr0 = vcpu-arch.srr0; - regs-srr1 = vcpu-arch.srr1; + regs-srr0 = vcpu-arch.shared-srr0; + regs-srr1 = vcpu-arch.shared-srr1; regs-pid = vcpu-arch.pid; regs-sprg0 = vcpu-arch.sprg0; regs-sprg1 = vcpu-arch.sprg1; @@ -1086,8 +1086,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_lr(vcpu, regs-lr); kvmppc_set_xer(vcpu, regs-xer); kvmppc_set_msr(vcpu, regs-msr); - vcpu-arch.srr0 = regs-srr0; - vcpu-arch.srr1 = regs-srr1; + vcpu-arch.shared-srr0 = regs-srr0; + vcpu-arch.shared-srr1 = regs-srr1; vcpu-arch.sprg0 = regs-sprg0; vcpu-arch.sprg1 = regs-sprg1; vcpu-arch.sprg2 = regs-sprg2; diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index c147864..f333cb4 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -73,8 +73,8 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, switch (get_xop(inst)) { case OP_19_XOP_RFID: case OP_19_XOP_RFI: - kvmppc_set_pc(vcpu, vcpu-arch.srr0); - kvmppc_set_msr(vcpu, vcpu-arch.srr1); + kvmppc_set_pc(vcpu, vcpu-arch.shared-srr0); + kvmppc_set_msr(vcpu, vcpu-arch.shared-srr1); *advance = 0; break; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 5844bcf..8b546fe 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -64,7 +64,8 @@ void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu) printk(pc: %08lx msr: %08llx\n, vcpu-arch.pc, vcpu-arch.shared-msr); printk(lr: %08lx ctr: %08lx\n, vcpu-arch.lr, vcpu-arch.ctr); - printk(srr0: %08lx srr1: %08lx\n, vcpu-arch.srr0, vcpu-arch.srr1); + printk(srr0: %08llx srr1: %08llx\n, vcpu-arch.shared-srr0, + vcpu-arch.shared-srr1); printk(exceptions: 
%08lx\n, vcpu-arch.pending_exceptions); @@ -189,8 +190,8 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, } if (allowed) { - vcpu-arch.srr0 = vcpu-arch.pc; - vcpu-arch.srr1 = vcpu-arch.shared-msr; + vcpu-arch.shared-srr0 = vcpu-arch.pc; +
[PATCH 00/26] KVM PPC PV framework
On PPC we run PR=0 (kernel mode) code in PR=1 (user mode) and don't use the hypervisor extensions. While that is all great to show that virtualization is possible, there are quite some cases where the emulation overhead of privileged instructions is killing performance. This patchset tackles exactly that issue. It introduces a paravirtual framework using which KVM and Linux share a page to exchange register state with. That way we don't have to switch to the hypervisor just to change a value of a privileged register. To prove my point, I ran the same test I did for the MMU optimizations against the PV framework. Here are the results: [without] debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello /dev/null; done real0m14.659s user0m8.967s sys 0m5.688s [with] debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello /dev/null; done real0m7.557s user0m4.121s sys 0m3.426s So this is a significant performance improvement! I'm quite happy how fast this whole thing becomes :) I tried to take all comments I've heard from people so far about such a PV framework into account. In case you told me something before that is a no-go and I still did it, please just tell me again. Now go and have fun with fast VMs on PPC! Get yourself a G5 on ebay and start experiencing the power yourself. - heh Alexander Graf (26): KVM: PPC: Introduce shared page KVM: PPC: Convert MSR to shared page KVM: PPC: Convert DSISR to shared page KVM: PPC: Convert DAR to shared page. 
  KVM: PPC: Convert SRR0 and SRR1 to shared page
  KVM: PPC: Convert SPRG[0-4] to shared page
  KVM: PPC: Implement hypervisor interface
  KVM: PPC: Add PV guest critical sections
  KVM: PPC: Add PV guest scratch registers
  KVM: PPC: Tell guest about pending interrupts
  KVM: PPC: Make RMO a define
  KVM: PPC: First magic page steps
  KVM: PPC: Magic Page Book3s support
  KVM: PPC: Magic Page BookE support
  KVM: PPC: Expose magic page support to guest
  KVM: Move kvm_guest_init out of generic code
  KVM: PPC: Generic KVM PV guest support
  KVM: PPC: KVM PV guest stubs
  KVM: PPC: PV instructions to loads and stores
  KVM: PPC: PV tlbsync to nop
  KVM: PPC: Introduce kvm_tmp framework
  KVM: PPC: PV assembler helpers
  KVM: PPC: PV mtmsrd L=1
  KVM: PPC: PV mtmsrd L=0 and mtmsr
  KVM: PPC: PV wrteei
  KVM: PPC: Add Documentation about PV interface

 Documentation/kvm/ppc-pv.txt             |  164
 arch/powerpc/include/asm/kvm_book3s.h    |    1 -
 arch/powerpc/include/asm/kvm_host.h      |   14 +-
 arch/powerpc/include/asm/kvm_para.h      |  121 +-
 arch/powerpc/include/asm/kvm_ppc.h       |    1 +
 arch/powerpc/kernel/Makefile             |    2 +
 arch/powerpc/kernel/asm-offsets.c        |   18 ++-
 arch/powerpc/kernel/kvm.c                |  399 ++
 arch/powerpc/kernel/kvm_emul.S           |  237 ++
 arch/powerpc/kvm/44x.c                   |    7 +
 arch/powerpc/kvm/44x_tlb.c               |    8 +-
 arch/powerpc/kvm/book3s.c                |  162 -
 arch/powerpc/kvm/book3s_32_mmu.c         |   28 ++-
 arch/powerpc/kvm/book3s_32_mmu_host.c    |   16 +-
 arch/powerpc/kvm/book3s_64_mmu.c         |   42 +++-
 arch/powerpc/kvm/book3s_64_mmu_host.c    |   16 +-
 arch/powerpc/kvm/book3s_emulate.c        |   25 +-
 arch/powerpc/kvm/book3s_paired_singles.c |   11 +-
 arch/powerpc/kvm/booke.c                 |  110 +++--
 arch/powerpc/kvm/booke.h                 |    6 +-
 arch/powerpc/kvm/booke_emulate.c         |   14 +-
 arch/powerpc/kvm/booke_interrupts.S      |    3 +-
 arch/powerpc/kvm/e500.c                  |    7 +
 arch/powerpc/kvm/e500_tlb.c              |   31 ++-
 arch/powerpc/kvm/e500_tlb.h              |    2 +-
 arch/powerpc/kvm/emulate.c               |   47 +++-
 arch/powerpc/kvm/powerpc.c               |   42 +++-
 arch/powerpc/platforms/Kconfig           |   10 +
 arch/x86/include/asm/kvm_para.h          |    6 +
 include/linux/kvm_para.h                 |    7 +-
 30 files changed, 1383 insertions(+), 174 deletions(-)
 create mode 100644 Documentation/kvm/ppc-pv.txt
 create mode 100644 arch/powerpc/kernel/kvm.c
 create mode 100644 arch/powerpc/kernel/kvm_emul.S
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 12/26] KVM: PPC: First magic page steps
We will be introducing a method to project the shared page into guest context. As soon as we're talking about this coupling, the shared page is called the magic page.

This patch introduces simple defines, so the follow-up patches are easier to read.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_host.h |    2 ++
 include/linux/kvm_para.h            |    1 +
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index e35c1ac..5f8c214 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -285,6 +285,8 @@ struct kvm_vcpu_arch {
 	u64 dec_jiffies;
 	unsigned long pending_exceptions;
 	struct kvm_vcpu_arch_shared *shared;
+	unsigned long magic_page_pa; /* phys addr to map the magic page to */
+	unsigned long magic_page_ea; /* effect. addr to map the magic page to */
 
 #ifdef CONFIG_PPC_BOOK3S
 	struct kmem_cache *hpte_cache;
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 3b8080e..ac2015a 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -18,6 +18,7 @@
 #define KVM_HC_VAPIC_POLL_IRQ		1
 #define KVM_HC_MMU_OP			2
 #define KVM_HC_FEATURES		3
+#define KVM_HC_PPC_MAP_MAGIC_PAGE	4
 
 /*
  * hypercalls use architecture specific
-- 
1.6.0.2