Re: [RFC] virtio: Support releasing lock during kick
On Fri, Jun 25, 2010 at 4:09 AM, Rusty Russell <ru...@rustcorp.com.au> wrote:
> On Thu, 24 Jun 2010 03:00:30 pm Stefan Hajnoczi wrote:
>> On Wed, Jun 23, 2010 at 11:12 PM, Anthony Liguori <anth...@codemonkey.ws> wrote:
>>> Shouldn't it be possible to just drop the lock before invoking
>>> virtqueue_kick() and reacquire it afterwards?  There's nothing in that
>>> virtqueue_kick() path that the lock is protecting AFAICT.
>>
>> No, that would lead to a race condition because vq->num_added is
>> modified by both virtqueue_add_buf_gfp() and virtqueue_kick().  Without
>> a lock held during virtqueue_kick() another vcpu could add bufs while
>> vq->num_added is used and cleared by virtqueue_kick():
>
> Right, this dovetails with another proposed change (was it Michael?)
> where we would update the avail idx inside add_buf, rather than waiting
> until kick.  This means a barrier inside add_buf, but that's probably
> fine.
>
> If we do that, then we don't need a lock on virtqueue_kick.

That would be nice; we could push the change up into just virtio-blk.

I did wonder if virtio-net can take advantage of unlocked kick, too, but
haven't investigated yet.  The virtio-net kick in start_xmit() happens
with the netdev _xmit_lock held.  Any ideas?

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
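Stefan's race can be made concrete with a small user-space model (struct and function names here are illustrative, not the real drivers/virtio code). If virtqueue_kick() runs without the lock, an add_buf() that slips in between kick's read and clear of num_added is silently discarded:

```c
/* Toy model of the virtqueue bookkeeping under discussion.
 * num_added counts buffers queued since the last kick; kick() reads it
 * to advance the published avail index, then clears it. */
struct vq_model {
    int avail_idx;   /* index published to the host */
    int num_added;   /* buffers added since last kick */
};

static void add_buf(struct vq_model *vq)
{
    vq->num_added++;
}

/* The interleaving Stefan describes, once the lock is dropped around
 * virtqueue_kick(): vcpu0's kick reads num_added, vcpu1 adds a buffer,
 * vcpu0 clears num_added -- vcpu1's buffer is never published. */
static int buffers_lost_by_unlocked_kick(void)
{
    struct vq_model vq = {0, 0};
    int added = 0;

    add_buf(&vq); added++;           /* vcpu0, under lock */
    int snapshot = vq.num_added;     /* vcpu0: kick() reads num_added == 1 */
    add_buf(&vq); added++;           /* vcpu1: concurrent, lock not held */
    vq.avail_idx += snapshot;        /* vcpu0: publishes only one buffer */
    vq.num_added = 0;                /* vcpu0: clears, discarding vcpu1's */

    return added - vq.avail_idx;     /* buffers added but never published */
}
```

The model plays the interleaving out sequentially; on real SMP hardware the same window exists whenever kick's read-modify-clear of num_added is not serialized against add_buf.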
RE: qemu fail to parse command line with -pcidevice 00:19.0
Thanks, Mark.

-----Original Message-----
From: Markus Armbruster [mailto:arm...@redhat.com]
Sent: June 25, 2010 12:58
To: Hao, Xudong
Cc: qemu-de...@nongnu.org; aligu...@us.ibm.com; kvm@vger.kernel.org
Subject: Re: qemu fail to parse command line with -pcidevice 00:19.0

Hao, Xudong <xudong@intel.com> writes:

>> Work-around: -device pci-assign,host=00:19.1
>
> OK, this new way works when creating a guest with static assignment.
> But how to hot add a PCI device to the guest?  The old hot add command
> "pci_add pci_addr=auto host host=00:19.0" has the same parse error.

Command line's -device becomes monitor's device_add:

    device_add pci-assign,host=00:19.1

> BTW: if we use "-net none" in the qemu command, the guest can not be
> created and no error is printed.  Do you have a plan to fix this parse
> issue?

Separate issue.  Fix posted:

    Subject: [Qemu-devel] [PATCH] net: Fix VM start with '-net none'
    Date: Tue, 15 Jun 2010 13:30:39 +0530
    Message-Id: 22a96312232a0458fc04268b79d17828c824df42.1276588830.git.amit.s...@redhat.com

You could have found this yourself :)
Re: qemu fail to parse command line with -pcidevice 00:19.0
(2010/06/24 15:08), Markus Armbruster wrote:
> Note to qemu-devel: this issue is qemu-kvm only.
>
> Hao, Xudong <xudong@intel.com> writes:
>
>> When assigning one PCI device, qemu fails to parse the command line:
>>
>>     qemu-system_x86 -smp 2 -m 1024 -hda /path/to/img -pcidevice host=00:19.0
>>
>> Error:
>>
>>     qemu-system-x86_64: Parameter 'id' expects an identifier
>>     Identifiers consist of letters, digits, '-', '.', '_', starting with a letter.
>>     pcidevice argument parse error; please check the help text for usage
>>     Could not add assigned device host=00:19.0
>>
>> https://bugs.launchpad.net/qemu/+bug/597932
>>
>> This issue is caused by qemu-kvm commit b560a9ab9be06afcbb78b3791ab836dad208a239.
>
> The bug is in add_assigned_device():
>
>     r = get_param_value(id, sizeof(id), "id", arg);
>     if (!r)
>         r = get_param_value(id, sizeof(id), "name", arg);
>     if (!r)
>         r = get_param_value(id, sizeof(id), "host", arg);
>
> We end up with invalid ID 00:19.0.
...

Are there any strong reasons why we cannot use ':' in the identifier?

Thanks,
H.Seto
Re: Graphical virtualisation management system
On Thu, Jun 24, 2010 at 02:01:52PM -0500, Javier Guerra Giraldez wrote:
> On Thu, Jun 24, 2010 at 1:32 PM, Freddie Cash <fjwc...@gmail.com> wrote:
>> * virt-manager which requires X and seems to be more desktop-oriented;
>
> don't know about the others, but virt-manager runs only on the admin
> station.  on the VM hosts you run only libvirtd, which doesn't need X

While it can connect to remote systems, it seems totally unusable for that
to me.  For one thing, working over higher-latency links like DSL or even
transatlantic links seems to be almost impossible.

Second, I still haven't figured out how to install and manage a system
using the serial console with KVM, which certainly contributes to the
complete lack of usability above.
[PATCH] KVM: VMX: Execute WBINVD to keep data consistency with assigned devices
Some guest device drivers may leverage Non-Snoop I/O, and explicitly
WBINVD or CLFLUSH a RAM region.  Since migration may occur before the
WBINVD or CLFLUSH, we need to maintain data consistency either by:
1: flushing the cache (wbinvd) when the guest is scheduled out, if there
   is no wbinvd exit, or
2: executing wbinvd on all dirty physical CPUs when the guest's wbinvd
   exits.

For wbinvd-VMExit-capable processors, we issue IPIs to all physical CPUs
to do wbinvd, for we can't easily tell which physical CPUs are dirty.

Signed-off-by: Yaozu (Eddie) Dong <eddie.d...@intel.com>
Signed-off-by: Sheng Yang <sh...@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |    3 +++
 arch/x86/kvm/emulate.c          |    5 -
 arch/x86/kvm/svm.c              |    6 ++
 arch/x86/kvm/vmx.c              |   27 ++-
 arch/x86/kvm/x86.c              |    6 ++
 5 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a57cdea..1c392c9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -514,6 +514,8 @@ struct kvm_x86_ops {
 	void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry);
 
+	void (*execute_wbinvd)(struct kvm_vcpu *vcpu);
+
 	const struct trace_print_flags *exit_reasons_str;
 };
@@ -571,6 +573,7 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
 int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address);
 int emulate_clts(struct kvm_vcpu *vcpu);
+int emulate_wbinvd(struct kvm_vcpu *vcpu);
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
 int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index abb8cec..085dcb7 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3138,8 +3138,11 @@ twobyte_insn:
 		emulate_clts(ctxt->vcpu);
 		c->dst.type = OP_NONE;
 		break;
-	case 0x08: /* invd */
 	case 0x09: /* wbinvd */
+		emulate_wbinvd(ctxt->vcpu);
+		c->dst.type = OP_NONE;
+		break;
+	case 0x08: /* invd */
 	case 0x0d: /* GrpP (prefetch) */
 	case 0x18: /* Grp16 (prefetch/nop) */
 		c->dst.type = OP_NONE;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 587b99d..6929da1 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3424,6 +3424,10 @@ static bool svm_rdtscp_supported(void)
 	return false;
 }
 
+static void svm_execute_wbinvd(struct kvm_vcpu *vcpu)
+{
+}
+
 static void svm_fpu_deactivate(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -3508,6 +3512,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 	.rdtscp_supported = svm_rdtscp_supported,
 
 	.set_supported_cpuid = svm_set_supported_cpuid,
+
+	.execute_wbinvd = svm_execute_wbinvd,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e565689..063002c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -412,6 +412,12 @@ static inline bool cpu_has_virtual_nmis(void)
 	return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline bool cpu_has_wbinvd_exit(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_WBINVD_EXITING;
+}
+
 static inline bool report_flexpriority(void)
 {
 	return flexpriority_enabled;
@@ -874,6 +880,11 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx)
 	preempt_enable();
 }
 
+static void wbinvd_ipi(void *opaque)
+{
+	wbinvd();
+}
+
 /*
  * Switches to specified vcpu, until a matching vcpu_put(), but assumes
  * vcpu mutex is already taken.
@@ -905,6 +916,12 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 			       &per_cpu(vcpus_on_cpu, cpu));
 		local_irq_enable();
 
+	/* Issue WBINVD in case guest has executed it */
+	if (!cpu_has_wbinvd_exit() && vcpu->kvm->arch.iommu_domain &&
+	    vcpu->cpu != -1)
+		smp_call_function_single(vcpu->cpu,
+					 wbinvd_ipi, NULL, 1);
+
 	vcpu->cpu = cpu;
 	/*
 	 * Linux uses per-cpu TSS and GDT, so set these when switching
@@ -3397,10 +3414,16 @@ static int handle_invlpg(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static void vmx_execute_wbinvd(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->kvm->arch.iommu_domain)
+		smp_call_function(wbinvd_ipi, NULL, 1);
+}
+
 static int handle_wbinvd(struct kvm_vcpu *vcpu)
 {
 	skip_emulated_instruction(vcpu);
-	/* TODO: Add support for VT-d/pass-through device */
+	vmx_execute_wbinvd(vcpu);
 	return 1;
 }
@@ -4350,6 +4373,8 @@ static struct
Re: qemu fail to parse command line with -pcidevice 00:19.0
Hidetoshi Seto <seto.hideto...@jp.fujitsu.com> writes:

> (2010/06/24 15:08), Markus Armbruster wrote:
>> Note to qemu-devel: this issue is qemu-kvm only.
>>
>> Hao, Xudong <xudong@intel.com> writes:
>>
>>> When assigning one PCI device, qemu fails to parse the command line:
>>>
>>>     qemu-system_x86 -smp 2 -m 1024 -hda /path/to/img -pcidevice host=00:19.0
>>>
>>> Error:
>>>
>>>     qemu-system-x86_64: Parameter 'id' expects an identifier
>>>     Identifiers consist of letters, digits, '-', '.', '_', starting with a letter.
>>>     pcidevice argument parse error; please check the help text for usage
>>>     Could not add assigned device host=00:19.0
>>>
>>> https://bugs.launchpad.net/qemu/+bug/597932
>>>
>>> This issue is caused by qemu-kvm commit b560a9ab9be06afcbb78b3791ab836dad208a239.
>>
>> The bug is in add_assigned_device():
>>
>>     r = get_param_value(id, sizeof(id), "id", arg);
>>     if (!r)
>>         r = get_param_value(id, sizeof(id), "name", arg);
>>     if (!r)
>>         r = get_param_value(id, sizeof(id), "host", arg);
>>
>> We end up with invalid ID 00:19.0.
...
> Are there any strong reasons why we cannot use ':' in the identifier?

Paul Brook (cc'ed) objected.
Re: [PATCH] KVM: VMX: Execute WBINVD to keep data consistency with assigned devices
Sheng Yang wrote:
> Some guest device drivers may leverage Non-Snoop I/O, and explicitly
> WBINVD or CLFLUSH a RAM region.  Since migration may occur before the
> WBINVD or CLFLUSH, we need to maintain data consistency either by:
> 1: flushing the cache (wbinvd) when the guest is scheduled out, if there
>    is no wbinvd exit, or
> 2: executing wbinvd on all dirty physical CPUs when the guest's wbinvd
>    exits.
>
> For wbinvd-VMExit-capable processors, we issue IPIs to all physical CPUs
> to do wbinvd, for we can't easily tell which physical CPUs are dirty.

wbinvd is a heavy weapon in the hands of a guest.  Even if it is limited
to pass-through scenarios, do we really need to bother all physical host
CPUs with potential multi-millisecond stalls?  Think of VMs only running
on a subset of CPUs (e.g. to isolate latency sources).

I would suggest to track the physical CPU usage of VCPUs between two
wbinvd requests and only send the wbinvd IPI to that set.

Also, I think the code is still too much vmx-focused.  Only the trapping
should be vendor-specific, the rest generic.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
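Jan's suggestion can be sketched in user space with a plain bitmask standing in for a kernel cpumask (all names here are hypothetical, not the eventual KVM implementation): remember every physical CPU the VCPU has run on since the last guest WBINVD, and on the next WBINVD flush only that set instead of broadcasting:

```c
#include <stdint.h>

/* Model of tracking which physical CPUs a VCPU has touched between two
 * guest WBINVD requests; bit n set means CPU n may hold dirty cache
 * lines for this guest.  A real version would use cpumask_t. */
struct vcpu_model {
    uint64_t dirty_cpus;
};

/* Called whenever the VCPU is loaded onto a physical CPU. */
static void vcpu_load(struct vcpu_model *v, int cpu)
{
    v->dirty_cpus |= 1ULL << cpu;    /* remember where we ran */
}

/* On a guest WBINVD exit: return the mask of CPUs that would receive
 * the wbinvd IPI, then reset the tracking -- only these CPUs can have
 * dirty lines, so the rest of the machine is left alone. */
static uint64_t guest_wbinvd(struct vcpu_model *v)
{
    uint64_t targets = v->dirty_cpus;
    v->dirty_cpus = 0;               /* caches flushed, start over */
    return targets;
}
```

The win is that a VM pinned to, say, two CPUs stalls only those two, not the latency-isolated cores Jan mentions.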
Re: Graphical virtualisation management system
On 06/25/10 09:05, Christoph Hellwig wrote:
> While it can connect to remote systems it seems totally unusable for that
> to me.  For one thing working over higher-latency links like DSL or even
> transatlantic links seems to be almost impossible.

Works, but is quite slow indeed.  Also virt-manager's remote host support
works OK for a small number of hosts, but if you want to manage dozens of
them it becomes unusable.

> Second I still haven't figured out how to install and manage a system
> using the serial console with KVM, which certainly contributes to the
> complete lack of usability above.

Serial console support doesn't work for remote connections.  Dunno whether
that is a restriction of virt-manager or the underlying libvirt.

cheers,
Gerd
Re: Graphical virtualisation management system
On Fri, Jun 25, 2010 at 11:07:26AM +0200, Gerd Hoffmann wrote:
> On 06/25/10 09:05, Christoph Hellwig wrote:
>> Second I still haven't figured out how to install and manage a system
>> using the serial console with KVM, which certainly contributes to the
>> complete lack of usability above.
>
> Serial console support doesn't work for remote connections.  Dunno
> whether that is a restriction of virt-manager or the underlying libvirt.

libvirt, kvm, virt-manager -- arguably all of them :-)

We really need to either tunnel the character device backend streams over
VNC, or add a remote streams access API to libvirt, or virt-manager could
do an ssh tunnel.  VNC tunnelling is what I'd really like to do because
that gives a solution that can work with even normal VNC clients like
Vinagre.

Daniel
--
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org   -o-   http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: Graphical virtualisation management system
Hi,

> We want to move to a multi-tiered, SAN-based virtualisation setup, but
> can't find a VM management tool that handles both KVM and Xen (we have
> some old Opteron hardware that doesn't support SVM), and does not
> require Linux from end-to-end.
>
> For example, we want to run FreeBSD + ZFS on our storage servers,
> exporting storage via iSCSI (or NFS).  We want to run a minimal
> Debian/Ubuntu install on the VM hosts (just to boot and run the
> management agents), with all of the VMs getting their storage via
> iSCSI.  With a separate box acting as the management system.
> Preferably with a web-based management GUI, but that's more of a
> nice-to-have than a hard requirement.
>
> So far, I've looked at:
> * oVirt which requires Fedora/CentOS/RedHat on everything;

NFS/iSCSI being hosted on non-Linux shouldn't be a problem I think; at
least the underlying libvirt handles this just fine and I can't see a
reason why oVirt shouldn't (don't know oVirt in detail, although I've
played with it a bit a while ago).

To manage the hosts oVirt wants to have some oVirt bits running on them.
Porting them to Debian should be possible.  But as the stuff interacts
with the distro bootup scripts it is most likely noticeably more work
than just compile+install.

cheers,
Gerd
Re: Graphical virtualisation management system
On Fri, Jun 25, 2010 at 03:05:42AM -0400, Christoph Hellwig wrote:
> While it can connect to remote systems it seems totally unusable for that
> to me.  For one thing working over higher-latency links like DSL or even
> transatlantic links seems to be almost impossible.

It is fair to say that virt-manager is not really targeted at high-latency
WAN scenarios.  It is really aimed at small-scale local LAN deployments
with 5-20 hosts maximum.  For a serious WAN deployment you can't use the
hub-and-spoke synchronous RPC architecture; instead you need an
asynchronous message bus, which is where something like oVirt or RHEV is
best.  So I'd agree that you shouldn't use virt-manager across high-latency
DSL or transatlantic links; just use it in your local home or office LAN.

Regards,
Daniel
--
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org   -o-   http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: [PATCH 5/5] KVM test: Make it possible to run VMs without NICs
On 06/25/2010 02:33 AM, Lucas Meneghel Rodrigues wrote:
> For unittesting, for example, it is desirable that we run the VM with
> the bare minimum number of parameters.  This fix allows that.
>
> Signed-off-by: Lucas Meneghel Rodrigues <l...@redhat.com>
> ---
>  client/tests/kvm/kvm_vm.py |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
> index 7b1fc05..3c01fa0 100755
> --- a/client/tests/kvm/kvm_vm.py
> +++ b/client/tests/kvm/kvm_vm.py
> @@ -118,8 +118,9 @@ class VM:
>          self.root_dir = root_dir
>          self.address_cache = address_cache
>          self.netdev_id = []
> -        for nic in params.get("nics").split():
> -            self.netdev_id.append(kvm_utils.generate_random_id())
> +        if params.get("nics"):
> +            for nic in params.get("nics").split():

That's exactly what kvm_utils.get_sub_dict_names() does.  It may be a
long name for something so simple, but it's used everywhere in
kvm-autotest.

> +                self.netdev_id.append(kvm_utils.generate_random_id())

I think the 3 lines above belong in VM.create(), not VM.__init__(),
because VM params are routinely changed in calls to VM.create().  If the
code stays in __init__() the changed params will not affect
self.netdev_id.  A good place for it would be near the code that handles
-redir.

>          # Find a unique identifier for this VM
>          while True:
[PATCH] sched: export sched_set/getaffinity (was Re: [PATCH 3/3] vhost: apply cpumask and cgroup to vhost pollers)
On Thu, Jun 24, 2010 at 03:45:51PM -0700, Sridhar Samudrala wrote:
> On Thu, 2010-06-24 at 11:11 +0300, Michael S. Tsirkin wrote:
>> On Sun, May 30, 2010 at 10:25:01PM +0200, Tejun Heo wrote:
>>> Apply the cpumask and cgroup of the initializing task to the created
>>> vhost poller.
>>>
>>> Based on Sridhar Samudrala's patch.
>>>
>>> Cc: Michael S. Tsirkin <m...@redhat.com>
>>> Cc: Sridhar Samudrala <samudrala.srid...@gmail.com>
>>
>> I wanted to apply this, but modpost fails:
>> ERROR: "sched_setaffinity" [drivers/vhost/vhost_net.ko] undefined!
>> ERROR: "sched_getaffinity" [drivers/vhost/vhost_net.ko] undefined!
>> Did you try building as a module?
>
> In my original implementation, I had these calls in workqueue.c.  Now
> that these are moved to vhost.c, which can be built as a module, these
> symbols need to be exported.  The following patch fixes the build issue
> with vhost as a module.
>
> Signed-off-by: Sridhar Samudrala <s...@us.ibm.com>

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

Works for me.  To simplify dependencies, I'd like to queue this together
with the vhost patches through net-next.  Ack to this?
diff --git a/kernel/sched.c b/kernel/sched.c
index 3c2a54f..15a0c6f 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4837,6 +4837,7 @@ out_put_task:
 	put_online_cpus();
 	return retval;
 }
+EXPORT_SYMBOL_GPL(sched_setaffinity);
 
 static int get_user_cpu_mask(unsigned long __user *user_mask_ptr, unsigned len,
 			     struct cpumask *new_mask)
@@ -4900,6 +4901,7 @@ out_unlock:
 	return retval;
 }
+EXPORT_SYMBOL_GPL(sched_getaffinity);
 
 /**
  * sys_sched_getaffinity - get the cpu affinity of a process
---
 drivers/vhost/vhost.c | 36 +++-
 1 file changed, 31 insertions(+), 5 deletions(-)

Index: work/drivers/vhost/vhost.c
===================================================================
--- work.orig/drivers/vhost/vhost.c
+++ work/drivers/vhost/vhost.c
@@ -23,6 +23,7 @@
 #include <linux/highmem.h>
 #include <linux/slab.h>
 #include <linux/kthread.h>
+#include <linux/cgroup.h>
 
 #include <linux/net.h>
 #include <linux/if_packet.h>
@@ -176,12 +177,30 @@ repeat:
 long vhost_dev_init(struct vhost_dev *dev,
 		    struct vhost_virtqueue *vqs, int nvqs)
 {
-	struct task_struct *poller;
-	int i;
+	struct task_struct *poller = NULL;
+	cpumask_var_t mask;
+	int i, ret = -ENOMEM;
+
+	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+		goto out;
 
 	poller = kthread_create(vhost_poller, dev, "vhost-%d", current->pid);
-	if (IS_ERR(poller))
-		return PTR_ERR(poller);
+	if (IS_ERR(poller)) {
+		ret = PTR_ERR(poller);
+		goto out;
+	}
+
+	ret = sched_getaffinity(current->pid, mask);
+	if (ret)
+		goto out;
+
+	ret = sched_setaffinity(poller->pid, mask);
+	if (ret)
+		goto out;
+
+	ret = cgroup_attach_task_current_cg(poller);
+	if (ret)
+		goto out;
 
 	dev->vqs = vqs;
 	dev->nvqs = nvqs;
@@ -202,7 +221,14 @@ long vhost_dev_init(struct vhost_dev *de
 		vhost_poll_init(&dev->vqs[i].poll,
 				dev->vqs[i].handle_kick, POLLIN, dev);
 	}
-	return 0;
+
+	wake_up_process(poller);	/* avoid contributing to loadavg */
+	ret = 0;
+out:
+	if (ret)
+		kthread_stop(poller);
+	free_cpumask_var(mask);
+	return ret;
 }
 
 /* Caller should have device mutex */
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
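The pattern the patch implements in the kernel, copying the creator's CPU affinity onto a newly spawned worker, has a user-space analogue with the glibc scheduling and pthread APIs. This is an illustrative sketch of the same idea, not the vhost code itself (pthread_attr_setaffinity_np is a GNU extension):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static void *worker(void *arg)
{
    /* A real poller would loop servicing work here. */
    return NULL;
}

/* Create a thread and give it the caller's current CPU affinity,
 * mirroring what vhost_dev_init() does for its poller kthread. */
static int spawn_with_inherited_affinity(pthread_t *tid)
{
    cpu_set_t mask;
    pthread_attr_t attr;
    int ret;

    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask))   /* 0 = this task */
        return -1;

    pthread_attr_init(&attr);
    /* Hand the mask to the new thread before it starts running. */
    pthread_attr_setaffinity_np(&attr, sizeof(mask), &mask);
    ret = pthread_create(tid, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
    return ret;
}
```

In the kernel the equivalent calls are sched_getaffinity()/sched_setaffinity() on the kthread's pid, which is exactly why the patch has to export those symbols for a modular vhost.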
Re: [PATCH] KVM: VMX: Execute WBINVD to keep data consistency with assigned devices
On Friday 25 June 2010 16:54:19 Jan Kiszka wrote:
> wbinvd is a heavy weapon in the hands of a guest.  Even if it is limited
> to pass-through scenarios, do we really need to bother all physical host
> CPUs with potential multi-millisecond stalls?  Think of VMs only running
> on a subset of CPUs (e.g. to isolate latency sources).
>
> I would suggest to track the physical CPU usage of VCPUs between two
> wbinvd requests and only send the wbinvd IPI to that set.

OK, I would try to make it more specific (and complex)...

> Also, I think the code is still too much vmx-focused.  Only the trapping
> should be vendor-specific, the rest generic.

OK, I would consider it.

--
regards
Yang, Sheng
Re: UIO interrupts being lost
On Thu, Jun 24, 2010 at 05:43:15PM -0600, Cam Macdonell wrote:
> Hi Michael,
>
> I'm trying to write a uio driver for my shared memory device for KVM,
> and I'm running into a situation where several interrupts in quick
> succession are not all triggering the callback function in my kernel
> UIO driver, say 2 out of 5.  My driver does not set the Interrupt
> Disable bit and, if it helps, I'm using MSI-X interrupts.
>
> Even without the interrupt disable bit set, is there still a window
> where successive interrupts can be lost if they arrive too quickly?
>
> Thanks,
> Cam

Yes, I think so: if an interrupt is delivered while the ISR is running,
it gets queued, but a second one gets lost.  A queueing mechanism is
necessary to avoid losing information; e.g. virtio implements exactly
that.  Why don't you reuse virtio for signalling?  If I understand what
Anthony said correctly, he objected to the specific implementation, not
to the idea of reusing the virtio spec and code.

--
MST
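The general defence against coalesced edges is to make the interrupt path only increment a counter and let the consumer process the delta, so back-to-back interrupts accumulate instead of vanishing (this is also the idea behind the event count UIO exposes through read()). A toy model of that accounting, with illustrative names:

```c
#include <stdint.h>

/* Model of counter-based interrupt accounting: the "ISR" only bumps a
 * counter, and the consumer processes the delta since its last look,
 * so interrupts that arrive while the consumer is busy are counted
 * rather than lost. */
struct evt_model {
    uint32_t irq_count;   /* incremented once per interrupt */
    uint32_t seen;        /* consumer's last observed value */
};

static void isr(struct evt_model *e)
{
    e->irq_count++;       /* cheap, never drops information */
}

/* Returns how many interrupts fired since the last call.  Unsigned
 * subtraction keeps this correct across counter wraparound. */
static uint32_t consume(struct evt_model *e)
{
    uint32_t pending = e->irq_count - e->seen;
    e->seen = e->irq_count;
    return pending;
}
```

What a counter cannot preserve is per-interrupt payload; if each interrupt carries data, you need a real queue (descriptor ring), which is the virtio suggestion above.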
Re: [RFC] virtio: Support releasing lock during kick
On Fri, Jun 25, 2010 at 12:39:21PM +0930, Rusty Russell wrote:
> On Thu, 24 Jun 2010 03:00:30 pm Stefan Hajnoczi wrote:
>> On Wed, Jun 23, 2010 at 11:12 PM, Anthony Liguori <anth...@codemonkey.ws> wrote:
>>> Shouldn't it be possible to just drop the lock before invoking
>>> virtqueue_kick() and reacquire it afterwards?  There's nothing in that
>>> virtqueue_kick() path that the lock is protecting AFAICT.
>>
>> No, that would lead to a race condition because vq->num_added is
>> modified by both virtqueue_add_buf_gfp() and virtqueue_kick().  Without
>> a lock held during virtqueue_kick() another vcpu could add bufs while
>> vq->num_added is used and cleared by virtqueue_kick():
>
> Right, this dovetails with another proposed change (was it Michael?)
> where we would update the avail idx inside add_buf, rather than waiting
> until kick.  This means a barrier inside add_buf, but that's probably
> fine.
>
> If we do that, then we don't need a lock on virtqueue_kick.
> Michael, thoughts?

Maybe not even that: I think we could just do virtio_wmb() in add, and
keep the mb() in kick.  What I'm a bit worried about is contention on
the cacheline including index and flags: the more we write to that line,
the worse it gets.  So we need to test the performance impact of this
change; I didn't find time to do this yet, as I am trying to finalize
the used index publishing patches.  Any takers?  Do we see a performance
improvement after making kick lockless?

> Thanks,
> Rusty.
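Michael's variant, a write barrier in add and the full barrier kept in kick, could look roughly like this against a simplified ring. This is a sketch of the idea, not the actual virtio_ring.c patch; the GCC atomic fences stand in for virtio_wmb()/virtio_mb():

```c
#include <stdint.h>

#define QSZ 256

/* Simplified guest-side view of the avail ring. */
struct ring_model {
    uint16_t ring[QSZ];            /* descriptor heads visible to the host */
    volatile uint16_t avail_idx;   /* published index */
};

/* Publish each buffer immediately: write the ring entry, then a write
 * barrier, then bump avail_idx.  No num_added state survives past
 * add_buf, so kick no longer needs the queue lock to protect it. */
static void add_buf(struct ring_model *r, uint16_t head)
{
    r->ring[r->avail_idx % QSZ] = head;
    __atomic_thread_fence(__ATOMIC_RELEASE);   /* stand-in for virtio_wmb() */
    r->avail_idx++;
}

/* kick is reduced to the notification.  A full barrier orders our index
 * writes against reading the host's notification-suppression flag,
 * which is elided in this model. */
static void kick(struct ring_model *r)
{
    __atomic_thread_fence(__ATOMIC_SEQ_CST);   /* stand-in for virtio_mb() */
    /* notify(r) would go here */
    (void)r;
}
```

The cost Michael flags is visible here: avail_idx is now written once per buffer instead of once per batch, so the shared cacheline holding the index bounces more often under load.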
Re: [PATCH 0/3][RFC] NUMA: add host side pinning
Jes Sorensen wrote:
> On 06/24/10 13:34, Andre Przywara wrote:
>> Avi Kivity wrote:
>>> On 06/24/2010 01:58 PM, Andre Przywara wrote:
>>> Non-anonymous memory doesn't work well with ksm and transparent
>>> hugepages.  Is it possible to use anonymous memory rather than file
>>> backed?
>>
>> I'd prefer non-file backed, too.  But that is how the current huge
>> pages implementation is done.  We could use MAP_HUGETLB and declare
>> NUMA _and_ huge pages as 2.6.32+ only.  Unfortunately I didn't find an
>> easy way to detect the presence of the MAP_HUGETLB flag.  If the kernel
>> does not support it, it seems that mmap silently ignores it and uses
>> 4KB pages instead.
>
> Bit behind on the mailing list, but I think this looks very promising.
> I really think it makes more sense to make QEMU aware of the NUMA setup
> as well, rather than relying on numactl to do the work outside.
>
> One thing you need to consider is what happens with migration once a
> user specifies -numa.  IMHO it is acceptable to simply disable migration
> for the given guest.

Is that really a problem?  You create the guest on the target with a NUMA
setup specific to the target machine.  That could mean that you pin
multiple guest nodes to the same host node, but that shouldn't break
anything, right?  The guest part can (and should be!) migrated along with
all the other device state.  I think this is still missing from the
current implementation.

> Cheers,
> Jes
>
> PS: Are you planning on submitting anything to Linux Plumbers Conference
> about this? :)

Yes, I was planning to submit a proposal, as I saw NUMA mentioned in the
topics list.  AFAIK the deadline is July 19th, right?  That gives me
another week after my vacation (for which I leave in a few minutes).

Regards,
Andre.

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
Re: [RFC PATCH v7 01/19] Add a new structure for skb buffer from external.
On Fri, Jun 25, 2010 at 09:03:46AM +0800, Dong, Eddie wrote:
> Herbert Xu wrote:
>> On Wed, Jun 23, 2010 at 06:05:41PM +0800, Dong, Eddie wrote:
>>> I mean once the frontend side driver posts the buffers to the backend
>>> driver, the backend driver will immediately use those buffers to
>>> compose skbs or gro_frags and post them to the assigned host NIC
>>> driver as receive buffers.  In that case, if the backend driver
>>> receives a packet from the NIC that requires a copy, it may be unable
>>> to find an additional free guest buffer because all of them are
>>> already used by the NIC driver.  We have to reserve some guest buffers
>>> for the possible copy, even if the buffer address is not identified by
>>> the original skb :(
>>
>> OK I see what you mean.  Can you tell me how Xiaohui's previous
>> patch-set deals with this problem?
>
> In the current patch, each SKB for the assigned device (SRIOV VF or NIC
> or a complete queue pair) uses a buffer from the guest, so it eliminates
> copying completely in software and requires hardware to do so.  If we
> can have an additional place to store the buffer per skb (which may
> cause a copy later on), we can do the copy later on or re-post the
> buffer to the assigned NIC driver later on.  But that may be not very
> clean either :(
>
> BTW, some hardware may require a certain level of packet copying, such
> as for broadcast packets in very old VMDq devices, which is not
> addressed in Xiaohui's previous patch yet.  We may address this by
> implementing an additional virtqueue between guest and host for the
> slow path (broadcast packets only here), with additional complexity in
> the FE/BE drivers.
>
> Thx, Eddie

The guest posts a large number of buffers to the host.  The host can use
them any way it wants to, and in any order; for example, it can reserve
half the buffers for the copy.  This might waste some memory if buffers
are used only partially, but let's worry about this later.
--
MST
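Michael's suggestion, holding back part of the guest-posted pool for the copy path, can be modelled as a split free list (names and the half-and-half policy are illustrative only):

```c
#define POSTED 8

/* Model of partitioning guest-posted receive buffers: some go to the
 * NIC for zero-copy DMA, the rest are held back for packets that must
 * be copied (e.g. broadcasts on hardware that can't steer them). */
struct pool_model {
    int nic_free;    /* buffers handed to the NIC driver */
    int copy_free;   /* buffers reserved for the copy path */
};

static void pool_init(struct pool_model *p, int posted)
{
    p->copy_free = posted / 2;        /* example policy: reserve half */
    p->nic_free = posted - p->copy_free;
}

/* Take a buffer for the copy path; returns 0 on success, -1 when the
 * reservation is exhausted.  Without the reservation, every buffer
 * could already be pinned under the NIC and the copy would be stuck. */
static int take_copy_buf(struct pool_model *p)
{
    if (p->copy_free == 0)
        return -1;
    p->copy_free--;
    return 0;
}
```

This is exactly the trade-off noted in the thread: the reservation guarantees forward progress for the slow path at the cost of shrinking the pool available for zero-copy receive.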
[PATCH v2 1/10] KVM: MMU: fix writable sync sp mapping
While we sync an unsync sp, we may map a spte writable.  This is
dangerous if one unsync sp's mapped gfn is another unsync page's gfn.
For example, take two unsync pages SP1, SP2 where:

    SP1.pte[0] = P
    SP2.gfn's pfn = P
    [so SP1.pte[0] maps to SP2.gfn's pfn]

First, we unsync SP2: it will be write-protected; since SP1.pte[0] maps
to this page, it will be marked read-only.  Then, we unsync SP1:
SP1.pte[0] may be marked writable.  Now, we can write SP2.gfn through
the SP1.pte[0] mapping.

This bug corrupts the guest's page tables.  Fix it by marking the
mapping read-only if the mapped gfn has a shadow page.

Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com>
---
 arch/x86/kvm/mmu.c | 14 --
 1 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 045a0f9..556a798 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1810,11 +1810,14 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
 	bool need_unsync = false;
 
 	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn, node) {
+		if (!can_unsync)
+			return 1;
+
 		if (s->role.level != PT_PAGE_TABLE_LEVEL)
 			return 1;
 
 		if (!need_unsync && !s->unsync) {
-			if (!can_unsync || !oos_shadow)
+			if (!oos_shadow)
 				return 1;
 			need_unsync = true;
 		}
@@ -1877,15 +1880,6 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (!tdp_enabled && !(pte_access & ACC_WRITE_MASK))
 		spte &= ~PT_USER_MASK;
 
-	/*
-	 * Optimization: for pte sync, if spte was writable the hash
-	 * lookup is unnecessary (and expensive). Write protection
-	 * is responsibility of mmu_get_page / kvm_sync_page.
-	 * Same reasoning can be applied to dirty page accounting.
-	 */
-	if (!can_unsync && is_writable_pte(*sptep))
-		goto set_pte;
-
 	if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
 		pgprintk("%s: found shadow page for %lx, marking ro\n",
 			 __func__, gfn);
--
1.6.1.2
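The rule the patch enforces, never map a gfn writable during sync while that gfn still backs a shadow page, regardless of what the old spte said, can be modelled in miniature. This is a heavy simplification of mmu_need_write_protect() with hypothetical helpers, meant only to show the decision, not KVM's actual data structures:

```c
#include <stdbool.h>

#define MAX_SP 16

/* Toy shadow-page table: the set of gfns that currently back a
 * shadow page (i.e. are themselves guest page tables). */
static unsigned long shadow_gfns[MAX_SP];
static int nr_sp;

static bool gfn_has_shadow_page(unsigned long gfn)
{
    for (int i = 0; i < nr_sp; i++)
        if (shadow_gfns[i] == gfn)
            return true;
    return false;
}

/* Post-fix behaviour: during sync (can_unsync == false), any gfn that
 * still has a shadow page must be mapped read-only -- even if the old
 * spte was writable.  The removed "spte was writable" shortcut is what
 * let SP1 map SP2's gfn writable in the scenario above. */
static bool need_write_protect(unsigned long gfn, bool can_unsync)
{
    return gfn_has_shadow_page(gfn) && !can_unsync;
}
```

In the real code the can_unsync==true case may unsync the page instead of write-protecting it; that branch is elided here.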
[PATCH v2 2/10] KVM: MMU: fix conflict access permissions in direct sp
In non-direct mapping, we mark an sp as 'direct' when it maps a guest large page, but its access is encoded from the upper page-structure entries only, not including the last mapping; this can cause an access conflict. For example, with this mapping:

         / PDE1 [W] \
  P [W] -            - LPA
         \ PDE2 [R] /

P has two children, PDE1 and PDE2, and both PDE1 and PDE2 map the same large page (LPA). P's access is RW, PDE1's access is RW, PDE2's access is RO (considering only read/write permissions here).

When the guest accesses through PDE1, we create a direct sp for LPA. The sp's access comes from P, i.e. W, so we mark the ptes in this sp writable. Then the guest accesses through PDE2: we find LPA's shadow page — the same one as PDE1's — and mark the ptes RO. So when the guest next writes through PDE1, an incorrect #PF occurs.

Fix this by encoding the last mapping's access into the direct shadow page. This also cleans up the code that directly reads the last level's dirty flag.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/paging_tmpl.h | 9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 37c26cb..e46eb8a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -306,6 +306,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	gfn_t table_gfn;
 	int r;
 	int level;
+	bool dirty = is_dirty_gpte(gw->ptes[gw->level-1]);
 	pt_element_t curr_pte;
 	struct kvm_shadow_walk_iterator iterator;
 
@@ -319,8 +320,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			mmu_set_spte(vcpu, sptep, access,
 				     gw->pte_access & access,
 				     user_fault, write_fault,
-				     is_dirty_gpte(gw->ptes[gw->level-1]),
-				     ptwrite, level,
+				     dirty, ptwrite, level,
 				     gw->gfn, pfn, false, true);
 			break;
 		}
@@ -335,10 +335,11 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		}
 
 		if (level <= gw->level) {
-			int delta = level - gw->level + 1;
 			direct = 1;
-			if (!is_dirty_gpte(gw->ptes[level - delta]))
+			if (!dirty)
 				access &= ~ACC_WRITE_MASK;
+			access &= gw->pte_access;
+
 			/*
 			 * It is a large guest pages backed by small host pages,
 			 * So we set @direct(@sp->role.direct)=1, and set
-- 
1.6.1.2
[PATCH v2 4/10] KVM: MMU: fix forgotten TLB flush for all vcpus
After removing a rmap, we should flush every vcpu's TLB.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c | 2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0412ba4..f151540 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1933,6 +1933,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 			pgprintk("hfn old %lx new %lx\n",
 				 spte_to_pfn(*sptep), pfn);
 			rmap_remove(vcpu->kvm, sptep);
+			__set_spte(sptep, shadow_trap_nonpresent_pte);
+			kvm_flush_remote_tlbs(vcpu->kvm);
 		} else
 			was_rmapped = 1;
 	}
-- 
1.6.1.2
[PATCH v2 6/10] KVM: MMU: introduce gfn_to_hva_many() function
This function not only returns the gfn's hva but also the number of pages remaining in the slot starting at @gfn. It will be used by a later patch.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c      | 13 ++++++++++++-
 2 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 515fefd..8f7af32 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -289,6 +289,7 @@ void kvm_disable_largepages(void);
 void kvm_arch_flush_shadow(struct kvm *kvm);
 
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
+unsigned long gfn_to_hva_many(struct kvm *kvm, gfn_t gfn, int *entry);
 unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
 void kvm_release_page_clean(struct page *page);
 void kvm_release_page_dirty(struct page *page);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 60bb3d5..a007889 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -913,15 +913,26 @@ static unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
 	return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE;
 }
 
-unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
+unsigned long gfn_to_hva_many(struct kvm *kvm, gfn_t gfn, int *entry)
 {
 	struct kvm_memory_slot *slot;
 
 	slot = gfn_to_memslot(kvm, gfn);
+
 	if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
 		return bad_hva();
+
+	if (entry)
+		*entry = slot->npages - (gfn - slot->base_gfn);
+
 	return gfn_to_hva_memslot(slot, gfn);
 }
+EXPORT_SYMBOL_GPL(gfn_to_hva_many);
+
+unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
+{
+	return gfn_to_hva_many(kvm, gfn, NULL);
+}
 EXPORT_SYMBOL_GPL(gfn_to_hva);
 
 static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic)
-- 
1.6.1.2
[PATCH v2 3/10] KVM: MMU: fix direct sp's access corruption
Consider using small pages to back a guest large-page mapping: if the mapping is writable but the dirty flag is not set, we find the read-only direct sp and set up the mapping; then, when a write #PF occurs, we mark this mapping writable in the read-only direct sp. Now other, genuinely read-only mappings can happily write through it without a #PF. This can break the guest's COW.

Fix this by re-installing the mapping when the write #PF occurs.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c         |  3 ++-
 arch/x86/kvm/paging_tmpl.h | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 556a798..0412ba4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -153,7 +153,8 @@ module_param(oos_shadow, bool, 0644);
 #define CREATE_TRACE_POINTS
 #include "mmutrace.h"
 
-#define SPTE_HOST_WRITEABLE (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
+#define SPTE_HOST_WRITEABLE	(1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
+#define SPTE_NO_DIRTY		(2ULL << PT_FIRST_AVAIL_BITS_SHIFT)
 
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index e46eb8a..fdba751 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -325,6 +325,20 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			break;
 		}
 
+		if (*sptep & SPTE_NO_DIRTY) {
+			struct kvm_mmu_page *child;
+
+			WARN_ON(level != gw->level);
+			WARN_ON(!is_shadow_present_pte(*sptep));
+			if (dirty) {
+				child = page_header(*sptep & PT64_BASE_ADDR_MASK);
+				mmu_page_remove_parent_pte(child, sptep);
+				__set_spte(sptep, shadow_trap_nonpresent_pte);
+				kvm_flush_remote_tlbs(vcpu->kvm);
+			}
+		}
+
 		if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
 			continue;
 
@@ -365,6 +379,10 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			}
 		}
 
+		if (level == gw->level && !dirty &&
+		    access & gw->pte_access & ACC_WRITE_MASK)
+			spte |= SPTE_NO_DIRTY;
+
 		spte = __pa(sp->spt)
 			| PT_PRESENT_MASK | PT_ACCESSED_MASK
 			| PT_WRITABLE_MASK | PT_USER_MASK;
-- 
1.6.1.2
[PATCH v2 5/10] KVM: MMU: introduce gfn_to_pfn_atomic() function
Introduce gfn_to_pfn_atomic(), it's the fast path and can used in atomic context, the later patch will use it Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/mm/gup.c|2 ++ include/linux/kvm_host.h |1 + virt/kvm/kvm_main.c | 32 +--- 3 files changed, 28 insertions(+), 7 deletions(-) diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c index 738e659..0c9034b 100644 --- a/arch/x86/mm/gup.c +++ b/arch/x86/mm/gup.c @@ -6,6 +6,7 @@ */ #include linux/sched.h #include linux/mm.h +#include linux/module.h #include linux/vmstat.h #include linux/highmem.h @@ -274,6 +275,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write, return nr; } +EXPORT_SYMBOL_GPL(__get_user_pages_fast); /** * get_user_pages_fast() - pin user pages in memory diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 9289d1a..515fefd 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -295,6 +295,7 @@ void kvm_release_page_dirty(struct page *page); void kvm_set_page_dirty(struct page *page); void kvm_set_page_accessed(struct page *page); +pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn); pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn); pfn_t gfn_to_pfn_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 885d3f5..60bb3d5 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -924,19 +924,25 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn) } EXPORT_SYMBOL_GPL(gfn_to_hva); -static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr) +static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr, bool atomic) { struct page *page[1]; int npages; pfn_t pfn; - might_sleep(); - - npages = get_user_pages_fast(addr, 1, 1, page); + if (atomic) + npages = __get_user_pages_fast(addr, 1, 1, page); + else { + might_sleep(); + npages = get_user_pages_fast(addr, 1, 1, page); + } if (unlikely(npages != 1)) { struct vm_area_struct *vma; + if (atomic) + goto 
return_bad_page; + down_read(current-mm-mmap_sem); if (is_hwpoison_address(addr)) { up_read(current-mm-mmap_sem); @@ -949,6 +955,7 @@ static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr) if (vma == NULL || addr vma-vm_start || !(vma-vm_flags VM_PFNMAP)) { up_read(current-mm-mmap_sem); +return_bad_page: get_page(bad_page); return page_to_pfn(bad_page); } @@ -962,7 +969,7 @@ static pfn_t hva_to_pfn(struct kvm *kvm, unsigned long addr) return pfn; } -pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) +pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic) { unsigned long addr; @@ -972,7 +979,18 @@ pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) return page_to_pfn(bad_page); } - return hva_to_pfn(kvm, addr); + return hva_to_pfn(kvm, addr, atomic); +} + +pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn) +{ + return __gfn_to_pfn(kvm, gfn, true); +} +EXPORT_SYMBOL_GPL(gfn_to_pfn_atomic); + +pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn) +{ + return __gfn_to_pfn(kvm, gfn, false); } EXPORT_SYMBOL_GPL(gfn_to_pfn); @@ -980,7 +998,7 @@ pfn_t gfn_to_pfn_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, gfn_t gfn) { unsigned long addr = gfn_to_hva_memslot(slot, gfn); - return hva_to_pfn(kvm, addr); + return hva_to_pfn(kvm, addr, false); } struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn) -- 1.6.1.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 7/10] KVM: MMU: introduce mmu_topup_memory_cache_atomic()
Introduce mmu_topup_memory_cache_atomic(), which supports topping up a memory cache in atomic context.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c | 29 +++++++++++++++++++++++++----
 1 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f151540..6c0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -291,15 +291,16 @@ static void __set_spte(u64 *sptep, u64 spte)
 #endif
 }
 
-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
-				  struct kmem_cache *base_cache, int min)
+static int __mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
+				    struct kmem_cache *base_cache, int min,
+				    int max, gfp_t flags)
 {
 	void *obj;
 
 	if (cache->nobjs >= min)
 		return 0;
-	while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
-		obj = kmem_cache_zalloc(base_cache, GFP_KERNEL);
+	while (cache->nobjs < max) {
+		obj = kmem_cache_zalloc(base_cache, flags);
 		if (!obj)
 			return -ENOMEM;
 		cache->objects[cache->nobjs++] = obj;
@@ -307,6 +308,26 @@ static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
 	return 0;
 }
 
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
+				  struct kmem_cache *base_cache, int min)
+{
+	return __mmu_topup_memory_cache(cache, base_cache, min,
+					ARRAY_SIZE(cache->objects), GFP_KERNEL);
+}
+
+static int mmu_topup_memory_cache_atomic(struct kvm_mmu_memory_cache *cache,
+					 struct kmem_cache *base_cache, int min)
+{
+	return __mmu_topup_memory_cache(cache, base_cache, min, min,
+					GFP_ATOMIC);
+}
+
+static int pte_prefetch_topup_memory_cache(struct kvm_vcpu *vcpu, int num)
+{
+	return mmu_topup_memory_cache_atomic(&vcpu->arch.mmu_rmap_desc_cache,
+					     rmap_desc_cache, num);
+}
+
 static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc,
 				  struct kmem_cache *cache)
-- 
1.6.1.2
[PATCH v2 8/10] KVM: MMU: prefetch ptes when intercepting guest #PF
Support prefetch ptes when intercept guest #PF, avoid to #PF by later access If we meet any failure in the prefetch path, we will exit it and not try other ptes to avoid become heavy path Note: this speculative will mark page become dirty but it not really accessed, the same issue is in other speculative paths like invlpg, pte write, fortunately, it just affect host memory management. After Avi's patchset named [PATCH v2 1/4] KVM: MMU: Introduce drop_spte() merged, we will easily fix it. Will do it in the future. Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/kvm/mmu.c | 69 + arch/x86/kvm/paging_tmpl.h | 74 2 files changed, 143 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 6c0..b2ad723 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -89,6 +89,8 @@ module_param(oos_shadow, bool, 0644); } #endif +#define PTE_PREFETCH_NUM 16 + #define PT_FIRST_AVAIL_BITS_SHIFT 9 #define PT64_SECOND_AVAIL_BITS_SHIFT 52 @@ -1998,6 +2000,72 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu) { } +static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, + struct kvm_mmu_page *sp, + u64 *start, u64 *end) +{ + gfn_t gfn; + struct page *pages[PTE_PREFETCH_NUM]; + + if (pte_prefetch_topup_memory_cache(vcpu, end - start)) + return -1; + + gfn = sp-gfn + start - sp-spt; + while (start end) { + unsigned long addr; + int entry, j, ret; + + addr = gfn_to_hva_many(vcpu-kvm, gfn, entry); + if (kvm_is_error_hva(addr)) + return -1; + + entry = min(entry, (int)(end - start)); + ret = __get_user_pages_fast(addr, entry, 1, pages); + if (ret = 0) + return -1; + + for (j = 0; j ret; j++, gfn++, start++) + mmu_set_spte(vcpu, start, ACC_ALL, +sp-role.access, 0, 0, 1, NULL, +sp-role.level, gfn, +page_to_pfn(pages[j]), true, false); + + if (ret entry) + return -1; + } + return 0; +} + +static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) +{ + struct kvm_mmu_page *sp; + u64 *start = NULL; + int index, i, max; + + 
sp = page_header(__pa(sptep)); + WARN_ON(!sp-role.direct); + + if (sp-role.level PT_PAGE_TABLE_LEVEL) + return; + + index = sptep - sp-spt; + i = index ~(PTE_PREFETCH_NUM - 1); + max = index | (PTE_PREFETCH_NUM - 1); + + for (; i max; i++) { + u64 *spte = sp-spt + i; + + if (*spte != shadow_trap_nonpresent_pte || spte == sptep) { + if (!start) + continue; + if (direct_pte_prefetch_many(vcpu, sp, start, spte) 0) + break; + start = NULL; + } else if (!start) + start = spte; + } +} + static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, int level, gfn_t gfn, pfn_t pfn) { @@ -2012,6 +2080,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, 0, write, 1, pt_write, level, gfn, pfn, false, true); ++vcpu-stat.pf_fixed; + direct_pte_prefetch(vcpu, iterator.sptep); break; } diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index fdba751..134f031 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -291,6 +291,79 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, gpte_to_gfn(gpte), pfn, true, true); } +static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, u64 *sptep) +{ + struct kvm_mmu_page *sp; + pt_element_t gptep[PTE_PREFETCH_NUM]; + gpa_t first_pte_gpa; + int offset = 0, index, i, j, max; + + sp = page_header(__pa(sptep)); + index = sptep - sp-spt; + + if (sp-role.level PT_PAGE_TABLE_LEVEL) + return; + + if (sp-role.direct) + return direct_pte_prefetch(vcpu, sptep); + + index = sptep - sp-spt; + i = index ~(PTE_PREFETCH_NUM - 1); + max = index | (PTE_PREFETCH_NUM - 1); + + if (PTTYPE == 32) + offset = sp-role.quadrant PT64_LEVEL_BITS; + + first_pte_gpa = gfn_to_gpa(sp-gfn) + + (offset + i) * sizeof(pt_element_t); + + if
[PATCH v2 10/10] KVM: MMU: trace pte prefetch
Trace pte prefetch, it can help us to improve the prefetch's performance Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com --- arch/x86/kvm/mmu.c | 42 +- arch/x86/kvm/mmutrace.h| 33 + arch/x86/kvm/paging_tmpl.h | 29 ++--- 3 files changed, 88 insertions(+), 16 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index b2ad723..bcf4626 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -91,6 +91,12 @@ module_param(oos_shadow, bool, 0644); #define PTE_PREFETCH_NUM 16 +#define PREFETCH_SUCCESS 0 +#define PREFETCH_ERR_GFN2PFN 1 +#define PREFETCH_ERR_ALLOC_MEM 2 +#define PREFETCH_ERR_RSVD_BITS_SET 3 +#define PREFETCH_ERR_MMIO 4 + #define PT_FIRST_AVAIL_BITS_SHIFT 9 #define PT64_SECOND_AVAIL_BITS_SHIFT 52 @@ -2002,13 +2008,16 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu) static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, - u64 *start, u64 *end) + u64 *start, u64 *end, u64 address) { gfn_t gfn; struct page *pages[PTE_PREFETCH_NUM]; - if (pte_prefetch_topup_memory_cache(vcpu, end - start)) + if (pte_prefetch_topup_memory_cache(vcpu, end - start)) { + trace_pte_prefetch(true, address, 0, + PREFETCH_ERR_ALLOC_MEM); return -1; + } gfn = sp-gfn + start - sp-spt; while (start end) { @@ -2016,27 +2025,40 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, int entry, j, ret; addr = gfn_to_hva_many(vcpu-kvm, gfn, entry); - if (kvm_is_error_hva(addr)) + if (kvm_is_error_hva(addr)) { + trace_pte_prefetch(true, address, 0, + PREFETCH_ERR_MMIO); return -1; + } entry = min(entry, (int)(end - start)); ret = __get_user_pages_fast(addr, entry, 1, pages); - if (ret = 0) + if (ret = 0) { + trace_pte_prefetch(true, address, 0, + PREFETCH_ERR_GFN2PFN); return -1; + } - for (j = 0; j ret; j++, gfn++, start++) + for (j = 0; j ret; j++, gfn++, start++) { + trace_pte_prefetch(true, address, 0, + PREFETCH_SUCCESS); mmu_set_spte(vcpu, start, ACC_ALL, sp-role.access, 0, 0, 1, NULL, sp-role.level, gfn, 
page_to_pfn(pages[j]), true, false); + } - if (ret entry) + if (ret entry) { + trace_pte_prefetch(true, address, 0, + PREFETCH_ERR_GFN2PFN); return -1; + } } return 0; } -static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) +static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep, + u64 addr) { struct kvm_mmu_page *sp; u64 *start = NULL; @@ -2058,7 +2080,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) if (*spte != shadow_trap_nonpresent_pte || spte == sptep) { if (!start) continue; - if (direct_pte_prefetch_many(vcpu, sp, start, spte) 0) + if (direct_pte_prefetch_many(vcpu, sp, start, + spte, addr) 0) break; start = NULL; } else if (!start) @@ -2080,7 +2103,8 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, 0, write, 1, pt_write, level, gfn, pfn, false, true); ++vcpu-stat.pf_fixed; - direct_pte_prefetch(vcpu, iterator.sptep); + direct_pte_prefetch(vcpu, iterator.sptep, + gfn PAGE_SHIFT); break; } diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h index 3aab0f0..c07b6a6 100644 --- a/arch/x86/kvm/mmutrace.h +++ b/arch/x86/kvm/mmutrace.h @@ -195,6 +195,39 @@ DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_prepare_zap_page, TP_ARGS(sp) ); + +#define pte_prefetch_err \ + {PREFETCH_SUCCESS, SUCCESS
[ANNOUNCE] kvm-kmod-2.6.35-rc3
No pending KVM patches for upcoming 2.6.35, so let's give it a try in the form of a release candidate.

Major KVM changes since kvm-kmod-2.6.34:
- lots of x86 emulator fixes and improvements
- timekeeping (kvm-clock) improvements
- SVM: nesting correctness and performance improvements
- tons of clean-ups and smaller fixes

kvm-kmod changes:
- expand relative kernel paths

You can download this version from
http://downloads.sourceforge.net/project/kvm/kvm-kmod/2.6.35-rc3/kvm-kmod-2.6.35-rc3.tar.bz2
[PATCH v2] KVM: VMX: Execute WBINVD to keep data consistency with assigned devices
Some guest device driver may leverage the Non-Snoop I/O, and explicitly WBINVD or CLFLUSH to a RAM space. Since migration may occur before WBINVD or CLFLUSH, we need to maintain data consistency either by: 1: flushing cache (wbinvd) when the guest is scheduled out if there is no wbinvd exit, or 2: execute wbinvd on all dirty physical CPUs when guest wbinvd exits. Signed-off-by: Yaozu (Eddie) Dong eddie.d...@intel.com Signed-off-by: Sheng Yang sh...@linux.intel.com --- Jan- I've check if we can make it more generic. But the logic here heavily depends on if processor have WBINVD exit feature, and the common part with SVM is no more than 10 lines, all in the branch of if statement. So I think it's fine to keep them there. Maybe wbinvd_ipi() can be moved, but it's somehow strange for KVM scope. Any suggestion to make this wrap function more clean? I hope we have an marco can do that... arch/x86/include/asm/kvm_host.h |3 ++ arch/x86/kvm/emulate.c |5 +++- arch/x86/kvm/svm.c |6 + arch/x86/kvm/vmx.c | 45 ++- arch/x86/kvm/x86.c |6 + 5 files changed, 63 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a57cdea..1c392c9 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -514,6 +514,8 @@ struct kvm_x86_ops { void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry); + void (*execute_wbinvd)(struct kvm_vcpu *vcpu); + const struct trace_print_flags *exit_reasons_str; }; @@ -571,6 +573,7 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu); int kvm_emulate_halt(struct kvm_vcpu *vcpu); int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address); int emulate_clts(struct kvm_vcpu *vcpu); +int emulate_wbinvd(struct kvm_vcpu *vcpu); void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index abb8cec..085dcb7 100644 --- 
a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3138,8 +3138,11 @@ twobyte_insn: emulate_clts(ctxt-vcpu); c-dst.type = OP_NONE; break; - case 0x08: /* invd */ case 0x09: /* wbinvd */ + emulate_wbinvd(ctxt-vcpu); + c-dst.type = OP_NONE; + break; + case 0x08: /* invd */ case 0x0d: /* GrpP (prefetch) */ case 0x18: /* Grp16 (prefetch/nop) */ c-dst.type = OP_NONE; diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 587b99d..6929da1 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3424,6 +3424,10 @@ static bool svm_rdtscp_supported(void) return false; } +static void svm_execute_wbinvd(struct kvm_vcpu *vcpu) +{ +} + static void svm_fpu_deactivate(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); @@ -3508,6 +3512,8 @@ static struct kvm_x86_ops svm_x86_ops = { .rdtscp_supported = svm_rdtscp_supported, .set_supported_cpuid = svm_set_supported_cpuid, + + .execute_wbinvd = svm_execute_wbinvd, }; static int __init svm_init(void) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index e565689..fd6c7e6 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -29,6 +29,7 @@ #include linux/ftrace_event.h #include linux/slab.h #include linux/tboot.h +#include linux/cpumask.h #include kvm_cache_regs.h #include x86.h @@ -170,6 +171,8 @@ struct vcpu_vmx { u32 exit_reason; bool rdtscp_enabled; + + cpumask_t wbinvd_dirty_mask; }; static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu) @@ -412,6 +415,12 @@ static inline bool cpu_has_virtual_nmis(void) return vmcs_config.pin_based_exec_ctrl PIN_BASED_VIRTUAL_NMIS; } +static inline bool cpu_has_vmx_wbinvd_exit(void) +{ + return vmcs_config.cpu_based_2nd_exec_ctrl + SECONDARY_EXEC_WBINVD_EXITING; +} + static inline bool report_flexpriority(void) { return flexpriority_enabled; @@ -874,6 +883,11 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx) preempt_enable(); } +static void wbinvd_ipi(void *opaque) +{ + wbinvd(); +} + /* * Switches to specified vcpu, until a matching vcpu_put(), but 
assumes * vcpu mutex is already taken. @@ -905,6 +919,15 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) per_cpu(vcpus_on_cpu, cpu)); local_irq_enable(); + /* Address WBINVD may be executed by guest */ + if (vcpu-kvm-arch.iommu_domain) { + if (cpu_has_vmx_wbinvd_exit()) + cpu_set(cpu, vmx-wbinvd_dirty_mask); + else if (vcpu-cpu
[ kvm-Bugs-2001121 ] Windows 2003 x64 - SESSION5_INITIALIZATION_FAILED
Bugs item #2001121, was opened at 2008-06-23 21:09 Message generated for change (Comment added) made by jessorensen You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2001121group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: intel Group: None Status: Closed Resolution: Fixed Priority: 5 Private: No Submitted By: Andreas 'ac0v' Specht (ac0v) Assigned to: Nobody/Anonymous (nobody) Summary: Windows 2003 x64 - SESSION5_INITIALIZATION_FAILED Initial Comment: Host Machine: CPU:2x Intel(R) Xeon(R) CPU E5405 @ 2.00GHz Kernel: Linux version 2.6.25-gentoo-r4 Arch: x86_64 KVM:tried kvm-69 and kvm-70 Guest System: tried Windows 2003 x64 and Windows 2003 x64 with slipstreamed Service Pack 2 Hi, I get a BSoD (see attachment) while installing Windows 2003 x64 which contains the error message SESSION5_INITIALIZATION_FAILED Serial log is empty. I start my KVM via this command: kvm -hda /dev/lvg1/sap-test -boot d -cdrom /srv/install/iso/windows/2003-server-x64.iso -vnc :4 -m 3048 -smp 4 -daemonize Using -no-kvm or the -no-kvm-pit switch doesn't help and shows only the message Setup is starting Windows. The -no-kvm-irqchip switch has no effect (same BSoD). Any Ideas? Regards, Andreas 'ac0v' Specht -- Comment By: Jes Sorensen (jessorensen) Date: 2010-06-25 15:32 Message: Windows 2003 x64 r2 installs and boots just fine with a 2.6.32 kernel and qemu-kvm based on 0.12.1, smp 4, 3072MB. The problem seems to be have been resolved in some of the emulator updates that went in since you tried. If you do see this problem again, please open a new bug in launchpad. Closing. Jes -- Comment By: MaSc82 (masc82) Date: 2009-01-09 16:02 Message: The issue persists with kvm-82 modules. Neither win2003 x64 r2 CD nor installed system will boot, failing with BSOD SESSION5_INITIALIZATION_FAILED. 
Had to revert to older 2.6.28 modules having block virtio disabled again :( -- Comment By: MaSc82 (masc82) Date: 2008-12-25 17:35 Message: Updated to 2.6.28 including kvm modules, which seem to work very well with kvm81, at the same time supporting win2003 x64, so all mentioned issues are resolved for me, but only when using the kvm modules of linux kernel 2.6.28. -- Comment By: MaSc82 (masc82) Date: 2008-12-22 16:58 Message: I've got the same issue with kvm-81 and Linux version 2.6.27-gentoo-r7. The problem does not occur when using the kvm modules coming with the kernel, but these (probably older?) modules still have bugs with smp and block device virtio (temporary freeze).. Can anyone shed some light on this, please? -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2001121group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] KVM: VMX: Execute WBINVD to keep data consistency with assigned devices
Sheng Yang wrote: Some guest device driver may leverage the Non-Snoop I/O, and explicitly WBINVD or CLFLUSH to a RAM space. Since migration may occur before WBINVD or CLFLUSH, we need to maintain data consistency either by: 1: flushing cache (wbinvd) when the guest is scheduled out if there is no wbinvd exit, or 2: execute wbinvd on all dirty physical CPUs when guest wbinvd exits. Signed-off-by: Yaozu (Eddie) Dong eddie.d...@intel.com Signed-off-by: Sheng Yang sh...@linux.intel.com --- Jan- I've check if we can make it more generic. But the logic here heavily depends on if processor have WBINVD exit feature, and the common part with SVM is no more than 10 lines, all in the branch of if statement. AFAIK, all AMD processors with SVM support have wbinvd trapping. So you can simply move the VMX part which deals with cpu_has_vmx_wbinvd_exit into generic services to call them from SVM as well. Or is wbinvd emulation for device pass-through an Intel-only issue? Joerg? So I think it's fine to keep them there. Maybe wbinvd_ipi() can be moved, but it's somehow strange for KVM scope. Any suggestion to make this wrap function more clean? I hope we have an marco can do that... 
arch/x86/include/asm/kvm_host.h |3 ++ arch/x86/kvm/emulate.c |5 +++- arch/x86/kvm/svm.c |6 + arch/x86/kvm/vmx.c | 45 ++- arch/x86/kvm/x86.c |6 + 5 files changed, 63 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a57cdea..1c392c9 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -514,6 +514,8 @@ struct kvm_x86_ops { void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry); + void (*execute_wbinvd)(struct kvm_vcpu *vcpu); + const struct trace_print_flags *exit_reasons_str; }; @@ -571,6 +573,7 @@ void kvm_emulate_cpuid(struct kvm_vcpu *vcpu); int kvm_emulate_halt(struct kvm_vcpu *vcpu); int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address); int emulate_clts(struct kvm_vcpu *vcpu); +int emulate_wbinvd(struct kvm_vcpu *vcpu); void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index abb8cec..085dcb7 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3138,8 +3138,11 @@ twobyte_insn: emulate_clts(ctxt-vcpu); c-dst.type = OP_NONE; break; - case 0x08: /* invd */ case 0x09: /* wbinvd */ + emulate_wbinvd(ctxt-vcpu); + c-dst.type = OP_NONE; + break; + case 0x08: /* invd */ case 0x0d: /* GrpP (prefetch) */ case 0x18: /* Grp16 (prefetch/nop) */ c-dst.type = OP_NONE; diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 587b99d..6929da1 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3424,6 +3424,10 @@ static bool svm_rdtscp_supported(void) return false; } +static void svm_execute_wbinvd(struct kvm_vcpu *vcpu) +{ +} + static void svm_fpu_deactivate(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); @@ -3508,6 +3512,8 @@ static struct kvm_x86_ops svm_x86_ops = { .rdtscp_supported = svm_rdtscp_supported, .set_supported_cpuid = svm_set_supported_cpuid, + 
+ .execute_wbinvd = svm_execute_wbinvd, }; static int __init svm_init(void) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index e565689..fd6c7e6 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -29,6 +29,7 @@ #include linux/ftrace_event.h #include linux/slab.h #include linux/tboot.h +#include linux/cpumask.h #include kvm_cache_regs.h #include x86.h @@ -170,6 +171,8 @@ struct vcpu_vmx { u32 exit_reason; bool rdtscp_enabled; + + cpumask_t wbinvd_dirty_mask; }; static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu) @@ -412,6 +415,12 @@ static inline bool cpu_has_virtual_nmis(void) return vmcs_config.pin_based_exec_ctrl PIN_BASED_VIRTUAL_NMIS; } +static inline bool cpu_has_vmx_wbinvd_exit(void) +{ + return vmcs_config.cpu_based_2nd_exec_ctrl + SECONDARY_EXEC_WBINVD_EXITING; +} + static inline bool report_flexpriority(void) { return flexpriority_enabled; @@ -874,6 +883,11 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx) preempt_enable(); } +static void wbinvd_ipi(void *opaque) +{ + wbinvd(); +} + /* * Switches to specified vcpu, until a matching vcpu_put(), but assumes * vcpu mutex is already taken. @@ -905,6 +919,15 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu,
[ kvm-Bugs-1949429 ] Windows XP 2003 - 64-bit Editions may FAIL during setup
Bugs item #1949429, was opened at 2008-04-23 09:40
Message generated for change (Comment added) made by jessorensen
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1949429&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: Windows XP 2003 - 64-bit Editions may FAIL during setup

Initial Comment:
Windows XP 2003 - 64-bit Editions may FAIL during setup. The guest OS gets
stuck during the second-stage setup (graphical stage) and proceeds nowhere.
I must kill the VM manually and restart setup from scratch.

Reproducible: Sometimes.

It applies to all KVM-60 series (from KVM-60 up to KVM-67) on Intel. Other
KVM versions below and above may be affected as well. I do not have any
debug output, because it is hard to reproduce.

-Alexey Technologov, 23.04.2008.

--

Comment By: Jes Sorensen (jessorensen)
Date: 2010-06-25 15:34

Message:
Hi,

Are you still seeing this, or can we close the bug? I just ran a 2003x64
install test here and encountered no problems, but your report states it
only happens sometimes?

Thanks,
Jes

--

Comment By: Technologov (technologov)
Date: 2008-08-03 10:38

Message:
Logged In: YES
user_id=1839746
Originator: YES

Still happens with KVM-71.

-Alexey, 3.8.2008.

--

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1949429&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2034672 ] guest: BUG: soft lockup - CPU#0 stuck for 41s!
Bugs item #2034672, was opened at 2008-08-01 08:22 Message generated for change (Comment added) made by jessorensen You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2034672group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Rafal Wijata (ravpl) Assigned to: Nobody/Anonymous (nobody) Summary: guest: BUG: soft lockup - CPU#0 stuck for 41s! Initial Comment: host: kvm71, 64bit 2.6.25.11-60.fc8, 8Gram, 2*E5420(8cores), 3ware raid10 guest: 64bit 2.6.18-92.1.6.el5, 5Gram, 6cpus, hdd on raw file. I know this bug happens even in non-virtual machines(browsing internet shows that clearly), but inside kvm I'm getting excessive rate of this bug (under load, even few times a hour) An example can be found at end of this message. The record was something over 500 seconds !! Now, I suspect it has something to do with the network or net driver. There's almost always either swapper or network service in the backtrace. But I cannot confirm for surely. BUG: soft lockup - CPU#0 stuck for 41s! 
[events/0:20] CPU 0: Modules linked in: nfsd exportfs auth_rpcgss ipv6 xfrm_nalgo crypto_api autofs4 nfs lockd fscache nfs_acl sunrpc dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac lp floppy loop ide_cd parport_pc i2c_piix4 serio_raw parport cdrom i2c_core e1000 pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd Pid: 20, comm: events/0 Not tainted 2.6.18-92.1.6.el5 #1 RIP: 0010:[80011ec7] [80011ec7] __do_softirq+0x53/0xd6 RSP: 0018:80418f60 EFLAGS: 0206 RAX: 0002 RBX: 803b6f80 RCX: 0380 RDX: 81015f9e7fd8 RSI: 0280 RDI: 81015f9d97a0 RBP: 80418ee0 R08: 0001 R09: 810080bf5000 R10: 0046 R11: 0246 R12: 8005dc8e R13: 0002 R14: 80077090 R15: 80418ee0 FS: () GS:8039f000() knlGS: CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b CR2: 00300c203080 CR3: 00015df0a000 CR4: 06e0 Call Trace: IRQ [8005e2fc] call_softirq+0x1c/0x28 [8006c6e4] do_softirq+0x2c/0x85 [8005dc8e] apic_timer_interrupt+0x66/0x6c EOI [80064af8] _spin_unlock_irqrestore+0x8/0x9 [880fdc61] :e1000:e1000_update_stats+0x5f6/0x5fd [88101ed5] :e1000:e1000_watchdog_task+0x535/0x65a [8004cea9] run_workqueue+0x94/0xe4 [800497be] worker_thread+0x0/0x122 [800498ae] worker_thread+0xf0/0x122 [8008ad76] default_wake_function+0x0/0xe [8003253d] kthread+0xfe/0x132 [8005dfb1] child_rip+0xa/0x11 [8003243f] kthread+0x0/0x132 [8005dfa7] child_rip+0x0/0x11 BUG: soft lockup - CPU#2 stuck for 17s! 
[swapper:0] CPU 2: Modules linked in: nfsd exportfs auth_rpcgss ipv6 xfrm_nalgo crypto_api autofs4 nfs lockd fscache nfs_acl sunrpc dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac lp floppy loop ide_cd parport_pc i2c_piix4 serio_raw parport cdrom i2c_core e1000 pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd Pid: 0, comm: swapper Not tainted 2.6.18-92.1.6.el5 #1 RIP: 0010:[8006aed7] [8006aed7] default_idle+0x29/0x50 RSP: 0018:810104e63ef0 EFLAGS: 0246 RAX: RBX: 0002 RCX: RDX: RSI: 0001 RDI: 802e6658 RBP: 810104e1d270 R08: 810104e62000 R09: 003e R10: 810104f64038 R11: R12: 0c51b3f5 R13: 3434e623bb62 R14: 81015f9db7e0 R15: 810104e1d080 FS: () GS:810104e1cec0() knlGS: CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b CR2: 2b3a3b647230 CR3: 00201000 CR4: 06e0 Call Trace: [80048b1d] cpu_idle+0x95/0xb8 [800767da] start_secondary+0x45a/0x469 -- Comment By: Jes Sorensen (jessorensen) Date: 2010-06-25 15:44 Message: Hi, Looking through old bugs. Do you still see this problem or can we close the bug? I believe a lot of these problems have been fixed in more recent KVM, but if you could let us know that would be great. Thanks, Jes -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2034672group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-1817779 ] KVM crash with Windows XP guest because of ACPI
Bugs item #1817779, was opened at 2007-10-22 13:02 Message generated for change (Settings changed) made by jessorensen You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=1817779group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Closed Resolution: Fixed Priority: 5 Private: No Submitted By: Technologov (technologov) Assigned to: Nobody/Anonymous (nobody) Summary: KVM crash with Windows XP guest because of ACPI Initial Comment: Host: Fedora7, 64-bit, Intel CPU, KVM-48. When I start Windows XP guest, that was installed with ACPI enabled, without ACPI in KVM, KVM crashes. The command is: [alex...@pink-intel ~]$ ./qemu-kvm -hda /isos/disks-vm/alexeye/WindowsXP-Pro.vmdk -m 512 -no-acpi With -no-kvm it stucks, but not crashes. The same crash happens with -no-acpi -no-kvm-irqchip parameters. -Technologov -- Comment By: Jes Sorensen (jessorensen) Date: 2010-06-25 15:55 Message: With recent KVM kernel 2.6.32 and qemu-kvm 0.12.1, an XP guest installed with ACPI no longer takes down qemu-kvm when booted with the -no-acpi flag. As expected Windows refuses to boot and offers safe mode and then bails, since too many system parameters is changed, but qemu-kvm survives it fine. Closing Jes -- Comment By: Jes Sorensen (jessorensen) Date: 2010-06-11 11:12 Message: Hi, Looking through old bugs - please let us know if this still happens with recent QEMU/KVM. If not, lets close this bug. Thanks, Jes -- Comment By: argoo (argoo) Date: 2007-10-26 19:41 Message: Logged In: YES user_id=865799 Originator: NO I recommend following this workaround... http://kvm.qumranet.com/kvmwiki/Windows_ACPI_Workaround -- Comment By: Technologov (technologov) Date: 2007-10-22 15:01 Message: Logged In: YES user_id=1839746 Originator: YES Attached stack with unhandled vm exit. 
-- Comment By: Technologov (technologov) Date: 2007-10-22 13:18 Message: Logged In: YES user_id=1839746 Originator: YES Attached stack with unhandled vm exit. -- Comment By: Technologov (technologov) Date: 2007-10-22 13:03 Message: Logged In: YES user_id=1839746 Originator: YES File Added: KVM48-VMX64-WindowsXP-no-acpi.txt -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=1817779group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] virtio: Support releasing lock during kick
On Fri, Jun 25, 2010 at 01:43:17PM +0300, Michael S. Tsirkin wrote:
On Fri, Jun 25, 2010 at 12:39:21PM +0930, Rusty Russell wrote:
On Thu, 24 Jun 2010 03:00:30 pm Stefan Hajnoczi wrote:
On Wed, Jun 23, 2010 at 11:12 PM, Anthony Liguori anth...@codemonkey.ws wrote:

Shouldn't it be possible to just drop the lock before invoking
virtqueue_kick() and reacquire it afterwards? There's nothing in that
virtqueue_kick() path that the lock is protecting AFAICT.

No, that would lead to a race condition because vq->num_added is modified
by both virtqueue_add_buf_gfp() and virtqueue_kick(). Without a lock held
during virtqueue_kick() another vcpu could add bufs while vq->num_added is
used and cleared by virtqueue_kick():

Right, this dovetails with another proposed change (was it Michael?) where
we would update the avail idx inside add_buf, rather than waiting until
kick. This means a barrier inside add_buf, but that's probably fine. If we
do that, then we don't need a lock on virtqueue_kick. Michael, thoughts?

Maybe not even that: I think we could just do virtio_wmb() in add, and keep
the mb() in kick. What I'm a bit worried about is contention on the
cacheline including index and flags: the more we write to that line, the
worse it gets. So need to test performance impact of this change: I didn't
find time to do this yet, as I am trying to finalize the used index
publishing patches. Any takers?

Do we see performance improvement after making kick lockless?

There was no guest CPU reduction or I/O throughput increase with my patch
when running 4 dd iflag=direct bs=4k if=/dev/vdb of=/dev/null processes.
However, the lock_stat numbers above show clear improvement of the lock
hold/wait times. I was hoping to see guest CPU utilization go down and I/O
throughput go up, so there is still investigation to do with my patch in
isolation.
Although I'd like to try it later, putting my patch on top of your avail idx work is too early because it will be harder to reason about the performance with both patches present at the same time. Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] virtio: Support releasing lock during kick
On Fri, Jun 25, 2010 at 04:31:44PM +0100, Stefan Hajnoczi wrote:
On Fri, Jun 25, 2010 at 01:43:17PM +0300, Michael S. Tsirkin wrote:
On Fri, Jun 25, 2010 at 12:39:21PM +0930, Rusty Russell wrote:
On Thu, 24 Jun 2010 03:00:30 pm Stefan Hajnoczi wrote:
On Wed, Jun 23, 2010 at 11:12 PM, Anthony Liguori anth...@codemonkey.ws wrote:

Shouldn't it be possible to just drop the lock before invoking
virtqueue_kick() and reacquire it afterwards? There's nothing in that
virtqueue_kick() path that the lock is protecting AFAICT.

No, that would lead to a race condition because vq->num_added is modified
by both virtqueue_add_buf_gfp() and virtqueue_kick(). Without a lock held
during virtqueue_kick() another vcpu could add bufs while vq->num_added is
used and cleared by virtqueue_kick():

Right, this dovetails with another proposed change (was it Michael?) where
we would update the avail idx inside add_buf, rather than waiting until
kick. This means a barrier inside add_buf, but that's probably fine. If we
do that, then we don't need a lock on virtqueue_kick. Michael, thoughts?

Maybe not even that: I think we could just do virtio_wmb() in add, and keep
the mb() in kick. What I'm a bit worried about is contention on the
cacheline including index and flags: the more we write to that line, the
worse it gets. So need to test performance impact of this change: I didn't
find time to do this yet, as I am trying to finalize the used index
publishing patches. Any takers?

Do we see performance improvement after making kick lockless?

There was no guest CPU reduction or I/O throughput increase with my patch
when running 4 dd iflag=direct bs=4k if=/dev/vdb of=/dev/null processes.
However, the lock_stat numbers above show clear improvement of the lock
hold/wait times. I was hoping to see guest CPU utilization go down and I/O
throughput go up, so there is still investigation to do with my patch in
isolation.
Although I'd like to try it later, putting my patch on top of your avail idx work is too early because it will be harder to reason about the performance with both patches present at the same time. Stefan What about host CPU utilization? Also, are you using PARAVIRT_SPINLOCKS? -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] virtio: Support releasing lock during kick
On Fri, Jun 25, 2010 at 06:32:20PM +0300, Michael S. Tsirkin wrote:
On Fri, Jun 25, 2010 at 04:31:44PM +0100, Stefan Hajnoczi wrote:
On Fri, Jun 25, 2010 at 01:43:17PM +0300, Michael S. Tsirkin wrote:
On Fri, Jun 25, 2010 at 12:39:21PM +0930, Rusty Russell wrote:
On Thu, 24 Jun 2010 03:00:30 pm Stefan Hajnoczi wrote:
On Wed, Jun 23, 2010 at 11:12 PM, Anthony Liguori anth...@codemonkey.ws wrote:

Shouldn't it be possible to just drop the lock before invoking
virtqueue_kick() and reacquire it afterwards? There's nothing in that
virtqueue_kick() path that the lock is protecting AFAICT.

No, that would lead to a race condition because vq->num_added is modified
by both virtqueue_add_buf_gfp() and virtqueue_kick(). Without a lock held
during virtqueue_kick() another vcpu could add bufs while vq->num_added is
used and cleared by virtqueue_kick():

Right, this dovetails with another proposed change (was it Michael?) where
we would update the avail idx inside add_buf, rather than waiting until
kick. This means a barrier inside add_buf, but that's probably fine. If we
do that, then we don't need a lock on virtqueue_kick. Michael, thoughts?

Maybe not even that: I think we could just do virtio_wmb() in add, and keep
the mb() in kick. What I'm a bit worried about is contention on the
cacheline including index and flags: the more we write to that line, the
worse it gets. So need to test performance impact of this change: I didn't
find time to do this yet, as I am trying to finalize the used index
publishing patches. Any takers?

Do we see performance improvement after making kick lockless?

There was no guest CPU reduction or I/O throughput increase with my patch
when running 4 dd iflag=direct bs=4k if=/dev/vdb of=/dev/null processes.
However, the lock_stat numbers above show clear improvement of the lock
hold/wait times. I was hoping to see guest CPU utilization go down and I/O
throughput go up, so there is still investigation to do with my patch in
isolation.
Although I'd like to try it later, putting my patch on top of your avail idx work is too early because it will be harder to reason about the performance with both patches present at the same time. Stefan What about host CPU utilization? There is data available for host CPU utilization, I need to dig it up. Also, are you using PARAVIRT_SPINLOCKS? No. I haven't found much documentation on paravirt spinlocks other than the commit that introduced them: commit 8efcbab674de2bee45a2e4cdf97de16b8e609ac8 Author: Jeremy Fitzhardinge jer...@goop.org Date: Mon Jul 7 12:07:51 2008 -0700 paravirt: introduce a lock-byte spinlock implementation PARAVIRT_SPINLOCKS is not set in the config I use, probably because of the associated performance issue that causes distros to build without them: commit b4ecc126991b30fe5f9a59dfacda046aeac124b2 Author: Jeremy Fitzhardinge jer...@goop.org Date: Wed May 13 17:16:55 2009 -0700 x86: Fix performance regression caused by paravirt_ops on native kernels I would expect performance results to be smoother with PARAVIRT_SPINLOCKS for the guest kernel. I will add it for future runs, thanks for pointing it out. Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2063072 ] compiling problem with tcg_ctx
Bugs item #2063072, was opened at 2008-08-20 23:29 Message generated for change (Comment added) made by jessorensen You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2063072group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: qemu Group: None Status: Closed Resolution: Works For Me Priority: 5 Private: No Submitted By: Jana Delego (janado) Assigned to: Anthony Liguori (aliguori) Summary: compiling problem with tcg_ctx Initial Comment: When compiling kvm using the --disable-cpu-emulation flag on a 64 bit Intel Ubuntu, the compiler aborts with error undefined reference to tcg_ctx, This problem exists since kvm-70. -- Comment By: Jes Sorensen (jessorensen) Date: 2010-06-25 18:10 Message: upstream qemu-kvm builds and boots fine with --disable-cpu-emulation now. Closing Jes -- Comment By: Avi Kivity (avik) Date: 2008-10-02 16:05 Message: Well, it would be nice to support --disable-cpu-emulation, for example if you're worried about tcg security holes or tcg performance. -- Comment By: Anthony Liguori (aliguori) Date: 2008-09-29 15:56 Message: --disable-cpu-emulation should not be used with x86. It only exists as an ugly hack because ia64 doesn't support TCG. -- Comment By: Shen Okinudo (okinu) Date: 2008-09-29 03:37 Message: This bug persists in kvm-76 -- Comment By: Marshal Newrock (freedombi) Date: 2008-09-02 01:40 Message: Logged In: YES user_id=2201280 Originator: NO This seems to work with kvm-74. The patch allowed compilation, and the guest appears to be running well. -- Comment By: Amit Shah (amitshah) Date: 2008-08-29 11:59 Message: Logged In: YES user_id=201894 Originator: NO I'm not sure if this will make qemu work properly, but it fixes the build (also attached). Can you confirm if this works? 
commit 244cafe6688940c25c81b31aa223c9e24656806e
Author: Amit Shah amit.s...@qumranet.com
Date:   Fri Aug 29 15:20:14 2008 +0530

    KVM: QEMU: Fix userspace build with --disable-cpu-emulation

    I'm not sure this will work properly, but fixes the build.
    ppc might need something like this as well

    Signed-off-by: Amit Shah amit.s...@qumranet.com

diff --git a/qemu/target-i386/fake-exec.c b/qemu/target-i386/fake-exec.c
index 737286d..552089b 100644
--- a/qemu/target-i386/fake-exec.c
+++ b/qemu/target-i386/fake-exec.c
@@ -12,6 +12,13 @@
  */
 #include "exec.h"
 #include "cpu.h"
+#include "tcg.h"
+
+/* code generation context */
+TCGContext tcg_ctx;
+
+uint16_t gen_opc_buf[OPC_BUF_SIZE];
+TCGArg gen_opparam_buf[OPPARAM_BUF_SIZE];
 
 int code_copy_enabled = 0;
 
@@ -45,10 +52,6 @@ int cpu_x86_gen_code(CPUState *env, TranslationBlock *tb, int *gen_code_size_ptr
     return 0;
 }
 
-void flush_icache_range(unsigned long start, unsigned long stop)
-{
-}
-
 void optimize_flags_init(void)
 {
 }

File Added: 0001-KVM-QEMU-Fix-userspace-build-with-disable-cpu-em.patch

--

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2063072&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-1949429 ] Windows XP 2003 - 64-bit Editions may FAIL during setup
Bugs item #1949429, was opened at 2008-04-23 10:40
Message generated for change (Comment added) made by technologov
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1949429&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: None
Group: None
Status: Closed
Resolution: Works For Me
Priority: 7
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: Windows XP 2003 - 64-bit Editions may FAIL during setup

Initial Comment:
Windows XP 2003 - 64-bit Editions may FAIL during setup. The guest OS gets
stuck during the second-stage setup (graphical stage) and proceeds nowhere.
I must kill the VM manually and restart setup from scratch.

Reproducible: Sometimes.

It applies to all KVM-60 series (from KVM-60 up to KVM-67) on Intel. Other
KVM versions below and above may be affected as well. I do not have any
debug output, because it is hard to reproduce.

-Alexey Technologov, 23.04.2008.

--

Comment By: Technologov (technologov)
Date: 2010-06-25 19:27

Message:
Nope, I can't reproduce this anymore. Running on RHEL-5.5 (and its default
KVM shipped with the distro).

--

Comment By: Jes Sorensen (jessorensen)
Date: 2010-06-25 16:34

Message:
Hi,

Are you still seeing this, or can we close the bug? I just ran a 2003x64
install test here and encountered no problems, but your report states it
only happens sometimes?

Thanks,
Jes

--

Comment By: Technologov (technologov)
Date: 2008-08-03 11:38

Message:
Logged In: YES
user_id=1839746
Originator: YES

Still happens with KVM-71.

-Alexey, 3.8.2008.

--

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1949429&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-1900228 ] Time on guest slows down sometimes...
Bugs item #1900228, was opened at 2008-02-23 10:26 Message generated for change (Comment added) made by glommer You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=1900228group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: amd Group: None Status: Open Resolution: None Priority: 6 Private: No Submitted By: stevie1024 (stevie1024) Assigned to: Nobody/Anonymous (nobody) Summary: Time on guest slows down sometimes... Initial Comment: I run kvm version 60 on an linux 2.6.24 kernel on an AMD 64 processor (see below for details). I installed 2 guest machines, one linux and one windows (XP) and started them both. The clocks on both guests are sometimes slowed down. If e.g. I play some youtube clips on the Windows guest, the clock of the Windows guest starts lagging about 15% (time is about 15% slower than host, or real, time). If I cause some load on the linux guest, e.g. with 'tar jcvf test /usr', the clock of this guest also runs about 5% slower. And there's a message in the client syslog saying 'warning: many lost ticks, Your time source seems to be instable or some driver is hogging interupts'. Running e.g. 'tar jcvf test /usr' on the host doesn't have any influence on the guest clocks. I think this is quite a serious bug, as 'ntp' can't compensate for this instability. If I understand correctly ntp can only compensate for clocks that are too slow or too fast by a reasonably fixed (low) percentage. 
/proc/cpuinfo:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 95
model name	: AMD Athlon(tm) Processor LE-1600
stepping	: 3
cpu MHz		: 2204.998
cache size	: 1024 KB
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow up rep_good pni cx16 lahf_lm svm extapic cr8_legacy
bogomips	: 4413.24
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

uname -a:
Linux bigbird 2.6.24-1-amd64 #1 SMP Mon Feb 11 13:47:43 UTC 2008 x86_64 GNU/Linux

clients were started with:

kvm -m 1024 -usb -hda wxp_test.img -net nic,vlan=0 -net user,vlan=0,hostname=wxp-test -boot c -vnc :0

and

kvm -m 512 -usb -hda debian_server.img -net nic,vlan=0 -net user,vlan=0,hostname=debian-server -vnc :1

--

Comment By: Glauber de Oliveira Costa (glommer)
Date: 2010-06-25 17:01

Message:
Especially for Windows guests, the option -rtc-td-hack may be helpful. For
Linux guests, if you can run a new enough guest, using a guest with
kvmclock can keep the clock in sync.

--

Comment By: stevie1024 (stevie1024)
Date: 2008-02-23 10:32

Message:
Logged In: YES
user_id=2017347
Originator: YES

I forgot to mention: I think bug 1826080 is related to this, but I think my
description is more general.

--

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1900228&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/4] KVM: cleanup: remove kvm_get_dirty_log()
On 23.06.2010, at 08:01, Takuya Yoshikawa wrote: kvm_get_dirty_log() is a helper function for kvm_vm_ioctl_get_dirty_log() which is currently used by ia64 and ppc and the following is what it is doing: - sanity checks - bitmap scan to check if the slot is dirty - copy_to_user() Considering the fact that x86 is not using this anymore and sanity checks must be done before kvm_ia64_sync_dirty_log(), we can say that this is not working for code sharing effectively. So we just remove this. This patch plus 4/4 broke dirty bitmap updating on PPC. I didn't get around to track down why, but I figured you should now. Is there any way to get you a PPC development box? A simple G4 or G5 should be 200$ on ebay by now :). Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM in remote server with bridge in a single ethernet interface
Hi,

I have only one ethernet port in a remote server (eth0). I have a public
address x.x.x.164 netmask 255.255.255.240 gw x.x.x.161 and want to use the
next available IP address (x.x.x.165 netmask 255.255.255.240 gw x.x.x.161)
in my guest OS. Is it possible to achieve this with brctl?

I created a file called ifcfg-br0 with:

DEVICE=br0
TYPE=Bridge
BOOTPROTO=none
BROADCAST=x.x.x.175
HWADDR=xx:xx:xx:xx:xx:xx
IPADDR=x.x.x.164
NETMASK=255.255.255.240
NETWORK=x.x.x.160
ONBOOT=yes
GATEWAY=x.x.x.161

then replaced ifcfg-eth0 with:

DEVICE=eth0
BRIDGE=br0
#BOOTPROTO=none
ONBOOT=yes

then rebooted. After that I was still connected to my remote server, but
problems began when I assigned the x.x.x.165 IP address to the guest OS
with virt-manager to begin installation: I lost the remote connection.
Maybe I missed something to avoid losing the connection? I'm still
receiving pings from x.x.x.165, but x.x.x.164 is not responding.

Here is my config:

Distro: Centos 5.5 x64
Linux v2.noc.com.mx 2.6.18-194.3.1.el5 #1 SMP Thu May 13 13:08:30 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

kvm-83-164.el5_5.9
kvm-qemu-img-83-164.el5_5.9
kmod-kvm-83-164.el5_5.9
etherboot-zroms-kvm-5.4.4-13.el5.centos

eth0      Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx
          inet addr:x.x.x.164  Bcast:x.x.x.175  Mask:255.255.255.240
          inet6 addr: fe80::225:90ff:fe04:7874/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7140395 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2491842 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:9632227753 (8.9 GiB)  TX bytes:226154906 (215.6 MiB)
          Memory:fb5e-fb60

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:109391 errors:0 dropped:0 overruns:0 frame:0
          TX packets:109391 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:46719292 (44.5 MiB)  TX bytes:46719292 (44.5 MiB)

virbr0    Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:357 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:110402 (107.8 KiB)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
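For reference, the usual single-NIC bridge layout keeps the host IP on br0 only (eth0 carries no address), and disabling the bridge forwarding delay avoids the period after bridge creation during which traffic can be black-holed. The sketch below is a generic, untested recipe, not a verified fix for this exact host: the device names and addresses mirror the report, and the commands assume bridge-utils and iproute are available.

```shell
# Equivalent manual steps (run from a console that does not depend on
# eth0, e.g. serial/IPMI, since connectivity drops while reconfiguring):

brctl addbr br0
brctl setfd br0 0                  # skip the forwarding delay
brctl addif br0 eth0

ip addr flush dev eth0             # eth0 must NOT keep the host IP
ip addr add x.x.x.164/28 dev br0   # /28 == netmask 255.255.255.240
ip link set eth0 up
ip link set br0 up
ip route add default via x.x.x.161
```

With this layout the guest's tap device is simply added to br0 and can use x.x.x.165 directly; if the host still stops responding on x.x.x.164, a common culprit is the upstream switch caching the old MAC, which is why pinning the bridge MAC to eth0's (as the HWADDR line in ifcfg-br0 attempts) is often recommended.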
[PATCH] KVM: PPC: Book3S_32 MMU debug compile fixes
Due to previous changes, the Book3S_32 guest MMU code didn't compile
properly when enabling debugging. This patch repairs the broken code
paths, making it possible to define DEBUG_MMU and friends again.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_32_mmu.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c
index 3292d76..079760b 100644
--- a/arch/powerpc/kvm/book3s_32_mmu.c
+++ b/arch/powerpc/kvm/book3s_32_mmu.c
@@ -104,7 +104,7 @@ static hva_t kvmppc_mmu_book3s_32_get_pteg(struct kvmppc_vcpu_book3s *vcpu_book3
 	pteg = (vcpu_book3s->sdr1 & 0xffff0000) | hash;
 
 	dprintk("MMU: pc=0x%lx eaddr=0x%lx sdr1=0x%llx pteg=0x%x vsid=0x%x\n",
-		vcpu_book3s->vcpu.arch.pc, eaddr, vcpu_book3s->sdr1, pteg,
+		kvmppc_get_pc(&vcpu_book3s->vcpu), eaddr, vcpu_book3s->sdr1, pteg,
 		sre->vsid);
 
 	r = gfn_to_hva(vcpu_book3s->vcpu.kvm, pteg >> PAGE_SHIFT);
@@ -269,7 +269,7 @@ no_page_found:
 		dprintk_pte("KVM MMU: No PTE found (sdr1=0x%llx ptegp=0x%lx)\n",
 			    to_book3s(vcpu)->sdr1, ptegp);
 		for (i=0; i<16; i+=2) {
-			dprintk_pte("   %02d: 0x%x - 0x%x (0x%llx)\n",
+			dprintk_pte("   %02d: 0x%x - 0x%x (0x%x)\n",
 				    i, pteg[i], pteg[i+1], ptem);
 		}
 	}
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[PATCH] Faster MMU lookups for Book3s
Book3s suffered from my really bad shadow MMU implementation so far. So I
finally got around to implementing a combined hash and list mechanism that
allows for much faster lookup of mapped pages.

To show that it really is faster, I tried to run simple process spawning
code inside the guest with and without these patches:

[without]

debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done

real    0m20.235s
user    0m10.418s
sys     0m9.766s

[with]

debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done

real    0m14.659s
user    0m8.967s
sys     0m5.688s

So as you can see, performance improved significantly.

Alexander Graf (2):
  KVM: PPC: Add generic hpte management functions
  KVM: PPC: Make use of hash based Shadow MMU

 arch/powerpc/include/asm/kvm_book3s.h |    7 +
 arch/powerpc/include/asm/kvm_host.h   |   18 ++-
 arch/powerpc/kvm/Makefile             |    2 +
 arch/powerpc/kvm/book3s_32_mmu_host.c |  104 ++---
 arch/powerpc/kvm/book3s_64_mmu_host.c |   98 +---
 arch/powerpc/kvm/book3s_mmu_hpte.c    |  286 +++++
 6 files changed, 327 insertions(+), 188 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_mmu_hpte.c

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: PPC: Make use of hash based Shadow MMU
We just introduced generic functions to handle shadow pages on PPC. This patch makes the respective backends make use of them, getting rid of a lot of duplicate code along the way. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_book3s.h |7 ++ arch/powerpc/include/asm/kvm_host.h | 18 +- arch/powerpc/kvm/Makefile |2 + arch/powerpc/kvm/book3s_32_mmu_host.c | 104 +++- arch/powerpc/kvm/book3s_64_mmu_host.c | 98 ++ 5 files changed, 41 insertions(+), 188 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 4e99559..a96e405 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -115,6 +115,13 @@ extern void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu); extern int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte); extern int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr); extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu); + +extern void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte); +extern struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu); +extern void kvmppc_mmu_hpte_destroy(struct kvm_vcpu *vcpu); +extern int kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu); +extern void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte); + extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data); extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data); extern void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec); diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 0c9ad86..895eb63 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -38,7 +38,13 @@ #define KVM_NR_PAGE_SIZES 1 #define KVM_PAGES_PER_HPAGE(x) (1UL31) -#define HPTEG_CACHE_NUM 1024 +#define HPTEG_CACHE_NUM(1 15) +#define 
HPTEG_HASH_BITS_PTE13 +#define HPTEG_HASH_BITS_VPTE 13 +#define HPTEG_HASH_BITS_VPTE_LONG 5 +#define HPTEG_HASH_NUM_PTE (1 HPTEG_HASH_BITS_PTE) +#define HPTEG_HASH_NUM_VPTE(1 HPTEG_HASH_BITS_VPTE) +#define HPTEG_HASH_NUM_VPTE_LONG (1 HPTEG_HASH_BITS_VPTE_LONG) struct kvm; struct kvm_run; @@ -151,6 +157,9 @@ struct kvmppc_mmu { }; struct hpte_cache { + struct list_head list_pte; + struct list_head list_vpte; + struct list_head list_vpte_long; u64 host_va; u64 pfn; ulong slot; @@ -282,8 +291,11 @@ struct kvm_vcpu_arch { unsigned long pending_exceptions; #ifdef CONFIG_PPC_BOOK3S - struct hpte_cache hpte_cache[HPTEG_CACHE_NUM]; - int hpte_cache_offset; + struct kmem_cache *hpte_cache; + struct list_head hpte_hash_pte[HPTEG_HASH_NUM_PTE]; + struct list_head hpte_hash_vpte[HPTEG_HASH_NUM_VPTE]; + struct list_head hpte_hash_vpte_long[HPTEG_HASH_NUM_VPTE_LONG]; + int hpte_cache_count; #endif }; diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile index ff43606..d45c818 100644 --- a/arch/powerpc/kvm/Makefile +++ b/arch/powerpc/kvm/Makefile @@ -45,6 +45,7 @@ kvm-book3s_64-objs := \ book3s.o \ book3s_emulate.o \ book3s_interrupts.o \ + book3s_mmu_hpte.o \ book3s_64_mmu_host.o \ book3s_64_mmu.o \ book3s_32_mmu.o @@ -57,6 +58,7 @@ kvm-book3s_32-objs := \ book3s.o \ book3s_emulate.o \ book3s_interrupts.o \ + book3s_mmu_hpte.o \ book3s_32_mmu_host.o \ book3s_32_mmu.o kvm-objs-$(CONFIG_KVM_BOOK3S_32) := $(kvm-book3s_32-objs) diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c index 904f5ac..0b51ef8 100644 --- a/arch/powerpc/kvm/book3s_32_mmu_host.c +++ b/arch/powerpc/kvm/book3s_32_mmu_host.c @@ -58,105 +58,19 @@ static ulong htab; static u32 htabmask; -static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte) +void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte) { volatile u32 *pteg; - dprintk_mmu(KVM: Flushing SPTE: 0x%llx (0x%llx) - 0x%llx\n, - pte-pte.eaddr, pte-pte.vpage, 
pte-host_va); - + /* Remove from host HTAB */ pteg = (u32*)pte-slot; - pteg[0] = 0; + + /* And make sure it's gone from the TLB too */ asm volatile (sync); asm volatile (tlbie %0 : : r (pte-pte.eaddr) : memory); asm volatile (sync); asm volatile (tlbsync); - - pte-host_va = 0; - - if (pte-pte.may_write) - kvm_release_pfn_dirty(pte-pfn); - else - kvm_release_pfn_clean(pte-pfn); -} - -void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong guest_ea, ulong
[PATCH] KVM: PPC: Add generic hpte management functions
Currently the shadow paging code keeps an array of entries it knows about. Whenever the guest invalidates an entry, we loop through that entry, trying to invalidate matching parts. While this is a really simple implementation, it is probably the most ineffective one possible. So instead, let's keep an array of lists around that are indexed by a hash. This way each PTE can be added by 4 list_add, removed by 4 list_del invocations and the search only needs to loop through entries that share the same hash. This patch implements said lookup and exports generic functions that both the 32-bit and 64-bit backend can use. Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - remove hpte_all list - lookup all using vpte_long lists - decrease size of vpte_long hash - fix missing brackets --- arch/powerpc/kvm/book3s_mmu_hpte.c | 286 1 files changed, 286 insertions(+), 0 deletions(-) create mode 100644 arch/powerpc/kvm/book3s_mmu_hpte.c diff --git a/arch/powerpc/kvm/book3s_mmu_hpte.c b/arch/powerpc/kvm/book3s_mmu_hpte.c new file mode 100644 index 000..5826e61 --- /dev/null +++ b/arch/powerpc/kvm/book3s_mmu_hpte.c @@ -0,0 +1,286 @@ +/* + * Copyright (C) 2010 SUSE Linux Products GmbH. All rights reserved. + * + * Authors: + * Alexander Graf ag...@suse.de + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. 
+ */ + +#include linux/kvm_host.h +#include linux/hash.h +#include linux/slab.h + +#include asm/kvm_ppc.h +#include asm/kvm_book3s.h +#include asm/machdep.h +#include asm/mmu_context.h +#include asm/hw_irq.h + +#define PTE_SIZE 12 + +/* #define DEBUG_MMU */ + +#ifdef DEBUG_MMU +#define dprintk_mmu(a, ...) printk(KERN_INFO a, __VA_ARGS__) +#else +#define dprintk_mmu(a, ...) do { } while(0) +#endif + +static inline u64 kvmppc_mmu_hash_pte(u64 eaddr) { + return hash_64(eaddr PTE_SIZE, HPTEG_HASH_BITS_PTE); +} + +static inline u64 kvmppc_mmu_hash_vpte(u64 vpage) { + return hash_64(vpage 0xfULL, HPTEG_HASH_BITS_VPTE); +} + +static inline u64 kvmppc_mmu_hash_vpte_long(u64 vpage) { + return hash_64((vpage 0xff000ULL) 12, + HPTEG_HASH_BITS_VPTE_LONG); +} + +void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte) +{ + u64 index; + + /* Add to ePTE list */ + index = kvmppc_mmu_hash_pte(pte-pte.eaddr); + list_add(pte-list_pte, vcpu-arch.hpte_hash_pte[index]); + + /* Add to vPTE list */ + index = kvmppc_mmu_hash_vpte(pte-pte.vpage); + list_add(pte-list_vpte, vcpu-arch.hpte_hash_vpte[index]); + + /* Add to vPTE_long list */ + index = kvmppc_mmu_hash_vpte_long(pte-pte.vpage); + list_add(pte-list_vpte_long, vcpu-arch.hpte_hash_vpte_long[index]); +} + +static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte) +{ + dprintk_mmu(KVM: Flushing SPT: 0x%lx (0x%llx) - 0x%llx\n, + pte-pte.eaddr, pte-pte.vpage, pte-host_va); + + /* Different for 32 and 64 bit */ + kvmppc_mmu_invalidate_pte(vcpu, pte); + + if (pte-pte.may_write) + kvm_release_pfn_dirty(pte-pfn); + else + kvm_release_pfn_clean(pte-pfn); + + list_del(pte-list_pte); + list_del(pte-list_vpte); + list_del(pte-list_vpte_long); + + kmem_cache_free(vcpu-arch.hpte_cache, pte); +} + +static void kvmppc_mmu_pte_flush_all(struct kvm_vcpu *vcpu) +{ + struct hpte_cache *pte, *tmp; + int i; + + for (i = 0; i HPTEG_HASH_NUM_VPTE_LONG; i++) { + struct list_head *list = vcpu-arch.hpte_hash_vpte_long[i]; 
+ + list_for_each_entry_safe(pte, tmp, list, list_vpte_long) { + /* Jump over the helper entry */ + if (pte-list_vpte_long == list) + continue; + + invalidate_pte(vcpu, pte); + } + } +} + +void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong guest_ea, ulong ea_mask) +{ + u64 i; + + dprintk_mmu(KVM: Flushing %d Shadow PTEs: 0x%lx 0x%lx\n, + vcpu-arch.hpte_cache_count, guest_ea, ea_mask); + + guest_ea = ea_mask; + + switch (ea_mask) { + case ~0xfffUL: + { + struct list_head *list; + struct hpte_cache *pte, *tmp; + +
Re: [PATCH] KVM: PPC: Add generic hpte management functions
On 26.06.2010, at 01:16, Alexander Graf wrote: Currently the shadow paging code keeps an array of entries it knows about. Whenever the guest invalidates an entry, we loop through that entry, trying to invalidate matching parts. While this is a really simple implementation, it is probably the most ineffective one possible. So instead, let's keep an array of lists around that are indexed by a hash. This way each PTE can be added by 4 list_add, removed by 4 list_del invocations and the search only needs to loop through entries that share the same hash. This patch implements said lookup and exports generic functions that both the 32-bit and 64-bit backend can use. Yikes - I forgot -n. This is patch 1/2. Alex
[PATCH 01/26] KVM: PPC: Introduce shared page
For transparent variable sharing between the hypervisor and guest, I introduce a shared page. This shared page will contain all the registers the guest can read and write safely without exiting guest context. This patch only implements the stubs required for the basic structure of the shared page. The actual register moving follows. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |2 ++ arch/powerpc/include/asm/kvm_para.h |5 + arch/powerpc/kernel/asm-offsets.c |1 + arch/powerpc/kvm/44x.c |7 +++ arch/powerpc/kvm/book3s.c |7 +++ arch/powerpc/kvm/e500.c |7 +++ 6 files changed, 29 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 895eb63..bca9391 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -25,6 +25,7 @@ #include linux/interrupt.h #include linux/types.h #include linux/kvm_types.h +#include linux/kvm_para.h #include asm/kvm_asm.h #define KVM_MAX_VCPUS 1 @@ -289,6 +290,7 @@ struct kvm_vcpu_arch { struct tasklet_struct tasklet; u64 dec_jiffies; unsigned long pending_exceptions; + struct kvm_vcpu_arch_shared *shared; #ifdef CONFIG_PPC_BOOK3S struct kmem_cache *hpte_cache; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 2d48f6a..1485ba8 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -20,6 +20,11 @@ #ifndef __POWERPC_KVM_PARA_H__ #define __POWERPC_KVM_PARA_H__ +#include linux/types.h + +struct kvm_vcpu_arch_shared { +}; + #ifdef __KERNEL__ static inline int kvm_para_available(void) diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 496cc5b..944f593 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -400,6 +400,7 @@ int main(void) DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6)); DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7)); 
DEFINE(VCPU_SHADOW_PID, offsetof(struct kvm_vcpu, arch.shadow_pid)); + DEFINE(VCPU_SHARED, offsetof(struct kvm_vcpu, arch.shared)); /* book3s */ #ifdef CONFIG_PPC_BOOK3S diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c index 73c0a3f..e7b1f3f 100644 --- a/arch/powerpc/kvm/44x.c +++ b/arch/powerpc/kvm/44x.c @@ -123,8 +123,14 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto free_vcpu; + vcpu-arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO); + if (!vcpu-arch.shared) + goto uninit_vcpu; + return vcpu; +uninit_vcpu: + kvm_vcpu_uninit(vcpu); free_vcpu: kmem_cache_free(kvm_vcpu_cache, vcpu_44x); out: @@ -135,6 +141,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu); + free_page((unsigned long)vcpu-arch.shared); kvm_vcpu_uninit(vcpu); kmem_cache_free(kvm_vcpu_cache, vcpu_44x); } diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 884d4a5..ba79b35 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1247,6 +1247,10 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto free_shadow_vcpu; + vcpu-arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO); + if (!vcpu-arch.shared) + goto uninit_vcpu; + vcpu-arch.host_retip = kvm_return_point; vcpu-arch.host_msr = mfmsr(); #ifdef CONFIG_PPC_BOOK3S_64 @@ -1277,6 +1281,8 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) return vcpu; +uninit_vcpu: + kvm_vcpu_uninit(vcpu); free_shadow_vcpu: kfree(vcpu_book3s-shadow_vcpu); free_vcpu: @@ -1289,6 +1295,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu); + free_page((unsigned long)vcpu-arch.shared); kvm_vcpu_uninit(vcpu); kfree(vcpu_book3s-shadow_vcpu); vfree(vcpu_book3s); diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c index e8a00b0..71750f2 100644 --- a/arch/powerpc/kvm/e500.c +++ 
b/arch/powerpc/kvm/e500.c @@ -117,8 +117,14 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto uninit_vcpu; + vcpu-arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO); + if (!vcpu-arch.shared) + goto uninit_tlb; + return vcpu; +uninit_tlb: + kvmppc_e500_tlb_uninit(vcpu_e500); uninit_vcpu: kvm_vcpu_uninit(vcpu); free_vcpu: @@ -131,6 +137,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_e500 *vcpu_e500 =
[PATCH 09/26] KVM: PPC: Add PV guest scratch registers
While running in hooked code we need to store register contents out because we must not clobber any registers. So let's add some fields to the shared page we can just happily write to. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index d1fe9ae..edf8f83 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,9 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 scratch1; + __u64 scratch2; + __u64 scratch3; __u64 critical; /* Guest may not get interrupts if == r1 */ __u64 sprg0; __u64 sprg1; -- 1.6.0.2
[PATCH 10/26] KVM: PPC: Tell guest about pending interrupts
When the guest turns on interrupts again, it needs to know if we have an interrupt pending for it. Because if so, it should rather get out of guest context and get the interrupt. So we introduce a new field in the shared page that we use to tell the guest that there's a pending interrupt lying around. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c |7 +++ arch/powerpc/kvm/booke.c|7 +++ 3 files changed, 15 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index edf8f83..c7305d7 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -36,6 +36,7 @@ struct kvm_vcpu_arch_shared { __u64 dar; __u64 msr; __u32 dsisr; + __u32 int_pending; /* Tells the guest if we have an interrupt */ }; #define KVM_PVR_PARA 0x4b564d3f /* KVM? */ diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index f0e8047..e76c950 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -334,6 +334,7 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority) void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) { unsigned long *pending = vcpu-arch.pending_exceptions; + unsigned long old_pending = vcpu-arch.pending_exceptions; unsigned int priority; #ifdef EXIT_DEBUG @@ -353,6 +354,12 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) BITS_PER_BYTE * sizeof(*pending), priority + 1); } + + /* Tell the guest about our interrupt status */ + if (*pending) + vcpu-arch.shared-int_pending = 1; + else if (old_pending) + vcpu-arch.shared-int_pending = 0; } void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 485f8fa..2229df9 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -221,6 +221,7 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, void 
kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) { unsigned long *pending = vcpu-arch.pending_exceptions; + unsigned long old_pending = vcpu-arch.pending_exceptions; unsigned int priority; priority = __ffs(*pending); @@ -232,6 +233,12 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) BITS_PER_BYTE * sizeof(*pending), priority + 1); } + + /* Tell the guest about our interrupt status */ + if (*pending) + vcpu-arch.shared-int_pending = 1; + else if (old_pending) + vcpu-arch.shared-int_pending = 0; } /** -- 1.6.0.2
[PATCH 08/26] KVM: PPC: Add PV guest critical sections
When running in hooked code we need a way to disable interrupts without clobbering any interrupts or exiting out to the hypervisor. To achieve this, we have an additional critical field in the shared page. If that field is equal to the r1 register of the guest, it tells the hypervisor that we're in such a critical section and thus may not receive any interrupts. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c | 15 +-- arch/powerpc/kvm/booke.c| 12 3 files changed, 26 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index eaab306..d1fe9ae 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,7 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 critical; /* Guest may not get interrupts if == r1 */ __u64 sprg0; __u64 sprg1; __u64 sprg2; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index e8001c5..f0e8047 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -251,14 +251,25 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority) int deliver = 1; int vec = 0; ulong flags = 0ULL; + ulong crit_raw = vcpu-arch.shared-critical; + ulong crit_r1 = kvmppc_get_gpr(vcpu, 1); + bool crit; + + /* Truncate crit indicators in 32 bit mode */ + if (!(vcpu-arch.shared-msr MSR_SF)) { + crit_raw = 0x; + crit_r1 = 0x; + } + + crit = (crit_raw == crit_r1); switch (priority) { case BOOK3S_IRQPRIO_DECREMENTER: - deliver = vcpu-arch.shared-msr MSR_EE; + deliver = (vcpu-arch.shared-msr MSR_EE) !crit; vec = BOOK3S_INTERRUPT_DECREMENTER; break; case BOOK3S_IRQPRIO_EXTERNAL: - deliver = vcpu-arch.shared-msr MSR_EE; + deliver = (vcpu-arch.shared-msr MSR_EE) !crit; vec = BOOK3S_INTERRUPT_EXTERNAL; break; case BOOK3S_IRQPRIO_SYSTEM_RESET: diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index e7d1216..485f8fa 100644 --- 
a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -147,6 +147,17 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, int allowed = 0; ulong uninitialized_var(msr_mask); bool update_esr = false, update_dear = false; + ulong crit_raw = vcpu-arch.shared-critical; + ulong crit_r1 = kvmppc_get_gpr(vcpu, 1); + bool crit; + + /* Truncate crit indicators in 32 bit mode */ + if (!(vcpu-arch.shared-msr MSR_SF)) { + crit_raw = 0x; + crit_r1 = 0x; + } + + crit = (crit_raw == crit_r1); switch (priority) { case BOOKE_IRQPRIO_DTLB_MISS: @@ -181,6 +192,7 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, case BOOKE_IRQPRIO_DECREMENTER: case BOOKE_IRQPRIO_FIT: allowed = vcpu-arch.shared-msr MSR_EE; + allowed = allowed !crit; msr_mask = MSR_CE|MSR_ME|MSR_DE; break; case BOOKE_IRQPRIO_DEBUG: -- 1.6.0.2
[PATCH 06/26] KVM: PPC: Convert SPRG[0-4] to shared page
When in kernel mode there are 4 additional registers available that are simple data storage. Instead of exiting to the hypervisor to read and write those, we can just share them with the guest using the page. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |4 arch/powerpc/include/asm/kvm_para.h |4 arch/powerpc/kvm/book3s.c | 16 arch/powerpc/kvm/booke.c| 16 arch/powerpc/kvm/emulate.c | 24 5 files changed, 36 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 6bcf62f..83c45ea 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -216,10 +216,6 @@ struct kvm_vcpu_arch { ulong guest_owned_ext; #endif u32 mmucr; - ulong sprg0; - ulong sprg1; - ulong sprg2; - ulong sprg3; ulong sprg4; ulong sprg5; ulong sprg6; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index d7fc6c2..e402999 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,10 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 sprg0; + __u64 sprg1; + __u64 sprg2; + __u64 sprg3; __u64 srr0; __u64 srr1; __u64 dar; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index b144697..5a6f055 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1062,10 +1062,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-srr0 = vcpu-arch.shared-srr0; regs-srr1 = vcpu-arch.shared-srr1; regs-pid = vcpu-arch.pid; - regs-sprg0 = vcpu-arch.sprg0; - regs-sprg1 = vcpu-arch.sprg1; - regs-sprg2 = vcpu-arch.sprg2; - regs-sprg3 = vcpu-arch.sprg3; + regs-sprg0 = vcpu-arch.shared-sprg0; + regs-sprg1 = vcpu-arch.shared-sprg1; + regs-sprg2 = vcpu-arch.shared-sprg2; + regs-sprg3 = vcpu-arch.shared-sprg3; regs-sprg5 = vcpu-arch.sprg4; regs-sprg6 = 
vcpu-arch.sprg5; regs-sprg7 = vcpu-arch.sprg6; @@ -1088,10 +1088,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_msr(vcpu, regs-msr); vcpu-arch.shared-srr0 = regs-srr0; vcpu-arch.shared-srr1 = regs-srr1; - vcpu-arch.sprg0 = regs-sprg0; - vcpu-arch.sprg1 = regs-sprg1; - vcpu-arch.sprg2 = regs-sprg2; - vcpu-arch.sprg3 = regs-sprg3; + vcpu-arch.shared-sprg0 = regs-sprg0; + vcpu-arch.shared-sprg1 = regs-sprg1; + vcpu-arch.shared-sprg2 = regs-sprg2; + vcpu-arch.shared-sprg3 = regs-sprg3; vcpu-arch.sprg5 = regs-sprg4; vcpu-arch.sprg6 = regs-sprg5; vcpu-arch.sprg7 = regs-sprg6; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 8b546fe..984c461 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -495,10 +495,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-srr0 = vcpu-arch.shared-srr0; regs-srr1 = vcpu-arch.shared-srr1; regs-pid = vcpu-arch.pid; - regs-sprg0 = vcpu-arch.sprg0; - regs-sprg1 = vcpu-arch.sprg1; - regs-sprg2 = vcpu-arch.sprg2; - regs-sprg3 = vcpu-arch.sprg3; + regs-sprg0 = vcpu-arch.shared-sprg0; + regs-sprg1 = vcpu-arch.shared-sprg1; + regs-sprg2 = vcpu-arch.shared-sprg2; + regs-sprg3 = vcpu-arch.shared-sprg3; regs-sprg5 = vcpu-arch.sprg4; regs-sprg6 = vcpu-arch.sprg5; regs-sprg7 = vcpu-arch.sprg6; @@ -521,10 +521,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_msr(vcpu, regs-msr); vcpu-arch.shared-srr0 = regs-srr0; vcpu-arch.shared-srr1 = regs-srr1; - vcpu-arch.sprg0 = regs-sprg0; - vcpu-arch.sprg1 = regs-sprg1; - vcpu-arch.sprg2 = regs-sprg2; - vcpu-arch.sprg3 = regs-sprg3; + vcpu-arch.shared-sprg0 = regs-sprg0; + vcpu-arch.shared-sprg1 = regs-sprg1; + vcpu-arch.shared-sprg2 = regs-sprg2; + vcpu-arch.shared-sprg3 = regs-sprg3; vcpu-arch.sprg5 = regs-sprg4; vcpu-arch.sprg6 = regs-sprg5; vcpu-arch.sprg7 = regs-sprg6; diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c 
index ad0fa4f..454869b 100644 --- a/arch/powerpc/kvm/emulate.c +++ b/arch/powerpc/kvm/emulate.c @@ -263,13 +263,17 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) kvmppc_set_gpr(vcpu, rt, get_tb()); break; case SPRN_SPRG0: - kvmppc_set_gpr(vcpu, rt,
[PATCH 26/26] KVM: PPC: Add Documentation about PV interface
We just introduced a new PV interface that screams for documentation. So here it is - a shiny new and awesome text file describing the internal works of the PPC KVM paravirtual interface. Signed-off-by: Alexander Graf ag...@suse.de --- Documentation/kvm/ppc-pv.txt | 164 ++ 1 files changed, 164 insertions(+), 0 deletions(-) create mode 100644 Documentation/kvm/ppc-pv.txt diff --git a/Documentation/kvm/ppc-pv.txt b/Documentation/kvm/ppc-pv.txt new file mode 100644 index 000..7cbcd51 --- /dev/null +++ b/Documentation/kvm/ppc-pv.txt @@ -0,0 +1,164 @@ +The PPC KVM paravirtual interface += + +The basic execution principle by which KVM on PowerPC works is to run all kernel +space code in PR=1 which is user space. This way we trap all privileged +instructions and can emulate them accordingly. + +Unfortunately that is also the downfall. There are quite some privileged +instructions that needlessly return us to the hypervisor even though they +could be handled differently. + +This is what the PPC PV interface helps with. It takes privileged instructions +and transforms them into unprivileged ones with some help from the hypervisor. +This cuts down virtualization costs by about 50% on some of my benchmarks. + +The code for that interface can be found in arch/powerpc/kernel/kvm* + +Querying for existence +== + +To find out if we're running on KVM or not, we overlay the PVR register. Usually +the PVR register contains an id that identifies your CPU type. If, however, you +pass KVM_PVR_PARA in the register that you want the PVR result in, the register +still contains KVM_PVR_PARA after the mfpvr call. + + LOAD_REG_IMM(r5, KVM_PVR_PARA) + mfpvr r5 + [r5 still contains KVM_PVR_PARA] + +Once determined to run under a PV capable KVM, you can now use hypercalls as +described below. 
+ +PPC hypercalls +== + +The only viable ways to reliably get from guest context to host context are: + + 1) Call an invalid instruction + 2) Call the sc instruction with a parameter to sc + 3) Call the sc instruction with parameters in GPRs + +Method 1 is always a bad idea. Invalid instructions can be replaced later on +by valid instructions, rendering the interface broken. + +Method 2 also has drawbacks. If the parameter to sc is != 0 the spec is +rather unclear whether the sc is targeted at the hypervisor or at the +supervisor. It would also require that we read the syscall-issuing instruction +every time a syscall is issued, slowing down guest syscalls. + +Method 3 is what KVM uses. We pass magic constants (KVM_SC_MAGIC_R3 and +KVM_SC_MAGIC_R4) in r3 and r4 respectively. If a syscall instruction with these +magic values arrives from the guest's kernel mode, we take the syscall as a +hypercall. + +The parameters are as follows: + + r3 KVM_SC_MAGIC_R3 + r4 KVM_SC_MAGIC_R4 + r5 Hypercall number + r6 First parameter + r7 Second parameter + r8 Third parameter + r9 Fourth parameter + +Hypercall definitions are shared in generic code, so the same hypercall numbers +apply for x86 and powerpc alike. + +The magic page +== + +To enable communication between the hypervisor and guest there is a new shared +page that contains parts of supervisor-visible register state. The guest can +map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE. + +With this hypercall issued the guest always gets the magic page mapped at the +desired location in effective and physical address space. For now, we always +map the page to -4096. This way we can access it using absolute load and store +functions. The following instruction reads the first field of the magic page: + + ld rX, -4096(0) + +The interface is designed to be extensible should there later be a need to add +additional registers to the magic page.
If you add fields to the magic page, +also define a new hypercall feature to indicate that the host can give you more +registers. Make use of them only if the host supports the additional features. + +The magic page has the following layout as described in +arch/powerpc/include/asm/kvm_para.h: + +struct kvm_vcpu_arch_shared { + __u64 scratch1; + __u64 scratch2; + __u64 scratch3; + __u64 critical; /* Guest may not get interrupts if == r1 */ + __u64 sprg0; + __u64 sprg1; + __u64 sprg2; + __u64 sprg3; + __u64 srr0; + __u64 srr1; + __u64 dar; + __u64 msr; + __u32 dsisr; + __u32 int_pending; /* Tells the guest if we have an interrupt */ +}; + +Additions to the page must only occur at the end. Struct fields are always 32 +bit aligned. + +Patched instructions + + +The ld and std instructions are transformed to lwz and stw instructions
[PATCH 21/26] KVM: PPC: Introduce kvm_tmp framework
We will soon require more sophisticated methods to replace single instructions with multiple instructions. We do that by branching to a memory region where we write the replacement code for the instruction. This region needs to be within 32 MB of the patched instruction, because that's the furthest we can jump with immediate branches. So we keep 1 MB of free space around in bss. After init is done we can just tell the mm system that the unused pages are free, but until then we have enough space to fit all our code in.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kernel/kvm.c | 41 +++++--
 1 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index b091f94..7e8fe6f 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -64,6 +64,8 @@
 #define KVM_INST_TLBSYNC	0x7c00046c
 
 static bool kvm_patching_worked = true;
+static char kvm_tmp[1024 * 1024];
+static int kvm_tmp_index;
 
 static void kvm_patch_ins_ld(u32 *inst, long addr, u32 rt)
 {
@@ -98,6 +100,23 @@ static void kvm_patch_ins_nop(u32 *inst)
 	*inst = KVM_INST_NOP;
 }
 
+static u32 *kvm_alloc(int len)
+{
+	u32 *p;
+
+	if ((kvm_tmp_index + len) > ARRAY_SIZE(kvm_tmp)) {
+		printk(KERN_ERR "KVM: No more space (%d + %d)\n",
+		       kvm_tmp_index, len);
+		kvm_patching_worked = false;
+		return NULL;
+	}
+
+	p = (void*)&kvm_tmp[kvm_tmp_index];
+	kvm_tmp_index += len;
+
+	return p;
+}
+
 static void kvm_map_magic_page(void *data)
 {
 	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
@@ -197,12 +216,27 @@ static void kvm_use_magic_page(void)
 		kvm_check_ins(p);
 }
 
+static void kvm_free_tmp(void)
+{
+	unsigned long start, end;
+
+	start = (ulong)&kvm_tmp[kvm_tmp_index + (PAGE_SIZE - 1)] & PAGE_MASK;
+	end = (ulong)&kvm_tmp[ARRAY_SIZE(kvm_tmp)] & PAGE_MASK;
+
+	/* Free the tmp space we don't need */
+	for (; start < end; start += PAGE_SIZE) {
+		ClearPageReserved(virt_to_page(start));
+		init_page_count(virt_to_page(start));
+		free_page(start);
+		totalram_pages++;
+	}
+}
+
 static int __init kvm_guest_init(void)
 {
-	char *p;
-
 	if (!kvm_para_available())
-		return 0;
+		goto free_tmp;
 
 	if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE))
 		kvm_use_magic_page();
@@ -210,6 +244,9 @@ static int __init kvm_guest_init(void)
 	printk(KERN_INFO "KVM: Live patching for a fast VM %s\n",
 	       kvm_patching_worked ? "worked" : "failed");
 
+free_tmp:
+	kvm_free_tmp();
+
 	return 0;
 }
--
1.6.0.2
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 15/26] KVM: PPC: Expose magic page support to guest
Now that we have the shared page in place and the MMU code knows about the magic page, we can expose that capability to the guest! Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h |2 ++ arch/powerpc/kvm/powerpc.c | 11 +++ 2 files changed, 13 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index c7305d7..9f8efa4 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -43,6 +43,8 @@ struct kvm_vcpu_arch_shared { #define KVM_SC_MAGIC_R30x4b564d52 /* KVMR */ #define KVM_SC_MAGIC_R40x554c455a /* ULEZ */ +#define KVM_FEATURE_MAGIC_PAGE 1 + #ifdef __KERNEL__ static inline int kvm_para_available(void) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index fe7a1c8..1d28a81 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -60,8 +60,19 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu) } switch (nr) { + case KVM_HC_PPC_MAP_MAGIC_PAGE: + { + vcpu-arch.magic_page_pa = param1; + vcpu-arch.magic_page_ea = param2; + + r = 0; + break; + } case KVM_HC_FEATURES: r = 0; +#if !defined(CONFIG_KVM_440) /* XXX missing bits on 440 */ + r |= (1 KVM_FEATURE_MAGIC_PAGE); +#endif break; default: r = -KVM_ENOSYS; -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 14/26] KVM: PPC: Magic Page BookE support
As we now have Book3s support for the magic page, we also need BookE to join in on the party. This patch implements generic magic page logic for BookE and specific TLB logic for e500. I didn't have any 440 around, so I didn't dare to blindly try and write up broken code. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/booke.c| 29 + arch/powerpc/kvm/e500_tlb.c | 19 +-- 2 files changed, 46 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 2229df9..7957aa4 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -241,6 +241,31 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) vcpu-arch.shared-int_pending = 0; } +/* Check if a DTLB miss was on the magic page. Returns !0 if so. */ +int kvmppc_dtlb_magic_page(struct kvm_vcpu *vcpu, ulong eaddr) +{ + ulong mp_ea = vcpu-arch.magic_page_ea; + ulong gpaddr = vcpu-arch.magic_page_pa; + int gtlb_index = 11 | (1 16); /* Random number in TLB1 */ + + /* Check for existence of magic page */ + if(likely(!mp_ea)) + return 0; + + /* Check if we're on the magic page */ + if(likely((eaddr 12) != (mp_ea 12))) + return 0; + + /* Don't map in user mode */ + if(vcpu-arch.shared-msr MSR_PR) + return 0; + + kvmppc_mmu_map(vcpu, vcpu-arch.magic_page_ea, gpaddr, gtlb_index); + kvmppc_account_exit(vcpu, DTLB_VIRT_MISS_EXITS); + + return 1; +} + /** * kvmppc_handle_exit * @@ -308,6 +333,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, r = RESUME_HOST; break; case EMULATE_FAIL: + case EMULATE_DO_MMIO: /* XXX Deliver Program interrupt to guest. */ printk(KERN_CRIT %s: emulation at %lx failed (%08x)\n, __func__, vcpu-arch.pc, vcpu-arch.last_inst); @@ -377,6 +403,9 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, gpa_t gpaddr; gfn_t gfn; + if (kvmppc_dtlb_magic_page(vcpu, eaddr)) + break; + /* Check the guest TLB. 
*/ gtlb_index = kvmppc_mmu_dtlb_index(vcpu, eaddr); if (gtlb_index 0) { diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c index 66845a5..f5582ca 100644 --- a/arch/powerpc/kvm/e500_tlb.c +++ b/arch/powerpc/kvm/e500_tlb.c @@ -295,9 +295,22 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, struct page *new_page; struct tlbe *stlbe; hpa_t hpaddr; + u32 mas2 = gtlbe-mas2; + u32 mas3 = gtlbe-mas3; stlbe = vcpu_e500-shadow_tlb[tlbsel][esel]; + if ((vcpu_e500-vcpu.arch.magic_page_ea) + ((vcpu_e500-vcpu.arch.magic_page_pa PAGE_SHIFT) == gfn) + !(vcpu_e500-vcpu.arch.shared-msr MSR_PR)) { + mas2 = 0; + mas3 = E500_TLB_SUPER_PERM_MASK; + hpaddr = virt_to_phys(vcpu_e500-vcpu.arch.shared); + new_page = pfn_to_page(hpaddr PAGE_SHIFT); + get_page(new_page); + goto mapped; + } + /* Get reference to new page. */ new_page = gfn_to_page(vcpu_e500-vcpu.kvm, gfn); if (is_error_page(new_page)) { @@ -305,6 +318,8 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, kvm_release_page_clean(new_page); return; } + +mapped: hpaddr = page_to_phys(new_page); /* Drop reference to old page. */ @@ -316,10 +331,10 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, stlbe-mas1 = MAS1_TSIZE(BOOK3E_PAGESZ_4K) | MAS1_TID(get_tlb_tid(gtlbe)) | MAS1_TS | MAS1_VALID; stlbe-mas2 = (gvaddr MAS2_EPN) - | e500_shadow_mas2_attrib(gtlbe-mas2, + | e500_shadow_mas2_attrib(mas2, vcpu_e500-vcpu.arch.shared-msr MSR_PR); stlbe-mas3 = (hpaddr MAS3_RPN) - | e500_shadow_mas3_attrib(gtlbe-mas3, + | e500_shadow_mas3_attrib(mas3, vcpu_e500-vcpu.arch.shared-msr MSR_PR); stlbe-mas7 = (hpaddr 32) MAS7_RPN; -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 25/26] KVM: PPC: PV wrteei
On BookE the preferred way to write the EE bit is the wrteei instruction. It already encodes the EE bit in the instruction. So in order to get BookE some speedups as well, let's also PV'nize thati instruction. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 50 arch/powerpc/kernel/kvm_emul.S | 41 2 files changed, 91 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 3557bc8..85e2163 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -66,6 +66,9 @@ #define KVM_INST_MTMSRD_L1 0x7c010164 #define KVM_INST_MTMSR 0x7c000124 +#define KVM_INST_WRTEEI_0 0x7c000146 +#define KVM_INST_WRTEEI_1 0x7c008146 + static bool kvm_patching_worked = true; static char kvm_tmp[1024 * 1024]; static int kvm_tmp_index; @@ -200,6 +203,47 @@ static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt) *inst = KVM_INST_B | (distance_start KVM_INST_B_MASK); } +#ifdef CONFIG_BOOKE + +extern u32 kvm_emulate_wrteei_branch_offs; +extern u32 kvm_emulate_wrteei_ee_offs; +extern u32 kvm_emulate_wrteei_len; +extern u32 kvm_emulate_wrteei[]; + +static void kvm_patch_ins_wrteei(u32 *inst) +{ + u32 *p; + int distance_start; + int distance_end; + ulong next_inst; + + p = kvm_alloc(kvm_emulate_wrteei_len * 4); + if (!p) + return; + + /* Find out where we are and put everything there */ + distance_start = (ulong)p - (ulong)inst; + next_inst = ((ulong)inst + 4); + distance_end = next_inst - (ulong)p[kvm_emulate_wrteei_branch_offs]; + + /* Make sure we only write valid b instructions */ + if (distance_start KVM_INST_B_MAX) { + kvm_patching_worked = false; + return; + } + + /* Modify the chunk to fit the invocation */ + memcpy(p, kvm_emulate_wrteei, kvm_emulate_wrteei_len * 4); + p[kvm_emulate_wrteei_branch_offs] |= distance_end KVM_INST_B_MASK; + p[kvm_emulate_wrteei_ee_offs] |= (*inst MSR_EE); + flush_icache_range((ulong)p, (ulong)p + kvm_emulate_wrteei_len * 4); + + /* Patch the invocation */ + *inst = KVM_INST_B | 
(distance_start KVM_INST_B_MASK); +} + +#endif + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -289,6 +333,12 @@ static void kvm_check_ins(u32 *inst) } switch (_inst) { +#ifdef CONFIG_BOOKE + case KVM_INST_WRTEEI_0: + case KVM_INST_WRTEEI_1: + kvm_patch_ins_wrteei(inst); + break; +#endif } flush_icache_range((ulong)inst, (ulong)inst + 4); diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index ccf5a42..b79b9de 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -194,3 +194,44 @@ kvm_emulate_mtmsr_orig_ins_offs: .global kvm_emulate_mtmsr_len kvm_emulate_mtmsr_len: .long (kvm_emulate_mtmsr_end - kvm_emulate_mtmsr) / 4 + + + +.global kvm_emulate_wrteei +kvm_emulate_wrteei: + + SCRATCH_SAVE + + /* Fetch old MSR in r31 */ + LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + /* Remove MSR_EE from old MSR */ + li r30, 0 + ori r30, r30, MSR_EE + andcr31, r31, r30 + + /* OR new MSR_EE onto the old MSR */ +kvm_emulate_wrteei_ee: + ori r31, r31, 0 + + /* Write new MSR value back */ + STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + SCRATCH_RESTORE + + /* Go back to caller */ +kvm_emulate_wrteei_branch: + b . +kvm_emulate_wrteei_end: + +.global kvm_emulate_wrteei_branch_offs +kvm_emulate_wrteei_branch_offs: + .long (kvm_emulate_wrteei_branch - kvm_emulate_wrteei) / 4 + +.global kvm_emulate_wrteei_ee_offs +kvm_emulate_wrteei_ee_offs: + .long (kvm_emulate_wrteei_ee - kvm_emulate_wrteei) / 4 + +.global kvm_emulate_wrteei_len +kvm_emulate_wrteei_len: + .long (kvm_emulate_wrteei_end - kvm_emulate_wrteei) / 4 -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 11/26] KVM: PPC: Make RMO a define
On PowerPC it's very normal to not support all of the physical RAM in real mode. To check if we're matching on the shared page or not, we need to know the limits so we can restrain ourselves to that range. So let's make it a define instead of open-coding it. And while at it, let's also increase it. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |2 ++ arch/powerpc/kvm/book3s.c |4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 83c45ea..e35c1ac 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -47,6 +47,8 @@ #define HPTEG_HASH_NUM_VPTE(1 HPTEG_HASH_BITS_VPTE) #define HPTEG_HASH_NUM_VPTE_LONG (1 HPTEG_HASH_BITS_VPTE_LONG) +#define KVM_RMO0x0fffULL + struct kvm; struct kvm_run; struct kvm_vcpu; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index e76c950..2f55aa5 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -462,7 +462,7 @@ static int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, bool data, r = vcpu-arch.mmu.xlate(vcpu, eaddr, pte, data); } else { pte-eaddr = eaddr; - pte-raddr = eaddr 0x; + pte-raddr = eaddr KVM_RMO; pte-vpage = VSID_REAL | eaddr 12; pte-may_read = true; pte-may_write = true; @@ -576,7 +576,7 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, pte.may_execute = true; pte.may_read = true; pte.may_write = true; - pte.raddr = eaddr 0x; + pte.raddr = eaddr KVM_RMO; pte.eaddr = eaddr; pte.vpage = eaddr 12; } -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 20/26] KVM: PPC: PV tlbsync to nop
With our current MMU scheme we don't need to know about the tlbsync instruction. So we can just nop it out.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kernel/kvm.c | 12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index b165b20..b091f94 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -61,6 +61,8 @@
 #define KVM_INST_MTSPR_DAR	0x7c1303a6
 #define KVM_INST_MTSPR_DSISR	0x7c1203a6
 
+#define KVM_INST_TLBSYNC	0x7c00046c
+
 static bool kvm_patching_worked = true;
 
 static void kvm_patch_ins_ld(u32 *inst, long addr, u32 rt)
@@ -91,6 +93,11 @@ static void kvm_patch_ins_stw(u32 *inst, long addr, u32 rt)
 	*inst = KVM_INST_STW | rt | (addr & 0x0000fffc);
 }
 
+static void kvm_patch_ins_nop(u32 *inst)
+{
+	*inst = KVM_INST_NOP;
+}
+
 static void kvm_map_magic_page(void *data)
 {
 	kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE,
@@ -159,6 +166,11 @@ static void kvm_check_ins(u32 *inst)
 	case KVM_INST_MTSPR_DSISR:
 		kvm_patch_ins_stw(inst, magic_var(dsisr), inst_rt);
 		break;
+
+	/* Nops */
+	case KVM_INST_TLBSYNC:
+		kvm_patch_ins_nop(inst);
+		break;
 	}
 
 	switch (_inst) {
--
1.6.0.2
[PATCH 24/26] KVM: PPC: PV mtmsrd L=0 and mtmsr
There is also a form of mtmsr where all bits need to be addressed. While the PPC64 Linux kernel behaves resonably well here, the PPC32 one never uses the L=1 form but does mtmsr even for simple things like only changing EE. So we need to hook into that one as well and check for a mask of bits that we deem safe to change from within guest context. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 51 arch/powerpc/kernel/kvm_emul.S | 84 2 files changed, 135 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 71153d0..3557bc8 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -62,7 +62,9 @@ #define KVM_INST_MTSPR_DSISR 0x7c1203a6 #define KVM_INST_TLBSYNC 0x7c00046c +#define KVM_INST_MTMSRD_L0 0x7c000164 #define KVM_INST_MTMSRD_L1 0x7c010164 +#define KVM_INST_MTMSR 0x7c000124 static bool kvm_patching_worked = true; static char kvm_tmp[1024 * 1024]; @@ -155,6 +157,49 @@ static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt) *inst = KVM_INST_B | (distance_start KVM_INST_B_MASK); } +extern u32 kvm_emulate_mtmsr_branch_offs; +extern u32 kvm_emulate_mtmsr_reg1_offs; +extern u32 kvm_emulate_mtmsr_reg2_offs; +extern u32 kvm_emulate_mtmsr_reg3_offs; +extern u32 kvm_emulate_mtmsr_orig_ins_offs; +extern u32 kvm_emulate_mtmsr_len; +extern u32 kvm_emulate_mtmsr[]; + +static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt) +{ + u32 *p; + int distance_start; + int distance_end; + ulong next_inst; + + p = kvm_alloc(kvm_emulate_mtmsr_len * 4); + if (!p) + return; + + /* Find out where we are and put everything there */ + distance_start = (ulong)p - (ulong)inst; + next_inst = ((ulong)inst + 4); + distance_end = next_inst - (ulong)p[kvm_emulate_mtmsr_branch_offs]; + + /* Make sure we only write valid b instructions */ + if (distance_start KVM_INST_B_MAX) { + kvm_patching_worked = false; + return; + } + + /* Modify the chunk to fit the invocation */ + memcpy(p, kvm_emulate_mtmsr, 
kvm_emulate_mtmsr_len * 4); + p[kvm_emulate_mtmsr_branch_offs] |= distance_end KVM_INST_B_MASK; + p[kvm_emulate_mtmsr_reg1_offs] |= rt; + p[kvm_emulate_mtmsr_reg2_offs] |= rt; + p[kvm_emulate_mtmsr_reg3_offs] |= rt; + p[kvm_emulate_mtmsr_orig_ins_offs] = *inst; + flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsr_len * 4); + + /* Patch the invocation */ + *inst = KVM_INST_B | (distance_start KVM_INST_B_MASK); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -235,6 +280,12 @@ static void kvm_check_ins(u32 *inst) if (get_rt(inst_rt) 30) kvm_patch_ins_mtmsrd(inst, inst_rt); break; + case KVM_INST_MTMSR: + case KVM_INST_MTMSRD_L0: + /* We use r30 and r31 during the hook */ + if (get_rt(inst_rt) 30) + kvm_patch_ins_mtmsr(inst, inst_rt); + break; } switch (_inst) { diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index 25e6683..ccf5a42 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -110,3 +110,87 @@ kvm_emulate_mtmsrd_reg_offs: .global kvm_emulate_mtmsrd_len kvm_emulate_mtmsrd_len: .long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4 + + +#define MSR_SAFE_BITS (MSR_EE | MSR_CE | MSR_ME | MSR_RI) +#define MSR_CRITICAL_BITS ~MSR_SAFE_BITS + +.global kvm_emulate_mtmsr +kvm_emulate_mtmsr: + + SCRATCH_SAVE + + /* Fetch old MSR in r31 */ + LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + /* Find the changed bits between old and new MSR */ +kvm_emulate_mtmsr_reg1: + xor r31, r0, r31 + + /* Check if we need to really do mtmsr */ + LOAD_REG_IMMEDIATE(r30, MSR_CRITICAL_BITS) + and.r31, r31, r30 + + /* No critical bits changed? Maybe we can stay in the guest. 
*/ + beq maybe_stay_in_guest + +do_mtmsr: + + SCRATCH_RESTORE + + /* Just fire off the mtmsr if it's critical */ +kvm_emulate_mtmsr_orig_ins: + mtmsr r0 + + b kvm_emulate_mtmsr_branch + +maybe_stay_in_guest: + + /* Check if we have to fetch an interrupt */ + lwz r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0) + cmpwi r31, 0 + beq+no_mtmsr + + /* Check if we may trigger an interrupt */ +kvm_emulate_mtmsr_reg2: + andi. r31, r0, MSR_EE + beq no_mtmsr + + b do_mtmsr + +no_mtmsr: + + /* Put MSR into magic page because we don't call mtmsr */ +kvm_emulate_mtmsr_reg3: + STL64(r0, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + SCRATCH_RESTORE + + /* Go back to
[PATCH 18/26] KVM: PPC: KVM PV guest stubs
We will soon start and replace instructions from the text section with other, paravirtualized versions. To ease the readability of those patches I split out the generic looping and magic page mapping code out. This patch still only contains stubs. But at least it loops through the text section :). Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 59 + 1 files changed, 59 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 2d8dd73..d873bc6 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -32,3 +32,62 @@ #define KVM_MAGIC_PAGE (-4096L) #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x) +static bool kvm_patching_worked = true; + +static void kvm_map_magic_page(void *data) +{ + kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, + KVM_MAGIC_PAGE, /* Physical Address */ + KVM_MAGIC_PAGE); /* Effective Address */ +} + +static void kvm_check_ins(u32 *inst) +{ + u32 _inst = *inst; + u32 inst_no_rt = _inst ~KVM_MASK_RT; + u32 inst_rt = _inst KVM_MASK_RT; + + switch (inst_no_rt) { + } + + switch (_inst) { + } + + flush_icache_range((ulong)inst, (ulong)inst + 4); +} + +static void kvm_use_magic_page(void) +{ + u32 *p; + u32 *start, *end; + + /* Tell the host to map the magic page to -4096 on all CPUs */ + + on_each_cpu(kvm_map_magic_page, NULL, 1); + + /* Now loop through all code and find instructions */ + + start = (void*)_stext; + end = (void*)_etext; + + for (p = start; p end; p++) + kvm_check_ins(p); +} + +static int __init kvm_guest_init(void) +{ + char *p; + + if (!kvm_para_available()) + return 0; + + if (kvm_para_has_feature(KVM_FEATURE_MAGIC_PAGE)) + kvm_use_magic_page(); + + printk(KERN_INFO KVM: Live patching for a fast VM %s\n, +kvm_patching_worked ? 
worked : failed); + + return 0; +} + +postcore_initcall(kvm_guest_init); -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 12/26] KVM: PPC: First magic page steps
We will be introducing a method to project the shared page into guest context. As soon as we're talking about this coupling, the shared page is called the "magic page". This patch introduces simple defines, so the follow-up patches are easier to read.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_host.h |    2 ++
 include/linux/kvm_para.h            |    1 +
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index e35c1ac..5f8c214 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -285,6 +285,8 @@ struct kvm_vcpu_arch {
 	u64 dec_jiffies;
 	unsigned long pending_exceptions;
 	struct kvm_vcpu_arch_shared *shared;
+	unsigned long magic_page_pa; /* phys addr to map the magic page to */
+	unsigned long magic_page_ea; /* effect. addr to map the magic page to */
 
 #ifdef CONFIG_PPC_BOOK3S
 	struct kmem_cache *hpte_cache;
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 3b8080e..ac2015a 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -18,6 +18,7 @@
 #define KVM_HC_VAPIC_POLL_IRQ		1
 #define KVM_HC_MMU_OP			2
 #define KVM_HC_FEATURES			3
+#define KVM_HC_PPC_MAP_MAGIC_PAGE	4
 
 /*
  * hypercalls use architecture specific
--
1.6.0.2
[PATCH 23/26] KVM: PPC: PV mtmsrd L=1
The PowerPC ISA has a special instruction for mtmsr that only changes the EE and RI bits, namely the L=1 form. Since that one is reasonably often occuring and simple to implement, let's go with this first. Writing EE=0 is always just a store. Doing EE=1 also requires us to check for pending interrupts and if necessary exit back to the hypervisor. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 45 arch/powerpc/kernel/kvm_emul.S | 56 2 files changed, 101 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 7e8fe6f..71153d0 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -62,6 +62,7 @@ #define KVM_INST_MTSPR_DSISR 0x7c1203a6 #define KVM_INST_TLBSYNC 0x7c00046c +#define KVM_INST_MTMSRD_L1 0x7c010164 static bool kvm_patching_worked = true; static char kvm_tmp[1024 * 1024]; @@ -117,6 +118,43 @@ static u32 *kvm_alloc(int len) return p; } +extern u32 kvm_emulate_mtmsrd_branch_offs; +extern u32 kvm_emulate_mtmsrd_reg_offs; +extern u32 kvm_emulate_mtmsrd_len; +extern u32 kvm_emulate_mtmsrd[]; + +static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt) +{ + u32 *p; + int distance_start; + int distance_end; + ulong next_inst; + + p = kvm_alloc(kvm_emulate_mtmsrd_len * 4); + if (!p) + return; + + /* Find out where we are and put everything there */ + distance_start = (ulong)p - (ulong)inst; + next_inst = ((ulong)inst + 4); + distance_end = next_inst - (ulong)p[kvm_emulate_mtmsrd_branch_offs]; + + /* Make sure we only write valid b instructions */ + if (distance_start KVM_INST_B_MAX) { + kvm_patching_worked = false; + return; + } + + /* Modify the chunk to fit the invocation */ + memcpy(p, kvm_emulate_mtmsrd, kvm_emulate_mtmsrd_len * 4); + p[kvm_emulate_mtmsrd_branch_offs] |= distance_end KVM_INST_B_MASK; + p[kvm_emulate_mtmsrd_reg_offs] |= rt; + flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsrd_len * 4); + + /* Patch the invocation */ + *inst = KVM_INST_B | 
(distance_start KVM_INST_B_MASK); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -190,6 +228,13 @@ static void kvm_check_ins(u32 *inst) case KVM_INST_TLBSYNC: kvm_patch_ins_nop(inst); break; + + /* Rewrites */ + case KVM_INST_MTMSRD_L1: + /* We use r30 and r31 during the hook */ + if (get_rt(inst_rt) 30) + kvm_patch_ins_mtmsrd(inst, inst_rt); + break; } switch (_inst) { diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index 7da835a..25e6683 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -54,3 +54,59 @@ /* Disable critical section. We are critical if \ shared-critical == r1 and r2 is always != r1 */ \ STL64(r2, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0); + +.global kvm_emulate_mtmsrd +kvm_emulate_mtmsrd: + + SCRATCH_SAVE + + /* Put MSR ~(MSR_EE|MSR_RI) in r31 */ + LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + lis r30, (~(MSR_EE | MSR_RI))@h + ori r30, r30, (~(MSR_EE | MSR_RI))@l + and r31, r31, r30 + + /* OR the register's (MSR_EE|MSR_RI) on MSR */ +kvm_emulate_mtmsrd_reg: + andi. r30, r0, (MSR_EE|MSR_RI) + or r31, r31, r30 + + /* Put MSR back into magic page */ + STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + /* Check if we have to fetch an interrupt */ + lwz r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0) + cmpwi r31, 0 + beq+no_check + + /* Check if we may trigger an interrupt */ + andi. r30, r30, MSR_EE + beq no_check + + SCRATCH_RESTORE + + /* Nag hypervisor */ + tlbsync + + b kvm_emulate_mtmsrd_branch + +no_check: + + SCRATCH_RESTORE + + /* Go back to caller */ +kvm_emulate_mtmsrd_branch: + b . 
+kvm_emulate_mtmsrd_end: + +.global kvm_emulate_mtmsrd_branch_offs +kvm_emulate_mtmsrd_branch_offs: + .long (kvm_emulate_mtmsrd_branch - kvm_emulate_mtmsrd) / 4 + +.global kvm_emulate_mtmsrd_reg_offs +kvm_emulate_mtmsrd_reg_offs: + .long (kvm_emulate_mtmsrd_reg - kvm_emulate_mtmsrd) / 4 + +.global kvm_emulate_mtmsrd_len +kvm_emulate_mtmsrd_len: + .long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4 -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 17/26] KVM: PPC: Generic KVM PV guest support
We have all the hypervisor pieces in place now, but the guest parts are still missing. This patch implements basic awareness of KVM when running Linux as guest. It doesn't do anything with it yet though. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/Makefile |2 ++ arch/powerpc/kernel/asm-offsets.c | 15 +++ arch/powerpc/kernel/kvm.c | 34 ++ arch/powerpc/kernel/kvm_emul.S| 27 +++ arch/powerpc/platforms/Kconfig| 10 ++ 5 files changed, 88 insertions(+), 0 deletions(-) create mode 100644 arch/powerpc/kernel/kvm.c create mode 100644 arch/powerpc/kernel/kvm_emul.S diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 58d0572..2d7eb9e 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -125,6 +125,8 @@ ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC),) obj-y += ppc_save_regs.o endif +obj-$(CONFIG_KVM_GUEST) += kvm.o kvm_emul.o + # Disable GCOV in odd or sensitive code GCOV_PROFILE_prom_init.o := n GCOV_PROFILE_ftrace.o := n diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index a55d47e..e3e740b 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -465,6 +465,21 @@ int main(void) DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr)); #endif /* CONFIG_PPC_BOOK3S */ #endif + +#ifdef CONFIG_KVM_GUEST + DEFINE(KVM_MAGIC_SCRATCH1, offsetof(struct kvm_vcpu_arch_shared, + scratch1)); + DEFINE(KVM_MAGIC_SCRATCH2, offsetof(struct kvm_vcpu_arch_shared, + scratch2)); + DEFINE(KVM_MAGIC_SCRATCH3, offsetof(struct kvm_vcpu_arch_shared, + scratch3)); + DEFINE(KVM_MAGIC_INT, offsetof(struct kvm_vcpu_arch_shared, + int_pending)); + DEFINE(KVM_MAGIC_MSR, offsetof(struct kvm_vcpu_arch_shared, msr)); + DEFINE(KVM_MAGIC_CRITICAL, offsetof(struct kvm_vcpu_arch_shared, + critical)); +#endif + #ifdef CONFIG_44x DEFINE(PGD_T_LOG2, PGD_T_LOG2); DEFINE(PTE_T_LOG2, PTE_T_LOG2); diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c new file mode 
100644 index 000..2d8dd73 --- /dev/null +++ b/arch/powerpc/kernel/kvm.c @@ -0,0 +1,34 @@ +/* + * Copyright (C) 2010 SUSE Linux Products GmbH. All rights reserved. + * + * Authors: + * Alexander Graf ag...@suse.de + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +#include linux/kvm_host.h +#include linux/init.h +#include linux/kvm_para.h +#include linux/slab.h + +#include asm/reg.h +#include asm/kvm_ppc.h +#include asm/sections.h +#include asm/cacheflush.h +#include asm/disassemble.h + +#define KVM_MAGIC_PAGE (-4096L) +#define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x) + diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S new file mode 100644 index 000..c7b9fc9 --- /dev/null +++ b/arch/powerpc/kernel/kvm_emul.S @@ -0,0 +1,27 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright SUSE Linux Products GmbH 2010 + * + * Authors: Alexander Graf ag...@suse.de + */ + +#include asm/ppc_asm.h +#include asm/kvm_asm.h +#include asm/reg.h +#include asm/page.h +#include asm/asm-offsets.h + +#define KVM_MAGIC_PAGE (-4096) + diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig index d1663db..1744349 100644 --- a/arch/powerpc/platforms/Kconfig +++ b/arch/powerpc/platforms/Kconfig @@ -21,6 +21,16 @@ source
[PATCH 19/26] KVM: PPC: PV instructions to loads and stores
Some instructions can simply be replaced by load and store instructions to or from the magic page. This patch replaces often called instructions that fall into the above category. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 111 + 1 files changed, 111 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index d873bc6..b165b20 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -32,8 +32,65 @@ #define KVM_MAGIC_PAGE (-4096L) #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x) +#define KVM_INST_LWZ 0x8000 +#define KVM_INST_STW 0x9000 +#define KVM_INST_LD0xe800 +#define KVM_INST_STD 0xf800 +#define KVM_INST_NOP 0x6000 +#define KVM_INST_B 0x4800 +#define KVM_INST_B_MASK0x03ff +#define KVM_INST_B_MAX 0x01ff + +#define KVM_MASK_RT0x03e0 +#define KVM_INST_MFMSR 0x7ca6 +#define KVM_INST_MFSPR_SPRG0 0x7c1042a6 +#define KVM_INST_MFSPR_SPRG1 0x7c1142a6 +#define KVM_INST_MFSPR_SPRG2 0x7c1242a6 +#define KVM_INST_MFSPR_SPRG3 0x7c1342a6 +#define KVM_INST_MFSPR_SRR00x7c1a02a6 +#define KVM_INST_MFSPR_SRR10x7c1b02a6 +#define KVM_INST_MFSPR_DAR 0x7c1302a6 +#define KVM_INST_MFSPR_DSISR 0x7c1202a6 + +#define KVM_INST_MTSPR_SPRG0 0x7c1043a6 +#define KVM_INST_MTSPR_SPRG1 0x7c1143a6 +#define KVM_INST_MTSPR_SPRG2 0x7c1243a6 +#define KVM_INST_MTSPR_SPRG3 0x7c1343a6 +#define KVM_INST_MTSPR_SRR00x7c1a03a6 +#define KVM_INST_MTSPR_SRR10x7c1b03a6 +#define KVM_INST_MTSPR_DAR 0x7c1303a6 +#define KVM_INST_MTSPR_DSISR 0x7c1203a6 + static bool kvm_patching_worked = true; +static void kvm_patch_ins_ld(u32 *inst, long addr, u32 rt) +{ +#ifdef CONFIG_64BIT + *inst = KVM_INST_LD | rt | (addr 0xfffc); +#else + *inst = KVM_INST_LWZ | rt | ((addr + 4) 0xfffc); +#endif +} + +static void kvm_patch_ins_lwz(u32 *inst, long addr, u32 rt) +{ + *inst = KVM_INST_LWZ | rt | (addr 0x); +} + +static void kvm_patch_ins_std(u32 *inst, long addr, u32 rt) +{ +#ifdef CONFIG_64BIT + *inst = 
KVM_INST_STD | rt | (addr 0xfffc); +#else + *inst = KVM_INST_STW | rt | ((addr + 4) 0xfffc); +#endif +} + +static void kvm_patch_ins_stw(u32 *inst, long addr, u32 rt) +{ + *inst = KVM_INST_STW | rt | (addr 0xfffc); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -48,6 +105,60 @@ static void kvm_check_ins(u32 *inst) u32 inst_rt = _inst KVM_MASK_RT; switch (inst_no_rt) { + /* Loads */ + case KVM_INST_MFMSR: + kvm_patch_ins_ld(inst, magic_var(msr), inst_rt); + break; + case KVM_INST_MFSPR_SPRG0: + kvm_patch_ins_ld(inst, magic_var(sprg0), inst_rt); + break; + case KVM_INST_MFSPR_SPRG1: + kvm_patch_ins_ld(inst, magic_var(sprg1), inst_rt); + break; + case KVM_INST_MFSPR_SPRG2: + kvm_patch_ins_ld(inst, magic_var(sprg2), inst_rt); + break; + case KVM_INST_MFSPR_SPRG3: + kvm_patch_ins_ld(inst, magic_var(sprg3), inst_rt); + break; + case KVM_INST_MFSPR_SRR0: + kvm_patch_ins_ld(inst, magic_var(srr0), inst_rt); + break; + case KVM_INST_MFSPR_SRR1: + kvm_patch_ins_ld(inst, magic_var(srr1), inst_rt); + break; + case KVM_INST_MFSPR_DAR: + kvm_patch_ins_ld(inst, magic_var(dar), inst_rt); + break; + case KVM_INST_MFSPR_DSISR: + kvm_patch_ins_lwz(inst, magic_var(dsisr), inst_rt); + break; + + /* Stores */ + case KVM_INST_MTSPR_SPRG0: + kvm_patch_ins_std(inst, magic_var(sprg0), inst_rt); + break; + case KVM_INST_MTSPR_SPRG1: + kvm_patch_ins_std(inst, magic_var(sprg1), inst_rt); + break; + case KVM_INST_MTSPR_SPRG2: + kvm_patch_ins_std(inst, magic_var(sprg2), inst_rt); + break; + case KVM_INST_MTSPR_SPRG3: + kvm_patch_ins_std(inst, magic_var(sprg3), inst_rt); + break; + case KVM_INST_MTSPR_SRR0: + kvm_patch_ins_std(inst, magic_var(srr0), inst_rt); + break; + case KVM_INST_MTSPR_SRR1: + kvm_patch_ins_std(inst, magic_var(srr1), inst_rt); + break; + case KVM_INST_MTSPR_DAR: + kvm_patch_ins_std(inst, magic_var(dar), inst_rt); + break; + case KVM_INST_MTSPR_DSISR: + kvm_patch_ins_stw(inst, magic_var(dsisr), inst_rt); + break; } switch (_inst) 
{
--
1.6.0.2
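The patching logic in the hunk above can be sketched in plain user-space C: mask the RT field out of the instruction word, match the remainder against a known privileged opcode, then splice RT into an ordinary load from the magic page. The mfmsr and lwz encodings below are standard PowerPC, but the `MAGIC_MSR_OFFS` offset is a made-up illustration, not the real shared-page layout.

```c
#include <assert.h>
#include <stdint.h>

/* Standard PowerPC encodings; constant names mirror the patch. */
#define KVM_INST_MFMSR  0x7c0000a6u   /* mfmsr rt                      */
#define KVM_INST_LWZ    0x80000000u   /* lwz rt, d(r0)                 */
#define KVM_MASK_RT     0x03e00000u   /* the 5-bit RT field, bits 6-10 */
#define KVM_MAGIC_PAGE  (-4096l)      /* magic page at top of EA space */

/* Hypothetical offset of the msr field inside the shared page. */
#define MAGIC_MSR_OFFS  0x10

static void kvm_patch_ins_lwz(uint32_t *inst, long addr, uint32_t rt)
{
	/* Keep RT, swap the opcode for a load from (addr)(r0). The low
	 * 16 bits of the negative magic address become the signed
	 * displacement; base register 0 means "literal zero" on PPC. */
	*inst = KVM_INST_LWZ | rt | (addr & 0xfffc);
}

static uint32_t check_ins(uint32_t inst)
{
	uint32_t inst_no_rt = inst & ~KVM_MASK_RT;
	uint32_t inst_rt = inst & KVM_MASK_RT;

	if (inst_no_rt == KVM_INST_MFMSR)
		kvm_patch_ins_lwz(&inst, KVM_MAGIC_PAGE + MAGIC_MSR_OFFS,
				  inst_rt);
	return inst;
}
```

So `mfmsr r5` (0x7ca000a6) becomes `lwz r5, -4080(0)`, while anything else passes through untouched.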
[PATCH 22/26] KVM: PPC: PV assembler helpers
When we hook an instruction we need to make sure we don't clobber any of
the registers at that point. So we write them out to scratch space in the
magic page. To make sure we don't fall into a race with another piece of
hooked code, we need to disable interrupts.

To make the later patches and the code in general easier to read, let's
introduce a set of defines that save and restore r30, r31 and cr. Let's
also define some helpers to read the lower 32 bits of a 64 bit field on
32 bit systems.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kernel/kvm_emul.S |   29 +
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S
index c7b9fc9..7da835a 100644
--- a/arch/powerpc/kernel/kvm_emul.S
+++ b/arch/powerpc/kernel/kvm_emul.S
@@ -25,3 +25,32 @@

 #define KVM_MAGIC_PAGE		(-4096)

+#ifdef CONFIG_64BIT
+#define LL64(reg, offs, reg2)	ld	reg, (offs)(reg2)
+#define STL64(reg, offs, reg2)	std	reg, (offs)(reg2)
+#else
+#define LL64(reg, offs, reg2)	lwz	reg, (offs + 4)(reg2)
+#define STL64(reg, offs, reg2)	stw	reg, (offs + 4)(reg2)
+#endif
+
+#define SCRATCH_SAVE							\
+	/* Enable critical section. We are critical if			\
+	   shared->critical == r1 */					\
+	STL64(r1, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0);		\
+									\
+	/* Save state */						\
+	PPC_STL	r31, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH1)(0);		\
+	PPC_STL	r30, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH2)(0);		\
+	mfcr	r31;							\
+	stw	r31, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH3)(0);
+
+#define SCRATCH_RESTORE							\
+	/* Restore state */						\
+	PPC_LL	r31, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH1)(0);		\
+	lwz	r30, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH3)(0);		\
+	mtcr	r30;							\
+	PPC_LL	r30, (KVM_MAGIC_PAGE + KVM_MAGIC_SCRATCH2)(0);		\
+									\
+	/* Disable critical section. We are critical if			\
+	   shared->critical == r1 and r2 is always != r1 */		\
+	STL64(r2, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0);
--
1.6.0.2
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
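The `LL64`/`STL64` helpers in this patch lean on big-endian layout: on a 32-bit guest, the low word of a 64-bit shared-page field sits at `offs + 4`. A small host-side C sketch of that assumption (byte-order handling is hand-rolled here purely for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Lay out a 64-bit value big-endian in a byte buffer, the way a
 * 64-bit PPC std would place it in the magic page. */
static void store64_be(uint8_t *buf, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		buf[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* What a 32-bit guest's "lwz reg, (offs + 4)(reg2)" sees: the four
 * bytes starting at offset 4 are exactly the low 32 bits. */
static uint32_t load_low_word_be(const uint8_t *buf)
{
	return ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
	       ((uint32_t)buf[6] << 8)  |  (uint32_t)buf[7];
}

/* Round-trip helper used for checking the claim. */
static uint32_t low_word_roundtrip(uint64_t v)
{
	uint8_t buf[8];

	store64_be(buf, v);
	return load_low_word_be(buf);
}
```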
[PATCH 13/26] KVM: PPC: Magic Page Book3s support
We need to override EA as well as PA lookups for the magic page. When the guest tells us to project it, the magic page overrides any guest mappings. In order to reflect that, we need to hook into all the MMU layers of KVM to force map the magic page if necessary. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/book3s.c |7 +++ arch/powerpc/kvm/book3s_32_mmu.c | 16 arch/powerpc/kvm/book3s_32_mmu_host.c | 12 arch/powerpc/kvm/book3s_64_mmu.c | 30 +- arch/powerpc/kvm/book3s_64_mmu_host.c | 12 5 files changed, 76 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 2f55aa5..6ce7fa1 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -551,6 +551,13 @@ mmio: static int kvmppc_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn) { + ulong mp_pa = vcpu-arch.magic_page_pa; + + if (unlikely(mp_pa) + unlikely((mp_pa KVM_RMO) PAGE_SHIFT == gfn)) { + return 1; + } + return kvm_is_visible_gfn(vcpu-kvm, gfn); } diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index 41130c8..d2bd1a6 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -281,8 +281,24 @@ static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data) { int r; + ulong mp_ea = vcpu-arch.magic_page_ea; pte-eaddr = eaddr; + + /* Magic page override */ + if (unlikely(mp_ea) + unlikely((eaddr ~0xfffULL) == (mp_ea ~0xfffULL)) + !(vcpu-arch.shared-msr MSR_PR)) { + pte-vpage = kvmppc_mmu_book3s_32_ea_to_vp(vcpu, eaddr, data); + pte-raddr = vcpu-arch.magic_page_pa | (pte-raddr 0xfff); + pte-raddr = KVM_RMO; + pte-may_execute = true; + pte-may_read = true; + pte-may_write = true; + + return 0; + } + r = kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, pte, data); if (r 0) r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte, data, true); diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c index 
67b8c38..658d3e0 100644 --- a/arch/powerpc/kvm/book3s_32_mmu_host.c +++ b/arch/powerpc/kvm/book3s_32_mmu_host.c @@ -145,6 +145,16 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte) bool primary = false; bool evict = false; struct hpte_cache *pte; + ulong mp_pa = vcpu-arch.magic_page_pa; + + /* Magic page override */ + if (unlikely(mp_pa) + unlikely((orig_pte-raddr ~0xfffUL KVM_RMO) == +(mp_pa ~0xfffUL KVM_RMO))) { + hpaddr = (pfn_t)virt_to_phys(vcpu-arch.shared); + get_page(pfn_to_page(hpaddr PAGE_SHIFT)); + goto mapped; + } /* Get host physical address for gpa */ hpaddr = gfn_to_pfn(vcpu-kvm, orig_pte-raddr PAGE_SHIFT); @@ -155,6 +165,8 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte) } hpaddr = PAGE_SHIFT; +mapped: + /* and write the mapping ea - hpa into the pt */ vcpu-arch.mmu.esid_to_vsid(vcpu, orig_pte-eaddr SID_SHIFT, vsid); map = find_sid_vsid(vcpu, vsid); diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c index 58aa840..4a2e5fc 100644 --- a/arch/powerpc/kvm/book3s_64_mmu.c +++ b/arch/powerpc/kvm/book3s_64_mmu.c @@ -163,6 +163,22 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, bool found = false; bool perm_err = false; int second = 0; + ulong mp_ea = vcpu-arch.magic_page_ea; + + /* Magic page override */ + if (unlikely(mp_ea) + unlikely((eaddr ~0xfffULL) == (mp_ea ~0xfffULL)) + !(vcpu-arch.shared-msr MSR_PR)) { + gpte-eaddr = eaddr; + gpte-vpage = kvmppc_mmu_book3s_64_ea_to_vp(vcpu, eaddr, data); + gpte-raddr = vcpu-arch.magic_page_pa | (gpte-raddr 0xfff); + gpte-raddr = KVM_RMO; + gpte-may_execute = true; + gpte-may_read = true; + gpte-may_write = true; + + return 0; + } slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, eaddr); if (!slbe) @@ -445,6 +461,7 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid, ulong ea = esid SID_SHIFT; struct kvmppc_slb *slb; u64 gvsid = esid; + ulong mp_ea = 
vcpu-arch.magic_page_ea; if (vcpu-arch.shared-msr (MSR_DR|MSR_IR)) { slb = kvmppc_mmu_book3s_64_find_slbe(to_book3s(vcpu), ea); @@ -464,7 +481,7 @@ static int
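The magic page override condition used in both the 32-bit and 64-bit xlate paths boils down to a page-granular address compare, gated on a magic EA being configured and the guest running in supervisor mode. Sketched standalone in C (MSR_PR's bit position is the architectural one; the addresses in the checks are made up):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_PR (1ull << 14)	/* problem state (user mode) bit */

/* Magic page override check, as in kvmppc_mmu_book3s_*_xlate():
 * fire only when a magic EA is set, the access lands in that 4k
 * page, and the guest is not in user mode. */
static bool magic_page_hit(uint64_t eaddr, uint64_t mp_ea, uint64_t msr)
{
	return mp_ea &&
	       (eaddr & ~0xfffull) == (mp_ea & ~0xfffull) &&
	       !(msr & MSR_PR);
}
```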
[PATCH 07/26] KVM: PPC: Implement hypervisor interface
To communicate with KVM directly we need to plumb some sort of interface between the guest and KVM. Usually those interfaces use hypercalls. This hypercall implementation is described in the last patch of the series in a special documentation file. Please read that for further information. This patch implements stubs to handle KVM PPC hypercalls on the host and guest side alike. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h | 100 ++- arch/powerpc/include/asm/kvm_ppc.h |1 + arch/powerpc/kvm/book3s.c | 10 +++- arch/powerpc/kvm/booke.c| 11 - arch/powerpc/kvm/emulate.c | 11 - arch/powerpc/kvm/powerpc.c | 28 ++ include/linux/kvm_para.h|1 + 7 files changed, 156 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index e402999..eaab306 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -34,16 +34,112 @@ struct kvm_vcpu_arch_shared { __u32 dsisr; }; +#define KVM_PVR_PARA 0x4b564d3f /* KVM? 
*/ +#define KVM_SC_MAGIC_R30x4b564d52 /* KVMR */ +#define KVM_SC_MAGIC_R40x554c455a /* ULEZ */ + #ifdef __KERNEL__ static inline int kvm_para_available(void) { - return 0; + unsigned long pvr = KVM_PVR_PARA; + + asm volatile(mfpvr %0 : =r(pvr) : 0(pvr)); + return pvr == KVM_PVR_PARA; +} + +static inline long kvm_hypercall0(unsigned int nr) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr) +: memory); + + return r3; } +static inline long kvm_hypercall1(unsigned int nr, unsigned long p1) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + unsigned long register _p1 asm(r6) = p1; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr), r(_p1) +: memory); + + return r3; +} + +static inline long kvm_hypercall2(unsigned int nr, unsigned long p1, + unsigned long p2) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + unsigned long register _p1 asm(r6) = p1; + unsigned long register _p2 asm(r7) = p2; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr), r(_p1), r(_p2) +: memory); + + return r3; +} + +static inline long kvm_hypercall3(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + unsigned long register _p1 asm(r6) = p1; + unsigned long register _p2 asm(r7) = p2; + unsigned long register _p3 asm(r8) = p3; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr), r(_p1), r(_p2), r(_p3) +: memory); + + return r3; +} + +static inline long kvm_hypercall4(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3, + unsigned 
long p4) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + unsigned long register _p1 asm(r6) = p1; + unsigned long register _p2 asm(r7) = p2; + unsigned long register _p3 asm(r8) = p3; + unsigned long register _p4 asm(r9) = p4; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr), r(_p1), r(_p2), r(_p3), + r(_p4) +: memory); + + return r3; +} + + static inline unsigned int kvm_arch_para_features(void) { - return 0; + if (!kvm_para_available()) + return 0; + + return kvm_hypercall0(KVM_HC_FEATURES); } #endif /* __KERNEL__ */ diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 18d139e..ecb3bc7 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -107,6 +107,7 @@ extern int kvmppc_booke_init(void); extern void kvmppc_booke_exit(void); extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu); +extern int kvmppc_kvm_pv(struct kvm_vcpu *vcpu);
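The magic values the hypercall ABI loads into r3 and r4 are just ASCII tags, which makes the "KVM?"/"KVMR"/"ULEZ" comments in the header easy to verify with a few lines of C:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KVM_PVR_PARA	0x4b564d3f	/* "KVM?" */
#define KVM_SC_MAGIC_R3	0x4b564d52	/* "KVMR" */
#define KVM_SC_MAGIC_R4	0x554c455a	/* "ULEZ" */

/* Decode a 32-bit magic into its four ASCII bytes, MSB first. */
static void magic_to_ascii(uint32_t magic, char out[5])
{
	out[0] = (char)(magic >> 24);
	out[1] = (char)(magic >> 16);
	out[2] = (char)(magic >> 8);
	out[3] = (char)magic;
	out[4] = '\0';
}

static int magic_is(uint32_t magic, const char *tag)
{
	char s[5];

	magic_to_ascii(magic, s);
	return strcmp(s, tag) == 0;
}
```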
[PATCH 03/26] KVM: PPC: Convert DSISR to shared page
The DSISR register contains information about a data page fault. It is fully read/write from inside the guest context and we don't need to worry about interacting based on writes of this register. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_book3s.h|1 - arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c| 11 ++- arch/powerpc/kvm/book3s_emulate.c|6 +++--- arch/powerpc/kvm/book3s_paired_singles.c |2 +- 5 files changed, 11 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index a96e405..4f29caa 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -85,7 +85,6 @@ struct kvmppc_vcpu_book3s { u64 hid[6]; u64 gqr[8]; int slb_nr; - u32 dsisr; u64 sdr1; u64 hior; u64 msr_mask; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index a17dc52..9f7565b 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -24,6 +24,7 @@ struct kvm_vcpu_arch_shared { __u64 msr; + __u32 dsisr; }; #ifdef __KERNEL__ diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 3dd3003..57fd73e 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -595,15 +595,16 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, if (page_found == -ENOENT) { /* Page not found in guest PTE entries */ vcpu-arch.dear = kvmppc_get_fault_dar(vcpu); - to_book3s(vcpu)-dsisr = to_svcpu(vcpu)-fault_dsisr; + vcpu-arch.shared-dsisr = to_svcpu(vcpu)-fault_dsisr; vcpu-arch.shared-msr |= (to_svcpu(vcpu)-shadow_srr1 0xf800ULL); kvmppc_book3s_queue_irqprio(vcpu, vec); } else if (page_found == -EPERM) { /* Storage protection */ vcpu-arch.dear = kvmppc_get_fault_dar(vcpu); - to_book3s(vcpu)-dsisr = to_svcpu(vcpu)-fault_dsisr ~DSISR_NOHPTE; - to_book3s(vcpu)-dsisr |= 
DSISR_PROTFAULT; + vcpu-arch.shared-dsisr = + to_svcpu(vcpu)-fault_dsisr ~DSISR_NOHPTE; + vcpu-arch.shared-dsisr |= DSISR_PROTFAULT; vcpu-arch.shared-msr |= (to_svcpu(vcpu)-shadow_srr1 0xf800ULL); kvmppc_book3s_queue_irqprio(vcpu, vec); @@ -867,7 +868,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, r = kvmppc_handle_pagefault(run, vcpu, dar, exit_nr); } else { vcpu-arch.dear = dar; - to_book3s(vcpu)-dsisr = to_svcpu(vcpu)-fault_dsisr; + vcpu-arch.shared-dsisr = to_svcpu(vcpu)-fault_dsisr; kvmppc_book3s_queue_irqprio(vcpu, exit_nr); kvmppc_mmu_pte_flush(vcpu, vcpu-arch.dear, ~0xFFFUL); r = RESUME_GUEST; @@ -994,7 +995,7 @@ program_interrupt: } case BOOK3S_INTERRUPT_ALIGNMENT: if (kvmppc_read_inst(vcpu) == EMULATE_DONE) { - to_book3s(vcpu)-dsisr = kvmppc_alignment_dsisr(vcpu, + vcpu-arch.shared-dsisr = kvmppc_alignment_dsisr(vcpu, kvmppc_get_last_inst(vcpu)); vcpu-arch.dear = kvmppc_alignment_dar(vcpu, kvmppc_get_last_inst(vcpu)); diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 35d3c16..9982ff1 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -221,7 +221,7 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, else if (r == -EPERM) dsisr |= DSISR_PROTFAULT; - to_book3s(vcpu)-dsisr = dsisr; + vcpu-arch.shared-dsisr = dsisr; to_svcpu(vcpu)-fault_dsisr = dsisr; kvmppc_book3s_queue_irqprio(vcpu, @@ -327,7 +327,7 @@ int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs) to_book3s(vcpu)-sdr1 = spr_val; break; case SPRN_DSISR: - to_book3s(vcpu)-dsisr = spr_val; + vcpu-arch.shared-dsisr = spr_val; break; case SPRN_DAR: vcpu-arch.dear = spr_val; @@ -440,7 +440,7 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt) kvmppc_set_gpr(vcpu, rt, to_book3s(vcpu)-sdr1);
[PATCH 00/26] KVM PPC PV framework
On PPC we run PR=0 (kernel mode) code in PR=1 (user mode) and don't use
the hypervisor extensions.

While that is all great to show that virtualization is possible, there
are quite a few cases where the emulation overhead of privileged
instructions is killing performance.

This patchset tackles exactly that issue. It introduces a paravirtual
framework with which KVM and Linux share a page to exchange register
state. That way we don't have to switch to the hypervisor just to change
the value of a privileged register.

To prove my point, I ran the same test I did for the MMU optimizations
against the PV framework. Here are the results:

[without]

debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done

real    0m14.659s
user    0m8.967s
sys     0m5.688s

[with]

debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done

real    0m7.557s
user    0m4.121s
sys     0m3.426s

So this is a significant performance improvement! I'm quite happy how
fast this whole thing becomes :)

I tried to take all comments I've heard from people so far about such a
PV framework into account. In case you told me something before that is
a no-go and I still did it, please just tell me again.

Now go and have fun with fast VMs on PPC! Get yourself a G5 on ebay and
start experiencing the power yourself. - heh

Alexander Graf (26):
  KVM: PPC: Introduce shared page
  KVM: PPC: Convert MSR to shared page
  KVM: PPC: Convert DSISR to shared page
  KVM: PPC: Convert DAR to shared page.
  KVM: PPC: Convert SRR0 and SRR1 to shared page
  KVM: PPC: Convert SPRG[0-4] to shared page
  KVM: PPC: Implement hypervisor interface
  KVM: PPC: Add PV guest critical sections
  KVM: PPC: Add PV guest scratch registers
  KVM: PPC: Tell guest about pending interrupts
  KVM: PPC: Make RMO a define
  KVM: PPC: First magic page steps
  KVM: PPC: Magic Page Book3s support
  KVM: PPC: Magic Page BookE support
  KVM: PPC: Expose magic page support to guest
  KVM: Move kvm_guest_init out of generic code
  KVM: PPC: Generic KVM PV guest support
  KVM: PPC: KVM PV guest stubs
  KVM: PPC: PV instructions to loads and stores
  KVM: PPC: PV tlbsync to nop
  KVM: PPC: Introduce kvm_tmp framework
  KVM: PPC: PV assembler helpers
  KVM: PPC: PV mtmsrd L=1
  KVM: PPC: PV mtmsrd L=0 and mtmsr
  KVM: PPC: PV wrteei
  KVM: PPC: Add Documentation about PV interface

 Documentation/kvm/ppc-pv.txt             |  164
 arch/powerpc/include/asm/kvm_book3s.h    |    1 -
 arch/powerpc/include/asm/kvm_host.h      |   14 +-
 arch/powerpc/include/asm/kvm_para.h      |  121 +-
 arch/powerpc/include/asm/kvm_ppc.h       |    1 +
 arch/powerpc/kernel/Makefile             |    2 +
 arch/powerpc/kernel/asm-offsets.c        |   18 ++-
 arch/powerpc/kernel/kvm.c                |  399 ++
 arch/powerpc/kernel/kvm_emul.S           |  237 ++
 arch/powerpc/kvm/44x.c                   |    7 +
 arch/powerpc/kvm/44x_tlb.c               |    8 +-
 arch/powerpc/kvm/book3s.c                |  162 -
 arch/powerpc/kvm/book3s_32_mmu.c         |   28 ++-
 arch/powerpc/kvm/book3s_32_mmu_host.c    |   16 +-
 arch/powerpc/kvm/book3s_64_mmu.c         |   42 +++-
 arch/powerpc/kvm/book3s_64_mmu_host.c    |   16 +-
 arch/powerpc/kvm/book3s_emulate.c        |   25 +-
 arch/powerpc/kvm/book3s_paired_singles.c |   11 +-
 arch/powerpc/kvm/booke.c                 |  110 +++--
 arch/powerpc/kvm/booke.h                 |    6 +-
 arch/powerpc/kvm/booke_emulate.c         |   14 +-
 arch/powerpc/kvm/booke_interrupts.S      |    3 +-
 arch/powerpc/kvm/e500.c                  |    7 +
 arch/powerpc/kvm/e500_tlb.c              |   31 ++-
 arch/powerpc/kvm/e500_tlb.h              |    2 +-
 arch/powerpc/kvm/emulate.c               |   47 +++-
 arch/powerpc/kvm/powerpc.c               |   42 +++-
 arch/powerpc/platforms/Kconfig           |   10 +
 arch/x86/include/asm/kvm_para.h          |    6 +
 include/linux/kvm_para.h                 |    7 +-
 30 files changed, 1383 insertions(+), 174 deletions(-)
 create mode 100644 Documentation/kvm/ppc-pv.txt
 create mode 100644 arch/powerpc/kernel/kvm.c
 create mode 100644 arch/powerpc/kernel/kvm_emul.S
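For the record, the quoted wall-clock numbers work out to roughly a 1.9x speedup (14.659s down to 7.557s), with sys time dropping by about 40%. A trivial integer check of that arithmetic:

```c
#include <assert.h>

/* Speedup of after vs. before, times 100, from times in milliseconds.
 * Integer math only, so the result is truncated (193 means 1.93x). */
static int speedup_x100(int before_ms, int after_ms)
{
	return (before_ms * 100) / after_ms;
}
```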
[PATCH 05/26] KVM: PPC: Convert SRR0 and SRR1 to shared page
The SRR0 and SRR1 registers contain cached values of the PC and MSR respectively. They get written to by the hypervisor when an interrupt occurs or directly by the kernel. They are also used to tell the rfi(d) instruction where to jump to. Because it only gets touched on defined events that, it's very simple to share with the guest. Hypervisor and guest both have full r/w access. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |2 -- arch/powerpc/include/asm/kvm_para.h |2 ++ arch/powerpc/kvm/book3s.c | 12 ++-- arch/powerpc/kvm/book3s_emulate.c |4 ++-- arch/powerpc/kvm/booke.c| 15 --- arch/powerpc/kvm/booke_emulate.c|4 ++-- arch/powerpc/kvm/emulate.c | 12 7 files changed, 28 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 108dabc..6bcf62f 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -224,8 +224,6 @@ struct kvm_vcpu_arch { ulong sprg5; ulong sprg6; ulong sprg7; - ulong srr0; - ulong srr1; ulong csrr0; ulong csrr1; ulong dsrr0; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index ec72a1c..d7fc6c2 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,8 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 srr0; + __u64 srr1; __u64 dar; __u64 msr; __u32 dsisr; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 245bd2d..b144697 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -162,8 +162,8 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr) void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags) { - vcpu-arch.srr0 = kvmppc_get_pc(vcpu); - vcpu-arch.srr1 = vcpu-arch.shared-msr | flags; + vcpu-arch.shared-srr0 = kvmppc_get_pc(vcpu); + vcpu-arch.shared-srr1 = vcpu-arch.shared-msr | 
flags; kvmppc_set_pc(vcpu, to_book3s(vcpu)-hior + vec); vcpu-arch.mmu.reset_msr(vcpu); } @@ -1059,8 +1059,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-lr = kvmppc_get_lr(vcpu); regs-xer = kvmppc_get_xer(vcpu); regs-msr = vcpu-arch.shared-msr; - regs-srr0 = vcpu-arch.srr0; - regs-srr1 = vcpu-arch.srr1; + regs-srr0 = vcpu-arch.shared-srr0; + regs-srr1 = vcpu-arch.shared-srr1; regs-pid = vcpu-arch.pid; regs-sprg0 = vcpu-arch.sprg0; regs-sprg1 = vcpu-arch.sprg1; @@ -1086,8 +1086,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_lr(vcpu, regs-lr); kvmppc_set_xer(vcpu, regs-xer); kvmppc_set_msr(vcpu, regs-msr); - vcpu-arch.srr0 = regs-srr0; - vcpu-arch.srr1 = regs-srr1; + vcpu-arch.shared-srr0 = regs-srr0; + vcpu-arch.shared-srr1 = regs-srr1; vcpu-arch.sprg0 = regs-sprg0; vcpu-arch.sprg1 = regs-sprg1; vcpu-arch.sprg2 = regs-sprg2; diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index c147864..f333cb4 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -73,8 +73,8 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, switch (get_xop(inst)) { case OP_19_XOP_RFID: case OP_19_XOP_RFI: - kvmppc_set_pc(vcpu, vcpu-arch.srr0); - kvmppc_set_msr(vcpu, vcpu-arch.srr1); + kvmppc_set_pc(vcpu, vcpu-arch.shared-srr0); + kvmppc_set_msr(vcpu, vcpu-arch.shared-srr1); *advance = 0; break; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 5844bcf..8b546fe 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -64,7 +64,8 @@ void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu) printk(pc: %08lx msr: %08llx\n, vcpu-arch.pc, vcpu-arch.shared-msr); printk(lr: %08lx ctr: %08lx\n, vcpu-arch.lr, vcpu-arch.ctr); - printk(srr0: %08lx srr1: %08lx\n, vcpu-arch.srr0, vcpu-arch.srr1); + printk(srr0: %08llx srr1: %08llx\n, vcpu-arch.shared-srr0, + vcpu-arch.shared-srr1); printk(exceptions: 
%08lx\n, vcpu-arch.pending_exceptions); @@ -189,8 +190,8 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, } if (allowed) { - vcpu-arch.srr0 = vcpu-arch.pc; - vcpu-arch.srr1 = vcpu-arch.shared-msr; + vcpu-arch.shared-srr0 = vcpu-arch.pc; +
[PATCH 02/26] KVM: PPC: Convert MSR to shared page
One of the most obvious registers to share with the guest directly is the MSR. The MSR contains the interrupts enabled flag which the guest has to toggle in critical sections. So in order to bring the overhead of interrupt en- and disabling down, let's put msr into the shared page. Keep in mind that even though you can fully read its contents, writing to it doesn't always update all state. There are a few safe fields that don't require hypervisor interaction. See the guest implementation that follows later for reference. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |1 - arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kernel/asm-offsets.c|2 +- arch/powerpc/kvm/44x_tlb.c |8 ++-- arch/powerpc/kvm/book3s.c| 65 -- arch/powerpc/kvm/book3s_32_mmu.c | 12 +++--- arch/powerpc/kvm/book3s_32_mmu_host.c|4 +- arch/powerpc/kvm/book3s_64_mmu.c | 12 +++--- arch/powerpc/kvm/book3s_64_mmu_host.c|4 +- arch/powerpc/kvm/book3s_emulate.c|9 ++-- arch/powerpc/kvm/book3s_paired_singles.c |7 ++- arch/powerpc/kvm/booke.c | 20 +- arch/powerpc/kvm/booke.h |6 +- arch/powerpc/kvm/booke_emulate.c |6 +- arch/powerpc/kvm/booke_interrupts.S |3 +- arch/powerpc/kvm/e500_tlb.c | 12 +++--- arch/powerpc/kvm/e500_tlb.h |2 +- arch/powerpc/kvm/powerpc.c |3 +- 18 files changed, 93 insertions(+), 84 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index bca9391..249c242 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -210,7 +210,6 @@ struct kvm_vcpu_arch { u32 cr; #endif - ulong msr; #ifdef CONFIG_PPC_BOOK3S ulong shadow_msr; ulong hflags; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 1485ba8..a17dc52 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,7 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 msr; }; #ifdef __KERNEL__ diff --git 
a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 944f593..a55d47e 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -394,13 +394,13 @@ int main(void) DEFINE(VCPU_HOST_STACK, offsetof(struct kvm_vcpu, arch.host_stack)); DEFINE(VCPU_HOST_PID, offsetof(struct kvm_vcpu, arch.host_pid)); DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, arch.gpr)); - DEFINE(VCPU_MSR, offsetof(struct kvm_vcpu, arch.msr)); DEFINE(VCPU_SPRG4, offsetof(struct kvm_vcpu, arch.sprg4)); DEFINE(VCPU_SPRG5, offsetof(struct kvm_vcpu, arch.sprg5)); DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6)); DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7)); DEFINE(VCPU_SHADOW_PID, offsetof(struct kvm_vcpu, arch.shadow_pid)); DEFINE(VCPU_SHARED, offsetof(struct kvm_vcpu, arch.shared)); + DEFINE(VCPU_SHARED_MSR, offsetof(struct kvm_vcpu_arch_shared, msr)); /* book3s */ #ifdef CONFIG_PPC_BOOK3S diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c index 8123125..4cbbca7 100644 --- a/arch/powerpc/kvm/44x_tlb.c +++ b/arch/powerpc/kvm/44x_tlb.c @@ -221,14 +221,14 @@ gpa_t kvmppc_mmu_xlate(struct kvm_vcpu *vcpu, unsigned int gtlb_index, int kvmppc_mmu_itlb_index(struct kvm_vcpu *vcpu, gva_t eaddr) { - unsigned int as = !!(vcpu-arch.msr MSR_IS); + unsigned int as = !!(vcpu-arch.shared-msr MSR_IS); return kvmppc_44x_tlb_index(vcpu, eaddr, vcpu-arch.pid, as); } int kvmppc_mmu_dtlb_index(struct kvm_vcpu *vcpu, gva_t eaddr) { - unsigned int as = !!(vcpu-arch.msr MSR_DS); + unsigned int as = !!(vcpu-arch.shared-msr MSR_DS); return kvmppc_44x_tlb_index(vcpu, eaddr, vcpu-arch.pid, as); } @@ -353,7 +353,7 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gpa_t gpaddr, stlbe.word1 = (hpaddr 0xfc00) | ((hpaddr 32) 0xf); stlbe.word2 = kvmppc_44x_tlb_shadow_attrib(flags, - vcpu-arch.msr MSR_PR); + vcpu-arch.shared-msr MSR_PR); stlbe.tid = !(asid 0xff); /* Keep track of the reference so we can properly release it later. 
*/ @@ -422,7 +422,7 @@ static int tlbe_is_host_safe(const struct kvm_vcpu *vcpu, /* Does it match current guest AS? */ /* XXX what about IS != DS? */ - if (get_tlb_ts(tlbe) != !!(vcpu-arch.msr MSR_IS)) + if (get_tlb_ts(tlbe) != !!(vcpu-arch.shared-msr MSR_IS)) return 0; gpa = get_tlb_raddr(tlbe); diff --git
[PATCH 04/26] KVM: PPC: Convert DAR to shared page.
The DAR register contains the address a data page fault occured at. This register behaves pretty much like a simple data storage register that gets written to on data faults. There is no hypervisor interaction required on read or write. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |1 - arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c| 14 +++--- arch/powerpc/kvm/book3s_emulate.c|6 +++--- arch/powerpc/kvm/book3s_paired_singles.c |2 +- arch/powerpc/kvm/booke.c |2 +- arch/powerpc/kvm/booke_emulate.c |4 ++-- 7 files changed, 15 insertions(+), 15 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 249c242..108dabc 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -230,7 +230,6 @@ struct kvm_vcpu_arch { ulong csrr1; ulong dsrr0; ulong dsrr1; - ulong dear; ulong esr; u32 dec; u32 decar; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 9f7565b..ec72a1c 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,7 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 dar; __u64 msr; __u32 dsisr; }; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 57fd73e..245bd2d 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -594,14 +594,14 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, if (page_found == -ENOENT) { /* Page not found in guest PTE entries */ - vcpu-arch.dear = kvmppc_get_fault_dar(vcpu); + vcpu-arch.shared-dar = kvmppc_get_fault_dar(vcpu); vcpu-arch.shared-dsisr = to_svcpu(vcpu)-fault_dsisr; vcpu-arch.shared-msr |= (to_svcpu(vcpu)-shadow_srr1 0xf800ULL); kvmppc_book3s_queue_irqprio(vcpu, vec); } else if (page_found == -EPERM) { /* Storage protection */ - vcpu-arch.dear = 
kvmppc_get_fault_dar(vcpu); + vcpu-arch.shared-dar = kvmppc_get_fault_dar(vcpu); vcpu-arch.shared-dsisr = to_svcpu(vcpu)-fault_dsisr ~DSISR_NOHPTE; vcpu-arch.shared-dsisr |= DSISR_PROTFAULT; @@ -610,7 +610,7 @@ int kvmppc_handle_pagefault(struct kvm_run *run, struct kvm_vcpu *vcpu, kvmppc_book3s_queue_irqprio(vcpu, vec); } else if (page_found == -EINVAL) { /* Page not found in guest SLB */ - vcpu-arch.dear = kvmppc_get_fault_dar(vcpu); + vcpu-arch.shared-dar = kvmppc_get_fault_dar(vcpu); kvmppc_book3s_queue_irqprio(vcpu, vec + 0x80); } else if (!is_mmio kvmppc_visible_gfn(vcpu, pte.raddr PAGE_SHIFT)) { @@ -867,17 +867,17 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, if (to_svcpu(vcpu)-fault_dsisr DSISR_NOHPTE) { r = kvmppc_handle_pagefault(run, vcpu, dar, exit_nr); } else { - vcpu-arch.dear = dar; + vcpu-arch.shared-dar = dar; vcpu-arch.shared-dsisr = to_svcpu(vcpu)-fault_dsisr; kvmppc_book3s_queue_irqprio(vcpu, exit_nr); - kvmppc_mmu_pte_flush(vcpu, vcpu-arch.dear, ~0xFFFUL); + kvmppc_mmu_pte_flush(vcpu, dar, ~0xFFFUL); r = RESUME_GUEST; } break; } case BOOK3S_INTERRUPT_DATA_SEGMENT: if (kvmppc_mmu_map_segment(vcpu, kvmppc_get_fault_dar(vcpu)) 0) { - vcpu-arch.dear = kvmppc_get_fault_dar(vcpu); + vcpu-arch.shared-dar = kvmppc_get_fault_dar(vcpu); kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DATA_SEGMENT); } @@ -997,7 +997,7 @@ program_interrupt: if (kvmppc_read_inst(vcpu) == EMULATE_DONE) { vcpu-arch.shared-dsisr = kvmppc_alignment_dsisr(vcpu, kvmppc_get_last_inst(vcpu)); - vcpu-arch.dear = kvmppc_alignment_dar(vcpu, + vcpu-arch.shared-dar = kvmppc_alignment_dar(vcpu, kvmppc_get_last_inst(vcpu)); kvmppc_book3s_queue_irqprio(vcpu, exit_nr); } diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index 9982ff1..c147864 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++
Re: [PATCH v2 3/4] KVM: cleanup: remove kvm_get_dirty_log()
On Fri, 25 Jun 2010 21:25:57 +0200 Alexander Graf ag...@suse.de wrote:

> This patch plus 4/4 broke dirty bitmap updating on PPC. I didn't get
> around to track down why, but I figured you should know. Is there any
> way to get you a PPC development box? A simple G4 or G5 should be 200$
> on ebay by now :).

I'm sorry, I thought this change was just a trivial code transformation
and that testing on x86 would be enough: but not actually. The reason is
probably around the ordering of copy_to_user() and the newly introduced
clear_user() for the clean slot.

> Alex
Re: [PATCH v2 3/4] KVM: cleanup: remove kvm_get_dirty_log()
> This patch plus 4/4 broke dirty bitmap updating on PPC. I didn't get
> around to track down why, but I figured you should know. Is there any
> way to get you a PPC development box? A simple G4 or G5 should be 200$
> on ebay by now :).

A simple G4 or G5, thanks for the info, I'll buy one. I hope I can
contribute a bit from there to kvm-ppc :).

> Alex
Where is the entry point of hypercalls in kvm
Hello, I am trying to understand the virtio mechanism in Linux. I read that the kick function notifies the host side about newly published buffers. I am looking especially at virtio_net. Once a packet is ready for transmission, the kick function is called. Where does it go from here? Which code contains the backend driver of virtio? Where is the code in the hypervisor that this kick ends up in? Thank you. Thanks, Bala
Re: [PATCH] kvm/ppc: fix build warning
On 06/25/2010 12:42 AM, Alexander Graf wrote: On 24.06.2010, at 21:44, Denis Kirjanov wrote: Fix build warning: arch/powerpc/kvm/book3s_64_mmu.c: In function 'kvmppc_mmu_book3s_64_esid_to_vsid': arch/powerpc/kvm/book3s_64_mmu.c:446: warning: 'slb' may be used uninitialized in this function Signed-off-by: Denis Kirjanov dkirja...@kernel.org Are you sure this isn't a broken compiler? I don't see where it could be used uninitialized. I'm using gcc version 4.3.4 (Gentoo 4.3.4 p1.1, pie-10.1.5). The slb pointer is initialized inside a conditional branch and used later in the case MSR_DR|MSR_IR. -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm/ppc: fix build warning
On 25.06.2010, at 11:02, Denis Kirjanov wrote: On 06/25/2010 12:42 AM, Alexander Graf wrote: On 24.06.2010, at 21:44, Denis Kirjanov wrote: Fix build warning: arch/powerpc/kvm/book3s_64_mmu.c: In function 'kvmppc_mmu_book3s_64_esid_to_vsid': arch/powerpc/kvm/book3s_64_mmu.c:446: warning: 'slb' may be used uninitialized in this function Signed-off-by: Denis Kirjanov dkirja...@kernel.org Are you sure this isn't a broken compiler? I don't see where it could be used uninitialized. I'm using gcc version 4.3.4 (Gentoo 4.3.4 p1.1, pie-10.1.5). The slb pointer is initialized inside a conditional branch and used later in the case MSR_DR|MSR_IR. Oh, I'm apparently looking at completely different code. The same function in git://git.kernel.org/pub/scm/virt/kvm/kvm.git is good. Which tree did you use? Alex
Re: [PATCH] kvm/ppc: fix build warning
On 06/25/2010 01:02 PM, Denis Kirjanov wrote: On 06/25/2010 12:42 AM, Alexander Graf wrote: On 24.06.2010, at 21:44, Denis Kirjanov wrote: Fix build warning: arch/powerpc/kvm/book3s_64_mmu.c: In function 'kvmppc_mmu_book3s_64_esid_to_vsid': arch/powerpc/kvm/book3s_64_mmu.c:446: warning: 'slb' may be used uninitialized in this function Signed-off-by: Denis Kirjanov dkirja...@kernel.org Are you sure this isn't a broken compiler? I don't see where it could be used uninitialized. I'm using gcc version 4.3.4 (Gentoo 4.3.4 p1.1, pie-10.1.5). The slb pointer is initialized inside a conditional branch and used later in the case MSR_DR|MSR_IR. This is based on the linux-next tree (-next-20100623).
Re: [PATCH v2 3/4] KVM: cleanup: remove kvm_get_dirty_log()
On 23.06.2010, at 08:01, Takuya Yoshikawa wrote: kvm_get_dirty_log() is a helper function for kvm_vm_ioctl_get_dirty_log() which is currently used by ia64 and ppc, and the following is what it does: - sanity checks - a bitmap scan to check if the slot is dirty - copy_to_user() Considering the fact that x86 is not using this anymore and sanity checks must be done before kvm_ia64_sync_dirty_log(), it is not effective for code sharing. So we just remove it. This patch plus 4/4 broke dirty bitmap updating on PPC. I didn't get around to tracking down why, but I figured you should know. Is there any way to get you a PPC development box? A simple G4 or G5 should be $200 on ebay by now :). Alex
[PATCH] KVM: PPC: Book3S_32 MMU debug compile fixes
Due to previous changes, the Book3S_32 guest MMU code didn't compile properly when enabling debugging. This patch repairs the broken code paths, making it possible to define DEBUG_MMU and friends again. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/book3s_32_mmu.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index 3292d76..079760b 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -104,7 +104,7 @@ static hva_t kvmppc_mmu_book3s_32_get_pteg(struct kvmppc_vcpu_book3s *vcpu_book3 pteg = (vcpu_book3s-sdr1 0x) | hash; dprintk(MMU: pc=0x%lx eaddr=0x%lx sdr1=0x%llx pteg=0x%x vsid=0x%x\n, - vcpu_book3s-vcpu.arch.pc, eaddr, vcpu_book3s-sdr1, pteg, + kvmppc_get_pc(vcpu_book3s-vcpu), eaddr, vcpu_book3s-sdr1, pteg, sre-vsid); r = gfn_to_hva(vcpu_book3s-vcpu.kvm, pteg PAGE_SHIFT); @@ -269,7 +269,7 @@ no_page_found: dprintk_pte(KVM MMU: No PTE found (sdr1=0x%llx ptegp=0x%lx)\n, to_book3s(vcpu)-sdr1, ptegp); for (i=0; i16; i+=2) { - dprintk_pte( %02d: 0x%x - 0x%x (0x%llx)\n, + dprintk_pte( %02d: 0x%x - 0x%x (0x%x)\n, i, pteg[i], pteg[i+1], ptem); } } -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: PPC: Make use of hash based Shadow MMU
We just introduced generic functions to handle shadow pages on PPC. This patch makes the respective backends make use of them, getting rid of a lot of duplicate code along the way. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_book3s.h |7 ++ arch/powerpc/include/asm/kvm_host.h | 18 +- arch/powerpc/kvm/Makefile |2 + arch/powerpc/kvm/book3s_32_mmu_host.c | 104 +++- arch/powerpc/kvm/book3s_64_mmu_host.c | 98 ++ 5 files changed, 41 insertions(+), 188 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 4e99559..a96e405 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -115,6 +115,13 @@ extern void kvmppc_mmu_book3s_32_init(struct kvm_vcpu *vcpu); extern int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *pte); extern int kvmppc_mmu_map_segment(struct kvm_vcpu *vcpu, ulong eaddr); extern void kvmppc_mmu_flush_segments(struct kvm_vcpu *vcpu); + +extern void kvmppc_mmu_hpte_cache_map(struct kvm_vcpu *vcpu, struct hpte_cache *pte); +extern struct hpte_cache *kvmppc_mmu_hpte_cache_next(struct kvm_vcpu *vcpu); +extern void kvmppc_mmu_hpte_destroy(struct kvm_vcpu *vcpu); +extern int kvmppc_mmu_hpte_init(struct kvm_vcpu *vcpu); +extern void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte); + extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data); extern int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr, bool data); extern void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec); diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 0c9ad86..895eb63 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -38,7 +38,13 @@ #define KVM_NR_PAGE_SIZES 1 #define KVM_PAGES_PER_HPAGE(x) (1UL31) -#define HPTEG_CACHE_NUM 1024 +#define HPTEG_CACHE_NUM(1 15) +#define 
HPTEG_HASH_BITS_PTE13 +#define HPTEG_HASH_BITS_VPTE 13 +#define HPTEG_HASH_BITS_VPTE_LONG 5 +#define HPTEG_HASH_NUM_PTE (1 HPTEG_HASH_BITS_PTE) +#define HPTEG_HASH_NUM_VPTE(1 HPTEG_HASH_BITS_VPTE) +#define HPTEG_HASH_NUM_VPTE_LONG (1 HPTEG_HASH_BITS_VPTE_LONG) struct kvm; struct kvm_run; @@ -151,6 +157,9 @@ struct kvmppc_mmu { }; struct hpte_cache { + struct list_head list_pte; + struct list_head list_vpte; + struct list_head list_vpte_long; u64 host_va; u64 pfn; ulong slot; @@ -282,8 +291,11 @@ struct kvm_vcpu_arch { unsigned long pending_exceptions; #ifdef CONFIG_PPC_BOOK3S - struct hpte_cache hpte_cache[HPTEG_CACHE_NUM]; - int hpte_cache_offset; + struct kmem_cache *hpte_cache; + struct list_head hpte_hash_pte[HPTEG_HASH_NUM_PTE]; + struct list_head hpte_hash_vpte[HPTEG_HASH_NUM_VPTE]; + struct list_head hpte_hash_vpte_long[HPTEG_HASH_NUM_VPTE_LONG]; + int hpte_cache_count; #endif }; diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile index ff43606..d45c818 100644 --- a/arch/powerpc/kvm/Makefile +++ b/arch/powerpc/kvm/Makefile @@ -45,6 +45,7 @@ kvm-book3s_64-objs := \ book3s.o \ book3s_emulate.o \ book3s_interrupts.o \ + book3s_mmu_hpte.o \ book3s_64_mmu_host.o \ book3s_64_mmu.o \ book3s_32_mmu.o @@ -57,6 +58,7 @@ kvm-book3s_32-objs := \ book3s.o \ book3s_emulate.o \ book3s_interrupts.o \ + book3s_mmu_hpte.o \ book3s_32_mmu_host.o \ book3s_32_mmu.o kvm-objs-$(CONFIG_KVM_BOOK3S_32) := $(kvm-book3s_32-objs) diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c index 904f5ac..0b51ef8 100644 --- a/arch/powerpc/kvm/book3s_32_mmu_host.c +++ b/arch/powerpc/kvm/book3s_32_mmu_host.c @@ -58,105 +58,19 @@ static ulong htab; static u32 htabmask; -static void invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte) +void kvmppc_mmu_invalidate_pte(struct kvm_vcpu *vcpu, struct hpte_cache *pte) { volatile u32 *pteg; - dprintk_mmu(KVM: Flushing SPTE: 0x%llx (0x%llx) - 0x%llx\n, - pte-pte.eaddr, pte-pte.vpage, 
pte-host_va); - + /* Remove from host HTAB */ pteg = (u32*)pte-slot; - pteg[0] = 0; + + /* And make sure it's gone from the TLB too */ asm volatile (sync); asm volatile (tlbie %0 : : r (pte-pte.eaddr) : memory); asm volatile (sync); asm volatile (tlbsync); - - pte-host_va = 0; - - if (pte-pte.may_write) - kvm_release_pfn_dirty(pte-pfn); - else - kvm_release_pfn_clean(pte-pfn); -} - -void kvmppc_mmu_pte_flush(struct kvm_vcpu *vcpu, ulong guest_ea, ulong
Re: [PATCH] KVM: PPC: Add generic hpte management functions
On 26.06.2010, at 01:16, Alexander Graf wrote: Currently the shadow paging code keeps an array of entries it knows about. Whenever the guest invalidates an entry, we loop through the whole array, trying to invalidate matching parts. While this is a really simple implementation, it is probably the most inefficient one possible. So instead, let's keep an array of lists around that are indexed by a hash. This way each PTE can be added by 4 list_add calls, removed by 4 list_del invocations, and the search only needs to loop through entries that share the same hash. This patch implements said lookup and exports generic functions that both the 32-bit and 64-bit backend can use. Yikes - I forgot -n. This is patch 1/2. Alex
[PATCH 10/26] KVM: PPC: Tell guest about pending interrupts
When the guest turns on interrupts again, it needs to know if we have an interrupt pending for it. Because if so, it should rather get out of guest context and get the interrupt. So we introduce a new field in the shared page that we use to tell the guest that there's a pending interrupt lying around. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h |1 + arch/powerpc/kvm/book3s.c |7 +++ arch/powerpc/kvm/booke.c|7 +++ 3 files changed, 15 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index edf8f83..c7305d7 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -36,6 +36,7 @@ struct kvm_vcpu_arch_shared { __u64 dar; __u64 msr; __u32 dsisr; + __u32 int_pending; /* Tells the guest if we have an interrupt */ }; #define KVM_PVR_PARA 0x4b564d3f /* KVM? */ diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index f0e8047..e76c950 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -334,6 +334,7 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority) void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) { unsigned long *pending = vcpu-arch.pending_exceptions; + unsigned long old_pending = vcpu-arch.pending_exceptions; unsigned int priority; #ifdef EXIT_DEBUG @@ -353,6 +354,12 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) BITS_PER_BYTE * sizeof(*pending), priority + 1); } + + /* Tell the guest about our interrupt status */ + if (*pending) + vcpu-arch.shared-int_pending = 1; + else if (old_pending) + vcpu-arch.shared-int_pending = 0; } void kvmppc_set_pvr(struct kvm_vcpu *vcpu, u32 pvr) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 485f8fa..2229df9 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -221,6 +221,7 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, void 
kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) { unsigned long *pending = vcpu-arch.pending_exceptions; + unsigned long old_pending = vcpu-arch.pending_exceptions; unsigned int priority; priority = __ffs(*pending); @@ -232,6 +233,12 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) BITS_PER_BYTE * sizeof(*pending), priority + 1); } + + /* Tell the guest about our interrupt status */ + if (*pending) + vcpu-arch.shared-int_pending = 1; + else if (old_pending) + vcpu-arch.shared-int_pending = 0; } /** -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 06/26] KVM: PPC: Convert SPRG[0-4] to shared page
When in kernel mode there are 4 additional registers available that are simple data storage. Instead of exiting to the hypervisor to read and write those, we can just share them with the guest using the page. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |4 arch/powerpc/include/asm/kvm_para.h |4 arch/powerpc/kvm/book3s.c | 16 arch/powerpc/kvm/booke.c| 16 arch/powerpc/kvm/emulate.c | 24 5 files changed, 36 insertions(+), 28 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 6bcf62f..83c45ea 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -216,10 +216,6 @@ struct kvm_vcpu_arch { ulong guest_owned_ext; #endif u32 mmucr; - ulong sprg0; - ulong sprg1; - ulong sprg2; - ulong sprg3; ulong sprg4; ulong sprg5; ulong sprg6; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index d7fc6c2..e402999 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,10 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 sprg0; + __u64 sprg1; + __u64 sprg2; + __u64 sprg3; __u64 srr0; __u64 srr1; __u64 dar; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index b144697..5a6f055 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1062,10 +1062,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-srr0 = vcpu-arch.shared-srr0; regs-srr1 = vcpu-arch.shared-srr1; regs-pid = vcpu-arch.pid; - regs-sprg0 = vcpu-arch.sprg0; - regs-sprg1 = vcpu-arch.sprg1; - regs-sprg2 = vcpu-arch.sprg2; - regs-sprg3 = vcpu-arch.sprg3; + regs-sprg0 = vcpu-arch.shared-sprg0; + regs-sprg1 = vcpu-arch.shared-sprg1; + regs-sprg2 = vcpu-arch.shared-sprg2; + regs-sprg3 = vcpu-arch.shared-sprg3; regs-sprg5 = vcpu-arch.sprg4; regs-sprg6 = 
vcpu-arch.sprg5; regs-sprg7 = vcpu-arch.sprg6; @@ -1088,10 +1088,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_msr(vcpu, regs-msr); vcpu-arch.shared-srr0 = regs-srr0; vcpu-arch.shared-srr1 = regs-srr1; - vcpu-arch.sprg0 = regs-sprg0; - vcpu-arch.sprg1 = regs-sprg1; - vcpu-arch.sprg2 = regs-sprg2; - vcpu-arch.sprg3 = regs-sprg3; + vcpu-arch.shared-sprg0 = regs-sprg0; + vcpu-arch.shared-sprg1 = regs-sprg1; + vcpu-arch.shared-sprg2 = regs-sprg2; + vcpu-arch.shared-sprg3 = regs-sprg3; vcpu-arch.sprg5 = regs-sprg4; vcpu-arch.sprg6 = regs-sprg5; vcpu-arch.sprg7 = regs-sprg6; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 8b546fe..984c461 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -495,10 +495,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-srr0 = vcpu-arch.shared-srr0; regs-srr1 = vcpu-arch.shared-srr1; regs-pid = vcpu-arch.pid; - regs-sprg0 = vcpu-arch.sprg0; - regs-sprg1 = vcpu-arch.sprg1; - regs-sprg2 = vcpu-arch.sprg2; - regs-sprg3 = vcpu-arch.sprg3; + regs-sprg0 = vcpu-arch.shared-sprg0; + regs-sprg1 = vcpu-arch.shared-sprg1; + regs-sprg2 = vcpu-arch.shared-sprg2; + regs-sprg3 = vcpu-arch.shared-sprg3; regs-sprg5 = vcpu-arch.sprg4; regs-sprg6 = vcpu-arch.sprg5; regs-sprg7 = vcpu-arch.sprg6; @@ -521,10 +521,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_msr(vcpu, regs-msr); vcpu-arch.shared-srr0 = regs-srr0; vcpu-arch.shared-srr1 = regs-srr1; - vcpu-arch.sprg0 = regs-sprg0; - vcpu-arch.sprg1 = regs-sprg1; - vcpu-arch.sprg2 = regs-sprg2; - vcpu-arch.sprg3 = regs-sprg3; + vcpu-arch.shared-sprg0 = regs-sprg0; + vcpu-arch.shared-sprg1 = regs-sprg1; + vcpu-arch.shared-sprg2 = regs-sprg2; + vcpu-arch.shared-sprg3 = regs-sprg3; vcpu-arch.sprg5 = regs-sprg4; vcpu-arch.sprg6 = regs-sprg5; vcpu-arch.sprg7 = regs-sprg6; diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c 
index ad0fa4f..454869b 100644 --- a/arch/powerpc/kvm/emulate.c +++ b/arch/powerpc/kvm/emulate.c @@ -263,13 +263,17 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu) kvmppc_set_gpr(vcpu, rt, get_tb()); break; case SPRN_SPRG0: - kvmppc_set_gpr(vcpu, rt,
[PATCH 01/26] KVM: PPC: Introduce shared page
For transparent variable sharing between the hypervisor and guest, I introduce a shared page. This shared page will contain all the registers the guest can read and write safely without exiting guest context. This patch only implements the stubs required for the basic structure of the shared page. The actual register moving follows. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |2 ++ arch/powerpc/include/asm/kvm_para.h |5 + arch/powerpc/kernel/asm-offsets.c |1 + arch/powerpc/kvm/44x.c |7 +++ arch/powerpc/kvm/book3s.c |7 +++ arch/powerpc/kvm/e500.c |7 +++ 6 files changed, 29 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 895eb63..bca9391 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -25,6 +25,7 @@ #include linux/interrupt.h #include linux/types.h #include linux/kvm_types.h +#include linux/kvm_para.h #include asm/kvm_asm.h #define KVM_MAX_VCPUS 1 @@ -289,6 +290,7 @@ struct kvm_vcpu_arch { struct tasklet_struct tasklet; u64 dec_jiffies; unsigned long pending_exceptions; + struct kvm_vcpu_arch_shared *shared; #ifdef CONFIG_PPC_BOOK3S struct kmem_cache *hpte_cache; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 2d48f6a..1485ba8 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -20,6 +20,11 @@ #ifndef __POWERPC_KVM_PARA_H__ #define __POWERPC_KVM_PARA_H__ +#include linux/types.h + +struct kvm_vcpu_arch_shared { +}; + #ifdef __KERNEL__ static inline int kvm_para_available(void) diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 496cc5b..944f593 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -400,6 +400,7 @@ int main(void) DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6)); DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7)); 
DEFINE(VCPU_SHADOW_PID, offsetof(struct kvm_vcpu, arch.shadow_pid)); + DEFINE(VCPU_SHARED, offsetof(struct kvm_vcpu, arch.shared)); /* book3s */ #ifdef CONFIG_PPC_BOOK3S diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c index 73c0a3f..e7b1f3f 100644 --- a/arch/powerpc/kvm/44x.c +++ b/arch/powerpc/kvm/44x.c @@ -123,8 +123,14 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto free_vcpu; + vcpu-arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO); + if (!vcpu-arch.shared) + goto uninit_vcpu; + return vcpu; +uninit_vcpu: + kvm_vcpu_uninit(vcpu); free_vcpu: kmem_cache_free(kvm_vcpu_cache, vcpu_44x); out: @@ -135,6 +141,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu); + free_page((unsigned long)vcpu-arch.shared); kvm_vcpu_uninit(vcpu); kmem_cache_free(kvm_vcpu_cache, vcpu_44x); } diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 884d4a5..ba79b35 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -1247,6 +1247,10 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto free_shadow_vcpu; + vcpu-arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO); + if (!vcpu-arch.shared) + goto uninit_vcpu; + vcpu-arch.host_retip = kvm_return_point; vcpu-arch.host_msr = mfmsr(); #ifdef CONFIG_PPC_BOOK3S_64 @@ -1277,6 +1281,8 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) return vcpu; +uninit_vcpu: + kvm_vcpu_uninit(vcpu); free_shadow_vcpu: kfree(vcpu_book3s-shadow_vcpu); free_vcpu: @@ -1289,6 +1295,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu); + free_page((unsigned long)vcpu-arch.shared); kvm_vcpu_uninit(vcpu); kfree(vcpu_book3s-shadow_vcpu); vfree(vcpu_book3s); diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c index e8a00b0..71750f2 100644 --- a/arch/powerpc/kvm/e500.c +++ 
b/arch/powerpc/kvm/e500.c @@ -117,8 +117,14 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id) if (err) goto uninit_vcpu; + vcpu-arch.shared = (void*)__get_free_page(GFP_KERNEL|__GFP_ZERO); + if (!vcpu-arch.shared) + goto uninit_tlb; + return vcpu; +uninit_tlb: + kvmppc_e500_tlb_uninit(vcpu_e500); uninit_vcpu: kvm_vcpu_uninit(vcpu); free_vcpu: @@ -131,6 +137,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu) { struct kvmppc_vcpu_e500 *vcpu_e500 =
[PATCH 09/26] KVM: PPC: Add PV guest scratch registers
While running in hooked code we need to stash register contents away, because we must not clobber any registers. So let's add some fields to the shared page that we can happily write to. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index d1fe9ae..edf8f83 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,9 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 scratch1; + __u64 scratch2; + __u64 scratch3; __u64 critical; /* Guest may not get interrupts if == r1 */ __u64 sprg0; __u64 sprg1; -- 1.6.0.2
[PATCH 26/26] KVM: PPC: Add Documentation about PV interface
We just introduced a new PV interface that screams for documentation. So here it is - a shiny new and awesome text file describing the internal works of the PPC KVM paravirtual interface. Signed-off-by: Alexander Graf ag...@suse.de --- Documentation/kvm/ppc-pv.txt | 164 ++ 1 files changed, 164 insertions(+), 0 deletions(-) create mode 100644 Documentation/kvm/ppc-pv.txt diff --git a/Documentation/kvm/ppc-pv.txt b/Documentation/kvm/ppc-pv.txt new file mode 100644 index 000..7cbcd51 --- /dev/null +++ b/Documentation/kvm/ppc-pv.txt @@ -0,0 +1,164 @@ +The PPC KVM paravirtual interface += + +The basic execution principle by which KVM on PowerPC works is to run all kernel +space code in PR=1 which is user space. This way we trap all privileged +instructions and can emulate them accordingly. + +Unfortunately that is also the downfall. There are quite some privileged +instructions that needlessly return us to the hypervisor even though they +could be handled differently. + +This is what the PPC PV interface helps with. It takes privileged instructions +and transforms them into unprivileged ones with some help from the hypervisor. +This cuts down virtualization costs by about 50% on some of my benchmarks. + +The code for that interface can be found in arch/powerpc/kernel/kvm* + +Querying for existence +== + +To find out if we're running on KVM or not, we overlay the PVR register. Usually +the PVR register contains an id that identifies your CPU type. If, however, you +pass KVM_PVR_PARA in the register that you want the PVR result in, the register +still contains KVM_PVR_PARA after the mfpvr call. + + LOAD_REG_IMM(r5, KVM_PVR_PARA) + mfpvr r5 + [r5 still contains KVM_PVR_PARA] + +Once determined to run under a PV capable KVM, you can now use hypercalls as +described below. 
+ +PPC hypercalls +== + +The only viable ways to reliably get from guest context to host context are: + + 1) Call an invalid instruction + 2) Call the sc instruction with a parameter to sc + 3) Call the sc instruction with parameters in GPRs + +Method 1 is always a bad idea. Invalid instructions can be replaced later on +by valid instructions, rendering the interface broken. + +Method 2 also has downfalls. If the parameter to sc is != 0 the spec is +rather unclear if the sc is targeted directly for the hypervisor or the +supervisor. It would also require that we read the syscall issuing instruction +every time a syscall is issued, slowing down guest syscalls. + +Method 3 is what KVM uses. We pass magic constants (KVM_SC_MAGIC_R3 and +KVM_SC_MAGIC_R4) in r3 and r4 respectively. If a syscall instruction with these +magic values arrives from the guest's kernel mode, we take the syscall as a +hypercall. + +The parameters are as follows: + + r3 KVM_SC_MAGIC_R3 + r4 KVM_SC_MAGIC_R4 + r5 Hypercall number + r6 First parameter + r7 Second parameter + r8 Third parameter + r9 Fourth parameter + +Hypercall definitions are shared in generic code, so the same hypercall numbers +apply for x86 and powerpc alike. + +The magic page +== + +To enable communication between the hypervisor and guest there is a new shared +page that contains parts of supervisor visible register state. The guest can +map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE. + +With this hypercall issued the guest always gets the magic page mapped at the +desired location in effective and physical address space. For now, we always +map the page to -4096. This way we can access it using absolute load and store +functions. The following instruction reads the first field of the magic page: + + ld rX, -4096(0) + +The interface is designed to be extensible should there be need later to add +additional registers to the magic page. 
If you add fields to the magic page, +also define a new hypercall feature to indicate that the host can give you more +registers. Only if the host supports the additional features, make use of them. + +The magic page has the following layout as described in +arch/powerpc/include/asm/kvm_para.h: + +struct kvm_vcpu_arch_shared { + __u64 scratch1; + __u64 scratch2; + __u64 scratch3; + __u64 critical; /* Guest may not get interrupts if == r1 */ + __u64 sprg0; + __u64 sprg1; + __u64 sprg2; + __u64 sprg3; + __u64 srr0; + __u64 srr1; + __u64 dar; + __u64 msr; + __u32 dsisr; + __u32 int_pending; /* Tells the guest if we have an interrupt */ +}; + +Additions to the page must only occur at the end. Struct fields are always 32 +bit aligned. + +Patched instructions + + +The ld and std instructions are transformed to lwz and stw instructions
[PATCH 14/26] KVM: PPC: Magic Page BookE support
As we now have Book3s support for the magic page, we also need BookE to join in on the party. This patch implements generic magic page logic for BookE and specific TLB logic for e500. I didn't have any 440 around, so I didn't dare to blindly try and write up broken code. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/booke.c| 29 + arch/powerpc/kvm/e500_tlb.c | 19 +-- 2 files changed, 46 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 2229df9..7957aa4 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -241,6 +241,31 @@ void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu) vcpu-arch.shared-int_pending = 0; } +/* Check if a DTLB miss was on the magic page. Returns !0 if so. */ +int kvmppc_dtlb_magic_page(struct kvm_vcpu *vcpu, ulong eaddr) +{ + ulong mp_ea = vcpu-arch.magic_page_ea; + ulong gpaddr = vcpu-arch.magic_page_pa; + int gtlb_index = 11 | (1 16); /* Random number in TLB1 */ + + /* Check for existence of magic page */ + if(likely(!mp_ea)) + return 0; + + /* Check if we're on the magic page */ + if(likely((eaddr 12) != (mp_ea 12))) + return 0; + + /* Don't map in user mode */ + if(vcpu-arch.shared-msr MSR_PR) + return 0; + + kvmppc_mmu_map(vcpu, vcpu-arch.magic_page_ea, gpaddr, gtlb_index); + kvmppc_account_exit(vcpu, DTLB_VIRT_MISS_EXITS); + + return 1; +} + /** * kvmppc_handle_exit * @@ -308,6 +333,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, r = RESUME_HOST; break; case EMULATE_FAIL: + case EMULATE_DO_MMIO: /* XXX Deliver Program interrupt to guest. */ printk(KERN_CRIT %s: emulation at %lx failed (%08x)\n, __func__, vcpu-arch.pc, vcpu-arch.last_inst); @@ -377,6 +403,9 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, gpa_t gpaddr; gfn_t gfn; + if (kvmppc_dtlb_magic_page(vcpu, eaddr)) + break; + /* Check the guest TLB. 
*/ gtlb_index = kvmppc_mmu_dtlb_index(vcpu, eaddr); if (gtlb_index 0) { diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c index 66845a5..f5582ca 100644 --- a/arch/powerpc/kvm/e500_tlb.c +++ b/arch/powerpc/kvm/e500_tlb.c @@ -295,9 +295,22 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, struct page *new_page; struct tlbe *stlbe; hpa_t hpaddr; + u32 mas2 = gtlbe-mas2; + u32 mas3 = gtlbe-mas3; stlbe = vcpu_e500-shadow_tlb[tlbsel][esel]; + if ((vcpu_e500-vcpu.arch.magic_page_ea) + ((vcpu_e500-vcpu.arch.magic_page_pa PAGE_SHIFT) == gfn) + !(vcpu_e500-vcpu.arch.shared-msr MSR_PR)) { + mas2 = 0; + mas3 = E500_TLB_SUPER_PERM_MASK; + hpaddr = virt_to_phys(vcpu_e500-vcpu.arch.shared); + new_page = pfn_to_page(hpaddr PAGE_SHIFT); + get_page(new_page); + goto mapped; + } + /* Get reference to new page. */ new_page = gfn_to_page(vcpu_e500-vcpu.kvm, gfn); if (is_error_page(new_page)) { @@ -305,6 +318,8 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, kvm_release_page_clean(new_page); return; } + +mapped: hpaddr = page_to_phys(new_page); /* Drop reference to old page. */ @@ -316,10 +331,10 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, stlbe-mas1 = MAS1_TSIZE(BOOK3E_PAGESZ_4K) | MAS1_TID(get_tlb_tid(gtlbe)) | MAS1_TS | MAS1_VALID; stlbe-mas2 = (gvaddr MAS2_EPN) - | e500_shadow_mas2_attrib(gtlbe-mas2, + | e500_shadow_mas2_attrib(mas2, vcpu_e500-vcpu.arch.shared-msr MSR_PR); stlbe-mas3 = (hpaddr MAS3_RPN) - | e500_shadow_mas3_attrib(gtlbe-mas3, + | e500_shadow_mas3_attrib(mas3, vcpu_e500-vcpu.arch.shared-msr MSR_PR); stlbe-mas7 = (hpaddr 32) MAS7_RPN; -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 23/26] KVM: PPC: PV mtmsrd L=1
The PowerPC ISA has a special instruction for mtmsr that only changes the EE and RI bits, namely the L=1 form. Since that one is reasonably often occuring and simple to implement, let's go with this first. Writing EE=0 is always just a store. Doing EE=1 also requires us to check for pending interrupts and if necessary exit back to the hypervisor. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 45 arch/powerpc/kernel/kvm_emul.S | 56 2 files changed, 101 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 7e8fe6f..71153d0 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -62,6 +62,7 @@ #define KVM_INST_MTSPR_DSISR 0x7c1203a6 #define KVM_INST_TLBSYNC 0x7c00046c +#define KVM_INST_MTMSRD_L1 0x7c010164 static bool kvm_patching_worked = true; static char kvm_tmp[1024 * 1024]; @@ -117,6 +118,43 @@ static u32 *kvm_alloc(int len) return p; } +extern u32 kvm_emulate_mtmsrd_branch_offs; +extern u32 kvm_emulate_mtmsrd_reg_offs; +extern u32 kvm_emulate_mtmsrd_len; +extern u32 kvm_emulate_mtmsrd[]; + +static void kvm_patch_ins_mtmsrd(u32 *inst, u32 rt) +{ + u32 *p; + int distance_start; + int distance_end; + ulong next_inst; + + p = kvm_alloc(kvm_emulate_mtmsrd_len * 4); + if (!p) + return; + + /* Find out where we are and put everything there */ + distance_start = (ulong)p - (ulong)inst; + next_inst = ((ulong)inst + 4); + distance_end = next_inst - (ulong)p[kvm_emulate_mtmsrd_branch_offs]; + + /* Make sure we only write valid b instructions */ + if (distance_start KVM_INST_B_MAX) { + kvm_patching_worked = false; + return; + } + + /* Modify the chunk to fit the invocation */ + memcpy(p, kvm_emulate_mtmsrd, kvm_emulate_mtmsrd_len * 4); + p[kvm_emulate_mtmsrd_branch_offs] |= distance_end KVM_INST_B_MASK; + p[kvm_emulate_mtmsrd_reg_offs] |= rt; + flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsrd_len * 4); + + /* Patch the invocation */ + *inst = KVM_INST_B | 
(distance_start KVM_INST_B_MASK); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -190,6 +228,13 @@ static void kvm_check_ins(u32 *inst) case KVM_INST_TLBSYNC: kvm_patch_ins_nop(inst); break; + + /* Rewrites */ + case KVM_INST_MTMSRD_L1: + /* We use r30 and r31 during the hook */ + if (get_rt(inst_rt) 30) + kvm_patch_ins_mtmsrd(inst, inst_rt); + break; } switch (_inst) { diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index 7da835a..25e6683 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -54,3 +54,59 @@ /* Disable critical section. We are critical if \ shared-critical == r1 and r2 is always != r1 */ \ STL64(r2, KVM_MAGIC_PAGE + KVM_MAGIC_CRITICAL, 0); + +.global kvm_emulate_mtmsrd +kvm_emulate_mtmsrd: + + SCRATCH_SAVE + + /* Put MSR ~(MSR_EE|MSR_RI) in r31 */ + LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + lis r30, (~(MSR_EE | MSR_RI))@h + ori r30, r30, (~(MSR_EE | MSR_RI))@l + and r31, r31, r30 + + /* OR the register's (MSR_EE|MSR_RI) on MSR */ +kvm_emulate_mtmsrd_reg: + andi. r30, r0, (MSR_EE|MSR_RI) + or r31, r31, r30 + + /* Put MSR back into magic page */ + STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + /* Check if we have to fetch an interrupt */ + lwz r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0) + cmpwi r31, 0 + beq+no_check + + /* Check if we may trigger an interrupt */ + andi. r30, r30, MSR_EE + beq no_check + + SCRATCH_RESTORE + + /* Nag hypervisor */ + tlbsync + + b kvm_emulate_mtmsrd_branch + +no_check: + + SCRATCH_RESTORE + + /* Go back to caller */ +kvm_emulate_mtmsrd_branch: + b . 
+kvm_emulate_mtmsrd_end: + +.global kvm_emulate_mtmsrd_branch_offs +kvm_emulate_mtmsrd_branch_offs: + .long (kvm_emulate_mtmsrd_branch - kvm_emulate_mtmsrd) / 4 + +.global kvm_emulate_mtmsrd_reg_offs +kvm_emulate_mtmsrd_reg_offs: + .long (kvm_emulate_mtmsrd_reg - kvm_emulate_mtmsrd) / 4 + +.global kvm_emulate_mtmsrd_len +kvm_emulate_mtmsrd_len: + .long (kvm_emulate_mtmsrd_end - kvm_emulate_mtmsrd) / 4 -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 19/26] KVM: PPC: PV instructions to loads and stores
Some instructions can simply be replaced by load and store instructions to or from the magic page. This patch replaces often called instructions that fall into the above category. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 111 + 1 files changed, 111 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index d873bc6..b165b20 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -32,8 +32,65 @@ #define KVM_MAGIC_PAGE (-4096L) #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x) +#define KVM_INST_LWZ 0x8000 +#define KVM_INST_STW 0x9000 +#define KVM_INST_LD0xe800 +#define KVM_INST_STD 0xf800 +#define KVM_INST_NOP 0x6000 +#define KVM_INST_B 0x4800 +#define KVM_INST_B_MASK0x03ff +#define KVM_INST_B_MAX 0x01ff + +#define KVM_MASK_RT0x03e0 +#define KVM_INST_MFMSR 0x7ca6 +#define KVM_INST_MFSPR_SPRG0 0x7c1042a6 +#define KVM_INST_MFSPR_SPRG1 0x7c1142a6 +#define KVM_INST_MFSPR_SPRG2 0x7c1242a6 +#define KVM_INST_MFSPR_SPRG3 0x7c1342a6 +#define KVM_INST_MFSPR_SRR00x7c1a02a6 +#define KVM_INST_MFSPR_SRR10x7c1b02a6 +#define KVM_INST_MFSPR_DAR 0x7c1302a6 +#define KVM_INST_MFSPR_DSISR 0x7c1202a6 + +#define KVM_INST_MTSPR_SPRG0 0x7c1043a6 +#define KVM_INST_MTSPR_SPRG1 0x7c1143a6 +#define KVM_INST_MTSPR_SPRG2 0x7c1243a6 +#define KVM_INST_MTSPR_SPRG3 0x7c1343a6 +#define KVM_INST_MTSPR_SRR00x7c1a03a6 +#define KVM_INST_MTSPR_SRR10x7c1b03a6 +#define KVM_INST_MTSPR_DAR 0x7c1303a6 +#define KVM_INST_MTSPR_DSISR 0x7c1203a6 + static bool kvm_patching_worked = true; +static void kvm_patch_ins_ld(u32 *inst, long addr, u32 rt) +{ +#ifdef CONFIG_64BIT + *inst = KVM_INST_LD | rt | (addr 0xfffc); +#else + *inst = KVM_INST_LWZ | rt | ((addr + 4) 0xfffc); +#endif +} + +static void kvm_patch_ins_lwz(u32 *inst, long addr, u32 rt) +{ + *inst = KVM_INST_LWZ | rt | (addr 0x); +} + +static void kvm_patch_ins_std(u32 *inst, long addr, u32 rt) +{ +#ifdef CONFIG_64BIT + *inst = 
KVM_INST_STD | rt | (addr 0xfffc); +#else + *inst = KVM_INST_STW | rt | ((addr + 4) 0xfffc); +#endif +} + +static void kvm_patch_ins_stw(u32 *inst, long addr, u32 rt) +{ + *inst = KVM_INST_STW | rt | (addr 0xfffc); +} + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -48,6 +105,60 @@ static void kvm_check_ins(u32 *inst) u32 inst_rt = _inst KVM_MASK_RT; switch (inst_no_rt) { + /* Loads */ + case KVM_INST_MFMSR: + kvm_patch_ins_ld(inst, magic_var(msr), inst_rt); + break; + case KVM_INST_MFSPR_SPRG0: + kvm_patch_ins_ld(inst, magic_var(sprg0), inst_rt); + break; + case KVM_INST_MFSPR_SPRG1: + kvm_patch_ins_ld(inst, magic_var(sprg1), inst_rt); + break; + case KVM_INST_MFSPR_SPRG2: + kvm_patch_ins_ld(inst, magic_var(sprg2), inst_rt); + break; + case KVM_INST_MFSPR_SPRG3: + kvm_patch_ins_ld(inst, magic_var(sprg3), inst_rt); + break; + case KVM_INST_MFSPR_SRR0: + kvm_patch_ins_ld(inst, magic_var(srr0), inst_rt); + break; + case KVM_INST_MFSPR_SRR1: + kvm_patch_ins_ld(inst, magic_var(srr1), inst_rt); + break; + case KVM_INST_MFSPR_DAR: + kvm_patch_ins_ld(inst, magic_var(dar), inst_rt); + break; + case KVM_INST_MFSPR_DSISR: + kvm_patch_ins_lwz(inst, magic_var(dsisr), inst_rt); + break; + + /* Stores */ + case KVM_INST_MTSPR_SPRG0: + kvm_patch_ins_std(inst, magic_var(sprg0), inst_rt); + break; + case KVM_INST_MTSPR_SPRG1: + kvm_patch_ins_std(inst, magic_var(sprg1), inst_rt); + break; + case KVM_INST_MTSPR_SPRG2: + kvm_patch_ins_std(inst, magic_var(sprg2), inst_rt); + break; + case KVM_INST_MTSPR_SPRG3: + kvm_patch_ins_std(inst, magic_var(sprg3), inst_rt); + break; + case KVM_INST_MTSPR_SRR0: + kvm_patch_ins_std(inst, magic_var(srr0), inst_rt); + break; + case KVM_INST_MTSPR_SRR1: + kvm_patch_ins_std(inst, magic_var(srr1), inst_rt); + break; + case KVM_INST_MTSPR_DAR: + kvm_patch_ins_std(inst, magic_var(dar), inst_rt); + break; + case KVM_INST_MTSPR_DSISR: + kvm_patch_ins_stw(inst, magic_var(dsisr), inst_rt); + break; } switch (_inst) 
{ -- 1.6.0.2
[PATCH 17/26] KVM: PPC: Generic KVM PV guest support
We have all the hypervisor pieces in place now, but the guest parts are still missing. This patch implements basic awareness of KVM when running Linux as guest. It doesn't do anything with it yet though. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/Makefile |2 ++ arch/powerpc/kernel/asm-offsets.c | 15 +++ arch/powerpc/kernel/kvm.c | 34 ++ arch/powerpc/kernel/kvm_emul.S| 27 +++ arch/powerpc/platforms/Kconfig| 10 ++ 5 files changed, 88 insertions(+), 0 deletions(-) create mode 100644 arch/powerpc/kernel/kvm.c create mode 100644 arch/powerpc/kernel/kvm_emul.S diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 58d0572..2d7eb9e 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -125,6 +125,8 @@ ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC),) obj-y += ppc_save_regs.o endif +obj-$(CONFIG_KVM_GUEST) += kvm.o kvm_emul.o + # Disable GCOV in odd or sensitive code GCOV_PROFILE_prom_init.o := n GCOV_PROFILE_ftrace.o := n diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index a55d47e..e3e740b 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -465,6 +465,21 @@ int main(void) DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr)); #endif /* CONFIG_PPC_BOOK3S */ #endif + +#ifdef CONFIG_KVM_GUEST + DEFINE(KVM_MAGIC_SCRATCH1, offsetof(struct kvm_vcpu_arch_shared, + scratch1)); + DEFINE(KVM_MAGIC_SCRATCH2, offsetof(struct kvm_vcpu_arch_shared, + scratch2)); + DEFINE(KVM_MAGIC_SCRATCH3, offsetof(struct kvm_vcpu_arch_shared, + scratch3)); + DEFINE(KVM_MAGIC_INT, offsetof(struct kvm_vcpu_arch_shared, + int_pending)); + DEFINE(KVM_MAGIC_MSR, offsetof(struct kvm_vcpu_arch_shared, msr)); + DEFINE(KVM_MAGIC_CRITICAL, offsetof(struct kvm_vcpu_arch_shared, + critical)); +#endif + #ifdef CONFIG_44x DEFINE(PGD_T_LOG2, PGD_T_LOG2); DEFINE(PTE_T_LOG2, PTE_T_LOG2); diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c new file mode 
100644 index 000..2d8dd73 --- /dev/null +++ b/arch/powerpc/kernel/kvm.c @@ -0,0 +1,34 @@ +/* + * Copyright (C) 2010 SUSE Linux Products GmbH. All rights reserved. + * + * Authors: + * Alexander Graf ag...@suse.de + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +#include linux/kvm_host.h +#include linux/init.h +#include linux/kvm_para.h +#include linux/slab.h + +#include asm/reg.h +#include asm/kvm_ppc.h +#include asm/sections.h +#include asm/cacheflush.h +#include asm/disassemble.h + +#define KVM_MAGIC_PAGE (-4096L) +#define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x) + diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S new file mode 100644 index 000..c7b9fc9 --- /dev/null +++ b/arch/powerpc/kernel/kvm_emul.S @@ -0,0 +1,27 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright SUSE Linux Products GmbH 2010 + * + * Authors: Alexander Graf ag...@suse.de + */ + +#include asm/ppc_asm.h +#include asm/kvm_asm.h +#include asm/reg.h +#include asm/page.h +#include asm/asm-offsets.h + +#define KVM_MAGIC_PAGE (-4096) + diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig index d1663db..1744349 100644 --- a/arch/powerpc/platforms/Kconfig +++ b/arch/powerpc/platforms/Kconfig @@ -21,6 +21,16 @@ source
[PATCH 13/26] KVM: PPC: Magic Page Book3s support
We need to override EA as well as PA lookups for the magic page. When the guest tells us to project it, the magic page overrides any guest mappings. In order to reflect that, we need to hook into all the MMU layers of KVM to force map the magic page if necessary. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/book3s.c |7 +++ arch/powerpc/kvm/book3s_32_mmu.c | 16 arch/powerpc/kvm/book3s_32_mmu_host.c | 12 arch/powerpc/kvm/book3s_64_mmu.c | 30 +- arch/powerpc/kvm/book3s_64_mmu_host.c | 12 5 files changed, 76 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 2f55aa5..6ce7fa1 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -551,6 +551,13 @@ mmio: static int kvmppc_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn) { + ulong mp_pa = vcpu-arch.magic_page_pa; + + if (unlikely(mp_pa) + unlikely((mp_pa KVM_RMO) PAGE_SHIFT == gfn)) { + return 1; + } + return kvm_is_visible_gfn(vcpu-kvm, gfn); } diff --git a/arch/powerpc/kvm/book3s_32_mmu.c b/arch/powerpc/kvm/book3s_32_mmu.c index 41130c8..d2bd1a6 100644 --- a/arch/powerpc/kvm/book3s_32_mmu.c +++ b/arch/powerpc/kvm/book3s_32_mmu.c @@ -281,8 +281,24 @@ static int kvmppc_mmu_book3s_32_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, struct kvmppc_pte *pte, bool data) { int r; + ulong mp_ea = vcpu-arch.magic_page_ea; pte-eaddr = eaddr; + + /* Magic page override */ + if (unlikely(mp_ea) + unlikely((eaddr ~0xfffULL) == (mp_ea ~0xfffULL)) + !(vcpu-arch.shared-msr MSR_PR)) { + pte-vpage = kvmppc_mmu_book3s_32_ea_to_vp(vcpu, eaddr, data); + pte-raddr = vcpu-arch.magic_page_pa | (pte-raddr 0xfff); + pte-raddr = KVM_RMO; + pte-may_execute = true; + pte-may_read = true; + pte-may_write = true; + + return 0; + } + r = kvmppc_mmu_book3s_32_xlate_bat(vcpu, eaddr, pte, data); if (r 0) r = kvmppc_mmu_book3s_32_xlate_pte(vcpu, eaddr, pte, data, true); diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c b/arch/powerpc/kvm/book3s_32_mmu_host.c index 
67b8c38..658d3e0 100644 --- a/arch/powerpc/kvm/book3s_32_mmu_host.c +++ b/arch/powerpc/kvm/book3s_32_mmu_host.c @@ -145,6 +145,16 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte) bool primary = false; bool evict = false; struct hpte_cache *pte; + ulong mp_pa = vcpu-arch.magic_page_pa; + + /* Magic page override */ + if (unlikely(mp_pa) + unlikely((orig_pte-raddr ~0xfffUL KVM_RMO) == +(mp_pa ~0xfffUL KVM_RMO))) { + hpaddr = (pfn_t)virt_to_phys(vcpu-arch.shared); + get_page(pfn_to_page(hpaddr PAGE_SHIFT)); + goto mapped; + } /* Get host physical address for gpa */ hpaddr = gfn_to_pfn(vcpu-kvm, orig_pte-raddr PAGE_SHIFT); @@ -155,6 +165,8 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct kvmppc_pte *orig_pte) } hpaddr = PAGE_SHIFT; +mapped: + /* and write the mapping ea - hpa into the pt */ vcpu-arch.mmu.esid_to_vsid(vcpu, orig_pte-eaddr SID_SHIFT, vsid); map = find_sid_vsid(vcpu, vsid); diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c index 58aa840..4a2e5fc 100644 --- a/arch/powerpc/kvm/book3s_64_mmu.c +++ b/arch/powerpc/kvm/book3s_64_mmu.c @@ -163,6 +163,22 @@ static int kvmppc_mmu_book3s_64_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, bool found = false; bool perm_err = false; int second = 0; + ulong mp_ea = vcpu-arch.magic_page_ea; + + /* Magic page override */ + if (unlikely(mp_ea) + unlikely((eaddr ~0xfffULL) == (mp_ea ~0xfffULL)) + !(vcpu-arch.shared-msr MSR_PR)) { + gpte-eaddr = eaddr; + gpte-vpage = kvmppc_mmu_book3s_64_ea_to_vp(vcpu, eaddr, data); + gpte-raddr = vcpu-arch.magic_page_pa | (gpte-raddr 0xfff); + gpte-raddr = KVM_RMO; + gpte-may_execute = true; + gpte-may_read = true; + gpte-may_write = true; + + return 0; + } slbe = kvmppc_mmu_book3s_64_find_slbe(vcpu_book3s, eaddr); if (!slbe) @@ -445,6 +461,7 @@ static int kvmppc_mmu_book3s_64_esid_to_vsid(struct kvm_vcpu *vcpu, ulong esid, ulong ea = esid SID_SHIFT; struct kvmppc_slb *slb; u64 gvsid = esid; + ulong mp_ea = 
vcpu-arch.magic_page_ea; if (vcpu-arch.shared-msr (MSR_DR|MSR_IR)) { slb = kvmppc_mmu_book3s_64_find_slbe(to_book3s(vcpu), ea); @@ -464,7 +481,7 @@ static int
[PATCH 15/26] KVM: PPC: Expose magic page support to guest
Now that we have the shared page in place and the MMU code knows about the magic page, we can expose that capability to the guest! Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h |2 ++ arch/powerpc/kvm/powerpc.c | 11 +++ 2 files changed, 13 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index c7305d7..9f8efa4 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -43,6 +43,8 @@ struct kvm_vcpu_arch_shared { #define KVM_SC_MAGIC_R30x4b564d52 /* KVMR */ #define KVM_SC_MAGIC_R40x554c455a /* ULEZ */ +#define KVM_FEATURE_MAGIC_PAGE 1 + #ifdef __KERNEL__ static inline int kvm_para_available(void) diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index fe7a1c8..1d28a81 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -60,8 +60,19 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu) } switch (nr) { + case KVM_HC_PPC_MAP_MAGIC_PAGE: + { + vcpu-arch.magic_page_pa = param1; + vcpu-arch.magic_page_ea = param2; + + r = 0; + break; + } case KVM_HC_FEATURES: r = 0; +#if !defined(CONFIG_KVM_440) /* XXX missing bits on 440 */ + r |= (1 KVM_FEATURE_MAGIC_PAGE); +#endif break; default: r = -KVM_ENOSYS; -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 07/26] KVM: PPC: Implement hypervisor interface
To communicate with KVM directly we need to plumb some sort of interface between the guest and KVM. Usually those interfaces use hypercalls. This hypercall implementation is described in the last patch of the series in a special documentation file. Please read that for further information. This patch implements stubs to handle KVM PPC hypercalls on the host and guest side alike. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_para.h | 100 ++- arch/powerpc/include/asm/kvm_ppc.h |1 + arch/powerpc/kvm/book3s.c | 10 +++- arch/powerpc/kvm/booke.c| 11 - arch/powerpc/kvm/emulate.c | 11 - arch/powerpc/kvm/powerpc.c | 28 ++ include/linux/kvm_para.h|1 + 7 files changed, 156 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index e402999..eaab306 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -34,16 +34,112 @@ struct kvm_vcpu_arch_shared { __u32 dsisr; }; +#define KVM_PVR_PARA 0x4b564d3f /* KVM? 
*/ +#define KVM_SC_MAGIC_R30x4b564d52 /* KVMR */ +#define KVM_SC_MAGIC_R40x554c455a /* ULEZ */ + #ifdef __KERNEL__ static inline int kvm_para_available(void) { - return 0; + unsigned long pvr = KVM_PVR_PARA; + + asm volatile(mfpvr %0 : =r(pvr) : 0(pvr)); + return pvr == KVM_PVR_PARA; +} + +static inline long kvm_hypercall0(unsigned int nr) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr) +: memory); + + return r3; } +static inline long kvm_hypercall1(unsigned int nr, unsigned long p1) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + unsigned long register _p1 asm(r6) = p1; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr), r(_p1) +: memory); + + return r3; +} + +static inline long kvm_hypercall2(unsigned int nr, unsigned long p1, + unsigned long p2) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + unsigned long register _p1 asm(r6) = p1; + unsigned long register _p2 asm(r7) = p2; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr), r(_p1), r(_p2) +: memory); + + return r3; +} + +static inline long kvm_hypercall3(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + unsigned long register _p1 asm(r6) = p1; + unsigned long register _p2 asm(r7) = p2; + unsigned long register _p3 asm(r8) = p3; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr), r(_p1), r(_p2), r(_p3) +: memory); + + return r3; +} + +static inline long kvm_hypercall4(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3, + unsigned 
long p4) +{ + unsigned long register r3 asm(r3) = KVM_SC_MAGIC_R3; + unsigned long register r4 asm(r4) = KVM_SC_MAGIC_R4; + unsigned long register _nr asm(r5) = nr; + unsigned long register _p1 asm(r6) = p1; + unsigned long register _p2 asm(r7) = p2; + unsigned long register _p3 asm(r8) = p3; + unsigned long register _p4 asm(r9) = p4; + + asm volatile(sc +: =r(r3) +: r(r3), r(r4), r(_nr), r(_p1), r(_p2), r(_p3), + r(_p4) +: memory); + + return r3; +} + + static inline unsigned int kvm_arch_para_features(void) { - return 0; + if (!kvm_para_available()) + return 0; + + return kvm_hypercall0(KVM_HC_FEATURES); } #endif /* __KERNEL__ */ diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 18d139e..ecb3bc7 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -107,6 +107,7 @@ extern int kvmppc_booke_init(void); extern void kvmppc_booke_exit(void); extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu); +extern int kvmppc_kvm_pv(struct kvm_vcpu *vcpu);
[PATCH 25/26] KVM: PPC: PV wrteei
On BookE the preferred way to write the EE bit is the wrteei instruction. It already encodes the EE bit in the instruction. So in order to get BookE some speedups as well, let's also PV'nize thati instruction. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/kvm.c | 50 arch/powerpc/kernel/kvm_emul.S | 41 2 files changed, 91 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c index 3557bc8..85e2163 100644 --- a/arch/powerpc/kernel/kvm.c +++ b/arch/powerpc/kernel/kvm.c @@ -66,6 +66,9 @@ #define KVM_INST_MTMSRD_L1 0x7c010164 #define KVM_INST_MTMSR 0x7c000124 +#define KVM_INST_WRTEEI_0 0x7c000146 +#define KVM_INST_WRTEEI_1 0x7c008146 + static bool kvm_patching_worked = true; static char kvm_tmp[1024 * 1024]; static int kvm_tmp_index; @@ -200,6 +203,47 @@ static void kvm_patch_ins_mtmsr(u32 *inst, u32 rt) *inst = KVM_INST_B | (distance_start KVM_INST_B_MASK); } +#ifdef CONFIG_BOOKE + +extern u32 kvm_emulate_wrteei_branch_offs; +extern u32 kvm_emulate_wrteei_ee_offs; +extern u32 kvm_emulate_wrteei_len; +extern u32 kvm_emulate_wrteei[]; + +static void kvm_patch_ins_wrteei(u32 *inst) +{ + u32 *p; + int distance_start; + int distance_end; + ulong next_inst; + + p = kvm_alloc(kvm_emulate_wrteei_len * 4); + if (!p) + return; + + /* Find out where we are and put everything there */ + distance_start = (ulong)p - (ulong)inst; + next_inst = ((ulong)inst + 4); + distance_end = next_inst - (ulong)p[kvm_emulate_wrteei_branch_offs]; + + /* Make sure we only write valid b instructions */ + if (distance_start KVM_INST_B_MAX) { + kvm_patching_worked = false; + return; + } + + /* Modify the chunk to fit the invocation */ + memcpy(p, kvm_emulate_wrteei, kvm_emulate_wrteei_len * 4); + p[kvm_emulate_wrteei_branch_offs] |= distance_end KVM_INST_B_MASK; + p[kvm_emulate_wrteei_ee_offs] |= (*inst MSR_EE); + flush_icache_range((ulong)p, (ulong)p + kvm_emulate_wrteei_len * 4); + + /* Patch the invocation */ + *inst = KVM_INST_B | 
(distance_start KVM_INST_B_MASK); +} + +#endif + static void kvm_map_magic_page(void *data) { kvm_hypercall2(KVM_HC_PPC_MAP_MAGIC_PAGE, @@ -289,6 +333,12 @@ static void kvm_check_ins(u32 *inst) } switch (_inst) { +#ifdef CONFIG_BOOKE + case KVM_INST_WRTEEI_0: + case KVM_INST_WRTEEI_1: + kvm_patch_ins_wrteei(inst); + break; +#endif } flush_icache_range((ulong)inst, (ulong)inst + 4); diff --git a/arch/powerpc/kernel/kvm_emul.S b/arch/powerpc/kernel/kvm_emul.S index ccf5a42..b79b9de 100644 --- a/arch/powerpc/kernel/kvm_emul.S +++ b/arch/powerpc/kernel/kvm_emul.S @@ -194,3 +194,44 @@ kvm_emulate_mtmsr_orig_ins_offs: .global kvm_emulate_mtmsr_len kvm_emulate_mtmsr_len: .long (kvm_emulate_mtmsr_end - kvm_emulate_mtmsr) / 4 + + + +.global kvm_emulate_wrteei +kvm_emulate_wrteei: + + SCRATCH_SAVE + + /* Fetch old MSR in r31 */ + LL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + /* Remove MSR_EE from old MSR */ + li r30, 0 + ori r30, r30, MSR_EE + andcr31, r31, r30 + + /* OR new MSR_EE onto the old MSR */ +kvm_emulate_wrteei_ee: + ori r31, r31, 0 + + /* Write new MSR value back */ + STL64(r31, KVM_MAGIC_PAGE + KVM_MAGIC_MSR, 0) + + SCRATCH_RESTORE + + /* Go back to caller */ +kvm_emulate_wrteei_branch: + b . +kvm_emulate_wrteei_end: + +.global kvm_emulate_wrteei_branch_offs +kvm_emulate_wrteei_branch_offs: + .long (kvm_emulate_wrteei_branch - kvm_emulate_wrteei) / 4 + +.global kvm_emulate_wrteei_ee_offs +kvm_emulate_wrteei_ee_offs: + .long (kvm_emulate_wrteei_ee - kvm_emulate_wrteei) / 4 + +.global kvm_emulate_wrteei_len +kvm_emulate_wrteei_len: + .long (kvm_emulate_wrteei_end - kvm_emulate_wrteei) / 4 -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/26] KVM: PPC: Convert SRR0 and SRR1 to shared page
The SRR0 and SRR1 registers contain cached values of the PC and MSR respectively. They get written to by the hypervisor when an interrupt occurs or directly by the kernel. They are also used to tell the rfi(d) instruction where to jump to. Because it only gets touched on defined events that, it's very simple to share with the guest. Hypervisor and guest both have full r/w access. This patch converts all users of the current field to the shared page. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_host.h |2 -- arch/powerpc/include/asm/kvm_para.h |2 ++ arch/powerpc/kvm/book3s.c | 12 ++-- arch/powerpc/kvm/book3s_emulate.c |4 ++-- arch/powerpc/kvm/booke.c| 15 --- arch/powerpc/kvm/booke_emulate.c|4 ++-- arch/powerpc/kvm/emulate.c | 12 7 files changed, 28 insertions(+), 23 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 108dabc..6bcf62f 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -224,8 +224,6 @@ struct kvm_vcpu_arch { ulong sprg5; ulong sprg6; ulong sprg7; - ulong srr0; - ulong srr1; ulong csrr0; ulong csrr1; ulong dsrr0; diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index ec72a1c..d7fc6c2 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -23,6 +23,8 @@ #include linux/types.h struct kvm_vcpu_arch_shared { + __u64 srr0; + __u64 srr1; __u64 dar; __u64 msr; __u32 dsisr; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 245bd2d..b144697 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -162,8 +162,8 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr) void kvmppc_inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 flags) { - vcpu-arch.srr0 = kvmppc_get_pc(vcpu); - vcpu-arch.srr1 = vcpu-arch.shared-msr | flags; + vcpu-arch.shared-srr0 = kvmppc_get_pc(vcpu); + vcpu-arch.shared-srr1 = vcpu-arch.shared-msr | 
flags; kvmppc_set_pc(vcpu, to_book3s(vcpu)-hior + vec); vcpu-arch.mmu.reset_msr(vcpu); } @@ -1059,8 +1059,8 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) regs-lr = kvmppc_get_lr(vcpu); regs-xer = kvmppc_get_xer(vcpu); regs-msr = vcpu-arch.shared-msr; - regs-srr0 = vcpu-arch.srr0; - regs-srr1 = vcpu-arch.srr1; + regs-srr0 = vcpu-arch.shared-srr0; + regs-srr1 = vcpu-arch.shared-srr1; regs-pid = vcpu-arch.pid; regs-sprg0 = vcpu-arch.sprg0; regs-sprg1 = vcpu-arch.sprg1; @@ -1086,8 +1086,8 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) kvmppc_set_lr(vcpu, regs-lr); kvmppc_set_xer(vcpu, regs-xer); kvmppc_set_msr(vcpu, regs-msr); - vcpu-arch.srr0 = regs-srr0; - vcpu-arch.srr1 = regs-srr1; + vcpu-arch.shared-srr0 = regs-srr0; + vcpu-arch.shared-srr1 = regs-srr1; vcpu-arch.sprg0 = regs-sprg0; vcpu-arch.sprg1 = regs-sprg1; vcpu-arch.sprg2 = regs-sprg2; diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c index c147864..f333cb4 100644 --- a/arch/powerpc/kvm/book3s_emulate.c +++ b/arch/powerpc/kvm/book3s_emulate.c @@ -73,8 +73,8 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, switch (get_xop(inst)) { case OP_19_XOP_RFID: case OP_19_XOP_RFI: - kvmppc_set_pc(vcpu, vcpu-arch.srr0); - kvmppc_set_msr(vcpu, vcpu-arch.srr1); + kvmppc_set_pc(vcpu, vcpu-arch.shared-srr0); + kvmppc_set_msr(vcpu, vcpu-arch.shared-srr1); *advance = 0; break; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 5844bcf..8b546fe 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -64,7 +64,8 @@ void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu) printk(pc: %08lx msr: %08llx\n, vcpu-arch.pc, vcpu-arch.shared-msr); printk(lr: %08lx ctr: %08lx\n, vcpu-arch.lr, vcpu-arch.ctr); - printk(srr0: %08lx srr1: %08lx\n, vcpu-arch.srr0, vcpu-arch.srr1); + printk(srr0: %08llx srr1: %08llx\n, vcpu-arch.shared-srr0, + vcpu-arch.shared-srr1); printk(exceptions: 
%08lx\n, vcpu-arch.pending_exceptions); @@ -189,8 +190,8 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, } if (allowed) { - vcpu-arch.srr0 = vcpu-arch.pc; - vcpu-arch.srr1 = vcpu-arch.shared-msr; + vcpu-arch.shared-srr0 = vcpu-arch.pc; +
[PATCH 00/26] KVM PPC PV framework
On PPC we run PR=0 (kernel mode) code in PR=1 (user mode) and don't use the hypervisor extensions. While that is all great to show that virtualization is possible, there are quite some cases where the emulation overhead of privileged instructions is killing performance. This patchset tackles exactly that issue. It introduces a paravirtual framework using which KVM and Linux share a page to exchange register state with. That way we don't have to switch to the hypervisor just to change a value of a privileged register. To prove my point, I ran the same test I did for the MMU optimizations against the PV framework. Here are the results: [without] debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello /dev/null; done real0m14.659s user0m8.967s sys 0m5.688s [with] debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello /dev/null; done real0m7.557s user0m4.121s sys 0m3.426s So this is a significant performance improvement! I'm quite happy how fast this whole thing becomes :) I tried to take all comments I've heard from people so far about such a PV framework into account. In case you told me something before that is a no-go and I still did it, please just tell me again. Now go and have fun with fast VMs on PPC! Get yourself a G5 on ebay and start experiencing the power yourself. - heh Alexander Graf (26): KVM: PPC: Introduce shared page KVM: PPC: Convert MSR to shared page KVM: PPC: Convert DSISR to shared page KVM: PPC: Convert DAR to shared page. 
  KVM: PPC: Convert SRR0 and SRR1 to shared page
  KVM: PPC: Convert SPRG[0-4] to shared page
  KVM: PPC: Implement hypervisor interface
  KVM: PPC: Add PV guest critical sections
  KVM: PPC: Add PV guest scratch registers
  KVM: PPC: Tell guest about pending interrupts
  KVM: PPC: Make RMO a define
  KVM: PPC: First magic page steps
  KVM: PPC: Magic Page Book3s support
  KVM: PPC: Magic Page BookE support
  KVM: PPC: Expose magic page support to guest
  KVM: Move kvm_guest_init out of generic code
  KVM: PPC: Generic KVM PV guest support
  KVM: PPC: KVM PV guest stubs
  KVM: PPC: PV instructions to loads and stores
  KVM: PPC: PV tlbsync to nop
  KVM: PPC: Introduce kvm_tmp framework
  KVM: PPC: PV assembler helpers
  KVM: PPC: PV mtmsrd L=1
  KVM: PPC: PV mtmsrd L=0 and mtmsr
  KVM: PPC: PV wrteei
  KVM: PPC: Add Documentation about PV interface

 Documentation/kvm/ppc-pv.txt             |  164
 arch/powerpc/include/asm/kvm_book3s.h    |    1 -
 arch/powerpc/include/asm/kvm_host.h      |   14 +-
 arch/powerpc/include/asm/kvm_para.h      |  121 +-
 arch/powerpc/include/asm/kvm_ppc.h       |    1 +
 arch/powerpc/kernel/Makefile             |    2 +
 arch/powerpc/kernel/asm-offsets.c        |   18 ++-
 arch/powerpc/kernel/kvm.c                |  399 ++
 arch/powerpc/kernel/kvm_emul.S           |  237 ++
 arch/powerpc/kvm/44x.c                   |    7 +
 arch/powerpc/kvm/44x_tlb.c               |    8 +-
 arch/powerpc/kvm/book3s.c                |  162 -
 arch/powerpc/kvm/book3s_32_mmu.c         |   28 ++-
 arch/powerpc/kvm/book3s_32_mmu_host.c    |   16 +-
 arch/powerpc/kvm/book3s_64_mmu.c         |   42 +++-
 arch/powerpc/kvm/book3s_64_mmu_host.c    |   16 +-
 arch/powerpc/kvm/book3s_emulate.c        |   25 +-
 arch/powerpc/kvm/book3s_paired_singles.c |   11 +-
 arch/powerpc/kvm/booke.c                 |  110 +++--
 arch/powerpc/kvm/booke.h                 |    6 +-
 arch/powerpc/kvm/booke_emulate.c         |   14 +-
 arch/powerpc/kvm/booke_interrupts.S      |    3 +-
 arch/powerpc/kvm/e500.c                  |    7 +
 arch/powerpc/kvm/e500_tlb.c              |   31 ++-
 arch/powerpc/kvm/e500_tlb.h              |    2 +-
 arch/powerpc/kvm/emulate.c               |   47 +++-
 arch/powerpc/kvm/powerpc.c               |   42 +++-
 arch/powerpc/platforms/Kconfig           |   10 +
 arch/x86/include/asm/kvm_para.h          |    6 +
 include/linux/kvm_para.h                 |    7 +-
 30 files changed, 1383 insertions(+), 174 deletions(-)
 create mode 100644 Documentation/kvm/ppc-pv.txt
 create mode 100644 arch/powerpc/kernel/kvm.c
 create mode 100644 arch/powerpc/kernel/kvm_emul.S
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 12/26] KVM: PPC: First magic page steps
We will be introducing a method to project the shared page into guest context. As soon as we're talking about this coupling, the shared page is called the magic page.

This patch introduces simple defines, so the follow-up patches are easier to read.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_host.h |    2 ++
 include/linux/kvm_para.h            |    1 +
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index e35c1ac..5f8c214 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -285,6 +285,8 @@ struct kvm_vcpu_arch {
 	u64 dec_jiffies;
 	unsigned long pending_exceptions;
 	struct kvm_vcpu_arch_shared *shared;
+	unsigned long magic_page_pa; /* phys addr to map the magic page to */
+	unsigned long magic_page_ea; /* effect. addr to map the magic page to */
 
 #ifdef CONFIG_PPC_BOOK3S
 	struct kmem_cache *hpte_cache;
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 3b8080e..ac2015a 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -18,6 +18,7 @@
 #define KVM_HC_VAPIC_POLL_IRQ		1
 #define KVM_HC_MMU_OP			2
 #define KVM_HC_FEATURES		3
+#define KVM_HC_PPC_MAP_MAGIC_PAGE	4
 
 /*
  * hypercalls use architecture specific
-- 
1.6.0.2