RE: [PATCH v15 00/17] Provide a zero-copy method on KVM virtio-net.

2010-11-11 Thread Xin, Xiaohui
-Original Message-
From: David Miller [mailto:da...@davemloft.net]
Sent: Thursday, November 11, 2010 1:47 AM
To: Xin, Xiaohui
Cc: net...@vger.kernel.org; kvm@vger.kernel.org; linux-ker...@vger.kernel.org;
m...@redhat.com; mi...@elte.hu; herb...@gondor.apana.org.au; 
jd...@linux.intel.com
Subject: Re: [PATCH v15 00/17] Provide a zero-copy method on KVM virtio-net.

From: xiaohui@intel.com
Date: Wed, 10 Nov 2010 17:23:28 +0800

 From: Xin Xiaohui xiaohui@intel.com

2) The idea to key off of skb-dev in skb_release_data() is
   fundamentally flawed since many actions can change skb-dev on you,
   which will end up causing a leak of your external data areas.

 How about this one? If the destructor_arg is not a good candidate,
 then I have to add an apparent field in shinfo.

If destructor_arg is actually a net_device pointer or similar,
you will need to take a reference count on it or similar.

Do you mean destructor_arg will be consumed by other user?
If that case, may I add a new structure member in shinfo?
Thus only zero-copy will use it, and no need for the reference count.

Which means -- good bye performance especially on SMP.

You're going to be adding new serialization points and at
least two new atomics per packet.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: buildbot for kvm.git

2010-11-11 Thread Daniel Gollub
On Thursday, November 11, 2010 02:31:06 am Avi Kivity wrote:
 Daniel, the buildbot has been fairly effective in keeping qemu-kvm.git 
 building.  I'd like to extend that to kvm.git, especially for non-x86 
 architectures.
[...]
 Can you help with this?

Sure. I'll look into that next week.

Best Regards,
Daniel


-- 
Daniel Gollub
Linux Consultant  Developer
Mail: gol...@b1-systems.de

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


signature.asc
Description: This is a digitally signed message part.


Re: [PATCH -v2] Monitor command: pfa2hva, translate guest physical address to host virtual address

2010-11-11 Thread Avi Kivity

On 11/11/2010 02:56 AM, Huang Ying wrote:

On Thu, 2010-11-11 at 00:49 +0800, Anthony Liguori wrote:
  On 11/10/2010 02:34 AM, Avi Kivity wrote:
Why the gpa -   hva mapping is not
consistent for RAM if -mempath is not used?
  
Video RAM in the range a-b and PCI mapped RAM can change gpas
(while remaining in the same hva).
  
Even for ordinary RAM, which doesn't normally change gpa/hva, I'm not
sure we want to guarantee that it won't.

  We can't universally either.  Memory hot remove potentially breaks the
  mapping and some non-x86 architectures (like ARM) can alias RAM via a
  guest triggered mechanism.

Thanks for clarification. Now I think we have two options.

1) Clearly mark gpa2hva (pfa2hva now, should renamed to gpa2hva) as a
testing only interface, and should be used only on restricted
environment (normal memory, without hot-remove, maybe only on x86).

2) Find some way to lock the gpa -  hva mapping during operating. Such
as gpa2hva_begin and gpa2hva_end and lock the mapping in between.

I think 2) may be possible. But if there are no other users, why do that
for a test case? So I think 1) may be the better option.



3) Move the poisoning code into qemu, so the command becomes

   posison-address addr

(though physical addresses aren't stable either)

4) Use -mempath on /dev/shm and poison a page in the backing file

(if poisoning works on tmpfs)

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3 08/14] Add get_fw_dev_path callback for pci bus.

2010-11-11 Thread Gerd Hoffmann

On 11/10/10 18:34, Michael S. Tsirkin wrote:

On Wed, Nov 10, 2010 at 07:14:15PM +0200, Gleb Natapov wrote:


Signed-off-by: Gleb Natapovg...@redhat.com


Good stuff. We should also consider using this for
CLI and monitor. Some comments below.


Oh, we already have a table to map pci classes to descriptions for 'info 
pci'.  I'd strongly suggest to just add the fw names to that table 
instead of creating a second one ...


cheers,
  Gerd

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3 03/14] Keep track of ISA ports ISA device is using in qdev.

2010-11-11 Thread Gerd Hoffmann

  Hi,


  register_ioport_write (s-port, 1, 1, gus_writeb, s);
  register_ioport_write (s-port, 1, 2, gus_writew, s);
+isa_init_ioport_range(dev, s-port, 2);

  register_ioport_read ((s-port + 0x100)  0xf00, 1, 1, gus_readb, s);
  register_ioport_read ((s-port + 0x100)  0xf00, 1, 2, gus_readw, s);
+isa_init_ioport_range(dev, (s-port + 0x100)  0xf00, 2);

  register_ioport_write (s-port + 6, 10, 1, gus_writeb, s);
  register_ioport_write (s-port + 6, 10, 2, gus_writew, s);
  register_ioport_read (s-port + 6, 10, 1, gus_readb, s);
  register_ioport_read (s-port + 6, 10, 2, gus_readw, s);
+isa_init_ioport_range(dev, s-port + 6, 10);


  register_ioport_write (s-port + 0x100, 8, 1, gus_writeb, s);
  register_ioport_write (s-port + 0x100, 8, 2, gus_writew, s);
  register_ioport_read (s-port + 0x100, 8, 1, gus_readb, s);
  register_ioport_read (s-port + 0x100, 8, 2, gus_readw, s);
+isa_init_ioport_range(dev, s-port + 0x100, 8);


Seeing all the duplication here and elsewhere ...

How about moving the register_ioport_{read,write} calls into 
isa_init_ioport_range() ?


cheers,
  Gerd

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3 08/14] Add get_fw_dev_path callback for pci bus.

2010-11-11 Thread Gleb Natapov
On Thu, Nov 11, 2010 at 11:07:01AM +0100, Gerd Hoffmann wrote:
 On 11/10/10 18:34, Michael S. Tsirkin wrote:
 On Wed, Nov 10, 2010 at 07:14:15PM +0200, Gleb Natapov wrote:
 
 Signed-off-by: Gleb Natapovg...@redhat.com
 
 Good stuff. We should also consider using this for
 CLI and monitor. Some comments below.
 
 Oh, we already have a table to map pci classes to descriptions for
 'info pci'.  I'd strongly suggest to just add the fw names to that
 table instead of creating a second one ...
 
Do you mean pci_class_descriptions? For some classes open firmware spec
defines single name for all subclasses. For instance all 0Axx should be
called dock no matter what xx is. This can't be encoded in a simple
table.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: VMX: Fix host userspace gsbase corruption

2010-11-11 Thread Avi Kivity
We now use load_gs_index() to load gs safely; unfortunately this also
changes MSR_KERNEL_GS_BASE, which we managed separately.  This resulted
in confusion and breakage running 32-bit host userspace on a 64-bit kernel.

Fix by
- saving guest MSR_KERNEL_GS_BASE before we we reload the host's gs
- doing the host save/load unconditionally, instead of only when in guest
  long mode

Things can be cleaned up further, but this is the minmal fix for now.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/vmx.c |   15 +++
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9367abc..0badeac 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -821,10 +821,9 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)
 #endif
 
 #ifdef CONFIG_X86_64
-   if (is_long_mode(vmx-vcpu)) {
-   rdmsrl(MSR_KERNEL_GS_BASE, vmx-msr_host_kernel_gs_base);
+   rdmsrl(MSR_KERNEL_GS_BASE, vmx-msr_host_kernel_gs_base);
+   if (is_long_mode(vmx-vcpu))
wrmsrl(MSR_KERNEL_GS_BASE, vmx-msr_guest_kernel_gs_base);
-   }
 #endif
for (i = 0; i  vmx-save_nmsrs; ++i)
kvm_set_shared_msr(vmx-guest_msrs[i].index,
@@ -839,11 +838,14 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
 
++vmx-vcpu.stat.host_state_reload;
vmx-host_state.loaded = 0;
+#ifdef CONFIG_X86_64
+   if (is_long_mode(vmx-vcpu))
+   rdmsrl(MSR_KERNEL_GS_BASE, vmx-msr_guest_kernel_gs_base);
+#endif
if (vmx-host_state.gs_ldt_reload_needed) {
kvm_load_ldt(vmx-host_state.ldt_sel);
 #ifdef CONFIG_X86_64
load_gs_index(vmx-host_state.gs_sel);
-   wrmsrl(MSR_KERNEL_GS_BASE, current-thread.gs);
 #else
loadsegment(gs, vmx-host_state.gs_sel);
 #endif
@@ -852,10 +854,7 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
loadsegment(fs, vmx-host_state.fs_sel);
reload_tss();
 #ifdef CONFIG_X86_64
-   if (is_long_mode(vmx-vcpu)) {
-   rdmsrl(MSR_KERNEL_GS_BASE, vmx-msr_guest_kernel_gs_base);
-   wrmsrl(MSR_KERNEL_GS_BASE, vmx-msr_host_kernel_gs_base);
-   }
+   wrmsrl(MSR_KERNEL_GS_BASE, vmx-msr_host_kernel_gs_base);
 #endif
if (current_thread_info()-status  TS_USEDFPU)
clts();
-- 
1.7.3.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3 08/14] Add get_fw_dev_path callback for pci bus.

2010-11-11 Thread Gerd Hoffmann

Oh, we already have a table to map pci classes to descriptions for
'info pci'.  I'd strongly suggest to just add the fw names to that
table instead of creating a second one ...


Do you mean pci_class_descriptions?


Exactly.


For some classes open firmware spec
defines single name for all subclasses. For instance all 0Axx should be
called dock no matter what xx is. This can't be encoded in a simple
table.


Well, actually it is a list not a table.  Should be easy to extend, like 
adding a mask, so you can have { .class = 0xa000, .mask = 0xff00,

.fw_name = dock, .desc = whatever } to match 0xA0xx ...

cheers,
  Gerd

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM test: Add subtest clock_getres

2010-11-11 Thread Lucas Meneghel Rodrigues
In order to track a regression that happens using kvmclock
in guests, add test of the clock_getres() syscall, to see
what is the clocksource resolution reported by the guest.
This, combined with variants that set different -cpu params
for the qemu command line, will give us the desired outcome.

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/deps/test_clock_getres/Makefile   |   11 
 .../kvm/deps/test_clock_getres/test_clock_getres.c |   58 
 client/tests/kvm/tests/clock_getres.py |   44 +++
 client/tests/kvm/tests_base.cfg.sample |3 +
 4 files changed, 116 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/deps/test_clock_getres/Makefile
 create mode 100644 client/tests/kvm/deps/test_clock_getres/test_clock_getres.c
 create mode 100644 client/tests/kvm/tests/clock_getres.py

diff --git a/client/tests/kvm/deps/test_clock_getres/Makefile 
b/client/tests/kvm/deps/test_clock_getres/Makefile
new file mode 100644
index 000..b4f73c7
--- /dev/null
+++ b/client/tests/kvm/deps/test_clock_getres/Makefile
@@ -0,0 +1,11 @@
+CC = gcc
+PROG = test_clock_getres
+SRC = test_clock_getres.c
+LIBS = -lrt
+
+all: $(PROG)
+
+$(PROG):
+   $(CC) $(LIBS) -o $(PROG) $(SRC)
+clean:
+   rm -f $(PROG)
diff --git a/client/tests/kvm/deps/test_clock_getres/test_clock_getres.c 
b/client/tests/kvm/deps/test_clock_getres/test_clock_getres.c
new file mode 100644
index 000..81d3b9c
--- /dev/null
+++ b/client/tests/kvm/deps/test_clock_getres/test_clock_getres.c
@@ -0,0 +1,58 @@
+/*
+ *  Test clock resolution for KVM guests that have kvm-clock as clock source
+ *
+ *  Copyright (c) 2010 Red Hat, Inc
+ *  Author: Lucas Meneghel Rodrigues l...@redhat.com
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see http://www.gnu.org/licenses/.
+ */
+#include stdio.h
+#include time.h
+#include stdlib.h
+#include string.h
+
+int main(void) {
+   struct timespec res;
+   int clock_return = clock_getres(CLOCK_MONOTONIC, res);
+   char clocksource[50];
+   char line[80];
+   FILE *fr;
+   if ((fr = fopen(
+   
/sys/devices/system/clocksource/clocksource0/current_clocksource,
+   rt)) == NULL) {
+   perror(fopen);
+   return EXIT_FAILURE;
+   }
+   while (fgets(line, 80, fr) != NULL) {
+   sscanf(line, %s, clocksource);
+   }
+   fclose(fr);
+   if (!strncmp(clocksource, kvm-clock, strlen(kvm-clock))) {
+   if (clock_return == 0) {
+   if (res.tv_sec  1 || res.tv_nsec  100) {
+   printf(FAIL: clock_getres returned bad clock 
resolution\n);
+   return EXIT_FAILURE;
+   } else {
+   printf(PASS: check successful\n);
+   return EXIT_SUCCESS;
+   }
+   } else {
+   printf(FAIL: clock_getres failed\n);
+   return EXIT_FAILURE;
+   }
+   } else {
+   printf(FAIL: invalid clock source: %s\n, clocksource);
+   return EXIT_FAILURE;
+   }
+}
diff --git a/client/tests/kvm/tests/clock_getres.py 
b/client/tests/kvm/tests/clock_getres.py
new file mode 100644
index 000..7cadae8
--- /dev/null
+++ b/client/tests/kvm/tests/clock_getres.py
@@ -0,0 +1,44 @@
+import logging, time, os
+from autotest_lib.client.common_lib import error
+from autotest_lib.client.common_lib import utils
+import kvm_test_utils, kvm_utils
+
+
+def run_clock_getres(test, params, env):
+
+Verify if guests using kvm-clock as the time source have a sane clock
+resolution.
+
+@param test: kvm test object.
+@param params: Dictionary with test parameters.
+@param env: Dictionary with the test environment.
+
+t_name = test_clock_getres
+base_dir = /tmp
+
+deps_dir = os.path.join(test.bindir, deps, t_name)
+os.chdir(deps_dir)
+try:
+utils.system(make clean)
+utils.system(make)
+except:
+raise error.TestError(Failed to compile %s % t_name)
+
+test_clock = os.path.join(deps_dir, t_name)
+if not os.path.isfile(test_clock):
+raise error.TestError(Could not find %s % t_name)
+
+vm = kvm_test_utils.get_living_vm(env, 

RE: Unable to start VM using COWed image

2010-11-11 Thread Prasad Joshi
 From: Prasad Joshi
 Sent: 10 November 2010 13:01
 To: Stefan Hajnoczi
 Cc: Keqin Hong; kvm@vger.kernel.org
 Subject: RE: Unable to start VM using COWed image

 From: Stefan Hajnoczi [stefa...@gmail.com]
 Sent: 10 November 2010 12:47
 To: Prasad Joshi
 Cc: Keqin Hong; kvm@vger.kernel.org
 Subject: Re: Unable to start VM using COWed image

 On Wed, Nov 10, 2010 at 12:40 PM, Prasad Joshi
p.g.jo...@student.reading.ac.uk wrote:
 From: Stefan Hajnoczi [stefa...@gmail.com]
 Sent: 10 November 2010 11:12
 To: Prasad Joshi
 Cc: Keqin Hong; kvm@vger.kernel.org
 Subject: Re: Unable to start VM using COWed image

 On Wed, Nov 10, 2010 at 10:08 AM, Prasad Joshi
 p.g.jo...@student.reading.ac.uk wrote:
 Where can I get the code of the qemu-kvm program?
 I cloned the qemu-lvm git repository and compiled the code. But it looks 
 like qemu-kvm program is not part of this code.--

 qemu-kvm.git contains the qemu-kvm codebase but the binary is built in
 x86_64-softmmu/qemu-system-x86_64.  Distro packages typically rename
 it to qemu-kvm.

 Thanks Stefan for your reply.

 I guess you pointed out the problem in the first mail. QEMU places a 
 restriction on location of the COWed file. The source image and COWed image 
 should be in the same drectory.

 In my case the source image was in directory /var/lib/libvirt/images/ and the 
 COWed image was in /home/prasad/Virtual directory.
 While debuging the source code using gdb I realized this limitation. It would 
 be good to fix this problem. I will see if I can solve this problem.

 This behavior is a feature.  You chose to use a relative backing file
 path when you used qemu-img create -b relative-path.

 If you want an absolute path you need to use qemu-img create -b
 /home/prasad/Virtual/... (i.e. specify an absolute path instead of a
 relative path).

 Oh I  see. I am such a stupid. ha ha ha ha
 Thanks a lot for letting me know.
 It worked after using absolute paths. Gr8 Thanks a lot.

Though specifying the absolute path for source image worked for me. 
Can any one please let me know the situation in which one would not want to 
specify the absolute path? 
How does relative path help? 
Advantage of using relative path rather than absolute path.

I think using the absolute path would always work.

 One more question on the same lines,
 How does QEMU detect the file is COWed and the name of the file (not whole 
 path) from it is COWed?

 COW support comes from the image file format that you choose.  A qcow
 file is not just a raw image file like the kind you can dd from a real
 disk.  Instead it has its own file format including a header and
 metadata for tracking allocated space.  The header contains the name
 of the backing file.

 Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unable to start VM using COWed image

2010-11-11 Thread Stefan Hajnoczi
On Thu, Nov 11, 2010 at 12:17 PM, Prasad Joshi
p.g.jo...@student.reading.ac.uk wrote:
 Though specifying the absolute path for source image worked for me.
 Can any one please let me know the situation in which one would not want to 
 specify the absolute path?
 How does relative path help?
 Advantage of using relative path rather than absolute path.

 I think using the absolute path would always work.

Relative paths are useful when sharing images with other people.  An
absolute path won't work on another machine unless you use the same
parent directory structure.  If you send me an image file with an
absolute path in your home directory, I won't be able to use it easily
on my machine.

(Actually the new qemu-img rebase -u command can be used to fix up the
image file on the destination machine but it's an extra step and not
user-friendly.)

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/3] virtio: Use ioeventfd for virtqueue notify

2010-11-11 Thread Stefan Hajnoczi
This is a rewrite of the virtio-ioeventfd patchset to work at the virtio-pci.c
level instead of virtio.c.  This results in better integration with the
host/guest notifier code and makes the code simpler (no more state machine).

Virtqueue notify is currently handled synchronously in userspace virtio.  This
prevents the vcpu from executing guest code while hardware emulation code
handles the notify.

On systems that support KVM, the ioeventfd mechanism can be used to make
virtqueue notify a lightweight exit by deferring hardware emulation to the
iothread and allowing the VM to continue execution.  This model is similar to
how vhost receives virtqueue notifies.

The result of this change is improved performance for userspace virtio devices.
Virtio-blk throughput increases especially for multithreaded scenarios and
virtio-net transmit throughput increases substantially.

Now that this code is in virtio-pci.c it is possible to explicitly enable
devices for which virtio-ioeventfd should be used.  Only virtio-blk and
virtio-net are enabled at this time.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] virtio-pci: Use ioeventfd for virtqueue notify

2010-11-11 Thread Stefan Hajnoczi
Virtqueue notify is currently handled synchronously in userspace virtio.  This
prevents the vcpu from executing guest code while hardware emulation code
handles the notify.

On systems that support KVM, the ioeventfd mechanism can be used to make
virtqueue notify a lightweight exit by deferring hardware emulation to the
iothread and allowing the VM to continue execution.  This model is similar to
how vhost receives virtqueue notifies.

The result of this change is improved performance for userspace virtio devices.
Virtio-blk throughput increases especially for multithreaded scenarios and
virtio-net transmit throughput increases substantially.

Some virtio devices are known to have guest drivers which expect a notify to be
processed synchronously and spin waiting for completion.  Only enable ioeventfd
for virtio-blk and virtio-net for now.

Care must be taken not to interfere with vhost-net, which already uses
ioeventfd host notifiers.  The following list shows the behavior implemented in
this patch and is designed to take vhost-net into account:

 * VIRTIO_CONFIG_S_DRIVER_OK - assign host notifiers, 
qemu_set_fd_handler(virtio_pci_host_notifier_read)
 * reset - qemu_set_fd_handler(NULL), deassign host notifiers
 * virtio_pci_set_host_notifier(true) - qemu_set_fd_handler(NULL)
 * virtio_pci_set_host_notifier(false) - 
qemu_set_fd_handler(virtio_pci_host_notifier_read)

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 hw/virtio-pci.c |  155 +++---
 hw/virtio.c |5 ++
 hw/virtio.h |1 +
 3 files changed, 129 insertions(+), 32 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 549118d..436fc59 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -83,6 +83,10 @@
 /* Flags track per-device state like workarounds for quirks in older guests. */
 #define VIRTIO_PCI_FLAG_BUS_MASTER_BUG  (1  0)
 
+/* Performance improves when virtqueue kick processing is decoupled from the
+ * vcpu thread using ioeventfd for some devices. */
+#define VIRTIO_PCI_FLAG_USE_IOEVENTFD   (1  1)
+
 /* QEMU doesn't strictly need write barriers since everything runs in
  * lock-step.  We'll leave the calls to wmb() in though to make it obvious for
  * KVM or if kqemu gets SMP support.
@@ -179,12 +183,108 @@ static int virtio_pci_load_queue(void * opaque, int n, 
QEMUFile *f)
 return 0;
 }
 
+static int virtio_pci_set_host_notifier_ioeventfd(VirtIOPCIProxy *proxy, int 
n, bool assign)
+{
+VirtQueue *vq = virtio_get_queue(proxy-vdev, n);
+EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
+int r;
+if (assign) {
+r = event_notifier_init(notifier, 1);
+if (r  0) {
+return r;
+}
+r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
+   proxy-addr + VIRTIO_PCI_QUEUE_NOTIFY,
+   n, assign);
+if (r  0) {
+event_notifier_cleanup(notifier);
+}
+} else {
+r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
+   proxy-addr + VIRTIO_PCI_QUEUE_NOTIFY,
+   n, assign);
+if (r  0) {
+return r;
+}
+event_notifier_cleanup(notifier);
+}
+return r;
+}
+
+static void virtio_pci_host_notifier_read(void *opaque)
+{
+VirtQueue *vq = opaque;
+EventNotifier *n = virtio_queue_get_host_notifier(vq);
+if (event_notifier_test_and_clear(n)) {
+virtio_queue_notify_vq(vq);
+}
+}
+
+static void virtio_pci_set_host_notifier_fd_handler(VirtIOPCIProxy *proxy, int 
n, bool assign)
+{
+VirtQueue *vq = virtio_get_queue(proxy-vdev, n);
+EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
+if (assign) {
+qemu_set_fd_handler(event_notifier_get_fd(notifier),
+virtio_pci_host_notifier_read, NULL, vq);
+} else {
+qemu_set_fd_handler(event_notifier_get_fd(notifier),
+NULL, NULL, NULL);
+}
+}
+
+static int virtio_pci_set_host_notifiers(VirtIOPCIProxy *proxy, bool assign)
+{
+int n, r;
+
+for (n = 0; n  VIRTIO_PCI_QUEUE_MAX; n++) {
+if (!virtio_queue_get_num(proxy-vdev, n)) {
+continue;
+}
+
+if (assign) {
+r = virtio_pci_set_host_notifier_ioeventfd(proxy, n, true);
+if (r  0) {
+goto assign_error;
+}
+
+virtio_pci_set_host_notifier_fd_handler(proxy, n, true);
+} else {
+virtio_pci_set_host_notifier_fd_handler(proxy, n, false);
+virtio_pci_set_host_notifier_ioeventfd(proxy, n, false);
+}
+}
+return 0;
+
+assign_error:
+proxy-flags = ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
+while (--n = 0) {
+virtio_pci_set_host_notifier_fd_handler(proxy, n, false);
+

[PATCH 1/3] virtio-pci: Rename bugs field to flags

2010-11-11 Thread Stefan Hajnoczi
The VirtIOPCIProxy bugs field is currently used to enable workarounds
for older guests.  Rename it to flags so that other per-device behavior
can be tracked.

A later patch uses the flags field to remember whether ioeventfd should
be used for virtqueue host notification.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 hw/virtio-pci.c |   15 +++
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 729917d..549118d 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -80,9 +80,8 @@
  * 12 is historical, and due to x86 page size. */
 #define VIRTIO_PCI_QUEUE_ADDR_SHIFT12
 
-/* We can catch some guest bugs inside here so we continue supporting older
-   guests. */
-#define VIRTIO_PCI_BUG_BUS_MASTER  (1  0)
+/* Flags track per-device state like workarounds for quirks in older guests. */
+#define VIRTIO_PCI_FLAG_BUS_MASTER_BUG  (1  0)
 
 /* QEMU doesn't strictly need write barriers since everything runs in
  * lock-step.  We'll leave the calls to wmb() in though to make it obvious for
@@ -95,7 +94,7 @@
 typedef struct {
 PCIDevice pci_dev;
 VirtIODevice *vdev;
-uint32_t bugs;
+uint32_t flags;
 uint32_t addr;
 uint32_t class_code;
 uint32_t nvectors;
@@ -159,7 +158,7 @@ static int virtio_pci_load_config(void * opaque, QEMUFile 
*f)
in ready state. Then we have a buggy guest OS. */
 if ((proxy-vdev-status  VIRTIO_CONFIG_S_DRIVER_OK) 
 !(proxy-pci_dev.config[PCI_COMMAND]  PCI_COMMAND_MASTER)) {
-proxy-bugs |= VIRTIO_PCI_BUG_BUS_MASTER;
+proxy-flags |= VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
 }
 return 0;
 }
@@ -185,7 +184,7 @@ static void virtio_pci_reset(DeviceState *d)
 VirtIOPCIProxy *proxy = container_of(d, VirtIOPCIProxy, pci_dev.qdev);
 virtio_reset(proxy-vdev);
 msix_reset(proxy-pci_dev);
-proxy-bugs = 0;
+proxy-flags = 0;
 }
 
 static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
@@ -235,7 +234,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
some safety checks. */
 if ((val  VIRTIO_CONFIG_S_DRIVER_OK) 
 !(proxy-pci_dev.config[PCI_COMMAND]  PCI_COMMAND_MASTER)) {
-proxy-bugs |= VIRTIO_PCI_BUG_BUS_MASTER;
+proxy-flags |= VIRTIO_PCI_FLAG_BUS_MASTER_BUG;
 }
 break;
 case VIRTIO_MSI_CONFIG_VECTOR:
@@ -403,7 +402,7 @@ static void virtio_write_config(PCIDevice *pci_dev, 
uint32_t address,
 
 if (PCI_COMMAND == address) {
 if (!(val  PCI_COMMAND_MASTER)) {
-if (!(proxy-bugs  VIRTIO_PCI_BUG_BUS_MASTER)) {
+if (!(proxy-flags  VIRTIO_PCI_FLAG_BUS_MASTER_BUG)) {
 virtio_set_status(proxy-vdev,
   proxy-vdev-status  
~VIRTIO_CONFIG_S_DRIVER_OK);
 }
-- 
1.7.2.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] virtio-pci: Don't use ioeventfd on old kernels

2010-11-11 Thread Stefan Hajnoczi
There used to be a limit of 6 KVM io bus devices inside the kernel.  On
such a kernel, don't use ioeventfd for virtqueue host notification since
the limit is reached too easily.  This ensures that existing vhost-net
setups (which always use ioeventfd) have ioeventfds available so they
can continue to work.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 hw/virtio-pci.c |4 
 kvm-all.c   |   46 ++
 kvm-stub.c  |5 +
 kvm.h   |1 +
 4 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 436fc59..365a26b 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -646,6 +646,10 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, 
VirtIODevice *vdev,
 pci_register_bar(proxy-pci_dev, 0, size, PCI_BASE_ADDRESS_SPACE_IO,
virtio_map);
 
+if (!kvm_has_many_ioeventfds()) {
+proxy-flags = ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
+}
+
 virtio_bind_device(vdev, virtio_pci_bindings, proxy);
 proxy-host_features |= 0x1  VIRTIO_F_NOTIFY_ON_EMPTY;
 proxy-host_features |= 0x1  VIRTIO_F_BAD_FEATURE;
diff --git a/kvm-all.c b/kvm-all.c
index 37b99c7..ba302bc 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -28,6 +28,11 @@
 #include kvm.h
 #include bswap.h
 
+/* This check must be after config-host.h is included */
+#ifdef CONFIG_EVENTFD
+#include sys/eventfd.h
+#endif
+
 /* KVM uses PAGE_SIZE in it's definition of COALESCED_MMIO_MAX */
 #define PAGE_SIZE TARGET_PAGE_SIZE
 
@@ -72,6 +77,7 @@ struct KVMState
 int irqchip_in_kernel;
 int pit_in_kernel;
 int xsave, xcrs;
+int many_ioeventfds;
 };
 
 static KVMState *kvm_state;
@@ -441,6 +447,39 @@ int kvm_check_extension(KVMState *s, unsigned int 
extension)
 return ret;
 }
 
+static int kvm_check_many_ioeventfds(void)
+{
+/* Older kernels have a 6 device limit on the KVM io bus.  Find out so we
+ * can avoid creating too many ioeventfds.
+ */
+#ifdef CONFIG_EVENTFD
+int ioeventfds[7];
+int i, ret = 0;
+for (i = 0; i  ARRAY_SIZE(ioeventfds); i++) {
+ioeventfds[i] = eventfd(0, EFD_CLOEXEC);
+if (ioeventfds[i]  0) {
+break;
+}
+ret = kvm_set_ioeventfd_pio_word(ioeventfds[i], 0, i, true);
+if (ret  0) {
+close(ioeventfds[i]);
+break;
+}
+}
+
+/* Decide whether many devices are supported or not */
+ret = i == ARRAY_SIZE(ioeventfds);
+
+while (i--  0) {
+kvm_set_ioeventfd_pio_word(ioeventfds[i], 0, i, false);
+close(ioeventfds[i]);
+}
+return ret;
+#else
+return 0;
+#endif
+}
+
 static void kvm_set_phys_mem(target_phys_addr_t start_addr,
 ram_addr_t size,
 ram_addr_t phys_offset)
@@ -717,6 +756,8 @@ int kvm_init(int smp_cpus)
 kvm_state = s;
 cpu_register_phys_memory_client(kvm_cpu_phys_memory_client);
 
+s-many_ioeventfds = kvm_check_many_ioeventfds();
+
 return 0;
 
 err:
@@ -1046,6 +1087,11 @@ int kvm_has_xcrs(void)
 return kvm_state-xcrs;
 }
 
+int kvm_has_many_ioeventfds(void)
+{
+return kvm_state-many_ioeventfds;
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
 if (!kvm_has_sync_mmu()) {
diff --git a/kvm-stub.c b/kvm-stub.c
index 5384a4b..33d4476 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -99,6 +99,11 @@ int kvm_has_robust_singlestep(void)
 return 0;
 }
 
+int kvm_has_many_ioeventfds(void)
+{
+return 0;
+}
+
 void kvm_setup_guest_memory(void *start, size_t size)
 {
 }
diff --git a/kvm.h b/kvm.h
index 60a9b42..ce08d42 100644
--- a/kvm.h
+++ b/kvm.h
@@ -42,6 +42,7 @@ int kvm_has_robust_singlestep(void);
 int kvm_has_debugregs(void);
 int kvm_has_xsave(void);
 int kvm_has_xcrs(void);
+int kvm_has_many_ioeventfds(void);
 
 #ifdef NEED_CPU_H
 int kvm_init_vcpu(CPUState *env);
-- 
1.7.2.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: GPGPU passthrough in linux kvm

2010-11-11 Thread Jason Krieg

On 10/15/2010 09:11 AM, André Weidemann wrote:

Hi Federico,

On 15.06.2010 18:18, Fede wrote:

On Wed, Jun 9, 2010 at 18:51, Adhyas Avasthiadh...@gmail.com  wrote:

I read an old email thread which talked about GPGPU passthroughin
linux-kvm. Was this implemented?

If not, are there some quick hacks I can use to enable it in my tree?
Right now, I try to follow the same commands as I do for NIC
passthrough, but it complains that my device is busy. I have two
NVidia Graphics cards on my box.


I'm working on it. It doesn't work because graphics cards have BIOSes.

I need to debug some issues with the BIOS of my 9600GT, but I think
I'm quite close to succeed.


Are you still working on implementing this feature?
Did you make some progress in debugging the BIOS issue?
I am still very much looking forward to kvm being able to pass a video 
card to a VM.




hope you soon have success so kvm can do GPU passthrough

we can not let xen be the only one who can do this see beta XenServer 5.6
now with GPU passthrough

Regards,
Jason

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: recent nested-virtualization publications

2010-11-11 Thread Joerg Roedel
On Tue, Oct 26, 2010 at 03:10:58PM +0200, Nadav Har'El wrote:

 I put copies of all above mentioned documents (in case there's
 difficulty in finding them), in
http://www.math.technion.ac.il/~nyh/nested/

Very interesting. Thanks for the overview on that topic.

Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM test : Adding RHEL 6 to the list of guests

2010-11-11 Thread Lucas Meneghel Rodrigues
Signed-off-by: Pradeep Kumar psuri...@linux.vnet.ibm.com
---
 client/tests/kvm/tests_base.cfg.sample   |   26 +-
 client/tests/kvm/unattended/RHEL-6-series.ks |   37 ++
 2 files changed, 62 insertions(+), 1 deletions(-)
 create mode 100644 client/tests/kvm/unattended/RHEL-6-series.ks

diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 07328df..07b6015 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -1334,6 +1334,30 @@ variants:
 #floppy = images/rhel55-64/ks.vfd
 cdrom_unattended = images/rhel55-64/ks.iso
 
+- 6.0.i386:
+no setup
+image_name = rhel6-32
+cdrom_cd1 = linux/RHEL-6.0-i386-DVD.iso
+md5sum = 291d234c93442405972689b4b41c14bc
+md5sum_1m = ee2cc3d3babe91a1d581a07099c4318b
+unattended_install.cdrom:
+unattended_file = unattended/RHEL-6-series.ks
+tftp = images/rhel60-32/tftpboot
+#floppy = images/rhel60-32/ks.vfd
+cdrom_unattended = images/rhel60-32/ks.iso
+
+- 6.0.x86_64:
+no setup
+image_name = rhel6-64
+cdrom_cd1 = linux/RHEL-6.0-x86_64-DVD.iso
+md5sum = f7141396c6a19399d63e8c195354317d
+md5sum_1m = b060eeef63e2c8700db54ae02056e80c
+unattended_install.cdrom:
+unattended_file = unattended/RHEL-6-series.ks
+tftp = images/rhel60-64/tftpboot
+#floppy = images/rhel60-64/ks.vfd
+cdrom_unattended = images/rhel60-64/ks.iso
+
 
 # Windows section
 - @Windows:
@@ -1930,7 +1954,7 @@ variants:
 
 
 virtio_net|virtio_blk|e1000|balloon_check:
-only Fedora.11 Fedora.12 Fedora.13 RHEL.5 OpenSUSE.11 SLES.11 
Ubuntu-8.10-server
+only Fedora.11 Fedora.12 Fedora.13 RHEL.5 RHEL.6 OpenSUSE.11 SLES.11 
Ubuntu-8.10-server
 # only WinXP Win2003 Win2008 WinVista Win7 Fedora.11 Fedora.12 Fedora.13 
RHEL.5 OpenSUSE.11 SLES.11 Ubuntu-8.10-server
 
 kdump:
diff --git a/client/tests/kvm/unattended/RHEL-6-series.ks 
b/client/tests/kvm/unattended/RHEL-6-series.ks
new file mode 100644
index 000..3ee84f1
--- /dev/null
+++ b/client/tests/kvm/unattended/RHEL-6-series.ks
@@ -0,0 +1,37 @@
+install
+KVM_TEST_MEDIUM
+text
+reboot
+lang en_US.UTF-8
+keyboard us
+key --skip
+network --bootproto dhcp
+rootpw 123456
+firewall --enabled --ssh
+selinux --enforcing
+timezone --utc America/New_York
+firstboot --disable
+bootloader --location=mbr --append=console=tty0 console=ttyS0,115200
+zerombr
+clearpart --all --initlabel
+autopart
+reboot
+
+%packages
+...@base
+...@development-libs
+...@development-tools
+kexec-tools
+
+%post --interpreter /usr/bin/python
+import socket, os
+os.system('dhclient')
+os.system('chkconfig sshd on')
+os.system('iptables -F')
+os.system('echo 0  /selinux/enforce')
+server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+server.bind(('', 12323))
+server.listen(1)
+(client, addr) = server.accept()
+client.send(done)
+client.close()
-- 
1.7.2.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM test: Add a subtest kdump

2010-11-11 Thread Lucas Meneghel Rodrigues
From: Jason Wang jasow...@redhat.com

Add a new subtest to check whether kdump work correctly in guest. This test just
try to trigger crash on each vcpu and then verify it by checking the vmcore.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/tests/kdump.py  |   80 ++
 client/tests/kvm/tests_base.cfg.sample   |   11 
 client/tests/kvm/unattended/RHEL-5-series.ks |1 +
 3 files changed, 92 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/tests/kdump.py

diff --git a/client/tests/kvm/tests/kdump.py b/client/tests/kvm/tests/kdump.py
new file mode 100644
index 000..a5843c7
--- /dev/null
+++ b/client/tests/kvm/tests/kdump.py
@@ -0,0 +1,80 @@
+import logging, time
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils
+
+
+def run_kdump(test, params, env):
+
+KVM reboot test:
+1) Log into a guest
+2) Check and enable the kdump
+3) For each vcpu, trigger a crash and check the vmcore
+
+@param test: kvm test object
+@param params: Dictionary with the test parameters
+@param env: Dictionary with test environment.
+
+vm = kvm_test_utils.get_living_vm(env, params.get(main_vm))
+timeout = float(params.get(login_timeout, 240))
+crash_timeout = float(params.get(crash_timeout, 360))
+session = kvm_test_utils.wait_for_login(vm, 0, timeout, 0, 2)
+def_kernel_param_cmd = (grubby --update-kernel=`grubby --default-kernel`
+ --args=crashkernel=1...@64m)
+kernel_param_cmd = params.get(kernel_param_cmd, def_kernel_param_cmd)
+def_kdump_enable_cmd = chkconfig kdump on  service kdump start
+kdump_enable_cmd = params.get(kdump_enable_cmd, def_kdump_enable_cmd)
+
+def crash_test(vcpu):
+
+Trigger a crash dump through sysrq-trigger
+
+@param vcpu: vcpu which is used to trigger a crash
+
+session = kvm_test_utils.wait_for_login(vm, 0, timeout, 0, 2)
+session.get_command_status(rm -rf /var/crash/*)
+
+logging.info(Triggering crash on vcpu %d ..., vcpu)
+crash_cmd = taskset -c %d echo c  /proc/sysrq-trigger % vcpu
+session.sendline(crash_cmd)
+
+if not kvm_utils.wait_for(lambda: not session.is_responsive(), 240, 0,
+  1):
+raise error.TestFail(Could not trigger crash on vcpu %d % vcpu)
+
+logging.info(Waiting for kernel crash dump to complete)
+session = kvm_test_utils.wait_for_login(vm, 0, crash_timeout, 0, 2)
+
+logging.info(Probing vmcore file...)
+s = session.get_command_status(ls -R /var/crash | grep vmcore)
+if s != 0:
+raise error.TestFail(Could not find the generated vmcore file)
+else:
+logging.info(Found vmcore.)
+
+session.get_command_status(rm -rf /var/crash/*)
+
+try:
+logging.info(Checking the existence of crash kernel...)
+prob_cmd = grep -q 1 /sys/kernel/kexec_crash_loaded
+s = session.get_command_status(prob_cmd)
+if s != 0:
+logging.info(Crash kernel is not loaded. Trying to load it)
+# We need to setup the kernel params
+s, o = session.get_command_status_output(kernel_param_cmd)
+if s != 0:
+raise error.TestFail(Could not add crashkernel params to
+ kernel)
+session = kvm_test_utils.reboot(vm, session, timeout=timeout);
+
+logging.info(Enabling kdump service...)
+# the initrd may be rebuilt here so we need to wait a little more
+s, o = session.get_command_status_output(kdump_enable_cmd, timeout=120)
+if s != 0:
+raise error.TestFail(Could not enable kdump service: %s % o)
+
+nvcpu = int(params.get(smp, 1))
+for i in range (nvcpu):
+crash_test(i)
+
+finally:
+session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 8fab6e2..00bd3bc 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -664,6 +664,15 @@ variants:
 image_name_snapshot1 = sn1
 image_name_snapshot2 = sn2
 
+- kdump:
+type = kdump
+# time waited for the completion of crash dump
+# crash_timeout = 360
+# command to add the crashkerne...@y to kernel cmd line
+# kernel_param_cmd = grubby --update-kernel=`grubby --default-kernel` 
--args=crashkernel=1...@64m
+# command to enable kdump service
+# kdump_enable_cmd = chkconfig kdump on  service kdump start
+
 # system_powerdown, system_reset and shutdown *must* be the last ones
 # defined (in this order), since the effect of such tests can leave
 # the VM on a bad state.
@@ -1924,6 +1933,8 @@ virtio_net|virtio_blk|e1000|balloon_check:
 only 

Re: [PATCH v2 1/2] KVM: SVM: Replace svm_has() by standard Linux cpuid accessors

2010-11-11 Thread Joerg Roedel
On Tue, Nov 09, 2010 at 04:15:42PM +0200, Avi Kivity wrote:
 - if (nr == BP_VECTOR  !svm_has(SVM_FEATURE_NRIP)) {
 + if (nr == BP_VECTOR  !static_cpu_has(X86_FEATURE_NRIPS)) {

What is static_cpu_has and why you use it only here and boot_cpu_has
in all other places?

Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] KVM: SVM: Replace svm_has() by standard Linux cpuid accessors

2010-11-11 Thread Avi Kivity

On 11/11/2010 04:46 PM, Joerg Roedel wrote:

On Tue, Nov 09, 2010 at 04:15:42PM +0200, Avi Kivity wrote:
  - if (nr == BP_VECTOR  !svm_has(SVM_FEATURE_NRIP)) {
  + if (nr == BP_VECTOR  !static_cpu_has(X86_FEATURE_NRIPS)) {

What is static_cpu_has


It's like boot_cpu_has, only it works by patching instead of a dynamic test.


and why you use it only here and boot_cpu_has
in all other places?


A nano optimization, this is a more commonly used path.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3 08/14] Add get_fw_dev_path callback for pci bus.

2010-11-11 Thread Michael S. Tsirkin
On Thu, Nov 11, 2010 at 11:07:01AM +0100, Gerd Hoffmann wrote:
 On 11/10/10 18:34, Michael S. Tsirkin wrote:
 On Wed, Nov 10, 2010 at 07:14:15PM +0200, Gleb Natapov wrote:
 
 Signed-off-by: Gleb Natapovg...@redhat.com
 
 Good stuff. We should also consider using this for
 CLI and monitor. Some comments below.
 
 Oh, we already have a table to map pci classes to descriptions for
 'info pci'.

You remind me, that one must be fixed to use names from
pci_ids.h

  I'd strongly suggest to just add the fw names to that
 table instead of creating a second one ...
 
 cheers,
   Gerd

No, I mean the path.

We are currently passing domain:bus:slot.function
to point at a device and we know it's broken for
nested bridges and kind of broken for multiple domains.
It's a bug that we must fix.

Class names aren't that interesting to me,
and I'd be just as happy with packing these into 2 tables as a single
one personally, it all seems to be a matter of style.
-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] KVM: SVM: Replace svm_has() by standard Linux cpuid accessors

2010-11-11 Thread Joerg Roedel
On Thu, Nov 11, 2010 at 04:50:10PM +0200, Avi Kivity wrote:
 On 11/11/2010 04:46 PM, Joerg Roedel wrote:
 On Tue, Nov 09, 2010 at 04:15:42PM +0200, Avi Kivity wrote:
   - if (nr == BP_VECTOR  !svm_has(SVM_FEATURE_NRIP)) {
   + if (nr == BP_VECTOR  !static_cpu_has(X86_FEATURE_NRIPS)) {

 What is static_cpu_has

 It's like boot_cpu_has, only it works by patching instead of a dynamic test.

 and why you use it only here and boot_cpu_has
 in all other places?

 A nano optimization, this is a more commonly used path.

Ok, I was just curious because I couldn't find the static_cpu_has by a
quick grep. Thanks for the explanation.

Acked-by: Joerg Roedel joerg.roe...@amd.com

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] virtio-pci: Use ioeventfd for virtqueue notify

2010-11-11 Thread Michael S. Tsirkin
On Thu, Nov 11, 2010 at 01:47:21PM +, Stefan Hajnoczi wrote:
 Virtqueue notify is currently handled synchronously in userspace virtio.  This
 prevents the vcpu from executing guest code while hardware emulation code
 handles the notify.
 
 On systems that support KVM, the ioeventfd mechanism can be used to make
 virtqueue notify a lightweight exit by deferring hardware emulation to the
 iothread and allowing the VM to continue execution.  This model is similar to
 how vhost receives virtqueue notifies.
 
 The result of this change is improved performance for userspace virtio 
 devices.
 Virtio-blk throughput increases especially for multithreaded scenarios and
 virtio-net transmit throughput increases substantially.
 
 Some virtio devices are known to have guest drivers which expect a notify to 
 be
 processed synchronously and spin waiting for completion.  Only enable 
 ioeventfd
 for virtio-blk and virtio-net for now.
 
 Care must be taken not to interfere with vhost-net, which already uses
 ioeventfd host notifiers.  The following list shows the behavior implemented 
 in
 this patch and is designed to take vhost-net into account:
 
  * VIRTIO_CONFIG_S_DRIVER_OK - assign host notifiers, 
 qemu_set_fd_handler(virtio_pci_host_notifier_read)

we should also deassign when VIRTIO_CONFIG_S_DRIVER_OK is cleared
by io write or bus master bit?

  * reset - qemu_set_fd_handler(NULL), deassign host notifiers
  * virtio_pci_set_host_notifier(true) - qemu_set_fd_handler(NULL)
  * virtio_pci_set_host_notifier(false) - 
 qemu_set_fd_handler(virtio_pci_host_notifier_read)
 
 Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 ---
  hw/virtio-pci.c |  155 +++---
  hw/virtio.c |5 ++
  hw/virtio.h |1 +
  3 files changed, 129 insertions(+), 32 deletions(-)
 
 diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
 index 549118d..436fc59 100644
 --- a/hw/virtio-pci.c
 +++ b/hw/virtio-pci.c
 @@ -83,6 +83,10 @@
  /* Flags track per-device state like workarounds for quirks in older guests. 
 */
  #define VIRTIO_PCI_FLAG_BUS_MASTER_BUG  (1  0)
  
 +/* Performance improves when virtqueue kick processing is decoupled from the
 + * vcpu thread using ioeventfd for some devices. */
 +#define VIRTIO_PCI_FLAG_USE_IOEVENTFD   (1  1)
 +
  /* QEMU doesn't strictly need write barriers since everything runs in
   * lock-step.  We'll leave the calls to wmb() in though to make it obvious 
 for
   * KVM or if kqemu gets SMP support.
 @@ -179,12 +183,108 @@ static int virtio_pci_load_queue(void * opaque, int n, 
 QEMUFile *f)
  return 0;
  }
  
 +static int virtio_pci_set_host_notifier_ioeventfd(VirtIOPCIProxy *proxy, int 
 n, bool assign)
 +{
 +VirtQueue *vq = virtio_get_queue(proxy-vdev, n);
 +EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
 +int r;
 +if (assign) {
 +r = event_notifier_init(notifier, 1);
 +if (r  0) {
 +return r;
 +}
 +r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
 +   proxy-addr + VIRTIO_PCI_QUEUE_NOTIFY,
 +   n, assign);
 +if (r  0) {
 +event_notifier_cleanup(notifier);
 +}
 +} else {
 +r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
 +   proxy-addr + VIRTIO_PCI_QUEUE_NOTIFY,
 +   n, assign);
 +if (r  0) {
 +return r;
 +}
 +event_notifier_cleanup(notifier);
 +}
 +return r;
 +}
 +
 +static void virtio_pci_host_notifier_read(void *opaque)
 +{
 +VirtQueue *vq = opaque;
 +EventNotifier *n = virtio_queue_get_host_notifier(vq);
 +if (event_notifier_test_and_clear(n)) {
 +virtio_queue_notify_vq(vq);
 +}
 +}
 +
 +static void virtio_pci_set_host_notifier_fd_handler(VirtIOPCIProxy *proxy, 
 int n, bool assign)
 +{
 +VirtQueue *vq = virtio_get_queue(proxy-vdev, n);
 +EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
 +if (assign) {
 +qemu_set_fd_handler(event_notifier_get_fd(notifier),
 +virtio_pci_host_notifier_read, NULL, vq);
 +} else {
 +qemu_set_fd_handler(event_notifier_get_fd(notifier),
 +NULL, NULL, NULL);
 +}
 +}
 +
 +static int virtio_pci_set_host_notifiers(VirtIOPCIProxy *proxy, bool assign)
 +{
 +int n, r;
 +
 +for (n = 0; n  VIRTIO_PCI_QUEUE_MAX; n++) {
 +if (!virtio_queue_get_num(proxy-vdev, n)) {
 +continue;
 +}
 +
 +if (assign) {
 +r = virtio_pci_set_host_notifier_ioeventfd(proxy, n, true);
 +if (r  0) {
 +goto assign_error;
 +}
 +
 +virtio_pci_set_host_notifier_fd_handler(proxy, n, true);
 +} else {
 +virtio_pci_set_host_notifier_fd_handler(proxy, n, false);
 +  

Re: [PATCHv3 08/14] Add get_fw_dev_path callback for pci bus.

2010-11-11 Thread Gleb Natapov
On Thu, Nov 11, 2010 at 05:05:11PM +0200, Michael S. Tsirkin wrote:
 On Thu, Nov 11, 2010 at 11:07:01AM +0100, Gerd Hoffmann wrote:
  On 11/10/10 18:34, Michael S. Tsirkin wrote:
  On Wed, Nov 10, 2010 at 07:14:15PM +0200, Gleb Natapov wrote:
  
  Signed-off-by: Gleb Natapovg...@redhat.com
  
  Good stuff. We should also consider using this for
  CLI and monitor. Some comments below.
  
  Oh, we already have a table to map pci classes to descriptions for
  'info pci'.
 
 You remind me, that one must be fixed to use names from
 pci_ids.h
 
For now I used numbers like other elements of the table. If I
would do the conversion I would use convenience marcros to not have
names like PCI_CLASS_NETWORK_TOKEN_RING all over the table :)

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] KVM: assigned dev: Big endian support for MSI-X MMIO

2010-11-11 Thread Michael S. Tsirkin
On Thu, Nov 11, 2010 at 03:47:00PM +0800, Sheng Yang wrote:
 Add marco for big-endian machine.(Untested!)
 
 Signed-off-by: Sheng Yang sh...@linux.intel.com

I presume this is tested at the same level as the previous patch?
So you want to fold this into the previous patch.

Also, please build with sparse (C=1) to find some endian-ness issues.

 ---
  virt/kvm/assigned-dev.c |7 ---
  1 files changed, 4 insertions(+), 3 deletions(-)
 
 diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
 index 3010d7d..15b5f74 100644
 --- a/virt/kvm/assigned-dev.c
 +++ b/virt/kvm/assigned-dev.c
 @@ -848,9 +848,9 @@ static int msix_mmio_read(struct kvm_io_device *this, 
 gpa_t addr, int len,
   idx = (addr - adev-msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
   if ((addr % PCI_MSIX_ENTRY_SIZE) ==
   PCI_MSIX_ENTRY_VECTOR_CTRL)
 - *(unsigned long *)val =
 + *(unsigned long *)val = le32_to_cpu(

This should be cpu_to_le32. And val cast to __le32.

   test_bit(idx, adev-msix_mask_bitmap) ?
 - PCI_MSIX_ENTRY_CTRL_MASKBIT : 0;
 + PCI_MSIX_ENTRY_CTRL_MASKBIT : 0);
   else
   r = -EOPNOTSUPP;
   goto out;

In this function,
entry must be __le32 too, and fix up endian-ness where you fill it in.

 @@ -869,6 +869,7 @@ static int msix_mmio_read(struct kvm_io_device *this, 
 gpa_t addr, int len,
   adev-msix_mask_bitmap);
   memcpy(val, entry[addr % PCI_MSIX_ENTRY_SIZE / sizeof *entry], len);
  
 + *(unsigned long *)val = le32_to_cpu(*(unsigned long *)val);
  out:
   mutex_unlock(adev-kvm-lock);
   return r;
 @@ -881,7 +882,7 @@ static int msix_mmio_write(struct kvm_io_device *this, 
 gpa_t addr, int len,
   container_of(this, struct kvm_assigned_dev_kernel,
msix_mmio_dev);
   int idx, r = 0;
 - unsigned long new_val = *(unsigned long *)val;
 + unsigned long new_val = cpu_to_le32(*(unsigned long *)val);

__le32

  
   mutex_lock(adev-kvm-lock);
   if (!msix_mmio_in_range(adev, addr, len)) {
 -- 
 1.7.0.1
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] virtio-pci: Use ioeventfd for virtqueue notify

2010-11-11 Thread Christoph Hellwig
On Thu, Nov 11, 2010 at 01:47:21PM +, Stefan Hajnoczi wrote:
 Some virtio devices are known to have guest drivers which expect a notify to 
 be
 processed synchronously and spin waiting for completion.  Only enable 
 ioeventfd
 for virtio-blk and virtio-net for now.

Who guarantees that less common virtio-blk and virtio-net guest drivers
for non-Linux OSes are fine with it?  Maybe you should add a feature flag
that the guest has to ACK to enable it.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] virtio-pci: Use ioeventfd for virtqueue notify

2010-11-11 Thread Avi Kivity

On 11/11/2010 03:47 PM, Stefan Hajnoczi wrote:

Some virtio devices are known to have guest drivers which expect a notify to be
processed synchronously and spin waiting for completion.  Only enable ioeventfd
for virtio-blk and virtio-net for now.


Which drivers are these?

I only know of the virtio-gl driver, which isn't upstream.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] virtio-pci: Use ioeventfd for virtqueue notify

2010-11-11 Thread Michael S. Tsirkin
On Thu, Nov 11, 2010 at 06:59:29PM +0200, Avi Kivity wrote:
 On 11/11/2010 03:47 PM, Stefan Hajnoczi wrote:
 Some virtio devices are known to have guest drivers which expect a notify to 
 be
 processed synchronously and spin waiting for completion.  Only enable 
 ioeventfd
 for virtio-blk and virtio-net for now.
 
 Which drivers are these?
 
 I only know of the virtio-gl driver, which isn't upstream.

I think that PXE virtio drivers do polling. But I think this change does
not break these drivers. It'll probably make them a bit slower.  Right,
Gleb?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3 08/14] Add get_fw_dev_path callback for pci bus.

2010-11-11 Thread Michael S. Tsirkin
On Thu, Nov 11, 2010 at 06:07:53PM +0200, Gleb Natapov wrote:
 On Thu, Nov 11, 2010 at 05:05:11PM +0200, Michael S. Tsirkin wrote:
  On Thu, Nov 11, 2010 at 11:07:01AM +0100, Gerd Hoffmann wrote:
   On 11/10/10 18:34, Michael S. Tsirkin wrote:
   On Wed, Nov 10, 2010 at 07:14:15PM +0200, Gleb Natapov wrote:
   
   Signed-off-by: Gleb Natapovg...@redhat.com
   
   Good stuff. We should also consider using this for
   CLI and monitor. Some comments below.
   
   Oh, we already have a table to map pci classes to descriptions for
   'info pci'.
  
  You remind me, that one must be fixed to use names from
  pci_ids.h
  
 For now I used numbers like other elements of the table.

Fair enough.

 If I
 would do the conversion I would use convenience marcros to not have
 names like PCI_CLASS_NETWORK_TOKEN_RING all over the table :)
 
 --
   Gleb.

I'd prefer to keep it simple :)

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] PCI: Add mask bit definition for MSI-X table

2010-11-11 Thread Jesse Barnes
On Thu, 11 Nov 2010 15:46:55 +0800
Sheng Yang sh...@linux.intel.com wrote:

 Then we can use it instead of magic number 1.
 
 Reviewed-by: Hidetoshi Seto seto.hideto...@jp.fujitsu.com
 Cc: Matthew Wilcox wi...@linux.intel.com
 Cc: Jesse Barnes jbar...@virtuousgeek.org
 Cc: linux-...@vger.kernel.org
 Signed-off-by: Sheng Yang sh...@linux.intel.com
 ---
  drivers/pci/msi.c|5 +++--
  include/linux/pci_regs.h |1 +
  2 files changed, 4 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
 index 69b7be3..095634e 100644
 --- a/drivers/pci/msi.c
 +++ b/drivers/pci/msi.c
 @@ -158,8 +158,9 @@ static u32 __msix_mask_irq(struct msi_desc *desc, u32 
 flag)
   u32 mask_bits = desc-masked;
   unsigned offset = desc-msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
   PCI_MSIX_ENTRY_VECTOR_CTRL;
 - mask_bits = ~1;
 - mask_bits |= flag;
 + mask_bits = ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
 + if (flag)
 + mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
   writel(mask_bits, desc-mask_base + offset);
  
   return mask_bits;
 diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
 index acfc224..ff51632 100644
 --- a/include/linux/pci_regs.h
 +++ b/include/linux/pci_regs.h
 @@ -313,6 +313,7 @@
  #define  PCI_MSIX_ENTRY_UPPER_ADDR   4
  #define  PCI_MSIX_ENTRY_DATA 8
  #define  PCI_MSIX_ENTRY_VECTOR_CTRL  12
 +#define   PCI_MSIX_ENTRY_CTRL_MASKBIT1
  
  /* CompactPCI Hotswap Register */
  

Applied 1/7 and 2/7 to my linux-next tree, thanks.

If it's easier to push them both through the kvm tree let me know; you
can just add my acked-by in that case.

-- 
Jesse Barnes, Intel Open Source Technology Center
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] virtio-pci: Use ioeventfd for virtqueue notify

2010-11-11 Thread Gleb Natapov
On Thu, Nov 11, 2010 at 07:12:40PM +0200, Michael S. Tsirkin wrote:
 On Thu, Nov 11, 2010 at 06:59:29PM +0200, Avi Kivity wrote:
  On 11/11/2010 03:47 PM, Stefan Hajnoczi wrote:
  Some virtio devices are known to have guest drivers which expect a notify 
  to be
  processed synchronously and spin waiting for completion.  Only enable 
  ioeventfd
  for virtio-blk and virtio-net for now.
  
  Which drivers are these?
  
  I only know of the virtio-gl driver, which isn't upstream.
 
 I think that PXE virtio drivers do polling. But I think this change does
 not break these drivers. It'll probably make them a bit slower.  Right,
 Gleb?
 
Can't tell about PXE virtio drivers, but virtio-blk in seabios does
polling.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Always false condition in rmap_add

2010-11-11 Thread Prasad Joshi
Hello All,

I have question on code of rmap_add

Here is the code of the function

 613 static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 614 {

624 rmapp = gfn_to_rmap(vcpu-kvm, gfn, sp-role.level);
 625 if (!*rmapp) {
 626 rmap_printk(rmap_add: %p %llx 0-1\n, spte, *spte);
 627 *rmapp = (unsigned long)spte;
 628 } else if (!(*rmapp  1)) {
 629 rmap_printk(rmap_add: %p %llx 1-many\n, spte, *spte);
 630 desc = mmu_alloc_rmap_desc(vcpu);
 631 desc-sptes[0] = (u64 *)*rmapp;
 632 desc-sptes[1] = spte;
 633 *rmapp = (unsigned long)desc | 1;
 634 ++count;
 635 } else {

The line 628 checks whether the last bit of the rmapp is 1.
If it is one then line 633 assigns a new value to rmapp with and sets
the last bit to 1.

But the line 633 is the only place that sets rmapp's last bit is set
to 1. IMHO the condition on line 628 would never be true.

Please let me know if I am wrong.

Thanks and Regards,
Prasad
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/3] virtio: Use ioeventfd for virtqueue notify

2010-11-11 Thread Michael S. Tsirkin
On Thu, Nov 11, 2010 at 01:47:19PM +, Stefan Hajnoczi wrote:
 This is a rewrite of the virtio-ioeventfd patchset to work at the virtio-pci.c
 level instead of virtio.c.  This results in better integration with the
 host/guest notifier code and makes the code simpler (no more state machine).

Yep, much cleaner. Sent some comments separately.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3 00/14] boot order specification

2010-11-11 Thread Blue Swirl
On Thu, Nov 11, 2010 at 10:21 AM, Gerd Hoffmann kra...@redhat.com wrote:
 On 11/10/10 18:14, Gleb Natapov wrote:

 This is current sate of the patch series for people to comment on.
 I am using open firmware naming scheme to specify device path names.

 Names look like this on pci machine:
 /p...@i0cf8/i...@1,1/dr...@1/d...@0
 /p...@i0cf8/i...@1/f...@03f1/flo...@1
 /p...@i0cf8/i...@1/f...@03f1/flo...@0
 /p...@i0cf8/i...@1,1/dr...@1/d...@1
 /p...@i0cf8/i...@1,1/dr...@0/d...@0
 /p...@i0cf8/s...@3/d...@0
 /p...@i0cf8/ether...@4/ethernet-...@0
 /p...@i0cf8/ether...@5/ethernet-...@0
 /p...@i0cf8/i...@1,1/dr...@0/d...@1
 /p...@i0cf8/i...@1/i...@01e8/dr...@0/d...@0
 /p...@i0cf8/u...@1,2/netw...@0/ether...@0
 /p...@i0cf8/u...@1,2/h...@1/netw...@0/ether...@0

 Good stuff overall, but see replies to patches for some nits.

 IIRC some powerpc (+sparc?) boards pass a device tree to the guest.  So with
 this (and maybe some more bits) we might be able to dynamically generate a
 device tree from our qdev tree?  Any comments from the ppc/sparc folks on
 this?

Sparc does not use the device tree in real life, except a binary
machine description is used by Sun4v hypervisor. But using a device
tree could simplify how OpenBIOS (for both Sparc and PPC) supports
different machine types.

Device tree is already used by for example Bamboo (PPC) and Petalogix
(Microblaze) boards.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Problem booting Microsoft Windows KVM virtual machine

2010-11-11 Thread RaSca

Il giorno Gio 04 Nov 2010 14:04:59 CET, Martin Maurer ha scritto:

Hi,
Before you begin, prepare the running w2k to use ide disk (and boot on KVM with 
ide disks)
For w2k I followed this link (solution 2 worked for me): 
http://www.motherboard.windowsreinstall.com/problems.htm
And I am using clonezilla live cd´s to do cloning jobs, for me this is a fast 
and convenient way.
Also, take a look on our 
http://pve.proxmox.com/wiki/Migration_of_servers_to_Proxmox_VE, including some 
hints for several sceanrios.
Br, Martin


Hi Martin,
I can finally confirm that the solution 2 of the link you gave me worked 
for me too!
I followed those indications before creating a vmdk with vmware 
converter, then I converted the vmdk in a raw image with qemu-img 
convert and finally doing a dd in the drbd device done the trick!


Still remains the fact that doing P2V with Win2k is some kind of wild 
horse (for example if you use the vmvga video driver it freezes), but 
you give me a big hand. So thank you very much!


--
Raoul Scarazzini
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
ra...@miamammausalinux.org
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Always false condition in rmap_add

2010-11-11 Thread Joerg Roedel
On Thu, Nov 11, 2010 at 06:38:47PM +, Prasad Joshi wrote:
 Hello All,
 
 I have question on code of rmap_add
 
 Here is the code of the function
 
  613 static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
  614 {
 
 624 rmapp = gfn_to_rmap(vcpu-kvm, gfn, sp-role.level);
  625 if (!*rmapp) {
  626 rmap_printk(rmap_add: %p %llx 0-1\n, spte, *spte);
  627 *rmapp = (unsigned long)spte;
  628 } else if (!(*rmapp  1)) {
  629 rmap_printk(rmap_add: %p %llx 1-many\n, spte, *spte);
  630 desc = mmu_alloc_rmap_desc(vcpu);
  631 desc-sptes[0] = (u64 *)*rmapp;
  632 desc-sptes[1] = spte;
  633 *rmapp = (unsigned long)desc | 1;
  634 ++count;
  635 } else {
 
 The line 628 checks whether the last bit of the rmapp is 1.
 If it is one then line 633 assigns a new value to rmapp with and sets
 the last bit to 1.

No, line 628 checks if the last bit is not 1, so the check works.

Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ceph/rbd block driver for qemu-kvm (v7)

2010-11-11 Thread Stefan Hajnoczi
On Fri, Oct 15, 2010 at 8:54 PM, Christian Brunner c...@muc.de wrote:
 Hi,

 once again, Yehuda committed fixes for all the suggestions made on the
 list (and more). Here is the next update for the ceph/rbd block driver.

 Please let us know if there are any pending issues.

 For those who didn't follow the previous postings:

 This is an block driver for the distributed file system Ceph
 (http://ceph.newdream.net/). This driver uses librados (which
 is part of the Ceph server) for direct access to the Ceph object
 store and is running entirely in userspace (Yehuda also
 wrote a driver for the linux kernel, that can be used to access
 rbd volumes as a block device).

 Kind Regards,
 Christian

 Signed-off-by: Christian Brunner c...@muc.de
 Signed-off-by: Yehuda Sadeh yeh...@hq.newdream.net
 ---
  Makefile.objs |1 +
  block/rbd.c   | 1059 
 +
  block/rbd_types.h |   71 
  configure |   31 ++
  4 files changed, 1162 insertions(+), 0 deletions(-)
  create mode 100644 block/rbd.c
  create mode 100644 block/rbd_types.h

This patch is close now.  Just some minor issues below.


 diff --git a/Makefile.objs b/Makefile.objs
 index 6ee077c..56a13c1 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -19,6 +19,7 @@ block-nested-y += parallels.o nbd.o blkdebug.o
  block-nested-$(CONFIG_WIN32) += raw-win32.o
  block-nested-$(CONFIG_POSIX) += raw-posix.o
  block-nested-$(CONFIG_CURL) += curl.o
 +block-nested-$(CONFIG_RBD) += rbd.o

  block-obj-y +=  $(addprefix block/, $(block-nested-y))

 diff --git a/block/rbd.c b/block/rbd.c
 new file mode 100644
 index 000..fbfb93e
 --- /dev/null
 +++ b/block/rbd.c
 @@ -0,0 +1,1059 @@
 +/*
 + * QEMU Block driver for RADOS (Ceph)
 + *
 + * Copyright (C) 2010 Christian Brunner c...@muc.de
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2.  See
 + * the COPYING file in the top-level directory.
 + *
 + */
 +
 +#include qemu-common.h
 +#include qemu-error.h
 +
 +#include rbd_types.h
 +#include block_int.h
 +
 +#include rados/librados.h
 +
 +
 +
 +/*
 + * When specifying the image filename use:
 + *
 + * rbd:poolname/devicename
 + *
 + * poolname must be the name of an existing rados pool
 + *
 + * devicename is the basename for all objects used to
 + * emulate the raw device.
 + *
 + * Metadata information (image size, ...) is stored in an
 + * object with the name devicename.rbd.
 + *
 + * The raw device is split into 4MB sized objects by default.
 + * The sequencenumber is encoded in a 12 byte long hex-string,
 + * and is attached to the devicename, separated by a dot.
 + * e.g. devicename.1234567890ab
 + *
 + */
 +
 +#define OBJ_MAX_SIZE (1UL  OBJ_DEFAULT_OBJ_ORDER)
 +
 +typedef struct RBDAIOCB {
 +BlockDriverAIOCB common;
 +QEMUBH *bh;
 +int ret;
 +QEMUIOVector *qiov;
 +char *bounce;
 +int write;
 +int64_t sector_num;
 +int aiocnt;
 +int error;
 +struct BDRVRBDState *s;
 +int cancelled;
 +} RBDAIOCB;
 +
 +typedef struct RADOSCB {
 +int rcbid;
 +RBDAIOCB *acb;
 +struct BDRVRBDState *s;
 +int done;
 +int64_t segsize;
 +char *buf;
 +int ret;
 +} RADOSCB;
 +
 +#define RBD_FD_READ 0
 +#define RBD_FD_WRITE 1
 +
 +typedef struct BDRVRBDState {
 +int fds[2];
 +rados_pool_t pool;
 +rados_pool_t header_pool;
 +char name[RBD_MAX_OBJ_NAME_SIZE];
 +char block_name[RBD_MAX_BLOCK_NAME_SIZE];
 +uint64_t size;
 +uint64_t objsize;
 +int qemu_aio_count;
 +int read_only;
 +int event_reader_pos;
 +RADOSCB *event_rcb;
 +} BDRVRBDState;
 +
 +typedef struct rbd_obj_header_ondisk RbdHeader1;
 +
 +static int rbd_next_tok(char *dst, int dst_len,
 +char *src, char delim,
 +const char *name,
 +char **p)
 +{
 +int l;
 +char *end;
 +
 +*p = NULL;
 +
 +if (delim != '\0') {
 +end = strchr(src, delim);
 +if (end) {
 +*p = end + 1;
 +*end = '\0';
 +}
 +}
 +l = strlen(src);
 +if (l = dst_len) {
 +error_report(%s too long, name);
 +return -EINVAL;
 +} else if (l == 0) {
 +error_report(%s too short, name);
 +return -EINVAL;
 +}
 +
 +pstrcpy(dst, dst_len, src);
 +
 +return 0;
 +}
 +
 +static int rbd_parsename(const char *filename,
 + char *pool, int pool_len,
 + char *snap, int snap_len,
 + char *name, int name_len)
 +{
 +const char *start;
 +char *p, *buf;
 +int ret;
 +
 +if (!strstart(filename, rbd:, start)) {
 +return -EINVAL;
 +}
 +
 +buf = qemu_strdup(start);
 +p = buf;
 +
 +ret = rbd_next_tok(pool, pool_len, p, '/', pool name, p);
 +if (ret  0 || !p) {
 +ret = -EINVAL;
 +goto done;
 +}
 +ret = rbd_next_tok(name, name_len, p, '@', object name, p);
 +if (ret  0) {
 + 

Re: [PATCH -v2] Monitor command: pfa2hva, translate guest physical address to host virtual address

2010-11-11 Thread Huang Ying
On Thu, 2010-11-11 at 17:39 +0800, Avi Kivity wrote:
 On 11/11/2010 02:56 AM, Huang Ying wrote:
  On Thu, 2010-11-11 at 00:49 +0800, Anthony Liguori wrote:
On 11/10/2010 02:34 AM, Avi Kivity wrote:
  Why the gpa -   hva mapping is not
  consistent for RAM if -mempath is not used?

  Video RAM in the range a-b and PCI mapped RAM can change gpas
  (while remaining in the same hva).

  Even for ordinary RAM, which doesn't normally change gpa/hva, I'm not
  sure we want to guarantee that it won't.
  
We can't universally either.  Memory hot remove potentially breaks the
mapping and some non-x86 architectures (like ARM) can alias RAM via a
guest triggered mechanism.
 
  Thanks for clarification. Now I think we have two options.
 
  1) Clearly mark gpa2hva (pfa2hva now, should renamed to gpa2hva) as a
  testing only interface, and should be used only on restricted
  environment (normal memory, without hot-remove, maybe only on x86).
 
  2) Find some way to lock the gpa -  hva mapping during operating. Such
  as gpa2hva_begin and gpa2hva_end and lock the mapping in between.
 
  I think 2) may be possible. But if there are no other users, why do that
  for a test case? So I think 1) may be the better option.
 
 
 3) Move the poisoning code into qemu, so the command becomes
 
 posison-address addr
 
 (though physical addresses aren't stable either)

Poisoning includes:

a) gva - gpa
b) gpa - hva
c) hva - hpa
d) inject the MCE into host via some external tool

poison-address need to do b, c, d. Do you think it is good to do all
these inside qemu?

 4) Use -mempath on /dev/shm and poison a page in the backing file

If we can poison a page in the backing file, how do we know the
corresponding gpa and hpa?

Best Regards,
Huang Ying


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] KVM: assigned dev: Big endian support for MSI-X MMIO

2010-11-11 Thread Sheng Yang
On Friday 12 November 2010 00:12:21 Michael S. Tsirkin wrote:
 On Thu, Nov 11, 2010 at 03:47:00PM +0800, Sheng Yang wrote:
  Add marco for big-endian machine.(Untested!)
  
  Signed-off-by: Sheng Yang sh...@linux.intel.com
 
 I presume this is tested at the same level as the previous patch?
 So you want to fold this into the previous patch.

Of course tested on Intel's platform, but it didn't make sense for big-endian 
patch...

I'd like to make it separate still, because I can't test it properly.
 
 Also, please build with sparse (C=1) to find some endian-ness issues.

Sure. (Have it run with the patches, nothing new found...)

 
  ---
  
   virt/kvm/assigned-dev.c |7 ---
   1 files changed, 4 insertions(+), 3 deletions(-)
  
  diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
  index 3010d7d..15b5f74 100644
  --- a/virt/kvm/assigned-dev.c
  +++ b/virt/kvm/assigned-dev.c
  @@ -848,9 +848,9 @@ static int msix_mmio_read(struct kvm_io_device *this,
  gpa_t addr, int len,
  
  idx = (addr - adev-msix_mmio_base) / PCI_MSIX_ENTRY_SIZE;
  if ((addr % PCI_MSIX_ENTRY_SIZE) ==
  
  PCI_MSIX_ENTRY_VECTOR_CTRL)
  
  -   *(unsigned long *)val =
  +   *(unsigned long *)val = le32_to_cpu(
 
 This should be cpu_to_le32. And val cast to __le32.

I think we can just cast val to __le32 here? The result should be big-endian.

No, no, I suppose all parameter of msix_mmio_read/write should be little-endian 
because it's being written into PCI MSI-X table?
 
  test_bit(idx, adev-msix_mask_bitmap) ?
  
  -   PCI_MSIX_ENTRY_CTRL_MASKBIT : 0;
  +   PCI_MSIX_ENTRY_CTRL_MASKBIT : 0);
  
  else
  
  r = -EOPNOTSUPP;
  
  goto out;
 
 In this function,
 entry must be __le32 too, and fix up endian-ness where you fill it in.
 
  @@ -869,6 +869,7 @@ static int msix_mmio_read(struct kvm_io_device *this,
  gpa_t addr, int len,
  
  adev-msix_mask_bitmap);
  
  memcpy(val, entry[addr % PCI_MSIX_ENTRY_SIZE / sizeof *entry], len);
  
  +   *(unsigned long *)val = le32_to_cpu(*(unsigned long *)val);
  
   out:
  mutex_unlock(adev-kvm-lock);
  return r;
  
  @@ -881,7 +882,7 @@ static int msix_mmio_write(struct kvm_io_device
  *this, gpa_t addr, int len,
  
  container_of(this, struct kvm_assigned_dev_kernel,
  
   msix_mmio_dev);
  
  int idx, r = 0;
  
  -   unsigned long new_val = *(unsigned long *)val;
  +   unsigned long new_val = cpu_to_le32(*(unsigned long *)val);
 
 __le32

Should it be 

unsigned long new_val = le32_to_cpu(*(unsigned long *)val);

*val should be little endian I guess?

Or Michael, could you provide a patch for this? I really don't think I can 
guarantee correctness of this patch...

--
regards
Yang, Sheng

 
  mutex_lock(adev-kvm-lock);
  if (!msix_mmio_in_range(adev, addr, len)) {
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/8] PCI capability and device assignment improvements

2010-11-11 Thread Alex Williamson
This series attempts to clean up capability support between common
code and device assignment.  In doing so, we can move existing MSI 
MSI-X capabilities to offsets matching real hardware, and further
enable more capabilities to be exposed.

The last patch is only for RFC, I'd like some input on what we should
pass directly and where we should only provide read-only/emulated
access.  Patches 1-7 are submitted for commit.  Thanks,

Alex

---

Alex Williamson (8):
  device-assignment: pass through and stub more PCI caps
  pci: Pass ID for capability read/write handlers
  device-assignment: Move PCI capabilities to match physical hardware
  pci: Remove cap.length, cap.start, cap.supported
  pci: Replace used bitmap with capability byte map
  device-assignment: Use PCI capabilities support
  pci: Remove pci_enable_capability_support()
  pci: pci_default_cap_write_config ignores wmask


 hw/device-assignment.c |  273 
 hw/pci.c   |  103 +++---
 hw/pci.h   |   25 ++--
 3 files changed, 256 insertions(+), 145 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/8] pci: pci_default_cap_write_config ignores wmask

2010-11-11 Thread Alex Williamson
Make use of wmask, just like the rest of config space.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/pci.c |   19 ---
 1 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 92aaa85..12c47ac 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1175,13 +1175,14 @@ uint32_t pci_default_read_config(PCIDevice *d,
 return pci_read_config(d, address, len);
 }
 
-static void pci_write_config(PCIDevice *pci_dev,
- uint32_t address, uint32_t val, int len)
+static void pci_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 {
 int i;
-for (i = 0; i  len; i++) {
-pci_dev-config[address + i] = val  0xff;
-val = 8;
+uint32_t config_size = pci_config_size(d);
+
+for (i = 0; i  l  addr + i  config_size; val = 8, ++i) {
+uint8_t wmask = d-wmask[addr + i];
+d-config[addr + i] = (d-config[addr + i]  ~wmask) | (val  wmask);
 }
 }
 
@@ -1207,18 +1208,14 @@ void pci_default_cap_write_config(PCIDevice *pci_dev,
 
 void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 {
-int i, was_irq_disabled = pci_irq_disabled(d);
-uint32_t config_size = pci_config_size(d);
+int was_irq_disabled = pci_irq_disabled(d);
 
 if (pci_access_cap_config(d, addr, l)) {
 d-cap.config_write(d, addr, val, l);
 return;
 }
 
-for (i = 0; i  l  addr + i  config_size; val = 8, ++i) {
-uint8_t wmask = d-wmask[addr + i];
-d-config[addr + i] = (d-config[addr + i]  ~wmask) | (val  wmask);
-}
+pci_write_config(d, addr, val, l);
 
 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
 if (kvm_enabled()  kvm_irqchip_in_kernel() 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/8] pci: Remove pci_enable_capability_support()

2010-11-11 Thread Alex Williamson
This interface doesn't make much sense, adding a capability can
take care of everything, just provide a means to register
capability read/write handlers.

Device assignment does it's own thing, so requires a couple
ugly hacks that will be cleaned by subsequent patches.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/device-assignment.c |   12 ---
 hw/pci.c   |   52 +---
 hw/pci.h   |9 +++-
 3 files changed, 35 insertions(+), 38 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index bde231d..311c197 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1292,7 +1292,12 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 PCIRegion *pci_region = dev-real_device.regions;
 int next_cap_pt = 0;
 
+pci_dev-cap.supported = 1;
+pci_dev-cap.start = PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR;
 pci_dev-cap.length = 0;
+pci_dev-config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+pci_dev-config[PCI_CAPABILITY_LIST] = pci_dev-cap.start;
+
 #ifdef KVM_CAP_IRQ_ROUTING
 #ifdef KVM_CAP_DEVICE_MSI
 /* Expose MSI capability
@@ -1468,9 +1473,10 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 dev-h_busnr = dev-host.bus;
 dev-h_devfn = PCI_DEVFN(dev-host.dev, dev-host.func);
 
-if (pci_enable_capability_support(pci_dev, 0, NULL,
-assigned_device_pci_cap_write_config,
-assigned_device_pci_cap_init)  0)
+pci_register_capability_handlers(pci_dev, NULL,
+ assigned_device_pci_cap_write_config);
+
+if (assigned_device_pci_cap_init(pci_dev)  0)
 goto out;
 
 /* assign device to guest */
diff --git a/hw/pci.c b/hw/pci.c
index 12c47ac..6b2b320 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -787,6 +787,8 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
 config_write = pci_default_write_config;
 pci_dev-config_read = config_read;
 pci_dev-config_write = config_write;
+pci_dev-cap.config_read = pci_default_cap_read_config;
+pci_dev-cap.config_write = pci_default_cap_write_config;
 bus-devices[devfn] = pci_dev;
 pci_dev-irq = qemu_allocate_irqs(pci_set_irq, pci_dev, PCI_NUM_PINS);
 pci_dev-version_id = 2; /* Current pci device vmstate version */
@@ -1893,35 +1895,21 @@ PCIDevice *pci_create_simple_multifunction(PCIBus *bus, 
int devfn,
 return dev;
 }
 
-int pci_enable_capability_support(PCIDevice *pci_dev,
-  uint32_t config_start,
-  PCICapConfigReadFunc *config_read,
-  PCICapConfigWriteFunc *config_write,
-  PCICapConfigInitFunc *config_init)
+void pci_register_capability_handlers(PCIDevice *pdev,
+  PCICapConfigReadFunc *config_read,
+  PCICapConfigWriteFunc *config_write)
 {
-if (!pci_dev)
-return -ENODEV;
-
-pci_dev-config[0x06] |= 0x10; // status = capabilities
-
-if (config_start == 0)
-   pci_dev-cap.start = PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR;
-else if (config_start = 0x40  config_start  0xff)
-pci_dev-cap.start = config_start;
-else
-return -EINVAL;
+if (config_read) {
+pdev-cap.config_read = config_read;
+} else {
+pdev-cap.config_read = pci_default_cap_read_config;
+}
 
-if (config_read)
-pci_dev-cap.config_read = config_read;
-else
-pci_dev-cap.config_read = pci_default_cap_read_config;
-if (config_write)
-pci_dev-cap.config_write = config_write;
-else
-pci_dev-cap.config_write = pci_default_cap_write_config;
-pci_dev-cap.supported = 1;
-pci_dev-config[PCI_CAPABILITY_LIST] = pci_dev-cap.start;
-return config_init(pci_dev);
+if (config_write) {
+pdev-cap.config_write = config_write;
+} else {
+pdev-cap.config_write = pci_default_cap_write_config;
+}
 }
 
 PCIDevice *pci_create(PCIBus *bus, int devfn, const char *name)
@@ -2045,12 +2033,16 @@ int pci_add_capability_at_offset(PCIDevice *pdev, 
uint8_t cap_id,
 config[PCI_CAP_LIST_ID] = cap_id;
 config[PCI_CAP_LIST_NEXT] = pdev-config[PCI_CAPABILITY_LIST];
 pdev-config[PCI_CAPABILITY_LIST] = offset;
-pdev-config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
 memset(pdev-used + offset, 0xFF, size);
 /* Make capability read-only by default */
 memset(pdev-wmask + offset, 0, size);
 /* Check capability by default */
 memset(pdev-cmask + offset, 0xFF, size);
+
+pdev-config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+pdev-cap.supported = 1;
+pdev-cap.start = pdev-cap.start ? MIN(pdev-cap.start, offset) : offset;
+
 return offset;
 }
 
@@ -2078,8 +2070,10 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, 
uint8_t size)
 

[PATCH 3/8] device-assignment: Use PCI capabilities support

2010-11-11 Thread Alex Williamson
Convert to use common pci_add_capabilities() rather than creating
our own mess.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/device-assignment.c |  112 +++-
 1 files changed, 63 insertions(+), 49 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 311c197..74cdd26 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -38,6 +38,7 @@
 #include device-assignment.h
 #include loader.h
 #include monitor.h
+#include range.h
 #include pci/header.h
 
 /* From linux/ioport.h */
@@ -1075,17 +1076,17 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, 
unsigned int ctrl_pos)
 }
 
 if (ctrl_byte  PCI_MSI_FLAGS_ENABLE) {
+int pos = ctrl_pos - PCI_MSI_FLAGS;
 assigned_dev-entry = calloc(1, sizeof(struct kvm_irq_routing_entry));
 if (!assigned_dev-entry) {
 perror(assigned_dev_update_msi: );
 return;
 }
 assigned_dev-entry-u.msi.address_lo =
-*(uint32_t *)(pci_dev-config + pci_dev-cap.start +
-  PCI_MSI_ADDRESS_LO);
+pci_get_long(pci_dev-config + pos + PCI_MSI_ADDRESS_LO);
 assigned_dev-entry-u.msi.address_hi = 0;
-assigned_dev-entry-u.msi.data = *(uint16_t *)(pci_dev-config +
-pci_dev-cap.start + PCI_MSI_DATA_32);
+assigned_dev-entry-u.msi.data =
+pci_get_word(pci_dev-config + pos + PCI_MSI_DATA_32);
 assigned_dev-entry-type = KVM_IRQ_ROUTING_MSI;
 r = kvm_get_irq_route_gsi();
 if (r  0) {
@@ -1123,10 +1124,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 struct kvm_assigned_msix_entry msix_entry;
 void *va = adev-msix_table_page;
 
-if (adev-cap.available  ASSIGNED_DEVICE_CAP_MSI)
-pos = pci_dev-cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
-else
-pos = pci_dev-cap.start;
+pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
 
 entries_max_nr = *(uint16_t *)(pci_dev-config + pos + 2);
 entries_max_nr = PCI_MSIX_TABSIZE;
@@ -1260,26 +1258,23 @@ static void 
assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
  uint32_t val, int len)
 {
 AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
-unsigned int pos = pci_dev-cap.start, ctrl_pos;
 
 pci_default_cap_write_config(pci_dev, address, val, len);
 #ifdef KVM_CAP_IRQ_ROUTING
 #ifdef KVM_CAP_DEVICE_MSI
 if (assigned_dev-cap.available  ASSIGNED_DEVICE_CAP_MSI) {
-ctrl_pos = pos + PCI_MSI_FLAGS;
-if (address = ctrl_pos  address + len  ctrl_pos)
-assigned_dev_update_msi(pci_dev, ctrl_pos);
-pos += PCI_CAPABILITY_CONFIG_MSI_LENGTH;
+int pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSI);
+if (ranges_overlap(address, len, pos + PCI_MSI_FLAGS, 1)) {
+assigned_dev_update_msi(pci_dev, pos + PCI_MSI_FLAGS);
+}
 }
 #endif
 #ifdef KVM_CAP_DEVICE_MSIX
 if (assigned_dev-cap.available  ASSIGNED_DEVICE_CAP_MSIX) {
-ctrl_pos = pos + 3;
-if (address = ctrl_pos  address + len  ctrl_pos) {
-ctrl_pos--; /* control is word long */
-assigned_dev_update_msix(pci_dev, ctrl_pos);
+int pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
+if (ranges_overlap(address, len, pos + PCI_MSIX_FLAGS + 1, 1)) {
+assigned_dev_update_msix(pci_dev, pos + PCI_MSIX_FLAGS);
}
-pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
 }
 #endif
 #endif
@@ -1290,58 +1285,77 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 {
 AssignedDevice *dev = container_of(pci_dev, AssignedDevice, dev);
 PCIRegion *pci_region = dev-real_device.regions;
-int next_cap_pt = 0;
 
-pci_dev-cap.supported = 1;
-pci_dev-cap.start = PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR;
+/* Clear initial capabilities pointer and status copied from hw */
+pci_set_byte(pci_dev-config + PCI_CAPABILITY_LIST, 0);
+pci_set_word(pci_dev-config + PCI_STATUS,
+ pci_get_word(pci_dev-config + PCI_STATUS) 
+ ~PCI_STATUS_CAP_LIST);
+
 pci_dev-cap.length = 0;
-pci_dev-config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
-pci_dev-config[PCI_CAPABILITY_LIST] = pci_dev-cap.start;
 
 #ifdef KVM_CAP_IRQ_ROUTING
 #ifdef KVM_CAP_DEVICE_MSI
 /* Expose MSI capability
  * MSI capability is the 1st capability in capability config */
 if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI)) {
+int vpos, ppos;
+uint16_t flags;
+
 dev-cap.available |= ASSIGNED_DEVICE_CAP_MSI;
-memset(pci_dev-config[pci_dev-cap.start + pci_dev-cap.length],
-   0, PCI_CAPABILITY_CONFIG_MSI_LENGTH);
-pci_dev-config[pci_dev-cap.start + pci_dev-cap.length] =
-PCI_CAP_ID_MSI;
+vpos = 

[PATCH 4/8] pci: Replace used bitmap with capability byte map

2010-11-11 Thread Alex Williamson
Capabilities are allocated in bytes, so we can track both whether
a byte is used and by what capability in the same structure.

Remove pci_reserve_capability() as there are no users.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/pci.c |   16 +---
 hw/pci.h |6 ++
 2 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 6b2b320..e1e8a77 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -730,7 +730,7 @@ static void pci_config_alloc(PCIDevice *pci_dev)
 pci_dev-config = qemu_mallocz(config_size);
 pci_dev-cmask = qemu_mallocz(config_size);
 pci_dev-wmask = qemu_mallocz(config_size);
-pci_dev-used = qemu_mallocz(config_size);
+pci_dev-cap_map = qemu_mallocz(config_size);
 }
 
 static void pci_config_free(PCIDevice *pci_dev)
@@ -738,7 +738,7 @@ static void pci_config_free(PCIDevice *pci_dev)
 qemu_free(pci_dev-config);
 qemu_free(pci_dev-cmask);
 qemu_free(pci_dev-wmask);
-qemu_free(pci_dev-used);
+qemu_free(pci_dev-cap_map);
 }
 
 /* -1 for devfn means auto assign */
@@ -1928,7 +1928,7 @@ static int pci_find_space(PCIDevice *pdev, uint8_t size)
 int offset = PCI_CONFIG_HEADER_SIZE;
 int i;
 for (i = PCI_CONFIG_HEADER_SIZE; i  config_size; ++i)
-if (pdev-used[i])
+if (pdev-cap_map[i])
 offset = i + 1;
 else if (i - offset + 1 == size)
 return offset;
@@ -2033,7 +2033,7 @@ int pci_add_capability_at_offset(PCIDevice *pdev, uint8_t 
cap_id,
 config[PCI_CAP_LIST_ID] = cap_id;
 config[PCI_CAP_LIST_NEXT] = pdev-config[PCI_CAPABILITY_LIST];
 pdev-config[PCI_CAPABILITY_LIST] = offset;
-memset(pdev-used + offset, 0xFF, size);
+memset(pdev-cap_map + offset, cap_id, size);
 /* Make capability read-only by default */
 memset(pdev-wmask + offset, 0, size);
 /* Check capability by default */
@@ -2068,7 +2068,7 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, 
uint8_t size)
 memset(pdev-wmask + offset, 0xff, size);
 /* Clear cmask as device-specific registers can't be checked */
 memset(pdev-cmask + offset, 0, size);
-memset(pdev-used + offset, 0, size);
+memset(pdev-cap_map + offset, 0, size);
 
 if (!pdev-config[PCI_CAPABILITY_LIST]) {
 pdev-config[PCI_STATUS] = ~PCI_STATUS_CAP_LIST;
@@ -2076,12 +2076,6 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, 
uint8_t size)
 }
 }
 
-/* Reserve space for capability at a known offset (to call after load). */
-void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size)
-{
-memset(pdev-used + offset, 0xff, size);
-}
-
 uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id)
 {
 return pci_find_capability_list(pdev, cap_id, NULL);
diff --git a/hw/pci.h b/hw/pci.h
index 0712e55..2265c70 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -151,8 +151,8 @@ struct PCIDevice {
 /* Used to implement R/W bytes */
 uint8_t *wmask;
 
-/* Used to allocate config space for capabilities. */
-uint8_t *used;
+/* Used to allocate config space and track capabilities. */
+uint8_t *cap_map;
 
 /* the following fields are read only */
 PCIBus *bus;
@@ -239,8 +239,6 @@ int pci_add_capability_at_offset(PCIDevice *pci_dev, 
uint8_t cap_id,
 
 void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
 
-void pci_reserve_capability(PCIDevice *pci_dev, uint8_t offset, uint8_t size);
-
 uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id);
 
 uint32_t pci_default_read_config(PCIDevice *d,

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/8] pci: Remove cap.length, cap.start, cap.supported

2010-11-11 Thread Alex Williamson
Capabilities aren't required to be contiguous, so cap.length never
really made much sense.  Likewise, cap.start is mostly meaningless
too.  Both of these are better served by the capability map.  We
can also get rid of cap.supported, since it's really now unused
and redundant with flag in the status word anyway.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/device-assignment.c |4 
 hw/pci.c   |8 +---
 hw/pci.h   |2 --
 3 files changed, 1 insertions(+), 13 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 74cdd26..322fa9f 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1292,8 +1292,6 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
  pci_get_word(pci_dev-config + PCI_STATUS) 
  ~PCI_STATUS_CAP_LIST);
 
-pci_dev-cap.length = 0;
-
 #ifdef KVM_CAP_IRQ_ROUTING
 #ifdef KVM_CAP_DEVICE_MSI
 /* Expose MSI capability
@@ -1320,7 +1318,6 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
  PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
 pci_set_long(pci_dev-wmask + vpos + PCI_MSI_ADDRESS_LO, 0xfffc);
 pci_set_long(pci_dev-wmask + vpos + PCI_MSI_DATA_32, 0x);
-pci_dev-cap.length += PCI_CAPABILITY_CONFIG_MSI_LENGTH;
 }
 #endif
 #ifdef KVM_CAP_DEVICE_MSIX
@@ -1356,7 +1353,6 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 bar_nr = msix_table_entry  PCI_MSIX_BIR;
 msix_table_entry = ~PCI_MSIX_BIR;
 dev-msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
-pci_dev-cap.length += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
 }
 #endif
 #endif
diff --git a/hw/pci.c b/hw/pci.c
index e1e8a77..d566e33 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1190,10 +1190,7 @@ static void pci_write_config(PCIDevice *d, uint32_t 
addr, uint32_t val, int l)
 
 int pci_access_cap_config(PCIDevice *pci_dev, uint32_t address, int len)
 {
-if (pci_dev-cap.supported  address = pci_dev-cap.start 
-(address + len)  pci_dev-cap.start + pci_dev-cap.length)
-return 1;
-return 0;
+return pci_dev-cap_map[address];
 }
 
 uint32_t pci_default_cap_read_config(PCIDevice *pci_dev,
@@ -2040,8 +2037,6 @@ int pci_add_capability_at_offset(PCIDevice *pdev, uint8_t 
cap_id,
 memset(pdev-cmask + offset, 0xFF, size);
 
 pdev-config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
-pdev-cap.supported = 1;
-pdev-cap.start = pdev-cap.start ? MIN(pdev-cap.start, offset) : offset;
 
 return offset;
 }
@@ -2072,7 +2067,6 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, 
uint8_t size)
 
 if (!pdev-config[PCI_CAPABILITY_LIST]) {
 pdev-config[PCI_STATUS] = ~PCI_STATUS_CAP_LIST;
-pdev-cap.start = pdev-cap.length = 0;
 }
 }
 
diff --git a/hw/pci.h b/hw/pci.h
index 2265c70..177008a 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -208,8 +208,6 @@ struct PCIDevice {
 
 /* Device capability configuration space */
 struct {
-int supported;
-unsigned int start, length;
 PCICapConfigReadFunc *config_read;
 PCICapConfigWriteFunc *config_write;
 } cap;

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/8] device-assignment: Move PCI capabilities to match physical hardware

2010-11-11 Thread Alex Williamson
Now that common PCI code doesn't have a hangup on capabilities
being contiguous, move assigned device capabilities to match
their offset on physical hardware.  This helps for drivers that
assume a capability configuration and don't bother searching.

We can also remove several calls to assigned_dev_pci_read_* because
we're overlaying the capability at the same location as the initial
copy we made of config space.  We can therefore just use pci_get_*.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/device-assignment.c |   67 +++-
 1 files changed, 21 insertions(+), 46 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 322fa9f..39f19be 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -366,16 +366,6 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, 
int pos)
 return (uint8_t)assigned_dev_pci_read(d, pos, 1);
 }
 
-static uint16_t assigned_dev_pci_read_word(PCIDevice *d, int pos)
-{
-return (uint16_t)assigned_dev_pci_read(d, pos, 2);
-}
-
-static uint32_t assigned_dev_pci_read_long(PCIDevice *d, int pos)
-{
-return assigned_dev_pci_read(d, pos, 4);
-}
-
 static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
 {
 int id;
@@ -1285,6 +1275,7 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 {
 AssignedDevice *dev = container_of(pci_dev, AssignedDevice, dev);
 PCIRegion *pci_region = dev-real_device.regions;
+int pos;
 
 /* Clear initial capabilities pointer and status copied from hw */
 pci_set_byte(pci_dev-config + PCI_CAPABILITY_LIST, 0);
@@ -1296,60 +1287,44 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 #ifdef KVM_CAP_DEVICE_MSI
 /* Expose MSI capability
  * MSI capability is the 1st capability in capability config */
-if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI)) {
-int vpos, ppos;
-uint16_t flags;
-
+if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI))) {
 dev-cap.available |= ASSIGNED_DEVICE_CAP_MSI;
-vpos = pci_add_capability(pci_dev, PCI_CAP_ID_MSI,
-  PCI_CAPABILITY_CONFIG_MSI_LENGTH);
-
-memset(pci_dev-config + vpos + PCI_CAP_FLAGS, 0,
-   PCI_CAPABILITY_CONFIG_MSI_LENGTH - PCI_CAP_FLAGS);
+pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_MSI, pos,
+ PCI_CAPABILITY_CONFIG_MSI_LENGTH);
 
 /* Only 32-bit/no-mask currently supported */
-ppos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI);
-flags = assigned_dev_pci_read_word(pci_dev, ppos + PCI_MSI_FLAGS);
-flags = PCI_MSI_FLAGS_QMASK;
-pci_set_word(pci_dev-config + vpos + PCI_MSI_FLAGS, flags);
+pci_set_word(pci_dev-config + pos + PCI_MSI_FLAGS,
+ pci_get_word(pci_dev-config + pos + PCI_MSI_FLAGS) 
+ PCI_MSI_FLAGS_QMASK);
+pci_set_long(pci_dev-config + pos + PCI_MSI_ADDRESS_LO, 0);
+pci_set_word(pci_dev-config + pos + PCI_MSI_DATA_32, 0);
 
 /* Set writable fields */
-pci_set_word(pci_dev-wmask + vpos + PCI_MSI_FLAGS,
+pci_set_word(pci_dev-wmask + pos + PCI_MSI_FLAGS,
  PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
-pci_set_long(pci_dev-wmask + vpos + PCI_MSI_ADDRESS_LO, 0xfffc);
-pci_set_long(pci_dev-wmask + vpos + PCI_MSI_DATA_32, 0x);
+pci_set_long(pci_dev-wmask + pos + PCI_MSI_ADDRESS_LO, 0xfffc);
+pci_set_word(pci_dev-wmask + pos + PCI_MSI_DATA_32, 0x);
 }
 #endif
 #ifdef KVM_CAP_DEVICE_MSIX
 /* Expose MSI-X capability */
-if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX)) {
-int vpos, ppos, entry_nr, bar_nr;
+if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX))) {
+int bar_nr;
 uint32_t msix_table_entry;
 
 dev-cap.available |= ASSIGNED_DEVICE_CAP_MSIX;
-vpos = pci_add_capability(pci_dev, PCI_CAP_ID_MSIX,
-   PCI_CAPABILITY_CONFIG_MSIX_LENGTH);
+pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_MSIX, pos,
+ PCI_CAPABILITY_CONFIG_MSIX_LENGTH);
 
-memset(pci_dev-config + vpos + PCI_CAP_FLAGS, 0,
-   PCI_CAPABILITY_CONFIG_MSIX_LENGTH - PCI_CAP_FLAGS);
+pci_set_word(pci_dev-config + pos + PCI_MSIX_FLAGS,
+ pci_get_word(pci_dev-config + pos + PCI_MSIX_FLAGS) 
+ PCI_MSIX_TABSIZE);
 
 /* Only enable and function mask bits are writable */
-pci_set_word(pci_dev-wmask + vpos + PCI_MSIX_FLAGS,
+pci_set_word(pci_dev-wmask + pos + PCI_MSIX_FLAGS,
  PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
 
-ppos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX);
-
-entry_nr = assigned_dev_pci_read_word(pci_dev, ppos + PCI_MSIX_FLAGS);
-entry_nr = PCI_MSIX_TABSIZE;
-

[PATCH 7/8] pci: Pass ID for capability read/write handlers

2010-11-11 Thread Alex Williamson
Any handlers that actually want to interact with specific capabilities
are going to want to know the capability ID being accessed.  With the
capability map, this is readily available, so we can save handlers the
trouble of figuring it out.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/device-assignment.c |   36 ++--
 hw/pci.c   |   14 --
 hw/pci.h   |8 
 3 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 39f19be..179c7dc 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1244,30 +1244,38 @@ static void assigned_dev_update_msix(PCIDevice 
*pci_dev, unsigned int ctrl_pos)
 #endif
 #endif
 
-static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t 
address,
+static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
+ uint8_t cap_id,
+ uint32_t address,
  uint32_t val, int len)
 {
-AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
+pci_default_cap_write_config(pci_dev, cap_id, address, val, len);
 
-pci_default_cap_write_config(pci_dev, address, val, len);
+switch (cap_id) {
 #ifdef KVM_CAP_IRQ_ROUTING
+case PCI_CAP_ID_MSI:
 #ifdef KVM_CAP_DEVICE_MSI
-if (assigned_dev-cap.available  ASSIGNED_DEVICE_CAP_MSI) {
-int pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSI);
-if (ranges_overlap(address, len, pos + PCI_MSI_FLAGS, 1)) {
-assigned_dev_update_msi(pci_dev, pos + PCI_MSI_FLAGS);
+{
+uint8_t cap = pci_find_capability(pci_dev, cap_id);
+if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
+assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
+}
 }
-}
 #endif
+break;
+
+case PCI_CAP_ID_MSIX:
 #ifdef KVM_CAP_DEVICE_MSIX
-if (assigned_dev-cap.available  ASSIGNED_DEVICE_CAP_MSIX) {
-int pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
-if (ranges_overlap(address, len, pos + PCI_MSIX_FLAGS + 1, 1)) {
-assigned_dev_update_msix(pci_dev, pos + PCI_MSIX_FLAGS);
-   }
-}
+{
+uint8_t cap = pci_find_capability(pci_dev, cap_id);
+if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
+assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
+}
+}
 #endif
+break;
 #endif
+}
 return;
 }
 
diff --git a/hw/pci.c b/hw/pci.c
index d566e33..e1c0917 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1168,10 +1168,11 @@ static uint32_t pci_read_config(PCIDevice *d,
 uint32_t pci_default_read_config(PCIDevice *d,
  uint32_t address, int len)
 {
+uint8_t cap_id;
 assert(len == 1 || len == 2 || len == 4);
 
-if (pci_access_cap_config(d, address, len)) {
-return d-cap.config_read(d, address, len);
+if ((cap_id = pci_access_cap_config(d, address, len))) {
+return d-cap.config_read(d, cap_id, address, len);
 }
 
 return pci_read_config(d, address, len);
@@ -1193,13 +1194,13 @@ int pci_access_cap_config(PCIDevice *pci_dev, uint32_t 
address, int len)
 return pci_dev-cap_map[address];
 }
 
-uint32_t pci_default_cap_read_config(PCIDevice *pci_dev,
+uint32_t pci_default_cap_read_config(PCIDevice *pci_dev, uint8_t cap_id,
  uint32_t address, int len)
 {
 return pci_read_config(pci_dev, address, len);
 }
 
-void pci_default_cap_write_config(PCIDevice *pci_dev,
+void pci_default_cap_write_config(PCIDevice *pci_dev, uint8_t cap_id,
   uint32_t address, uint32_t val, int len)
 {
 pci_write_config(pci_dev, address, val, len);
@@ -1208,9 +1209,10 @@ void pci_default_cap_write_config(PCIDevice *pci_dev,
 void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 {
 int was_irq_disabled = pci_irq_disabled(d);
+uint8_t cap_id;
 
-if (pci_access_cap_config(d, addr, l)) {
-d-cap.config_write(d, addr, val, l);
+if ((cap_id = pci_access_cap_config(d, addr, l))) {
+d-cap.config_write(d, cap_id, addr, val, l);
 return;
 }
 
diff --git a/hw/pci.h b/hw/pci.h
index 177008a..3f0b4e0 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -83,9 +83,9 @@ typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int 
region_num,
 pcibus_t addr, pcibus_t size, int type);
 typedef int PCIUnregisterFunc(PCIDevice *pci_dev);
 
-typedef void PCICapConfigWriteFunc(PCIDevice *pci_dev,
+typedef void PCICapConfigWriteFunc(PCIDevice *pci_dev, uint8_t cap_id,
uint32_t address, uint32_t val, int len);
-typedef uint32_t PCICapConfigReadFunc(PCIDevice *pci_dev,
+typedef 

[RFC PATCH 8/8] device-assignment: pass through and stub more PCI caps

2010-11-11 Thread Alex Williamson
Some drivers depend on finding capabilities like power management,
PCI express/X, vital product data, or vendor specific fields.  Now
that we have better capability support, we can pass more of these
tables through to the guest.  Note that VPD and VNDR are direct pass
through capabilies, the rest are mostly empty shells with a few
writable bits where necessary.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/device-assignment.c |  160 +---
 1 files changed, 149 insertions(+), 11 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 179c7dc..1b228ad 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -366,6 +366,27 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, 
int pos)
 return (uint8_t)assigned_dev_pci_read(d, pos, 1);
 }
 
+static void assigned_dev_pci_write(PCIDevice *d, int pos, uint32_t val, int 
len)
+{
+AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
+ssize_t ret;
+int fd = pci_dev-real_device.config_fd;
+
+again:
+ret = pwrite(fd, val, len, pos);
+if (ret != len) {
+   if ((ret  0)  (errno == EINTR || errno == EAGAIN))
+   goto again;
+
+   fprintf(stderr, %s: pwrite failed, ret = %zd errno = %d\n,
+   __func__, ret, errno);
+
+   exit(1);
+}
+
+return;
+}
+
 static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
 {
 int id;
@@ -1244,37 +1265,75 @@ static void assigned_dev_update_msix(PCIDevice 
*pci_dev, unsigned int ctrl_pos)
 #endif
 #endif
 
+static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
+uint8_t cap_id,
+uint32_t address, int len)
+{
+uint8_t cap;
+
+switch (cap_id) {
+
+case PCI_CAP_ID_VPD:
+cap = pci_find_capability(pci_dev, cap_id);
+if (address - cap = PCI_CAP_FLAGS) {
+return assigned_dev_pci_read(pci_dev, address, len);
+}
+break;
+
+case PCI_CAP_ID_VNDR:
+cap = pci_find_capability(pci_dev, cap_id);
+if (address - cap  PCI_CAP_FLAGS) {
+return assigned_dev_pci_read(pci_dev, address, len);
+}
+break;
+}
+
+return pci_default_cap_read_config(pci_dev, cap_id, address, len);
+}
+
 static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
  uint8_t cap_id,
  uint32_t address,
  uint32_t val, int len)
 {
+uint8_t cap;
+
 pci_default_cap_write_config(pci_dev, cap_id, address, val, len);
 
 switch (cap_id) {
 #ifdef KVM_CAP_IRQ_ROUTING
 case PCI_CAP_ID_MSI:
 #ifdef KVM_CAP_DEVICE_MSI
-{
-uint8_t cap = pci_find_capability(pci_dev, cap_id);
-if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
-assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
-}
+cap = pci_find_capability(pci_dev, cap_id);
+if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
+assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
 }
 #endif
 break;
 
 case PCI_CAP_ID_MSIX:
 #ifdef KVM_CAP_DEVICE_MSIX
-{
-uint8_t cap = pci_find_capability(pci_dev, cap_id);
-if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
-assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
-}
+cap = pci_find_capability(pci_dev, cap_id);
+if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
+assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
 }
 #endif
 break;
 #endif
+
+case PCI_CAP_ID_VPD:
+cap = pci_find_capability(pci_dev, cap_id);
+if (address - cap = PCI_CAP_FLAGS) {
+assigned_dev_pci_write(pci_dev, address, val, len);
+}
+break;
+
+case PCI_CAP_ID_VNDR:
+cap = pci_find_capability(pci_dev, cap_id);
+if (address - cap  PCI_CAP_FLAGS) {
+assigned_dev_pci_write(pci_dev, address, val, len);
+}
+break;
 }
 return;
 }
@@ -1340,6 +1399,84 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 #endif
 #endif
 
+/* Minimal PM support, make the state bits writable so the guest
+ * thinks it's doing something. */
+if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PM))) {
+uint16_t pmc, pmcsr;
+pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_PM, pos,
+ PCI_PM_SIZEOF);
+
+pmc = pci_get_word(pci_dev-config + pos + PCI_CAP_FLAGS);
+pmc = (PCI_PM_CAP_VER_MASK | PCI_PM_CAP_DSI);
+pci_set_word(pci_dev-config + pos + PCI_CAP_FLAGS, pmc);
+
+pmcsr = pci_get_word(pci_dev-config + pos + 

Re: [PATCH 1/8] pci: pci_default_cap_write_config ignores wmask

2010-11-11 Thread Michael S. Tsirkin
On Thu, Nov 11, 2010 at 07:55:01PM -0700, Alex Williamson wrote:
 Make use of wmask, just like the rest of config space.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
  hw/pci.c |   19 ---
  1 files changed, 8 insertions(+), 11 deletions(-)
 
 diff --git a/hw/pci.c b/hw/pci.c
 index 92aaa85..12c47ac 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
 @@ -1175,13 +1175,14 @@ uint32_t pci_default_read_config(PCIDevice *d,
  return pci_read_config(d, address, len);
  }
  
 -static void pci_write_config(PCIDevice *pci_dev,
 - uint32_t address, uint32_t val, int len)
 +static void pci_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int 
 l)
  {
  int i;
 -for (i = 0; i  len; i++) {
 -pci_dev-config[address + i] = val  0xff;
 -val = 8;
 +uint32_t config_size = pci_config_size(d);
 +
 +for (i = 0; i  l  addr + i  config_size; val = 8, ++i) {
 +uint8_t wmask = d-wmask[addr + i];
 +d-config[addr + i] = (d-config[addr + i]  ~wmask) | (val  wmask);
  }
  }



Let's not name an internal static helper pci_write_config.
This is really update_config_by_mask or something like that.
But see below: maybe we don't need it at all?

 @@ -1207,18 +1208,14 @@ void pci_default_cap_write_config(PCIDevice *pci_dev,
  
  void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int 
 l)
  {
 -int i, was_irq_disabled = pci_irq_disabled(d);
 -uint32_t config_size = pci_config_size(d);
 +int was_irq_disabled = pci_irq_disabled(d);
  
  if (pci_access_cap_config(d, addr, l)) {
  d-cap.config_write(d, addr, val, l);
  return;
  }
  

I would like to also examine the need for _cap_
functions. Why can assigned devices just do

pci_default_write_config 
if (range_overlap(...msi)) {
}
if (range_overlap(...msix)) {
}
and then we could remove all the _cap_ extensions
altogether?

 -for (i = 0; i  l  addr + i  config_size; val = 8, ++i) {
 -uint8_t wmask = d-wmask[addr + i];
 -d-config[addr + i] = (d-config[addr + i]  ~wmask) | (val  wmask);
 -}
 +pci_write_config(d, addr, val, l);
  
  #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
  if (kvm_enabled()  kvm_irqchip_in_kernel() 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 8/8] device-assignment: pass through and stub more PCI caps

2010-11-11 Thread Michael S. Tsirkin
On Thu, Nov 11, 2010 at 07:56:46PM -0700, Alex Williamson wrote:
 Some drivers depend on finding capabilities like power management,
 PCI express/X, vital product data, or vendor specific fields.  Now
 that we have better capability support, we can pass more of these
 tables through to the guest.  Note that VPD and VNDR are direct pass
 through capabilies, the rest are mostly empty shells with a few
 writable bits where necessary.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
  hw/device-assignment.c |  160 
 +---
  1 files changed, 149 insertions(+), 11 deletions(-)
 
 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index 179c7dc..1b228ad 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -366,6 +366,27 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, 
 int pos)
  return (uint8_t)assigned_dev_pci_read(d, pos, 1);
  }
  
 +static void assigned_dev_pci_write(PCIDevice *d, int pos, uint32_t val, int 
 len)
 +{
 +AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
 +ssize_t ret;
 +int fd = pci_dev-real_device.config_fd;
 +
 +again:
 +ret = pwrite(fd, val, len, pos);
 +if (ret != len) {
 + if ((ret  0)  (errno == EINTR || errno == EAGAIN))
 + goto again;


do {} while() ?

 +
 + fprintf(stderr, %s: pwrite failed, ret = %zd errno = %d\n,
 + __func__, ret, errno);
 +
 + exit(1);
 +}
 +
 +return;
 +}
 +
  static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
  {
  int id;
 @@ -1244,37 +1265,75 @@ static void assigned_dev_update_msix(PCIDevice 
 *pci_dev, unsigned int ctrl_pos)
  #endif
  #endif
  
 +static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
 +uint8_t cap_id,
 +uint32_t address, int 
 len)
 +{
 +uint8_t cap;
 +
 +switch (cap_id) {
 +
 +case PCI_CAP_ID_VPD:
 +cap = pci_find_capability(pci_dev, cap_id);
 +if (address - cap = PCI_CAP_FLAGS) {
 +return assigned_dev_pci_read(pci_dev, address, len);
 +}
 +break;
 +
 +case PCI_CAP_ID_VNDR:
 +cap = pci_find_capability(pci_dev, cap_id);
 +if (address - cap  PCI_CAP_FLAGS) {
 +return assigned_dev_pci_read(pci_dev, address, len);
 +}
 +break;
 +}
 +
 +return pci_default_cap_read_config(pci_dev, cap_id, address, len);
 +}
 +
  static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
   uint8_t cap_id,
   uint32_t address,
   uint32_t val, int len)
  {
 +uint8_t cap;
 +
  pci_default_cap_write_config(pci_dev, cap_id, address, val, len);
  
  switch (cap_id) {
  #ifdef KVM_CAP_IRQ_ROUTING
  case PCI_CAP_ID_MSI:
  #ifdef KVM_CAP_DEVICE_MSI
 -{
 -uint8_t cap = pci_find_capability(pci_dev, cap_id);
 -if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
 -assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
 -}
 +cap = pci_find_capability(pci_dev, cap_id);
 +if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
 +assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
  }
  #endif
  break;
  
  case PCI_CAP_ID_MSIX:
  #ifdef KVM_CAP_DEVICE_MSIX
 -{
 -uint8_t cap = pci_find_capability(pci_dev, cap_id);
 -if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
 -assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
 -}
 +cap = pci_find_capability(pci_dev, cap_id);
 +if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
 +assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
  }
  #endif
  break;
  #endif
 +
 +case PCI_CAP_ID_VPD:
 +cap = pci_find_capability(pci_dev, cap_id);
 +if (address - cap = PCI_CAP_FLAGS) {
 +assigned_dev_pci_write(pci_dev, address, val, len);
 +}
 +break;
 +
 +case PCI_CAP_ID_VNDR:
 +cap = pci_find_capability(pci_dev, cap_id);
 +if (address - cap  PCI_CAP_FLAGS) {
 +assigned_dev_pci_write(pci_dev, address, val, len);
 +}
 +break;

I have a feeling we should use overlap functions instead of
address math. What do you think?
Also - put cap offsets in assigned device structure to avoid
find calls?

  }
  return;
  }
 @@ -1340,6 +1399,84 @@ static int assigned_device_pci_cap_init(PCIDevice 
 *pci_dev)
  #endif
  #endif
  
 +/* Minimal PM support, make the state bits writable so the guest
 + * thinks it's doing something. */
 +if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PM))) {
 +

Re: [PATCH 0/8] PCI capability and device assignment improvements

2010-11-11 Thread Michael S. Tsirkin
On Thu, Nov 11, 2010 at 07:54:49PM -0700, Alex Williamson wrote:
 This series attempts to clean up capability support between common
 code and device assignment.  In doing so, we can move existing MSI 
 MSI-X capabilities to offsets matching real hardware, and further
 enable more capabilities to be exposed.

Very good cleanup overall. Some further suggestions posted.
Thanks!


 The last patch is only for RFC, I'd like some input on what we should
 pass directly and where we should only provide read-only/emulated
 access.  Patches 1-7 are submitted for commit.  Thanks,
 
 Alex
 
 ---
 
 Alex Williamson (8):
   device-assignment: pass through and stub more PCI caps
   pci: Pass ID for capability read/write handlers
   device-assignment: Move PCI capabilities to match physical hardware
   pci: Remove cap.length, cap.start, cap.supported
   pci: Replace used bitmap with capability byte map
   device-assignment: Use PCI capabilities support
   pci: Remove pci_enable_capability_support()
   pci: pci_default_cap_write_config ignores wmask
 
 
  hw/device-assignment.c |  273 
 
  hw/pci.c   |  103 +++---
  hw/pci.h   |   25 ++--
  3 files changed, 256 insertions(+), 145 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/8] pci: Replace used bitmap with capability byte map

2010-11-11 Thread Michael S. Tsirkin
On Thu, Nov 11, 2010 at 07:55:43PM -0700, Alex Williamson wrote:
 Capabilities are allocated in bytes, so we can track both whether
 a byte is used and by what capability in the same structure.
 
 Remove pci_reserve_capability() as there are no users.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com

I actually wanted to remove the used array completely and ask
all users to add offsets directly.
Will this be needed then?

 ---
 
  hw/pci.c |   16 +---
  hw/pci.h |6 ++
  2 files changed, 7 insertions(+), 15 deletions(-)
 
 diff --git a/hw/pci.c b/hw/pci.c
 index 6b2b320..e1e8a77 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
 @@ -730,7 +730,7 @@ static void pci_config_alloc(PCIDevice *pci_dev)
  pci_dev-config = qemu_mallocz(config_size);
  pci_dev-cmask = qemu_mallocz(config_size);
  pci_dev-wmask = qemu_mallocz(config_size);
 -pci_dev-used = qemu_mallocz(config_size);
 +pci_dev-cap_map = qemu_mallocz(config_size);
  }
  
  static void pci_config_free(PCIDevice *pci_dev)
 @@ -738,7 +738,7 @@ static void pci_config_free(PCIDevice *pci_dev)
  qemu_free(pci_dev-config);
  qemu_free(pci_dev-cmask);
  qemu_free(pci_dev-wmask);
 -qemu_free(pci_dev-used);
 +qemu_free(pci_dev-cap_map);
  }
  
  /* -1 for devfn means auto assign */
 @@ -1928,7 +1928,7 @@ static int pci_find_space(PCIDevice *pdev, uint8_t size)
  int offset = PCI_CONFIG_HEADER_SIZE;
  int i;
  for (i = PCI_CONFIG_HEADER_SIZE; i  config_size; ++i)
 -if (pdev-used[i])
 +if (pdev-cap_map[i])
  offset = i + 1;
  else if (i - offset + 1 == size)
  return offset;
 @@ -2033,7 +2033,7 @@ int pci_add_capability_at_offset(PCIDevice *pdev, 
 uint8_t cap_id,
  config[PCI_CAP_LIST_ID] = cap_id;
  config[PCI_CAP_LIST_NEXT] = pdev-config[PCI_CAPABILITY_LIST];
  pdev-config[PCI_CAPABILITY_LIST] = offset;
 -memset(pdev-used + offset, 0xFF, size);
 +memset(pdev-cap_map + offset, cap_id, size);
  /* Make capability read-only by default */
  memset(pdev-wmask + offset, 0, size);
  /* Check capability by default */
 @@ -2068,7 +2068,7 @@ void pci_del_capability(PCIDevice *pdev, uint8_t 
 cap_id, uint8_t size)
  memset(pdev-wmask + offset, 0xff, size);
  /* Clear cmask as device-specific registers can't be checked */
  memset(pdev-cmask + offset, 0, size);
 -memset(pdev-used + offset, 0, size);
 +memset(pdev-cap_map + offset, 0, size);
  
  if (!pdev-config[PCI_CAPABILITY_LIST]) {
  pdev-config[PCI_STATUS] = ~PCI_STATUS_CAP_LIST;
 @@ -2076,12 +2076,6 @@ void pci_del_capability(PCIDevice *pdev, uint8_t 
 cap_id, uint8_t size)
  }
  }
  
 -/* Reserve space for capability at a known offset (to call after load). */
 -void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size)
 -{
 -memset(pdev-used + offset, 0xff, size);
 -}
 -
  uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id)
  {
  return pci_find_capability_list(pdev, cap_id, NULL);
 diff --git a/hw/pci.h b/hw/pci.h
 index 0712e55..2265c70 100644
 --- a/hw/pci.h
 +++ b/hw/pci.h
 @@ -151,8 +151,8 @@ struct PCIDevice {
  /* Used to implement R/W bytes */
  uint8_t *wmask;
  
 -/* Used to allocate config space for capabilities. */
 -uint8_t *used;
 +/* Used to allocate config space and track capabilities. */
 +uint8_t *cap_map;
  
  /* the following fields are read only */
  PCIBus *bus;
 @@ -239,8 +239,6 @@ int pci_add_capability_at_offset(PCIDevice *pci_dev, 
 uint8_t cap_id,
  
  void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t 
 cap_size);
  
 -void pci_reserve_capability(PCIDevice *pci_dev, uint8_t offset, uint8_t 
 size);
 -
  uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id);
  
  uint32_t pci_default_read_config(PCIDevice *d,
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/8] pci: pci_default_cap_write_config ignores wmask

2010-11-11 Thread Alex Williamson
On Fri, 2010-11-12 at 07:22 +0200, Michael S. Tsirkin wrote:
 On Thu, Nov 11, 2010 at 07:55:01PM -0700, Alex Williamson wrote:
  Make use of wmask, just like the rest of config space.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  ---
  
   hw/pci.c |   19 ---
   1 files changed, 8 insertions(+), 11 deletions(-)
  
  diff --git a/hw/pci.c b/hw/pci.c
  index 92aaa85..12c47ac 100644
  --- a/hw/pci.c
  +++ b/hw/pci.c
  @@ -1175,13 +1175,14 @@ uint32_t pci_default_read_config(PCIDevice *d,
   return pci_read_config(d, address, len);
   }
   
  -static void pci_write_config(PCIDevice *pci_dev,
  - uint32_t address, uint32_t val, int len)
  +static void pci_write_config(PCIDevice *d, uint32_t addr, uint32_t val, 
  int l)
   {
   int i;
  -for (i = 0; i  len; i++) {
  -pci_dev-config[address + i] = val  0xff;
  -val = 8;
  +uint32_t config_size = pci_config_size(d);
  +
  +for (i = 0; i  l  addr + i  config_size; val = 8, ++i) {
  +uint8_t wmask = d-wmask[addr + i];
  +d-config[addr + i] = (d-config[addr + i]  ~wmask) | (val  
  wmask);
   }
   }
 
 
 
 Let's not name an internal static helper pci_write_config.
 This is really update_config_by_mask or something like that.
 But see below: maybe we don't need it at all?

The function already exists, I just made it do what it seems like it
should have been doing already.

  @@ -1207,18 +1208,14 @@ void pci_default_cap_write_config(PCIDevice 
  *pci_dev,
   
   void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, 
  int l)
   {
  -int i, was_irq_disabled = pci_irq_disabled(d);
  -uint32_t config_size = pci_config_size(d);
  +int was_irq_disabled = pci_irq_disabled(d);
   
   if (pci_access_cap_config(d, addr, l)) {
   d-cap.config_write(d, addr, val, l);
   return;
   }
   
 
 I would like to also examine the need for _cap_
 functions. Why can assigned devices just do
 
   pci_default_write_config 
   if (range_overlap(...msi)) {
   }
   if (range_overlap(...msix)) {
   }
 and then we could remove all the _cap_ extensions
 altogether?

I think that somewhere we need to track what capabilities are at what
offset, config space isn't a performance path, but that look horribly
inefficient and gets worse with more capabilities.

Why don't we define capability id 0xff as normal config space (first 64
bytes), then add the capability id to read/write_config (this is what
vfio does).  Then the driver can split capability handling off from
their main functions if they want.  Anyway, I think such an improvement
could be added incrementally later.  Thanks,

Alex

  -for (i = 0; i  l  addr + i  config_size; val = 8, ++i) {
  -uint8_t wmask = d-wmask[addr + i];
  -d-config[addr + i] = (d-config[addr + i]  ~wmask) | (val  
  wmask);
  -}
  +pci_write_config(d, addr, val, l);
   
   #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
   if (kvm_enabled()  kvm_irqchip_in_kernel() 



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/8] pci: Replace used bitmap with capability byte map

2010-11-11 Thread Alex Williamson
On Fri, 2010-11-12 at 07:40 +0200, Michael S. Tsirkin wrote:
 On Thu, Nov 11, 2010 at 07:55:43PM -0700, Alex Williamson wrote:
  Capabilities are allocated in bytes, so we can track both whether
  a byte is used and by what capability in the same structure.
  
  Remove pci_reserve_capability() as there are no users.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
 
 I actually wanted to remove the used array completely and ask
 all users to add offsets directly.
 Will this be needed then?

Can you give an example, I don't understand what you mean by asking
users to add offsets directly.  I think some kind of tracking what's
where in config space needs to be done somewhere and the common PCI code
seems like it'd be the place.  Thanks,

Alex

  ---
  
   hw/pci.c |   16 +---
   hw/pci.h |6 ++
   2 files changed, 7 insertions(+), 15 deletions(-)
  
  diff --git a/hw/pci.c b/hw/pci.c
  index 6b2b320..e1e8a77 100644
  --- a/hw/pci.c
  +++ b/hw/pci.c
  @@ -730,7 +730,7 @@ static void pci_config_alloc(PCIDevice *pci_dev)
   pci_dev-config = qemu_mallocz(config_size);
   pci_dev-cmask = qemu_mallocz(config_size);
   pci_dev-wmask = qemu_mallocz(config_size);
  -pci_dev-used = qemu_mallocz(config_size);
  +pci_dev-cap_map = qemu_mallocz(config_size);
   }
   
   static void pci_config_free(PCIDevice *pci_dev)
  @@ -738,7 +738,7 @@ static void pci_config_free(PCIDevice *pci_dev)
   qemu_free(pci_dev-config);
   qemu_free(pci_dev-cmask);
   qemu_free(pci_dev-wmask);
  -qemu_free(pci_dev-used);
  +qemu_free(pci_dev-cap_map);
   }
   
   /* -1 for devfn means auto assign */
  @@ -1928,7 +1928,7 @@ static int pci_find_space(PCIDevice *pdev, uint8_t 
  size)
   int offset = PCI_CONFIG_HEADER_SIZE;
   int i;
   for (i = PCI_CONFIG_HEADER_SIZE; i  config_size; ++i)
  -if (pdev-used[i])
  +if (pdev-cap_map[i])
   offset = i + 1;
   else if (i - offset + 1 == size)
   return offset;
  @@ -2033,7 +2033,7 @@ int pci_add_capability_at_offset(PCIDevice *pdev, 
  uint8_t cap_id,
   config[PCI_CAP_LIST_ID] = cap_id;
   config[PCI_CAP_LIST_NEXT] = pdev-config[PCI_CAPABILITY_LIST];
   pdev-config[PCI_CAPABILITY_LIST] = offset;
  -memset(pdev-used + offset, 0xFF, size);
  +memset(pdev-cap_map + offset, cap_id, size);
   /* Make capability read-only by default */
   memset(pdev-wmask + offset, 0, size);
   /* Check capability by default */
  @@ -2068,7 +2068,7 @@ void pci_del_capability(PCIDevice *pdev, uint8_t 
  cap_id, uint8_t size)
   memset(pdev-wmask + offset, 0xff, size);
   /* Clear cmask as device-specific registers can't be checked */
   memset(pdev-cmask + offset, 0, size);
  -memset(pdev-used + offset, 0, size);
  +memset(pdev-cap_map + offset, 0, size);
   
   if (!pdev-config[PCI_CAPABILITY_LIST]) {
   pdev-config[PCI_STATUS] = ~PCI_STATUS_CAP_LIST;
  @@ -2076,12 +2076,6 @@ void pci_del_capability(PCIDevice *pdev, uint8_t 
  cap_id, uint8_t size)
   }
   }
   
  -/* Reserve space for capability at a known offset (to call after load). */
  -void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size)
  -{
  -memset(pdev-used + offset, 0xff, size);
  -}
  -
   uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id)
   {
   return pci_find_capability_list(pdev, cap_id, NULL);
  diff --git a/hw/pci.h b/hw/pci.h
  index 0712e55..2265c70 100644
  --- a/hw/pci.h
  +++ b/hw/pci.h
  @@ -151,8 +151,8 @@ struct PCIDevice {
   /* Used to implement R/W bytes */
   uint8_t *wmask;
   
  -/* Used to allocate config space for capabilities. */
  -uint8_t *used;
  +/* Used to allocate config space and track capabilities. */
  +uint8_t *cap_map;
   
   /* the following fields are read only */
   PCIBus *bus;
  @@ -239,8 +239,6 @@ int pci_add_capability_at_offset(PCIDevice *pci_dev, 
  uint8_t cap_id,
   
   void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t 
  cap_size);
   
  -void pci_reserve_capability(PCIDevice *pci_dev, uint8_t offset, uint8_t 
  size);
  -
   uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id);
   
   uint32_t pci_default_read_config(PCIDevice *d,



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 8/8] device-assignment: pass through and stub more PCI caps

2010-11-11 Thread Alex Williamson
On Fri, 2010-11-12 at 07:36 +0200, Michael S. Tsirkin wrote:
 On Thu, Nov 11, 2010 at 07:56:46PM -0700, Alex Williamson wrote:
  Some drivers depend on finding capabilities like power management,
  PCI express/X, vital product data, or vendor specific fields.  Now
  that we have better capability support, we can pass more of these
  tables through to the guest.  Note that VPD and VNDR are direct pass
  through capabilies, the rest are mostly empty shells with a few
  writable bits where necessary.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  ---
  
   hw/device-assignment.c |  160 
  +---
   1 files changed, 149 insertions(+), 11 deletions(-)
  
  diff --git a/hw/device-assignment.c b/hw/device-assignment.c
  index 179c7dc..1b228ad 100644
  --- a/hw/device-assignment.c
  +++ b/hw/device-assignment.c
  @@ -366,6 +366,27 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice 
  *d, int pos)
   return (uint8_t)assigned_dev_pci_read(d, pos, 1);
   }
   
  +static void assigned_dev_pci_write(PCIDevice *d, int pos, uint32_t val, 
  int len)
  +{
  +AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
  +ssize_t ret;
  +int fd = pci_dev-real_device.config_fd;
  +
  +again:
  +ret = pwrite(fd, val, len, pos);
  +if (ret != len) {
  +   if ((ret  0)  (errno == EINTR || errno == EAGAIN))
  +   goto again;
 
 
 do {} while() ?

Sure, this is just a copy of another place that does something similar.
They should either be merged or both converted in a separate patch.

  +
  +   fprintf(stderr, %s: pwrite failed, ret = %zd errno = %d\n,
  +   __func__, ret, errno);
  +
  +   exit(1);
  +}
  +
  +return;
  +}
  +
   static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
   {
   int id;
  @@ -1244,37 +1265,75 @@ static void assigned_dev_update_msix(PCIDevice 
  *pci_dev, unsigned int ctrl_pos)
   #endif
   #endif
   
  +static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
  +uint8_t cap_id,
  +uint32_t address, int 
  len)
  +{
  +uint8_t cap;
  +
  +switch (cap_id) {
  +
  +case PCI_CAP_ID_VPD:
  +cap = pci_find_capability(pci_dev, cap_id);
  +if (address - cap = PCI_CAP_FLAGS) {
  +return assigned_dev_pci_read(pci_dev, address, len);
  +}
  +break;
  +
  +case PCI_CAP_ID_VNDR:
  +cap = pci_find_capability(pci_dev, cap_id);
  +if (address - cap  PCI_CAP_FLAGS) {
  +return assigned_dev_pci_read(pci_dev, address, len);
  +}
  +break;
  +}
  +
  +return pci_default_cap_read_config(pci_dev, cap_id, address, len);
  +}
  +
   static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
uint8_t cap_id,
uint32_t address,
uint32_t val, int len)
   {
  +uint8_t cap;
  +
   pci_default_cap_write_config(pci_dev, cap_id, address, val, len);
   
   switch (cap_id) {
   #ifdef KVM_CAP_IRQ_ROUTING
   case PCI_CAP_ID_MSI:
   #ifdef KVM_CAP_DEVICE_MSI
  -{
  -uint8_t cap = pci_find_capability(pci_dev, cap_id);
  -if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
  -assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
  -}
  +cap = pci_find_capability(pci_dev, cap_id);
  +if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
  +assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
   }
   #endif
   break;
   
   case PCI_CAP_ID_MSIX:
   #ifdef KVM_CAP_DEVICE_MSIX
  -{
  -uint8_t cap = pci_find_capability(pci_dev, cap_id);
  -if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) 
  {
  -assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
  -}
  +cap = pci_find_capability(pci_dev, cap_id);
  +if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
  +assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
   }
   #endif
   break;
   #endif
  +
  +case PCI_CAP_ID_VPD:
  +cap = pci_find_capability(pci_dev, cap_id);
  +if (address - cap = PCI_CAP_FLAGS) {
  +assigned_dev_pci_write(pci_dev, address, val, len);
  +}
  +break;
  +
  +case PCI_CAP_ID_VNDR:
  +cap = pci_find_capability(pci_dev, cap_id);
  +if (address - cap  PCI_CAP_FLAGS) {
  +assigned_dev_pci_write(pci_dev, address, val, len);
  +}
  +break;
 
 I have a feeling we should use overlap functions instead of
 address math. What do you think?

if (!ranges_overlap(address - cap, len, 0, PCI_CAP_FLAGS)) ?


[PATCH v2 1/5] KVM: MMU: fix missing post sync audit

2010-11-11 Thread Xiao Guangrong
Add AUDIT_POST_SYNC audit for long mode shadow page

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5275c50..f3fad4f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2539,6 +2539,7 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu)
hpa_t root = vcpu-arch.mmu.root_hpa;
sp = page_header(root);
mmu_sync_children(vcpu, sp);
+   trace_kvm_mmu_audit(vcpu, AUDIT_POST_SYNC);
return;
}
for (i = 0; i  4; ++i) {
-- 
1.7.0.4
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/5] KVM: MMU: clear apfs if page state is changed

2010-11-11 Thread Xiao Guangrong
If CR0.PG is changed, the page fault cann't be avoid when the prefault address
is accessed later

And it also fix a bug: it can retry a page enabled #PF in page disabled context
if mmu is shadow page

This idear is from Gleb Natapov

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/x86.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fc29223..c071d73 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -525,6 +525,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 
kvm_x86_ops-set_cr0(vcpu, cr0);
 
+   if ((cr0 ^ old_cr0)  X86_CR0_PG)
+   kvm_clear_async_pf_completion_queue(vcpu);
+
if ((cr0 ^ old_cr0)  update_bits)
kvm_mmu_reset_context(vcpu);
return 0;
-- 
1.7.0.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/5] KVM: MMU: support apf for nonpaing guest

2010-11-11 Thread Xiao Guangrong
Let's support apf for nonpaing guest

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/mmu.c |   12 +---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f3fad4f..5ee5b97 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2286,7 +2286,11 @@ static int kvm_handle_bad_page(struct kvm *kvm, gfn_t 
gfn, pfn_t pfn)
return 1;
 }
 
-static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
+static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn,
+gva_t gva, pfn_t *pfn, bool write, bool *writable);
+
+static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn,
+bool no_apf)
 {
int r;
int level;
@@ -2307,7 +2311,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, 
int write, gfn_t gfn)
 
mmu_seq = vcpu-kvm-mmu_notifier_seq;
smp_rmb();
-   pfn = gfn_to_pfn_prot(vcpu-kvm, gfn, write, map_writable);
+
+   if (try_async_pf(vcpu, no_apf, gfn, v, pfn, write, map_writable))
+   return 0;
 
/* mmio */
if (is_error_pfn(pfn))
@@ -2594,7 +2600,7 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, 
gva_t gva,
gfn = gva  PAGE_SHIFT;
 
return nonpaging_map(vcpu, gva  PAGE_MASK,
-error_code  PFERR_WRITE_MASK, gfn);
+error_code  PFERR_WRITE_MASK, gfn, no_apf);
 }
 
 static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
-- 
1.7.0.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 5/5] KVM: MMU: retry #PF for softmmu

2010-11-11 Thread Xiao Guangrong
Retry #PF for softmmu only when the current vcpu has the same
root shadow page as the time when #PF occurs. it means they
have same paging environment

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |6 ++
 arch/x86/kvm/mmu.c  |   35 ++-
 arch/x86/kvm/x86.c  |   15 ++-
 virt/kvm/async_pf.c |1 +
 4 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b04c0fa..2cefe00 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -192,6 +192,8 @@ struct kvm_mmu_page {
struct list_head link;
struct hlist_node hash_link;
 
+   struct kref apfs_counter;
+
/*
 * The following two entries are used to key the shadow page in the
 * hash table.
@@ -600,6 +602,7 @@ struct kvm_x86_ops {
 struct kvm_arch_async_pf {
u32 token;
gfn_t gfn;
+   struct kvm_mmu_page *root_sp;
bool direct_map;
 };
 
@@ -698,6 +701,8 @@ void kvm_inject_nmi(struct kvm_vcpu *vcpu);
 
 int fx_init(struct kvm_vcpu *vcpu);
 
+struct kvm_mmu_page *get_vcpu_root_sp(struct kvm_vcpu *vcpu, gva_t gva);
+void kvm_mmu_release_apf_sp(struct kvm_mmu_page *sp);
 void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu);
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
   const u8 *new, int bytes,
@@ -816,6 +821,7 @@ void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
 
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
 
+void kvm_arch_clear_async_pf(struct kvm_async_pf *work);
 void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 struct kvm_async_pf *work);
 void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index bdb9fa9..4b6d54c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -993,6 +993,19 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, 
int nr)
percpu_counter_add(kvm_total_used_mmu_pages, nr);
 }
 
+static void free_shadow_page(struct kref *kref)
+{
+   struct kvm_mmu_page *sp;
+
+   sp = container_of(kref, struct kvm_mmu_page, apfs_counter);
+   kmem_cache_free(mmu_page_header_cache, sp);
+}
+
+void kvm_mmu_release_apf_sp(struct kvm_mmu_page *sp)
+{
+   kref_put(sp-apfs_counter, free_shadow_page);
+}
+
 static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
ASSERT(is_empty_shadow_page(sp-spt));
@@ -1001,7 +1014,7 @@ static void kvm_mmu_free_page(struct kvm *kvm, struct 
kvm_mmu_page *sp)
__free_page(virt_to_page(sp-spt));
if (!sp-role.direct)
__free_page(virt_to_page(sp-gfns));
-   kmem_cache_free(mmu_page_header_cache, sp);
+   kvm_mmu_release_apf_sp(sp);
kvm_mod_used_mmu_pages(kvm, -1);
 }
 
@@ -1026,6 +1039,8 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct 
kvm_vcpu *vcpu,
sp-multimapped = 0;
sp-parent_pte = parent_pte;
kvm_mod_used_mmu_pages(vcpu-kvm, +1);
+   kref_init(sp-apfs_counter);
+
return sp;
 }
 
@@ -2603,13 +2618,31 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, 
gva_t gva,
 error_code  PFERR_WRITE_MASK, gfn, no_apf);
 }
 
+struct kvm_mmu_page *get_vcpu_root_sp(struct kvm_vcpu *vcpu, gva_t gva)
+{
+   struct kvm_shadow_walk_iterator iterator;
+   bool ret;
+
+   shadow_walk_init(iterator, vcpu, gva);
+   ret = shadow_walk_okay(iterator);
+   WARN_ON(!ret);
+
+   return page_header(__pa(iterator.sptep));
+}
+
 static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
 {
struct kvm_arch_async_pf arch;
+
arch.token = (vcpu-arch.apf.id++  12) | vcpu-vcpu_id;
arch.gfn = gfn;
arch.direct_map = vcpu-arch.mmu.direct_map;
 
+   if (!arch.direct_map) {
+   arch.root_sp = get_vcpu_root_sp(vcpu, gva);
+   kref_get(arch.root_sp-apfs_counter);
+   }
+
return kvm_setup_async_pf(vcpu, gva, gfn, arch);
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 003a0ca..1ecc1a9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6169,7 +6169,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, 
struct kvm_async_pf *work)
 {
int r;
 
-   if (!vcpu-arch.mmu.direct_map || !work-arch.direct_map ||
+   if (vcpu-arch.mmu.direct_map != work-arch.direct_map ||
  is_error_page(work-page))
return;
 
@@ -6177,6 +6177,10 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, 
struct kvm_async_pf *work)
if (unlikely(r))
return;
 
+   if (!vcpu-arch.mmu.direct_map 
+ get_vcpu_root_sp(vcpu, work-gva) != work-arch.root_sp)
+   return;
+
vcpu-arch.mmu.page_fault(vcpu, 

[PATCH v2 4/5] KVM: MMU: fix apf prefault if nested guest is enabled

2010-11-11 Thread Xiao Guangrong
If apf is generated in L2 guest and is completed in L1 guest, it will
prefault this apf in L1 guest's mmu context.

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/mmu.c  |1 +
 arch/x86/kvm/x86.c  |3 ++-
 3 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7f20f2c..b04c0fa 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -600,6 +600,7 @@ struct kvm_x86_ops {
 struct kvm_arch_async_pf {
u32 token;
gfn_t gfn;
+   bool direct_map;
 };
 
 extern struct kvm_x86_ops *kvm_x86_ops;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5ee5b97..bdb9fa9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2608,6 +2608,7 @@ static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, 
gva_t gva, gfn_t gfn)
struct kvm_arch_async_pf arch;
arch.token = (vcpu-arch.apf.id++  12) | vcpu-vcpu_id;
arch.gfn = gfn;
+   arch.direct_map = vcpu-arch.mmu.direct_map;
 
return kvm_setup_async_pf(vcpu, gva, gfn, arch);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c071d73..003a0ca 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6169,7 +6169,8 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, 
struct kvm_async_pf *work)
 {
int r;
 
-   if (!vcpu-arch.mmu.direct_map || is_error_page(work-page))
+   if (!vcpu-arch.mmu.direct_map || !work-arch.direct_map ||
+ is_error_page(work-page))
return;
 
r = kvm_mmu_reload(vcpu);
-- 
1.7.0.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html