[PATCH] migrate: error message for ram_load size

2012-10-29 Thread Doug Goldstein
Report an error when ramblock's sizes mismatch with a suggestion to the
user as to what went wrong.
---

libvirt uses migration to save the state, however when performing a
distro upgrade you might get an error starting your VMs up again without
much detail. This patch attempts to remedy that with extra error
messages.

Without patch:
$ virsh start expo
error: Failed to start domain expo
error: internal error Process exited while reading console log output: char 
device redirected to /dev/pts/16
qemu: warning: error while loading state for instance 0x0 of device 'ram'
load of migration failed

With patch:
$ virsh start expo
error: Failed to start domain expo
error: internal error Process exited while reading console log output: char 
device redirected to /dev/pts/16
qemu: warning: error ramblock ':00:02.0/qxl.vrom' length 16384 != 8192. Did 
you change the ROM/BIOS or RAM size between restarts?
qemu: warning: error while loading state for instance 0x0 of device 'ram'
load of migration failed


---
 arch_init.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 8c3bb0d..33f783b 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -810,6 +810,11 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                 QLIST_FOREACH(block, &ram_list.blocks, next) {
                     if (!strncmp(id, block->idstr, sizeof(id))) {
                         if (block->length != length) {
+                            fprintf(stderr, "qemu: warning: error ramblock "
+                                    "'%s' length %ld != %ld. Did you "
+                                    "change the ROM/BIOS or RAM size "
+                                    "between restarts?\n", id,
+                                    block->length, length);
                             ret =  -EINVAL;
                             goto done;
                         }
-- 
1.7.8.6
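
For readers who want to see the check in isolation, here is a minimal
standalone sketch of the idea in the patch above: look up the incoming
block by id, compare its length with the local one, and say what went
wrong before returning -EINVAL. The struct ram_block_info type and the
check_ram_block() helper are hypothetical stand-ins invented for this
sketch; they are not QEMU's data structures or API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for QEMU's RAM block bookkeeping. */
struct ram_block_info {
    char idstr[256];
    unsigned long length;
};

/*
 * Find the local block named 'id' and compare its length with the length
 * announced by the incoming state.
 * Returns 0 on a match, -EINVAL on a size mismatch (after printing a hint),
 * and -ENOENT if no local block has that id.
 */
static int check_ram_block(const struct ram_block_info *blocks, size_t nblocks,
                           const char *id, unsigned long incoming_len)
{
    assert(blocks != NULL && id != NULL);

    for (size_t i = 0; i < nblocks; i++) {
        if (strncmp(id, blocks[i].idstr, sizeof(blocks[i].idstr)) != 0) {
            continue;
        }
        if (blocks[i].length != incoming_len) {
            fprintf(stderr,
                    "warning: ramblock '%s' local length %lu != saved length "
                    "%lu. Did you change the ROM/BIOS or RAM size between "
                    "restarts?\n", id, blocks[i].length, incoming_len);
            return -EINVAL;
        }
        return 0;
    }
    return -ENOENT;
}
```

The point of the design is only that the error path names the block and
both sizes, so the user can tell a ROM size change apart from other load
failures.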

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 1.1.1 -> 1.1.2 migrate /managedsave issue

2012-10-29 Thread Doug Goldstein
On Mon, Oct 22, 2012 at 6:23 AM, Avi Kivity a...@redhat.com wrote:
> On 10/22/2012 09:04 AM, Philipp Hahn wrote:
>> Hello Doug,
>>
>> On Saturday 20 October 2012 00:46:43 Doug Goldstein wrote:
>>> I'm using libvirt 0.10.2 and I had qemu-kvm 1.1.1 running all my VMs.
>> ...
>>> I had upgraded to qemu-kvm 1.1.2
>> ...
>>> qemu: warning: error while loading state for instance 0x0 of device 'ram'
>>> load of migration failed
>>
>> That error can be from many things. For me it was that the PXE-ROM images for
>> the network cards were updated as well. Their size changed over the next
>> power-of-two size, so kvm needed to allocate less/more memory and changed
>> some PCI configuration registers, where the size of the ROM region is stored.
>> On loading the saved state those sizes were compared and failed to validate.
>> KVM then aborts loading the saved state with that little helpful message.
>>
>> So you might want to check, if your case is similar to mine.
>>
>> I diagnosed that using gdb to single step kvm until I found
>> hw/pci.c#get_pci_config_device() returning -EINVAL.
>
>
> Seems reasonable.  Doug, please verify to see if it's the same issue or
> another one.
>
> Juan, how can we fix this?  It's clear that the option ROM size has to
> be fixed and not change whenever the blob is updated.  This will fix it
> for future releases.  But what to do about the ones in the field?
>
> --
> error compiling committee.c: too many arguments to function

Avi,

Please consider the following patch based off qemu master:

http://article.gmane.org/gmane.comp.emulators.kvm.devel/100231

It should hopefully help users with this issue in the future.

-- 
Doug Goldstein
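
As an illustration of the failure mode Philipp describes in the quoted
text, the sketch below assumes (as his description suggests) that the
ROM region is sized to the next power of two at or above the image size.
The helper names and the image sizes used to exercise it are made up for
this example and are not taken from QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Round a ROM image size up to the next power of two (size must be > 0). */
static uint64_t rom_region_size(uint64_t image_size)
{
    uint64_t region = 1;

    assert(image_size > 0 && image_size <= (UINT64_C(1) << 62));
    while (region < image_size) {
        region <<= 1;
    }
    return region;
}

/*
 * A saved state only validates if the region size is unchanged, i.e. the
 * old and the new image round up to the same power of two.
 */
static int rom_sizes_compatible(uint64_t saved_image, uint64_t new_image)
{
    return rom_region_size(saved_image) == rom_region_size(new_image);
}
```

Under that assumption, an image that grows from 60000 to 70000 bytes
moves from a 64 KiB to a 128 KiB region and the saved state no longer
matches, while growth from 50000 to 60000 bytes stays within 64 KiB and
goes unnoticed.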


Re: [PATCH v2 3/5] Qemu: do not mark bios readonly

2012-10-29 Thread Xiao Guangrong
Jan,

On 10/26/2012 06:35 PM, Jan Kiszka wrote:

> This has two problems: We know it breaks at least Win 95 that overwrites
> its F-segment during boot. And it applies changes to the shadowed area
> (below 1 MB) also to the ROM area - I don't think that is the original
> behaviour on real hardware.

So what is the problem? Does it break running Win95?

I tried to install a Win95 guest, but it failed to boot regardless of whether
my patchset was applied. I found that Win 95 is listed as not supported at
http://www.linux-kvm.org/page/Guest_Support_Status

Note: before my patchset, Win 95 could still happily write something into the
ROM area, because read-only memory is actually writable on KVM. And Win95
cannot run on isapc with --no-kvm, since there is no way to enable shadow ROM.

> What we need is paravirtual shadow write control for the ISA PC. It's on
> my todo list, maybe I will be able to look into this during the next week.

Is your idea to modify the SeaBIOS code and use a special (PV) mechanism to
notify Qemu to make the BIOS writable?

Actually, I am confused about why the guest (including the BIOS) persistently
uses shadow ROM even if it is not supported (on ISA PC). I think the right
way is for it to move itself to RAM in this case, no?

> BTW, your patch series should allow to drop the KVM special case from
> pc_system_firmware_init. That version, btw, treats high and low BIOS
> areas separately - but only reloads the upper area. Hmm...

Do you mean also allowing Qemu to use pflash to load the BIOS when KVM is
enabled? We cannot do that, because pflash is a RD device which cannot be
written directly, and KVM cannot emulate instructions that implicitly write
to memory (e.g. using this area as the stack).

Thanks!



Can we run guest OS without using NAT and iptables?

2012-10-29 Thread freak 62
Can we run a guest OS on KVM without enabling NAT and iptables?

The reason is that I want to disable the conntrack module on my system,
and to do that I have to remove iptables and NAT.

I get the following message when I start a guest OS on KVM (with
iptables and NAT disabled):

Error starting domain: internal error 'Network default' is not active.

Is there any way to run a guest OS with NAT disabled? Or is there any way
to disable the conntrack module and still use KVM to run a guest OS?

I am using Ubuntu 10.04.

Any help?


Re: [PATCH v2 3/5] Qemu: do not mark bios readonly

2012-10-29 Thread Xiao Guangrong
On 10/29/2012 03:44 PM, Jan Kiszka wrote:
> On 2012-10-29 08:09, Xiao Guangrong wrote:
>> Jan,
>>
>> On 10/26/2012 06:35 PM, Jan Kiszka wrote:
>>
>>> This has two problems: We know it breaks at least Win 95 that overwrites
>>> its F-segment during boot. And it applies changes to the shadowed area
>>> (below 1 MB) also to the ROM area - I don't think that is the original
>>> behaviour on real hardware.
>>
>> So what is the problem? It can break Win95's running?
>>
>> I tried to install win95 guest but it failed to boot regardless my patchset
>> was applied or not. I found the information that win 95 is not supported at
>> http://www.linux-kvm.org/page/Guest_Support_Status
>>
>> Note: before my patchset, Win 95 still can happily something into ROM area
>> because readonly memory is actually writable on KVM. And win95 can not run
>> on isapc with --no-kvm since it is no way to enable shadow ROM.
>
> Your patches causes regressions on TCG mode as that is perfectly fine
> with booting Win95 so far.

Aha, I tried accel=tcg. Before my patchset, it works for -machine pc but
fails for -machine isapc (a known issue in SeaBIOS). After my patchset,
it works fine for both -machine pc and isapc. :)

>>> What we need is paravirtual shadow write control for the ISA PC. It's on
>>> my todo list, maybe I will be able to look into this during the next week.
>>
>> You idea is that modify the code of seabios and use a special way (PV) to
>> notify Qemu to make the bios writable?
>
> Yes.
>
>> Actually, I am confused why the guest (including bios) persistently uses
>> shadow ROM even if it is not supported (on ISA PC), i think the right way
>> is move itself to RAM under this case, no?
>
> I've been told that Seabios has been built around that assumption and
> the PV shadow control would be simpler to realize.

It sounds like the PV approach is more complex than directly making the
BIOS area writable (if that works).

>>> BTW, your patch series should allow to drop the KVM special case from
>>> pc_system_firmware_init. That version, btw, treats high and low BIOS
>>> areas separately - but only reloads the upper area. Hmm...
>>
>> You mean that also allow Qemu to use pflash to load bios if kvm is enabled?
>
> Yes.
>
>> We can not do that for pflash is a RD device which can not be directly
>> written,
>> kvm can not emulate the instruction which implicitly write the memory. (e.g:
>> using this area as stack).
>
> Isn't enabling ROMD support for KVM that whole point of your patches? I

It generates an MMIO exit if ROMD is written to, which means the instruction
needs KVM's help to complete if it explicitly or implicitly writes to memory.

> do not see yet what prevents this still, but it should be fixed first.

For explicit memory writes, this is easy to fix: we just need to fetch the
instruction from EIP and emulate it. But for implicit memory accesses,
fixing the emulation is really hard work. Is it really worth doing?



Re: acpi_piix4 migration issue

2012-10-29 Thread Paolo Bonzini
On 28/10/2012 20:40, Marcelo Tosatti wrote:
>
> qemu-kvm 1.2 -> qemu-1.3 migration fails with
>
> Unknown savevm section type 48
> load of migration failed
>
> Due to a fix in acpi_piix4 in qemu-kvm (attached at the end of the
> message).
>
> The problem is that qemu-kvm correctly uses 2 bytes for sts and
> 2 bytes for en fields (which is their allocated size), while qemu
> uses 4*2 bytes for each.
>
> The fix present in qemu-kvm is correct, but, having it in qemu 1.3 would break
> qemu 1.2 -> qemu 1.3 migration (while allowing qemu-kvm 1.2 -> qemu 1.3
> migration).
>
> Any opinions on what to do?

Bump the .version_id and .minimum_version_id to 2 and load the QEMU 1.2
state via .load_state_old.

qemu-kvm 1.2 -> qemu 1.3 migration would be broken.  qemu-kvm
downstreams that care can leave .minimum_version_id to 1.

Paolo
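
To make the suggestion above concrete, here is a small self-contained
sketch of version-dispatched loading. It does not use QEMU's VMState
API; struct gpe_state, gpe_load() and gpe_load_old() are hypothetical
names. It assumes version 1 is the old layout with 4*2 bytes per field,
of which only the first 16-bit word is meaningful, and version 2 is the
fixed layout with 2 bytes per field. Big-endian byte order is also an
assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct gpe_state {
    uint16_t sts;
    uint16_t en;
};

/* Old layout: each field occupies 4 * 2 bytes in the stream. */
#define OLD_WORDS_PER_FIELD 4

/* Read one big-endian 16-bit value. */
static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Loader for the old (version 1) layout; padding words are skipped. */
static int gpe_load_old(struct gpe_state *s, const uint8_t *buf, size_t len)
{
    const size_t field_bytes = OLD_WORDS_PER_FIELD * sizeof(uint16_t);

    if (len != 2 * field_bytes) {
        return -EINVAL;
    }
    s->sts = get_be16(buf);
    s->en = get_be16(buf + field_bytes);
    return 0;
}

/* Dispatch on the version id found in the incoming stream. */
static int gpe_load(struct gpe_state *s, int version_id,
                    const uint8_t *buf, size_t len)
{
    assert(s != NULL && buf != NULL);

    switch (version_id) {
    case 1:
        return gpe_load_old(s, buf, len);
    case 2:
        if (len != 2 * sizeof(uint16_t)) {
            return -EINVAL;
        }
        s->sts = get_be16(buf);
        s->en = get_be16(buf + sizeof(uint16_t));
        return 0;
    default:
        return -EINVAL;
    }
}
```

The version id is what lets the receiver pick the right layout; without
it, the two layouts are indistinguishable on the wire, which is why a
sender using the short layout under the old version number cannot be
loaded.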

 
>>> +#define VMSTATE_GPE_ARRAY(_field, _state)                            \
>>> + {                                                                   \
>>> +     .name       = (stringify(_field)),                              \
>>> +     .version_id = 0,                                                \
>>> +     .num        = GPE_LEN,                                          \
>>> +     .info       = &vmstate_info_uint16,                             \
>>> +     .size       = sizeof(uint16_t),                                 \
>>> +     .flags      = VMS_ARRAY | VMS_POINTER,                          \
>>> +     .offset     = vmstate_offset_pointer(_state, _field, uint8_t),  \
>>> + }
>>> +
>>>  static const VMStateDescription vmstate_gpe = {
>>>      .name = "gpe",
>>>      .version_id = 1,
>>>      .minimum_version_id = 1,
>>>      .minimum_version_id_old = 1,
>>>      .fields      = (VMStateField []) {
>>> -        VMSTATE_UINT16(sts, struct gpe_regs),
>>> -        VMSTATE_UINT16(en, struct gpe_regs),
>>> +        VMSTATE_GPE_ARRAY(sts, ACPIGPE),
>>> +        VMSTATE_GPE_ARRAY(en, ACPIGPE),
>>>          VMSTATE_END_OF_LIST()
>>>      }
>>>  };
>>
>> I'm no vmstate expert, but this does look odd.  Why both VMS_ARRAY and
>> VMS_POINTER? aren't we trying to save/restore a simple 16-bit value?  Or
>> at least we did before this patch.
>
> That's right. the difference is, the new member type became uint8_t*.
> Does the following help?
>
> Signed-off-by: Avi Kivity a...@redhat.com
>
> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> index d65a7e9..9dc6f43 100644
> --- a/hw/acpi_piix4.c
> +++ b/hw/acpi_piix4.c
> @@ -221,10 +221,9 @@ static int vmstate_acpi_post_load(void *opaque, int version_id)
>   {                                                                   \
>       .name       = (stringify(_field)),                              \
>       .version_id = 0,                                                \
> -     .num        = GPE_LEN,                                          \
>       .info       = &vmstate_info_uint16,                             \
>       .size       = sizeof(uint16_t),                                 \
> -     .flags      = VMS_ARRAY | VMS_POINTER,                          \
> +     .flags      = VMS_SINGLE | VMS_POINTER,                         \
>       .offset     = vmstate_offset_pointer(_state, _field, uint8_t),  \
>   }
  
 
 
 
 



Re: Can we run guest OS without using NAT and iptables?

2012-10-29 Thread Stefan Hajnoczi
On Mon, Oct 29, 2012 at 12:55:43PM +0530, freak 62 wrote:
> Can we run guest o.s. on KVM without enabling NAT and iptables?
>
> The reason to do this is , I wanted to disable conntrack module
> from my system and to disable that I must have to delete iptable and
> NAT.
>
> I am getting the following message, when I start guest o.s. on
> KVM (iptable and NAT disabled):
>
> Error starting domain: internal error 'Network default' is not active.
>
> Is their any way to run guest o.s. with NAT disabled? or Is their
> any way to disable conntrack module and still can use KVM to run guest
> OS ?
>
> I am using Ubuntu 10.04

This is a libvirt question since libvirt sets up the networking
configuration.  You can try a different network config either using the
virt-manager GUI tool or by editing the network XML, which is documented
here:

http://libvirt.org/formatnetwork.html

CCed libvirt mailing list.

Stefan


Re: I/O errors in guest OS after repeated migration

2012-10-29 Thread Stefan Hajnoczi
On Fri, Oct 19, 2012 at 2:55 PM, Guido Winkelmann
guido-k...@thisisnotatest.de wrote:
> On Thursday, 18 October 2012, 18:05:39, Avi Kivity wrote:
>> On 10/18/2012 05:50 PM, Guido Winkelmann wrote:
>>> On Wednesday, 17 October 2012, 13:25:45, Brian Jackson wrote:
>>>> On Wednesday, October 17, 2012 10:45:14 AM Guido Winkelmann wrote:
>>>>> vda1, logical block 1858771
>>>>> Oct 17 17:12:04 localhost kernel: [  212.070600] Buffer I/O error on device vda1, logical block 1858772
>>>>> Oct 17 17:12:04 localhost kernel: [  212.070602] Buffer I/O error on device vda1, logical block 1858773
>>>>> Oct 17 17:12:04 localhost kernel: [  212.070605] Buffer I/O error on device vda1, logical block 1858774
>>>>> Oct 17 17:12:04 localhost kernel: [  212.070607] Buffer I/O error on device vda1, logical block 1858775
>>>>> Oct 17 17:12:04 localhost kernel: [  212.070610] Buffer I/O error on device vda1, logical block 1858776
>>>>> Oct 17 17:12:04 localhost kernel: [  212.070612] Buffer I/O error on device vda1, logical block 1858777
>>>>> Oct 17 17:12:04 localhost kernel: [  212.070615] Buffer I/O error on device vda1, logical block 1858778
>>>>> Oct 17 17:12:04 localhost kernel: [  212.070617] Buffer I/O error on device vda1, logical block 1858779
>>>>>
>>>>> (I was writing a large file at the time, to make sure I actually catch
>>>>> I/O errors as they happen)
>>>>
>>>> What about newer versions of qemu/kvm? But of course if those work, your
>>>> next task is going to be git bisect it or file a bug with your distro
>>>> that is using an ancient version of qemu/kvm.
>>>
>>> I've just upgraded both hosts to qemu-kvm 1.2.0
>>> (qemu-1.2.0-14.fc17.x86_64,
>>> built from spec files under http://pkgs.fedoraproject.org/cgit/qemu.git/).
>>>
>>> The bug is still there.
>>
>> If you let the guest go idle (no I/O), then migrate it, then restart the
>> I/O, do the errors show?
>
> Just tested - yes, they do.

The -EIO error does not really reveal why there is a problem.  You can
use SystemTap probes in QEMU to find out more about the nature of the
error.

# stap -e 'probe qemu.kvm.bdrv_*, qemu.kvm.virtio_blk_*,
qemu.kvm.paio_* { printf("%s(%s)\n", probefunc(), $$parms) }' -x
$PID_OF_QEMU

Output looks like this:

bdrv_co_readv($arg1=0x7fb2397cc580 $arg2=0x80c $arg3=0x1)
bdrv_co_io_em($arg1=0x7fb2397cc580 $arg2=0x80c $arg3=0x1 $arg4=0x0
$arg5=0x7fb239da6f60)
virtio_blk_rw_complete($arg1=0x7fb23982ed10 $arg2=0x0)
virtio_blk_req_complete($arg1=0x7fb23982ed10 $arg2=0x0)

virtio_blk_rw_complete $arg2=-5 means -EIO, so look for that.
This will reveal what is happening when the error occurs.

Stefan


[PATCH] kvm tools: fix rbtree-interval search

2012-10-29 Thread Kirill A. Shutemov
From: Kirill A. Shutemov kirill.shute...@linux.intel.com

I've noticed message on kvm exit:

  Warning: serial8250__exit failed.

kvm tool is not able to remove ioport range which was added previously.

The issue is caused by a bug in rbtree-interval. The search algorithm in
rb_int_search_single() expects a correct value of max_high. But the tree
can contain leaf nodes which were never updated by propagate_callback().
For such nodes max_high will be 0 and we will not be able to find and
remove them.

Let's initialize max_high on RB_INT_INIT() time.

Fixing this bug makes other bug visible: propagate_callback() can be
called for empty tree: node == NULL. The callback is not ready for empty
tree. Let's fix that as well.

Signed-off-by: Kirill A. Shutemov kirill.shute...@linux.intel.com
---
 tools/kvm/include/kvm/rbtree-interval.h |3 ++-
 tools/kvm/util/rbtree-interval.c|6 +-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/include/kvm/rbtree-interval.h 
b/tools/kvm/include/kvm/rbtree-interval.h
index e97d05b..fb2102a 100644
--- a/tools/kvm/include/kvm/rbtree-interval.h
+++ b/tools/kvm/include/kvm/rbtree-interval.h
@@ -4,7 +4,8 @@
 #include <linux/rbtree_augmented.h>
 #include <linux/types.h>
 
-#define RB_INT_INIT(l, h) (struct rb_int_node){.low = l, .high = h}
+#define RB_INT_INIT(l, h) \
+   (struct rb_int_node){.low = l, .high = h, .max_high = h}
 #define rb_int(n) rb_entry(n, struct rb_int_node, node)
 
 struct rb_int_node {
diff --git a/tools/kvm/util/rbtree-interval.c b/tools/kvm/util/rbtree-interval.c
index c82ce98..d7fa96a 100644
--- a/tools/kvm/util/rbtree-interval.c
+++ b/tools/kvm/util/rbtree-interval.c
@@ -48,8 +48,12 @@ struct rb_int_node *rb_int_search_range(struct rb_root *root, u64 low, u64 high)
  */
 static void propagate_callback(struct rb_node *node, struct rb_node *stop)
 {
-   struct rb_int_node *i_node = rb_int(node);
+   struct rb_int_node *i_node;
 
+   if (node == stop)
+       return;
+
+   i_node = rb_int(node);
    i_node->max_high = i_node->high;
 
    if (node->rb_left)
-- 
1.7.10.4
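
The effect described in the commit message can be reproduced outside the
kvm tool with a tiny hand-built tree. The sketch below is a simplified
model, not the tool's code: struct int_node, int_node_init() and
search_point() are hypothetical names, the tree is a plain binary tree
rather than a red-black tree, and intervals are treated as [low, high).
The search prunes the left subtree using max_high, in the spirit of
rb_int_search_single().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct int_node {
    uint64_t low, high;          /* interval [low, high) */
    uint64_t max_high;           /* largest 'high' within this subtree */
    struct int_node *left, *right;
};

/* Initialise a node the way the patched RB_INT_INIT does: max_high = high. */
static struct int_node int_node_init(uint64_t low, uint64_t high)
{
    assert(low < high);
    struct int_node n = { low, high, high, NULL, NULL };
    return n;
}

/* Find a node whose interval contains 'point', pruning with max_high. */
static const struct int_node *search_point(const struct int_node *node,
                                           uint64_t point)
{
    while (node) {
        if (node->left && node->left->max_high > point) {
            node = node->left;       /* left subtree may contain the point */
        } else if (node->low <= point && point < node->high) {
            return node;
        } else if (point > node->low) {
            node = node->right;
        } else {
            return NULL;
        }
    }
    return NULL;
}
```

With a leaf [0x2f8, 0x300) hanging to the left of [0x3f8, 0x400), the
leaf is found when its max_high equals its high. Resetting the leaf's
max_high to 0, as the old initializer effectively did, makes the search
skip the left subtree and miss it.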



RE: [Patch]KVM: enabling per domain PLE

2012-10-29 Thread Hu, Xuekun
Hi, Avi

 
> Yes, some cloud vendors already knew that different PLE values has big
> performance impact on their applications. They want one interface for them to
> set. And I think the big cloud vendors should have administrators that have
> experience on PLE tuning. :-)

For the current stage, do you think we still need to pursue a dynamic
adaptive PLE solution?


 
  --
  error compiling committee.c: too many arguments to function


[PATCH 2/5] s390: Move css limits from drivers/s390/cio/ to include/asm/.

2012-10-29 Thread Cornelia Huck
There's no need to keep __MAX_SUBCHANNEL and __MAX_SSID private to the
common I/O layer when __MAX_CSSID is usable by everybody.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 arch/s390/include/asm/cio.h | 2 ++
 drivers/s390/cio/css.h  | 3 ---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/s390/include/asm/cio.h b/arch/s390/include/asm/cio.h
index 55bde60..ad2b924 100644
--- a/arch/s390/include/asm/cio.h
+++ b/arch/s390/include/asm/cio.h
@@ -9,6 +9,8 @@
 
 #define LPM_ANYPATH 0xff
 #define __MAX_CSSID 0
+#define __MAX_SUBCHANNEL 65535
+#define __MAX_SSID 3
 
 #include <asm/scsw.h>
 
diff --git a/drivers/s390/cio/css.h b/drivers/s390/cio/css.h
index 33bb4d8..4af3dfe 100644
--- a/drivers/s390/cio/css.h
+++ b/drivers/s390/cio/css.h
@@ -112,9 +112,6 @@ extern int for_each_subchannel(int(*fn)(struct subchannel_id, void *), void *);
 extern void css_reiterate_subchannels(void);
 void css_update_ssd_info(struct subchannel *sch);
 
-#define __MAX_SUBCHANNEL 65535
-#define __MAX_SSID 3
-
 struct channel_subsystem {
u8 cssid;
int valid;
-- 
1.7.12.4



[PATCH 1/5] KVM: s390: Handle hosts not supporting s390-virtio.

2012-10-29 Thread Cornelia Huck
Running under a kvm host does not necessarily imply the presence of
a page mapped above the main memory with the virtio information;
however, the code includes a hard coded access to that page.

Instead, check for the presence of the page and exit gracefully
before we hit an addressing exception if it does not exist.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 drivers/s390/kvm/kvm_virtio.c | 39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 47cccd5..76b95f3 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -419,6 +419,26 @@ static void kvm_extint_handler(struct ext_code ext_code,
 }
 
 /*
+ * For s390-virtio, we expect a page above main storage containing
+ * the virtio configuration. Try to actually load from this area
+ * in order to figure out if the host provides this page.
+ */
+static int __init test_devices_support(unsigned long addr)
+{
+   int ret = -EIO;
+
+   asm volatile(
+       "0: lura    0,%1\n"
+       "1: xgr     %0,%0\n"
+       "2:\n"
+       EX_TABLE(0b,2b)
+       EX_TABLE(1b,2b)
+       : "+d" (ret)
+       : "a" (addr)
+       : "0", "cc");
+   return ret;
+}
+/*
  * Init function for virtio
  * devices are in a single page above top of normal mem
  */
@@ -429,21 +449,24 @@ static int __init kvm_devices_init(void)
if (!MACHINE_IS_KVM)
return -ENODEV;
 
+   if (test_devices_support(real_memory_size) < 0)
+   /* No error. */
+   return 0;
+
+   rc = vmem_add_mapping(real_memory_size, PAGE_SIZE);
+   if (rc)
+   return rc;
+
+   kvm_devices = (void *) real_memory_size;
+
    kvm_root = root_device_register("kvm_s390");
    if (IS_ERR(kvm_root)) {
        rc = PTR_ERR(kvm_root);
        printk(KERN_ERR "Could not register kvm_s390 root device");
+   vmem_remove_mapping(real_memory_size, PAGE_SIZE);
return rc;
}
 
-   rc = vmem_add_mapping(real_memory_size, PAGE_SIZE);
-   if (rc) {
-   root_device_unregister(kvm_root);
-   return rc;
-   }
-
-   kvm_devices = (void *) real_memory_size;
-
    INIT_WORK(&hotplug_work, hotplug_devices);
 
service_subclass_irq_register();
-- 
1.7.12.4



[PATCH 0/5] s390: Guest support for virtio-ccw.

2012-10-29 Thread Cornelia Huck
Avi, Marcelo,

I'd like to propose inclusion of the guest support patches for
virtio-ccw into 3.8.

I'm confident that the host - guest interface for virtio-ccw
is fine now, and the patches have been extensively tested by our
internal test team.

Patch 1 might conceivably be 3.7 material, though I fear it's a
bit late for that.

Patch 2 has been moved over from the host-support patchset since
the limits are needed by the guest driver as well.

Patch 4 has seen some further bugfixes (feature bits, 2G and 4G
problems, device detach handling) and is working well in our
internal environment.

Cornelia Huck (5):
  KVM: s390: Handle hosts not supporting s390-virtio.
  s390: Move css limits from drivers/s390/cio/ to include/asm/.
  s390: Add a mechanism to get the subchannel id.
  KVM: s390: Add a channel I/O based virtio transport driver.
  KVM: s390: Split out early console code.

 arch/s390/include/asm/ccwdev.h  |   5 +
 arch/s390/include/asm/cio.h |   2 +
 arch/s390/include/asm/irq.h |   1 +
 arch/s390/kernel/irq.c  |   1 +
 drivers/s390/cio/css.h  |   3 -
 drivers/s390/cio/device_ops.c   |  12 +
 drivers/s390/kvm/Makefile   |   2 +-
 drivers/s390/kvm/early_printk.c |  42 ++
 drivers/s390/kvm/kvm_virtio.c   |  64 ++-
 drivers/s390/kvm/virtio_ccw.c   | 841 
 10 files changed, 936 insertions(+), 37 deletions(-)
 create mode 100644 drivers/s390/kvm/early_printk.c
 create mode 100644 drivers/s390/kvm/virtio_ccw.c

-- 
1.7.12.4



[PATCH 4/5] KVM: s390: Add a channel I/O based virtio transport driver.

2012-10-29 Thread Cornelia Huck
Add a driver for kvm guests that matches virtual ccw devices provided
by the host as virtio bridge devices.

These virtio-ccw devices use a special set of channel commands in order
to perform virtio functions.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 arch/s390/include/asm/irq.h   |   1 +
 arch/s390/kernel/irq.c|   1 +
 drivers/s390/kvm/Makefile |   2 +-
 drivers/s390/kvm/virtio_ccw.c | 842 ++
 4 files changed, 845 insertions(+), 1 deletion(-)
 create mode 100644 drivers/s390/kvm/virtio_ccw.c

diff --git a/arch/s390/include/asm/irq.h b/arch/s390/include/asm/irq.h
index 6703dd9..ad2ad6b 100644
--- a/arch/s390/include/asm/irq.h
+++ b/arch/s390/include/asm/irq.h
@@ -33,6 +33,7 @@ enum interruption_class {
IOINT_APB,
IOINT_ADM,
IOINT_CSC,
+   IOINT_VIR,
NMI_NMI,
NR_IRQS,
 };
diff --git a/arch/s390/kernel/irq.c b/arch/s390/kernel/irq.c
index 6cdc55b..97c171a 100644
--- a/arch/s390/kernel/irq.c
+++ b/arch/s390/kernel/irq.c
@@ -58,6 +58,7 @@ static const struct irq_class intrclass_names[] = {
    [IOINT_APB]  = {.name = "APB", .desc = "[I/O] AP Bus"},
    [IOINT_ADM]  = {.name = "ADM", .desc = "[I/O] EADM Subchannel"},
    [IOINT_CSC]  = {.name = "CSC", .desc = "[I/O] CHSC Subchannel"},
+   [IOINT_VIR]  = {.name = "VIR", .desc = "[I/O] Virtual I/O Devices"},
    [NMI_NMI]    = {.name = "NMI", .desc = "[NMI] Machine Check"},
 };
 
diff --git a/drivers/s390/kvm/Makefile b/drivers/s390/kvm/Makefile
index 0815690..241891a 100644
--- a/drivers/s390/kvm/Makefile
+++ b/drivers/s390/kvm/Makefile
@@ -6,4 +6,4 @@
 # it under the terms of the GNU General Public License (version 2 only)
 # as published by the Free Software Foundation.
 
-obj-$(CONFIG_S390_GUEST) += kvm_virtio.o
+obj-$(CONFIG_S390_GUEST) += kvm_virtio.o virtio_ccw.o
diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
new file mode 100644
index 000..4be878f
--- /dev/null
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -0,0 +1,842 @@
+/*
+ * ccw based virtio transport
+ *
+ * Copyright IBM Corp. 2012
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ *
+ *Author(s): Cornelia Huck cornelia.h...@de.ibm.com
+ */
+
+#include <linux/kernel_stat.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/err.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/slab.h>
+#include <linux/virtio_console.h>
+#include <linux/interrupt.h>
+#include <linux/virtio_ring.h>
+#include <linux/pfn.h>
+#include <linux/async.h>
+#include <linux/wait.h>
+#include <linux/list.h>
+#include <linux/bitops.h>
+#include <linux/module.h>
+#include <asm/io.h>
+#include <asm/kvm_para.h>
+#include <asm/setup.h>
+#include <asm/irq.h>
+#include <asm/cio.h>
+#include <asm/ccwdev.h>
+#include <asm/schid.h>
+
+/*
+ * virtio related functions
+ */
+
+struct vq_config_block {
+   __u16 index;
+   __u16 num;
+} __attribute__ ((packed));
+
+#define VIRTIO_CCW_CONFIG_SIZE 0x100
+/* same as PCI config space size, should be enough for all drivers */
+
+struct virtio_ccw_device {
+   struct virtio_device vdev;
+   __u8 status;
+   __u8 config[VIRTIO_CCW_CONFIG_SIZE];
+   struct ccw_device *cdev;
+   struct ccw1 *ccw;
+   __u32 area;
+   __u32 curr_io;
+   int err;
+   wait_queue_head_t wait_q;
+   spinlock_t lock;
+   struct list_head virtqueues;
+   unsigned long indicators;
+   unsigned long indicators2;
+   struct vq_config_block *config_block;
+};
+
+struct vq_info_block {
+   __u64 queue;
+   __u32 align;
+   __u16 index;
+   __u16 num;
+} __attribute__ ((packed));
+
+struct virtio_feature_desc {
+   __u32 features;
+   __u8 index;
+} __attribute__ ((packed));
+
+struct virtio_ccw_vq_info {
+   struct virtqueue *vq;
+   int num;
+   int queue_index;
+   void *queue;
+   struct vq_info_block *info_block;
+   struct list_head node;
+};
+
+#define KVM_VIRTIO_CCW_RING_ALIGN 4096
+
+#define CCW_CMD_SET_VQ 0x13
+#define CCW_CMD_VDEV_RESET 0x33
+#define CCW_CMD_SET_IND 0x43
+#define CCW_CMD_SET_CONF_IND 0x53
+#define CCW_CMD_READ_FEAT 0x12
+#define CCW_CMD_WRITE_FEAT 0x11
+#define CCW_CMD_READ_CONF 0x22
+#define CCW_CMD_WRITE_CONF 0x21
+#define CCW_CMD_WRITE_STATUS 0x31
+#define CCW_CMD_READ_VQ_CONF 0x32
+
+#define VIRTIO_CCW_DOING_SET_VQ 0x0001
+#define VIRTIO_CCW_DOING_RESET 0x0004
+#define VIRTIO_CCW_DOING_READ_FEAT 0x0008
+#define VIRTIO_CCW_DOING_WRITE_FEAT 0x0010
+#define VIRTIO_CCW_DOING_READ_CONFIG 0x0020
+#define VIRTIO_CCW_DOING_WRITE_CONFIG 0x0040
+#define VIRTIO_CCW_DOING_WRITE_STATUS 0x0080
+#define VIRTIO_CCW_DOING_SET_IND 0x0100
+#define VIRTIO_CCW_DOING_READ_VQ_CONF 0x0200
+#define VIRTIO_CCW_DOING_SET_CONF_IND 0x0400
+#define VIRTIO_CCW_INTPARM_MASK 0x
+

[PATCH 3/5] s390: Add a mechanism to get the subchannel id.

2012-10-29 Thread Cornelia Huck
This will be needed by the new virtio-ccw transport.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 arch/s390/include/asm/ccwdev.h |  5 +
 drivers/s390/cio/device_ops.c  | 12 
 2 files changed, 17 insertions(+)

diff --git a/arch/s390/include/asm/ccwdev.h b/arch/s390/include/asm/ccwdev.h
index 1cb4bb3..9ad79f7 100644
--- a/arch/s390/include/asm/ccwdev.h
+++ b/arch/s390/include/asm/ccwdev.h
@@ -18,6 +18,9 @@ struct irb;
 struct ccw1;
 struct ccw_dev_id;
 
+/* from <asm/schid.h> */
+struct subchannel_id;
+
 /* simplified initializers for struct ccw_device:
  * CCW_DEVICE and CCW_DEVICE_DEVTYPE initialize one
  * entry in your MODULE_DEVICE_TABLE and set the match_flag correctly */
@@ -226,5 +229,7 @@ int ccw_device_siosl(struct ccw_device *);
 // FIXME: these have to go
 extern int _ccw_device_get_subchannel_number(struct ccw_device *);
 
+extern void ccw_device_get_schid(struct ccw_device *, struct subchannel_id *);
+
 extern void *ccw_device_get_chp_desc(struct ccw_device *, int);
 #endif /* _S390_CCWDEV_H_ */
diff --git a/drivers/s390/cio/device_ops.c b/drivers/s390/cio/device_ops.c
index ec7fb6d..2ad832f 100644
--- a/drivers/s390/cio/device_ops.c
+++ b/drivers/s390/cio/device_ops.c
@@ -763,6 +763,18 @@ _ccw_device_get_subchannel_number(struct ccw_device *cdev)
    return cdev->private->schid.sch_no;
 }
 
+/**
+ * ccw_device_get_schid - obtain a subchannel id
+ * @cdev: device to obtain the id for
+ * @schid: where to fill in the values
+ */
+void ccw_device_get_schid(struct ccw_device *cdev, struct subchannel_id *schid)
+{
+   struct subchannel *sch = to_subchannel(cdev->dev.parent);
+
+   *schid = sch->schid;
+}
+EXPORT_SYMBOL_GPL(ccw_device_get_schid);
 
 MODULE_LICENSE("GPL");
 EXPORT_SYMBOL(ccw_device_set_options_mask);
-- 
1.7.12.4



[PATCH 5/5] KVM: s390: Split out early console code.

2012-10-29 Thread Cornelia Huck
This code is transport agnostic and can be used by both the legacy
virtio code and virtio_ccw.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
---
 drivers/s390/kvm/Makefile   |  2 +-
 drivers/s390/kvm/early_printk.c | 42 +
 drivers/s390/kvm/kvm_virtio.c   | 29 ++--
 drivers/s390/kvm/virtio_ccw.c   |  1 -
 4 files changed, 45 insertions(+), 29 deletions(-)
 create mode 100644 drivers/s390/kvm/early_printk.c

diff --git a/drivers/s390/kvm/Makefile b/drivers/s390/kvm/Makefile
index 241891a..a3c8fc4 100644
--- a/drivers/s390/kvm/Makefile
+++ b/drivers/s390/kvm/Makefile
@@ -6,4 +6,4 @@
 # it under the terms of the GNU General Public License (version 2 only)
 # as published by the Free Software Foundation.
 
-obj-$(CONFIG_S390_GUEST) += kvm_virtio.o virtio_ccw.o
+obj-$(CONFIG_S390_GUEST) += kvm_virtio.o early_printk.o virtio_ccw.o
diff --git a/drivers/s390/kvm/early_printk.c b/drivers/s390/kvm/early_printk.c
new file mode 100644
index 000..7831530
--- /dev/null
+++ b/drivers/s390/kvm/early_printk.c
@@ -0,0 +1,42 @@
+/*
+ * early_printk.c - code for early console output with virtio_console
+ * split off from kvm_virtio.c
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ *
+ *Author(s): Christian Borntraeger borntrae...@de.ibm.com
+ */
+
+#include <linux/kernel_stat.h>
+#include <linux/init.h>
+#include <linux/err.h>
+#include <linux/virtio_console.h>
+#include <asm/kvm_para.h>
+#include <asm/kvm_virtio.h>
+#include <asm/setup.h>
+#include <asm/sclp.h>
+
+static __init int early_put_chars(u32 vtermno, const char *buf, int count)
+{
+   char scratch[17];
+   unsigned int len = count;
+
+   if (len > sizeof(scratch) - 1)
+   len = sizeof(scratch) - 1;
+   scratch[len] = '\0';
+   memcpy(scratch, buf, len);
+   kvm_hypercall1(KVM_S390_VIRTIO_NOTIFY, __pa(scratch));
+   return len;
+}
+
+static int __init s390_virtio_console_init(void)
+{
+   if (sclp_has_vt220() || sclp_has_linemode())
+   return -ENODEV;
+   return virtio_cons_early_init(early_put_chars);
+}
+console_initcall(s390_virtio_console_init);
diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 76b95f3..6cdc66a 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -17,7 +17,6 @@
 #include <linux/virtio.h>
 #include <linux/virtio_config.h>
 #include <linux/slab.h>
-#include <linux/virtio_console.h>
 #include <linux/interrupt.h>
 #include <linux/virtio_ring.h>
 #include <linux/export.h>
@@ -25,9 +24,9 @@
 #include <asm/io.h>
 #include <asm/kvm_para.h>
 #include <asm/kvm_virtio.h>
-#include <asm/sclp.h>
 #include <asm/setup.h>
 #include <asm/irq.h>
+#include <asm/sclp.h>
 
 #define VIRTIO_SUBCODE_64 0x0D00
 
@@ -450,8 +449,7 @@ static int __init kvm_devices_init(void)
return -ENODEV;
 
if (test_devices_support(real_memory_size) < 0)
-   /* No error. */
-   return 0;
+   return -ENODEV;
 
rc = vmem_add_mapping(real_memory_size, PAGE_SIZE);
if (rc)
@@ -476,29 +474,6 @@ static int __init kvm_devices_init(void)
return 0;
 }
 
-/* code for early console output with virtio_console */
-static __init int early_put_chars(u32 vtermno, const char *buf, int count)
-{
-   char scratch[17];
-   unsigned int len = count;
-
-   if (len > sizeof(scratch) - 1)
-   len = sizeof(scratch) - 1;
-   scratch[len] = '\0';
-   memcpy(scratch, buf, len);
-   kvm_hypercall1(KVM_S390_VIRTIO_NOTIFY, __pa(scratch));
-   return len;
-}
-
-static int __init s390_virtio_console_init(void)
-{
-   if (sclp_has_vt220() || sclp_has_linemode())
-   return -ENODEV;
-   return virtio_cons_early_init(early_put_chars);
-}
-console_initcall(s390_virtio_console_init);
-
-
 /*
  * We do this after core stuff, but before the drivers.
  */
diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 4be878f..135126a 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -17,7 +17,6 @@
 #include <linux/virtio.h>
 #include <linux/virtio_config.h>
 #include <linux/slab.h>
-#include <linux/virtio_console.h>
 #include <linux/interrupt.h>
 #include <linux/virtio_ring.h>
 #include <linux/pfn.h>
-- 
1.7.12.4
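
The early console helper moved by this patch clamps every write to 16
bytes (a 17-byte scratch buffer minus the terminating NUL) and returns
how many bytes it consumed. A minimal user-space sketch of that clamping
follows. The names put_chars_clamped and put_all are hypothetical, the
hypercall is left out, and the retry loop in put_all is only an
assumption about how a caller could push out a longer string.

```c
#include <assert.h>
#include <string.h>

enum { SCRATCH_SIZE = 17 };   /* same size as the scratch buffer in the patch */

/*
 * Hypothetical analogue of early_put_chars(): copy at most
 * SCRATCH_SIZE - 1 bytes into scratch, NUL-terminate the copy and
 * report how many bytes were consumed. No hypercall is made here.
 */
static int put_chars_clamped(char *scratch, const char *buf, int count)
{
    int len = count;

    assert(count >= 0);
    if (len > SCRATCH_SIZE - 1)
        len = SCRATCH_SIZE - 1;
    memcpy(scratch, buf, (size_t)len);
    scratch[len] = '\0';
    return len;
}

/*
 * Hypothetical caller: keep calling until the whole buffer has been
 * consumed, and return the number of calls that were needed.
 */
static int put_all(const char *buf, int count)
{
    char scratch[SCRATCH_SIZE];
    int calls = 0;

    while (count > 0) {
        int done = put_chars_clamped(scratch, buf, count);

        buf += done;
        count -= done;
        calls++;
    }
    return calls;
}
```

With this sketch a 19-byte string needs two calls (16 bytes, then 3),
which is why the return value matters to the caller.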



KVM call agenda for 2012-10-30

2012-10-29 Thread Juan Quintela

Hi

Please send in any agenda topics you are interested in.

Later, Juan.


[PATCH V2 RFC 0/3] kvm: Improving undercommit,overcommit scenarios

2012-10-29 Thread Raghavendra K T
 In some special scenarios like #vcpu <= #pcpu, PLE handler may
prove very costly, because there is no need to iterate over vcpus
and do unsuccessful yield_to burning CPU.

 Similarly, when we have a large number of small guests, it is
possible that a spinning vcpu fails to yield_to any vcpu of the same
VM and goes back to spinning. This is also not effective when we are
over-committed. Instead, we do a yield() so that we give other VMs
a chance to run.

This patch series tries to optimize the above scenarios.

 The first patch optimizes all the yield_to by bailing out when there
 is no need to continue yield_to (i.e., when there is only one task 
 in source and target rq).

 Second patch uses that in PLE handler.
 
 Third patch uses overall system load to decide whether to continue in the
 yield_to handler, and also to yield in overcommit cases.
 To be precise,
 * loadavg is converted to a scale of 2048 per CPU
 * a load value of less than 1024 is considered undercommit and we
   return from the PLE handler in those cases
 * a load value of greater than 3584 (1.75 * 2048) is considered overcommit
   and we yield to other VMs in such cases.

(let threshold = 2048)
Rationale for using threshold/2 for undercommit limit:
 Requiring a load below (0.5 * threshold) avoids (the concern raised by Rik)
scenarios where we still have a lock-holder-preempted vcpu waiting to be
scheduled. (This scenario arises when rq length is > 1 even when we are
undercommitted.)

Rationale for using (1.75 * threshold) for overcommit scenario:
This is a heuristic where we should probably see rq length > 1
and a vcpu of a different VM is waiting to be scheduled.

 Related future work (independent of this series):
 
 - Dynamically changing PLE window depending on system load.

 Results on a 3.7.0-rc1 kernel show around 146% improvement for ebizzy 1x
 on a 32 core PLE machine with a 32 vcpu guest.
 I believe we should get very good improvements for overcommit (especially > 2)
 on large machines with small vcpu guests. (Could not test this as I do not have
 access to a bigger machine)

base = 3.7.0-rc1 
machine: 32 core mx3850 x5 PLE mc

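+-----+------------+-----------+------------+-----------+------------+
             ebizzy (rec/sec, higher is better)
+-----+------------+-----------+------------+-----------+------------+
         base         stdev       patched      stdev      %improve
+-----+------------+-----------+------------+-----------+------------+
  1x     2543.3750     20.2903    6279.3750     82.5226    146.89143
  2x     2410.8750     96.4327    2450.7500    207.8136      1.65396
  3x     2184.9167    205.5226    2178.         97.2034     -0.30131
+-----+------------+-----------+------------+-----------+------------+

+-----+------------+-----------+------------+-----------+------------+
        dbench (throughput in MB/sec, higher is better)
+-----+------------+-----------+------------+-----------+------------+
         base         stdev       patched      stdev      %improve
+-----+------------+-----------+------------+-----------+------------+
  1x     5545.4330    596.4344    7042.8510   1012.0924     27.00272
  2x     1993.0970     43.6548    1990.6200     75.7837     -0.12428
  3x     1295.3867     22.3997    1315.5208     36.0075      1.55429
+-----+------------+-----------+------------+-----------+------------+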

 Changes since V1:
 - Discard the idea of exporting nrrunning and optimize in core scheduler (Peter)
 - Use yield() instead of schedule in overcommit scenarios (Rik)
 - Use loadavg knowledge to detect undercommit/overcommit

 Peter Zijlstra (1):
  Bail out of yield_to when source and target runqueue has one task

 Raghavendra K T (2):
  Handle yield_to failure return for potential undercommit case
  Check system load and handle different commit cases accordingly

 Please let me know your comments and suggestions.

 Link for V1:
 https://lkml.org/lkml/2012/9/21/168

 kernel/sched/core.c | 25 +++--
 virt/kvm/kvm_main.c | 56 
++--
 2 files changed, 65 insertions(+), 16 deletions(-)
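
The load scaling and the two thresholds described above can be sketched
in plain C. This is a user-space illustration under stated assumptions,
not the patch itself: FSHIFT and FIXED_1 mirror the kernel's load-average
fixed-point constants, while per_cpu_load, classify_load and the
commit_state enum are hypothetical names introduced here.

```c
#include <assert.h>

/* Fixed-point scale of the kernel load average: FIXED_1 == 2048. */
#define FSHIFT  11
#define FIXED_1 (1UL << FSHIFT)

#define COMMIT_THRESHOLD      (FIXED_1)                   /* 2048, about 1:1 */
#define UNDERCOMMIT_THRESHOLD (COMMIT_THRESHOLD >> 1)     /* 1024 */
#define OVERCOMMIT_THRESHOLD  ((COMMIT_THRESHOLD << 1) - \
                               (COMMIT_THRESHOLD >> 2))   /* 3584 */

enum commit_state { UNDERCOMMIT, NORMAL_COMMIT, OVERCOMMIT };

/*
 * Hypothetical helper: avenrun0 is the 1-minute load average in fixed
 * point. FIXED_1/200 is a small rounding term, then the result is
 * scaled down to a per-CPU value.
 */
static unsigned long per_cpu_load(unsigned long avenrun0,
                                  unsigned int online_cpus)
{
    assert(online_cpus > 0);
    return (avenrun0 + FIXED_1 / 200) / online_cpus;
}

/* Map a per-CPU load value onto the three cases described above. */
static enum commit_state classify_load(unsigned long load)
{
    if (load < UNDERCOMMIT_THRESHOLD)
        return UNDERCOMMIT;      /* return early from the PLE handler */
    if (load > OVERCOMMIT_THRESHOLD)
        return OVERCOMMIT;       /* yield() to other VMs */
    return NORMAL_COMMIT;        /* keep trying directed yield */
}
```

For example, a load average of 8.00 on 32 CPUs scales to 512 and counts
as undercommit, while 64.00 on 32 CPUs scales to 4096 and counts as
overcommit.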



[PATCH V2 RFC 1/3] sched: Bail out of yield_to when source and target runqueue has one task

2012-10-29 Thread Raghavendra K T
From: Peter Zijlstra pet...@infradead.org

In undercommitted scenarios, especially with large guests, yield_to
overhead is significantly high. When the run queue length of both source
and target is one, take the opportunity to bail out and return
-ESRCH. This return condition can be further exploited to quickly come
out of the PLE handler.

Signed-off-by: Peter Zijlstra pet...@infradead.org
Raghavendra, Checking the rq length of target vcpu condition added.
Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---

 kernel/sched/core.c |   25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..fc219a5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
  * It's the caller's job to ensure that the target task struct
  * can't go away on us before we can do any checks.
  *
- * Returns true if we indeed boosted the target task.
+ * Returns:
+ * true (>0) if we indeed boosted the target task.
+ * false (0) if we failed to boost the target.
+ * -ESRCH if there's no task to yield to.
  */
 bool __sched yield_to(struct task_struct *p, bool preempt)
 {
@@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
 
 again:
p_rq = task_rq(p);
+   /*
+* If we're the only runnable task on the rq and target rq also
+* has only one task, there's absolutely no point in yielding.
+*/
+   if (rq->nr_running == 1 && p_rq->nr_running == 1) {
+   yielded = -ESRCH;
+   goto out_irq;
+   }
+
double_rq_lock(rq, p_rq);
while (task_rq(p) != p_rq) {
double_rq_unlock(rq, p_rq);
@@ -4310,13 +4322,13 @@ again:
}
 
if (!curr->sched_class->yield_to_task)
-   goto out;
+   goto out_unlock;
 
if (curr->sched_class != p->sched_class)
-   goto out;
+   goto out_unlock;
 
if (task_running(p_rq, p) || p->state)
-   goto out;
+   goto out_unlock;
 
yielded = curr->sched_class->yield_to_task(rq, p, preempt);
if (yielded) {
@@ -4329,11 +4341,12 @@ again:
resched_task(p_rq->curr);
}
 
-out:
+out_unlock:
double_rq_unlock(rq, p_rq);
+out_irq:
local_irq_restore(flags);
 
-   if (yielded)
+   if (yielded > 0)
schedule();
 
return yielded;



[PATCH V2 RFC 2/3] kvm: Handle yield_to failure return code for potential undercommit case

2012-10-29 Thread Raghavendra K T
From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

Also we do not update last boosted vcpu in failure cases.

Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---

 virt/kvm/kvm_main.c |   21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index be70035..e376434 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1639,6 +1639,7 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
 {
struct pid *pid;
struct task_struct *task = NULL;
+   bool ret = false;
 
rcu_read_lock();
pid = rcu_dereference(target->pid);
@@ -1646,17 +1647,15 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
task = get_pid_task(target->pid, PIDTYPE_PID);
rcu_read_unlock();
if (!task)
-   return false;
+   return ret;
if (task->flags & PF_VCPU) {
put_task_struct(task);
-   return false;
-   }
-   if (yield_to(task, 1)) {
-   put_task_struct(task);
-   return true;
+   return ret;
}
+   ret = yield_to(task, 1);
put_task_struct(task);
-   return false;
+
+   return ret;
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);
 
@@ -1697,6 +1696,7 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
return eligible;
 }
 #endif
+
 void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 {
struct kvm *kvm = me->kvm;
@@ -1727,11 +1727,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
continue;
if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
continue;
-   if (kvm_vcpu_yield_to(vcpu)) {
+
+   yielded = kvm_vcpu_yield_to(vcpu);
+   if (yielded > 0)
kvm->last_boosted_vcpu = i;
-   yielded = 1;
+   if (yielded)
break;
-   }
}
}
kvm_vcpu_set_in_spin_loop(me, false);
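
The three-way return convention this hunk relies on (> 0 boosted, 0
failed, -ESRCH nothing to yield to) can be illustrated with a small
user-space sketch. Everything here is hypothetical: try_directed_yield,
struct spin_result and the callbacks only stand in for the PLE loop and
kvm_vcpu_yield_to(). The sketch uses int for the return type, since in C
a negative value converted to bool simply becomes true and the -ESRCH
case could no longer be told apart from a successful boost.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for kvm_vcpu_yield_to(): returns > 0 if the
 * target was boosted, 0 if the attempt failed, and -ESRCH if there is
 * nothing to yield to.
 */
typedef int (*yield_fn)(int vcpu_index);

struct spin_result {
    int yielded;       /* last value returned by the yield attempt */
    int last_boosted;  /* index of the boosted vcpu, or -1 if none */
};

static struct spin_result try_directed_yield(yield_fn try_yield, int nr_vcpus)
{
    struct spin_result res = { 0, -1 };

    assert(try_yield != NULL);
    for (int i = 0; i < nr_vcpus; i++) {
        res.yielded = try_yield(i);
        if (res.yielded > 0)
            res.last_boosted = i;  /* record only real boosts */
        if (res.yielded)
            break;                 /* boost or -ESRCH: stop scanning */
    }
    return res;
}

/* Example callbacks used to exercise the loop. */
static int boost_third(int i)      { return i == 2 ? 1 : 0; }
static int nothing_to_yield(int i) { (void)i; return -ESRCH; }
static int always_fail(int i)      { (void)i; return 0; }
```

In the undercommit case the first attempt already reports -ESRCH, so
the scan stops immediately without recording a boosted vcpu.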



[PATCH V2 RFC 3/3] kvm: Check system load and handle different commit cases accordingly

2012-10-29 Thread Raghavendra K T
From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

The patch introduces a helper function that calculates the system load
(idea borrowed from loadavg calculation). The load is normalized to
2048, i.e., a return value (threshold) of 2048 implies an approximately 1:1
committed guest.

In undercommit cases (< threshold/2) we simply return from PLE handler.
In overcommit cases (> 1.75 * threshold) we do a yield(). The rationale is to
allow other VMs of the host to run instead of burning cpu cycles.

Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
The idea of yielding in overcommit cases (especially with a large number of
small guests) was
Acked-by: Rik van Riel r...@redhat.com
Andrew Theurer also has stressed the importance of reducing yield_to
overhead and using yield().

(let threshold = 2048)
Rationale for using threshold/2 for undercommit limit:
 Requiring a load below (0.5 * threshold) avoids (the concern raised by Rik)
scenarios where we still have a lock-holder-preempted vcpu waiting to be
scheduled. (This scenario arises when rq length is > 1 even when we are
undercommitted.)

Rationale for using (1.75 * threshold) for overcommit scenario:
This is a heuristic where we should probably see rq length > 1
and a vcpu of a different VM is waiting to be scheduled.

 virt/kvm/kvm_main.c |   35 +++
 1 file changed, 35 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e376434..28bbdfb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1697,15 +1697,43 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
 }
 #endif
 
+/*
+ * A load of 2048 corresponds to 1:1 overcommit
+ * undercommit threshold is half the 1:1 overcommit
+ * overcommit threshold is 1.75 times of 1:1 overcommit threshold
+ */
+#define COMMIT_THRESHOLD (FIXED_1)
+#define UNDERCOMMIT_THRESHOLD (COMMIT_THRESHOLD >> 1)
+#define OVERCOMMIT_THRESHOLD ((COMMIT_THRESHOLD << 1) - (COMMIT_THRESHOLD >> 2))
+
+unsigned long kvm_system_load(void)
+{
+   unsigned long load;
+
+   load = avenrun[0] + FIXED_1/200;
+   load = load / num_online_cpus();
+
+   return load;
+}
+
 void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 {
struct kvm *kvm = me->kvm;
struct kvm_vcpu *vcpu;
int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
int yielded = 0;
+   unsigned long load;
int pass;
int i;
 
+   load = kvm_system_load();
+   /*
+* When we are undercommitted let us not waste time in
+* iterating over all the VCPUs.
+*/
+   if (load < UNDERCOMMIT_THRESHOLD)
+   return;
+
kvm_vcpu_set_in_spin_loop(me, true);
/*
 * We boost the priority of a VCPU that is runnable but not
@@ -1735,6 +1763,13 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
break;
}
}
+   /*
+* If we are not able to yield especially in overcommit cases
+* let us be courteous to other VM's VCPUs waiting to be scheduled.
+*/
+   if (!yielded && load > OVERCOMMIT_THRESHOLD)
+   yield();
+
kvm_vcpu_set_in_spin_loop(me, false);
 
/* Ensure vcpu is not eligible during next spinloop */



Re: [patch 08/18] x86: pvclock: generic pvclock vsyscall initialization

2012-10-29 Thread Glauber Costa
On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
> Index: vsyscall/arch/x86/Kconfig
> ===
> --- vsyscall.orig/arch/x86/Kconfig
> +++ vsyscall/arch/x86/Kconfig
> @@ -632,6 +632,13 @@ config PARAVIRT_SPINLOCKS
>
>  config PARAVIRT_CLOCK
>   bool
> +config PARAVIRT_CLOCK_VSYSCALL
> + bool "Paravirt clock vsyscall support"
> + depends on PARAVIRT_CLOCK && GENERIC_TIME_VSYSCALL
> + ---help---
> +   Enable performance critical clock related system calls to
> +   be executed in userspace, provided that the hypervisor
> +   supports it.
>
>  endif

Besides debugging, what is the point in having this as an
extra-selectable? Is there any case in which a virtual machine has code
for this, but may decide to run without it ?

I believe all this code in vsyscall should be wrapped in PARAVIRT_CLOCK
only.


Re: [patch 08/18] x86: pvclock: generic pvclock vsyscall initialization

2012-10-29 Thread Glauber Costa
On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
> + */
> +int __init pvclock_init_vsyscall(void)
> +{
> + int idx;
> + unsigned int size = PVCLOCK_VSYSCALL_NR_PAGES*PAGE_SIZE;
> +
> + pvclock_vdso_info = __alloc_bootmem(size, PAGE_SIZE, 0);
> + if (!pvclock_vdso_info)
> + return -ENOMEM;
> +
> + memset(pvclock_vdso_info, 0, size);
> +
> + for (idx = 0; idx <= (PVCLOCK_FIXMAP_END-PVCLOCK_FIXMAP_BEGIN); idx++) {
> + __set_fixmap(PVCLOCK_FIXMAP_BEGIN + idx,
> +  __pa_symbol(pvclock_vdso_info) + (idx*PAGE_SIZE),
> +  PAGE_KERNEL_VVAR);


BTW, Previous line is whitespace damaged.



Re: [patch 09/18] KVM: x86: introduce facility to support vsyscall pvclock, via MSR

2012-10-29 Thread Glauber Costa
On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
> Allow a guest to register a second location for the VCPU time info
>
> structure for each vcpu (as described by MSR_KVM_SYSTEM_TIME_NEW).
> This is intended to allow the guest kernel to map this information
> into a usermode accessible page, so that usermode can efficiently
> calculate system time from the TSC without having to make a syscall.
>
> Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Can you please be a bit more specific about why we need this? Why does
the host need to provide us with two pages with the exact same data? Why
can't we just do it with mapping tricks in the guest?




Re: [patch 11/18] x86: vsyscall: pass mode to gettime backend

2012-10-29 Thread Glauber Costa
On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
> Required by next patch.
>
> Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

I don't see where.



Re: [patch 12/18] x86: vdso: pvclock gettime support

2012-10-29 Thread Glauber Costa
On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
> Improve performance of time system calls when using Linux pvclock,
> by reading time info from fixmap visible copy of pvclock data.
>
> Originally from Jeremy Fitzhardinge.
>
> Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
>
> Index: vsyscall/arch/x86/vdso/vclock_gettime.c
> ===
> --- vsyscall.orig/arch/x86/vdso/vclock_gettime.c
> +++ vsyscall/arch/x86/vdso/vclock_gettime.c
> @@ -22,6 +22,7 @@
>  #include <asm/hpet.h>
>  #include <asm/unistd.h>
>  #include <asm/io.h>
> +#include <asm/pvclock.h>
>
>  #define gtod (VVAR(vsyscall_gtod_data))
>
> @@ -62,6 +63,69 @@ static notrace cycle_t vread_hpet(void)
>   return readl((const void __iomem *)fix_to_virt(VSYSCALL_HPET) + 0xf0);
>  }
>
> +#ifdef CONFIG_PARAVIRT_CLOCK_VSYSCALL
> +
> +static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu)
> +{
> + const aligned_pvti_t *pvti_base;
> + int idx = cpu / (PAGE_SIZE/PVTI_SIZE);
> + int offset = cpu % (PAGE_SIZE/PVTI_SIZE);
> +
> + BUG_ON(PVCLOCK_FIXMAP_BEGIN + idx > PVCLOCK_FIXMAP_END);
> +
> + pvti_base = (aligned_pvti_t *)__fix_to_virt(PVCLOCK_FIXMAP_BEGIN+idx);
> +
> + return &pvti_base[offset].info;
> +}
> +

Unless I am missing something, if gcc decides to not inline get_pvti,
this will break, right? I believe you need to mark that function with
__always_inline.

> +static notrace cycle_t vread_pvclock(int *mode)
> +{
> + const struct pvclock_vsyscall_time_info *pvti;
> + cycle_t ret;
> + u64 last;
> + u32 version;
> + u32 migrate_count;
> + u8 flags;
> + unsigned cpu, cpu1;
> +
> +
> + /*
> +  * When looping to get a consistent (time-info, tsc) pair, we
> +  * also need to deal with the possibility we can switch vcpus,
> +  * so make sure we always re-fetch time-info for the current vcpu.
> +  */
> + do {
> + cpu = __getcpu() & 0xfff;

Please wrap this 0xfff into something meaningful.

> + pvti = get_pvti(cpu);
> +
> + migrate_count = pvti->migrate_count;
> +
> + version = __pvclock_read_cycles(&pvti->pvti, &ret, &flags);
> +
> + /*
> +  * Test we're still on the cpu as well as the version.
> +  * We could have been migrated just after the first
> +  * vgetcpu but before fetching the version, so we
> +  * wouldn't notice a version change.
> +  */
> + cpu1 = __getcpu() & 0xfff;
> + } while (unlikely(cpu != cpu1 ||
> +   (pvti->pvti.version & 1) ||
> +   pvti->pvti.version != version ||
> +   pvti->migrate_count != migrate_count));
> +
> + if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
> + *mode = VCLOCK_NONE;
> +
> + last = VVAR(vsyscall_gtod_data).clock.cycle_last;
> +
> + if (likely(ret >= last))
> + return ret;
> +

Please add a comment here referring to tsc.c, where an explanation of
this test lives. This is quite non-obvious for the non initiated.
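
For readers new to the retry loop in the quoted vread_pvclock(), the
version check follows a common pattern: read the version, read the
data, and retry if the version was odd or changed in between. The
sketch below shows only that part, under loud assumptions: struct
time_info is a simplified stand-in and not the kernel layout, the cpu
and migrate_count checks are left out, memory barriers are omitted,
and max_tries exists only so the sketch cannot spin forever.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified time-info record; not the kernel's layout. */
struct time_info {
    uint32_t version;   /* odd while the writer is in the middle of an update */
    uint64_t value;
};

/*
 * Read a consistent (version, value) pair. Retry while the version is
 * odd or changes across the read. Real code also needs memory barriers
 * between the loads; they are omitted in this single-threaded sketch.
 * Returns 0 on success, -1 if no consistent snapshot was obtained.
 */
static int read_consistent(const volatile struct time_info *ti,
                           uint64_t *out, int max_tries)
{
    assert(out != NULL && max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        uint32_t v = ti->version;
        uint64_t val = ti->value;

        if (!(v & 1) && ti->version == v) {
            *out = val;
            return 0;
        }
    }
    return -1;
}

/* Self-check helper: returns the value read, or UINT64_MAX on failure. */
static uint64_t read_or_fail(uint32_t version, uint64_t value)
{
    struct time_info ti = { version, value };
    uint64_t out = 0;

    return read_consistent(&ti, &out, 3) == 0 ? out : UINT64_MAX;
}
```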



Re: [patch 13/18] KVM: x86: pass host_tsc to read_l1_tsc

2012-10-29 Thread Glauber Costa
On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
> Allow the caller to pass host tsc value to kvm_x86_ops->read_l1_tsc().
>
> Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Would you mind explaining why?

It seems to me that rdtscll() here would be perfectly safe: the only
case in which it wouldn't be is a nested-vm environment running
paravirt-linux with a paravirt tsc. In this case, it is quite likely
that we'll want rdtscll *anyway*, instead of going to tsc directly.




Re: Can we run guest OS without using NAT and iptables?

2012-10-29 Thread Cole Robinson
On 10/29/2012 05:30 AM, Stefan Hajnoczi wrote:
 On Mon, Oct 29, 2012 at 12:55:43PM +0530, freak 62 wrote:
  Can we run guest o.s. on KVM without enabling NAT and iptables?

 The reason to do this is , I wanted to disable conntrack module
 from my system and to disable that I must have to delete iptable and
 NAT.

  I am getting the following message, when I start guest o.s. on
 KVM (iptable and NAT disabled):

 Error starting domain: internal error 'Network default' is not 
 active.

  Is their any way to run guest o.s. with NAT disabled? or Is their
 any way to disable conntrack module and still can use KVM to run guest
 OS ?

I am using Ubuntu 10.04

You can remove the default virsh network like

sudo virsh net-destroy default
sudo virsh net-undefine default

The most common networking setup that doesn't use NAT + iptables is probably
bridged networking:

http://wiki.libvirt.org/page/Networking#Bridged_networking_.28aka_.22shared_physical_device.22.29

- Cole



Re: [patch 08/18] x86: pvclock: generic pvclock vsyscall initialization

2012-10-29 Thread Marcelo Tosatti
On Mon, Oct 29, 2012 at 06:18:20PM +0400, Glauber Costa wrote:
> On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
> > Index: vsyscall/arch/x86/Kconfig
> > ===
> > --- vsyscall.orig/arch/x86/Kconfig
> > +++ vsyscall/arch/x86/Kconfig
> > @@ -632,6 +632,13 @@ config PARAVIRT_SPINLOCKS
> >
> >  config PARAVIRT_CLOCK
> >   bool
> > +config PARAVIRT_CLOCK_VSYSCALL
> > +   bool "Paravirt clock vsyscall support"
> > +   depends on PARAVIRT_CLOCK && GENERIC_TIME_VSYSCALL
> > +   ---help---
> > + Enable performance critical clock related system calls to
> > + be executed in userspace, provided that the hypervisor
> > + supports it.
> >
> >  endif
>
> Besides debugging, what is the point in having this as an
> extra-selectable? Is there any case in which a virtual machine has code
> for this, but may decide to run without it ?

Don't think so (it's pretty small anyway, the code).

> I believe all this code in vsyscall should be wrapped in PARAVIRT_CLOCK
> only.

Unless Jeremy has a reason, I'm fine with that.



[PATCH net-next 3/8] tun: report orphan frags errors to zero copy callback

2012-10-29 Thread Michael S. Tsirkin
When tun transmits a zero copy skb, it orphans the frags
which might need to allocate extra memory, in atomic context.
If that fails, notify ubufs callback before freeing the skb
as a hint that device should disable zerocopy mode.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/net/tun.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 3157519..613f826 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -433,6 +433,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 
 drop:
dev->stats.tx_dropped++;
+   skb_tx_error(skb, -ENOMEM);
kfree_skb(skb);
return NETDEV_TX_OK;
 }
-- 
MST



[PATCH net-next 0/8] enable/disable zero copy tx dynamically

2012-10-29 Thread Michael S. Tsirkin

tun supports zero copy transmit since 0690899b4d4501b3505be069b9a687e68ccbe15b,
however you can only enable this mode if you know your workload does not
trigger heavy guest to host/host to guest traffic - otherwise you
get a (minor) performance regression.
This patchset addresses this problem by notifying the owner
device when callback is invoked because of a data copy.
This makes it possible to detect whether zero copy is appropriate
dynamically: we start in zero copy mode, when we detect
data copied we disable zero copy for a while.

With this patch applied, I get the same performance for
guest to host and guest to guest both with and without zero copy tx.

Michael S. Tsirkin (8):
  skb: report completion status for zero copy skbs
  skb: api to report errors for zero copy skbs
  tun: report orphan frags errors to zero copy callback
  vhost-net: cleanup macros for DMA status tracking
  vhost: track zero copy failures using DMA length
  vhost: move -net specific code out
  vhost-net: select tx zero copy dynamically
  vhost-net: reduce vq polling on tx zerocopy

 drivers/net/tun.c |   1 +
 drivers/vhost/net.c   | 109 +++---
 drivers/vhost/tcm_vhost.c |   1 +
 drivers/vhost/vhost.c |  52 +++---
 drivers/vhost/vhost.h |  11 ++---
 include/linux/skbuff.h|   5 ++-
 net/core/skbuff.c |  23 +-
 7 files changed, 141 insertions(+), 61 deletions(-)

-- 
MST
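
The "start in zero copy mode, back off for a while when a copy is
detected" idea from this cover letter can be sketched as a small
policy object. This is only an illustration under assumptions: the
struct, the function names and the back-off length are hypothetical and
are not taken from the vhost-net patches in this series. The status
convention (0 for zero copy, 1 for a successful data copy, negative for
an error) is the one documented in patch 1/8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy state; names and back-off length are assumptions. */
struct zcopy_policy {
    int backoff_left;  /* packets still to be sent with copying */
    int backoff_len;   /* how long to avoid zero copy after a copy is seen */
};

/* Decide the mode for the next packet: true means use zero copy. */
static bool zcopy_should_use(struct zcopy_policy *p)
{
    assert(p != NULL);
    if (p->backoff_left > 0) {
        p->backoff_left--;
        return false;
    }
    return true;
}

/*
 * Completion callback: status 0 means zero copy happened, 1 means the
 * data was copied anyway, negative means an error such as -ENOMEM.
 */
static void zcopy_complete(struct zcopy_policy *p, int status)
{
    assert(p != NULL);
    if (status != 0)
        p->backoff_left = p->backoff_len;
}

/*
 * Small scenario: send one packet, complete it with first_status, then
 * count how many of the next five packets would use zero copy.
 */
static int zcopy_demo(int first_status)
{
    struct zcopy_policy p = { 0, 3 };
    int zc = 0;

    if (zcopy_should_use(&p))   /* the first packet starts in zero copy mode */
        zc++;
    zcopy_complete(&p, first_status);
    for (int i = 0; i < 5; i++)
        if (zcopy_should_use(&p))
            zc++;
    return zc;
}
```

A fixed back-off keeps the policy simple; a real device would likely
tune how long it stays out of zero copy mode based on measurements.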


[PATCH net-next 1/8] skb: report completion status for zero copy skbs

2012-10-29 Thread Michael S. Tsirkin
Even if skb is marked for zero copy, net core might still decide
to copy it later which is somewhat slower than a copy in user context:
besides copying the data we need to pin/unpin the pages.

Add a parameter reporting such cases through zero copy callback:
if this happens a lot, device can take this into account
and switch to copying in user context.

This patch updates all users but ignores the passed value for now:
it will be used by follow-up patches.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/vhost.c  | 2 +-
 drivers/vhost/vhost.h  | 2 +-
 include/linux/skbuff.h | 4 +++-
 net/core/skbuff.c  | 4 ++--
 4 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 99ac2cb..92308b6 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1600,7 +1600,7 @@ void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *ubufs)
kfree(ubufs);
 }
 
-void vhost_zerocopy_callback(struct ubuf_info *ubuf)
+void vhost_zerocopy_callback(struct ubuf_info *ubuf, int zerocopy_status)
 {
struct vhost_ubuf_ref *ubufs = ubuf->ctx;
struct vhost_virtqueue *vq = ubufs->vq;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 1125af3..eb7263c3 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -191,7 +191,7 @@ bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
unsigned int log_num, u64 len);
-void vhost_zerocopy_callback(struct ubuf_info *);
+void vhost_zerocopy_callback(struct ubuf_info *, int);
 int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq);
 
 #define vq_err(vq, fmt, ...) do {  \
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6a2c34e..8bac11b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -235,11 +235,13 @@ enum {
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
  * lower device, the skb last reference should be 0 when calling this.
+ * The zerocopy_status argument is 0 if zero copy transmit occurred,
+ * 1 on successful data copy; < 0 on out of memory error.
  * The ctx field is used to track device context.
  * The desc field is used to track userspace buffer index.
  */
 struct ubuf_info {
-   void (*callback)(struct ubuf_info *);
+   void (*callback)(struct ubuf_info *, int zerocopy_status);
void *ctx;
unsigned long desc;
 };
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6e04b1f..eb31f6e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -519,7 +519,7 @@ static void skb_release_data(struct sk_buff *skb)
 
uarg = skb_shinfo(skb)->destructor_arg;
if (uarg->callback)
-   uarg->callback(uarg);
+   uarg->callback(uarg, 0);
}
 
if (skb_has_frag_list(skb))
@@ -797,7 +797,7 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
for (i = 0; i < num_frags; i++)
skb_frag_unref(skb, i);
 
-   uarg->callback(uarg);
+   uarg->callback(uarg, 1);
 
/* skb frags point to kernel buffers */
for (i = num_frags - 1; i >= 0; i--) {
-- 
MST



[PATCH net-next 2/8] skb: api to report errors for zero copy skbs

2012-10-29 Thread Michael S. Tsirkin
Orphaning frags for zero copy skbs needs to allocate data in atomic
context so it has a chance to fail. If it does we currently discard
the skb which is safe, but we don't report anything to the caller,
so it cannot recover by e.g. disabling zero copy.

Add an API to free skb reporting such errors: this is used
by tun in case orphaning frags fails.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 include/linux/skbuff.h |  1 +
 net/core/skbuff.c  | 19 +++
 2 files changed, 20 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8bac11b..0644432 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -568,6 +568,7 @@ static inline struct rtable *skb_rtable(const struct sk_buff *skb)
 }
 
 extern void kfree_skb(struct sk_buff *skb);
+extern void skb_tx_error(struct sk_buff *skb, int err);
 extern void consume_skb(struct sk_buff *skb);
 extern void   __kfree_skb(struct sk_buff *skb);
 extern struct kmem_cache *skbuff_head_cache;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index eb31f6e..ad99c64 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -635,6 +635,25 @@ void kfree_skb(struct sk_buff *skb)
 EXPORT_SYMBOL(kfree_skb);
 
 /**
+ * skb_tx_error - report an sk_buff xmit error
+ * @skb: buffer that triggered an error
+ *
+ * Report xmit error if a device callback is tracking this skb.
+ */
+void skb_tx_error(struct sk_buff *skb, int err)
+{
+   if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
+   struct ubuf_info *uarg;
+
+   uarg = skb_shinfo(skb)->destructor_arg;
+   if (uarg->callback)
+   uarg->callback(uarg, err);
+   skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
+   }
+}
+EXPORT_SYMBOL(skb_tx_error);
+
+/**
  * consume_skb - free an skbuff
  * @skb: buffer to free
  *
-- 
MST
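
The error-reporting path in patch 2/8 can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the kernel code: struct fake_skb, FAKE_ZEROCOPY, fake_skb_tx_error() and the tracker callback are hypothetical stand-ins for struct sk_buff, SKBTX_DEV_ZEROCOPY and skb_tx_error(). It only shows the shape of the logic: the callback receives the error status once, and the zero-copy flag is then cleared so a later release does not invoke the callback a second time.

```c
#include <stddef.h>

/* Same layout as the ubuf_info shown in the patch. */
struct ubuf_info {
	void (*callback)(struct ubuf_info *, int zerocopy_status);
	void *ctx;
	unsigned long desc;
};

#define FAKE_ZEROCOPY 0x1u	/* hypothetical: models SKBTX_DEV_ZEROCOPY */

/* Hypothetical, heavily reduced model of an skb. */
struct fake_skb {
	unsigned int tx_flags;
	struct ubuf_info *destructor_arg;
};

/* Mirrors the shape of skb_tx_error(): report once, then drop the flag. */
static void fake_skb_tx_error(struct fake_skb *skb, int err)
{
	if (skb->tx_flags & FAKE_ZEROCOPY) {
		struct ubuf_info *uarg = skb->destructor_arg;

		if (uarg->callback)
			uarg->callback(uarg, err);
		skb->tx_flags &= ~FAKE_ZEROCOPY;
	}
}

/* Example consumer: remembers the last status and how often it ran. */
struct tracker {
	int calls;
	int last_status;
};

static void record_status(struct ubuf_info *ubuf, int status)
{
	struct tracker *t = ubuf->ctx;

	t->calls++;
	t->last_status = status;
}
```

A caller such as tun would, under these assumptions, pass a negative errno value as err; the consumer can then tell a failed orphaning attempt apart from the 0 and 1 statuses used on the normal release paths.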



[PATCH net-next 8/8] vhost-net: reduce vq polling on tx zerocopy

2012-10-29 Thread Michael S. Tsirkin
It seems that to avoid deadlocks it is enough to poll vq before
we are going to use the last buffer.  This should be faster than
c70aa540c7a9f67add11ad3161096fb95233aa2e.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8e9de79..3967f82 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -197,8 +197,16 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, int status)
 {
struct vhost_ubuf_ref *ubufs = ubuf->ctx;
struct vhost_virtqueue *vq = ubufs->vq;
-
-   vhost_poll_queue(&vq->poll);
+   int cnt = atomic_read(&ubufs->kref.refcount);
+
+   /*
+* Trigger polling thread if guest stopped submitting new buffers:
+* in this case, the refcount after decrement will eventually reach 1
+* so here it is 2.
+* We also trigger polling periodically after each 16 packets.
+*/
+   if (cnt <= 2 || !(cnt % 16))
+   vhost_poll_queue(&vq->poll);
/* set len to mark this desc buffers done DMA */
vq->heads[ubuf->desc].len = status ?
VHOST_DMA_FAILED_LEN : VHOST_DMA_DONE_LEN;
-- 
MST
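
The polling condition in patch 8/8 is easy to check in isolation. The sketch below is a userspace illustration only; should_queue_poll() and count_polls() are hypothetical helper names, and cnt stands for the reference count read in the completion callback. It shows how rarely the condition fires over a range of outstanding buffers compared with polling on every completion.

```c
#include <stdbool.h>

/*
 * Same predicate as in the patch: poll when the guest appears to have
 * stopped submitting (count is down to 2 or less), and otherwise only
 * on every 16th value of the count.
 */
static bool should_queue_poll(int cnt)
{
	return cnt <= 2 || !(cnt % 16);
}

/* Count how many values in 1..max_cnt would trigger a poll. */
static int count_polls(int max_cnt)
{
	int cnt;
	int polls = 0;

	for (cnt = 1; cnt <= max_cnt; cnt++)
		if (should_queue_poll(cnt))
			polls++;
	return polls;
}
```

For counts 1 through 128 this triggers on 1, 2 and the eight multiples of 16, that is 10 polls instead of 128.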


[PATCH net-next 4/8] vhost-net: cleanup macros for DMA status tracking

2012-10-29 Thread Michael S. Tsirkin
Better document macros for DMA tracking. Add an
explicit one for DMA in progress instead of
relying on user supplying len != 1.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c   |  3 ++-
 drivers/vhost/vhost.c |  2 +-
 drivers/vhost/vhost.h | 12 +---
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 072cbba..f80ae5f 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -237,7 +237,8 @@ static void handle_tx(struct vhost_net *net)
} else {
struct ubuf_info *ubuf = &vq->ubuf_info[head];
 
-   vq->heads[vq->upend_idx].len = len;
+   vq->heads[vq->upend_idx].len =
+   VHOST_DMA_IN_PROGRESS;
ubuf->callback = vhost_zerocopy_callback;
ubuf->ctx = vq->ubufs;
ubuf->desc = vq->upend_idx;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 92308b6..906fd9f 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1606,7 +1606,7 @@ void vhost_zerocopy_callback(struct ubuf_info *ubuf, int zerocopy_status)
struct vhost_virtqueue *vq = ubufs->vq;
 
vhost_poll_queue(&vq->poll);
-   /* set len = 1 to mark this desc buffers done DMA */
+   /* set len to mark this desc buffers done DMA */
vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
 }
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index eb7263c3..ad72a1f 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -13,9 +13,15 @@
 #include <linux/virtio_ring.h>
 #include <linux/atomic.h>
 
-/* This is for zerocopy, used buffer len is set to 1 when lower device DMA
- * done */
-#define VHOST_DMA_DONE_LEN 1
+/*
+ * For transmit, used buffer len is unused; we override it to track buffer
+ * status internally; used for zerocopy tx only.
+ */
+/* Lower device DMA done */
+#define VHOST_DMA_DONE_LEN 2
+/* Lower device DMA in progress */
+#define VHOST_DMA_IN_PROGRESS  1
+/* Buffer unused */
 #define VHOST_DMA_CLEAR_LEN 0
 
 struct vhost_device;
-- 
MST
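
Why patch 4/8 wants an explicit in-progress value can be shown with a small userspace sketch. The helper names (old_submit, old_is_done, new_submit, new_is_done) and OLD_DMA_DONE_LEN are hypothetical and exist only for this illustration; the three VHOST_DMA_* values are the ones from the patch. Under the old encoding the slot held the packet length while DMA was pending, so a length of 1 was indistinguishable from the done marker.

```c
#include <stdbool.h>

/* Values as defined by the patch. */
#define VHOST_DMA_DONE_LEN	2
#define VHOST_DMA_IN_PROGRESS	1
#define VHOST_DMA_CLEAR_LEN	0

/* Hypothetical name for the old encoding: done was len == 1. */
#define OLD_DMA_DONE_LEN	1

/* Old submit path: the slot kept the caller-supplied packet length. */
static unsigned int old_submit(unsigned int len)
{
	return len;
}

static bool old_is_done(unsigned int slot)
{
	return slot == OLD_DMA_DONE_LEN;
}

/* New submit path: the slot holds an explicit state, never the length. */
static unsigned int new_submit(unsigned int len)
{
	(void)len;
	return VHOST_DMA_IN_PROGRESS;
}

static bool new_is_done(unsigned int slot)
{
	return slot == VHOST_DMA_DONE_LEN;
}
```

With these definitions old_is_done(old_submit(1)) is true even though nothing has completed, while new_is_done(new_submit(1)) is false until the callback stores VHOST_DMA_DONE_LEN.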



[PATCH net-next 6/8] vhost: move -net specific code out

2012-10-29 Thread Michael S. Tsirkin
Zerocopy handling code is vhost-net specific.
Move it from vhost.c/vhost.h out to net.c

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c   | 45 
 drivers/vhost/tcm_vhost.c |  1 +
 drivers/vhost/vhost.c | 53 +++
 drivers/vhost/vhost.h | 21 +++
 4 files changed, 56 insertions(+), 64 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f80ae5f..532fc88 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -126,6 +126,42 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock)
net->tx_poll_state = VHOST_NET_POLL_STARTED;
 }
 
+/* In case of DMA done not in order in lower device driver for some reason.
+ * upend_idx is used to track end of used idx, done_idx is used to track head
+ * of used idx. Once lower device DMA done contiguously, we will signal KVM
+ * guest used idx.
+ */
+int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
+{
+   int i;
+   int j = 0;
+
+   for (i = vq->done_idx; i != vq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
+   if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
+   vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
+   vhost_add_used_and_signal(vq->dev, vq,
+ vq->heads[i].id, 0);
+   ++j;
+   } else
+   break;
+   }
+   if (j)
+   vq->done_idx = i;
+   return j;
+}
+
+static void vhost_zerocopy_callback(struct ubuf_info *ubuf, int status)
+{
+   struct vhost_ubuf_ref *ubufs = ubuf->ctx;
+   struct vhost_virtqueue *vq = ubufs->vq;
+
+   vhost_poll_queue(&vq->poll);
+   /* set len to mark this desc buffers done DMA */
+   vq->heads[ubuf->desc].len = status ?
+   VHOST_DMA_FAILED_LEN : VHOST_DMA_DONE_LEN;
+   vhost_ubuf_put(ubufs);
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_tx(struct vhost_net *net)
@@ -594,9 +630,18 @@ static int vhost_net_release(struct inode *inode, struct file *f)
struct vhost_net *n = f->private_data;
struct socket *tx_sock;
struct socket *rx_sock;
+   int i;
 
vhost_net_stop(n, &tx_sock, &rx_sock);
vhost_net_flush(n);
+   vhost_dev_stop(&n->dev);
+   for (i = 0; i < n->dev.nvqs; ++i) {
+   /* Wait for all lower device DMAs done. */
+   if (n->dev.vqs[i].ubufs)
+   vhost_ubuf_put_and_wait(n->dev.vqs[i].ubufs);
+
+   vhost_zerocopy_signal_used(n, &n->dev.vqs[i]);
+   }
vhost_dev_cleanup(&n->dev, false);
if (tx_sock)
fput(tx_sock->file);
diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
index aa31692..23c138f 100644
--- a/drivers/vhost/tcm_vhost.c
+++ b/drivers/vhost/tcm_vhost.c
@@ -895,6 +895,7 @@ static int vhost_scsi_release(struct inode *inode, struct file *f)
vhost_scsi_clear_endpoint(s, &backend);
}
 
+   vhost_dev_stop(&s->dev);
vhost_dev_cleanup(&s->dev, false);
kfree(s);
return 0;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5affce3..ef8f598 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -26,10 +26,6 @@
 #include <linux/kthread.h>
 #include <linux/cgroup.h>
 
-#include <linux/net.h>
-#include <linux/if_packet.h>
-#include <linux/if_arp.h>
-
 #include "vhost.h"
 
 enum {
@@ -414,28 +410,16 @@ long vhost_dev_reset_owner(struct vhost_dev *dev)
return 0;
 }
 
-/* In case of DMA done not in order in lower device driver for some reason.
- * upend_idx is used to track end of used idx, done_idx is used to track head
- * of used idx. Once lower device DMA done contiguously, we will signal KVM
- * guest used idx.
- */
-int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
+void vhost_dev_stop(struct vhost_dev *dev)
 {
int i;
-   int j = 0;
-
-   for (i = vq->done_idx; i != vq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
-   if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
-   vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
-   vhost_add_used_and_signal(vq->dev, vq,
- vq->heads[i].id, 0);
-   ++j;
-   } else
-   break;
+
+   for (i = 0; i < dev->nvqs; ++i) {
+   if (dev->vqs[i].kick && dev->vqs[i].handle_kick) {
+   vhost_poll_stop(&dev->vqs[i].poll);
+   vhost_poll_flush(&dev->vqs[i].poll);
+   }
}
-   if (j)
-   vq->done_idx = i;
-   return j;
 }
 
 /* Caller should have device mutex if and only if locked is set */
@@ -444,17 +428,6 @@ void vhost_dev_cleanup(struct vhost_dev *dev, bool locked)
int i;
 
for (i = 0; 

[PATCH net-next 7/8] vhost-net: select tx zero copy dynamically

2012-10-29 Thread Michael S. Tsirkin
Even when vhost-net is in zero-copy transmit mode,
net core might still decide to copy the skb later
which is somewhat slower than a copy in user
context: data copy overhead is added to the cost of
page pin/unpin. The result is that enabling tx zero copy
option leads to higher CPU utilization for guest to guest
and guest to host traffic.

To fix this, suppress zero copy tx once a given number of
packets has triggered a late data copy. Re-enable it periodically
to detect workload changes.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c | 55 -
 1 file changed, 50 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 532fc88..8e9de79 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -42,6 +42,21 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Experimental Zero Copy TX");
 #define VHOST_MAX_PEND 128
 #define VHOST_GOODCOPY_LEN 256
 
+/*
+ * For transmit, used buffer len is unused; we override it to track buffer
+ * status internally; used for zerocopy tx only.
+ */
+/* Lower device DMA failed */
+#define VHOST_DMA_FAILED_LEN   3
+/* Lower device DMA done */
+#define VHOST_DMA_DONE_LEN 2
+/* Lower device DMA in progress */
+#define VHOST_DMA_IN_PROGRESS  1
+/* Buffer unused */
+#define VHOST_DMA_CLEAR_LEN 0
+
+#define VHOST_DMA_IS_DONE(len) ((len) >= VHOST_DMA_DONE_LEN)
+
 enum {
VHOST_NET_VQ_RX = 0,
VHOST_NET_VQ_TX = 1,
@@ -62,8 +77,33 @@ struct vhost_net {
 * We only do this when socket buffer fills up.
 * Protected by tx vq lock. */
enum vhost_net_poll_state tx_poll_state;
+   /* Number of TX recently submitted.
+* Protected by tx vq lock. */
+   unsigned tx_packets;
+   /* Number of times zerocopy TX recently failed.
+* Protected by tx vq lock. */
+   unsigned tx_zcopy_err;
 };
 
+static void vhost_net_tx_packet(struct vhost_net *net)
+{
+   ++net->tx_packets;
+   if (net->tx_packets < 1024)
+   return;
+   net->tx_packets = 0;
+   net->tx_zcopy_err = 0;
+}
+
+static void vhost_net_tx_err(struct vhost_net *net)
+{
+   ++net->tx_zcopy_err;
+}
+
+static bool vhost_net_tx_select_zcopy(struct vhost_net *net)
+{
+   return net->tx_packets / 64 >= net->tx_zcopy_err;
+}
+
 static bool vhost_sock_zcopy(struct socket *sock)
 {
return unlikely(experimental_zcopytx) &&
@@ -131,12 +171,15 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock)
  * of used idx. Once lower device DMA done contiguously, we will signal KVM
  * guest used idx.
  */
-int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
+static int vhost_zerocopy_signal_used(struct vhost_net *net,
+ struct vhost_virtqueue *vq)
 {
int i;
int j = 0;
 
for (i = vq->done_idx; i != vq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
+   if (vq->heads[i].len == VHOST_DMA_FAILED_LEN)
+   vhost_net_tx_err(net);
if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
vhost_add_used_and_signal(vq->dev, vq,
@@ -208,7 +251,7 @@ static void handle_tx(struct vhost_net *net)
for (;;) {
/* Release DMAs done buffers first */
if (zcopy)
-   vhost_zerocopy_signal_used(vq);
+   vhost_zerocopy_signal_used(net, vq);
 
head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
 ARRAY_SIZE(vq->iov),
@@ -263,7 +306,8 @@ static void handle_tx(struct vhost_net *net)
/* use msg_control to pass vhost zerocopy ubuf info to skb */
if (zcopy) {
vq->heads[vq->upend_idx].id = head;
-   if (len < VHOST_GOODCOPY_LEN) {
+   if (!vhost_net_tx_select_zcopy(net) ||
+   len < VHOST_GOODCOPY_LEN) {
/* copy don't need to wait for DMA done */
vq->heads[vq->upend_idx].len =
VHOST_DMA_DONE_LEN;
@@ -305,8 +349,9 @@ static void handle_tx(struct vhost_net *net)
if (!zcopy)
vhost_add_used_and_signal(&net->dev, vq, head, 0);
else
-   vhost_zerocopy_signal_used(vq);
+   vhost_zerocopy_signal_used(net, vq);
total_len += len;
+   vhost_net_tx_packet(net);
if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
vhost_poll_queue(&vq->poll);
break;
@@ -774,7 +819,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
if (oldubufs) {
vhost_ubuf_put_and_wait(oldubufs);
mutex_lock(&vq->mutex);
-   
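
The selection heuristic in patch 7/8 can be exercised on its own. The following is a userspace sketch under stated assumptions: struct tx_stats and the three helpers are hypothetical names that mirror vhost_net_tx_packet(), vhost_net_tx_err() and vhost_net_tx_select_zcopy() from the diff, with the same constants (a window of 1024 packets, one tolerated error per 64 packets). Locking is left out entirely.

```c
#include <stdbool.h>

/* Hypothetical container for the two counters added to struct vhost_net. */
struct tx_stats {
	unsigned int tx_packets;	/* packets submitted in this window */
	unsigned int tx_zcopy_err;	/* zero-copy failures in this window */
};

/* Count one submitted packet; start a fresh window every 1024 packets. */
static void tx_packet(struct tx_stats *s)
{
	++s->tx_packets;
	if (s->tx_packets < 1024)
		return;
	s->tx_packets = 0;
	s->tx_zcopy_err = 0;
}

/* Count one packet whose zero-copy transmit ended in a late data copy. */
static void tx_err(struct tx_stats *s)
{
	++s->tx_zcopy_err;
}

/* Zero copy stays enabled while errors do not exceed packets / 64. */
static bool select_zcopy(const struct tx_stats *s)
{
	return s->tx_packets / 64 >= s->tx_zcopy_err;
}
```

One consequence worth noting: right after a window reset a single error is enough to switch zero copy off until 64 more packets have been sent, and the reset after 1024 packets is what re-enables it periodically.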

[PATCH net-next 5/8] vhost: track zero copy failures using DMA length

2012-10-29 Thread Michael S. Tsirkin
This will be used to disable zerocopy when error rate
is high.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/vhost.c | 7 ---
 drivers/vhost/vhost.h | 4 
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 906fd9f..5affce3 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -425,7 +425,7 @@ int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
int j = 0;
 
for (i = vq->done_idx; i != vq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
-   if ((vq->heads[i].len == VHOST_DMA_DONE_LEN)) {
+   if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
vhost_add_used_and_signal(vq->dev, vq,
  vq->heads[i].id, 0);
@@ -1600,13 +1600,14 @@ void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *ubufs)
kfree(ubufs);
 }
 
-void vhost_zerocopy_callback(struct ubuf_info *ubuf, int zerocopy_status)
+void vhost_zerocopy_callback(struct ubuf_info *ubuf, int status)
 {
struct vhost_ubuf_ref *ubufs = ubuf->ctx;
struct vhost_virtqueue *vq = ubufs->vq;
 
vhost_poll_queue(&vq->poll);
/* set len to mark this desc buffers done DMA */
-   vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
+   vq->heads[ubuf->desc].len = status ?
+   VHOST_DMA_FAILED_LEN : VHOST_DMA_DONE_LEN;
kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
 }
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index ad72a1f..6fdf31d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -17,6 +17,8 @@
  * For transmit, used buffer len is unused; we override it to track buffer
  * status internally; used for zerocopy tx only.
  */
+/* Lower device DMA failed */
+#define VHOST_DMA_FAILED_LEN   3
 /* Lower device DMA done */
 #define VHOST_DMA_DONE_LEN 2
 /* Lower device DMA in progress */
@@ -24,6 +26,8 @@
 /* Buffer unused */
 #define VHOST_DMA_CLEAR_LEN 0
 
+#define VHOST_DMA_IS_DONE(len) ((len) >= VHOST_DMA_DONE_LEN)
+
 struct vhost_device;
 
 struct vhost_work;
-- 
MST
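
The in-order completion walk that patch 5/8 touches can be modeled in userspace. This is a sketch only: struct ring, RING_SIZE (standing in for UIO_MAXIOV) and signal_used() are hypothetical names, and the call that returns a buffer to the guest is reduced to a counter. It shows that a failed entry counts as done for signalling purposes, and that the walk stops at the first entry still in progress even when later entries have already completed.

```c
#define RING_SIZE 8	/* hypothetical; stands in for UIO_MAXIOV */

/* Status values and predicate as defined by the patch. */
#define VHOST_DMA_FAILED_LEN	3
#define VHOST_DMA_DONE_LEN	2
#define VHOST_DMA_IN_PROGRESS	1
#define VHOST_DMA_CLEAR_LEN	0
#define VHOST_DMA_IS_DONE(len)	((len) >= VHOST_DMA_DONE_LEN)

/* Hypothetical reduction of the virtqueue bookkeeping. */
struct ring {
	unsigned int len[RING_SIZE];	/* per-slot DMA status */
	int done_idx;			/* head: first slot not yet signalled */
	int upend_idx;			/* end: next slot to be submitted */
};

/* Signal the contiguous run of finished slots starting at done_idx. */
static int signal_used(struct ring *r)
{
	int i;
	int j = 0;

	for (i = r->done_idx; i != r->upend_idx; i = (i + 1) % RING_SIZE) {
		if (VHOST_DMA_IS_DONE(r->len[i])) {
			r->len[i] = VHOST_DMA_CLEAR_LEN;
			++j;	/* the real code hands the buffer back here */
		} else
			break;
	}
	if (j)
		r->done_idx = i;
	return j;
}
```

Because VHOST_DMA_FAILED_LEN is numerically above VHOST_DMA_DONE_LEN, the single comparison in VHOST_DMA_IS_DONE() covers both outcomes.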



Re: [PATCH] kvm tools: fix rbtree-interval search

2012-10-29 Thread William Dauchy
On Oct29 14:12, Kirill A. Shutemov wrote:
> I've noticed message on kvm exit:
> 
>   Warning: serial8250__exit failed.
> 
> kvm tool is not able to remove ioport range which was added previously.
> 
> The issue is caused by bug in rbtree-interval. Search algorithm in
> rb_int_search_single() expects correct value of max_high. But the tree
> can contain leaf nodes, which never were updated by propagate_callback().
> For this kind of nodes max_high will be 0 and we will not be able to
> find and remove them.
> 
> Let's initialize max_high on RB_INT_INIT() time.
> 
> Fixing this bug makes other bug visible: propagate_callback() can be
> called for empty tree: node == NULL. The callback is not ready for empty
> tree. Let's fix that as well.
> 
> Signed-off-by: Kirill A. Shutemov kirill.shute...@linux.intel.com

I had the same issue but didn't find the time to fix it.
Applying the patch fixes the problem.

Tested-by: William Dauchy will...@gandi.net

Thanks,
-- 
William
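
The failure mode described in the quoted patch can be reproduced with a small model. This is a sketch under assumptions, not the kvm tools code: struct int_node, node_init() and search_single() are hypothetical names, and the tree is a plain binary tree without rebalancing. The search prunes the left subtree using its max_high, so a leaf whose max_high was left at 0 can never be reached.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interval node; intervals are half-open: [low, high). */
struct int_node {
	uint64_t low;
	uint64_t high;
	uint64_t max_high;	/* largest high in this node's subtree */
	struct int_node *left;
	struct int_node *right;
};

/* Initialize a node the way the fix proposes: max_high starts at high. */
static void node_init(struct int_node *n, uint64_t low, uint64_t high)
{
	n->low = low;
	n->high = high;
	n->max_high = high;
	n->left = NULL;
	n->right = NULL;
}

/* Find a node whose interval contains point, or NULL. */
static struct int_node *search_single(struct int_node *node, uint64_t point)
{
	while (node) {
		if (node->left && node->left->max_high > point)
			node = node->left;	/* left subtree may cover point */
		else if (node->low <= point && point < node->high)
			return node;
		else if (point > node->low)
			node = node->right;
		else
			break;
	}
	return NULL;
}
```

With a root covering [10, 20) and a left leaf covering [0, 5), a lookup of 3 succeeds only when the leaf's max_high is 5; forcing it to 0 makes the same lookup return NULL, which matches the symptom of an ioport range that cannot be found for removal.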




Re: [patch 09/18] KVM: x86: introduce facility to support vsyscall pvclock, via MSR

2012-10-29 Thread Jeremy Fitzhardinge
On 10/29/2012 07:45 AM, Glauber Costa wrote:
> On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
>> Allow a guest to register a second location for the VCPU time info
>> structure for each vcpu (as described by MSR_KVM_SYSTEM_TIME_NEW).
>> This is intended to allow the guest kernel to map this information
>> into a usermode accessible page, so that usermode can efficiently
>> calculate system time from the TSC without having to make a syscall.
>>
>> Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
> Can you please be a bit more specific about why we need this? Why does
> the host need to provide us with two pages with the exact same data? Why
> can't just do it with mapping tricks in the guest?

In Xen the pvclock structure is embedded within a pile of other stuff
that shouldn't be mapped into guest memory, so providing for a second
location allows it to be placed wherever is convenient for the guest.
That's a restriction of the Xen ABI, but I don't know if it affects KVM.

J
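
For context on what usermode would do with such a mapped page, here is a rough userspace sketch of the usual pvclock calculation. It is an illustration under assumptions: struct pvti is a reduced, hypothetical copy of the time info structure (padding and flags omitted), the TSC value is passed in instead of being read with rdtsc, and the memory barriers a real reader needs around the version checks are left out.

```c
#include <stdint.h>

/* Hypothetical reduced view of the per-vcpu time info structure. */
struct pvti {
	uint32_t version;		/* odd while the host is updating */
	uint64_t tsc_timestamp;		/* TSC value at last update */
	uint64_t system_time;		/* nanoseconds at last update */
	uint32_t tsc_to_system_mul;	/* 32.32 fixed-point multiplier */
	int8_t tsc_shift;		/* pre-scaling shift for the delta */
};

/* Convert a TSC delta to nanoseconds: shift, then multiply by mul / 2^32. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
	uint64_t hi, lo;

	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	/* (delta * mul_frac) >> 32 without needing 128-bit arithmetic */
	hi = (delta >> 32) * mul_frac;
	lo = (delta & 0xffffffffu) * mul_frac;
	return hi + (lo >> 32);
}

/* Retry until a consistent snapshot with an even version was read. */
static uint64_t pvti_read_ns(const volatile struct pvti *p, uint64_t tsc)
{
	uint32_t v;
	uint64_t ns;

	do {
		v = p->version;
		ns = p->system_time +
		     scale_delta(tsc - p->tsc_timestamp,
				 p->tsc_to_system_mul, p->tsc_shift);
	} while ((v & 1) || v != p->version);
	return ns;
}
```

Nothing here requires a syscall, which is the point of exposing the structure to usermode; where the structure lives is the question being discussed above.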


Re: [PATCH V2 RFC 3/3] kvm: Check system load and handle different commit cases accordingly

2012-10-29 Thread Peter Zijlstra
On Mon, 2012-10-29 at 19:37 +0530, Raghavendra K T wrote:
> +/*
> + * A load of 2048 corresponds to 1:1 overcommit
> + * undercommit threshold is half the 1:1 overcommit
> + * overcommit threshold is 1.75 times of 1:1 overcommit threshold
> + */
> +#define COMMIT_THRESHOLD (FIXED_1)
> +#define UNDERCOMMIT_THRESHOLD (COMMIT_THRESHOLD >> 1)
> +#define OVERCOMMIT_THRESHOLD ((COMMIT_THRESHOLD << 1) -
> (COMMIT_THRESHOLD >> 2))
> +
> +unsigned long kvm_system_load(void)
> +{
> +   unsigned long load;
> +
> +   load = avenrun[0] + FIXED_1/200;
> +   load = load / num_online_cpus();
> +
> +   return load;
> +} 

ARGH.. no that's wrong.. very wrong.

 1) avenrun[] EXPORT_SYMBOL says it should be removed, that's not a
joke.

 2) avenrun[] is a global load, do not ever use a global load measure

 3) avenrun[] has nothing whatsoever to do with runqueue lengths,
someone with a gazillion tasks in D state will get a huge load but the
cpu is very idle.
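
For reference, the fixed-point arithmetic in the quoted macros works out as follows. This is only a userspace check of the numbers, not an endorsement of the approach criticized above: FSHIFT is assumed to be 11 so that FIXED_1 is 2048, matching the comment in the quoted patch, and per_cpu_load() is a hypothetical helper that takes the load average and CPU count as parameters instead of reading avenrun[] and num_online_cpus().

```c
/* Assumption: 11 fractional bits, so FIXED_1 == 2048 as in the comment. */
#define FSHIFT	11
#define FIXED_1	(1UL << FSHIFT)

#define COMMIT_THRESHOLD	(FIXED_1)
#define UNDERCOMMIT_THRESHOLD	(COMMIT_THRESHOLD >> 1)
#define OVERCOMMIT_THRESHOLD	((COMMIT_THRESHOLD << 1) - \
				 (COMMIT_THRESHOLD >> 2))

/* Same arithmetic as the quoted kvm_system_load(), with explicit inputs. */
static unsigned long per_cpu_load(unsigned long avenrun0, unsigned int cpus)
{
	unsigned long load;

	load = avenrun0 + FIXED_1 / 200;	/* rounding term */
	load = load / cpus;
	return load;
}
```

So the undercommit threshold is 1024 (0.5), the overcommit threshold is 3584 (1.75), and a global load of 2.0 on two CPUs yields 2053, just above the 1:1 value.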




Re: [PATCH 0/5] s390: Guest support for virtio-ccw.

2012-10-29 Thread Alexander Graf

On 29.10.2012, at 14:07, Cornelia Huck wrote:

> Avi, Marcelo,
> 
> I'd like to propose inclusion of the guest support patches for
> virtio-ccw into 3.8.
> 
> I'm confident that the host <-> guest interface for virtio-ccw
> is fine now, and the patches have been extensively tested by our
> internal test team.
> 
> Patch 1 might conceivably be 3.7 material, though I fear it's a
> bit late for that.

Well, patch 1 without virtio-ccw support is quite useless, right? You wouldn't 
get any I/O at all.


Alex



Re: [patch 08/18] x86: pvclock: generic pvclock vsyscall initialization

2012-10-29 Thread Jeremy Fitzhardinge
On 10/29/2012 07:54 AM, Marcelo Tosatti wrote:
> On Mon, Oct 29, 2012 at 06:18:20PM +0400, Glauber Costa wrote:
>> On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
>>> Index: vsyscall/arch/x86/Kconfig
>>> ===
>>> --- vsyscall.orig/arch/x86/Kconfig
>>> +++ vsyscall/arch/x86/Kconfig
>>> @@ -632,6 +632,13 @@ config PARAVIRT_SPINLOCKS
>>>  
>>>  config PARAVIRT_CLOCK
>>> bool
>>> +config PARAVIRT_CLOCK_VSYSCALL
>>> +   bool "Paravirt clock vsyscall support"
>>> +   depends on PARAVIRT_CLOCK && GENERIC_TIME_VSYSCALL
>>> +   ---help---
>>> + Enable performance critical clock related system calls to
>>> + be executed in userspace, provided that the hypervisor
>>> + supports it.
>>>  
>>>  endif
>> Besides debugging, what is the point in having this as an
>> extra-selectable? Is there any case in which a virtual machine has code
>> for this, but may decide to run without it ?
> Don't think so (its pretty small anyway, the code).
>
>> I believe all this code in vsyscall should be wrapped in PARAVIRT_CLOCK
>> only.
> Unless Jeremy has a reason, i'm fine with that.

I often set up blind config variables for dependency management; I'm
guessing the GENERIC_TIME_VSYSCALL dependency is important.  I think
the problem is not that this exists, but that it's a user-selectable
option.  Removing the prompt should fix that.

J



Re: [PATCH 4/5] KVM: s390: Add a channel I/O based virtio transport driver.

2012-10-29 Thread Alexander Graf

On 29.10.2012, at 14:07, Cornelia Huck wrote:

 Add a driver for kvm guests that matches virtual ccw devices provided
 by the host as virtio bridge devices.
 
 These virtio-ccw devices use a special set of channel commands in order
 to perform virtio functions.
 
 Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
 ---
 arch/s390/include/asm/irq.h   |   1 +
 arch/s390/kernel/irq.c|   1 +
 drivers/s390/kvm/Makefile |   2 +-
 drivers/s390/kvm/virtio_ccw.c | 842 ++
 4 files changed, 845 insertions(+), 1 deletion(-)
 create mode 100644 drivers/s390/kvm/virtio_ccw.c
 
 diff --git a/arch/s390/include/asm/irq.h b/arch/s390/include/asm/irq.h
 index 6703dd9..ad2ad6b 100644
 --- a/arch/s390/include/asm/irq.h
 +++ b/arch/s390/include/asm/irq.h
 @@ -33,6 +33,7 @@ enum interruption_class {
   IOINT_APB,
   IOINT_ADM,
   IOINT_CSC,
 + IOINT_VIR,
   NMI_NMI,
   NR_IRQS,
 };
 diff --git a/arch/s390/kernel/irq.c b/arch/s390/kernel/irq.c
 index 6cdc55b..97c171a 100644
 --- a/arch/s390/kernel/irq.c
 +++ b/arch/s390/kernel/irq.c
 @@ -58,6 +58,7 @@ static const struct irq_class intrclass_names[] = {
   [IOINT_APB]  = {.name = APB, .desc = [I/O] AP Bus},
   [IOINT_ADM]  = {.name = ADM, .desc = [I/O] EADM Subchannel},
   [IOINT_CSC]  = {.name = CSC, .desc = [I/O] CHSC Subchannel},
 + [IOINT_VIR]  = {.name = VIR, .desc = [I/O] Virtual I/O Devices},
   [NMI_NMI]= {.name = NMI, .desc = [NMI] Machine Check},
 };
 
 diff --git a/drivers/s390/kvm/Makefile b/drivers/s390/kvm/Makefile
 index 0815690..241891a 100644
 --- a/drivers/s390/kvm/Makefile
 +++ b/drivers/s390/kvm/Makefile
 @@ -6,4 +6,4 @@
 # it under the terms of the GNU General Public License (version 2 only)
 # as published by the Free Software Foundation.
 
 -obj-$(CONFIG_S390_GUEST) += kvm_virtio.o
 +obj-$(CONFIG_S390_GUEST) += kvm_virtio.o virtio_ccw.o
 diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
 new file mode 100644
 index 000..4be878f
 --- /dev/null
 +++ b/drivers/s390/kvm/virtio_ccw.c
 @@ -0,0 +1,842 @@
 +/*
 + * ccw based virtio transport
 + *
 + * Copyright IBM Corp. 2012
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License (version 2 only)
 + * as published by the Free Software Foundation.
 + *
 + *Author(s): Cornelia Huck cornelia.h...@de.ibm.com
 + */
 +
 +#include linux/kernel_stat.h
 +#include linux/init.h
 +#include linux/bootmem.h
 +#include linux/err.h
 +#include linux/virtio.h
 +#include linux/virtio_config.h
 +#include linux/slab.h
 +#include linux/virtio_console.h
 +#include linux/interrupt.h
 +#include linux/virtio_ring.h
 +#include linux/pfn.h
 +#include linux/async.h
 +#include linux/wait.h
 +#include linux/list.h
 +#include linux/bitops.h
 +#include linux/module.h
 +#include asm/io.h
 +#include asm/kvm_para.h
 +#include asm/setup.h
 +#include asm/irq.h
 +#include asm/cio.h
 +#include asm/ccwdev.h
 +#include asm/schid.h
 +
 +/*
 + * virtio related functions
 + */
 +
 +struct vq_config_block {
 + __u16 index;
 + __u16 num;
 +} __attribute__ ((packed));
 +
 +#define VIRTIO_CCW_CONFIG_SIZE 0x100
 +/* same as PCI config space size, should be enough for all drivers */
 +
 +struct virtio_ccw_device {
 + struct virtio_device vdev;
 + __u8 status;
 + __u8 config[VIRTIO_CCW_CONFIG_SIZE];
 + struct ccw_device *cdev;
 + struct ccw1 *ccw;
 + __u32 area;
 + __u32 curr_io;
 + int err;
 + wait_queue_head_t wait_q;
 + spinlock_t lock;
 + struct list_head virtqueues;
 + unsigned long indicators;
 + unsigned long indicators2;
 + struct vq_config_block *config_block;
 +};
 +
 +struct vq_info_block {
 + __u64 queue;
 + __u32 align;
 + __u16 index;
 + __u16 num;
 +} __attribute__ ((packed));
 +
 +struct virtio_feature_desc {
 + __u32 features;
 + __u8 index;
 +} __attribute__ ((packed));
 +
 +struct virtio_ccw_vq_info {
 + struct virtqueue *vq;
 + int num;
 + int queue_index;
 + void *queue;
 + struct vq_info_block *info_block;
 + struct list_head node;
 +};
 +
 +#define KVM_VIRTIO_CCW_RING_ALIGN 4096
 +
 +#define CCW_CMD_SET_VQ 0x13
 +#define CCW_CMD_VDEV_RESET 0x33
 +#define CCW_CMD_SET_IND 0x43
 +#define CCW_CMD_SET_CONF_IND 0x53
 +#define CCW_CMD_READ_FEAT 0x12
 +#define CCW_CMD_WRITE_FEAT 0x11
 +#define CCW_CMD_READ_CONF 0x22
 +#define CCW_CMD_WRITE_CONF 0x21
 +#define CCW_CMD_WRITE_STATUS 0x31
 +#define CCW_CMD_READ_VQ_CONF 0x32
 +
 +#define VIRTIO_CCW_DOING_SET_VQ 0x0001
 +#define VIRTIO_CCW_DOING_RESET 0x0004
 +#define VIRTIO_CCW_DOING_READ_FEAT 0x0008
 +#define VIRTIO_CCW_DOING_WRITE_FEAT 0x0010
 +#define VIRTIO_CCW_DOING_READ_CONFIG 0x0020
 +#define VIRTIO_CCW_DOING_WRITE_CONFIG 0x0040
 +#define VIRTIO_CCW_DOING_WRITE_STATUS 0x0080
 +#define VIRTIO_CCW_DOING_SET_IND 0x0100
 +#define 

Re: [PATCH 5/5] KVM: s390: Split out early console code.

2012-10-29 Thread Alexander Graf

On 29.10.2012, at 14:07, Cornelia Huck wrote:

 This code is transport agnostic and can be used by both the legacy
 virtio code and virtio_ccw.

Would it be possible to actually send real virtio or sclp console commands for 
early printk? That'd make things a lot easier on the user space end. Combining 
two completely separate character channels (early printk + sclp or early printk 
+ virtio-console) is really tricky.


Alex

 
 Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
 ---
 drivers/s390/kvm/Makefile   |  2 +-
 drivers/s390/kvm/early_printk.c | 42 +
 drivers/s390/kvm/kvm_virtio.c   | 29 ++--
 drivers/s390/kvm/virtio_ccw.c   |  1 -
 4 files changed, 45 insertions(+), 29 deletions(-)
 create mode 100644 drivers/s390/kvm/early_printk.c
 
 diff --git a/drivers/s390/kvm/Makefile b/drivers/s390/kvm/Makefile
 index 241891a..a3c8fc4 100644
 --- a/drivers/s390/kvm/Makefile
 +++ b/drivers/s390/kvm/Makefile
 @@ -6,4 +6,4 @@
 # it under the terms of the GNU General Public License (version 2 only)
 # as published by the Free Software Foundation.
 
 -obj-$(CONFIG_S390_GUEST) += kvm_virtio.o virtio_ccw.o
 +obj-$(CONFIG_S390_GUEST) += kvm_virtio.o early_printk.o virtio_ccw.o
 diff --git a/drivers/s390/kvm/early_printk.c b/drivers/s390/kvm/early_printk.c
 new file mode 100644
 index 000..7831530
 --- /dev/null
 +++ b/drivers/s390/kvm/early_printk.c
 @@ -0,0 +1,42 @@
 +/*
 + * early_printk.c - code for early console output with virtio_console
 + * split off from kvm_virtio.c
 + *
 + * Copyright IBM Corp. 2008
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License (version 2 only)
 + * as published by the Free Software Foundation.
 + *
 + *Author(s): Christian Borntraeger borntrae...@de.ibm.com
 + */
 +
 +#include <linux/kernel_stat.h>
 +#include <linux/init.h>
 +#include <linux/err.h>
 +#include <linux/virtio_console.h>
 +#include <asm/kvm_para.h>
 +#include <asm/kvm_virtio.h>
 +#include <asm/setup.h>
 +#include <asm/sclp.h>
 +
 +static __init int early_put_chars(u32 vtermno, const char *buf, int count)
 +{
 + char scratch[17];
 + unsigned int len = count;
 +
 + if (len > sizeof(scratch) - 1)
 + len = sizeof(scratch) - 1;
 + scratch[len] = '\0';
 + memcpy(scratch, buf, len);
 + kvm_hypercall1(KVM_S390_VIRTIO_NOTIFY, __pa(scratch));
 + return len;
 +}
 +
 +static int __init s390_virtio_console_init(void)
 +{
 + if (sclp_has_vt220() || sclp_has_linemode())
 + return -ENODEV;
 + return virtio_cons_early_init(early_put_chars);
 +}
 +console_initcall(s390_virtio_console_init);
 diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
 index 76b95f3..6cdc66a 100644
 --- a/drivers/s390/kvm/kvm_virtio.c
 +++ b/drivers/s390/kvm/kvm_virtio.c
 @@ -17,7 +17,6 @@
 #include <linux/virtio.h>
 #include <linux/virtio_config.h>
 #include <linux/slab.h>
 -#include <linux/virtio_console.h>
 #include <linux/interrupt.h>
 #include <linux/virtio_ring.h>
 #include <linux/export.h>
 @@ -25,9 +24,9 @@
 #include <asm/io.h>
 #include <asm/kvm_para.h>
 #include <asm/kvm_virtio.h>
 -#include <asm/sclp.h>
 #include <asm/setup.h>
 #include <asm/irq.h>
 +#include <asm/sclp.h>
 
 #define VIRTIO_SUBCODE_64 0x0D00
 
 @@ -450,8 +449,7 @@ static int __init kvm_devices_init(void)
   return -ENODEV;
 
   if (test_devices_support(real_memory_size) < 0)
 - /* No error. */
 - return 0;
 + return -ENODEV;
 
   rc = vmem_add_mapping(real_memory_size, PAGE_SIZE);
   if (rc)
 @@ -476,29 +474,6 @@ static int __init kvm_devices_init(void)
   return 0;
 }
 
 -/* code for early console output with virtio_console */
 -static __init int early_put_chars(u32 vtermno, const char *buf, int count)
 -{
 - char scratch[17];
 - unsigned int len = count;
 -
 - if (len > sizeof(scratch) - 1)
 - len = sizeof(scratch) - 1;
 - scratch[len] = '\0';
 - memcpy(scratch, buf, len);
 - kvm_hypercall1(KVM_S390_VIRTIO_NOTIFY, __pa(scratch));
 - return len;
 -}
 -
 -static int __init s390_virtio_console_init(void)
 -{
 - if (sclp_has_vt220() || sclp_has_linemode())
 - return -ENODEV;
 - return virtio_cons_early_init(early_put_chars);
 -}
 -console_initcall(s390_virtio_console_init);
 -
 -
 /*
  * We do this after core stuff, but before the drivers.
  */
 diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
 index 4be878f..135126a 100644
 --- a/drivers/s390/kvm/virtio_ccw.c
 +++ b/drivers/s390/kvm/virtio_ccw.c
 @@ -17,7 +17,6 @@
 #include <linux/virtio.h>
 #include <linux/virtio_config.h>
 #include <linux/slab.h>
 -#include <linux/virtio_console.h>
 #include <linux/interrupt.h>
 #include <linux/virtio_ring.h>
 #include <linux/pfn.h>
 -- 
 1.7.12.4
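
The chunking done by early_put_chars() in the quoted patch can be tried in userspace. The sketch below is an illustration under assumptions: put_chars_chunk(), notify_fn, capture() and last_msg are hypothetical names, and the KVM_S390_VIRTIO_NOTIFY hypercall is replaced by a plain function pointer so the effect is observable. It shows that at most 16 bytes are forwarded per call and that the return value tells the caller how much was consumed.

```c
#include <string.h>

/* Hypothetical sink standing in for the hypercall in the patch. */
typedef void (*notify_fn)(const char *msg);

/* Copy at most 16 bytes into a NUL-terminated scratch buffer and emit it. */
static int put_chars_chunk(const char *buf, int count, notify_fn notify)
{
	char scratch[17];
	unsigned int len = count;

	if (len > sizeof(scratch) - 1)
		len = sizeof(scratch) - 1;
	scratch[len] = '\0';
	memcpy(scratch, buf, len);
	notify(scratch);
	return len;
}

/* Example sink: keeps the most recent chunk for inspection. */
static char last_msg[17];

static void capture(const char *msg)
{
	strcpy(last_msg, msg);
}
```

A caller that wants to print a longer string has to loop, advancing by the returned length each time, which is how a 19-byte message ends up as one 16-byte chunk followed by a 3-byte chunk.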
 


Re: [PATCH 0/5] s390: Guest support for virtio-ccw.

2012-10-29 Thread Cornelia Huck
On Mon, 29 Oct 2012 18:55:21 +0100
Alexander Graf ag...@suse.de wrote:

> 
> On 29.10.2012, at 14:07, Cornelia Huck wrote:
> 
> > Avi, Marcelo,
> > 
> > I'd like to propose inclusion of the guest support patches for
> > virtio-ccw into 3.8.
> > 
> > I'm confident that the host <-> guest interface for virtio-ccw
> > is fine now, and the patches have been extensively tested by our
> > internal test team.
> > 
> > Patch 1 might conceivably be 3.7 material, though I fear it's a
> > bit late for that.
> 
> Well, patch 1 without virtio-ccw support is quite useless, right? You 
> wouldn't get any I/O at all.

Yes, but it stops in an obvious way (no devices can be found) and
not with a strange backchain.



[PATCH 2/2] kvm, svm: Update MAINTAINERS entry

2012-10-29 Thread Joerg Roedel
I have no access to my AMD email address anymore. Update
entry in MAINTAINERS to the new address.

Cc: Avi Kivity <a...@redhat.com>
Cc: Marcelo Tosatti <mtosa...@redhat.com>
Signed-off-by: Joerg Roedel <j...@8bytes.org>
---
 MAINTAINERS |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0267ba2..d881321 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4234,10 +4234,10 @@ F:  include/linux/kvm*
 F: virt/kvm/
 
 KERNEL VIRTUAL MACHINE (KVM) FOR AMD-V
-M: Joerg Roedel <joerg.roe...@amd.com>
+M: Joerg Roedel <j...@8bytes.org>
 L: kvm@vger.kernel.org
 W: http://kvm.qumranet.com
-S: Supported
+S: Maintained
 F: arch/x86/include/asm/svm.h
 F: arch/x86/kvm/svm.c
 
-- 
1.7.9.5




Re: [PATCH 0/5] s390: Guest support for virtio-ccw.

2012-10-29 Thread Alexander Graf

On 29.10.2012, at 19:15, Cornelia Huck wrote:

> On Mon, 29 Oct 2012 18:55:21 +0100
> Alexander Graf <ag...@suse.de> wrote:
> 
>> 
>> On 29.10.2012, at 14:07, Cornelia Huck wrote:
>> 
>>> Avi, Marcelo,
>>> 
>>> I'd like to propose inclusion of the guest support patches for
>>> virtio-ccw into 3.8.
>>> 
>>> I'm confident that the host - guest interface for virtio-ccw
>>> is fine now, and the patches have been extensively tested by our
>>> internal test team.
>>> 
>>> Patch 1 might conceivably be 3.7 material, though I fear it's a
>>> bit late for that.
>> 
>> Well, patch 1 without virtio-ccw support is quite useless, right? You
>> wouldn't get any I/O at all.
> 
> Yes, but it stops in an obvious way (no devices can be found) and
> not with a strange backchain.

Hrm. Then it's probably best to actually CC stable as well :)


Alex



Re: [PATCH 4/5] KVM: s390: Add a channel I/O based virtio transport driver.

2012-10-29 Thread Cornelia Huck
On Mon, 29 Oct 2012 19:12:54 +0100
Alexander Graf <ag...@suse.de> wrote:

> 
> On 29.10.2012, at 14:07, Cornelia Huck wrote:

> > +static void virtio_ccw_kvm_notify(struct virtqueue *vq)
> > +{
> > +   struct virtio_ccw_vq_info *info = vq->priv;
> > +   struct virtio_ccw_device *vcdev;
> > +   struct subchannel_id schid;
> > +   __u32 reg2;
> > +
> > +   vcdev = to_vc_device(info->vq->vdev);
> > +   ccw_device_get_schid(vcdev->cdev, &schid);
> > +   reg2 = *(__u32 *)&schid;
> 
> That cast looks quite ugly. Can't you just access the field in there you
> need? Or if it's multiple fields do a union over them? Or assemble them by
> hand in C?

I think the cast looks less ugly than using a union to morph it around.
I want the schid with all fields filled out anyway, since this is what
identifies the subchannel.

> 
> > +   kvm_hypercall2(3 /* CCW_NOTIFY */, reg2, info->queue_index);
> 
> This wants to be a #define :)

Probably :)

> 
> > +}
> > +
> > +static int virtio_ccw_read_vq_conf(struct virtio_ccw_device *vcdev, int index)
> > +{
> > +   vcdev->config_block->index = index;
> > +   vcdev->ccw->cmd_code = CCW_CMD_READ_VQ_CONF;
> > +   vcdev->ccw->flags = 0;
> > +   vcdev->ccw->count = sizeof(struct vq_config_block);
> > +   vcdev->ccw->cda = (__u32)(unsigned long)(vcdev->config_block);
> 
> Is this casting a pointer to a u32? What if this is in highmem? Ah, I just
> saw the comment that ccw memory needs to be < 2GB. Phew. Any plans to get rid
> of that limitation?

Well, we could do full-blown IDAW handling to get to 64bit addresses -
which would need a lot of extra code in the host. I doubt whether it
would be worth it.

(Well, we'll probably want IDAWs sometime in the future - I just think
it's overkill for those tiny snippets.)



Re: [PATCH 4/5] KVM: s390: Add a channel I/O based virtio transport driver.

2012-10-29 Thread Alexander Graf

On 29.10.2012, at 19:34, Cornelia Huck wrote:

> On Mon, 29 Oct 2012 19:12:54 +0100
> Alexander Graf <ag...@suse.de> wrote:
> 
>> 
>> On 29.10.2012, at 14:07, Cornelia Huck wrote:
>> 
>>> +static void virtio_ccw_kvm_notify(struct virtqueue *vq)
>>> +{
>>> +   struct virtio_ccw_vq_info *info = vq->priv;
>>> +   struct virtio_ccw_device *vcdev;
>>> +   struct subchannel_id schid;
>>> +   __u32 reg2;
>>> +
>>> +   vcdev = to_vc_device(info->vq->vdev);
>>> +   ccw_device_get_schid(vcdev->cdev, &schid);
>>> +   reg2 = *(__u32 *)&schid;
>> 
>> That cast looks quite ugly. Can't you just access the field in there you
>> need? Or if it's multiple fields do a union over them? Or assemble them by
>> hand in C?
> 
> I think the cast looks less ugly than using a union to morph it around.
> I want the schid with all fields filled out anyway, since this is what
> identifies the subchannel.

How about a helper function that returns a u32 for a struct subchannel_id in 
arch/s390/include/asm/schid.h then?
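
The helper suggested here could look roughly like the following userspace sketch. Both names are hypothetical: `schid_to_u32()` is not an existing kernel function, and `struct subchannel_id_sketch` is a stand-in whose bitfield layout is only assumed to resemble the real structure in arch/s390/include/asm/schid.h. The sketch copies the bytes with memcpy() instead of casting the pointer, so the conversion lives in one place.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for struct subchannel_id; the field layout is an assumption. */
struct subchannel_id_sketch {
    uint32_t cssid  : 8;
    uint32_t        : 4;
    uint32_t m      : 1;
    uint32_t ssid   : 2;
    uint32_t one    : 1;
    uint32_t sch_no : 16;
};

_Static_assert(sizeof(struct subchannel_id_sketch) == sizeof(uint32_t),
               "subchannel id must fit in exactly one 32-bit word");

/* Hypothetical helper: return the raw 32-bit value of a subchannel id. */
static inline uint32_t schid_to_u32(struct subchannel_id_sketch schid)
{
    uint32_t raw;

    memcpy(&raw, &schid, sizeof(raw));  /* byte copy, no pointer cast */
    return raw;
}
```

With such a helper the notify path would pass `schid_to_u32(schid)` as the hypercall argument; the numeric value still depends on how the compiler lays out the bitfields.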

 
 
>>> +   kvm_hypercall2(3 /* CCW_NOTIFY */, reg2, info->queue_index);
>> 
>> This wants to be a #define :)
> 
> Probably :)
> 
>> 
>>> +}
>>> +
>>> +static int virtio_ccw_read_vq_conf(struct virtio_ccw_device *vcdev, int index)
>>> +{
>>> +   vcdev->config_block->index = index;
>>> +   vcdev->ccw->cmd_code = CCW_CMD_READ_VQ_CONF;
>>> +   vcdev->ccw->flags = 0;
>>> +   vcdev->ccw->count = sizeof(struct vq_config_block);
>>> +   vcdev->ccw->cda = (__u32)(unsigned long)(vcdev->config_block);
>> 
>> Is this casting a pointer to a u32? What if this is in highmem? Ah, I just
>> saw the comment that ccw memory needs to be < 2GB. Phew. Any plans to get rid
>> of that limitation?
> 
> Well, we could do full-blown IDAW handling to get to 64bit addresses -
> which would need a lot of extra code in the host. I doubt whether it
> would be worth it.
> 
> (Well, we'll probably want IDAWs sometime in the future - I just think
> it's overkill for those tiny snippets.)

Ah, so it is possible? Yes, we most likely want it in the future then! Lowmem 
is always more limited than when you have the full memory space available :).


Alex



Re: [patch 09/18] KVM: x86: introduce facility to support vsyscall pvclock, via MSR

2012-10-29 Thread Marcelo Tosatti
On Mon, Oct 29, 2012 at 10:44:41AM -0700, Jeremy Fitzhardinge wrote:
> On 10/29/2012 07:45 AM, Glauber Costa wrote:
> > On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
> > > Allow a guest to register a second location for the VCPU time info
> > > structure for each vcpu (as described by MSR_KVM_SYSTEM_TIME_NEW).
> > > This is intended to allow the guest kernel to map this information
> > > into a usermode accessible page, so that usermode can efficiently
> > > calculate system time from the TSC without having to make a syscall.
> > >
> > > Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
> > Can you please be a bit more specific about why we need this? Why does
> > the host need to provide us with two pages with the exact same data? Why
> > can't just do it with mapping tricks in the guest?
>
> In Xen the pvclock structure is embedded within a pile of other stuff
> that shouldn't be mapped into guest memory, so providing for a second
> location allows it to be placed whereever is convenient for the guest.
> That's a restriction of the Xen ABI, but I don't know if it affects KVM.
>
> J

It is possible to share the data for KVM in theory, but:

- It is a small amount of memory. 
- It requires aligning to page size (the in-kernel percpu array 
is currently cacheline aligned).
- It is possible to modify flags separately for userspace/kernelspace,
if desired.

This justifies the duplication IMO (code is simple and clean).



Re: [patch 11/18] x86: vsyscall: pass mode to gettime backend

2012-10-29 Thread Marcelo Tosatti
On Mon, Oct 29, 2012 at 06:47:57PM +0400, Glauber Costa wrote:
> On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
> > Required by next patch.
> > 
> > Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
> I don't see where.

+   if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
+           *mode = VCLOCK_NONE;




Re: [patch 12/18] x86: vdso: pvclock gettime support

2012-10-29 Thread Marcelo Tosatti
On Mon, Oct 29, 2012 at 06:59:35PM +0400, Glauber Costa wrote:
> On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
> > Improve performance of time system calls when using Linux pvclock,
> > by reading time info from fixmap visible copy of pvclock data.
> > 
> > Originally from Jeremy Fitzhardinge.
> > 
> > Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
> > 
> > Index: vsyscall/arch/x86/vdso/vclock_gettime.c
> > ===
> > --- vsyscall.orig/arch/x86/vdso/vclock_gettime.c
> > +++ vsyscall/arch/x86/vdso/vclock_gettime.c
> > @@ -22,6 +22,7 @@
> >  #include <asm/hpet.h>
> >  #include <asm/unistd.h>
> >  #include <asm/io.h>
> > +#include <asm/pvclock.h>
> >  
> >  #define gtod (&VVAR(vsyscall_gtod_data))
> >  
> > @@ -62,6 +63,69 @@ static notrace cycle_t vread_hpet(void)
> >     return readl((const void __iomem *)fix_to_virt(VSYSCALL_HPET) + 0xf0);
> >  }
> >  
> > +#ifdef CONFIG_PARAVIRT_CLOCK_VSYSCALL
> > +
> > +static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu)
> > +{
> > +   const aligned_pvti_t *pvti_base;
> > +   int idx = cpu / (PAGE_SIZE/PVTI_SIZE);
> > +   int offset = cpu % (PAGE_SIZE/PVTI_SIZE);
> > +
> > +   BUG_ON(PVCLOCK_FIXMAP_BEGIN + idx > PVCLOCK_FIXMAP_END);
> > +
> > +   pvti_base = (aligned_pvti_t *)__fix_to_virt(PVCLOCK_FIXMAP_BEGIN+idx);
> > +
> > +   return &pvti_base[offset].info;
> > +}
> > +
> 
> Unless I am missing something, if gcc decides to not inline get_pvti,
> this will break, right? I believe you need to mark that function with
> __always_inline.

Can't see why. Please enlighten me.

 
> > +static notrace cycle_t vread_pvclock(int *mode)
> > +{
> > +   const struct pvclock_vsyscall_time_info *pvti;
> > +   cycle_t ret;
> > +   u64 last;
> > +   u32 version;
> > +   u32 migrate_count;
> > +   u8 flags;
> > +   unsigned cpu, cpu1;
> > +
> > +
> > +   /*
> > +    * When looping to get a consistent (time-info, tsc) pair, we
> > +    * also need to deal with the possibility we can switch vcpus,
> > +    * so make sure we always re-fetch time-info for the current vcpu.
> > +    */
> > +   do {
> > +           cpu = __getcpu() & 0xfff;
> 
> Please wrap this 0xfff into something meaningful.

OK.

> > +           pvti = get_pvti(cpu);
> > +
> > +           migrate_count = pvti->migrate_count;
> > +
> > +           version = __pvclock_read_cycles(&pvti->pvti, &ret, &flags);
> > +
> > +           /*
> > +            * Test we're still on the cpu as well as the version.
> > +            * We could have been migrated just after the first
> > +            * vgetcpu but before fetching the version, so we
> > +            * wouldn't notice a version change.
> > +            */
> > +           cpu1 = __getcpu() & 0xfff;
> > +   } while (unlikely(cpu != cpu1 ||
> > +                     (pvti->pvti.version & 1) ||
> > +                     pvti->pvti.version != version ||
> > +                     pvti->migrate_count != migrate_count));
> > +
> > +   if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
> > +           *mode = VCLOCK_NONE;
> > +
> > +   last = VVAR(vsyscall_gtod_data).clock.cycle_last;
> > +
> > +   if (likely(ret >= last))
> > +           return ret;
> > +
> 
> Please add a comment here referring to tsc.c, where an explanation of
> this test lives. This is quite non-obvious for the non initiated.

OK.
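
For readers unfamiliar with the retry loop quoted above, here is a stripped-down userspace sketch of the version-counter idea it relies on. It is not the vDSO code: the struct, field names and both functions are made up for illustration, it uses C11 atomics rather than kernel primitives, and it leaves out the CPU-number and migrate_count rechecks that the quoted loop also performs.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for the shared time info; field names are illustrative only. */
struct time_info_sketch {
    _Atomic uint32_t version;      /* odd while an update is in progress */
    _Atomic uint64_t system_time;
};

/* Writer side: bump to odd, update the payload, bump back to even. */
static void publish_time(struct time_info_sketch *ti, uint64_t now)
{
    uint32_t v = atomic_load_explicit(&ti->version, memory_order_relaxed);

    atomic_store_explicit(&ti->version, v + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ti->system_time, now, memory_order_relaxed);
    atomic_store_explicit(&ti->version, v + 2, memory_order_release);
}

/* Reader side: retry until the same even version is seen before and after. */
static uint64_t read_time(struct time_info_sketch *ti)
{
    uint32_t before, after;
    uint64_t t;

    do {
        before = atomic_load_explicit(&ti->version, memory_order_acquire);
        t = atomic_load_explicit(&ti->system_time, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&ti->version, memory_order_relaxed);
    } while ((before & 1) || before != after);

    return t;
}
```

The reader never blocks the writer; it only repeats the read when it raced with an update, which is why the pattern suits a path that must not make a syscall.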





Re: [QEMU PATCH 0/3] Fix -cpu host and enforce/check to use GET_SUPPORTED_CPUID

2012-10-29 Thread Marcelo Tosatti
On Wed, Oct 24, 2012 at 07:44:04PM -0200, Eduardo Habkost wrote:
> This depends on a previous series I have submitted:
>   Subject: [QEMU PATCH 00/15] QEMU KVM_GET_SUPPORTED_CPUID cleanups and fixes
>   Message-Id: <1349383747-19383-1-git-send-email-ehabk...@redhat.com>
>   http://article.gmane.org/gmane.comp.emulators.kvm.devel/99375
> 
> Eduardo Habkost (3):
>   target-i385: make cpu_x86_fill_host() void
>   target-i386: cpu: make -cpu host/check/enforce code KVM-specific
>   target-i386: kvm_cpu_fill_host: use GET_SUPPORTED_CPUID
> 
>  target-i386/cpu.c | 52 +---
>  1 file changed, 33 insertions(+), 19 deletions(-)
> 
> -- 
> 1.7.11.7

Applied, thanks.



[PATCH] kvm tools: don't crash on virtio MSI-X reset

2012-10-29 Thread Sasha Levin
Handle VIRTIO_MSI_NO_VECTOR by not trying to use it as a valid vector.

We still need to remove the GSI and everything, but this is enough
to prevent crashes and keep everything working properly for now.

Reported-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
Signed-off-by: Sasha Levin <sasha.le...@oracle.com>
---
 tools/kvm/virtio/pci.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index 3acaa3a..adc8efc 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -146,6 +146,8 @@ static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *v
        switch (offset) {
        case VIRTIO_MSI_CONFIG_VECTOR:
                vec = vpci->config_vector = ioport__read16(data);
+               if (vec == VIRTIO_MSI_NO_VECTOR)
+                       break;
 
                gsi = irq__add_msix_route(kvm, &vpci->msix_table[vec].msg);
 
@@ -154,6 +156,9 @@ static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *v
        case VIRTIO_MSI_QUEUE_VECTOR:
                vec = vpci->vq_vector[vpci->queue_selector] = ioport__read16(data);
 
+               if (vec == VIRTIO_MSI_NO_VECTOR)
+                       break;
+
                gsi = irq__add_msix_route(kvm, &vpci->msix_table[vec].msg);
                vpci->gsis[vpci->queue_selector] = gsi;
                if (vdev->ops->notify_vq_gsi)
@@ -253,7 +258,7 @@ int virtio_pci__signal_vq(struct kvm *kvm, struct virtio_device *vdev, u32 vq)
        struct virtio_pci *vpci = vdev->virtio;
        int tbl = vpci->vq_vector[vq];
 
-       if (virtio_pci__msix_enabled(vpci)) {
+       if (virtio_pci__msix_enabled(vpci) && tbl != VIRTIO_MSI_NO_VECTOR) {
                if (vpci->pci_hdr.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_MASKALL) ||
                    vpci->msix_table[tbl].ctrl & cpu_to_le16(PCI_MSIX_ENTRY_CTRL_MASKBIT)) {
 
@@ -277,7 +282,7 @@ int virtio_pci__signal_config(struct kvm *kvm, struct virtio_device *vdev)
        struct virtio_pci *vpci = vdev->virtio;
        int tbl = vpci->config_vector;
 
-       if (virtio_pci__msix_enabled(vpci)) {
+       if (virtio_pci__msix_enabled(vpci) && tbl != VIRTIO_MSI_NO_VECTOR) {
                if (vpci->pci_hdr.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_MASKALL) ||
                    vpci->msix_table[tbl].ctrl & cpu_to_le16(PCI_MSIX_ENTRY_CTRL_MASKBIT)) {
 
@@ -286,7 +291,7 @@ int virtio_pci__signal_config(struct kvm *kvm, struct virtio_device *vdev)
                }
 
                if (vpci->features & VIRTIO_PCI_F_SIGNAL_MSI)
-                       virtio_pci__signal_msi(kvm, vpci, vpci->config_vector);
+                       virtio_pci__signal_msi(kvm, vpci, tbl);
                else
                        kvm__irq_trigger(kvm, vpci->config_gsi);
        } else {
-- 
1.7.12.4
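
The guard added by the patch above can be pictured with a small standalone sketch. Everything here except the sentinel value 0xffff (which is what VIRTIO_MSI_NO_VECTOR expands to) is hypothetical: the table type, the table size and `lookup_vector()` are illustration only, and the range check goes beyond what the patch itself does.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_VECTOR_SKETCH  0xffffu  /* same value as VIRTIO_MSI_NO_VECTOR */
#define TABLE_SIZE_SKETCH 33       /* arbitrary table size for this sketch */

struct msix_entry_sketch {
    uint32_t msg_data;
};

/*
 * Return the table entry for a guest-supplied vector, or NULL when the
 * guest wrote the "no vector" sentinel or an out-of-range index.
 */
static struct msix_entry_sketch *
lookup_vector(struct msix_entry_sketch *table, uint16_t vec)
{
    if (vec == NO_VECTOR_SKETCH)
        return NULL;    /* vector unassigned on reset: nothing to route */
    if (vec >= TABLE_SIZE_SKETCH)
        return NULL;    /* extra bounds check, not part of the patch */
    return &table[vec];
}
```

Without the first check, a reset that writes 0xffff would index far past the end of the table, which matches the crash described in the commit message.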



Re: [PATCH v7 00/17] target-i386: Add way to expose VMWare CPUID

2012-10-29 Thread Marcelo Tosatti
On Fri, Oct 12, 2012 at 03:56:05PM -0400, Don Slutz wrote:
 Also known as Paravirtualization CPUIDs.
 
 This is primarily done so that the guest will think it is running
 under vmware when hypervisor-vendor=vmware is specified as a
 property of a cpu.
 
 Patches 1 to 3 define new cpu properties.
 Patches 4 to 6 Add QOM access to the new properties.
 Patches 7 to 9 Add setting of these when cpu features hv_spinlocks,
   hv_relaxed, or hv_vapic are specified.
 Patches 10 to 12 Change kvm to use these.
 Patch 13 Add VMware timing info to kvm.
 Patch 14 Makes it easier to use hypervisor-vendor=vmware.
 Patches 15 to 17 Change tcg to use the new properties.
 
 This depends on:
 
 http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg01400.html
 
 As far as I know it is #4. It depends on (1) and (2) and (3).
 
 This change is based on:
 
 Microsoft Hypervisor CPUID Leaves:
   
 http://msdn.microsoft.com/en-us/library/windows/hardware/ff542428%28v=vs.85%29.aspx
 
 Linux kernel change starts with:
   http://fixunix.com/kernel/538707-use-cpuid-communicate-hypervisor.html
 Also:
   http://lkml.indiana.edu/hypermail/linux/kernel/1205.0/00100.html
 
 VMware documention on CPUIDs (Mechanisms to determine if software is
 running in a VMware virtual machine):
   
 http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458
 
 Changes from v6 to v7:
   Subject changed from Allow changing of Hypervisor CPUIDs. to 
 target-i386: Add way to expose VMWare CPUID
   Split out 01/16 target-i386: Add missing kvm bits.
 It is no longer related to this patch set.  Will be top posted as a 
 seperate patch.
 Marcelo Tosatti:
   Better commit messages.
   Reorder patches.
 
 
 Changes from v5 to v6:
   Split out 01/17: target-i386: Allow tsc-frequency to be larger then 2.147G
 It has been accepted as a trivial patch:
 http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg03959.html
 Blue Swirl:
   Fix 2 checkpatch.pl WARNING: line over 80 characters.
 
 Changes from v4 to v5:
   Undo kvm_clock2 change.
   Add cpuid_hv_level_set; cpuid_hv_level == 0 is now valid.
   Add cpuid_hv_vendor_set; the null string is now valid.
   Handle kvm and cpuid_hv_level == 0.
   hypervisor-vendor=kvm,hypervisor-level=0 and 
 hypervisor-level=0,hypervisor-vendor=kvm
 now do the same thing.
 
 Changes from v3 to v4:
   Added CPUID_HV_LEVEL_HYPERV, CPUID_HV_LEVEL_KVM.
   Added CPUID_HV_VENDOR_HYPERV.
   Added hyperv as known hypservisor-vendor.
   Allow hypervisor-level to be 0.
 
 Changes from v2 to v3:
   Clean post to qemu-devel.
 
 Changes from v1 to v2:
 
 1) Added 1/4 from 
 http://lists.gnu.org/archive/html/qemu-devel/2012-08/msg05153.html
 
Because Fred is changing jobs and so will not be pushing to get
this in. It needed to be rebased, And I needed it to complete the
testing of this change.
 
 2) Added 2/4 because of the re-work I needed a way to clear all KVM bits,
 
 3) The rework of v1.  Make it fit into the object model re-work of cpu.c for 
 x86.
 
 4) Added 3/4 -- The split out of the code that is not needed for accel=kvm.
 
 Changes from v2 to v3:
 
 Marcelo Tosatti:
   Its one big patch, better split in logically correlated patches
   (with better changelog). This would help reviewers.
 
 So split 3 and 4 into 3 to 17.  More info in change log.
 No code change.
 
 Don Slutz (17):
   target-i386: Add Hypervisor level.
   target-i386: Add Hypervisor vendor.
   target-i386: Add Hypervisor features.
   target-i386: Add cpu object access routines for Hypervisor level.
   target-i386: Add cpu object access routines for Hypervisor vendor.
   target-i386: Add cpu object access routines for Hypervisor features.
   target-i386: Add x86_set_hyperv.
   target-i386: Use x86_set_hyperv to set hypervisor vendor.
   target-i386: Use x86_set_hyperv to set hypervisor features.
   target-i386: Use Hypervisor level in -machine pc,accel=kvm.
   target-i386: Use Hypervisor vendor in -machine pc,accel=kvm.
   target-i386: Use Hypervisor features in -machine pc,accel=kvm.
   target-i386: Add VMWare CPUID Timing information in -machine
 pc,accel=kvm.
   target-i386: Add vmare as a known name to Hypervisor vendor.
   target-i386: Use Hypervisor level in -machine pc,accel=tcg.
   target-i386: Use Hypervisor vendor in -machine pc,accel=tcg.
   target-i386: target-i386: Add VMWare CPUID Timing information in
 -machine pc,accel=tcg
 
  target-i386/cpu.c |  205 
 +
  target-i386/cpu.h |   29 
  target-i386/kvm.c |   69 +++
  3 files changed, 290 insertions(+), 13 deletions(-)
 

Looks good overall. 



[PATCH] Add code to track call origin for msr assignment.

2012-10-29 Thread Will Auld
In order to track who initiated the call (host or guest) to modify an msr
value I have changed function call parameters along the call path. The
specific change is to add a struct pointer parameter that points to (index,
data, caller) information rather than having this information passed as
individual parameters.

The initial use for this capability is for updating the IA32_TSC_ADJUST
msr while setting the tsc value. It is anticipated that this capability
is useful for other tasks.
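
As a rough illustration of why the origin of a write matters, the following standalone sketch bundles (initiated_by, index, data) into one struct and lets the setter branch on the initiator. All names ending in `_sketch` are hypothetical, and the policy shown (only guest-initiated TSC writes move the adjust value) is an assumption made for the example; the patch description only says that IA32_TSC_ADJUST is the first intended user.

```c
#include <stdint.h>

#define GUEST_INITIATED_SKETCH 0x1
#define HOST_INITIATED_SKETCH  0x2
#define MSR_TSC_SKETCH         0x10  /* architectural index of IA32_TSC */

/* Mirrors the (initiated_by, index, data) triple described above. */
struct msr_data_sketch {
    uint32_t initiated_by;
    uint32_t index;
    uint64_t data;
};

struct vcpu_sketch {
    uint64_t tsc;
    int64_t  tsc_adjust;  /* stand-in for IA32_TSC_ADJUST */
};

/* Returns 0 on success, -1 for an MSR this sketch does not model. */
static int set_msr_sketch(struct vcpu_sketch *vcpu,
                          const struct msr_data_sketch *msr)
{
    if (msr->index != MSR_TSC_SKETCH)
        return -1;

    /* Assumed policy: only guest writes move the adjust value. */
    if (msr->initiated_by == GUEST_INITIATED_SKETCH)
        vcpu->tsc_adjust += (int64_t)(msr->data - vcpu->tsc);

    vcpu->tsc = msr->data;
    return 0;
}
```

Passing one struct pointer down the call path means a new field such as the initiator can be added without touching every function signature again.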

Signed-off-by: Will Auld <will.a...@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 18 +++---
 arch/x86/kvm/svm.c  | 21 +++--
 arch/x86/kvm/vmx.c  | 24 +---
 arch/x86/kvm/x86.c  | 23 +--
 arch/x86/kvm/x86.h  |  2 +-
 5 files changed, 65 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09155d6..ad0d3fd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -598,6 +598,18 @@ struct kvm_vcpu_stat {
 
 struct x86_instruction_info;
 
+/*
+ * Defined values for msr_data.initiated_by
+ */
+#define KVM_GUEST_INITIATED 0x1
+#define KVM_HOST_INITIATED  0x2
+
+struct msr_data {
+u32 initiated_by;
+u32 index;
+u64 data;
+};
+
 struct kvm_x86_ops {
int (*cpu_has_kvm_support)(void);  /* __init */
int (*disabled_by_bios)(void); /* __init */
@@ -621,7 +633,7 @@ struct kvm_x86_ops {
void (*set_guest_debug)(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg);
int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
-   int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
+   int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg);
void (*get_segment)(struct kvm_vcpu *vcpu,
struct kvm_segment *var, int seg);
@@ -772,7 +784,7 @@ static inline int emulate_instruction(struct kvm_vcpu *vcpu,
 
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
-int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
+int kvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
 
 struct x86_emulate_ctxt;
 
@@ -799,7 +811,7 @@ void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, 
int *l);
 int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
 
 int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
-int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data);
+int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
 
 unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu);
 void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index baead95..584055b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1211,6 +1211,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
    struct page *msrpm_pages;
    struct page *hsave_page;
    struct page *nested_msrpm_pages;
+   struct msr_data msr;
    int err;
 
    svm = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
@@ -1255,7 +1256,10 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
    svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
    svm->asid_generation = 0;
    init_vmcb(svm);
-   kvm_write_tsc(&svm->vcpu, 0);
+   msr.data = 0x0;
+   msr.index = MSR_IA32_TSC;
+   msr.initiated_by = KVM_HOST_INITIATED;
+   kvm_write_tsc(&svm->vcpu, &msr);
 
    err = fx_init(&svm->vcpu);
    if (err)
@@ -3147,13 +3151,15 @@ static int svm_set_vm_cr(struct kvm_vcpu *vcpu, u64 data)
    return 0;
 }
 
-static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
+static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 {
    struct vcpu_svm *svm = to_svm(vcpu);
 
+   u32 ecx = msr->index;
+   u64 data = msr->data;
    switch (ecx) {
    case MSR_IA32_TSC:
-       kvm_write_tsc(vcpu, data);
+       kvm_write_tsc(vcpu, msr);
        break;
    case MSR_STAR:
        svm->vmcb->save.star = data;
@@ -3208,20 +3214,23 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
        vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
        break;
    default:
-       return kvm_set_msr_common(vcpu, ecx, data);
+       return kvm_set_msr_common(vcpu, msr);
    }
    return 0;
 }
 
 static int wrmsr_interception(struct vcpu_svm *svm)
 {
+   struct msr_data msr;
    u32 ecx = svm->vcpu.arch.regs[VCPU_REGS_RCX];
    u64 data = (svm->vcpu.arch.regs[VCPU_REGS_RAX] & -1u)
        | ((u64)(svm->vcpu.arch.regs[VCPU_REGS_RDX] & -1u) << 32);
 
-
+   msr.data = 

Re: [PATCH v5 2/6] KVM: MMU: remove mmu_is_invalid

2012-10-29 Thread Marcelo Tosatti
On Wed, Oct 17, 2012 at 04:40:32PM +0200, Avi Kivity wrote:
> On 10/16/2012 02:08 PM, Xiao Guangrong wrote:
> > Remove mmu_is_invalid and use is_invalid_pfn instead
> > 
> 
> Applied 2-5 to next; 6 depends on 1, so will wait until it is merged
> upstream.

Applied 6.



RE: [PATCH] Add code to track call origin for msr assignment.

2012-10-29 Thread Dugger, Donald D
Will-

To quote from the OpenStack documentation 
(`http://docs.openstack.org/essex/openstack-compute/admin/content/introduction-to-xen.html')

It is possible to manage Xen using libvirt. This would be necessary for any 
Xen-based system that isn't using the XCP toolstack, such as SUSE Linux or 
Oracle Linux. Unfortunately, this is not well tested or supported as of the 
Essex release. To experiment using Xen through libvirt add the following 
configuration options /etc/nova/nova.conf:

connection_type=libvirt
libvirt_type=xen


I'm guessing the people who do most of the testing/deployment on Xen are xenapi 
centric and that's just what they use.
--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786



RE: [PATCH] Add code to track call origin for msr assignment.

2012-10-29 Thread Dugger, Donald D
Oops, ignore this message, I responded to the wrong email.

--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786


-Original Message-
From: Dugger, Donald D 
Sent: Monday, October 29, 2012 4:38 PM
To: Auld, Will; mtosa...@redhat.com; a...@redhat.com; Zhang, Xiantao; 
kvm@vger.kernel.org; Liu, Jinsong
Subject: RE: [PATCH] Add code to track call origin for msr assignment.

Will-

To quote from the OpenStack documentation 
(`http://docs.openstack.org/essex/openstack-compute/admin/content/introduction-to-xen.html')

It is possible to manage Xen using libvirt. This would be necessary for any 
Xen-based system that isn't using the XCP toolstack, such as SUSE Linux or 
Oracle Linux. Unfortunately, this is not well tested or supported as of the 
Essex release. To experiment using Xen through libvirt add the following 
configuration options /etc/nova/nova.conf:

connection_type=libvirt
libvirt_type=xen


I'm guessing the people who do most of the testing/deployment on Xen are
xenapi-centric and that's just what they use.
--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786


-----Original Message-----
From: Will Auld [mailto:will.auld.in...@gmail.com] 
Sent: Monday, October 29, 2012 3:18 PM
To: mtosa...@redhat.com; a...@redhat.com; Zhang, Xiantao; kvm@vger.kernel.org; 
Liu, Jinsong; Dugger, Donald D
Cc: Auld, Will
Subject: [PATCH] Add code to track call origin for msr assignment.

In order to track who initiated the call (host or guest) to modify an MSR
value, I have changed the function call parameters along the call path. The
specific change is to add a struct pointer parameter that points to (index,
data, caller) information rather than passing this information as
individual parameters.

The initial use for this capability is updating the IA32_TSC_ADJUST
MSR while setting the TSC value. It is anticipated that this capability
will be useful for other tasks.

Signed-off-by: Will Auld <will.a...@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 18 +++---
 arch/x86/kvm/svm.c              | 21 +++--
 arch/x86/kvm/vmx.c              | 24 +---
 arch/x86/kvm/x86.c              | 23 +--
 arch/x86/kvm/x86.h              |  2 +-
 5 files changed, 65 insertions(+), 23 deletions(-)
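
A minimal user-space sketch of the idea described above, for illustration
only. It reuses the struct msr_data layout and the two origin values from
this patch; struct toy_vcpu and toy_set_msr() are hypothetical names that do
not exist in KVM. The rule that only a guest-initiated TSC write moves
IA32_TSC_ADJUST is an assumption about how the caller field might be used,
not something this patch implements.

```c
#include <assert.h>
#include <stdint.h>

/* Origin tags, same values as the defines added by the patch. */
#define KVM_GUEST_INITIATED 0x1
#define KVM_HOST_INITIATED  0x2

#define MSR_IA32_TSC 0x10 /* architectural index of the TSC MSR */

/* Same layout as the struct msr_data added by the patch. */
struct msr_data {
	uint32_t initiated_by;
	uint32_t index;
	uint64_t data;
};

/* Hypothetical stand-in for per-vCPU state; not a KVM type. */
struct toy_vcpu {
	uint64_t tsc;
	uint64_t tsc_adjust;
};

/*
 * Hypothetical handler: one pointer carries (index, data, caller), so the
 * callee can branch on who asked for the write. Assumption: a guest write
 * to the TSC shifts tsc_adjust by the same delta, while a host write
 * (for example, restoring saved state) leaves tsc_adjust alone.
 */
static int toy_set_msr(struct toy_vcpu *vcpu, const struct msr_data *msr)
{
	assert(vcpu && msr);

	if (msr->index != MSR_IA32_TSC)
		return -1; /* this sketch models only the TSC */

	if (msr->initiated_by == KVM_GUEST_INITIATED)
		vcpu->tsc_adjust += msr->data - vcpu->tsc;

	vcpu->tsc = msr->data;
	return 0;
}
```

With this shape, adding another per-call attribute later means adding a
field to the struct rather than changing every function signature along the
call path again.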

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09155d6..ad0d3fd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -598,6 +598,18 @@ struct kvm_vcpu_stat {
 
 struct x86_instruction_info;
 
+/*
+ * Defined values for msr_data.initiated_by
+ */
+#define KVM_GUEST_INITIATED 0x1
+#define KVM_HOST_INITIATED  0x2
+
+struct msr_data {
+   u32 initiated_by;
+   u32 index;
+   u64 data;
+};
+
 struct kvm_x86_ops {
int (*cpu_has_kvm_support)(void);  /* __init */
int (*disabled_by_bios)(void); /* __init */
@@ -621,7 +633,7 @@ struct kvm_x86_ops {
void (*set_guest_debug)(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg);
int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
-   int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
+   int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg);
void (*get_segment)(struct kvm_vcpu *vcpu,
struct kvm_segment *var, int seg);
@@ -772,7 +784,7 @@ static inline int emulate_instruction(struct kvm_vcpu *vcpu,
 
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
-int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
+int kvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
 
 struct x86_emulate_ctxt;
 
@@ -799,7 +811,7 @@ void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
 int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
 
 int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
-int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data);
+int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
 
 unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu);
 void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index baead95..584055b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1211,6 +1211,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
struct page *msrpm_pages;
struct page *hsave_page;
struct page *nested_msrpm_pages;
+   struct msr_data msr;
int err;
 
svm = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
@@ -1255,7 +1256,10 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
svm->asid_generation = 0;
init_vmcb(svm);
-   

Alignment issue with transparent huge pages

2012-10-29 Thread Christoffer Dall
Hi,

I am seeing an interesting case on KVM/ARM where a user memory region
is not aligned with the guest physical memory address with respect to
the huge page size. This clearly makes it impossible for us to leverage
transparent huge pages for stage-2 mappings on ARM.

The question is whether this is simply something to check for inside KVM,
hoping that user space aligns its memory allocations, whether it is
supposed to be enforced somehow, or whether I'm missing a bigger picture
altogether.

Thanks,
-Christoffer
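
The alignment condition described in this message can be sketched as a small
check. This is an illustration under stated assumptions, not KVM code: the
function name region_hugepage_compatible() is hypothetical, and the 2 MiB
huge page size in the usage note is only an example value. The idea is that
a huge page can back a stage-2 mapping only if the user-space address and
the guest physical address of the region sit at the same offset within a
huge page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when the host virtual address (hva)
 * and guest physical address (gpa) of a memory region have the same
 * offset within a huge page of huge_size bytes. huge_size must be a
 * power of two.
 */
static bool region_hugepage_compatible(uint64_t hva, uint64_t gpa,
				       uint64_t huge_size)
{
	uint64_t mask;

	assert(huge_size != 0 && (huge_size & (huge_size - 1)) == 0);
	mask = huge_size - 1;

	return (hva & mask) == (gpa & mask);
}
```

For example, with a 2 MiB huge page, hva 0x7f0000200000 and gpa 0x80000000
are compatible (both offsets are 0), while hva 0x7f0000201000 and gpa
0x80000000 are not (offsets 0x1000 and 0).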