[PATCH] migrate: error message for ram_load size
Report an error when ramblock's sizes mismatch with a suggestion to the
user as to what went wrong.
---
libvirt uses migration to save the state, however when performing a distro
upgrade you might get an error starting your VMs up again without much
detail. This patch attempts to remedy that with extra error messages.

Without patch:
$ virsh start expo
error: Failed to start domain expo
error: internal error Process exited while reading console log output: char device redirected to /dev/pts/16
qemu: warning: error while loading state for instance 0x0 of device 'ram'
load of migration failed

With patch:
$ virsh start expo
error: Failed to start domain expo
error: internal error Process exited while reading console log output: char device redirected to /dev/pts/16
qemu: warning: error ramblock ':00:02.0/qxl.vrom' length 16384 != 8192. Did you change the ROM/BIOS or RAM size between restarts?
qemu: warning: error while loading state for instance 0x0 of device 'ram'
load of migration failed
---
 arch_init.c |    5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 8c3bb0d..33f783b 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -810,6 +810,11 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                 QLIST_FOREACH(block, &ram_list.blocks, next) {
                     if (!strncmp(id, block->idstr, sizeof(id))) {
                         if (block->length != length) {
+                            fprintf(stderr, "qemu: warning: error ramblock "
+                                    "'%s' length %ld != %ld. Did you "
+                                    "change the ROM/BIOS or RAM size "
+                                    "between restarts?\n", id,
+                                    block->length, length);
                             ret = -EINVAL;
                             goto done;
                         }
--
1.7.8.6

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
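The check this patch annotates can be sketched in isolation. The following is only an illustration of the idea, not QEMU code: struct demo_block and demo_check_block() are hypothetical stand-ins for the real RAMBlock list and the loop in ram_load().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a RAM block; not the real QEMU RAMBlock. */
struct demo_block {
    char idstr[256];
    unsigned long length;
};

/*
 * Find the block named `id` and compare its length with the length
 * found in the saved state. Returns 0 on a match and -EINVAL on a
 * size mismatch or an unknown id.
 */
static int demo_check_block(const struct demo_block *blocks, size_t n,
                            const char *id, unsigned long length)
{
    assert(blocks != NULL && id != NULL);

    for (size_t i = 0; i < n; i++) {
        if (strncmp(id, blocks[i].idstr, sizeof(blocks[i].idstr)) != 0)
            continue;
        if (blocks[i].length != length) {
            /* Name the block and both sizes so the user can act on it. */
            fprintf(stderr, "warning: ramblock '%s' length %lu != %lu. "
                    "Did you change the ROM/BIOS or RAM size "
                    "between restarts?\n",
                    id, blocks[i].length, length);
            return -EINVAL;
        }
        return 0;
    }
    return -EINVAL;
}
```

With a single block named "qxl.vrom" of length 16384, a saved length of 16384 is accepted, while 8192 is rejected with the warning, mirroring the "With patch" output above.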
Re: 1.1.1 -> 1.1.2 migrate /managedsave issue
On Mon, Oct 22, 2012 at 6:23 AM, Avi Kivity <a...@redhat.com> wrote:
> On 10/22/2012 09:04 AM, Philipp Hahn wrote:
>> Hello Doug,
>>
>> On Saturday 20 October 2012 00:46:43 Doug Goldstein wrote:
>>> I'm using libvirt 0.10.2 and I had qemu-kvm 1.1.1 running all my VMs.
>>> ...
>>> I had upgraded to qemu-kvm 1.1.2
>>> ...
>>> qemu: warning: error while loading state for instance 0x0 of device 'ram'
>>> load of migration failed
>>
>> That error can be from many things. For me it was that the PXE-ROM images
>> for the network cards were updated as well. Their size changed over the
>> next power-of-two size, so kvm needed to allocate less/more memory and
>> changed some PCI configuration registers, where the size of the ROM region
>> is stored. On loading the saved state those sizes were compared and failed
>> to validate. KVM then aborts loading the saved state with that little
>> helpful message.
>>
>> So you might want to check, if your case is similar to mine.
>>
>> I diagnosed that using gdb to single step kvm until I found
>> hw/pci.c#get_pci_config_device() returning -EINVAL.
>
> Seems reasonable. Doug, please verify to see if it's the same issue or
> another one.
>
> Juan, how can we fix this? It's clear that the option ROM size has to
> be fixed and not change whenever the blob is updated. This will fix it
> for future releases. But what to do about the ones in the field?
>
> --
> error compiling committee.c: too many arguments to function

Avi,

Please consider the following patch based off qemu master:
http://article.gmane.org/gmane.comp.emulators.kvm.devel/100231

It should hopefully help users with this issue in the future.

--
Doug Goldstein
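To illustrate Philipp's point about ROM images crossing a power-of-two boundary, here is a minimal sketch. It assumes the ROM region is sized by rounding the image size up to the next power of two, as described above; demo_pow2ceil() is a hypothetical helper, not a QEMU function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round v up to the next power of two (v itself if it already is one).
 * Valid for 0 < v <= 2^63.
 */
static uint64_t demo_pow2ceil(uint64_t v)
{
    uint64_t p = 1;

    assert(v > 0 && v <= (UINT64_C(1) << 63));
    while (p < v)
        p <<= 1;
    return p;
}
```

Under that assumption, an image that grows from 8192 to 8193 bytes doubles its region from 8192 to 16384 bytes, and a state file saved with the old size then fails validation against the new one.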
Re: [PATCH v2 3/5] Qemu: do not mark bios readonly
Jan,

On 10/26/2012 06:35 PM, Jan Kiszka wrote:
> This has two problems: We know it breaks at least Win 95 that
> overwrites its F-segment during boot. And it applies changes to the
> shadowed area (below 1 MB) also to the ROM area - I don't think that is
> the original behaviour on real hardware.

So what is the problem? Can it break running Win95? I tried to install a
Win95 guest, but it failed to boot regardless of whether my patchset was
applied or not. I found that Win 95 is listed as not supported at
http://www.linux-kvm.org/page/Guest_Support_Status

Note: before my patchset, Win 95 could still happily write something into
the ROM area, because readonly memory is actually writable on KVM. And Win95
cannot run on isapc with --no-kvm, since there is no way to enable shadow ROM.

> What we need is paravirtual shadow write control for the ISA PC. It's on
> my todo list, maybe I will be able to look into this during the next week.

Your idea is to modify the seabios code and use a special (PV) way to
notify Qemu to make the bios writable?

Actually, I am confused why the guest (including the bios) persistently uses
shadow ROM even if it is not supported (on ISA PC). I think the right way
is for it to move itself to RAM in this case, no?

> BTW, your patch series should allow to drop the KVM special case from
> pc_system_firmware_init. That version, btw, treats high and low BIOS
> areas separately - but only reloads the upper area.

Hmm... You mean also allowing Qemu to use pflash to load the bios if kvm is
enabled? We cannot do that, because pflash is a RD device which cannot be
written directly, and kvm cannot emulate instructions that implicitly write
the memory (e.g. using this area as stack).

Thanks!
Can we run guest OS without using NAT and iptables?
Can we run a guest OS on KVM without enabling NAT and iptables?

The reason for asking is that I want to disable the conntrack module on my
system, and to disable that I have to remove iptables and NAT.

I get the following message when I start a guest OS on KVM (iptables and
NAT disabled):

Error starting domain: internal error 'Network default' is not active.

Is there any way to run a guest OS with NAT disabled? Or is there any way
to disable the conntrack module and still use KVM to run a guest OS?

I am using Ubuntu 10.04.

Any help?
Re: [PATCH v2 3/5] Qemu: do not mark bios readonly
On 10/29/2012 03:44 PM, Jan Kiszka wrote:
> On 2012-10-29 08:09, Xiao Guangrong wrote:
>> Jan,
>>
>> On 10/26/2012 06:35 PM, Jan Kiszka wrote:
>>> This has two problems: We know it breaks at least Win 95 that
>>> overwrites its F-segment during boot. And it applies changes to the
>>> shadowed area (below 1 MB) also to the ROM area - I don't think that is
>>> the original behaviour on real hardware.
>>
>> So what is the problem? Can it break running Win95? I tried to install a
>> Win95 guest, but it failed to boot regardless of whether my patchset was
>> applied or not. I found that Win 95 is listed as not supported at
>> http://www.linux-kvm.org/page/Guest_Support_Status
>>
>> Note: before my patchset, Win 95 could still happily write something into
>> the ROM area, because readonly memory is actually writable on KVM. And Win95
>> cannot run on isapc with --no-kvm, since there is no way to enable shadow ROM.
>
> Your patches causes regressions on TCG mode as that is perfectly fine
> with booting Win95 so far.

Aha, I tried accel=tcg: before my patchset, it works for -machine pc but
fails for -machine isapc (a known issue for seabios). After my patchset, it
works fine for both -machine pc and isapc. :)

>>> What we need is paravirtual shadow write control for the ISA PC. It's on
>>> my todo list, maybe I will be able to look into this during the next week.
>>
>> Your idea is to modify the seabios code and use a special (PV) way to
>> notify Qemu to make the bios writable?
>
> Yes.
>
>> Actually, I am confused why the guest (including the bios) persistently uses
>> shadow ROM even if it is not supported (on ISA PC). I think the right way
>> is for it to move itself to RAM in this case, no?
>
> I've been told that Seabios has been built around that assumption and
> the PV shadow control would be simpler to realize.

It sounds like PV is more complex than directly making the bios area
writable (if that works).

>>> BTW, your patch series should allow to drop the KVM special case from
>>> pc_system_firmware_init. That version, btw, treats high and low BIOS
>>> areas separately - but only reloads the upper area.
>>
>> Hmm... You mean also allowing Qemu to use pflash to load the bios if kvm is
>> enabled?
>
> Yes.
>
>> We cannot do that, because pflash is a RD device which cannot be
>> written directly, and kvm cannot emulate instructions that implicitly write
>> the memory (e.g. using this area as stack).
>
> Isn't enabling ROMD support for KVM that whole point of your patches? I

It can generate an MMIO exit if ROMD is written, which means the instruction
needs kvm's help to finish if it explicitly or implicitly writes the memory.

> do not see yet what prevents this still, but it should be fixed first.

For explicit memory writes, it is easy to fix: we just need to fetch the
instruction from EIP and emulate it. But for implicit memory accesses,
fixing the emulation is really hard work. Is it really worth doing?
Re: acpi_piix4 migration issue
On 28/10/2012 20:40, Marcelo Tosatti wrote:
> qemu-kvm 1.2 -> qemu-1.3 migration fails with
>
> Unknown savevm section type 48
> load of migration failed
>
> Due to a fix in acpi_piix4 in qemu-kvm (attached at the end of the message).
>
> The problem is that qemu-kvm correctly uses 2 bytes for sts and 2 bytes
> for en fields (which is their allocated size), while qemu uses 4*2 bytes
> for each.
>
> The fix present in qemu-kvm is correct, but, having it in qemu 1.3
> would break qemu 1.2 -> qemu 1.3 migration (while allowing
> qemu-kvm 1.2 -> qemu 1.3 migration).
>
> Any opinions on what to do?

Bump the .version_id and .minimum_version_id to 2 and load the QEMU 1.2
state via .load_state_old. qemu-kvm 1.2 -> qemu 1.3 migration would be
broken.

qemu-kvm downstreams that care can leave .minimum_version_id to 1.

Paolo

> +#define VMSTATE_GPE_ARRAY(_field, _state)                            \
> + {                                                                   \
> +     .name       = (stringify(_field)),                              \
> +     .version_id = 0,                                                \
> +     .num        = GPE_LEN,                                          \
> +     .info       = &vmstate_info_uint16,                             \
> +     .size       = sizeof(uint16_t),                                 \
> +     .flags      = VMS_ARRAY | VMS_POINTER,                          \
> +     .offset     = vmstate_offset_pointer(_state, _field, uint8_t),  \
> + }
> +
>  static const VMStateDescription vmstate_gpe = {
>      .name = "gpe",
>      .version_id = 1,
>      .minimum_version_id = 1,
>      .minimum_version_id_old = 1,
>      .fields      = (VMStateField []) {
> -        VMSTATE_UINT16(sts, struct gpe_regs),
> -        VMSTATE_UINT16(en, struct gpe_regs),
> +        VMSTATE_GPE_ARRAY(sts, ACPIGPE),
> +        VMSTATE_GPE_ARRAY(en, ACPIGPE),
>          VMSTATE_END_OF_LIST()
>      }
>  };
>
> I'm no vmstate expert, but this does look odd. Why both VMS_ARRAY and
> VMS_POINTER? aren't we trying to save/restore a simple 16-bit value? Or
> at least we did before this patch.
>
> That's right. the difference is, the new member type became uint8_t*.
>
> Does the following help?
>
> Signed-off-by: Avi Kivity <a...@redhat.com>
>
> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> index d65a7e9..9dc6f43 100644
> --- a/hw/acpi_piix4.c
> +++ b/hw/acpi_piix4.c
> @@ -221,10 +221,9 @@ static int vmstate_acpi_post_load(void *opaque, int version_id)
>   {                                                                   \
>       .name       = (stringify(_field)),                              \
>       .version_id = 0,                                                \
> -     .num        = GPE_LEN,                                          \
>       .info       = &vmstate_info_uint16,                             \
>       .size       = sizeof(uint16_t),                                 \
> -     .flags      = VMS_ARRAY | VMS_POINTER,                          \
> +     .flags      = VMS_SINGLE | VMS_POINTER,                         \
>       .offset     = vmstate_offset_pointer(_state, _field, uint8_t),  \
>   }
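The size disagreement Marcelo describes can be made concrete with a little arithmetic. This is only a sketch: DEMO_GPE_LEN is assumed to be 4 (inferred from the "4*2 bytes" above), and the helper names are hypothetical, not vmstate APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed array length, inferred from "4*2 bytes" in the thread. */
enum { DEMO_GPE_LEN = 4 };

/* Bytes on the wire when sts and en are each one uint16_t. */
static size_t demo_gpe_bytes_single(void)
{
    return 2 * sizeof(uint16_t);
}

/* Bytes on the wire when sts and en are each a uint16_t array. */
static size_t demo_gpe_bytes_array(void)
{
    return 2 * DEMO_GPE_LEN * sizeof(uint16_t);
}

/*
 * Extra bytes a reader using the array layout consumes from a stream
 * written with the single-value layout.
 */
static size_t demo_gpe_stream_skew(void)
{
    assert(demo_gpe_bytes_array() >= demo_gpe_bytes_single());
    return demo_gpe_bytes_array() - demo_gpe_bytes_single();
}
```

Under these assumptions the sender writes 4 bytes for the gpe section and the receiver reads 16, so the receiver swallows 12 bytes of whatever follows and then misparses the next section header, which would be consistent with the "Unknown savevm section type" failure.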
Re: Can we run guest OS without using NAT and iptables?
On Mon, Oct 29, 2012 at 12:55:43PM +0530, freak 62 wrote:
> Can we run a guest OS on KVM without enabling NAT and iptables?
>
> The reason for asking is that I want to disable the conntrack module on my
> system, and to disable that I have to remove iptables and NAT.
>
> I get the following message when I start a guest OS on KVM (iptables and
> NAT disabled):
>
> Error starting domain: internal error 'Network default' is not active.
>
> Is there any way to run a guest OS with NAT disabled? Or is there any way
> to disable the conntrack module and still use KVM to run a guest OS?
>
> I am using Ubuntu 10.04.

This is a libvirt question, since libvirt sets up the networking
configuration.

You can try a different network config, either using the virt-manager GUI
tool or by editing the network XML, which is documented here:
http://libvirt.org/formatnetwork.html

CCed libvirt mailing list.

Stefan
Re: I/O errors in guest OS after repeated migration
On Fri, Oct 19, 2012 at 2:55 PM, Guido Winkelmann
<guido-k...@thisisnotatest.de> wrote:
> On Thursday, 18 October 2012, 18:05:39, Avi Kivity wrote:
>> On 10/18/2012 05:50 PM, Guido Winkelmann wrote:
>>> On Wednesday, 17 October 2012, 13:25:45, Brian Jackson wrote:
>>>> On Wednesday, October 17, 2012 10:45:14 AM Guido Winkelmann wrote:
>>>>> vda1, logical block 1858771
>>>>> Oct 17 17:12:04 localhost kernel: [ 212.070600] Buffer I/O error on device vda1, logical block 1858772
>>>>> Oct 17 17:12:04 localhost kernel: [ 212.070602] Buffer I/O error on device vda1, logical block 1858773
>>>>> Oct 17 17:12:04 localhost kernel: [ 212.070605] Buffer I/O error on device vda1, logical block 1858774
>>>>> Oct 17 17:12:04 localhost kernel: [ 212.070607] Buffer I/O error on device vda1, logical block 1858775
>>>>> Oct 17 17:12:04 localhost kernel: [ 212.070610] Buffer I/O error on device vda1, logical block 1858776
>>>>> Oct 17 17:12:04 localhost kernel: [ 212.070612] Buffer I/O error on device vda1, logical block 1858777
>>>>> Oct 17 17:12:04 localhost kernel: [ 212.070615] Buffer I/O error on device vda1, logical block 1858778
>>>>> Oct 17 17:12:04 localhost kernel: [ 212.070617] Buffer I/O error on device vda1, logical block 1858779
>>>>>
>>>>> (I was writing a large file at the time, to make sure I actually catch
>>>>> I/O errors as they happen)
>>>>
>>>> What about newer versions of qemu/kvm? But of course if those work, your
>>>> next task is going to be git bisect it or file a bug with your distro
>>>> that is using an ancient version of qemu/kvm.
>>>
>>> I've just upgraded both hosts to qemu-kvm 1.2.0
>>> (qemu-1.2.0-14.fc17.x86_64, built from spec files under
>>> http://pkgs.fedoraproject.org/cgit/qemu.git/).
>>>
>>> The bug is still there.
>>
>> If you let the guest go idle (no I/O), then migrate it, then restart the
>> I/O, do the errors show?
>
> Just tested - yes, they do.

The -EIO error does not really reveal why there is a problem. You can use
SystemTap probes in QEMU to find out more about the nature of the error.

# stap -e 'probe qemu.kvm.bdrv_*, qemu.kvm.virtio_blk_*, qemu.kvm.paio_* { printf("%s(%s)\n", probefunc(), $$parms) }' -x $PID_OF_QEMU

Output looks like this:

bdrv_co_readv($arg1=0x7fb2397cc580 $arg2=0x80c $arg3=0x1)
bdrv_co_io_em($arg1=0x7fb2397cc580 $arg2=0x80c $arg3=0x1 $arg4=0x0 $arg5=0x7fb239da6f60)
virtio_blk_rw_complete($arg1=0x7fb23982ed10 $arg2=0x0)
virtio_blk_req_complete($arg1=0x7fb23982ed10 $arg2=0x0)

In virtio_blk_rw_complete, $arg2=-5 means -EIO, so look for that. This will
reveal what is happening when the error occurs.

Stefan
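For readers unfamiliar with the negative-errno convention used in the probe output, a small sketch of decoding such a return value follows. demo_describe_ret() is a hypothetical helper, and the value 5 for EIO is the Linux value assumed in the message above.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Turn a "0 or negative errno" return value, as seen in the $arg2
 * field above, into a human-readable string.
 */
static const char *demo_describe_ret(int ret)
{
    assert(ret <= 0);
    return ret < 0 ? strerror(-ret) : "success";
}
```

On Linux, demo_describe_ret(-5) yields the EIO description ("Input/output error" with glibc), so a completion callback carrying -5 is the one reporting the I/O failure.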
[PATCH] kvm tools: fix rbtree-interval search
From: Kirill A. Shutemov <kirill.shute...@linux.intel.com>

I've noticed message on kvm exit:

  Warning: serial8250__exit failed.

kvm tool is not able to remove ioport range which was added previously.
The issue is caused by bug in rbtree-interval.

Search algorithm in rb_int_search_single() expects correct value of
max_high. But the tree can contain leaf nodes, which never were updated by
propagate_callback(). For this kind of nodes max_high will be 0 and we
will not be able to find and remove them.

Let's initialize max_high on RB_INT_INIT() time.

Fixing this bug makes other bug visible: propagate_callback() can be
called for empty tree: node == NULL. The callback is not ready for empty
tree. Let's fix that as well.

Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
---
 tools/kvm/include/kvm/rbtree-interval.h |    3 ++-
 tools/kvm/util/rbtree-interval.c        |    6 +-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/include/kvm/rbtree-interval.h b/tools/kvm/include/kvm/rbtree-interval.h
index e97d05b..fb2102a 100644
--- a/tools/kvm/include/kvm/rbtree-interval.h
+++ b/tools/kvm/include/kvm/rbtree-interval.h
@@ -4,7 +4,8 @@
 #include <linux/rbtree_augmented.h>
 #include <linux/types.h>
 
-#define RB_INT_INIT(l, h) (struct rb_int_node){.low = l, .high = h}
+#define RB_INT_INIT(l, h) \
+	(struct rb_int_node){.low = l, .high = h, .max_high = h}
 #define rb_int(n) rb_entry(n, struct rb_int_node, node)
 
 struct rb_int_node {
diff --git a/tools/kvm/util/rbtree-interval.c b/tools/kvm/util/rbtree-interval.c
index c82ce98..d7fa96a 100644
--- a/tools/kvm/util/rbtree-interval.c
+++ b/tools/kvm/util/rbtree-interval.c
@@ -48,8 +48,12 @@ struct rb_int_node *rb_int_search_range(struct rb_root *root, u64 low, u64 high)
  */
 static void propagate_callback(struct rb_node *node, struct rb_node *stop)
 {
-	struct rb_int_node *i_node = rb_int(node);
+	struct rb_int_node *i_node;
 
+	if (node == stop)
+		return;
+
+	i_node = rb_int(node);
 	i_node->max_high = i_node->high;
 
 	if (node->rb_left)
--
1.7.10.4
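Why a leaf with max_high == 0 becomes unreachable can be shown with a stripped-down interval search. This is a simplified sketch using a plain binary tree and inclusive intervals, not the kvm tool's rbtree code; struct demo_node and demo_search() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified interval node: max_high caches the largest `high` in the subtree. */
struct demo_node {
    uint64_t low, high, max_high;
    struct demo_node *left, *right;
};

/*
 * Find a node whose [low, high] contains `point`. The left subtree is
 * only visited when its cached max_high says it could contain the point.
 */
static struct demo_node *demo_search(struct demo_node *n, uint64_t point)
{
    while (n) {
        assert(n->low <= n->high);
        if (n->low <= point && point <= n->high)
            return n;
        if (n->left && n->left->max_high >= point)
            n = n->left;
        else
            n = n->right;
    }
    return NULL;
}
```

With a root [100, 200] and a left leaf [10, 20] whose max_high was never set (0), a search for 15 skips the left subtree and finds nothing; once max_high is initialized to high, as the patch does in RB_INT_INIT(), the leaf is found.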
RE: [Patch]KVM: enabling per domain PLE
Hi, Avi

Yes, some cloud vendors already know that different PLE values have a big
performance impact on their applications. They want an interface with which
to set them. And I think the big cloud vendors should have administrators
who have experience with PLE tuning. :-)

At the current stage, do you think we still need to pursue a dynamic
adaptive PLE solution?

> --
> error compiling committee.c: too many arguments to function
[PATCH 2/5] s390: Move css limits from drivers/s390/cio/ to include/asm/.
There's no need to keep __MAX_SUBCHANNEL and __MAX_SSID private to
the common I/O layer when __MAX_CSSID is usable by everybody.

Signed-off-by: Cornelia Huck <cornelia.h...@de.ibm.com>
---
 arch/s390/include/asm/cio.h | 2 ++
 drivers/s390/cio/css.h      | 3 ---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/s390/include/asm/cio.h b/arch/s390/include/asm/cio.h
index 55bde60..ad2b924 100644
--- a/arch/s390/include/asm/cio.h
+++ b/arch/s390/include/asm/cio.h
@@ -9,6 +9,8 @@
 #define LPM_ANYPATH 0xff
 #define __MAX_CSSID 0
+#define __MAX_SUBCHANNEL 65535
+#define __MAX_SSID 3
 
 #include <asm/scsw.h>
 
diff --git a/drivers/s390/cio/css.h b/drivers/s390/cio/css.h
index 33bb4d8..4af3dfe 100644
--- a/drivers/s390/cio/css.h
+++ b/drivers/s390/cio/css.h
@@ -112,9 +112,6 @@ extern int for_each_subchannel(int(*fn)(struct subchannel_id, void *), void *);
 extern void css_reiterate_subchannels(void);
 void css_update_ssd_info(struct subchannel *sch);
 
-#define __MAX_SUBCHANNEL 65535
-#define __MAX_SSID 3
-
 struct channel_subsystem {
 	u8 cssid;
 	int valid;
--
1.7.12.4
[PATCH 1/5] KVM: s390: Handle hosts not supporting s390-virtio.
Running under a kvm host does not necessarily imply the presence of a
page mapped above the main memory with the virtio information; however,
the code includes a hard coded access to that page.

Instead, check for the presence of the page and exit gracefully
before we hit an addressing exception if it does not exist.

Signed-off-by: Cornelia Huck <cornelia.h...@de.ibm.com>
---
 drivers/s390/kvm/kvm_virtio.c | 39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c
index 47cccd5..76b95f3 100644
--- a/drivers/s390/kvm/kvm_virtio.c
+++ b/drivers/s390/kvm/kvm_virtio.c
@@ -419,6 +419,26 @@ static void kvm_extint_handler(struct ext_code ext_code,
 }
 
 /*
+ * For s390-virtio, we expect a page above main storage containing
+ * the virtio configuration. Try to actually load from this area
+ * in order to figure out if the host provides this page.
+ */
+static int __init test_devices_support(unsigned long addr)
+{
+	int ret = -EIO;
+
+	asm volatile(
+		"0:	lura	0,%1\n"
+		"1:	xgr	%0,%0\n"
+		"2:\n"
+		EX_TABLE(0b,2b)
+		EX_TABLE(1b,2b)
+		: "+d" (ret)
+		: "a" (addr)
+		: "0", "cc");
+	return ret;
+}
+/*
  * Init function for virtio
  * devices are in a single page above top of normal mem
  */
@@ -429,21 +449,24 @@ static int __init kvm_devices_init(void)
 	if (!MACHINE_IS_KVM)
 		return -ENODEV;
 
+	if (test_devices_support(real_memory_size) < 0)
+		/* No error. */
+		return 0;
+
+	rc = vmem_add_mapping(real_memory_size, PAGE_SIZE);
+	if (rc)
+		return rc;
+
+	kvm_devices = (void *) real_memory_size;
+
 	kvm_root = root_device_register("kvm_s390");
 	if (IS_ERR(kvm_root)) {
 		rc = PTR_ERR(kvm_root);
 		printk(KERN_ERR "Could not register kvm_s390 root device");
+		vmem_remove_mapping(real_memory_size, PAGE_SIZE);
 		return rc;
 	}
 
-	rc = vmem_add_mapping(real_memory_size, PAGE_SIZE);
-	if (rc) {
-		root_device_unregister(kvm_root);
-		return rc;
-	}
-
-	kvm_devices = (void *) real_memory_size;
-
 	INIT_WORK(&hotplug_work, hotplug_devices);
 
 	service_subclass_irq_register();
--
1.7.12.4
[PATCH 0/5] s390: Guest support for virtio-ccw.
Avi, Marcelo,

I'd like to propose inclusion of the guest support patches for virtio-ccw
into 3.8. I'm confident that the host - guest interface for virtio-ccw is
fine now, and the patches have been extensively tested by our internal
test team.

Patch 1 might conceivably be 3.7 material, though I fear it's a bit late
for that.

Patch 2 has been moved over from the host-support patchset since the
limits are needed by the guest driver as well.

Patch 4 has seen some further bugfixes (feature bits, 2G and 4G problems,
device detach handling) and is working well in our internal environment.

Cornelia Huck (5):
  KVM: s390: Handle hosts not supporting s390-virtio.
  s390: Move css limits from drivers/s390/cio/ to include/asm/.
  s390: Add a mechanism to get the subchannel id.
  KVM: s390: Add a channel I/O based virtio transport driver.
  KVM: s390: Split out early console code.

 arch/s390/include/asm/ccwdev.h  |   5 +
 arch/s390/include/asm/cio.h     |   2 +
 arch/s390/include/asm/irq.h     |   1 +
 arch/s390/kernel/irq.c          |   1 +
 drivers/s390/cio/css.h          |   3 -
 drivers/s390/cio/device_ops.c   |  12 +
 drivers/s390/kvm/Makefile       |   2 +-
 drivers/s390/kvm/early_printk.c |  42 ++
 drivers/s390/kvm/kvm_virtio.c   |  64 ++-
 drivers/s390/kvm/virtio_ccw.c   | 841
 10 files changed, 936 insertions(+), 37 deletions(-)
 create mode 100644 drivers/s390/kvm/early_printk.c
 create mode 100644 drivers/s390/kvm/virtio_ccw.c

--
1.7.12.4
[PATCH 4/5] KVM: s390: Add a channel I/O based virtio transport driver.
Add a driver for kvm guests that matches virtual ccw devices provided
by the host as virtio bridge devices.

These virtio-ccw devices use a special set of channel commands in order
to perform virtio functions.

Signed-off-by: Cornelia Huck <cornelia.h...@de.ibm.com>
---
 arch/s390/include/asm/irq.h   |   1 +
 arch/s390/kernel/irq.c        |   1 +
 drivers/s390/kvm/Makefile     |   2 +-
 drivers/s390/kvm/virtio_ccw.c | 842 ++
 4 files changed, 845 insertions(+), 1 deletion(-)
 create mode 100644 drivers/s390/kvm/virtio_ccw.c

diff --git a/arch/s390/include/asm/irq.h b/arch/s390/include/asm/irq.h
index 6703dd9..ad2ad6b 100644
--- a/arch/s390/include/asm/irq.h
+++ b/arch/s390/include/asm/irq.h
@@ -33,6 +33,7 @@ enum interruption_class {
 	IOINT_APB,
 	IOINT_ADM,
 	IOINT_CSC,
+	IOINT_VIR,
 	NMI_NMI,
 	NR_IRQS,
 };
diff --git a/arch/s390/kernel/irq.c b/arch/s390/kernel/irq.c
index 6cdc55b..97c171a 100644
--- a/arch/s390/kernel/irq.c
+++ b/arch/s390/kernel/irq.c
@@ -58,6 +58,7 @@ static const struct irq_class intrclass_names[] = {
 	[IOINT_APB] = {.name = "APB", .desc = "[I/O] AP Bus"},
 	[IOINT_ADM] = {.name = "ADM", .desc = "[I/O] EADM Subchannel"},
 	[IOINT_CSC] = {.name = "CSC", .desc = "[I/O] CHSC Subchannel"},
+	[IOINT_VIR] = {.name = "VIR", .desc = "[I/O] Virtual I/O Devices"},
 	[NMI_NMI]   = {.name = "NMI", .desc = "[NMI] Machine Check"},
 };
 
diff --git a/drivers/s390/kvm/Makefile b/drivers/s390/kvm/Makefile
index 0815690..241891a 100644
--- a/drivers/s390/kvm/Makefile
+++ b/drivers/s390/kvm/Makefile
@@ -6,4 +6,4 @@
 # it under the terms of the GNU General Public License (version 2 only)
 # as published by the Free Software Foundation.
 
-obj-$(CONFIG_S390_GUEST) += kvm_virtio.o
+obj-$(CONFIG_S390_GUEST) += kvm_virtio.o virtio_ccw.o
diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
new file mode 100644
index 000..4be878f
--- /dev/null
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -0,0 +1,842 @@
+/*
+ * ccw based virtio transport
+ *
+ * Copyright IBM Corp. 2012
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ *
+ *    Author(s): Cornelia Huck <cornelia.h...@de.ibm.com>
+ */
+
+#include <linux/kernel_stat.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/err.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/slab.h>
+#include <linux/virtio_console.h>
+#include <linux/interrupt.h>
+#include <linux/virtio_ring.h>
+#include <linux/pfn.h>
+#include <linux/async.h>
+#include <linux/wait.h>
+#include <linux/list.h>
+#include <linux/bitops.h>
+#include <linux/module.h>
+#include <asm/io.h>
+#include <asm/kvm_para.h>
+#include <asm/setup.h>
+#include <asm/irq.h>
+#include <asm/cio.h>
+#include <asm/ccwdev.h>
+#include <asm/schid.h>
+
+/*
+ * virtio related functions
+ */
+
+struct vq_config_block {
+	__u16 index;
+	__u16 num;
+} __attribute__ ((packed));
+
+#define VIRTIO_CCW_CONFIG_SIZE 0x100
+/* same as PCI config space size, should be enough for all drivers */
+
+struct virtio_ccw_device {
+	struct virtio_device vdev;
+	__u8 status;
+	__u8 config[VIRTIO_CCW_CONFIG_SIZE];
+	struct ccw_device *cdev;
+	struct ccw1 *ccw;
+	__u32 area;
+	__u32 curr_io;
+	int err;
+	wait_queue_head_t wait_q;
+	spinlock_t lock;
+	struct list_head virtqueues;
+	unsigned long indicators;
+	unsigned long indicators2;
+	struct vq_config_block *config_block;
+};
+
+struct vq_info_block {
+	__u64 queue;
+	__u32 align;
+	__u16 index;
+	__u16 num;
+} __attribute__ ((packed));
+
+struct virtio_feature_desc {
+	__u32 features;
+	__u8 index;
+} __attribute__ ((packed));
+
+struct virtio_ccw_vq_info {
+	struct virtqueue *vq;
+	int num;
+	int queue_index;
+	void *queue;
+	struct vq_info_block *info_block;
+	struct list_head node;
+};
+
+#define KVM_VIRTIO_CCW_RING_ALIGN 4096
+
+#define CCW_CMD_SET_VQ 0x13
+#define CCW_CMD_VDEV_RESET 0x33
+#define CCW_CMD_SET_IND 0x43
+#define CCW_CMD_SET_CONF_IND 0x53
+#define CCW_CMD_READ_FEAT 0x12
+#define CCW_CMD_WRITE_FEAT 0x11
+#define CCW_CMD_READ_CONF 0x22
+#define CCW_CMD_WRITE_CONF 0x21
+#define CCW_CMD_WRITE_STATUS 0x31
+#define CCW_CMD_READ_VQ_CONF 0x32
+
+#define VIRTIO_CCW_DOING_SET_VQ 0x0001
+#define VIRTIO_CCW_DOING_RESET 0x0004
+#define VIRTIO_CCW_DOING_READ_FEAT 0x0008
+#define VIRTIO_CCW_DOING_WRITE_FEAT 0x0010
+#define VIRTIO_CCW_DOING_READ_CONFIG 0x0020
+#define VIRTIO_CCW_DOING_WRITE_CONFIG 0x0040
+#define VIRTIO_CCW_DOING_WRITE_STATUS 0x0080
+#define VIRTIO_CCW_DOING_SET_IND 0x0100
+#define VIRTIO_CCW_DOING_READ_VQ_CONF 0x0200
+#define VIRTIO_CCW_DOING_SET_CONF_IND 0x0400
+#define VIRTIO_CCW_INTPARM_MASK 0x
+
[PATCH 3/5] s390: Add a mechanism to get the subchannel id.
This will be needed by the new virtio-ccw transport.

Signed-off-by: Cornelia Huck <cornelia.h...@de.ibm.com>
---
 arch/s390/include/asm/ccwdev.h |  5 +
 drivers/s390/cio/device_ops.c  | 12
 2 files changed, 17 insertions(+)

diff --git a/arch/s390/include/asm/ccwdev.h b/arch/s390/include/asm/ccwdev.h
index 1cb4bb3..9ad79f7 100644
--- a/arch/s390/include/asm/ccwdev.h
+++ b/arch/s390/include/asm/ccwdev.h
@@ -18,6 +18,9 @@ struct irb;
 struct ccw1;
 struct ccw_dev_id;
 
+/* from asm/schid.h */
+struct subchannel_id;
+
 /* simplified initializers for struct ccw_device:
  * CCW_DEVICE and CCW_DEVICE_DEVTYPE initialize one
  * entry in your MODULE_DEVICE_TABLE and set the match_flag correctly */
@@ -226,5 +229,7 @@ int ccw_device_siosl(struct ccw_device *);
 // FIXME: these have to go
 extern int _ccw_device_get_subchannel_number(struct ccw_device *);
 
+extern void ccw_device_get_schid(struct ccw_device *, struct subchannel_id *);
+
 extern void *ccw_device_get_chp_desc(struct ccw_device *, int);
 #endif /* _S390_CCWDEV_H_ */
diff --git a/drivers/s390/cio/device_ops.c b/drivers/s390/cio/device_ops.c
index ec7fb6d..2ad832f 100644
--- a/drivers/s390/cio/device_ops.c
+++ b/drivers/s390/cio/device_ops.c
@@ -763,6 +763,18 @@ _ccw_device_get_subchannel_number(struct ccw_device *cdev)
 	return cdev->private->schid.sch_no;
 }
 
+/**
+ * ccw_device_get_schid - obtain a subchannel id
+ * @cdev: device to obtain the id for
+ * @schid: where to fill in the values
+ */
+void ccw_device_get_schid(struct ccw_device *cdev, struct subchannel_id *schid)
+{
+	struct subchannel *sch = to_subchannel(cdev->dev.parent);
+
+	*schid = sch->schid;
+}
+EXPORT_SYMBOL_GPL(ccw_device_get_schid);
 
 MODULE_LICENSE("GPL");
 EXPORT_SYMBOL(ccw_device_set_options_mask);
--
1.7.12.4
[PATCH 5/5] KVM: s390: Split out early console code.
This code is transport agnostic and can be used by both the legacy virtio code and virtio_ccw. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- drivers/s390/kvm/Makefile | 2 +- drivers/s390/kvm/early_printk.c | 42 + drivers/s390/kvm/kvm_virtio.c | 29 ++-- drivers/s390/kvm/virtio_ccw.c | 1 - 4 files changed, 45 insertions(+), 29 deletions(-) create mode 100644 drivers/s390/kvm/early_printk.c diff --git a/drivers/s390/kvm/Makefile b/drivers/s390/kvm/Makefile index 241891a..a3c8fc4 100644 --- a/drivers/s390/kvm/Makefile +++ b/drivers/s390/kvm/Makefile @@ -6,4 +6,4 @@ # it under the terms of the GNU General Public License (version 2 only) # as published by the Free Software Foundation. -obj-$(CONFIG_S390_GUEST) += kvm_virtio.o virtio_ccw.o +obj-$(CONFIG_S390_GUEST) += kvm_virtio.o early_printk.o virtio_ccw.o diff --git a/drivers/s390/kvm/early_printk.c b/drivers/s390/kvm/early_printk.c new file mode 100644 index 000..7831530 --- /dev/null +++ b/drivers/s390/kvm/early_printk.c @@ -0,0 +1,42 @@ +/* + * early_printk.c - code for early console output with virtio_console + * split off from kvm_virtio.c + * + * Copyright IBM Corp. 2008 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License (version 2 only) + * as published by the Free Software Foundation. 
+ * + *Author(s): Christian Borntraeger borntrae...@de.ibm.com + */ + +#include linux/kernel_stat.h +#include linux/init.h +#include linux/err.h +#include linux/virtio_console.h +#include asm/kvm_para.h +#include asm/kvm_virtio.h +#include asm/setup.h +#include asm/sclp.h + +static __init int early_put_chars(u32 vtermno, const char *buf, int count) +{ + char scratch[17]; + unsigned int len = count; + + if (len sizeof(scratch) - 1) + len = sizeof(scratch) - 1; + scratch[len] = '\0'; + memcpy(scratch, buf, len); + kvm_hypercall1(KVM_S390_VIRTIO_NOTIFY, __pa(scratch)); + return len; +} + +static int __init s390_virtio_console_init(void) +{ + if (sclp_has_vt220() || sclp_has_linemode()) + return -ENODEV; + return virtio_cons_early_init(early_put_chars); +} +console_initcall(s390_virtio_console_init); diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c index 76b95f3..6cdc66a 100644 --- a/drivers/s390/kvm/kvm_virtio.c +++ b/drivers/s390/kvm/kvm_virtio.c @@ -17,7 +17,6 @@ #include linux/virtio.h #include linux/virtio_config.h #include linux/slab.h -#include linux/virtio_console.h #include linux/interrupt.h #include linux/virtio_ring.h #include linux/export.h @@ -25,9 +24,9 @@ #include asm/io.h #include asm/kvm_para.h #include asm/kvm_virtio.h -#include asm/sclp.h #include asm/setup.h #include asm/irq.h +#include asm/sclp.h #define VIRTIO_SUBCODE_64 0x0D00 @@ -450,8 +449,7 @@ static int __init kvm_devices_init(void) return -ENODEV; if (test_devices_support(real_memory_size) 0) - /* No error. 
*/ - return 0; + return -ENODEV; rc = vmem_add_mapping(real_memory_size, PAGE_SIZE); if (rc) @@ -476,29 +474,6 @@ static int __init kvm_devices_init(void) return 0; } -/* code for early console output with virtio_console */ -static __init int early_put_chars(u32 vtermno, const char *buf, int count) -{ - char scratch[17]; - unsigned int len = count; - - if (len sizeof(scratch) - 1) - len = sizeof(scratch) - 1; - scratch[len] = '\0'; - memcpy(scratch, buf, len); - kvm_hypercall1(KVM_S390_VIRTIO_NOTIFY, __pa(scratch)); - return len; -} - -static int __init s390_virtio_console_init(void) -{ - if (sclp_has_vt220() || sclp_has_linemode()) - return -ENODEV; - return virtio_cons_early_init(early_put_chars); -} -console_initcall(s390_virtio_console_init); - - /* * We do this after core stuff, but before the drivers. */ diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c index 4be878f..135126a 100644 --- a/drivers/s390/kvm/virtio_ccw.c +++ b/drivers/s390/kvm/virtio_ccw.c @@ -17,7 +17,6 @@ #include linux/virtio.h #include linux/virtio_config.h #include linux/slab.h -#include linux/virtio_console.h #include linux/interrupt.h #include linux/virtio_ring.h #include linux/pfn.h -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
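For illustration only (not part of the posted patch): the clamping done by early_put_chars() can be tried out in userspace. This is a minimal sketch under stated assumptions. `emit_chunk()` is a hypothetical stand-in for the KVM_S390_VIRTIO_NOTIFY hypercall and simply appends to a buffer, and `put_all()` is an assumed caller, not something the patch defines. It shows that at most 16 bytes are consumed per call and that the return value tells the caller how far to advance.

```c
#include <assert.h>
#include <string.h>

#define SCRATCH_SIZE 17	/* same size as the scratch buffer in the patch */

/* Hypothetical sink standing in for the hypercall: collects the output. */
static char sink[256];
static size_t sink_len;

static void emit_chunk(const char *chunk)
{
	size_t n = strlen(chunk);

	assert(sink_len + n < sizeof(sink));
	memcpy(sink + sink_len, chunk, n);
	sink_len += n;
	sink[sink_len] = '\0';
}

/* Mirrors the clamping logic: copy at most 16 bytes and NUL-terminate. */
static int put_chars_sketch(const char *buf, int count)
{
	char scratch[SCRATCH_SIZE];
	unsigned int len = count;

	if (len > sizeof(scratch) - 1)
		len = sizeof(scratch) - 1;
	scratch[len] = '\0';
	memcpy(scratch, buf, len);
	emit_chunk(scratch);
	return len;
}

/* Assumed caller: loops until the whole buffer has been consumed. */
static void put_all(const char *buf, int count)
{
	while (count > 0) {
		int done = put_chars_sketch(buf, count);

		buf += done;
		count -= done;
	}
}
```

A 19 byte string therefore takes two calls (16 bytes, then 3), which is why a caller has to honour the returned length rather than assume the whole buffer went out.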
KVM call agenda for 2012-10-30
Hi,

Please send in any agenda topics you are interested in.

Later, Juan.
[PATCH V2 RFC 0/3] kvm: Improving undercommit,overcommit scenarios
In some special scenarios like #vcpu = #pcpu, the PLE handler may prove very costly, because there is no need to iterate over vcpus and do unsuccessful yield_to calls, burning CPU.

Similarly, when we have a large number of small guests, it is possible that a spinning vcpu fails to yield_to any vcpu of the same VM and goes back to spinning. This is also not effective when we are over-committed. Instead, we do a yield() so that we give other VMs a chance to run.

This series tries to optimize the above scenarios.

The first patch optimizes all yield_to calls by bailing out when there is no need to continue (i.e., when there is only one task in the source and target rq).

The second patch uses that in the PLE handler.

The third patch uses overall system load knowledge to decide whether to continue in the yield_to handler, and also to yield in overcommit cases. To be precise,

* loadavg is converted to a scale of 2048 / per CPU
* a load value of less than 1024 is considered as undercommit and we return from the PLE handler in those cases
* a load value of greater than 3584 (1.75 * 2048) is considered as overcommit and we yield to other VMs in such cases.

(let threshold = 2048)
Rationale for using threshold/2 for undercommit limit:
Having a load below (0.5 * threshold) is used to avoid (the concern raised by Rik) scenarios where we still have lock holder preempted vcpu waiting to be scheduled. (scenario arises when rq length is 1 even when we are under committed)

Rationale for using (1.75 * threshold) for overcommit scenario:
This is a heuristic where we should probably see rq length 1 and a vcpu of a different VM is waiting to be scheduled.

Related future work (independent of this series):
- Dynamically changing PLE window depending on system load.

Result on 3.7.0-rc1 kernel shows around 146% improvement for ebizzy 1x with 32 core PLE machine with 32 vcpu guest. I believe we should get very good improvements for overcommit (especially 2) on large machines with small vcpu guests.
(Could not test this as I do not have access to a bigger machine)

base = 3.7.0-rc1
machine: 32 core mx3850 x5 PLE mc

+----+------------+------------+------------+------------+-----------+
          ebizzy (rec/sec, higher is better)
+----+------------+------------+------------+------------+-----------+
       base         stdev        patched      stdev        %improve
+----+------------+------------+------------+------------+-----------+
 1x    2543.3750      20.2903    6279.3750      82.5226    146.89143
 2x    2410.8750      96.4327    2450.7500     207.8136      1.65396
 3x    2184.9167     205.5226    2178.97.2034               -0.30131
+----+------------+------------+------------+------------+-----------+

+----+------------+------------+------------+------------+-----------+
          dbench (throughput in MB/sec, higher is better)
+----+------------+------------+------------+------------+-----------+
       base         stdev        patched      stdev        %improve
+----+------------+------------+------------+------------+-----------+
 1x    5545.4330     596.4344    7042.8510    1012.0924     27.00272
 2x    1993.0970      43.6548    1990.6200      75.7837     -0.12428
 3x    1295.3867      22.3997    1315.5208      36.0075      1.55429
+----+------------+------------+------------+------------+-----------+

Changes since V1:
- Discard the idea of exporting nrrunning and optimize in core scheduler (Peter)
- Use yield() instead of schedule in overcommit scenarios (Rik)
- Use loadavg knowledge to detect undercommit/overcommit

Peter Zijlstra (1):
  Bail out of yield_to when source and target runqueue has one task

Raghavendra K T (2):
  Handle yield_to failure return for potential undercommit case
  Check system load and handle different commit cases accordingly

Please let me know your comments and suggestions.

Link for V1: https://lkml.org/lkml/2012/9/21/168

 kernel/sched/core.c | 25 +++--
 virt/kvm/kvm_main.c | 56 ++--
 2 files changed, 65 insertions(+), 16 deletions(-)
[PATCH V2 RFC 1/3] sched: Bail out of yield_to when source and target runqueue has one task
From: Peter Zijlstra pet...@infradead.org

In undercommitted scenarios, especially with large guests, the yield_to overhead is significantly high. When the run queue length of both source and target is one, take the opportunity to bail out and return -ESRCH. This return condition can be further exploited to quickly come out of the PLE handler.

Signed-off-by: Peter Zijlstra pet...@infradead.org
Raghavendra, Checking the rq length of target vcpu condition added.
Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 kernel/sched/core.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..fc219a5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
  * It's the caller's job to ensure that the target task struct
  * can't go away on us before we can do any checks.
  *
- * Returns true if we indeed boosted the target task.
+ * Returns:
+ *	true (>0) if we indeed boosted the target task.
+ *	false (0) if we failed to boost the target.
+ *	-ESRCH if there's no task to yield to.
  */
 bool __sched yield_to(struct task_struct *p, bool preempt)
 {
@@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
 
 again:
 	p_rq = task_rq(p);
+	/*
+	 * If we're the only runnable task on the rq and target rq also
+	 * has only one task, there's absolutely no point in yielding.
+	 */
+	if (rq->nr_running == 1 && p_rq->nr_running == 1) {
+		yielded = -ESRCH;
+		goto out_irq;
+	}
+
 	double_rq_lock(rq, p_rq);
 	while (task_rq(p) != p_rq) {
 		double_rq_unlock(rq, p_rq);
@@ -4310,13 +4322,13 @@ again:
 	}
 
 	if (!curr->sched_class->yield_to_task)
-		goto out;
+		goto out_unlock;
 
 	if (curr->sched_class != p->sched_class)
-		goto out;
+		goto out_unlock;
 
 	if (task_running(p_rq, p) || p->state)
-		goto out;
+		goto out_unlock;
 
 	yielded = curr->sched_class->yield_to_task(rq, p, preempt);
 	if (yielded) {
@@ -4329,11 +4341,12 @@ again:
 		resched_task(p_rq->curr);
 	}
 
-out:
+out_unlock:
 	double_rq_unlock(rq, p_rq);
+out_irq:
 	local_irq_restore(flags);
 
-	if (yielded)
+	if (yielded > 0)
 		schedule();
 
 	return yielded;
[PATCH V2 RFC 2/3] kvm: Handle yield_to failure return code for potential undercommit case
From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

Also we do not update last boosted vcpu in failure cases.

Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 virt/kvm/kvm_main.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index be70035..e376434 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1639,6 +1639,7 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
 {
 	struct pid *pid;
 	struct task_struct *task = NULL;
+	bool ret = false;
 
 	rcu_read_lock();
 	pid = rcu_dereference(target->pid);
@@ -1646,17 +1647,15 @@ bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
 		task = get_pid_task(target->pid, PIDTYPE_PID);
 	rcu_read_unlock();
 	if (!task)
-		return false;
+		return ret;
 	if (task->flags & PF_VCPU) {
 		put_task_struct(task);
-		return false;
-	}
-	if (yield_to(task, 1)) {
-		put_task_struct(task);
-		return true;
+		return ret;
 	}
+	ret = yield_to(task, 1);
 	put_task_struct(task);
-	return false;
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_yield_to);
 
@@ -1697,6 +1696,7 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
 	return eligible;
 }
 #endif
+
 void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 {
 	struct kvm *kvm = me->kvm;
@@ -1727,11 +1727,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 				continue;
 			if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
 				continue;
-			if (kvm_vcpu_yield_to(vcpu)) {
+
+			yielded = kvm_vcpu_yield_to(vcpu);
+			if (yielded > 0)
 				kvm->last_boosted_vcpu = i;
-				yielded = 1;
+			if (yielded)
 				break;
-			}
 		}
 	}
 	kvm_vcpu_set_in_spin_loop(me, false);
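For illustration only (not part of the posted patch): the way the patched kvm_vcpu_on_spin() loop reacts to the three possible results can be modelled in userspace. This is a sketch under stated assumptions. `spin_loop_sketch()` and its `results` array are hypothetical; each array entry plays the role of one kvm_vcpu_yield_to() result (positive for a successful boost, 0 for a failed attempt, -ESRCH for the undercommit bail-out). The sketch uses int throughout, since a bool return value cannot carry a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Walk the per-candidate results the way the patched loop does:
 *  - a positive result records the candidate as last boosted and stops,
 *  - a negative result (e.g. -ESRCH) stops without recording anything,
 *  - zero means "try the next candidate".
 * Returns the final value of 'yielded'.
 */
static int spin_loop_sketch(const int *results, int n, int *last_boosted)
{
	int yielded = 0;
	int i;

	assert(results != NULL && last_boosted != NULL);
	for (i = 0; i < n; i++) {
		yielded = results[i];
		if (yielded > 0)
			*last_boosted = i;
		if (yielded)
			break;
	}
	return yielded;
}
```

With results {0, -ESRCH, 1} the walk stops at the second candidate and leaves the last boosted index untouched, which is the "do not update last boosted vcpu in failure cases" behaviour the changelog describes.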
[PATCH V2 RFC 3/3] kvm: Check system load and handle different commit cases accordingly
From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

The patch introduces a helper function that calculates the system load (idea borrowed from loadavg calculation). The load is normalized to 2048, i.e., a return value (threshold) of 2048 implies an approximate 1:1 committed guest.

In undercommit cases (load below threshold/2) we simply return from the PLE handler. In overcommit cases (load above 1.75 * threshold) we do a yield(). The rationale is to allow other VMs of the host to run instead of burning the cpu cycle.

Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
Idea of yielding in overcommit cases (especially in large number of small guest cases) was
Acked-by: Rik van Riel r...@redhat.com
Andrew Theurer also has stressed the importance of reducing yield_to overhead and using yield().

(let threshold = 2048)
Rationale for using threshold/2 for undercommit limit:
Having a load below (0.5 * threshold) is used to avoid (the concern raised by Rik) scenarios where we still have lock holder preempted vcpu waiting to be scheduled. (scenario arises when rq length is 1 even when we are under committed)

Rationale for using (1.75 * threshold) for overcommit scenario:
This is a heuristic where we should probably see rq length 1 and a vcpu of a different VM is waiting to be scheduled.
virt/kvm/kvm_main.c | 35 +++ 1 file changed, 35 insertions(+) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index e376434..28bbdfb 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1697,15 +1697,43 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu) } #endif +/* + * A load of 2048 corresponds to 1:1 overcommit + * undercommit threshold is half the 1:1 overcommit + * overcommit threshold is 1.75 times of 1:1 overcommit threshold + */ +#define COMMIT_THRESHOLD (FIXED_1) +#define UNDERCOMMIT_THRESHOLD (COMMIT_THRESHOLD 1) +#define OVERCOMMIT_THRESHOLD ((COMMIT_THRESHOLD 1) - (COMMIT_THRESHOLD 2)) + +unsigned long kvm_system_load(void) +{ + unsigned long load; + + load = avenrun[0] + FIXED_1/200; + load = load / num_online_cpus(); + + return load; +} + void kvm_vcpu_on_spin(struct kvm_vcpu *me) { struct kvm *kvm = me-kvm; struct kvm_vcpu *vcpu; int last_boosted_vcpu = me-kvm-last_boosted_vcpu; int yielded = 0; + unsigned long load; int pass; int i; + load = kvm_system_load(); + /* +* When we are undercomitted let us not waste time in +* iterating over all the VCPUs. +*/ + if (load UNDERCOMMIT_THRESHOLD) + return; + kvm_vcpu_set_in_spin_loop(me, true); /* * We boost the priority of a VCPU that is runnable but not @@ -1735,6 +1763,13 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me) break; } } + /* +* If we are not able to yield especially in overcommit cases +* let us be courteous to other VM's VCPUs waiting to be scheduled. +*/ + if (!yielded load OVERCOMMIT_THRESHOLD) + yield(); + kvm_vcpu_set_in_spin_loop(me, false); /* Ensure vcpu is not eligible during next spinloop */ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
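For illustration only (not part of the posted patch): the load normalization and the two thresholds can be checked in userspace. This is a sketch under stated assumptions. FSHIFT = 11 (so FIXED_1 = 2048) matches the kernel's loadavg fixed-point format. The shift operators in the threshold macros are assumed to be `>> 1`, `<< 1` and `>> 2`, which is the only reading consistent with "half" and "1.75 times" in the comment. `system_load_sketch()` and `classify_load()` are hypothetical names, and the caller passes in the avenrun[0] value and the CPU count instead of reading kernel state.

```c
#include <assert.h>

#define FSHIFT	11			/* bits of fractional precision */
#define FIXED_1	(1UL << FSHIFT)		/* 1.0 in fixed point, i.e. 2048 */

#define COMMIT_THRESHOLD	(FIXED_1)
#define UNDERCOMMIT_THRESHOLD	(COMMIT_THRESHOLD >> 1)
#define OVERCOMMIT_THRESHOLD	((COMMIT_THRESHOLD << 1) - (COMMIT_THRESHOLD >> 2))

enum commit_state { UNDERCOMMIT, NORMAL, OVERCOMMIT };

/* Per-CPU load on the FIXED_1 scale, with the same rounding term as the patch. */
static unsigned long system_load_sketch(unsigned long avenrun0, unsigned int ncpus)
{
	assert(ncpus > 0);
	return (avenrun0 + FIXED_1 / 200) / ncpus;
}

/* Below half of 1:1 is undercommit, above 1.75 times 1:1 is overcommit. */
static enum commit_state classify_load(unsigned long load)
{
	if (load < UNDERCOMMIT_THRESHOLD)
		return UNDERCOMMIT;
	if (load > OVERCOMMIT_THRESHOLD)
		return OVERCOMMIT;
	return NORMAL;
}
```

On a 32 CPU host, a load average of 8.0 normalizes to 512 (undercommit), 32.0 to 2048 (neither), and 64.0 to 4096 (overcommit); the overcommit threshold itself works out to 4096 - 512 = 3584.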
Re: [patch 08/18] x86: pvclock: generic pvclock vsyscall initialization
On 10/24/2012 05:13 PM, Marcelo Tosatti wrote: Index: vsyscall/arch/x86/Kconfig === --- vsyscall.orig/arch/x86/Kconfig +++ vsyscall/arch/x86/Kconfig @@ -632,6 +632,13 @@ config PARAVIRT_SPINLOCKS config PARAVIRT_CLOCK bool +config PARAVIRT_CLOCK_VSYSCALL + bool Paravirt clock vsyscall support + depends on PARAVIRT_CLOCK GENERIC_TIME_VSYSCALL + ---help--- + Enable performance critical clock related system calls to + be executed in userspace, provided that the hypervisor + supports it. endif Besides debugging, what is the point in having this as an extra-selectable? Is there any case in which a virtual machine has code for this, but may decide to run without it ? I believe all this code in vsyscall should be wrapped in PARAVIRT_CLOCK only. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 08/18] x86: pvclock: generic pvclock vsyscall initialization
On 10/24/2012 05:13 PM, Marcelo Tosatti wrote: + */ +int __init pvclock_init_vsyscall(void) +{ + int idx; + unsigned int size = PVCLOCK_VSYSCALL_NR_PAGES*PAGE_SIZE; + + pvclock_vdso_info = __alloc_bootmem(size, PAGE_SIZE, 0); + if (!pvclock_vdso_info) + return -ENOMEM; + + memset(pvclock_vdso_info, 0, size); + + for (idx = 0; idx = (PVCLOCK_FIXMAP_END-PVCLOCK_FIXMAP_BEGIN); idx++) { + __set_fixmap(PVCLOCK_FIXMAP_BEGIN + idx, + __pa_symbol(pvclock_vdso_info) + (idx*PAGE_SIZE), + PAGE_KERNEL_VVAR); BTW, Previous line is whitespace damaged. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 09/18] KVM: x86: introduce facility to support vsyscall pvclock, via MSR
On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
> Allow a guest to register a second location for the VCPU time info
> structure for each vcpu (as described by MSR_KVM_SYSTEM_TIME_NEW).
> This is intended to allow the guest kernel to map this information
> into a usermode accessible page, so that usermode can efficiently
> calculate system time from the TSC without having to make a syscall.
>
> Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Can you please be a bit more specific about why we need this? Why does the host need to provide us with two pages with the exact same data? Why can't we just do it with mapping tricks in the guest?
Re: [patch 11/18] x86: vsyscall: pass mode to gettime backend
On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
> Required by next patch.
>
> Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

I don't see where.
Re: [patch 12/18] x86: vdso: pvclock gettime support
On 10/24/2012 05:13 PM, Marcelo Tosatti wrote: Improve performance of time system calls when using Linux pvclock, by reading time info from fixmap visible copy of pvclock data. Originally from Jeremy Fitzhardinge. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: vsyscall/arch/x86/vdso/vclock_gettime.c === --- vsyscall.orig/arch/x86/vdso/vclock_gettime.c +++ vsyscall/arch/x86/vdso/vclock_gettime.c @@ -22,6 +22,7 @@ #include asm/hpet.h #include asm/unistd.h #include asm/io.h +#include asm/pvclock.h #define gtod (VVAR(vsyscall_gtod_data)) @@ -62,6 +63,69 @@ static notrace cycle_t vread_hpet(void) return readl((const void __iomem *)fix_to_virt(VSYSCALL_HPET) + 0xf0); } +#ifdef CONFIG_PARAVIRT_CLOCK_VSYSCALL + +static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu) +{ + const aligned_pvti_t *pvti_base; + int idx = cpu / (PAGE_SIZE/PVTI_SIZE); + int offset = cpu % (PAGE_SIZE/PVTI_SIZE); + + BUG_ON(PVCLOCK_FIXMAP_BEGIN + idx PVCLOCK_FIXMAP_END); + + pvti_base = (aligned_pvti_t *)__fix_to_virt(PVCLOCK_FIXMAP_BEGIN+idx); + + return pvti_base[offset].info; +} + Unless I am missing something, if gcc decides to not inline get_pvti, this will break, right? I believe you need to mark that function with __always_inline. +static notrace cycle_t vread_pvclock(int *mode) +{ + const struct pvclock_vsyscall_time_info *pvti; + cycle_t ret; + u64 last; + u32 version; + u32 migrate_count; + u8 flags; + unsigned cpu, cpu1; + + + /* + * When looping to get a consistent (time-info, tsc) pair, we + * also need to deal with the possibility we can switch vcpus, + * so make sure we always re-fetch time-info for the current vcpu. + */ + do { + cpu = __getcpu() 0xfff; Please wrap this 0xfff into something meaningful. + pvti = get_pvti(cpu); + + migrate_count = pvti-migrate_count; + + version = __pvclock_read_cycles(pvti-pvti, ret, flags); + + /* + * Test we're still on the cpu as well as the version. 
+ * We could have been migrated just after the first + * vgetcpu but before fetching the version, so we + * wouldn't notice a version change. + */ + cpu1 = __getcpu() 0xfff; + } while (unlikely(cpu != cpu1 || + (pvti-pvti.version 1) || + pvti-pvti.version != version || + pvti-migrate_count != migrate_count)); + + if (unlikely(!(flags PVCLOCK_TSC_STABLE_BIT))) + *mode = VCLOCK_NONE; + + last = VVAR(vsyscall_gtod_data).clock.cycle_last; + + if (likely(ret = last)) + return ret; + Please add a comment here referring to tsc.c, where an explanation of this test lives. This is quite non-obvious for the non initiated. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
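For illustration only (not part of the quoted patch): the index arithmetic in get_pvti() splits a cpu number into a fixmap page index and a slot within that page. This is a sketch under stated assumptions. The 4096 byte page and the 64 byte per-vcpu entry are assumed values chosen for the example (the quoted code only names PAGE_SIZE and PVTI_SIZE), and `pvti_locate()` is a hypothetical helper.

```c
#include <assert.h>

#define PAGE_SIZE_SKETCH	4096u	/* assumed page size */
#define PVTI_SIZE_SKETCH	64u	/* assumed size of one aligned per-vcpu entry */

struct pvti_slot {
	unsigned int page;	/* index added to PVCLOCK_FIXMAP_BEGIN in the patch */
	unsigned int offset;	/* entry index within that page */
};

/* Same division and modulo as get_pvti(), without touching any mapping. */
static struct pvti_slot pvti_locate(unsigned int cpu)
{
	const unsigned int per_page = PAGE_SIZE_SKETCH / PVTI_SIZE_SKETCH;
	struct pvti_slot slot;

	assert(per_page > 0);
	slot.page = cpu / per_page;
	slot.offset = cpu % per_page;
	return slot;
}
```

With these assumed sizes there are 64 entries per page, so cpu 63 is the last slot of page 0 and cpu 64 is the first slot of page 1.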
Re: [patch 13/18] KVM: x86: pass host_tsc to read_l1_tsc
On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
> Allow the caller to pass host tsc value to kvm_x86_ops->read_l1_tsc().
>
> Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Would you mind explaining why? It seems to me that rdtscll() here would be perfectly safe: the only case in which it wouldn't is a nested-vm environment running paravirt-linux with a paravirt tsc. In this case, it is quite likely that we'll want rdtscll *anyway*, instead of going to tsc directly.
Re: Can we run guest OS without using NAT and iptables?
On 10/29/2012 05:30 AM, Stefan Hajnoczi wrote:
> On Mon, Oct 29, 2012 at 12:55:43PM +0530, freak 62 wrote:
>> Can we run guest o.s. on KVM without enabling NAT and iptables?
>>
>> The reason to do this is, I wanted to disable conntrack module from
>> my system and to disable that I must have to delete iptable and NAT.
>>
>> I am getting the following message when I start guest o.s. on KVM
>> (iptable and NAT disabled):
>>
>> Error starting domain: internal error 'Network default' is not active.
>>
>> Is there any way to run guest o.s. with NAT disabled?
>> or
>> Is there any way to disable conntrack module and still use KVM to
>> run guest OS?
>>
>> I am using Ubuntu 10.04

You can remove the default virsh network like

  sudo virsh net-destroy default
  sudo virsh net-undefine default

The most common networking setup that doesn't use NAT + iptables is probably bridged networking:

http://wiki.libvirt.org/page/Networking#Bridged_networking_.28aka_.22shared_physical_device.22.29

- Cole
Re: [patch 08/18] x86: pvclock: generic pvclock vsyscall initialization
On Mon, Oct 29, 2012 at 06:18:20PM +0400, Glauber Costa wrote: On 10/24/2012 05:13 PM, Marcelo Tosatti wrote: Index: vsyscall/arch/x86/Kconfig === --- vsyscall.orig/arch/x86/Kconfig +++ vsyscall/arch/x86/Kconfig @@ -632,6 +632,13 @@ config PARAVIRT_SPINLOCKS config PARAVIRT_CLOCK bool +config PARAVIRT_CLOCK_VSYSCALL + bool Paravirt clock vsyscall support + depends on PARAVIRT_CLOCK GENERIC_TIME_VSYSCALL + ---help--- + Enable performance critical clock related system calls to + be executed in userspace, provided that the hypervisor + supports it. endif Besides debugging, what is the point in having this as an extra-selectable? Is there any case in which a virtual machine has code for this, but may decide to run without it ? Don't think so (its pretty small anyway, the code). I believe all this code in vsyscall should be wrapped in PARAVIRT_CLOCK only. Unless Jeremy has a reason, i'm fine with that. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 3/8] tun: report orphan frags errors to zero copy callback
When tun transmits a zero copy skb, it orphans the frags, which might need to allocate extra memory in atomic context. If that fails, notify the ubufs callback before freeing the skb, as a hint that the device should disable zerocopy mode.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/net/tun.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 3157519..613f826 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -433,6 +433,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 
 drop:
 	dev->stats.tx_dropped++;
+	skb_tx_error(skb, -ENOMEM);
 	kfree_skb(skb);
 	return NETDEV_TX_OK;
 }
-- 
MST
[PATCH net-next 0/8] enable/disable zero copy tx dynamically
tun supports zero copy transmit since 0690899b4d4501b3505be069b9a687e68ccbe15b, however you can only enable this mode if you know your workload does not trigger heavy guest to host/host to guest traffic - otherwise you get a (minor) performance regression.

This patchset addresses this problem by notifying the owner device when callback is invoked because of a data copy. This makes it possible to detect whether zero copy is appropriate dynamically: we start in zero copy mode, when we detect data copied we disable zero copy for a while.

With this patch applied, I get the same performance for guest to host and guest to guest both with and without zero copy tx.

Michael S. Tsirkin (8):
  skb: report completion status for zero copy skbs
  skb: api to report errors for zero copy skbs
  tun: report orphan frags errors to zero copy callback
  vhost-net: cleanup macros for DMA status tracking
  vhost: track zero copy failures using DMA length
  vhost: move -net specific code out
  vhost-net: select tx zero copy dynamically
  vhost-net: reduce vq polling on tx zerocopy

 drivers/net/tun.c         |   1 +
 drivers/vhost/net.c       | 109 +++---
 drivers/vhost/tcm_vhost.c |   1 +
 drivers/vhost/vhost.c     |  52 +++---
 drivers/vhost/vhost.h     |  11 ++---
 include/linux/skbuff.h    |   5 ++-
 net/core/skbuff.c         |  23 +-
 7 files changed, 141 insertions(+), 61 deletions(-)

-- 
MST
[PATCH net-next 1/8] skb: report completion status for zero copy skbs
Even if skb is marked for zero copy, net core might still decide to copy it later which is somewhat slower than a copy in user context: besides copying the data we need to pin/unpin the pages. Add a parameter reporting such cases through zero copy callback: if this happens a lot, device can take this into account and switch to copying in user context. This patch updates all users but ignores the passed value for now: it will be used by follow-up patches. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/vhost/vhost.c | 2 +- drivers/vhost/vhost.h | 2 +- include/linux/skbuff.h | 4 +++- net/core/skbuff.c | 4 ++-- 4 files changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 99ac2cb..92308b6 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1600,7 +1600,7 @@ void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *ubufs) kfree(ubufs); } -void vhost_zerocopy_callback(struct ubuf_info *ubuf) +void vhost_zerocopy_callback(struct ubuf_info *ubuf, int zerocopy_status) { struct vhost_ubuf_ref *ubufs = ubuf-ctx; struct vhost_virtqueue *vq = ubufs-vq; diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 1125af3..eb7263c3 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -191,7 +191,7 @@ bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *); int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log, unsigned int log_num, u64 len); -void vhost_zerocopy_callback(struct ubuf_info *); +void vhost_zerocopy_callback(struct ubuf_info *, int); int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq); #define vq_err(vq, fmt, ...) 
do { \ diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 6a2c34e..8bac11b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -235,11 +235,13 @@ enum { /* * The callback notifies userspace to release buffers when skb DMA is done in * lower device, the skb last reference should be 0 when calling this. + * The zerocopy_status argument is 0 if zero copy transmit occurred, + * 1 on successful data copy; 0 on out of memory error. * The ctx field is used to track device context. * The desc field is used to track userspace buffer index. */ struct ubuf_info { - void (*callback)(struct ubuf_info *); + void (*callback)(struct ubuf_info *, int zerocopy_status); void *ctx; unsigned long desc; }; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 6e04b1f..eb31f6e 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -519,7 +519,7 @@ static void skb_release_data(struct sk_buff *skb) uarg = skb_shinfo(skb)-destructor_arg; if (uarg-callback) - uarg-callback(uarg); + uarg-callback(uarg, 0); } if (skb_has_frag_list(skb)) @@ -797,7 +797,7 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask) for (i = 0; i num_frags; i++) skb_frag_unref(skb, i); - uarg-callback(uarg); + uarg-callback(uarg, 1); /* skb frags point to kernel buffers */ for (i = num_frags - 1; i = 0; i--) { -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 2/8] skb: api to report errors for zero copy skbs
Orphaning frags for zero copy skbs needs to allocate data in atomic context, so it has a chance to fail. If it does, we currently discard the skb, which is safe, but we don't report anything to the caller, so it cannot recover by e.g. disabling zero copy.

Add an API to free skb reporting such errors: this is used by tun in case orphaning frags fails.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 include/linux/skbuff.h |  1 +
 net/core/skbuff.c      | 19 +++
 2 files changed, 20 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8bac11b..0644432 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -568,6 +568,7 @@ static inline struct rtable *skb_rtable(const struct sk_buff *skb)
 }
 
 extern void kfree_skb(struct sk_buff *skb);
+extern void skb_tx_error(struct sk_buff *skb, int err);
 extern void consume_skb(struct sk_buff *skb);
 extern void __kfree_skb(struct sk_buff *skb);
 extern struct kmem_cache *skbuff_head_cache;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index eb31f6e..ad99c64 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -635,6 +635,25 @@ void kfree_skb(struct sk_buff *skb)
 EXPORT_SYMBOL(kfree_skb);
 
 /**
+ * kfree_skb_on_error - report an sk_buff xmit error
+ * @skb: buffer that triggered an error
+ *
+ * Report xmit error if a device callback is tracking this skb.
+ */
+void skb_tx_error(struct sk_buff *skb, int err)
+{
+	if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
+		struct ubuf_info *uarg;
+
+		uarg = skb_shinfo(skb)->destructor_arg;
+		if (uarg->callback)
+			uarg->callback(uarg, err);
+		skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
+	}
+}
+EXPORT_SYMBOL(skb_tx_error);
+
+/**
  * consume_skb - free an skbuff
  * @skb: buffer to free
  *
-- 
MST
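For illustration only (not part of the posted patch): the report-once behaviour of skb_tx_error() can be modelled in userspace. This is a sketch under stated assumptions. `struct ubuf_sketch`, `struct skb_sketch`, `record_status()` and `tx_error_sketch()` are hypothetical stand-ins for ubuf_info, the skb with its shared info, a device callback, and skb_tx_error() itself. The point it shows is that the zero copy flag is cleared after the error is reported, so a later release of the same buffer does not invoke the callback a second time.

```c
#include <assert.h>
#include <errno.h>	/* ENOMEM, for callers reporting an allocation failure */
#include <stddef.h>

/* Simplified stand-in for struct ubuf_info (userspace only). */
struct ubuf_sketch {
	void (*callback)(struct ubuf_sketch *uarg, int status);
	int last_status;	/* status seen by the most recent callback */
	int calls;		/* how many times the callback has run */
};

/* Simplified stand-in for an skb carrying a zero copy buffer. */
struct skb_sketch {
	int zerocopy;			/* models the SKBTX_DEV_ZEROCOPY flag */
	struct ubuf_sketch *uarg;	/* models destructor_arg */
};

/* Example device callback: just remembers what it was told. */
static void record_status(struct ubuf_sketch *uarg, int status)
{
	uarg->last_status = status;
	uarg->calls++;
}

/* Report the error once, then drop the zero copy marking. */
static void tx_error_sketch(struct skb_sketch *skb, int err)
{
	assert(skb != NULL);
	if (skb->zerocopy) {
		struct ubuf_sketch *uarg = skb->uarg;

		if (uarg->callback)
			uarg->callback(uarg, err);
		skb->zerocopy = 0;
	}
}
```

Calling `tx_error_sketch(&skb, -ENOMEM)` twice on the same buffer runs the callback only the first time, mirroring how the real function clears SKBTX_DEV_ZEROCOPY.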
[PATCH net-next 8/8] vhost-net: reduce vq polling on tx zerocopy
It seems that to avoid deadlocks it is enough to poll vq before we are going to use the last buffer. This should be faster than c70aa540c7a9f67add11ad3161096fb95233aa2e.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8e9de79..3967f82 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -197,8 +197,16 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, int status)
 {
 	struct vhost_ubuf_ref *ubufs = ubuf->ctx;
 	struct vhost_virtqueue *vq = ubufs->vq;
-
-	vhost_poll_queue(&vq->poll);
+	int cnt = atomic_read(&ubufs->kref.refcount);
+
+	/*
+	 * Trigger polling thread if guest stopped submitting new buffers:
+	 * in this case, the refcount after decrement will eventually reach 1
+	 * so here it is 2.
+	 * We also trigger polling periodically after each 16 packets.
+	 */
+	if (cnt <= 2 || !(cnt % 16))
+		vhost_poll_queue(&vq->poll);
 	/* set len to mark this desc buffers done DMA */
 	vq->heads[ubuf->desc].len = status ?
 		VHOST_DMA_FAILED_LEN : VHOST_DMA_DONE_LEN;
-- 
MST
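For illustration only (not part of the posted patch): the wake-up condition can be isolated as a pure function. This is a sketch under stated assumptions. `should_queue_poll()` is a hypothetical name, and the comparison is assumed to be `cnt <= 2`, as the comment in the patch implies (the refcount is 2 at the point where the last outstanding buffer completes).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether the completion callback should kick the polling thread:
 * either the outstanding buffers are about to run out (cnt <= 2), or a
 * multiple of 16 completions has been reached.
 */
static bool should_queue_poll(int cnt)
{
	assert(cnt >= 0);
	return cnt <= 2 || !(cnt % 16);
}
```

With many buffers in flight this kicks the thread on only one completion in sixteen, instead of on every completion as before the patch.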
[PATCH net-next 4/8] vhost-net: cleanup macros for DMA status tracking
Better document macros for DMA tracking. Add an explicit one for
DMA in progress instead of relying on user supplying len != 1.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/vhost/net.c   |  3 ++-
 drivers/vhost/vhost.c |  2 +-
 drivers/vhost/vhost.h | 12 +++++++++---
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 072cbba..f80ae5f 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -237,7 +237,8 @@ static void handle_tx(struct vhost_net *net)
 			} else {
 				struct ubuf_info *ubuf = &vq->ubuf_info[head];
 
-				vq->heads[vq->upend_idx].len = len;
+				vq->heads[vq->upend_idx].len =
+					VHOST_DMA_IN_PROGRESS;
 				ubuf->callback = vhost_zerocopy_callback;
 				ubuf->ctx = vq->ubufs;
 				ubuf->desc = vq->upend_idx;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 92308b6..906fd9f 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1606,7 +1606,7 @@ void vhost_zerocopy_callback(struct ubuf_info *ubuf, int zerocopy_status)
 	struct vhost_virtqueue *vq = ubufs->vq;
 
 	vhost_poll_queue(&vq->poll);
-	/* set len = 1 to mark this desc buffers done DMA */
+	/* set len to mark this desc buffers done DMA */
 	vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
 	kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
 }
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index eb7263c3..ad72a1f 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -13,9 +13,15 @@
 #include <linux/virtio_ring.h>
 #include <linux/atomic.h>
 
-/* This is for zerocopy, used buffer len is set to 1 when lower device DMA
- * done */
-#define VHOST_DMA_DONE_LEN	1
+/*
+ * For transmit, used buffer len is unused; we override it to track buffer
+ * status internally; used for zerocopy tx only.
+ */
+/* Lower device DMA done */
+#define VHOST_DMA_DONE_LEN	2
+/* Lower device DMA in progress */
+#define VHOST_DMA_IN_PROGRESS	1
+/* Buffer unused */
 #define VHOST_DMA_CLEAR_LEN	0
 
 struct vhost_device;
-- 
MST
[PATCH net-next 6/8] vhost: move -net specific code out
Zerocopy handling code is vhost-net specific. Move it from vhost.c/vhost.h out to net.c Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/vhost/net.c | 45 drivers/vhost/tcm_vhost.c | 1 + drivers/vhost/vhost.c | 53 +++ drivers/vhost/vhost.h | 21 +++ 4 files changed, 56 insertions(+), 64 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index f80ae5f..532fc88 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -126,6 +126,42 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock) net-tx_poll_state = VHOST_NET_POLL_STARTED; } +/* In case of DMA done not in order in lower device driver for some reason. + * upend_idx is used to track end of used idx, done_idx is used to track head + * of used idx. Once lower device DMA done contiguously, we will signal KVM + * guest used idx. + */ +int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq) +{ + int i; + int j = 0; + + for (i = vq-done_idx; i != vq-upend_idx; i = (i + 1) % UIO_MAXIOV) { + if (VHOST_DMA_IS_DONE(vq-heads[i].len)) { + vq-heads[i].len = VHOST_DMA_CLEAR_LEN; + vhost_add_used_and_signal(vq-dev, vq, + vq-heads[i].id, 0); + ++j; + } else + break; + } + if (j) + vq-done_idx = i; + return j; +} + +static void vhost_zerocopy_callback(struct ubuf_info *ubuf, int status) +{ + struct vhost_ubuf_ref *ubufs = ubuf-ctx; + struct vhost_virtqueue *vq = ubufs-vq; + + vhost_poll_queue(vq-poll); + /* set len to mark this desc buffers done DMA */ + vq-heads[ubuf-desc].len = status ? + VHOST_DMA_FAILED_LEN : VHOST_DMA_DONE_LEN; + vhost_ubuf_put(ubufs); +} + /* Expects to be always run from workqueue - which acts as * read-size critical section for our kind of RCU. 
*/ static void handle_tx(struct vhost_net *net) @@ -594,9 +630,18 @@ static int vhost_net_release(struct inode *inode, struct file *f) struct vhost_net *n = f-private_data; struct socket *tx_sock; struct socket *rx_sock; + int i; vhost_net_stop(n, tx_sock, rx_sock); vhost_net_flush(n); + vhost_dev_stop(n-dev); + for (i = 0; i n-dev.nvqs; ++i) { + /* Wait for all lower device DMAs done. */ + if (n-dev.vqs[i].ubufs) + vhost_ubuf_put_and_wait(n-dev.vqs[i].ubufs); + + vhost_zerocopy_signal_used(n, n-dev.vqs[i]); + } vhost_dev_cleanup(n-dev, false); if (tx_sock) fput(tx_sock-file); diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c index aa31692..23c138f 100644 --- a/drivers/vhost/tcm_vhost.c +++ b/drivers/vhost/tcm_vhost.c @@ -895,6 +895,7 @@ static int vhost_scsi_release(struct inode *inode, struct file *f) vhost_scsi_clear_endpoint(s, backend); } + vhost_dev_stop(s-dev); vhost_dev_cleanup(s-dev, false); kfree(s); return 0; diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 5affce3..ef8f598 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -26,10 +26,6 @@ #include linux/kthread.h #include linux/cgroup.h -#include linux/net.h -#include linux/if_packet.h -#include linux/if_arp.h - #include vhost.h enum { @@ -414,28 +410,16 @@ long vhost_dev_reset_owner(struct vhost_dev *dev) return 0; } -/* In case of DMA done not in order in lower device driver for some reason. - * upend_idx is used to track end of used idx, done_idx is used to track head - * of used idx. Once lower device DMA done contiguously, we will signal KVM - * guest used idx. 
- */ -int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq) +void vhost_dev_stop(struct vhost_dev *dev) { int i; - int j = 0; - - for (i = vq-done_idx; i != vq-upend_idx; i = (i + 1) % UIO_MAXIOV) { - if (VHOST_DMA_IS_DONE(vq-heads[i].len)) { - vq-heads[i].len = VHOST_DMA_CLEAR_LEN; - vhost_add_used_and_signal(vq-dev, vq, - vq-heads[i].id, 0); - ++j; - } else - break; + + for (i = 0; i dev-nvqs; ++i) { + if (dev-vqs[i].kick dev-vqs[i].handle_kick) { + vhost_poll_stop(dev-vqs[i].poll); + vhost_poll_flush(dev-vqs[i].poll); + } } - if (j) - vq-done_idx = i; - return j; } /* Caller should have device mutex if and only if locked is set */ @@ -444,17 +428,6 @@ void vhost_dev_cleanup(struct vhost_dev *dev, bool locked) int i; for (i = 0;
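The ring walk done by vhost_zerocopy_signal_used() can be illustrated in user space. This is a minimal sketch, assuming a small fixed ring in place of UIO_MAXIOV and a bare status array in place of vq->heads; struct ring, signal_used and the DMA_* constants are hypothetical names for illustration only, and the call that reports each buffer to the guest is left out.

```c
#define RING_SIZE	8	/* stand-in for UIO_MAXIOV */
#define DMA_CLEAR	0	/* slot unused */
#define DMA_IN_PROGRESS	1	/* lower device still owns the buffer */
#define DMA_DONE	2	/* lower device finished with the buffer */

struct ring {
	int len[RING_SIZE];	/* per-slot DMA status */
	int done_idx;		/* first slot not yet reported as used */
	int upend_idx;		/* one past the last submitted slot */
};

/*
 * Report the contiguous run of completed slots starting at done_idx.
 * Completions may arrive out of order, so the walk stops at the first
 * slot that is still in flight, even if later slots are already done.
 */
static int signal_used(struct ring *r)
{
	int i;
	int j = 0;

	for (i = r->done_idx; i != r->upend_idx; i = (i + 1) % RING_SIZE) {
		if (r->len[i] < DMA_DONE)
			break;
		r->len[i] = DMA_CLEAR;
		++j;
	}
	if (j)
		r->done_idx = i;
	return j;
}
```

The return value is the number of slots released in this pass; a later call picks up again once the blocking slot completes.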
[PATCH net-next 7/8] vhost-net: select tx zero copy dynamically
Even when vhost-net is in zero-copy transmit mode, net core might still decide to copy the skb later which is somewhat slower than a copy in user context: data copy overhead is added to the cost of page pin/unpin. The result is that enabling tx zero copy option leads to higher CPU utilization for guest to guest and guest to host traffic. To fix this, suppress zero copy tx after a given number of packets triggered late data copy. Re-enable periodically to detect workload changes. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- drivers/vhost/net.c | 55 - 1 file changed, 50 insertions(+), 5 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 532fc88..8e9de79 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -42,6 +42,21 @@ MODULE_PARM_DESC(experimental_zcopytx, Enable Experimental Zero Copy TX); #define VHOST_MAX_PEND 128 #define VHOST_GOODCOPY_LEN 256 +/* + * For transmit, used buffer len is unused; we override it to track buffer + * status internally; used for zerocopy tx only. + */ +/* Lower device DMA failed */ +#define VHOST_DMA_FAILED_LEN 3 +/* Lower device DMA done */ +#define VHOST_DMA_DONE_LEN 2 +/* Lower device DMA in progress */ +#define VHOST_DMA_IN_PROGRESS 1 +/* Buffer unused */ +#define VHOST_DMA_CLEAR_LEN0 + +#define VHOST_DMA_IS_DONE(len) ((len) = VHOST_DMA_DONE_LEN) + enum { VHOST_NET_VQ_RX = 0, VHOST_NET_VQ_TX = 1, @@ -62,8 +77,33 @@ struct vhost_net { * We only do this when socket buffer fills up. * Protected by tx vq lock. */ enum vhost_net_poll_state tx_poll_state; + /* Number of TX recently submitted. +* Protected by tx vq lock. */ + unsigned tx_packets; + /* Number of times zerocopy TX recently failed. +* Protected by tx vq lock. 
*/ + unsigned tx_zcopy_err; }; +static void vhost_net_tx_packet(struct vhost_net *net) +{ + ++net-tx_packets; + if (net-tx_packets 1024) + return; + net-tx_packets = 0; + net-tx_zcopy_err = 0; +} + +static void vhost_net_tx_err(struct vhost_net *net) +{ + ++net-tx_zcopy_err; +} + +static bool vhost_net_tx_select_zcopy(struct vhost_net *net) +{ + return net-tx_packets / 64 = net-tx_zcopy_err; +} + static bool vhost_sock_zcopy(struct socket *sock) { return unlikely(experimental_zcopytx) @@ -131,12 +171,15 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock) * of used idx. Once lower device DMA done contiguously, we will signal KVM * guest used idx. */ -int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq) +static int vhost_zerocopy_signal_used(struct vhost_net *net, + struct vhost_virtqueue *vq) { int i; int j = 0; for (i = vq-done_idx; i != vq-upend_idx; i = (i + 1) % UIO_MAXIOV) { + if (vq-heads[i].len == VHOST_DMA_FAILED_LEN) + vhost_net_tx_err(net); if (VHOST_DMA_IS_DONE(vq-heads[i].len)) { vq-heads[i].len = VHOST_DMA_CLEAR_LEN; vhost_add_used_and_signal(vq-dev, vq, @@ -208,7 +251,7 @@ static void handle_tx(struct vhost_net *net) for (;;) { /* Release DMAs done buffers first */ if (zcopy) - vhost_zerocopy_signal_used(vq); + vhost_zerocopy_signal_used(net, vq); head = vhost_get_vq_desc(net-dev, vq, vq-iov, ARRAY_SIZE(vq-iov), @@ -263,7 +306,8 @@ static void handle_tx(struct vhost_net *net) /* use msg_control to pass vhost zerocopy ubuf info to skb */ if (zcopy) { vq-heads[vq-upend_idx].id = head; - if (len VHOST_GOODCOPY_LEN) { + if (!vhost_net_tx_select_zcopy(net) || + len VHOST_GOODCOPY_LEN) { /* copy don't need to wait for DMA done */ vq-heads[vq-upend_idx].len = VHOST_DMA_DONE_LEN; @@ -305,8 +349,9 @@ static void handle_tx(struct vhost_net *net) if (!zcopy) vhost_add_used_and_signal(net-dev, vq, head, 0); else - vhost_zerocopy_signal_used(vq); + vhost_zerocopy_signal_used(net, vq); total_len += len; + vhost_net_tx_packet(net); if 
(unlikely(total_len = VHOST_NET_WEIGHT)) { vhost_poll_queue(vq-poll); break; @@ -774,7 +819,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) if (oldubufs) { vhost_ubuf_put_and_wait(oldubufs); mutex_lock(vq-mutex); -
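The bookkeeping behind the dynamic selection is small enough to model on its own. Below is a minimal sketch, assuming the window of 1024 packets and the ratio of one error per 64 packets shown in the patch; struct tx_stats and the function names are hypothetical and only mirror the two counters added to struct vhost_net.

```c
#include <stdbool.h>

struct tx_stats {
	unsigned tx_packets;	/* packets submitted in the current window */
	unsigned tx_zcopy_err;	/* zerocopy failures in the current window */
};

/* Count one transmitted packet; start a fresh window every 1024 packets. */
static void tx_packet(struct tx_stats *s)
{
	if (++s->tx_packets < 1024)
		return;
	s->tx_packets = 0;
	s->tx_zcopy_err = 0;
}

/* Record one zerocopy transmit that ended in a late data copy. */
static void tx_err(struct tx_stats *s)
{
	++s->tx_zcopy_err;
}

/* Allow zerocopy while failures stay at or below one per 64 packets. */
static bool select_zcopy(const struct tx_stats *s)
{
	return s->tx_packets / 64 >= s->tx_zcopy_err;
}
```

Because both counters are cleared at the end of each window, zerocopy is tried again periodically, which matches the "re-enable periodically to detect workload changes" part of the description.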
[PATCH net-next 5/8] vhost: track zero copy failures using DMA length
This will be used to disable zerocopy when error rate
is high.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/vhost/vhost.c | 7 ++++---
 drivers/vhost/vhost.h | 4 ++++
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 906fd9f..5affce3 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -425,7 +425,7 @@ int vhost_zerocopy_signal_used(struct vhost_virtqueue *vq)
 	int j = 0;
 
 	for (i = vq->done_idx; i != vq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
-		if ((vq->heads[i].len == VHOST_DMA_DONE_LEN)) {
+		if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
 			vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
 			vhost_add_used_and_signal(vq->dev, vq,
 						  vq->heads[i].id, 0);
@@ -1600,13 +1600,14 @@ void vhost_ubuf_put_and_wait(struct vhost_ubuf_ref *ubufs)
 	kfree(ubufs);
 }
 
-void vhost_zerocopy_callback(struct ubuf_info *ubuf, int zerocopy_status)
+void vhost_zerocopy_callback(struct ubuf_info *ubuf, int status)
 {
 	struct vhost_ubuf_ref *ubufs = ubuf->ctx;
 	struct vhost_virtqueue *vq = ubufs->vq;
 
 	vhost_poll_queue(&vq->poll);
 	/* set len to mark this desc buffers done DMA */
-	vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
+	vq->heads[ubuf->desc].len = status ?
+		VHOST_DMA_FAILED_LEN : VHOST_DMA_DONE_LEN;
 	kref_put(&ubufs->kref, vhost_zerocopy_done_signal);
 }
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index ad72a1f..6fdf31d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -17,6 +17,8 @@
  * For transmit, used buffer len is unused; we override it to track buffer
  * status internally; used for zerocopy tx only.
  */
+/* Lower device DMA failed */
+#define VHOST_DMA_FAILED_LEN	3
 /* Lower device DMA done */
 #define VHOST_DMA_DONE_LEN	2
 /* Lower device DMA in progress */
@@ -24,6 +26,8 @@
 /* Buffer unused */
 #define VHOST_DMA_CLEAR_LEN	0
 
+#define VHOST_DMA_IS_DONE(len) ((len) >= VHOST_DMA_DONE_LEN)
+
 struct vhost_device;
 struct vhost_work;
 
-- 
MST
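The ordering of the status values is what makes VHOST_DMA_IS_DONE() work: a failed DMA is numerically above "done", so a failed buffer is still handed back to the guest. A minimal sketch using the macro values from the patch; dma_status_from_callback is a hypothetical wrapper added here for illustration and is not part of vhost.

```c
#define VHOST_DMA_FAILED_LEN	3	/* lower device DMA failed */
#define VHOST_DMA_DONE_LEN	2	/* lower device DMA done */
#define VHOST_DMA_IN_PROGRESS	1	/* lower device DMA in progress */
#define VHOST_DMA_CLEAR_LEN	0	/* buffer unused */

/* True for both "done" and "failed": either way the buffer is finished. */
#define VHOST_DMA_IS_DONE(len) ((len) >= VHOST_DMA_DONE_LEN)

/*
 * Hypothetical wrapper: the value the completion callback stores in
 * heads[].len for a given callback status (non-zero means failure).
 */
static int dma_status_from_callback(int status)
{
	return status ? VHOST_DMA_FAILED_LEN : VHOST_DMA_DONE_LEN;
}
```

Keeping the failure code distinct from the success code is what later lets the transmit path count failures separately while reusing the same completion check.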
Re: [PATCH] kvm tools: fix rbtree-interval search
On Oct29 14:12, Kirill A. Shutemov wrote:
> I've noticed message on kvm exit:
>
>   Warning: serial8250__exit failed.
>
> kvm tool is not able to remove ioport range which was added previously.
>
> The issue is caused by bug in rbtree-interval. Search algorithm in
> rb_int_search_single() expects correct value of max_high. But the tree
> can contain leaf nodes, which never were updated by
> propagate_callback(). For this kind of nodes max_high will be 0 and we
> will not be able to find and remove them.
>
> Let's initialize max_high on RB_INT_INIT() time.
>
> Fixing this bug makes other bug visible: propagate_callback() can be
> called for empty tree: node == NULL. The callback is not ready for
> empty tree. Let's fix that as well.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com>

I had the same issue but didn't find the time to fix it.
Applying the patch fixes the problem.

Tested-by: William Dauchy <will...@gandi.net>

Thanks,
-- 
William
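The failure mode described in the quoted commit message can be reproduced on a toy tree. This is a simplified sketch, assuming a plain binary tree with half-open intervals rather than the tool's rbtree; struct int_node, int_node_init and int_search_single are hypothetical names, and the search only imitates the pruning on max_high that the message describes.

```c
#include <stddef.h>

/* Hypothetical node type; the real code embeds this in an rbtree node. */
struct int_node {
	unsigned long low, high;	/* interval [low, high) */
	unsigned long max_high;		/* largest high in this subtree */
	struct int_node *left, *right;
};

/* Initialise max_high together with the interval, as the fix does. */
static void int_node_init(struct int_node *n, unsigned long low,
			  unsigned long high)
{
	n->low = low;
	n->high = high;
	n->max_high = high;
	n->left = NULL;
	n->right = NULL;
}

/* Find one node containing point, descending left only if max_high allows. */
static struct int_node *int_search_single(struct int_node *n,
					  unsigned long point)
{
	while (n) {
		if (n->left && n->left->max_high > point)
			n = n->left;
		else if (n->low <= point && point < n->high)
			return n;
		else if (point > n->low)
			n = n->right;
		else
			break;
	}
	return NULL;
}
```

If a leaf is linked into the tree with max_high left at 0, the first branch never descends into it, so the node can be neither found nor removed, which matches the reported serial8250__exit warning.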
Re: [patch 09/18] KVM: x86: introduce facility to support vsyscall pvclock, via MSR
On 10/29/2012 07:45 AM, Glauber Costa wrote:
> On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
>> Allow a guest to register a second location for the VCPU time info
>> structure for each vcpu (as described by MSR_KVM_SYSTEM_TIME_NEW).
>> This is intended to allow the guest kernel to map this information
>> into a usermode accessible page, so that usermode can efficiently
>> calculate system time from the TSC without having to make a syscall.
>>
>> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
> Can you please be a bit more specific about why we need this? Why does
> the host need to provide us with two pages with the exact same data? Why
> can't just do it with mapping tricks in the guest?

In Xen the pvclock structure is embedded within a pile of other stuff
that shouldn't be mapped into guest memory, so providing for a second
location allows it to be placed wherever is convenient for the guest.
That's a restriction of the Xen ABI, but I don't know if it affects KVM.

    J
Re: [PATCH V2 RFC 3/3] kvm: Check system load and handle different commit cases accordingly
On Mon, 2012-10-29 at 19:37 +0530, Raghavendra K T wrote:
> +/*
> + * A load of 2048 corresponds to 1:1 overcommit
> + * undercommit threshold is half the 1:1 overcommit
> + * overcommit threshold is 1.75 times of 1:1 overcommit threshold
> + */
> +#define COMMIT_THRESHOLD (FIXED_1)
> +#define UNDERCOMMIT_THRESHOLD (COMMIT_THRESHOLD >> 1)
> +#define OVERCOMMIT_THRESHOLD ((COMMIT_THRESHOLD << 1) - (COMMIT_THRESHOLD >> 2))
> +
> +unsigned long kvm_system_load(void)
> +{
> +	unsigned long load;
> +
> +	load = avenrun[0] + FIXED_1/200;
> +	load = load / num_online_cpus();
> +
> +	return load;
> +}

ARGH.. no that's wrong.. very wrong.

 1) avenrun[] EXPORT_SYMBOL says it should be removed, that's not a
    joke.

 2) avenrun[] is a global load, do not ever use a global load measure

 3) avenrun[] has nothing whatsoever to do with runqueue lengths,
    someone with a gazillion tasks in D state will get a huge load but
    the cpu is very idle.
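For readers following the numbers in the quoted hunk, the fixed-point arithmetic works out as below. This is only a sketch of the arithmetic, assuming FSHIFT is 11 as in the kernel's load-average code; per_cpu_load is a hypothetical user-space stand-in for kvm_system_load() that takes the load value and CPU count as parameters. It does not address the objections raised in the reply.

```c
#define FSHIFT	11			/* bits of fractional precision */
#define FIXED_1	(1UL << FSHIFT)		/* 1.0 in fixed point, i.e. 2048 */

#define COMMIT_THRESHOLD	(FIXED_1)
/* half of 1:1, i.e. 1024 */
#define UNDERCOMMIT_THRESHOLD	(COMMIT_THRESHOLD >> 1)
/* 2x minus 0.25x = 1.75x, i.e. 3584 */
#define OVERCOMMIT_THRESHOLD	((COMMIT_THRESHOLD << 1) - (COMMIT_THRESHOLD >> 2))

/*
 * Hypothetical stand-in for kvm_system_load(): add a small rounding
 * term (FIXED_1/200, which is 10 after integer division) and divide
 * the global load by the number of CPUs.
 */
static unsigned long per_cpu_load(unsigned long avenrun0, unsigned long ncpus)
{
	return (avenrun0 + FIXED_1 / 200) / ncpus;
}
```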
Re: [PATCH 0/5] s390: Guest support for virtio-ccw.
On 29.10.2012, at 14:07, Cornelia Huck wrote:

> Avi, Marcelo,
>
> I'd like to propose inclusion of the guest support patches for virtio-ccw
> into 3.8. I'm confident that the host <-> guest interface for virtio-ccw
> is fine now, and the patches have been extensively tested by our internal
> test team.
>
> Patch 1 might conceivably be 3.7 material, though I fear it's a bit late
> for that.

Well, patch 1 without virtio-ccw support is quite useless, right? You
wouldn't get any I/O at all.

Alex
Re: [patch 08/18] x86: pvclock: generic pvclock vsyscall initialization
On 10/29/2012 07:54 AM, Marcelo Tosatti wrote:
> On Mon, Oct 29, 2012 at 06:18:20PM +0400, Glauber Costa wrote:
>> On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
>>> Index: vsyscall/arch/x86/Kconfig
>>> ===
>>> --- vsyscall.orig/arch/x86/Kconfig
>>> +++ vsyscall/arch/x86/Kconfig
>>> @@ -632,6 +632,13 @@ config PARAVIRT_SPINLOCKS
>>>  config PARAVIRT_CLOCK
>>>  	bool
>>>
>>> +config PARAVIRT_CLOCK_VSYSCALL
>>> +	bool "Paravirt clock vsyscall support"
>>> +	depends on PARAVIRT_CLOCK && GENERIC_TIME_VSYSCALL
>>> +	---help---
>>> +	  Enable performance critical clock related system calls to
>>> +	  be executed in userspace, provided that the hypervisor
>>> +	  supports it.
>>>  endif
>> Besides debugging, what is the point in having this as an
>> extra-selectable? Is there any case in which a virtual machine has code
>> for this, but may decide to run without it ?
> Don't think so (its pretty small anyway, the code).
>
>> I believe all this code in vsyscall should be wrapped in PARAVIRT_CLOCK
>> only.
> Unless Jeremy has a reason, i'm fine with that.

I often set up blind config variables for dependency management; I'm
guessing the GENERIC_TIME_VSYSCALL dependency is important.

I think the problem is not that this exists, but that it's a
user-selectable option. Removing the prompt should fix that.

    J
Re: [PATCH 4/5] KVM: s390: Add a channel I/O based virtio transport driver.
On 29.10.2012, at 14:07, Cornelia Huck wrote: Add a driver for kvm guests that matches virtual ccw devices provided by the host as virtio bridge devices. These virtio-ccw devices use a special set of channel commands in order to perform virtio functions. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- arch/s390/include/asm/irq.h | 1 + arch/s390/kernel/irq.c| 1 + drivers/s390/kvm/Makefile | 2 +- drivers/s390/kvm/virtio_ccw.c | 842 ++ 4 files changed, 845 insertions(+), 1 deletion(-) create mode 100644 drivers/s390/kvm/virtio_ccw.c diff --git a/arch/s390/include/asm/irq.h b/arch/s390/include/asm/irq.h index 6703dd9..ad2ad6b 100644 --- a/arch/s390/include/asm/irq.h +++ b/arch/s390/include/asm/irq.h @@ -33,6 +33,7 @@ enum interruption_class { IOINT_APB, IOINT_ADM, IOINT_CSC, + IOINT_VIR, NMI_NMI, NR_IRQS, }; diff --git a/arch/s390/kernel/irq.c b/arch/s390/kernel/irq.c index 6cdc55b..97c171a 100644 --- a/arch/s390/kernel/irq.c +++ b/arch/s390/kernel/irq.c @@ -58,6 +58,7 @@ static const struct irq_class intrclass_names[] = { [IOINT_APB] = {.name = APB, .desc = [I/O] AP Bus}, [IOINT_ADM] = {.name = ADM, .desc = [I/O] EADM Subchannel}, [IOINT_CSC] = {.name = CSC, .desc = [I/O] CHSC Subchannel}, + [IOINT_VIR] = {.name = VIR, .desc = [I/O] Virtual I/O Devices}, [NMI_NMI]= {.name = NMI, .desc = [NMI] Machine Check}, }; diff --git a/drivers/s390/kvm/Makefile b/drivers/s390/kvm/Makefile index 0815690..241891a 100644 --- a/drivers/s390/kvm/Makefile +++ b/drivers/s390/kvm/Makefile @@ -6,4 +6,4 @@ # it under the terms of the GNU General Public License (version 2 only) # as published by the Free Software Foundation. -obj-$(CONFIG_S390_GUEST) += kvm_virtio.o +obj-$(CONFIG_S390_GUEST) += kvm_virtio.o virtio_ccw.o diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c new file mode 100644 index 000..4be878f --- /dev/null +++ b/drivers/s390/kvm/virtio_ccw.c @@ -0,0 +1,842 @@ +/* + * ccw based virtio transport + * + * Copyright IBM Corp. 
2012 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License (version 2 only) + * as published by the Free Software Foundation. + * + *Author(s): Cornelia Huck cornelia.h...@de.ibm.com + */ + +#include linux/kernel_stat.h +#include linux/init.h +#include linux/bootmem.h +#include linux/err.h +#include linux/virtio.h +#include linux/virtio_config.h +#include linux/slab.h +#include linux/virtio_console.h +#include linux/interrupt.h +#include linux/virtio_ring.h +#include linux/pfn.h +#include linux/async.h +#include linux/wait.h +#include linux/list.h +#include linux/bitops.h +#include linux/module.h +#include asm/io.h +#include asm/kvm_para.h +#include asm/setup.h +#include asm/irq.h +#include asm/cio.h +#include asm/ccwdev.h +#include asm/schid.h + +/* + * virtio related functions + */ + +struct vq_config_block { + __u16 index; + __u16 num; +} __attribute__ ((packed)); + +#define VIRTIO_CCW_CONFIG_SIZE 0x100 +/* same as PCI config space size, should be enough for all drivers */ + +struct virtio_ccw_device { + struct virtio_device vdev; + __u8 status; + __u8 config[VIRTIO_CCW_CONFIG_SIZE]; + struct ccw_device *cdev; + struct ccw1 *ccw; + __u32 area; + __u32 curr_io; + int err; + wait_queue_head_t wait_q; + spinlock_t lock; + struct list_head virtqueues; + unsigned long indicators; + unsigned long indicators2; + struct vq_config_block *config_block; +}; + +struct vq_info_block { + __u64 queue; + __u32 align; + __u16 index; + __u16 num; +} __attribute__ ((packed)); + +struct virtio_feature_desc { + __u32 features; + __u8 index; +} __attribute__ ((packed)); + +struct virtio_ccw_vq_info { + struct virtqueue *vq; + int num; + int queue_index; + void *queue; + struct vq_info_block *info_block; + struct list_head node; +}; + +#define KVM_VIRTIO_CCW_RING_ALIGN 4096 + +#define CCW_CMD_SET_VQ 0x13 +#define CCW_CMD_VDEV_RESET 0x33 +#define CCW_CMD_SET_IND 0x43 +#define CCW_CMD_SET_CONF_IND 0x53 
+#define CCW_CMD_READ_FEAT 0x12 +#define CCW_CMD_WRITE_FEAT 0x11 +#define CCW_CMD_READ_CONF 0x22 +#define CCW_CMD_WRITE_CONF 0x21 +#define CCW_CMD_WRITE_STATUS 0x31 +#define CCW_CMD_READ_VQ_CONF 0x32 + +#define VIRTIO_CCW_DOING_SET_VQ 0x0001 +#define VIRTIO_CCW_DOING_RESET 0x0004 +#define VIRTIO_CCW_DOING_READ_FEAT 0x0008 +#define VIRTIO_CCW_DOING_WRITE_FEAT 0x0010 +#define VIRTIO_CCW_DOING_READ_CONFIG 0x0020 +#define VIRTIO_CCW_DOING_WRITE_CONFIG 0x0040 +#define VIRTIO_CCW_DOING_WRITE_STATUS 0x0080 +#define VIRTIO_CCW_DOING_SET_IND 0x0100 +#define
Re: [PATCH 5/5] KVM: s390: Split out early console code.
On 29.10.2012, at 14:07, Cornelia Huck wrote: This code is transport agnostic and can be used by both the legacy virtio code and virtio_ccw. Would it be possible to actually send real virtio or sclp console commands for early printk? That'd make things a lot easier on the user space end. Combining two completely separate character channels (early printk + sclp or early printk + virtio-console) is really tricky. Alex Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- drivers/s390/kvm/Makefile | 2 +- drivers/s390/kvm/early_printk.c | 42 + drivers/s390/kvm/kvm_virtio.c | 29 ++-- drivers/s390/kvm/virtio_ccw.c | 1 - 4 files changed, 45 insertions(+), 29 deletions(-) create mode 100644 drivers/s390/kvm/early_printk.c diff --git a/drivers/s390/kvm/Makefile b/drivers/s390/kvm/Makefile index 241891a..a3c8fc4 100644 --- a/drivers/s390/kvm/Makefile +++ b/drivers/s390/kvm/Makefile @@ -6,4 +6,4 @@ # it under the terms of the GNU General Public License (version 2 only) # as published by the Free Software Foundation. -obj-$(CONFIG_S390_GUEST) += kvm_virtio.o virtio_ccw.o +obj-$(CONFIG_S390_GUEST) += kvm_virtio.o early_printk.o virtio_ccw.o diff --git a/drivers/s390/kvm/early_printk.c b/drivers/s390/kvm/early_printk.c new file mode 100644 index 000..7831530 --- /dev/null +++ b/drivers/s390/kvm/early_printk.c @@ -0,0 +1,42 @@ +/* + * early_printk.c - code for early console output with virtio_console + * split off from kvm_virtio.c + * + * Copyright IBM Corp. 2008 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License (version 2 only) + * as published by the Free Software Foundation. 
+ * + *Author(s): Christian Borntraeger borntrae...@de.ibm.com + */ + +#include linux/kernel_stat.h +#include linux/init.h +#include linux/err.h +#include linux/virtio_console.h +#include asm/kvm_para.h +#include asm/kvm_virtio.h +#include asm/setup.h +#include asm/sclp.h + +static __init int early_put_chars(u32 vtermno, const char *buf, int count) +{ + char scratch[17]; + unsigned int len = count; + + if (len sizeof(scratch) - 1) + len = sizeof(scratch) - 1; + scratch[len] = '\0'; + memcpy(scratch, buf, len); + kvm_hypercall1(KVM_S390_VIRTIO_NOTIFY, __pa(scratch)); + return len; +} + +static int __init s390_virtio_console_init(void) +{ + if (sclp_has_vt220() || sclp_has_linemode()) + return -ENODEV; + return virtio_cons_early_init(early_put_chars); +} +console_initcall(s390_virtio_console_init); diff --git a/drivers/s390/kvm/kvm_virtio.c b/drivers/s390/kvm/kvm_virtio.c index 76b95f3..6cdc66a 100644 --- a/drivers/s390/kvm/kvm_virtio.c +++ b/drivers/s390/kvm/kvm_virtio.c @@ -17,7 +17,6 @@ #include linux/virtio.h #include linux/virtio_config.h #include linux/slab.h -#include linux/virtio_console.h #include linux/interrupt.h #include linux/virtio_ring.h #include linux/export.h @@ -25,9 +24,9 @@ #include asm/io.h #include asm/kvm_para.h #include asm/kvm_virtio.h -#include asm/sclp.h #include asm/setup.h #include asm/irq.h +#include asm/sclp.h #define VIRTIO_SUBCODE_64 0x0D00 @@ -450,8 +449,7 @@ static int __init kvm_devices_init(void) return -ENODEV; if (test_devices_support(real_memory_size) 0) - /* No error. 
*/ - return 0; + return -ENODEV; rc = vmem_add_mapping(real_memory_size, PAGE_SIZE); if (rc) @@ -476,29 +474,6 @@ static int __init kvm_devices_init(void) return 0; } -/* code for early console output with virtio_console */ -static __init int early_put_chars(u32 vtermno, const char *buf, int count) -{ - char scratch[17]; - unsigned int len = count; - - if (len sizeof(scratch) - 1) - len = sizeof(scratch) - 1; - scratch[len] = '\0'; - memcpy(scratch, buf, len); - kvm_hypercall1(KVM_S390_VIRTIO_NOTIFY, __pa(scratch)); - return len; -} - -static int __init s390_virtio_console_init(void) -{ - if (sclp_has_vt220() || sclp_has_linemode()) - return -ENODEV; - return virtio_cons_early_init(early_put_chars); -} -console_initcall(s390_virtio_console_init); - - /* * We do this after core stuff, but before the drivers. */ diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c index 4be878f..135126a 100644 --- a/drivers/s390/kvm/virtio_ccw.c +++ b/drivers/s390/kvm/virtio_ccw.c @@ -17,7 +17,6 @@ #include linux/virtio.h #include linux/virtio_config.h #include linux/slab.h -#include linux/virtio_console.h #include linux/interrupt.h #include linux/virtio_ring.h #include linux/pfn.h -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org
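The chunking behaviour of early_put_chars() is easy to miss in the flattened diff: each call passes on at most 16 bytes and returns how many it consumed. Below is a minimal user-space sketch, assuming the hypercall is replaced by handing the scratch buffer back to the caller; copy_clamped and SCRATCH_SIZE are hypothetical names for illustration only.

```c
#include <string.h>

#define SCRATCH_SIZE 17	/* 16 payload bytes plus the terminating NUL */

/*
 * Copy at most SCRATCH_SIZE - 1 bytes of buf into scratch, terminate
 * the string, and return the number of bytes consumed. A caller that
 * gets back less than count is expected to call again for the rest.
 */
static unsigned int copy_clamped(char scratch[SCRATCH_SIZE],
				 const char *buf, int count)
{
	unsigned int len = count;

	if (len > SCRATCH_SIZE - 1)
		len = SCRATCH_SIZE - 1;
	memcpy(scratch, buf, len);
	scratch[len] = '\0';
	return len;
}
```

The small fixed buffer keeps the early console path free of allocations, at the cost of one notification per 16 characters.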
Re: [PATCH 0/5] s390: Guest support for virtio-ccw.
On Mon, 29 Oct 2012 18:55:21 +0100
Alexander Graf <ag...@suse.de> wrote:

> On 29.10.2012, at 14:07, Cornelia Huck wrote:
>
>> Avi, Marcelo,
>>
>> I'd like to propose inclusion of the guest support patches for virtio-ccw
>> into 3.8. I'm confident that the host <-> guest interface for virtio-ccw
>> is fine now, and the patches have been extensively tested by our internal
>> test team.
>>
>> Patch 1 might conceivably be 3.7 material, though I fear it's a bit late
>> for that.
>
> Well, patch 1 without virtio-ccw support is quite useless, right? You
> wouldn't get any I/O at all.

Yes, but it stops in an obvious way (no devices can be found) and not
with a strange backchain.
[PATCH 2/2] kvm, svm: Update MAINTAINERS entry
I have no access to my AMD email address anymore. Update
entry in MAINTAINERS to the new address.

Cc: Avi Kivity <a...@redhat.com>
Cc: Marcelo Tosatti <mtosa...@redhat.com>
Signed-off-by: Joerg Roedel <j...@8bytes.org>
---
 MAINTAINERS |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0267ba2..d881321 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4234,10 +4234,10 @@ F:	include/linux/kvm*
 F:	virt/kvm/
 
 KERNEL VIRTUAL MACHINE (KVM) FOR AMD-V
-M:	Joerg Roedel <joerg.roe...@amd.com>
+M:	Joerg Roedel <j...@8bytes.org>
 L:	kvm@vger.kernel.org
 W:	http://kvm.qumranet.com
-S:	Supported
+S:	Maintained
 F:	arch/x86/include/asm/svm.h
 F:	arch/x86/kvm/svm.c
 
-- 
1.7.9.5
Re: [PATCH 0/5] s390: Guest support for virtio-ccw.
On 29.10.2012, at 19:15, Cornelia Huck wrote:

> On Mon, 29 Oct 2012 18:55:21 +0100
> Alexander Graf <ag...@suse.de> wrote:
>
>> On 29.10.2012, at 14:07, Cornelia Huck wrote:
>>
>>> Avi, Marcelo,
>>>
>>> I'd like to propose inclusion of the guest support patches for virtio-ccw
>>> into 3.8. I'm confident that the host <-> guest interface for virtio-ccw
>>> is fine now, and the patches have been extensively tested by our internal
>>> test team.
>>>
>>> Patch 1 might conceivably be 3.7 material, though I fear it's a bit late
>>> for that.
>>
>> Well, patch 1 without virtio-ccw support is quite useless, right? You
>> wouldn't get any I/O at all.
>
> Yes, but it stops in an obvious way (no devices can be found) and not
> with a strange backchain.

Hrm. Then it's probably best to actually CC stable as well :)

Alex
Re: [PATCH 4/5] KVM: s390: Add a channel I/O based virtio transport driver.
On Mon, 29 Oct 2012 19:12:54 +0100
Alexander Graf <ag...@suse.de> wrote:

> On 29.10.2012, at 14:07, Cornelia Huck wrote:
>
>> +static void virtio_ccw_kvm_notify(struct virtqueue *vq)
>> +{
>> +	struct virtio_ccw_vq_info *info = vq->priv;
>> +	struct virtio_ccw_device *vcdev;
>> +	struct subchannel_id schid;
>> +	__u32 reg2;
>> +
>> +	vcdev = to_vc_device(info->vq->vdev);
>> +	ccw_device_get_schid(vcdev->cdev, &schid);
>> +	reg2 = *(__u32 *)&schid;
>
> That cast looks quite ugly. Can't you just access the field in there
> you need? Or if it's multiple fields do a union over them? Or assemble
> them by hand in C?

I think the cast looks less ugly than using a union to morph it around.
I want the schid with all fields filled out anyway, since this is what
identifies the subchannel.

>> +	kvm_hypercall2(3 /* CCW_NOTIFY */, reg2, info->queue_index);
>
> This wants to be a #define :)

Probably :)

>> +}
>> +
>> +static int virtio_ccw_read_vq_conf(struct virtio_ccw_device *vcdev, int index)
>> +{
>> +	vcdev->config_block->index = index;
>> +	vcdev->ccw->cmd_code = CCW_CMD_READ_VQ_CONF;
>> +	vcdev->ccw->flags = 0;
>> +	vcdev->ccw->count = sizeof(struct vq_config_block);
>> +	vcdev->ccw->cda = (__u32)(unsigned long)(vcdev->config_block);
>
> Is this casting a pointer to a u32? What if this is in highmem? Ah, I
> just saw the comment that ccw memory needs to be < 2GB. Phew. Any plans
> to get rid of that limitation?

Well, we could do full-blown IDAW handling to get to 64bit addresses -
which would need a lot of extra code in the host. I doubt whether it
would be worth it. (Well, we'll probably want IDAWs sometime in the
future - I just think it's overkill for those tiny snippets.)
Re: [PATCH 4/5] KVM: s390: Add a channel I/O based virtio transport driver.
On 29.10.2012, at 19:34, Cornelia Huck wrote:

> On Mon, 29 Oct 2012 19:12:54 +0100 Alexander Graf <ag...@suse.de> wrote:
>
> > On 29.10.2012, at 14:07, Cornelia Huck wrote:
> >
> > > +static void virtio_ccw_kvm_notify(struct virtqueue *vq)
> > > +{
> > > +	struct virtio_ccw_vq_info *info = vq->priv;
> > > +	struct virtio_ccw_device *vcdev;
> > > +	struct subchannel_id schid;
> > > +	__u32 reg2;
> > > +
> > > +	vcdev = to_vc_device(info->vq->vdev);
> > > +	ccw_device_get_schid(vcdev->cdev, &schid);
> > > +	reg2 = *(__u32 *)&schid;
> >
> > That cast looks quite ugly. Can't you just access the field in there
> > you need? Or if it's multiple fields do a union over them? Or
> > assemble them by hand in C?
>
> I think the cast looks less ugly than using a union to morph it
> around. I want the schid with all fields filled out anyway, since this
> is what identifies the subchannel.

How about a helper function that returns a u32 for a struct
subchannel_id in arch/s390/include/asm/schid.h then?

> > > +	kvm_hypercall2(3 /* CCW_NOTIFY */, reg2, info->queue_index);
> >
> > This wants to be a #define :)
>
> Probably :)
>
> > > +}
> > > +
> > > +static int virtio_ccw_read_vq_conf(struct virtio_ccw_device *vcdev, int index)
> > > +{
> > > +	vcdev->config_block->index = index;
> > > +	vcdev->ccw->cmd_code = CCW_CMD_READ_VQ_CONF;
> > > +	vcdev->ccw->flags = 0;
> > > +	vcdev->ccw->count = sizeof(struct vq_config_block);
> > > +	vcdev->ccw->cda = (__u32)(unsigned long)(vcdev->config_block);
> >
> > Is this casting a pointer to a u32? What if this is in highmem? Ah,
> > I just saw the comment that ccw memory needs to be < 2GB. Phew. Any
> > plans to get rid of that limitation?
>
> Well, we could do full-blown IDAW handling to get to 64bit addresses -
> which would need a lot of extra code in the host. I doubt whether it
> would be worth it. (Well, we'll probably want IDAWs sometime in the
> future - I just think it's overkill for those tiny snippets.)

Ah, so it is possible? Yes, we most likely want it in the future then!
Lowmem is always more limited than when you have the full memory space
available :).

Alex
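The helper suggested above could look roughly like the following userspace sketch. The struct here is only a stand-in for the kernel's `struct subchannel_id` (field names and layout are assumptions for illustration), and `schid_to_u32_sketch` is a hypothetical name, not an existing kernel function. It copies the four identifier bytes into a 32-bit word with `memcpy` instead of a pointer cast.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the kernel's struct subchannel_id: four
 * bytes that together identify a subchannel. Field names and widths
 * are illustrative only.
 */
struct subchannel_id_sketch {
	uint8_t  cssid;
	uint8_t  flags;    /* stands in for the packed bit-fields */
	uint16_t sch_no;
};

/* Return the whole identifier as one 32-bit word, without a cast. */
static inline uint32_t schid_to_u32_sketch(struct subchannel_id_sketch schid)
{
	uint32_t word;

	_Static_assert(sizeof(struct subchannel_id_sketch) == sizeof(uint32_t),
		       "identifier must be exactly four bytes");
	memcpy(&word, &schid, sizeof(word));
	return word;
}
```

A caller would then write something like `reg2 = schid_to_u32_sketch(schid);`, keeping the layout knowledge in one place.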
Re: [patch 09/18] KVM: x86: introduce facility to support vsyscall pvclock, via MSR
On Mon, Oct 29, 2012 at 10:44:41AM -0700, Jeremy Fitzhardinge wrote:
> On 10/29/2012 07:45 AM, Glauber Costa wrote:
> > On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
> > > Allow a guest to register a second location for the VCPU time info
> > > structure for each vcpu (as described by MSR_KVM_SYSTEM_TIME_NEW).
> > > This is intended to allow the guest kernel to map this information
> > > into a usermode accessible page, so that usermode can efficiently
> > > calculate system time from the TSC without having to make a syscall.
> > >
> > > Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
> >
> > Can you please be a bit more specific about why we need this? Why
> > does the host need to provide us with two pages with the exact same
> > data? Why can't we just do it with mapping tricks in the guest?
>
> In Xen the pvclock structure is embedded within a pile of other stuff
> that shouldn't be mapped into guest memory, so providing for a second
> location allows it to be placed wherever is convenient for the guest.
> That's a restriction of the Xen ABI, but I don't know if it affects KVM.
>
> J

It is possible to share the data for KVM in theory, but:

- It is a small amount of memory.
- It requires aligning to page size (the in-kernel percpu array is
  currently cacheline aligned).
- It is possible to modify flags separately for userspace/kernelspace,
  if desired.

This justifies the duplication IMO (code is simple and clean).
Re: [patch 11/18] x86: vsyscall: pass mode to gettime backend
On Mon, Oct 29, 2012 at 06:47:57PM +0400, Glauber Costa wrote:
> On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
> > Required by next patch.
> >
> > Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
>
> I don't see where.

+	if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
+		*mode = VCLOCK_NONE;
Re: [patch 12/18] x86: vdso: pvclock gettime support
On Mon, Oct 29, 2012 at 06:59:35PM +0400, Glauber Costa wrote:
> On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
> > Improve performance of time system calls when using Linux pvclock,
> > by reading time info from fixmap visible copy of pvclock data.
> >
> > Originally from Jeremy Fitzhardinge.
> >
> > Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
> >
> > Index: vsyscall/arch/x86/vdso/vclock_gettime.c
> > ===
> > --- vsyscall.orig/arch/x86/vdso/vclock_gettime.c
> > +++ vsyscall/arch/x86/vdso/vclock_gettime.c
> > @@ -22,6 +22,7 @@
> >  #include <asm/hpet.h>
> >  #include <asm/unistd.h>
> >  #include <asm/io.h>
> > +#include <asm/pvclock.h>
> >
> >  #define gtod (&VVAR(vsyscall_gtod_data))
> >
> > @@ -62,6 +63,69 @@ static notrace cycle_t vread_hpet(void)
> >  	return readl((const void __iomem *)fix_to_virt(VSYSCALL_HPET) + 0xf0);
> >  }
> >
> > +#ifdef CONFIG_PARAVIRT_CLOCK_VSYSCALL
> > +
> > +static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu)
> > +{
> > +	const aligned_pvti_t *pvti_base;
> > +	int idx = cpu / (PAGE_SIZE/PVTI_SIZE);
> > +	int offset = cpu % (PAGE_SIZE/PVTI_SIZE);
> > +
> > +	BUG_ON(PVCLOCK_FIXMAP_BEGIN + idx > PVCLOCK_FIXMAP_END);
> > +
> > +	pvti_base = (aligned_pvti_t *)__fix_to_virt(PVCLOCK_FIXMAP_BEGIN+idx);
> > +
> > +	return &pvti_base[offset].info;
> > +}
> > +
>
> Unless I am missing something, if gcc decides to not inline get_pvti,
> this will break, right? I believe you need to mark that function with
> __always_inline.

Can't see why. Please enlighten me.

> > +static notrace cycle_t vread_pvclock(int *mode)
> > +{
> > +	const struct pvclock_vsyscall_time_info *pvti;
> > +	cycle_t ret;
> > +	u64 last;
> > +	u32 version;
> > +	u32 migrate_count;
> > +	u8 flags;
> > +	unsigned cpu, cpu1;
> > +
> > +
> > +	/*
> > +	 * When looping to get a consistent (time-info, tsc) pair, we
> > +	 * also need to deal with the possibility we can switch vcpus,
> > +	 * so make sure we always re-fetch time-info for the current vcpu.
> > +	 */
> > +	do {
> > +		cpu = __getcpu() & 0xfff;
>
> Please wrap this 0xfff into something meaningful.

OK.

> > +		pvti = get_pvti(cpu);
> > +
> > +		migrate_count = pvti->migrate_count;
> > +
> > +		version = __pvclock_read_cycles(&pvti->pvti, &ret, &flags);
> > +
> > +		/*
> > +		 * Test we're still on the cpu as well as the version.
> > +		 * We could have been migrated just after the first
> > +		 * vgetcpu but before fetching the version, so we
> > +		 * wouldn't notice a version change.
> > +		 */
> > +		cpu1 = __getcpu() & 0xfff;
> > +	} while (unlikely(cpu != cpu1 ||
> > +			  (pvti->pvti.version & 1) ||
> > +			  pvti->pvti.version != version ||
> > +			  pvti->migrate_count != migrate_count));
> > +
> > +	if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
> > +		*mode = VCLOCK_NONE;
> > +
> > +	last = VVAR(vsyscall_gtod_data).clock.cycle_last;
> > +
> > +	if (likely(ret >= last))
> > +		return ret;
> > +
>
> Please add a comment here referring to tsc.c, where an explanation of
> this test lives. This is quite non-obvious for the non initiated.

OK.
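The page/offset arithmetic in get_pvti() and the request to name the bare 0xfff can be illustrated with a small userspace sketch. All names and sizes below are assumptions for illustration (a 4096-byte page, a 64-byte aligned per-vcpu entry, and a hypothetical `SKETCH_CPU_MASK`); none of them are taken from the patch.

```c
/* Illustrative values only; the real constants live in the kernel. */
#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_PVTI_SIZE  64u     /* assumed size of one aligned entry */
#define SKETCH_CPU_MASK   0xfffu  /* hypothetical name for the bare 0xfff */

struct pvti_slot {
	unsigned page;    /* which fixmap page holds the entry */
	unsigned offset;  /* entry index within that page */
};

/* Split a cpu number into (page, entry-within-page), as get_pvti() does. */
static struct pvti_slot pvti_locate(unsigned cpu)
{
	const unsigned per_page = SKETCH_PAGE_SIZE / SKETCH_PVTI_SIZE;
	struct pvti_slot slot = { cpu / per_page, cpu % per_page };

	return slot;
}

/* Keep only the low 12 bits of a getcpu-style value, i.e. the cpu number. */
static unsigned cpu_from_getcpu(unsigned raw)
{
	return raw & SKETCH_CPU_MASK;
}
```

With these assumed sizes, 64 entries fit in one page, so cpu 64 is the first entry of the second page.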
Re: [QEMU PATCH 0/3] Fix -cpu host and enforce/check to use GET_SUPPORTED_CPUID
On Wed, Oct 24, 2012 at 07:44:04PM -0200, Eduardo Habkost wrote:
> This depends on a previous series I have submitted:
>
>   Subject: [QEMU PATCH 00/15] QEMU KVM_GET_SUPPORTED_CPUID cleanups and fixes
>   Message-Id: <1349383747-19383-1-git-send-email-ehabk...@redhat.com>
>   http://article.gmane.org/gmane.comp.emulators.kvm.devel/99375
>
> Eduardo Habkost (3):
>   target-i386: make cpu_x86_fill_host() void
>   target-i386: cpu: make -cpu host/check/enforce code KVM-specific
>   target-i386: kvm_cpu_fill_host: use GET_SUPPORTED_CPUID
>
>  target-i386/cpu.c | 52 +---
>  1 file changed, 33 insertions(+), 19 deletions(-)
>
> --
> 1.7.11.7

Applied, thanks.
[PATCH] kvm tools: don't crash on virtio MSI-X reset
Handle VIRTIO_MSI_NO_VECTOR by not trying to use it as a valid vector.

We still need to remove the GSI and everything, but this is enough to
prevent crashes and keep everything working properly for now.

Reported-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
Signed-off-by: Sasha Levin <sasha.le...@oracle.com>
---
 tools/kvm/virtio/pci.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
index 3acaa3a..adc8efc 100644
--- a/tools/kvm/virtio/pci.c
+++ b/tools/kvm/virtio/pci.c
@@ -146,6 +146,8 @@ static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *v
 	switch (offset) {
 	case VIRTIO_MSI_CONFIG_VECTOR:
 		vec = vpci->config_vector = ioport__read16(data);
+		if (vec == VIRTIO_MSI_NO_VECTOR)
+			break;
 		gsi = irq__add_msix_route(kvm, &vpci->msix_table[vec].msg);
@@ -154,6 +156,9 @@ static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *v
 	case VIRTIO_MSI_QUEUE_VECTOR:
 		vec = vpci->vq_vector[vpci->queue_selector] = ioport__read16(data);
+		if (vec == VIRTIO_MSI_NO_VECTOR)
+			break;
+
 		gsi = irq__add_msix_route(kvm, &vpci->msix_table[vec].msg);
 		vpci->gsis[vpci->queue_selector] = gsi;
 		if (vdev->ops->notify_vq_gsi)
@@ -253,7 +258,7 @@ int virtio_pci__signal_vq(struct kvm *kvm, struct virtio_device *vdev, u32 vq)
 	struct virtio_pci *vpci = vdev->virtio;
 	int tbl = vpci->vq_vector[vq];
-	if (virtio_pci__msix_enabled(vpci)) {
+	if (virtio_pci__msix_enabled(vpci) && tbl != VIRTIO_MSI_NO_VECTOR) {
 		if (vpci->pci_hdr.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_MASKALL) ||
 		    vpci->msix_table[tbl].ctrl & cpu_to_le16(PCI_MSIX_ENTRY_CTRL_MASKBIT)) {
@@ -277,7 +282,7 @@ int virtio_pci__signal_config(struct kvm *kvm, struct virtio_device *vdev)
 	struct virtio_pci *vpci = vdev->virtio;
 	int tbl = vpci->config_vector;
-	if (virtio_pci__msix_enabled(vpci)) {
+	if (virtio_pci__msix_enabled(vpci) && tbl != VIRTIO_MSI_NO_VECTOR) {
 		if (vpci->pci_hdr.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_MASKALL) ||
 		    vpci->msix_table[tbl].ctrl & cpu_to_le16(PCI_MSIX_ENTRY_CTRL_MASKBIT)) {
@@ -286,7 +291,7 @@ int virtio_pci__signal_config(struct kvm *kvm, struct virtio_device *vdev)
 		}
 		if (vpci->features & VIRTIO_PCI_F_SIGNAL_MSI)
-			virtio_pci__signal_msi(kvm, vpci, vpci->config_vector);
+			virtio_pci__signal_msi(kvm, vpci, tbl);
 		else
 			kvm__irq_trigger(kvm, vpci->config_gsi);
 	} else {
--
1.7.12.4
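The guard this patch adds can be shown in isolation. The sketch below is standalone C, not kvm tools code: `vector_usable` and `SKETCH_TABLE_SIZE` are hypothetical names, and the extra bounds check goes beyond what the patch does. The point is that 0xffff (VIRTIO_MSI_NO_VECTOR) is a sentinel meaning "no vector assigned" and must never be used as a table index.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_MSI_NO_VECTOR 0xffff  /* sentinel value from the virtio PCI headers */
#define SKETCH_TABLE_SIZE    33      /* hypothetical MSI-X table length */

/*
 * Return true only when vec may be used to index the MSI-X table.
 * The sentinel check mirrors the patch; the range check is an
 * additional assumption for illustration.
 */
static bool vector_usable(uint16_t vec)
{
	if (vec == VIRTIO_MSI_NO_VECTOR)
		return false;
	return vec < SKETCH_TABLE_SIZE;
}
```

A caller would test `vector_usable(vec)` before touching `msix_table[vec]`, which is the crash the patch avoids on reset.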
Re: [PATCH v7 00/17] target-i386: Add way to expose VMWare CPUID
On Fri, Oct 12, 2012 at 03:56:05PM -0400, Don Slutz wrote:
> Also known as Paravirtualization CPUIDs.
>
> This is primarily done so that the guest will think it is running
> under vmware when hypervisor-vendor=vmware is specified as a
> property of a cpu.
>
> Patches 1 to 3 define new cpu properties.
> Patches 4 to 6 Add QOM access to the new properties.
> Patches 7 to 9 Add setting of these when cpu features hv_spinlocks,
>   hv_relaxed, or hv_vapic are specified.
> Patches 10 to 12 Change kvm to use these.
> Patch 13 Add VMware timing info to kvm.
> Patch 14 Makes it easier to use hypervisor-vendor=vmware.
> Patches 15 to 17 Change tcg to use the new properties.
>
> This depends on:
>
>   http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg01400.html
>
> As far as I know it is #4. It depends on (1) and (2) and (3).
>
> This change is based on:
>
> Microsoft Hypervisor CPUID Leaves:
>   http://msdn.microsoft.com/en-us/library/windows/hardware/ff542428%28v=vs.85%29.aspx
>
> Linux kernel change starts with:
>   http://fixunix.com/kernel/538707-use-cpuid-communicate-hypervisor.html
> Also:
>   http://lkml.indiana.edu/hypermail/linux/kernel/1205.0/00100.html
>
> VMware documentation on CPUIDs (Mechanisms to determine if software is
> running in a VMware virtual machine):
>   http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458
>
> Changes from v6 to v7:
>   Subject changed from "Allow changing of Hypervisor CPUIDs." to
>   "target-i386: Add way to expose VMWare CPUID"
>   Split out 01/16 target-i386: Add missing kvm bits.
>     It is no longer related to this patch set. Will be top posted
>     as a separate patch.
>   Marcelo Tosatti:
>     Better commit messages.
>     Reorder patches.
>
> Changes from v5 to v6:
>   Split out 01/17: target-i386: Allow tsc-frequency to be larger then 2.147G
>     It has been accepted as a trivial patch:
>     http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg03959.html
>   Blue Swirl:
>     Fix 2 checkpatch.pl WARNING: line over 80 characters.
>
> Changes from v4 to v5:
>   Undo kvm_clock2 change.
>   Add cpuid_hv_level_set; cpuid_hv_level == 0 is now valid.
>   Add cpuid_hv_vendor_set; the null string is now valid.
>   Handle kvm and cpuid_hv_level == 0.
>   hypervisor-vendor=kvm,hypervisor-level=0 and
>   hypervisor-level=0,hypervisor-vendor=kvm now do the same thing.
>
> Changes from v3 to v4:
>   Added CPUID_HV_LEVEL_HYPERV, CPUID_HV_LEVEL_KVM.
>   Added CPUID_HV_VENDOR_HYPERV.
>   Added hyperv as known hypervisor-vendor.
>   Allow hypervisor-level to be 0.
>
> Changes from v2 to v3:
>   Clean post to qemu-devel.
>
> Changes from v1 to v2:
>   1) Added 1/4 from
>      http://lists.gnu.org/archive/html/qemu-devel/2012-08/msg05153.html
>      Because Fred is changing jobs and so will not be pushing to get
>      this in. It needed to be rebased, and I needed it to complete the
>      testing of this change.
>   2) Added 2/4 because of the re-work I needed a way to clear all KVM bits,
>   3) The rework of v1. Make it fit into the object model re-work of
>      cpu.c for x86.
>   4) Added 3/4 -- The split out of the code that is not needed for
>      accel=kvm.
>
> Changes from v2 to v3:
>   Marcelo Tosatti:
>     Its one big patch, better split in logically correlated patches
>     (with better changelog). This would help reviewers.
>   So split 3 and 4 into 3 to 17. More info in change log.
>   No code change.
>
> Don Slutz (17):
>   target-i386: Add Hypervisor level.
>   target-i386: Add Hypervisor vendor.
>   target-i386: Add Hypervisor features.
>   target-i386: Add cpu object access routines for Hypervisor level.
>   target-i386: Add cpu object access routines for Hypervisor vendor.
>   target-i386: Add cpu object access routines for Hypervisor features.
>   target-i386: Add x86_set_hyperv.
>   target-i386: Use x86_set_hyperv to set hypervisor vendor.
>   target-i386: Use x86_set_hyperv to set hypervisor features.
>   target-i386: Use Hypervisor level in -machine pc,accel=kvm.
>   target-i386: Use Hypervisor vendor in -machine pc,accel=kvm.
>   target-i386: Use Hypervisor features in -machine pc,accel=kvm.
>   target-i386: Add VMWare CPUID Timing information in -machine pc,accel=kvm.
>   target-i386: Add vmare as a known name to Hypervisor vendor.
>   target-i386: Use Hypervisor level in -machine pc,accel=tcg.
>   target-i386: Use Hypervisor vendor in -machine pc,accel=tcg.
>   target-i386: target-i386: Add VMWare CPUID Timing information in -machine pc,accel=tcg
>
>  target-i386/cpu.c | 205 +
>  target-i386/cpu.h | 29
>  target-i386/kvm.c | 69 +++
>  3 files changed, 290 insertions(+), 13 deletions(-)

Looks good overall.
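For readers unfamiliar with how a hypervisor vendor is exposed: hypervisors conventionally report a 12-byte signature in EBX, ECX and EDX of CPUID leaf 0x40000000 (for example "VMwareVMware" or "KVMKVMKVM" padded with zero bytes). The following is a minimal standalone sketch of that packing; `hv_pack_vendor` and `struct hv_signature` are hypothetical names and this is not the code from the series.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The three registers that carry the 12-byte vendor signature. */
struct hv_signature {
	uint32_t ebx;
	uint32_t ecx;
	uint32_t edx;
};

/*
 * Pack up to 12 bytes of the vendor string into EBX/ECX/EDX order.
 * Shorter strings are zero-padded, longer ones are truncated.
 */
static struct hv_signature hv_pack_vendor(const char *vendor)
{
	char buf[12] = { 0 };
	struct hv_signature sig;
	size_t len = strlen(vendor);

	if (len > sizeof(buf))
		len = sizeof(buf);
	memcpy(buf, vendor, len);

	memcpy(&sig.ebx, buf + 0, 4);
	memcpy(&sig.ecx, buf + 4, 4);
	memcpy(&sig.edx, buf + 8, 4);
	return sig;
}
```

A guest that reads the three registers back in the same order sees the original string, which is how it decides which hypervisor it is running under.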
[PATCH] Add code to track call origin for msr assignment.
In order to track who initiated the call (host or guest) to modify an msr
value I have changed function call parameters along the call path. The
specific change is to add a struct pointer parameter that points to (index,
data, caller) information rather than having this information passed as
individual parameters.

The initial use for this capability is for updating the IA32_TSC_ADJUST
msr while setting the tsc value. It is anticipated that this capability
is useful for other tasks.

Signed-off-by: Will Auld <will.a...@intel.com>
---
 arch/x86/include/asm/kvm_host.h | 18 +++---
 arch/x86/kvm/svm.c              | 21 +++--
 arch/x86/kvm/vmx.c              | 24 +---
 arch/x86/kvm/x86.c              | 23 +--
 arch/x86/kvm/x86.h              |  2 +-
 5 files changed, 65 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09155d6..ad0d3fd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -598,6 +598,18 @@ struct kvm_vcpu_stat {
 struct x86_instruction_info;
+/*
+ * Defined values for msr_data.initiated_by
+ */
+#define KVM_GUEST_INITIATED	0x1
+#define KVM_HOST_INITIATED	0x2
+
+struct msr_data {
+	u32 initiated_by;
+	u32 index;
+	u64 data;
+};
+
 struct kvm_x86_ops {
 	int (*cpu_has_kvm_support)(void);          /* __init */
 	int (*disabled_by_bios)(void);             /* __init */
@@ -621,7 +633,7 @@ struct kvm_x86_ops {
 	void (*set_guest_debug)(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg);
 	int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
-	int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
+	int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
 	u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg);
 	void (*get_segment)(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
@@ -772,7 +784,7 @@ static inline int emulate_instruction(struct kvm_vcpu *vcpu,
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
-int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
+int kvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
 struct x86_emulate_ctxt;
@@ -799,7 +811,7 @@ void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
 int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
 int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
-int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data);
+int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
 unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu);
 void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index baead95..584055b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1211,6 +1211,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	struct page *msrpm_pages;
 	struct page *hsave_page;
 	struct page *nested_msrpm_pages;
+	struct msr_data msr;
 	int err;
 	svm = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
@@ -1255,7 +1256,10 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
 	svm->asid_generation = 0;
 	init_vmcb(svm);
-	kvm_write_tsc(&svm->vcpu, 0);
+	msr.data = 0x0;
+	msr.index = MSR_IA32_TSC;
+	msr.initiated_by = KVM_HOST_INITIATED;
+	kvm_write_tsc(&svm->vcpu, &msr);
 	err = fx_init(&svm->vcpu);
 	if (err)
@@ -3147,13 +3151,15 @@ static int svm_set_vm_cr(struct kvm_vcpu *vcpu, u64 data)
 	return 0;
 }
-static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
+static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	u32 ecx = msr->index;
+	u64 data = msr->data;
 	switch (ecx) {
 	case MSR_IA32_TSC:
-		kvm_write_tsc(vcpu, data);
+		kvm_write_tsc(vcpu, msr);
 		break;
 	case MSR_STAR:
 		svm->vmcb->save.star = data;
@@ -3208,20 +3214,23 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
 		vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
 		break;
 	default:
-		return kvm_set_msr_common(vcpu, ecx, data);
+		return kvm_set_msr_common(vcpu, msr);
 	}
 	return 0;
 }
 static int wrmsr_interception(struct vcpu_svm *svm)
 {
+	struct msr_data msr;
 	u32 ecx = svm->vcpu.arch.regs[VCPU_REGS_RCX];
 	u64 data = (svm->vcpu.arch.regs[VCPU_REGS_RAX] & -1u)
 		| ((u64)(svm->vcpu.arch.regs[VCPU_REGS_RDX] & -1u) << 32);
-
+	msr.data =
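The calling convention described in this patch, a single struct carrying (index, data, caller) instead of separate parameters, can be sketched in standalone C as follows. The `msr_data` layout and the two `*_INITIATED` values are copied from the patch; everything else (`vcpu_sketch`, `set_msr_sketch`, and in particular the policy of adjusting only on guest writes) is an assumption for illustration and not the actual KVM logic.

```c
#include <stdint.h>

/* Values and struct layout as defined in the patch. */
#define KVM_GUEST_INITIATED 0x1
#define KVM_HOST_INITIATED  0x2

struct msr_data {
	uint32_t initiated_by;
	uint32_t index;
	uint64_t data;
};

#define SKETCH_MSR_TSC 0x10  /* architectural index of the time stamp counter MSR */

/* Hypothetical, heavily simplified vcpu state. */
struct vcpu_sketch {
	uint64_t tsc;
	uint64_t tsc_adjust;
};

/*
 * Handle a TSC write. The callee can tell guest writes from host
 * writes because the origin travels with the request. Returns 0 when
 * handled, 1 when the MSR is not one this sketch knows about.
 */
static int set_msr_sketch(struct vcpu_sketch *vcpu, const struct msr_data *msr)
{
	if (msr->index != SKETCH_MSR_TSC)
		return 1;

	/* Assumed policy: only guest-initiated writes move the adjust value. */
	if (msr->initiated_by == KVM_GUEST_INITIATED)
		vcpu->tsc_adjust += msr->data - vcpu->tsc;

	vcpu->tsc = msr->data;
	return 0;
}
```

Passing one pointer keeps every function along the call path at the same signature when further fields are added later, which appears to be the motivation given in the description.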
Re: [PATCH v5 2/6] KVM: MMU: remove mmu_is_invalid
On Wed, Oct 17, 2012 at 04:40:32PM +0200, Avi Kivity wrote:
> On 10/16/2012 02:08 PM, Xiao Guangrong wrote:
> > Remove mmu_is_invalid and use is_invalid_pfn instead
>
> Applied 2-5 to next; 6 depends on 1, so will wait until it is merged
> upstream.

Applied 6.
RE: [PATCH] Add code to track call origin for msr assignment.
Will-

To quote from the OpenStack documentation
(`http://docs.openstack.org/essex/openstack-compute/admin/content/introduction-to-xen.html')

It is possible to manage Xen using libvirt. This would be necessary for
any Xen-based system that isn't using the XCP toolstack, such as SUSE
Linux or Oracle Linux. Unfortunately, this is not well tested or
supported as of the Essex release. To experiment using Xen through
libvirt add the following configuration options /etc/nova/nova.conf:

connection_type=libvirt
libvirt_type=xen

I'm guessing the people who do most of the testing/deployment on Xen are
xenapi centric and that's just what they use.

--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Will Auld [mailto:will.auld.in...@gmail.com]
Sent: Monday, October 29, 2012 3:18 PM
To: mtosa...@redhat.com; a...@redhat.com; Zhang, Xiantao; kvm@vger.kernel.org; Liu, Jinsong; Dugger, Donald D
Cc: Auld, Will
Subject: [PATCH] Add code to track call origin for msr assignment.

In order to track who initiated the call (host or guest) to modify an msr
value I have changed function call parameters along the call path. The
specific change is to add a struct pointer parameter that points to (index,
data, caller) information rather than having this information passed as
individual parameters.

The initial use for this capability is for updating the IA32_TSC_ADJUST
msr while setting the tsc value. It is anticipated that this capability
is useful for other tasks.
RE: [PATCH] Add code to track call origin for msr assignment.
Oops, ignore this message, I responded to the wrong email.

--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Dugger, Donald D
Sent: Monday, October 29, 2012 4:38 PM
To: Auld, Will; mtosa...@redhat.com; a...@redhat.com; Zhang, Xiantao; kvm@vger.kernel.org; Liu, Jinsong
Subject: RE: [PATCH] Add code to track call origin for msr assignment.

Will-

To quote from the OpenStack documentation
(`http://docs.openstack.org/essex/openstack-compute/admin/content/introduction-to-xen.html')

It is possible to manage Xen using libvirt. This would be necessary for
any Xen-based system that isn't using the XCP toolstack, such as SUSE
Linux or Oracle Linux. Unfortunately, this is not well tested or
supported as of the Essex release. To experiment using Xen through
libvirt add the following configuration options /etc/nova/nova.conf:

connection_type=libvirt
libvirt_type=xen

I'm guessing the people who do most of the testing/deployment on Xen are
xenapi centric and that's just what they use.

--
Don Dugger
Censeo Toto nos in Kansa esse decisse. - D. Gale
Ph: 303/443-3786

-----Original Message-----
From: Will Auld [mailto:will.auld.in...@gmail.com]
Sent: Monday, October 29, 2012 3:18 PM
To: mtosa...@redhat.com; a...@redhat.com; Zhang, Xiantao; kvm@vger.kernel.org; Liu, Jinsong; Dugger, Donald D
Cc: Auld, Will
Subject: [PATCH] Add code to track call origin for msr assignment.

In order to track who initiated the call (host or guest) to modify an msr
value I have changed function call parameters along the call path. The
specific change is to add a struct pointer parameter that points to (index,
data, caller) information rather than having this information passed as
individual parameters.

The initial use for this capability is for updating the IA32_TSC_ADJUST
msr while setting the tsc value. It is anticipated that this capability
is useful for other tasks.
Alignment issue with transparent huge pages
Hi,

I am seeing an interesting case on KVM/ARM where a user memory region is
not aligned with the guest physical memory address with respect to huge
page size. This clearly makes it impossible for us to leverage
transparent huge pages for stage-2 mappings on ARM.

The question is whether this is simply something to check for inside KVM,
hoping that user space aligns its memory allocations, or whether this is
supposed to be enforced somehow, or whether I'm missing a bigger picture
altogether?

Thanks,
-Christoffer
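The condition being described can be written down as a small check. This is a sketch under stated assumptions, not KVM code: the helper name is hypothetical and a 2 MiB huge page size is assumed. A huge stage-2 mapping is only possible when the userspace address and the guest physical address have the same offset within a huge page, i.e. their low bits agree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed huge page size for this sketch: 2 MiB. */
#define SKETCH_HUGE_PAGE_SIZE (2ull * 1024 * 1024)

/*
 * True when hva and gpa share the same offset within a huge page, so
 * one huge page in the host can back one huge mapping in the guest.
 */
static bool huge_mapping_possible(uint64_t hva, uint64_t gpa)
{
	return ((hva ^ gpa) & (SKETCH_HUGE_PAGE_SIZE - 1)) == 0;
}
```

User space could apply the same test when choosing where to place guest RAM, and the kernel could fall back to small pages for memory regions where it fails.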