Re: kvm tuning guide
On 09/30/2009 07:09 AM, Nikola Ciprich wrote:
>> The default, IDE, is highly supported by guests but may be slow, especially with disk arrays. If your guest supports it, use the virtio interface:
> Avi, what is the status of the data integrity issues Christoph Hellwig summarized some time ago?

I don't know. Christoph?

> Is it safe to recommend virtio to newbies already?

I think so.

> Shouldn't SCSI be safer (where applicable)?

SCSI suffers from being untested, and I think it doesn't truly offer the parallelism it appears to.

> nik
>
> On Tue, Sep 29, 2009 at 07:30:55PM +0200, Avi Kivity wrote:
>> I wrote a short tuning guide for kvm: http://www.linux-kvm.org/page/Tuning_KVM. It should all be well known to the list, but a newbie is born every minute. Please review and expand!

--
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
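As a concrete illustration of the virtio advice above — a sketch only: the image path, memory size, and helper function are invented for the example, and exact option syntax depends on your qemu-kvm version:

```python
# Sketch: build a qemu-kvm command line that uses virtio for disk and NIC,
# per the tuning guide's advice ("if=virtio" instead of the default IDE).
# The image path and memory size below are placeholders.
def qemu_cmdline(image, mem_mb=1024):
    return [
        "qemu-kvm",
        "-m", str(mem_mb),
        # virtio block device instead of the default (IDE)
        "-drive", "file=%s,if=virtio" % image,
        # virtio NIC instead of the default emulated NIC
        "-net", "nic,model=virtio",
        "-net", "user",
    ]

print(" ".join(qemu_cmdline("/var/lib/images/guest.img")))
```

A guest without virtio drivers will not see these devices, which is why the guide hedges with "if your guest supports it".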
Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream
On 09/29/2009 10:45 PM, Mark McLoughlin wrote:
> On Tue, 2009-05-05 at 09:56 +0100, Mark McLoughlin wrote:
>> This commit:
>>
>> commit 559a8f45f34cc50d1a60b4f67a06614d506b2e01
>> Subject: Remove stray GSO code from virtio_net (Mark McLoughlin)
>>
>> Removed some GSO code from upstream qemu.git, but it needs to be re-instated in qemu-kvm.git.
>>
>> Reported-by: Sridhar Samudrala <s...@us.ibm.com>
>> Signed-off-by: Mark McLoughlin <mar...@redhat.com>
>> ---
>>  hw/virtio-net.c | 5 +
>>  1 files changed, 5 insertions(+), 0 deletions(-)
>>
>> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
>> index ac8e030..e5d7add 100644
>> --- a/hw/virtio-net.c
>> +++ b/hw/virtio-net.c
>> @@ -424,6 +424,11 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
>>      if (n->promisc)
>>          return 1;
>>
>> +#ifdef TAP_VNET_HDR
>> +    if (tap_has_vnet_hdr(n->vc->vlan->first_client))
>> +        ptr += sizeof(struct virtio_net_hdr);
>> +#endif
>> +
>>      if (!memcmp(&ptr[12], vlan, sizeof(vlan))) {
>>          int vid = be16_to_cpup((uint16_t *)(ptr + 14)) & 0xfff;
>>          if (!(n->vlans[vid >> 5] & (1U << (vid & 0x1f))))
>
> I'm not sure[1] how we didn't notice, but this has been broken on the stable-0.10 branch since 0.10.3; please apply there too

Thanks, we'll queue it on stable-0.10.

Anthony/Glauber, is 0.10.7 in the works? If not, we'll release it as 0.10.6.1.
Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
On 09/30/2009 03:01 AM, Zhai, Edwin wrote:
> Avi, I modified it according to your comments. The only thing I want to keep is the module params ple_gap/window. Although they are not per-guest, they can be used to find the right value, and to disable PLE for debugging purposes.

Fair enough, ACK.
Re: [Qemu-devel] Release plan for 0.12.0
On 09/30/2009 01:54 AM, Anthony Liguori wrote:
> Hi,
>
> Now that 0.11.0 is behind us, it's time to start thinking about 0.12.0. I'd like to do a few things differently this time around. I don't think the -rc process went very well, as I don't think we got more testing out of it. I'd like to shorten the timeline for 0.12.0 a good bit. The 0.10 stable tree got pretty difficult to maintain toward the end of the cycle. We also had a pretty huge amount of change between 0.10 and 0.11, so I think a shorter cycle is warranted. I think aiming for early to mid-December would give us roughly a 3 month cycle and would align well with some of the Linux distribution cycles. I'd like to limit things to a single -rc that lasts only about a week. This is enough time to fix most of the obvious issues, I think.
>
> I'd also like to try to enumerate some features for this release. Here's a short list of things I expect to see for this release (target-i386 centric). Please add or comment on items that you'd either like to see in the release or are planning on working on.
>
> o VMState conversion -- I expect most of the pc target to be completed
> o qdev conversion -- I hope that we'll get most of the pc target completely converted to qdev
> o storage live migration
> o switch to SeaBIOS (need to finish porting features from Bochs)
> o switch to gPXE (need to resolve slirp tftp server issue)
> o KSM integration
> o in-kernel APIC support for KVM
> o guest SMP support for KVM
> o updates to the default pc machine type

Machine monitor protocol.
RE: migrate_set_downtime bug
> Since the problem you pinpointed does exist, I would suggest measuring the average load of the last, say, 10 iterations.

The last 10 iterations do not define a fixed amount of time. I guess it is much more reasonable to measure the average over the last 10 seconds. But usually a migration only takes about 10-30 seconds, so do you really want to add additional complexity?

- Dietmar
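A sliding time window, as Dietmar suggests, is simple to sketch. The class below is illustrative only — the names are invented, not taken from qemu's migration code:

```python
import time
from collections import deque

class BandwidthEstimator:
    """Average transfer bandwidth over the last `window` seconds,
    rather than over a fixed number of iterations (whose wall-clock
    duration varies with network load)."""
    def __init__(self, window=10.0):
        self.window = window
        self.samples = deque()  # (timestamp, bytes) pairs

    def record(self, nbytes, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, nbytes))
        # drop samples that have fallen out of the window
        while self.samples and now - self.samples[0][0] > self.window:
            self.samples.popleft()

    def bandwidth(self, now=None):
        """Bytes per second over the retained samples."""
        now = time.time() if now is None else now
        if not self.samples:
            return 0.0
        elapsed = max(now - self.samples[0][0], 1e-9)
        return sum(b for _, b in self.samples) / elapsed
```

With a time window, a burst of fast iterations and a slow stall both decay out of the estimate at the same rate, which is the property a fixed iteration count lacks.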
Re: [PATCH 04/47] KVM: x86: Disallow hypercalls for guest callers in rings 0
Hi!

On Wed, 2009-08-26 at 13:29 +0300, Avi Kivity wrote:
> From: Jan Kiszka <jan.kis...@siemens.com>
>
> So far unprivileged guest callers running in ring 3 can issue, e.g., MMU hypercalls. Normally, such callers cannot provide any hand-crafted MMU command structure as it has to be passed by its physical address, but they can still crash the guest kernel by passing random addresses. To close the hole, this patch considers hypercalls valid only if issued from guest ring 0. This may still be relaxed on a per-hypercall basis in the future once required.

Does kvm-72 (used by Debian and Ubuntu in stable releases) have the problem? If yes, would the approach in this fix also work there?

Thanks,
Jan
[PATCH] Add two parameters for wait_for_login
Sometimes we need to log into a guest using a different start_time and step_time.

Signed-off-by: Yolkfull Chow <yz...@redhat.com>
---
 client/tests/kvm/kvm_test_utils.py | 6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/client/tests/kvm/kvm_test_utils.py b/client/tests/kvm/kvm_test_utils.py
index 601b350..0983003 100644
--- a/client/tests/kvm/kvm_test_utils.py
+++ b/client/tests/kvm/kvm_test_utils.py
@@ -43,7 +43,7 @@ def get_living_vm(env, vm_name):
     return vm
 
-def wait_for_login(vm, nic_index=0, timeout=240):
+def wait_for_login(vm, nic_index=0, timeout=240, start=0, step=2):
     """
     Try logging into a VM repeatedly.  Stop on success or when timeout expires.
@@ -54,8 +54,8 @@ def wait_for_login(vm, nic_index=0, timeout=240):
     logging.info("Waiting for guest '%s' to be up..." % vm.name)
     session = kvm_utils.wait_for(lambda: vm.remote_login(nic_index=nic_index),
-                                 timeout, 0, 2)
+                                 timeout, start, step)
     if not session:
         raise error.TestFail("Could not log into guest '%s'" % vm.name)
-    logging.info("Logged in")
+    logging.info("Logged in '%s'" % vm.name)
     return session
--
1.6.2.5
Re: [PATCH v4: kvm 1/4] Code motion. Separate timer intialization into an indepedent function.
On 09/29/2009 11:38 PM, Zachary Amsden wrote:
> Signed-off-by: Zachary Amsden <zams...@redhat.com>

Looks good. Is anything preventing us from unifying the constant_tsc and !constant_tsc paths? We could just do a quick check in the notifier, see that the tsc frequency hasn't changed, and return.
Re: [Qemu-devel] Release plan for 0.12.0
Anthony Liguori wrote:
[...]
> Here's a short list of things I expect to see for this release (target-i386 centric). Please add or comment on items that you'd either like to see in the release or are planning on working on.
[...]
> o guest SMP support for KVM

Hmm. What is this, can you elaborate a bit more please? -smp nn is already here, no?

Thanks!

/mjt
RE: migrate_set_downtime bug
Another problem occurs when max_downtime is too short. This can result in a never-ending migration task. To reproduce, just play a video inside a VM and set max_downtime to 30ns. Sure, one can argue that this behavior is expected. But the following would avoid the problem:

+    if ((stage == 2) && (bytes_transferred > 2 * ram_bytes_total())) {
+        return 1;
+    }

Or do you think that is not reasonable?

- Dietmar

> -----Original Message-----
> From: Glauber Costa [mailto:glom...@redhat.com]
> Sent: Mittwoch, 30. September 2009 06:49
> To: Dietmar Maurer
> Cc: Anthony Liguori; kvm
> Subject: Re: migrate_set_downtime bug
>
> On Tue, Sep 29, 2009 at 06:36:57PM +0200, Dietmar Maurer wrote:
>>> Also, if this is really the case (buffered), then the bandwidth capping part of migration is also wrong. Have you compared the reported bandwidth to your actual bandwidth? I suspect the source of the problem can be that we're currently ignoring the time we take to transfer the state of the devices, and maybe it is not negligible.
>> I have a 1GB network (e1000 card), and get values like bwidth=0.98 - which is much too high. The main reason for not using the whole migration time is that it can lead to values that are not very helpful in situations where the network load changes too much.
> Since the problem you pinpointed does exist, I would suggest measuring the average load of the last, say, 10 iterations. How would that work for you?
Re: [Qemu-devel] Release plan for 0.12.0
On 09/30/2009 10:53 AM, Michael Tokarev wrote:
> Anthony Liguori wrote:
> [...]
>> o guest SMP support for KVM
> Hmm. What is this, can you elaborate a bit more please? -smp nn is already here, no?

Only in qemu-kvm.git. This is about qemu.git (which supports -smp, but not with kvm).
Re: [Qemu-devel] Release plan for 0.12.0
Hi,

On 30.09.2009 01:54, Anthony Liguori wrote:
> Now that 0.11.0 is behind us, it's time to start thinking about 0.12.0.
>
> I'd also like to try to enumerate some features for this release. Here's a short list of things I expect to see for this release (target-i386 centric).
>
> o switch to SeaBIOS (need to finish porting features from Bochs)

That switch is much appreciated because it also reduces the testing matrix of those coreboot developers who boot test every commit with Qemu.

However, to run coreboot on Qemu with the same init sequence as on simplified real hardware, we need Cache-as-RAM (CAR) support. This is basically a mode where sizeof(cacheable area) <= sizeof(L2 cache), which causes the processor to lock the cache and not pass any reads/writes through to the RAM behind the cached area.

The easiest way to implement this would be to check the cache size criterion upon every MTRR manipulation and map a chunk of fresh memory on top of the existing memory (which may be RAM, ROM or unmapped) for every cacheable area, and if the cacheable area starts to exceed the L2 cache size, discard all memory contents of the memory mapped on top. For additional correctness, the memory should not be discarded but written back to the lower layer of memory if WBINVD (instead of INVD) or CLFLUSH are called. That one is mostly sugar, though, and coreboot can do without.

Right now coreboot sets up the MTRRs correctly, but then (conditional on Qemu) only uses areas which are known to be backed by RAM instead of the areas designated by CAR.

I'd like to implement CAR support which builds on top of my MTRR code which was merged some months ago (and I already have code to check for total cacheable area size), but I need help with the memory mapping stuff. How do I proceed? Clean up what I have and insert FIXME comments where I don't know how to implement stuff, so others can see the code and comment on it?
Regards,
Carl-Daniel
--
http://www.hailfinger.org/
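The size check Carl-Daniel describes — CAR stays coherent only while every cacheable byte fits in the L2 cache — can be sketched as follows. This is illustrative pseudologic, not Qemu's actual MTRR handling: the tuple layout and the write-back type constant are simplifying assumptions:

```python
WB = 6  # x86 MTRR write-back memory type (assumed encoding for this sketch)

def total_cacheable_bytes(mtrrs):
    """Sum the sizes of all variable-range MTRRs marked cacheable (WB).
    Each entry is a (base, size, mem_type) tuple."""
    return sum(size for base, size, mem_type in mtrrs if mem_type == WB)

def car_active(mtrrs, l2_cache_size):
    # Cache-as-RAM is only coherent while the total cacheable area fits
    # in L2; once it spills, the contents mapped on top of the underlying
    # memory would have to be discarded, per the proposal above.
    return total_cacheable_bytes(mtrrs) <= l2_cache_size

mtrrs = [(0xFFFF0000, 64 * 1024, WB)]      # 64 KiB WB range near top of 4G
assert car_active(mtrrs, 512 * 1024)       # fits in a 512 KiB L2
mtrrs.append((0x0, 1024 * 1024, WB))       # add a 1 MiB WB range
assert not car_active(mtrrs, 512 * 1024)   # now exceeds L2 -> contents lost
```

Evaluating this predicate on every MTRR write is cheap, which matches the "check upon every MTRR manipulation" approach in the mail.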
Re: migrate_set_downtime bug
On Wed, Sep 30, 2009 at 10:55:24AM +0200, Dietmar Maurer wrote:
> Another problem occurs when max_downtime is too short. This can result in a never-ending migration task. To reproduce, just play a video inside a VM and set max_downtime to 30ns. Sure, one can argue that this behavior is expected. But the following would avoid the problem:
>
> +    if ((stage == 2) && (bytes_transferred > 2 * ram_bytes_total())) {
> +        return 1;
> +    }

why 2 * ? This means we'll have to transfer the whole contents of RAM at least twice to hit this condition, right?

> Or do you think that is not reasonable?
>
> - Dietmar
[...]
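For reference, the bail-out rule under discussion, expressed as a standalone predicate. This is illustrative only; in qemu the check would sit in the migration save loop, and the function names here are invented:

```python
def should_force_finish(stage, bytes_transferred, ram_bytes_total):
    """Dietmar's proposed guard: once stage-2 (iterative) migration has
    already sent more than twice the guest's RAM, stop iterating and
    converge rather than chasing a workload that dirties pages faster
    than the link can drain them."""
    return stage == 2 and bytes_transferred > 2 * ram_bytes_total

assert not should_force_finish(2, 1024, 1024)      # only 1x RAM sent so far
assert should_force_finish(2, 3072, 1024)          # past the 2x threshold
assert not should_force_finish(3, 3072, 1024)      # only applies in stage 2
```

Glauber's question is exactly about the constant: with `2 *`, the whole of RAM must be retransmitted at least twice before the guard fires.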
[v4 KVM AUTOTEST PATCH] KVM test: client parallel test execution
From: Michael Goldish mgold...@redhat.com This patch adds a control.parallel file that runs several test execution pipelines in parallel. The number of pipelines is set to the number of CPUs reported by /proc/cpuinfo. It can be changed by modifying the control file. The total amount of RAM defaults to 3/4 times what 'free -m' reports. The scheduler's job is to make sure tests run in parallel only when there are sufficient resources to allow it. For example, a test that requires 2 CPUs will not run together with a test that requires 3 CPUs on a 4 CPU machine. The same logic applies to RAM. Note that tests that require more CPUs and/or more RAM than the machine has are allowed to run alone, e.g. a test that requires 3GB of RAM is allowed to run on a machine with only 2GB of RAM, but no tests will run in parallel to it. Currently TAP networking isn't supported by this scheduler because the main MAC address pool must be divided between the pipelines (workers). This should be straightforward to do but I haven't had the time to do it yet. scan_results.py can be used to list the test results during and after execution. 
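The admission rule described above (run in parallel only when resources suffice; oversized tests are still allowed, but alone) might look roughly like this — an illustrative sketch, not the actual kvm_scheduler.py logic:

```python
def can_run(test_cpus, test_mem, used_cpus, used_mem, total_cpus, total_mem):
    """Decide whether a test may start now.

    A test that needs more CPUs/RAM than the machine has may still run,
    but only when nothing else is running (mirroring the patch's rule that
    a 3GB test may run on a 2GB machine, with no tests in parallel)."""
    if test_cpus > total_cpus or test_mem > total_mem:
        # oversized test: allowed only in isolation
        return used_cpus == 0 and used_mem == 0
    return (used_cpus + test_cpus <= total_cpus and
            used_mem + test_mem <= total_mem)

# 4-CPU machine: a 2-CPU test and a 3-CPU test must not overlap
assert can_run(2, 1024, 0, 0, 4, 4096)
assert not can_run(3, 1024, 2, 1024, 4, 4096)
# a test needing 3GB on a 2GB machine runs, but only alone
assert can_run(4, 3072, 0, 0, 4, 2048)
assert not can_run(4, 3072, 1, 512, 4, 2048)
```

Each worker pipeline would call a check like this before claiming a test, releasing its CPU/RAM claim on completion.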
v4: * Updated the install part to be in sync with the current control file * Blended this patch with the one that add scheduler parameters * Instead of custom code to figure number of cpus, used an autotest utils function Signed-off-by: Michael Goldish mgold...@redhat.com --- client/tests/kvm/control.parallel | 204 + client/tests/kvm/kvm_scheduler.py | 229 + client/tests/kvm/kvm_tests.cfg.sample | 18 +++- 3 files changed, 449 insertions(+), 2 deletions(-) create mode 100644 client/tests/kvm/control.parallel create mode 100644 client/tests/kvm/kvm_scheduler.py diff --git a/client/tests/kvm/control.parallel b/client/tests/kvm/control.parallel new file mode 100644 index 000..5c1f20d --- /dev/null +++ b/client/tests/kvm/control.parallel @@ -0,0 +1,204 @@ +AUTHOR = +u...@redhat.com (Uri Lublin) +dru...@redhat.com (Dror Russo) +mgold...@redhat.com (Michael Goldish) +dh...@redhat.com (David Huff) +aerom...@redhat.com (Alexey Eromenko) +mbu...@redhat.com (Mike Burns) + +TIME = 'SHORT' +NAME = 'KVM test' +TEST_TYPE = 'client' +TEST_CLASS = 'Virtualization' +TEST_CATEGORY = 'Functional' + +DOC = +Executes the KVM test framework on a given host. This module is separated in +minor functions, that execute different tests for doing Quality Assurance on +KVM (both kernelspace and userspace) code. + + + +import sys, os, commands, re + +#- +# set English environment (command output might be localized, need to be safe) +#- +os.environ['LANG'] = 'en_US.UTF-8' + +#- +# Enable modules import from current directory (tests/kvm) +#- +pwd = os.path.join(os.environ['AUTODIR'],'tests/kvm') +sys.path.append(pwd) + +# +# create required symlinks +# +# When dispatching tests from autotest-server the links we need do not exist on +# the host (the client). The following lines create those symlinks. Change +# 'rootdir' here and/or mount appropriate directories in it. 
+# +# When dispatching tests on local host (client mode) one can either setup kvm +# links, or same as server mode use rootdir and set all appropriate links and +# mount-points there. For example, guest installation tests need to know where +# to find the iso-files. +# +# We create the links only if not already exist, so if one already set up the +# links for client/local run we do not touch the links. +rootdir='/tmp/kvm_autotest_root' +iso=os.path.join(rootdir, 'iso') +images=os.path.join(rootdir, 'images') +qemu=os.path.join(rootdir, 'qemu') +qemu_img=os.path.join(rootdir, 'qemu-img') + + +def link_if_not_exist(ldir, target, link_name): +t = target +l = os.path.join(ldir, link_name) +if not os.path.exists(l): +os.system('ln -s %s %s' % (t, l)) + +# Create links only if not already exist +link_if_not_exist(pwd, '../../', 'autotest') +link_if_not_exist(pwd, iso, 'isos') +link_if_not_exist(pwd, images, 'images') +link_if_not_exist(pwd, qemu, 'qemu') +link_if_not_exist(pwd, qemu_img, 'qemu-img') + +# +# Params that will be passed to the KVM install/build test +# +params = { +name: build, +shortname: build, +type: build, +mode: release, +#mode: snapshot, +#mode: localtar, +#mode: localsrc, +#mode: git, +#mode: noinstall, +#mode: koji, + +## Are we going to load modules built by this test? +## Defaults to 'yes', so if you are going to provide only userspace code to +## be built by
Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream
On Wed, Sep 30, 2009 at 08:24:18AM +0200, Avi Kivity wrote:
> On 09/29/2009 10:45 PM, Mark McLoughlin wrote:
> [...]
>> I'm not sure[1] how we didn't notice, but this has been broken on the stable-0.10 branch since 0.10.3; please apply there too
>
> Thanks, we'll queue it on stable-0.10.
>
> Anthony/Glauber, is 0.10.7 in the works? If not, we'll release it as 0.10.6.1.

Since it is just one patch, I don't see a problem in anthony picking it directly and making a new release.
Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream
On Wed, 2009-09-30 at 08:24 -0300, Glauber Costa wrote:
> On Wed, Sep 30, 2009 at 08:24:18AM +0200, Avi Kivity wrote:
> [...]
>> Anthony/Glauber, is 0.10.7 in the works? If not, we'll release it as 0.10.6.1.
>
> Since it is just one patch, I don't see a problem in anthony picking it directly and making a new release.

It's not for qemu.git, it's for qemu-kvm.git - see the changelog

Cheers,
Mark.
Build problem found during daily testing (09/30/09)
Today's git test failed due to a build problem:

09/30 04:53:37 ERROR| kvm:0114| Test failed: Command make -j 4 failed, rc=2, Command returned non-zero exit status

* Command: make -j 4
Exit status: 2
Duration: 0

stdout:
make -C /lib/modules/2.6.29.6-217.2.8.fc11.x86_64/build M=`pwd` \
    LINUXINCLUDE="-I`pwd`/include -Iinclude \
    -Iarch/x86/include -I`pwd`/include-compat -I`pwd`/x86 \
    -include include/linux/autoconf.h \
    -include `pwd`/x86/external-module-compat.h" \
    $@
make[1]: Entering directory `/usr/src/kernels/2.6.29.6-217.2.8.fc11.x86_64'
  LD      /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/built-in.o
  CC [M]  /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/svm.o
  CC [M]  /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/vmx.o
  CC [M]  /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/vmx-debug.o
  CC [M]  /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/kvm_main.o
make[1]: Leaving directory `/usr/src/kernels/2.6.29.6-217.2.8.fc11.x86_64'

stderr:
/usr/local/autotest/tests/kvm/src/kvm_kmod/x86/kvm_main.c:381: error: unknown field ‘change_pte’ specified in initializer
/usr/local/autotest/tests/kvm/src/kvm_kmod/x86/kvm_main.c:381: warning: initialization from incompatible pointer type
make[3]: *** [/usr/local/autotest/tests/kvm/src/kvm_kmod/x86/kvm_main.o] Error 1

Relevant commit hashes:
09/30 04:52:40 INFO | kvm_utils:0182| Commit hash for git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git is d80e68823cada7b6d850330da1edfdf8bff9e2e6 (v2.6.31-rc3-11538-gd80e688)
09/30 04:53:21 INFO | kvm_utils:0182| Commit hash for git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git is 692d9aca97b865b0f7903565274a52606910f129 (kvm-88-1366-g692d9ac)
09/30 04:53:23 INFO | kvm_utils:0182| Commit hash for git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git is b86de9524511f75bf9115047b7b57e1da86bfb37 (kvm-88-22-gb86de95)

If you need more info please let me know,

Lucas
Re: [KVM-AUTOTEST PATCH 1/2] Add KSM test
On 09/29/2009 05:50 PM, Lucas Meneghel Rodrigues wrote:
> On Fri, 2009-09-25 at 05:22 -0400, Jiri Zupka wrote:
>> ----- Dor Laor <dl...@redhat.com> wrote:
>>> On 09/16/2009 04:09 PM, Jiri Zupka wrote:
>>>> ----- Dor Laor <dl...@redhat.com> wrote:
>>>>> On 09/15/2009 09:58 PM, Jiri Zupka wrote:
>>>>> After a quick review I have the following questions:
>>>>> 1. Why did you implement the guest tool in 'c' and not in python? Python is much simpler and you can share some code with the server. This 'test protocol' would also be easier to understand this way.
>>>> We need speed and precise control of allocated memory, in pages.
>>>>> 2. IMHO there is no need to use select, you can do a blocking read.
>>>> We replaced socket communication with interactive program communication via ssh/telnet.
>>>>> 3. Also you can use plain malloc without the more complex (a bit) mmap.
>>>> We need to address exactly the memory pages. We can't allow shifts of the data in memory.
>>> You can use the tmpfs+dd idea instead of the specific program, as I detailed before. Maybe some other binary can be used. My intention is to simplify the test/environment as much as possible.
>> We need compatibility with other systems, like Windows etc.
>>> KSM is a host feature and should be agnostic to the guest. Also I don't think your code will compile on windows...
>> Yes, I think you are right.
>
> First of all, sorry, I am doing the best I can to review the whole patch queue carefully, and as KSM is a more involved feature that I am not very familiar with, I need a bit more time to review it!
>
>> But because we need to generate special data for pages in memory, we need to use a script on the guest side of the test, because communication over ssh is too slow to transfer lots of GB of special data to guests. We can use an optimized C program which is 10x or more faster than a python script on a native system. Heavy load on a virtual guest can cause some performance problems.
>
> About code compiling under windows, I guess making a native windows c or c++ program is an option. I generally agree with your reasoning; this case seems to be better covered with a c program. Will get into it in more detail ASAP...
>
>> We can use tmpfs but with a python script to generate the special data. We can't use dd with random data because we need to test some special cases (change only the last 96B of a page, etc.). What do you think about it?

I think it can be done with some simple scripting and it will be fast enough and, more importantly, easier to understand and to change in the future. Here is a short example for creating lots of identical pages that contain '0' apart from the last two bytes. If you run it in a single guest you should expect to save lots of memory. Then you can change the last bytes to a random value and see the memory consumption grow. [Remember to cancel the guest swap to keep it in the guest ram]

dd if=/dev/zero of=template count=1 bs=4094
echo '1' >> template
cp template large_file
for ((i=0;i<10;i++)); do dd if=large_file of=large_file conv=notrunc oflag=append >/dev/null 2>&1 ; done

It creates a 4k*2^10 file with identical pages (since it's on tmpfs with no swap). Can you try it? It should be far simpler than the original option.

Thanks,
Dor
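For the special cases Jiri mentions (pages identical except for, say, their last 96 bytes), a few lines of Python are enough to generate the data that plain dd cannot. This is an illustrative sketch, not the test's actual guest tool; the function name and layout are invented:

```python
import struct

PAGE_SIZE = 4096

def make_pages(npages, tail_len=96):
    """Build npages page-sized byte strings that share the first
    PAGE_SIZE - tail_len bytes but differ in the last tail_len bytes
    (the page index is packed into the tail), so KSM must compare
    full pages to tell them apart."""
    body = b"\x00" * (PAGE_SIZE - tail_len)
    pages = []
    for i in range(npages):
        tail = struct.pack("<I", i).ljust(tail_len, b"\xff")
        pages.append(body + tail)
    return pages

pages = make_pages(4)
assert all(len(p) == PAGE_SIZE for p in pages)
assert pages[0][:-96] == pages[3][:-96]   # identical bodies
assert pages[0] != pages[3]               # different tails
```

In a real guest-side tool the pages would be written into page-aligned memory (or a tmpfs file) rather than kept as Python strings, but the data-generation logic would be the same.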
Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream
Avi Kivity wrote:
> Anthony/Glauber, is 0.10.7 in the works? If not, we'll release it as 0.10.6.1.

Yes. I can release it very soon.

--
Regards,
Anthony Liguori
Re: [Qemu-devel] Release plan for 0.12.0
Hi Isaku,

Isaku Yamahata wrote:
> o newer chipset (which is based on Q35 chipset)
> o multiple pci bus
> o PCI express (MMCONFIG)
> o PCI express hot plug (not acpi based)
> o PCI express switch emulator
>
> Although there is no PCIe emulated device at the moment, this will be a fundamental infrastructure for PCI express native direct attach.

Your patches definitely deserve review/commit. I'll make sure that happens for the 0.12 time frame.

Michael, could you help review some of the PCI patches?

Thanks,

--
Regards,
Anthony Liguori
Re: Release plan for 0.12.0
Amit Shah wrote:
> On (Tue) Sep 29 2009 [18:54:53], Anthony Liguori wrote:
> o multiport virtio-console support

Assuming we can get the kernel drivers straightened out, I think it's certainly reasonable for 0.12.

--
Regards,
Anthony Liguori
Re: [Qemu-devel] Release plan for 0.12.0
Avi Kivity wrote:
> On 09/30/2009 01:54 AM, Anthony Liguori wrote:
> [...]
>> o updates to the default pc machine type
>
> Machine monitor protocol.

If we're going to support the protocol for 0.12, I'd like to see most of the code merged by the end of October.

--
Regards,
Anthony Liguori
Re: [Qemu-devel] Release plan for 0.12.0
Carl-Daniel Hailfinger wrote: Hi, On 30.09.2009 01:54, Anthony Liguori wrote: Now that 0.11.0 is behind us, it's time to start thinking about 0.12.0. I'd also like to try to enumerate some features for this release. Here's a short list of things I expect to see for this release (target-i386 centric). o switch to SeaBIOS (need to finish porting features from Bochs) That switch is much appreciated because it also reduces the testing matrix of those coreboot developers who boot test every commit with Qemu. However, to run coreboot on Qemu with the same init sequence as on simplified real hardware, we need Cache-as-RAM (CAR) support. This is basically a mode where sizeof(cacheable area) <= sizeof(L2 cache), which causes the processor to lock the cache and not pass any reads/writes through to the RAM behind the cached area. The easiest way to implement this would be to check the cache size criterion upon every MTRR manipulation and map a chunk of fresh memory on top of the existing memory (which may be RAM, ROM or unmapped) for every cacheable area; if the cacheable area starts to exceed the L2 cache size, discard all contents of the memory mapped on top. For additional correctness, the memory should not be discarded but rather written back to the lower layer of memory if WBINVD (instead of INVD) or CLFLUSH is called. That one is mostly sugar, though, and coreboot can do without. Do we really need coreboot to use the same init sequence? coreboot is firmware and we don't necessarily run real firmware under QEMU. It's a short cut that lets us avoid a lot of complexity. Right now coreboot sets up the MTRRs correctly, but then (conditional on Qemu) only uses areas which are known to be backed by RAM instead of the areas designated by CAR. I'd like to implement CAR support which builds on top of my MTRR code which was merged some months ago (and I already have code to check for total cacheable area size), but I need help with the memory mapping stuff.
How do I proceed? Clean up what I have and insert FIXME comments where I don't know how to implement stuff so others can see the code and comment on it? You could start there. But from a higher level, I'm not sure I think a partial implementation of something like CAR is all that valuable since coreboot already runs under QEMU. Regards, Carl-Daniel -- Regards, Anthony Liguori
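Not qemu code, but the cache-size criterion Carl-Daniel describes could be checked along these lines. Everything here is hypothetical, invented purely to illustrate the "total cacheable area must not exceed the L2 cache" rule: the struct, the helper names, and the fixed L2 size are not from any existing codebase.

```c
#include <stdint.h>

/* Hypothetical sketch of the CAR validity check discussed above.
 * L2_CACHE_BYTES and struct mtrr_range are invented for illustration. */
#define L2_CACHE_BYTES (512 * 1024)

struct mtrr_range {
    uint64_t base;
    uint64_t size;
    int cacheable;
};

/* Sum the bytes the variable MTRRs currently mark as cacheable. */
static uint64_t total_cacheable(const struct mtrr_range *r, int n)
{
    uint64_t total = 0;
    for (int i = 0; i < n; i++)
        if (r[i].cacheable)
            total += r[i].size;
    return total;
}

/* Re-evaluated on every MTRR write: once the cacheable area exceeds
 * the L2 size, the memory mapped on top would be discarded. */
static int car_contents_survive(const struct mtrr_range *r, int n)
{
    return total_cacheable(r, n) <= L2_CACHE_BYTES;
}
```

The INVD/WBINVD distinction from the mail would bolt onto the discard step: on WBINVD the contents get written back to the underlying memory instead of thrown away.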
Re: [Qemu-devel] Release plan for 0.12.0
On Wed, 30 Sep 2009 08:41:23 +0200 Avi Kivity a...@redhat.com wrote: Machine monitor protocol. Yeah, I was going to suggest it as well.
[PATCH 1/5] Nested VMX patch 1 implements vmon and vmoff
From: Orit Wasserman or...@il.ibm.com --- arch/x86/kvm/svm.c | 3 - arch/x86/kvm/vmx.c | 217 +++- arch/x86/kvm/x86.c | 6 +- arch/x86/kvm/x86.h | 2 + 4 files changed, 222 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 2df9b45..3c1f22a 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -124,9 +124,6 @@ static int npt = 1; module_param(npt, int, S_IRUGO); -static int nested = 1; -module_param(nested, int, S_IRUGO); - static void svm_flush_tlb(struct kvm_vcpu *vcpu); static void svm_complete_interrupts(struct vcpu_svm *svm); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 78101dd..71bd91a 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -67,6 +67,11 @@ struct vmcs { char data[0]; }; +struct nested_vmx { + /* Has the level1 guest done vmxon? */ + bool vmxon; +}; + struct vcpu_vmx { struct kvm_vcpu vcpu; struct list_head local_vcpus_link; @@ -114,6 +119,9 @@ struct vcpu_vmx { ktime_t entry_time; s64 vnmi_blocked_time; u32 exit_reason; + + /* Nested vmx */ + struct nested_vmx nested; }; static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu) @@ -967,6 +975,95 @@ static void guest_write_tsc(u64 guest_tsc, u64 host_tsc) } /* + * Handles msr read for nested virtualization + */ +static int nested_vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, + u64 *pdata) +{ + u64 vmx_msr = 0; + + switch (msr_index) { + case MSR_IA32_FEATURE_CONTROL: + *pdata = 0; + break; + case MSR_IA32_VMX_BASIC: + *pdata = 0; + rdmsrl(MSR_IA32_VMX_BASIC, vmx_msr); + *pdata = (vmx_msr & 0x00cf); + break; + case MSR_IA32_VMX_PINBASED_CTLS: + rdmsrl(MSR_IA32_VMX_PINBASED_CTLS, vmx_msr); + *pdata = (PIN_BASED_EXT_INTR_MASK & vmcs_config.pin_based_exec_ctrl) | + (PIN_BASED_NMI_EXITING & vmcs_config.pin_based_exec_ctrl) | + (PIN_BASED_VIRTUAL_NMIS & vmcs_config.pin_based_exec_ctrl); + break; + case MSR_IA32_VMX_PROCBASED_CTLS: + { + u32 vmx_msr_high, vmx_msr_low; + u32 control = CPU_BASED_HLT_EXITING | +#ifdef CONFIG_X86_64 + CPU_BASED_CR8_LOAD_EXITING | + CPU_BASED_CR8_STORE_EXITING | +#endif + CPU_BASED_CR3_LOAD_EXITING | + CPU_BASED_CR3_STORE_EXITING | + CPU_BASED_USE_IO_BITMAPS | + CPU_BASED_MOV_DR_EXITING | + CPU_BASED_USE_TSC_OFFSETING | + CPU_BASED_INVLPG_EXITING | + CPU_BASED_TPR_SHADOW | + CPU_BASED_USE_MSR_BITMAPS | + CPU_BASED_ACTIVATE_SECONDARY_CONTROLS; + + rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high); + + control &= vmx_msr_high; /* bit == 0 in high word == must be zero */ + control |= vmx_msr_low; /* bit == 1 in low word == must be one */ + + *pdata = (CPU_BASED_HLT_EXITING & control) | +#ifdef CONFIG_X86_64 + (CPU_BASED_CR8_LOAD_EXITING & control) | + (CPU_BASED_CR8_STORE_EXITING & control) | +#endif + (CPU_BASED_CR3_LOAD_EXITING & control) | + (CPU_BASED_CR3_STORE_EXITING & control) | + (CPU_BASED_USE_IO_BITMAPS & control) | + (CPU_BASED_MOV_DR_EXITING & control) | + (CPU_BASED_USE_TSC_OFFSETING & control) | + (CPU_BASED_INVLPG_EXITING & control); + + if (cpu_has_secondary_exec_ctrls()) + *pdata |= CPU_BASED_ACTIVATE_SECONDARY_CONTROLS; + + if (vm_need_tpr_shadow(vcpu->kvm)) + *pdata |= CPU_BASED_TPR_SHADOW; + break; + } + case MSR_IA32_VMX_EXIT_CTLS: + *pdata = 0; +#ifdef CONFIG_X86_64 + *pdata |= VM_EXIT_HOST_ADDR_SPACE_SIZE; +#endif + break; + case MSR_IA32_VMX_ENTRY_CTLS: + *pdata = 0; + break; + case MSR_IA32_VMX_PROCBASED_CTLS2: + *pdata = 0; + if (vm_need_virtualize_apic_accesses(vcpu->kvm)) + *pdata |= SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; + break; + case MSR_IA32_VMX_EPT_VPID_CAP: + *pdata = 0; + break; + default: + return 1; + } + + return 0; +} + +/* * Reads an msr value (of 'msr_index') into 'pdata'. * Returns 0 on success, non-0 otherwise. * Assumes vcpu_load() was already called. @@ -1005,6 +1102,9 @@ static int
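The must-be-zero/must-be-one dance in nested_vmx_get_msr follows the general VMX capability-MSR convention: the low 32 bits of the capability MSR give the bits a control field must have set, and the high 32 bits give the bits it may have set. A standalone sketch of just that step (the function name is invented and this is not part of the patch):

```c
#include <stdint.h>

/* Clamp a desired execution-control value against a VMX capability
 * MSR: low word = bits that must be one, high word = bits that may
 * be one. Sketch only, not kernel code. */
static uint32_t adjust_vmx_controls(uint32_t wanted, uint64_t cap_msr)
{
    uint32_t must_be_one = (uint32_t)cap_msr;
    uint32_t may_be_one  = (uint32_t)(cap_msr >> 32);

    wanted &= may_be_one;   /* bit == 0 in high word: must be zero */
    wanted |= must_be_one;  /* bit == 1 in low word: must be one   */
    return wanted;
}
```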
[PATCH 2/5] Nested VMX patch 2 implements vmclear
From: Orit Wasserman or...@il.ibm.com --- arch/x86/kvm/vmx.c | 70 --- 1 files changed, 65 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 71bd91a..411cbdb 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -61,15 +61,26 @@ module_param_named(unrestricted_guest, static int __read_mostly emulate_invalid_guest_state = 0; module_param(emulate_invalid_guest_state, bool, S_IRUGO); -struct vmcs { - u32 revision_id; - u32 abort; - char data[0]; +struct __attribute__ ((__packed__)) level_state { + /* Has the level1 guest done vmclear? */ + bool vmclear; }; struct nested_vmx { /* Has the level1 guest done vmxon? */ bool vmxon; + + /* + * Level 2 state : includes vmcs, registers and + * a copy of vmcs12 for vmread/vmwrite + */ + struct level_state *l2_state; +}; + +struct vmcs { + u32 revision_id; + u32 abort; + char data[0]; }; struct vcpu_vmx { @@ -186,6 +197,8 @@ static struct kvm_vmx_segment_field { static void ept_save_pdptrs(struct kvm_vcpu *vcpu); +static int create_l2_state(struct kvm_vcpu *vcpu); + /* * Keep MSR_K6_STAR at the end, as setup_msrs() will try to optimize it * away by decrementing the array size. @@ -1293,6 +1306,30 @@ static void vmclear_local_vcpus(void) __vcpu_clear(vmx); } +struct level_state *create_state(void) +{ + struct level_state *state = NULL; + + state = kzalloc(sizeof(struct level_state), GFP_KERNEL); + if (!state) { + printk(KERN_INFO "Error create level state\n"); + return NULL; + } + return state; +} + +int create_l2_state(struct kvm_vcpu *vcpu) +{ + struct vcpu_vmx *vmx = to_vmx(vcpu); + + if (!vmx->nested.l2_state) { + vmx->nested.l2_state = create_state(); + if (!vmx->nested.l2_state) + return -ENOMEM; + } + + return 0; +} /* Just like cpu_vmxoff(), but with the __kvm_handle_fault_on_reboot() * tricks.
@@ -3261,6 +3298,27 @@ static int handle_vmx_insn(struct kvm_vcpu *vcpu) return 1; } +static void clear_rflags_cf_zf(struct kvm_vcpu *vcpu) +{ + unsigned long rflags; + rflags = vmx_get_rflags(vcpu); + rflags &= ~(X86_EFLAGS_CF | X86_EFLAGS_ZF); + vmx_set_rflags(vcpu, rflags); +} + +static int handle_vmclear(struct kvm_vcpu *vcpu) +{ + if (!nested_vmx_check_permission(vcpu)) + return 1; + + to_vmx(vcpu)->nested.l2_state->vmclear = 1; + + skip_emulated_instruction(vcpu); + clear_rflags_cf_zf(vcpu); + + return 1; +} + static int handle_vmoff(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -3310,6 +3368,8 @@ static int handle_vmon(struct kvm_vcpu *vcpu) vmx->nested.vmxon = 1; + create_l2_state(vcpu); + skip_emulated_instruction(vcpu); return 1; } @@ -3582,7 +3642,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = { [EXIT_REASON_HLT] = handle_halt, [EXIT_REASON_INVLPG] = handle_invlpg, [EXIT_REASON_VMCALL] = handle_vmcall, - [EXIT_REASON_VMCLEAR] = handle_vmx_insn, + [EXIT_REASON_VMCLEAR] = handle_vmclear, [EXIT_REASON_VMLAUNCH] = handle_vmx_insn, [EXIT_REASON_VMPTRLD] = handle_vmx_insn, [EXIT_REASON_VMPTRST] = handle_vmx_insn, -- 1.6.0.4
[PATCH 3/5] Nested VMX patch 3 implements vmptrld and vmptrst
From: Orit Wasserman or...@il.ibm.com --- arch/x86/kvm/vmx.c | 468 ++-- arch/x86/kvm/x86.c |3 +- 2 files changed, 459 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 411cbdb..8c186e0 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -61,20 +61,168 @@ module_param_named(unrestricted_guest, static int __read_mostly emulate_invalid_guest_state = 0; module_param(emulate_invalid_guest_state, bool, S_IRUGO); + +struct __attribute__ ((__packed__)) shadow_vmcs { + u32 revision_id; + u32 abort; + u16 virtual_processor_id; + u16 guest_es_selector; + u16 guest_cs_selector; + u16 guest_ss_selector; + u16 guest_ds_selector; + u16 guest_fs_selector; + u16 guest_gs_selector; + u16 guest_ldtr_selector; + u16 guest_tr_selector; + u16 host_es_selector; + u16 host_cs_selector; + u16 host_ss_selector; + u16 host_ds_selector; + u16 host_fs_selector; + u16 host_gs_selector; + u16 host_tr_selector; + u64 io_bitmap_a; + u64 io_bitmap_b; + u64 msr_bitmap; + u64 vm_exit_msr_store_addr; + u64 vm_exit_msr_load_addr; + u64 vm_entry_msr_load_addr; + u64 tsc_offset; + u64 virtual_apic_page_addr; + u64 apic_access_addr; + u64 ept_pointer; + u64 guest_physical_address; + u64 vmcs_link_pointer; + u64 guest_ia32_debugctl; + u64 guest_ia32_pat; + u64 guest_pdptr0; + u64 guest_pdptr1; + u64 guest_pdptr2; + u64 guest_pdptr3; + u64 host_ia32_pat; + u32 pin_based_vm_exec_control; + u32 cpu_based_vm_exec_control; + u32 exception_bitmap; + u32 page_fault_error_code_mask; + u32 page_fault_error_code_match; + u32 cr3_target_count; + u32 vm_exit_controls; + u32 vm_exit_msr_store_count; + u32 vm_exit_msr_load_count; + u32 vm_entry_controls; + u32 vm_entry_msr_load_count; + u32 vm_entry_intr_info_field; + u32 vm_entry_exception_error_code; + u32 vm_entry_instruction_len; + u32 tpr_threshold; + u32 secondary_vm_exec_control; + u32 vm_instruction_error; + u32 vm_exit_reason; + u32 vm_exit_intr_info; + u32 vm_exit_intr_error_code; + u32 idt_vectoring_info_field; 
+ u32 idt_vectoring_error_code; + u32 vm_exit_instruction_len; + u32 vmx_instruction_info; + u32 guest_es_limit; + u32 guest_cs_limit; + u32 guest_ss_limit; + u32 guest_ds_limit; + u32 guest_fs_limit; + u32 guest_gs_limit; + u32 guest_ldtr_limit; + u32 guest_tr_limit; + u32 guest_gdtr_limit; + u32 guest_idtr_limit; + u32 guest_es_ar_bytes; + u32 guest_cs_ar_bytes; + u32 guest_ss_ar_bytes; + u32 guest_ds_ar_bytes; + u32 guest_fs_ar_bytes; + u32 guest_gs_ar_bytes; + u32 guest_ldtr_ar_bytes; + u32 guest_tr_ar_bytes; + u32 guest_interruptibility_info; + u32 guest_activity_state; + u32 guest_sysenter_cs; + u32 host_ia32_sysenter_cs; + unsigned long cr0_guest_host_mask; + unsigned long cr4_guest_host_mask; + unsigned long cr0_read_shadow; + unsigned long cr4_read_shadow; + unsigned long cr3_target_value0; + unsigned long cr3_target_value1; + unsigned long cr3_target_value2; + unsigned long cr3_target_value3; + unsigned long exit_qualification; + unsigned long guest_linear_address; + unsigned long guest_cr0; + unsigned long guest_cr3; + unsigned long guest_cr4; + unsigned long guest_es_base; + unsigned long guest_cs_base; + unsigned long guest_ss_base; + unsigned long guest_ds_base; + unsigned long guest_fs_base; + unsigned long guest_gs_base; + unsigned long guest_ldtr_base; + unsigned long guest_tr_base; + unsigned long guest_gdtr_base; + unsigned long guest_idtr_base; + unsigned long guest_dr7; + unsigned long guest_rsp; + unsigned long guest_rip; + unsigned long guest_rflags; + unsigned long guest_pending_dbg_exceptions; + unsigned long guest_sysenter_esp; + unsigned long guest_sysenter_eip; + unsigned long host_cr0; + unsigned long host_cr3; + unsigned long host_cr4; + unsigned long host_fs_base; + unsigned long host_gs_base; + unsigned long host_tr_base; + unsigned long host_gdtr_base; + unsigned long host_idtr_base; + unsigned long host_ia32_sysenter_esp; + unsigned long host_ia32_sysenter_eip; + unsigned long host_rsp; + unsigned long host_rip; +}; + struct 
__attribute__ ((__packed__)) level_state { /* Has the level1 guest done vmclear? */ bool vmclear; + u16 vpid; + u64 shadow_efer; + unsigned long cr2; +
[PATCH 4/5] Nested VMX patch 4 implements vmread and vmwrite
From: Orit Wasserman or...@il.ibm.com --- arch/x86/kvm/vmx.c | 591 +++- 1 files changed, 589 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 8c186e0..6a4c252 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -225,6 +225,21 @@ struct nested_vmx { struct level_state *l1_state; }; +enum vmcs_field_type { + VMCS_FIELD_TYPE_U16 = 0, + VMCS_FIELD_TYPE_U64 = 1, + VMCS_FIELD_TYPE_U32 = 2, + VMCS_FIELD_TYPE_ULONG = 3 +}; + +#define VMCS_FIELD_LENGTH_OFFSET 13 +#define VMCS_FIELD_LENGTH_MASK 0x6000 + +static inline int vmcs_field_length(unsigned long field) +{ + return (VMCS_FIELD_LENGTH_MASK field) 13; +} + struct vmcs { u32 revision_id; u32 abort; @@ -288,6 +303,404 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu) return container_of(vcpu, struct vcpu_vmx, vcpu); } +#define SHADOW_VMCS_OFFSET(x) offsetof(struct shadow_vmcs, x) + +static unsigned short vmcs_field_to_offset_table[HOST_RIP+1] = { + + [VIRTUAL_PROCESSOR_ID] = + SHADOW_VMCS_OFFSET(virtual_processor_id), + [GUEST_ES_SELECTOR] = + SHADOW_VMCS_OFFSET(guest_es_selector), + [GUEST_CS_SELECTOR] = + SHADOW_VMCS_OFFSET(guest_cs_selector), + [GUEST_SS_SELECTOR] = + SHADOW_VMCS_OFFSET(guest_ss_selector), + [GUEST_DS_SELECTOR] = + SHADOW_VMCS_OFFSET(guest_ds_selector), + [GUEST_FS_SELECTOR] = + SHADOW_VMCS_OFFSET(guest_fs_selector), + [GUEST_GS_SELECTOR] = + SHADOW_VMCS_OFFSET(guest_gs_selector), + [GUEST_LDTR_SELECTOR] = + SHADOW_VMCS_OFFSET(guest_ldtr_selector), + [GUEST_TR_SELECTOR] = + SHADOW_VMCS_OFFSET(guest_tr_selector), + [HOST_ES_SELECTOR] = + SHADOW_VMCS_OFFSET(host_es_selector), + [HOST_CS_SELECTOR] = + SHADOW_VMCS_OFFSET(host_cs_selector), + [HOST_SS_SELECTOR] = + SHADOW_VMCS_OFFSET(host_ss_selector), + [HOST_DS_SELECTOR] = + SHADOW_VMCS_OFFSET(host_ds_selector), + [HOST_FS_SELECTOR] = + SHADOW_VMCS_OFFSET(host_fs_selector), + [HOST_GS_SELECTOR] = + SHADOW_VMCS_OFFSET(host_gs_selector), + [HOST_TR_SELECTOR] = + 
SHADOW_VMCS_OFFSET(host_tr_selector), + [IO_BITMAP_A] = + SHADOW_VMCS_OFFSET(io_bitmap_a), + [IO_BITMAP_A_HIGH] = + SHADOW_VMCS_OFFSET(io_bitmap_a)+4, + [IO_BITMAP_B] = + SHADOW_VMCS_OFFSET(io_bitmap_b), + [IO_BITMAP_B_HIGH] = + SHADOW_VMCS_OFFSET(io_bitmap_b)+4, + [MSR_BITMAP] = + SHADOW_VMCS_OFFSET(msr_bitmap), + [MSR_BITMAP_HIGH] = + SHADOW_VMCS_OFFSET(msr_bitmap)+4, + [VM_EXIT_MSR_STORE_ADDR] = + SHADOW_VMCS_OFFSET(vm_exit_msr_store_addr), + [VM_EXIT_MSR_STORE_ADDR_HIGH] = + SHADOW_VMCS_OFFSET(vm_exit_msr_store_addr)+4, + [VM_EXIT_MSR_LOAD_ADDR] = + SHADOW_VMCS_OFFSET(vm_exit_msr_load_addr), + [VM_EXIT_MSR_LOAD_ADDR_HIGH] = + SHADOW_VMCS_OFFSET(vm_exit_msr_load_addr)+4, + [VM_ENTRY_MSR_LOAD_ADDR] = + SHADOW_VMCS_OFFSET(vm_entry_msr_load_addr), + [VM_ENTRY_MSR_LOAD_ADDR_HIGH] = + SHADOW_VMCS_OFFSET(vm_entry_msr_load_addr)+4, + [TSC_OFFSET] = + SHADOW_VMCS_OFFSET(tsc_offset), + [TSC_OFFSET_HIGH] = + SHADOW_VMCS_OFFSET(tsc_offset)+4, + [VIRTUAL_APIC_PAGE_ADDR] = + SHADOW_VMCS_OFFSET(virtual_apic_page_addr), + [VIRTUAL_APIC_PAGE_ADDR_HIGH] = + SHADOW_VMCS_OFFSET(virtual_apic_page_addr)+4, + [APIC_ACCESS_ADDR] = + SHADOW_VMCS_OFFSET(apic_access_addr), + [APIC_ACCESS_ADDR_HIGH] = + SHADOW_VMCS_OFFSET(apic_access_addr)+4, + [EPT_POINTER] = + SHADOW_VMCS_OFFSET(ept_pointer), + [EPT_POINTER_HIGH] = + SHADOW_VMCS_OFFSET(ept_pointer)+4, + [GUEST_PHYSICAL_ADDRESS] = + SHADOW_VMCS_OFFSET(guest_physical_address), + [GUEST_PHYSICAL_ADDRESS_HIGH] = + SHADOW_VMCS_OFFSET(guest_physical_address)+4, + [VMCS_LINK_POINTER] = + SHADOW_VMCS_OFFSET(vmcs_link_pointer), + [VMCS_LINK_POINTER_HIGH] = + SHADOW_VMCS_OFFSET(vmcs_link_pointer)+4, + [GUEST_IA32_DEBUGCTL] = + SHADOW_VMCS_OFFSET(guest_ia32_debugctl), + [GUEST_IA32_DEBUGCTL_HIGH] = + SHADOW_VMCS_OFFSET(guest_ia32_debugctl)+4, + [GUEST_IA32_PAT] = + SHADOW_VMCS_OFFSET(guest_ia32_pat), + [GUEST_IA32_PAT_HIGH] = + SHADOW_VMCS_OFFSET(guest_ia32_pat)+4, + [GUEST_PDPTR0] = + SHADOW_VMCS_OFFSET(guest_pdptr0), + [GUEST_PDPTR0_HIGH] = +
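The field-to-offset table above is indexed by the architectural VMCS field encodings, whose bits 13-14 give the field width — the same encoding patch 4's VMCS_FIELD_LENGTH_MASK (0x6000) extracts. A minimal standalone sketch of that decoding, for illustration only (the field encodings in the checks below are taken from the Intel SDM):

```c
/* Decode the width of a VMCS field from its encoding: bits 13-14,
 * matching patch 4's vmcs_field_type enum. Sketch for illustration. */
enum vmcs_field_type {
    VMCS_FIELD_TYPE_U16   = 0,
    VMCS_FIELD_TYPE_U64   = 1,
    VMCS_FIELD_TYPE_U32   = 2,
    VMCS_FIELD_TYPE_ULONG = 3,  /* natural width */
};

#define VMCS_FIELD_LENGTH_MASK 0x6000

static int vmcs_field_length(unsigned long field)
{
    return (field & VMCS_FIELD_LENGTH_MASK) >> 13;
}
```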
[PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume
From: Orit Wasserman or...@il.ibm.com --- arch/x86/kvm/vmx.c | 1173 ++-- 1 files changed, 1148 insertions(+), 25 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 6a4c252..e814029 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -209,6 +209,7 @@ struct __attribute__ ((__packed__)) level_state { struct vmcs *vmcs; int cpu; int launched; + bool first_launch; }; struct nested_vmx { @@ -216,6 +217,12 @@ struct nested_vmx { bool vmxon; /* What is the location of the vmcs l1 keeps for l2? (in level1 gpa) */ u64 vmptr; + /* Are we running nested guest */ + bool nested_mode; + /* L1 requested VMLAUNCH or VMRESUME but we didn't run L2 yet */ + bool nested_run_pending; + /* flag indicating if there was a valid IDT after exiting from l2 */ + bool nested_valid_idt; /* * Level 2 state : includes vmcs,registers and * a copy of vmcs12 for vmread/vmwrite @@ -240,6 +247,10 @@ static inline int vmcs_field_length(unsigned long field) return (VMCS_FIELD_LENGTH_MASK field) 13; } +#define NESTED_VM_EXIT_CONTROLS_MASK (~(VM_EXIT_LOAD_IA32_PAT | \ + VM_EXIT_SAVE_IA32_PAT)) +#define NESTED_VM_ENTRY_CONTROLS_MASK (~(VM_ENTRY_LOAD_IA32_PAT | \ +VM_ENTRY_IA32E_MODE)) struct vmcs { u32 revision_id; u32 abort; @@ -303,6 +314,12 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu) return container_of(vcpu, struct vcpu_vmx, vcpu); } +static inline struct shadow_vmcs *get_shadow_vmcs(struct kvm_vcpu *vcpu) +{ + WARN_ON(!to_vmx(vcpu)-nested.l2_state-shadow_vmcs); + return to_vmx(vcpu)-nested.l2_state-shadow_vmcs; +} + #define SHADOW_VMCS_OFFSET(x) offsetof(struct shadow_vmcs, x) static unsigned short vmcs_field_to_offset_table[HOST_RIP+1] = { @@ -822,8 +839,16 @@ static struct kvm_vmx_segment_field { static void ept_save_pdptrs(struct kvm_vcpu *vcpu); static int nested_vmx_check_permission(struct kvm_vcpu *vcpu); +static int nested_vmx_check_exception(struct vcpu_vmx *vmx, unsigned nr, + bool has_error_code, u32 error_code); +static int 
nested_vmx_intr(struct kvm_vcpu *vcpu); static int create_l1_state(struct kvm_vcpu *vcpu); static int create_l2_state(struct kvm_vcpu *vcpu); +static int launch_guest(struct kvm_vcpu *vcpu); +static int nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu); +static int nested_vmx_exit_handled(struct kvm_vcpu *vcpu, bool kvm_override); +static int nested_vmx_vmexit(struct kvm_vcpu *vcpu, +bool is_interrupt); /* * Keep MSR_K6_STAR at the end, as setup_msrs() will try to optimize it @@ -940,6 +965,18 @@ static inline bool cpu_has_vmx_ept_2m_page(void) return !!(vmx_capability.ept VMX_EPT_2MB_PAGE_BIT); } +static inline int is_exception(u32 intr_info) +{ + return (intr_info (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK)) + == (INTR_TYPE_HARD_EXCEPTION | INTR_INFO_VALID_MASK); +} + +static inline int is_nmi(u32 intr_info) +{ + return (intr_info (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK)) + == (INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK); +} + static inline int cpu_has_vmx_invept_individual_addr(void) { return !!(vmx_capability.ept VMX_EPT_EXTENT_INDIVIDUAL_BIT); @@ -990,6 +1027,51 @@ static inline bool report_flexpriority(void) return flexpriority_enabled; } +static inline int nested_cpu_has_vmx_tpr_shadow(struct kvm_vcpu *vcpu) +{ + return cpu_has_vmx_tpr_shadow() + get_shadow_vmcs(vcpu)-cpu_based_vm_exec_control + CPU_BASED_TPR_SHADOW; +} + +static inline int nested_cpu_has_secondary_exec_ctrls(struct kvm_vcpu *vcpu) +{ + return cpu_has_secondary_exec_ctrls() + get_shadow_vmcs(vcpu)-cpu_based_vm_exec_control + CPU_BASED_ACTIVATE_SECONDARY_CONTROLS; +} + +static inline bool nested_vm_need_virtualize_apic_accesses(struct kvm_vcpu + *vcpu) +{ + return get_shadow_vmcs(vcpu)-secondary_vm_exec_control + SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; +} + +static inline int nested_cpu_has_vmx_ept(struct kvm_vcpu *vcpu) +{ + return get_shadow_vmcs(vcpu)- + secondary_vm_exec_control SECONDARY_EXEC_ENABLE_EPT; +} + +static inline int nested_cpu_has_vmx_vpid(struct kvm_vcpu 
*vcpu) +{ + return get_shadow_vmcs(vcpu)-secondary_vm_exec_control + SECONDARY_EXEC_ENABLE_VPID; +} + +static inline int nested_cpu_has_vmx_pat(struct kvm_vcpu *vcpu) +{ + return get_shadow_vmcs(vcpu)-vm_entry_controls + VM_ENTRY_LOAD_IA32_PAT; +} + +static inline int nested_cpu_has_vmx_msr_bitmap(struct kvm_vcpu *vcpu) +{ + return
Nested VMX support v2
The following patches implement nested VMX support. The patches enable a guest to use the VMX APIs in order to run its own nested guest (i.e., enable running other hypervisors which use VMX under KVM). The current patches support running Linux under a nested KVM using shadow page tables (with bypass_guest_pf disabled). SMP support was fixed. Reworking EPT support to mesh cleanly with the current shadow paging design per Avi's comments is a work-in-progress. The current patches only support a single nested hypervisor, which can only run a single guest (multiple guests are work in progress). Only 64-bit nested hypervisors are supported. Additional patches for running Windows under nested KVM, and Linux under nested VMware server(!), are currently running in the lab. We are in the process of forward-porting those patches to -tip. These patches were written by: Orit Wasserman, or...@il.ibm.com Ben-Ami Yassor, ben...@il.ibm.com Abel Gordon, ab...@il.ibm.com Muli Ben-Yehuda, m...@il.ibm.com With contributions by: Anthony Liguori, aligu...@us.ibm.com Mike Day, m...@us.ibm.com This work was inspired by the nested SVM support by Alexander Graf and Joerg Roedel. Changes since v2: Added check to nested_vmx_get_msr. Static initialization of the vmcs_field_to_offset_table array. Use the memory allocated by L1 for VMCS12 to store the shadow vmcs. Some optimizations to the prepare_vmcs_12 function. vpid allocation will be updated with the multiguest support (work in progress). We are working on fixing the cr0.TS handling; it works for nested kvm but not for vmware server.
Re: Release plan for 0.12.0
On (Wed) Sep 30 2009 [08:04:17], Anthony Liguori wrote: Amit Shah wrote: On (Tue) Sep 29 2009 [18:54:53], Anthony Liguori wrote: o multiport virtio-console support Assuming we can get the kernel drivers straightened out, I think it's certainly reasonable for 0.12. The kernel drivers are in fine shape. Amit
Re: [PATCH v4: kvm 4/4] Fix hotplug of CPUs for KVM.
On Tue, Sep 29, 2009 at 11:38:37AM -1000, Zachary Amsden wrote: Both VMX and SVM require per-cpu memory allocation, which is done at module init time, for only online cpus. The backend was not allocating enough structures for all possible CPUs, so new CPUs coming online could not be hardware enabled. Signed-off-by: Zachary Amsden zams...@redhat.com Applied all, thanks.
Re: [Qemu-devel] Release plan for 0.12.0
On Wed, Sep 30, 2009 at 08:03:20AM -0500, Anthony Liguori wrote: Hi Isaku, Isaku Yamahata wrote: o newer chipset (which is based on Q35 chipset) o multiple pci bus o PCI express (MMCONFIG) o PCI express hot plug (not acpi based) o PCI express switch emulator Although there is no PCIe emulated device at the moment, this will be a fundamental infrastructure for PCI express native direct attach. Your patches definitely deserve review/commit. I'll make sure that happens for the 0.12 time frame. Michael, could you help review some of the PCI patches? Yes, I am doing this and have sent comments already. The only thing I have not looked at yet is the new express file. Thanks, -- Regards, Anthony Liguori
Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream
I might sound like a broken record, but why isn't the full GSO support for virtio-net upstream in qemu?
Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream
On 09/30/2009 03:51 PM, Christoph Hellwig wrote: I might sound like a broken record, but why isn't the full GSO support for virtio-net upstream in qemu? IIRC the current hacks are not upstream quality. The problem (again IIRC) is that the guest and host negotiate a protocol, but the qemu vlan model doesn't have a guest and a host, it has peers (possibly more than two), so a lot of translation has to take place if you have one peer supporting a guest feature and another not. IMO the best way out is to drop the vlan model. It has its uses, but they can all be implemented in other ways, and they all have minor usage compared to the business of getting data into and out of a guest. -- error compiling committee.c: too many arguments to function
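A conceptual sketch (not qemu code; the function is invented for illustration) of why the peer model makes negotiated features awkward: a nic on a vlan can only safely offer its guest what every peer supports, so the usable feature set is the intersection across peers, and a single GSO-incapable peer forces translation for everyone.

```c
#include <stdint.h>

/* With n peers on a vlan, the features a nic can offer its guest
 * without per-packet translation are the intersection of what all
 * peers support. Illustration only. */
static uint32_t vlan_common_features(const uint32_t *peer_features, int n)
{
    uint32_t common = ~0u;
    for (int i = 0; i < n; i++)
        common &= peer_features[i];
    return common;
}
```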
Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream
On Wed, 2009-09-30 at 15:55 +0200, Avi Kivity wrote: On 09/30/2009 03:51 PM, Christoph Hellwig wrote: I might sound like a broken record, but why isn't the full GSO support for virtio-net upstream in qemu? IIRC the current hacks are not upstream quality. The problem (again IIRC) is that the guest and host negotiate a protocol, but the qemu vlan model doesn't have a guest and a host, it has peers (possibly more than two), so a lot of translation has to take place if you have one peer supporting a guest feature and another not. Right. IMO the best way out is to drop the vlan model. It has its uses, but they can all be implemented in other ways, and they all have minor usage compared to the business of getting data into and out of a guest. I think we should keep the vlan stuff, just de-emphasise it. I'm planning on adding -hostnet and -nic arguments, which would not use vlans by default but rather connect the nic directly to the host side. The QemuOpts conversion of -net which is waiting to be merged is the first stage of that. Cheers, Mark.
RE: migrate_set_downtime bug
On Wed, Sep 30, 2009 at 10:55:24AM +0200, Dietmar Maurer wrote: Another problem occurs when max_downtime is too short. This can result in a never-ending migration task. To reproduce, just play a video inside a VM and set max_downtime to 30ns. Sure, one can argue that this behavior is expected. But the following would avoid the problem: +if ((stage == 2) && (bytes_transferred > 2*ram_bytes_total())) { +return 1; +} why 2 * ? This means we'll have to transfer the whole contents of RAM at least twice to hit this condition, right? Yes, this is just an arbitrary limit. - Dietmar
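As a standalone illustration of the proposed cut-off (function name invented, not qemu code): stage 2 is the live pre-copy phase, and the factor of 2 is the arbitrary limit discussed above — once twice the guest's RAM has gone over the wire, the guest is evidently dirtying pages faster than they can be sent, so the migration will never converge on its own.

```c
#include <stdint.h>

/* Sketch of the proposed bail-out for a non-converging live migration. */
static int migration_dirtying_too_fast(int stage,
                                       uint64_t bytes_transferred,
                                       uint64_t ram_bytes_total)
{
    return stage == 2 && bytes_transferred > 2 * ram_bytes_total;
}
```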
virt-install: hda disks?
Hi, Not sure if this is the right place to ask this. I'm getting hda disks by default with kvm under RHEL5.4 using virt-install. This seems an odd default. Is there a reason for hda disks over sda disks? Can I change this? Thanks, JB
Re: [Qemu-devel] Release plan for 0.12.0
Luiz Capitulino wrote: On Tue, 29 Sep 2009 18:54:53 -0500 Anthony Liguori aligu...@us.ibm.com wrote: I think aiming for early to mid-December would give us roughly a 3 month cycle and would align well with some of the Linux distribution cycles. I'd like to limit things to a single -rc that lasted only for about a week. This is enough time to fix most of the obvious issues, I think. How do you plan to do it? I mean, are you going to create a separate branch or make master the -rc? Creating a separate branch (which is what we do today, iiuc) makes it get less attention; freezing master for a certain period is the best way to stabilize. Is this what you had in mind? What do people think? One reason I branch is because some people care a bit less about releases, so it makes the process non-disruptive to them. If the other maintainers agreed though, I would certainly like to have the master branch essentially frozen for the week before the release. -- Regards, Anthony Liguori
Re: Release plan for 0.12.0
Amit Shah wrote: On (Wed) Sep 30 2009 [08:04:17], Anthony Liguori wrote: Amit Shah wrote: On (Tue) Sep 29 2009 [18:54:53], Anthony Liguori wrote: o multiport virtio-console support Assuming we can get the kernel drivers straightened out, I think it's certainly reasonable for 0.12. The kernel drivers are in fine shape. I meant on track for including into the appropriate tree. Looking for an Ack/Nack from Rusty. That's been the general policy for all virtio changes btw. Nothing specific to virtio-console. Amit -- Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Release plan for 0.12.0
On (Wed) Sep 30 2009 [09:47:22], Anthony Liguori wrote: Amit Shah wrote: On (Wed) Sep 30 2009 [08:04:17], Anthony Liguori wrote: Amit Shah wrote: On (Tue) Sep 29 2009 [18:54:53], Anthony Liguori wrote: o multiport virtio-console support Assuming we can get the kernel drivers straightened out, I think it's certainly reasonable for 0.12. The kernel drivers are in fine shape. I meant on track for including into the appropriate tree. Looking for an Ack/Nack from Rusty. That's been the general policy for all virtio changes btw. Nothing specific to virtio-console. That's fine. Amit -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virt-install: hda disks?
On 09/30/2009 10:28 AM, James Brackinshaw wrote: Hi, Not sure if this is the right place to ask this. virt-install questions should be directed to virt-tools-l...@redhat.com I'm getting hda disks by default with kvm under RHEL5.4 using virt-install. This seems an odd default. Is there a reason for hda disks over sda disks? Can I change this? virt-install/libvirt defaults to IDE for disk devices (as does directly launching qemu or kvm). These disks will show up in a RHEL5 guest as /dev/hda, etc. In newer distros, these disks show up as /dev/sda, etc. It's just a matter of the RHEL5 stack being older than the hdX -> sdX change. If you want to use scsi disks via virt-install, you can use: virt-install --disk ...,bus=scsi Though AIUI it's generally considered buggy at the qemu level, and may even be disabled in RHEL5.4 - Cole
Re: virt-install: hda disks?
On Wed, Sep 30, 2009 at 4:51 PM, Cole Robinson crobi...@redhat.com wrote: On 09/30/2009 10:28 AM, James Brackinshaw wrote: Hi, Not sure if this is the right place to ask this. virt-install questions should be directed to virt-tools-l...@redhat.com Thanks. I'm getting hda disks by default with kvm under RHEL5.4 using virt-install. This seems an odd default. Is there a reason for hda disks over sda disks? Can I change this? virt-install/libvirt defaults to IDE for disk devices (as does directly launching qemu or kvm). These disks will show up in a RHEL5 guest as /dev/hda, etc. In newer distros, these disks show up as /dev/sda, etc. It's just a matter of the RHEL5 stack being older than the hdX - sdX change. If you want to use scsi disks via virt-install, you can use: virt-install --disk ...,bus=scsi Though AIUI it's generally considered buggy at the qemu level, and may even be disabled in RHEL5.4 - Cole Is virtio stable and recommended? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virt-install: hda disks?
On 09/30/2009 11:11 AM, James Brackinshaw wrote: On Wed, Sep 30, 2009 at 4:51 PM, Cole Robinson crobi...@redhat.com wrote: On 09/30/2009 10:28 AM, James Brackinshaw wrote: Hi, Not sure if this is the right place to ask this. virt-install questions should be directed to virt-tools-l...@redhat.com Thanks. I'm getting hda disks by default with kvm under RHEL5.4 using virt-install. This seems an odd default. Is there a reason for hda disks over sda disks? Can I change this? virt-install/libvirt defaults to IDE for disk devices (as does directly launching qemu or kvm). These disks will show up in a RHEL5 guest as /dev/hda, etc. In newer distros, these disks show up as /dev/sda, etc. It's just a matter of the RHEL5 stack being older than the hdX - sdX change. If you want to use scsi disks via virt-install, you can use: virt-install --disk ...,bus=scsi Though AIUI it's generally considered buggy at the qemu level, and may even be disabled in RHEL5.4 - Cole Is virtio stable and recommended? Sorry, forgot about that. If you are installing a RHEL5.4 guest, use virt-install --disk ...,model=virtio or virt-install --os-variant virtio26 which will take care of disk and networking defaults. - Cole -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virt-install: hda disks?
virt-install --os-variant virtio26 which will take care of disk and networking defaults. - Cole Ah. For networking, is this in addition to, or instead of <model type='e1000'/>? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virt-install: hda disks?
On 09/30/2009 11:18 AM, James Brackinshaw wrote: virt-install --os-variant virtio26 which will take care of disk and networking defaults. - Cole Ah. For networking, is this in addition to, or instead of <model type='e1000'/>? Instead.
Re: [Qemu-devel] Release plan for 0.12.0
On Wed, 30 Sep 2009 17:03:23 +0200 Fred Leeflang fr...@dutchie.org wrote: 2009/9/30 Anthony Liguori aligu...@us.ibm.com Luiz Capitulino wrote: On Tue, 29 Sep 2009 18:54:53 -0500 Anthony Liguori aligu...@us.ibm.com wrote: I think aiming for early to mid-December would give us roughly a 3 month cycle and would align well with some of the Linux distribution cycles. I'd like to limit things to a single -rc that lasted only for about a week. This is enough time to fix most of the obvious issues I think. How do you plan to do it? I mean, are you going to create a separate branch or make master the -rc? Creating a separate branch (which is what we do today, iiuc) makes it get less attention, freezing master for a certain period is the best way to stabilize. Is this what you had in mind? What do people think? One reason I branch is because some people care a bit less about releases so it makes the process non-disruptive to them. If the other maintainers agreed though, I would certainly like to have the master branch essentially frozen for the week before the release. freezing is only necessary if you need time to gather all the patches, build and test them together etc. Not exactly, freezing is done to stop or slow down writing new code and focus on bug fixing for a period of time. This is not only needed for a release, but projects should always try to find the best balance between 'number of bugs' and 'feature addition rate'. If you don't feel you or the developers need to do that to get a reliable release out I think it only halts developers without any clear reason to do so. Calling 'attention' to a release is not a clear reason IMO. Having a functional and relatively stable release is not only important, but it's the ultimate goal IMO. Obviously we should take care not to go to extremes. No QEMU release will be 100% bug free, that's why we have stable releases.
Re: [PATCH] Fix last 2 K&R prototypes
On Wed, Sep 30, 2009 at 01:07:27AM +0200, Juan Quintela wrote: The rest of the cases are already fixed in upstream qemu. Signed-off-by: Juan Quintela quint...@redhat.com --- hw/device-assignment.c | 2 +- qemu-kvm.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) Applied, thanks.
[PATCH] KVM: Fix task switch back link handling (take 2)
Now, also remove pre_task_link setting in save_state_to_tss16. commit b237ac37a149e8b56436fabf093532483bff13b0 Author: Gleb Natapov g...@redhat.com Date: Mon Mar 30 16:03:24 2009 +0300 KVM: Fix task switch back link handling. CC: Gleb Natapov g...@redhat.com Signed-off-by: Juan Quintela quint...@redhat.com --- arch/x86/kvm/x86.c | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fedac9d..e5ed2cd 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4196,7 +4196,6 @@ static void save_state_to_tss16(struct kvm_vcpu *vcpu, tss->ss = get_segment_selector(vcpu, VCPU_SREG_SS); tss->ds = get_segment_selector(vcpu, VCPU_SREG_DS); tss->ldt = get_segment_selector(vcpu, VCPU_SREG_LDTR); - tss->prev_task_link = get_segment_selector(vcpu, VCPU_SREG_TR); } static int load_state_from_tss16(struct kvm_vcpu *vcpu, -- 1.6.2.5
Re: [PATCH v4: kvm 1/4] Code motion. Separate timer initialization into an independent function.
On 09/29/2009 10:45 PM, Avi Kivity wrote: On 09/29/2009 11:38 PM, Zachary Amsden wrote: Signed-off-by: Zachary Amsden zams...@redhat.com Looks good. Is anything preventing us from unifying the constant_tsc and !same paths? We could just do a quick check in the notifier, see the tsc frequency hasn't changed, and return. Actually, yes. On constant_tsc processors, the processor frequency may still change, however the TSC frequency does not change with it. I actually have both of these kinds of processors (freq changes with constant TSC and freq changes with variable TSC) so I was able to test both of these cases. Zach
Re: [PATCH v4: kvm 1/4] Code motion. Separate timer initialization into an independent function.
On 09/30/2009 05:51 PM, Zachary Amsden wrote: Is anything preventing us from unifying the constant_tsc and !same paths? We could just do a quick check in the notifier, see the tsc frequency hasn't changed, and return. Actually, yes. On constant_tsc processors, the processor frequency may still change, however the TSC frequency does not change with it. I actually have both of these kinds of processors (freq changes with constant TSC and freq changes with variable TSC) so I was able to test both of these cases. If the API allows us to query the tsc frequency, it would simply return the same values in all cases, which we'd ignore. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm tuning guide
On Wed, Sep 30, 2009 at 08:20:35AM +0200, Avi Kivity wrote: On 09/30/2009 07:09 AM, Nikola Ciprich wrote: The default, IDE, is highly supported by guests but may be slow, especially with disk arrays. If your guest supports it, use the virtio interface: Avi, what is the status of data integrity issues Chris Hellwig summarized some time ago? I don't know. Christoph? On the qemu side everything is in git HEAD now, but I'm not sure about the qemu-0.11 release as I haven't really followed it. For the guest kernel the virtio cache flush support is now in mainline (past-2.6.31). For the host kernel side about 2/3 of the fixes are now in mainline (past-2.6.31) with the others hopefully getting in this merge window. Is it safe to recommend virtio to newbies already? I think so. I wouldn't. At least not for people caring about their data. It will take a while to promote the guest side fixes to all the interesting guests. IDE has the major advantage that cache flush support has been around in the guest driver for a long time so we only need to fix the host side which is a lot easier. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4: kvm 1/4] Code motion. Separate timer initialization into an independent function.
On 09/30/2009 05:56 AM, Avi Kivity wrote: On 09/30/2009 05:51 PM, Zachary Amsden wrote: If the API allows us to query the tsc frequency, it would simply return the same values in all cases, which we'd ignore. The API only allows querying the processor frequency. In the constant_tsc case, the highest processor frequency is likely going to be the actual TSC frequency, but I don't think it's a guarantee; theoretically, it could be faster on normal hardware ... or slower on overclocked hardware with an externally clocked TSC. Zach -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4: kvm 1/4] Code motion. Separate timer initialization into an independent function.
On 09/30/2009 06:11 AM, Avi Kivity wrote: On 09/30/2009 06:06 PM, Zachary Amsden wrote: On 09/30/2009 05:56 AM, Avi Kivity wrote: On 09/30/2009 05:51 PM, Zachary Amsden wrote: If the API allows us to query the tsc frequency, it would simply return the same values in all cases, which we'd ignore. The API only allows querying the processor frequency. In the constant_tsc case, the highest processor frequency is likely going to be the actual TSC frequency, but I don't think it's a guarantee; theoretically, it could be faster on normal hardware ... or slower on overclocked hardware with an externally clocked TSC. Well we could add a new API then (or a new tscfreq notifier). Those conditionals don't belong in client code. It's possible... but it's also possible to run without cpufreq enabled, which won't work properly unless the cpufreq code is aware of the measured tsc_khz... this could be a little ugly architecture wise given the big melting pot of generic code and vendor / arch specific code here. Since we're already very hardware dependent and one of the few clients who care, it seems okay to leave it as is for now. Zach -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
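The distinction being discussed can be summarized in a tiny sketch (hypothetical names, not the kernel's cpufreq or kvm API): on constant_tsc hardware the TSC keeps ticking at its boot-time rate across cpufreq transitions, so a frequency notifier can ignore the new core frequency, while on variable-TSC hardware the TSC rate follows it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

/* Illustrative only: what TSC rate a frequency-change notifier should
 * assume after a cpufreq transition.  With constant_tsc the answer is
 * always the boot-time calibration; without it, the new core clock. */
static uint32_t effective_tsc_khz(bool constant_tsc,
                                  uint32_t tsc_khz_at_boot,
                                  uint32_t new_cpu_khz)
{
    return constant_tsc ? tsc_khz_at_boot : new_cpu_khz;
}
```

This is the sense in which a unified tsc-frequency query would "return the same values in all cases" on constant_tsc parts, as Avi suggests.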
Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
On Wed, Sep 30, 2009 at 09:01:51AM +0800, Zhai, Edwin wrote: Avi, I modified it according to your comments. The only thing I want to keep is the module params ple_gap/ple_window. Although they are not per-guest, they can be used to find the right value, and to disable PLE for debug purposes. Thanks, Avi Kivity wrote: On 09/28/2009 11:33 AM, Zhai, Edwin wrote: Avi Kivity wrote: +#define KVM_VMX_DEFAULT_PLE_GAP 41 +#define KVM_VMX_DEFAULT_PLE_WINDOW 4096 +static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP; +module_param(ple_gap, int, S_IRUGO); + +static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW; +module_param(ple_window, int, S_IRUGO); Shouldn't be __read_mostly since they're read very rarely (__read_mostly should be for variables that are very often read, and rarely written). In general, they are read-only, except that experienced users may try different parameters for perf tuning. __read_mostly doesn't just mean it's read mostly. It also means it's read often. Otherwise it's just wasting space in hot cachelines. I'm not even sure they should be parameters. For different spinlocks in different OSes, and for different workloads, we need different parameters for tuning. It's similar to enable_ept. No, global parameters don't work for tuning workloads and guests since they cannot be modified on a per-guest basis. enable_ept is only useful for debugging and testing. +set_current_state(TASK_INTERRUPTIBLE); +schedule_hrtimeout(&expires, HRTIMER_MODE_ABS); + Please add a tracepoint for this (since it can cause significant change in behaviour), Isn't trace_kvm_exit(exit_reason, ...) enough? We can tell the PLE vmexit from other vmexits. Right. I thought of the software spinlock detector, but that's another problem. I think you can drop the sleep_time parameter, it can be part of the function. Also kvm_vcpu_sleep() is confusing, we also sleep on halt. Please call it kvm_vcpu_on_spin() or something (since that's what the guest is doing).
kvm_vcpu_on_spin() should add the vcpu to vcpu->wq (so a new pending interrupt wakes it up immediately). Do you (and/or Mark) have any numbers for non-vcpu-overcommitted guests?
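For readers unfamiliar with how ple_gap and ple_window interact, here is an illustrative software model (this is my reading of the feature's semantics, not the VMX hardware logic or kvm code): PAUSE executions spaced closer than ple_gap TSC cycles are treated as one spin loop, and once such a loop has lasted longer than ple_window cycles, a PLE exit would be signalled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Toy model of Pause-Loop Exiting.  pause_tsc holds the TSC value at
 * each successive PAUSE.  A gap wider than ple_gap starts a new loop;
 * a loop older than ple_window triggers the (modelled) exit. */
static bool ple_exit_triggers(const uint64_t *pause_tsc, size_t n,
                              uint64_t ple_gap, uint64_t ple_window)
{
    if (n == 0)
        return false;
    uint64_t loop_start = pause_tsc[0];
    for (size_t i = 1; i < n; i++) {
        if (pause_tsc[i] - pause_tsc[i - 1] > ple_gap)
            loop_start = pause_tsc[i];   /* too far apart: new spin loop */
        else if (pause_tsc[i] - loop_start > ple_window)
            return true;                 /* spun longer than the window */
    }
    return false;
}
```

With the defaults quoted in the patch (gap 41, window 4096), a guest pausing every ~20 cycles would exit after roughly 4096 cycles of spinning, while widely spaced PAUSEs never accumulate into a loop.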
Re: migrate_set_downtime bug
On Wed, Sep 30, 2009 at 04:11:32PM +0200, Dietmar Maurer wrote: On Wed, Sep 30, 2009 at 10:55:24AM +0200, Dietmar Maurer wrote: Another problem occurs when max_downtime is too short. This can result in a never-ending migration task. To reproduce, just play a video inside a VM and set max_downtime to 30ns. Sure, one can argue that this behavior is expected. But the following would avoid the problem: +if ((stage == 2) && (bytes_transferred > 2*ram_bytes_total())) { +return 1; +} why 2 * ? This means we'll have to transfer the whole contents of RAM at least twice to hit this condition, right? Yes, this is just an arbitrary limit. I don't know. If we are going for a limit, I would prefer a limit on pages yet to transfer, not pages already transferred. However, the very reason this whole thing was written in the first place was to leave choices to management tools on top of qemu, not qemu itself. So I would say yes, if you set a limit of 30ns, you asked for it never finishing. Your first patch is okay, though.
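Juan's preference for bounding by data still to send, rather than data already sent, amounts to a convergence test: would stopping the guest now complete within the allowed downtime at the observed bandwidth? A hedged sketch (names are mine, not qemu's migration code):

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

/* Hypothetical alternative to the 2x cap: keep iterating only while it
 * is plausible that the remaining dirty data could be flushed within
 * max_downtime at the measured transfer rate. */
static bool migration_can_finish_now(uint64_t remaining_dirty_bytes,
                                     uint64_t bandwidth_bytes_per_sec,
                                     uint64_t max_downtime_ns)
{
    if (bandwidth_bytes_per_sec == 0)
        return false;
    /* expected blackout in nanoseconds if the guest stopped now
     * (fine for modest sizes; a real implementation would guard
     * against 64-bit overflow in this multiply) */
    uint64_t expected_ns =
        remaining_dirty_bytes * 1000000000ULL / bandwidth_bytes_per_sec;
    return expected_ns <= max_downtime_ns;
}
```

Under this test a 30ns budget simply never passes while the guest keeps dirtying pages, which makes the "you asked for it" behaviour explicit rather than an endless loop.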
Re: Release plan for 0.12.0
Anthony Liguori aligu...@us.ibm.com wrote: Luiz Capitulino wrote: On Tue, 29 Sep 2009 18:54:53 -0500 Anthony Liguori aligu...@us.ibm.com wrote: I think aiming for early to mid-December would give us roughly a 3 month cycle and would align well with some of the Linux distribution cycles. I'd like to limit things to a single -rc that lasted only for about a week. This is enough time to fix most of the obvious issues I think. How do you plan to do it? I mean, are you going to create a separate branch or make master the -rc? Creating a separate branch (which is what we do today, iiuc) makes it get less attention, freezing master for a certain period is the best way to stabilize. Is this what you had in mind? What do people think? One reason I branch is because some people care a bit less about releases so it makes the process non-disruptive to them. If the other maintainers agreed though, I would certainly like to have the master branch essentially frozen for the week before the release. I am not a maintainer, but I still think that it is a good idea :) Later, Juan. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: migrate_set_downtime bug
+if ((stage == 2) && (bytes_transferred > 2*ram_bytes_total())) { +return 1; +} why 2 * ? This means we'll have to transfer the whole contents of RAM at least twice to hit this condition, right? Yes, this is just an arbitrary limit. I don't know. If we are going for a limit, I would prefer a limit on pages yet to transfer, not pages already transferred. However, the very reason this whole thing was written in the first place was to leave choices to management tools on top of qemu, not qemu itself. So I would say yes, if you set a limit of 30ns, you asked for it never finishing. I am just thinking of common scenarios like 'maintenance mode', where all VMs should migrate to another host. An endless migration task can make that fail. For me, it is totally unclear what value I should set for 'max_downtime' to avoid that behavior. - Dietmar
Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream
On 09/30/09 15:59, Mark McLoughlin wrote: I'm planning on adding -hostnet and -nic arguments, which would not use vlans by default but rather connect the nic directly to the host side. No new -nic argument please. We should just finalize the qdev-ifycation of the nic drivers, then you'll do either -device e1000,vlan=nr or -device e1000,hostnet=name and be done with it. cheers, Gerd
Re: [Qemu-devel] Release plan for 0.12.0
On Wed, Sep 30, 2009 at 6:59 PM, Carl-Daniel Hailfinger c-d.hailfinger.devel.2...@gmx.net wrote: On 30.09.2009 15:07, Anthony Liguori wrote: Carl-Daniel Hailfinger wrote: However, to run coreboot on Qemu with the same init sequence as on simplified real hardware, we need Cache-as-RAM (CAR) support. [...] Do we really need coreboot to use the same init sequence? coreboot is firmware and we don't necessarily run real firmware under QEMU. It's a short cut that lets us avoid a lot of complexity. I know that some people were running 440BX BIOS images for real hardware on Qemu and they got pretty far. The complexity would be limited to the MTRR code and unless there were major architectural changes in mapping RAM to address ranges, no other code (except VM save and VM restore) should get even a single line changed. Right now coreboot sets up the MTRRs correctly, but then (conditional on Qemu) only uses areas which are known to be backed by RAM instead of the areas designated by CAR. I'd like to implement CAR support which builds on top of my MTRR code which was merged some months ago (and I already have code to check for total cacheable area size), but I need help with the memory mapping stuff. How do I proceed? Clean up what I have and insert FIXME comments where I don't know how to implement stuff so others can see the code and comment on it? You could start there. But from a higher level, I'm not sure I think a partial implementation of something like CAR is all that valuable since coreboot already runs under QEMU. It only runs if WORKAROUND_QEMU is defined (maybe not exactly that name, but you get the point). The code in coreboot calculates MTRR settings to cover the place where the stack will be. To workaround missing CAR in Qemu, it then has to recalculate the stack location to be able to actually use the stack. That forces coreboot to keep two stack base variables and to completely replace the generic logic which switches off CAR. 
I hope the explanation above didn't offend you, I just tried to clarify why working CAR is such a big deal for coreboot. If you want either a full CAR implementation or no CAR implementation, I can write a patch which implements full CAR, but then I need to hook WBINVD, INVD and CLFLUSH. Neither instruction is executed often enough to show up in any profile. Besides that, for anything not using CAR (everything after the firmware), the penalty is a simple test of a boolean variable per WBINVD/INVD/CLFLUSH. The CAR mode could affect only translation so that special CAR versions of the WBINVD etc. instructions are selected. On switch to normal mode, the TBs need to be flushed. Instead of your memory mapping approach (which should work) you could also try using different memory access functions in CAR mode. It may be more difficult, though. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
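As an aside for readers following the CAR discussion: the behaviour being asked for can be modelled with a toy address decoder in which a single flag decides whether an access hits the CAR window or backing memory; that flag test is the per-access cost mentioned above for the non-CAR case. All names, sizes, and the layout here are invented for illustration and are not QEMU code.

```c
#include <stdint.h>
#include <assert.h>

#define CAR_BASE 0xffef0000u   /* invented window placement */
#define CAR_SIZE 0x10000u

struct toy_machine {
    int car_active;            /* cleared when CAR is torn down (WBINVD/INVD) */
    uint8_t car[CAR_SIZE];     /* cache lines serving as RAM */
    uint8_t ram[0x100000];     /* 1 MiB of toy backing memory */
};

/* Route an access: while CAR is active, hits inside the window go to
 * the dedicated buffer instead of backing memory. */
static uint8_t *decode(struct toy_machine *m, uint32_t addr)
{
    if (m->car_active && addr - CAR_BASE < CAR_SIZE)
        return &m->car[addr - CAR_BASE];
    return &m->ram[addr % sizeof m->ram];
}

static void write8(struct toy_machine *m, uint32_t addr, uint8_t v) { *decode(m, addr) = v; }
static uint8_t read8(struct toy_machine *m, uint32_t addr) { return *decode(m, addr); }
```

The point of the model is the disappearance semantics: data written while CAR is active vanishes from the address space once the flag is cleared, which is exactly what coreboot's generic CAR-teardown path relies on.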
Re: [Qemu-devel] Release plan for 0.12.0
On 09/30/09 16:45, Anthony Liguori wrote: One reason I branch is because some people care a bit less about releases so it makes the process non-disruptive to them. If the other maintainers agreed though, I would certainly like to have the master branch essentially frozen for the week before the release. We had much longer disruptions without a release freeze, so why worry about a single week? A one-week freeze is short enough that the disruption isn't a big issue. It will help testing the to-be-released code. Go for it. cheers, Gerd
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
Avi Kivity wrote: On 09/26/2009 12:32 AM, Gregory Haskins wrote: I realize in retrospect that my choice of words above implies vbus _is_ complete, but this is not what I was saying. What I was trying to convey is that vbus is _more_ complete. Yes, in either case some kind of glue needs to be written. The difference is that vbus implements more of the glue generally, and leaves less required to be customized for each iteration. No argument there. Since you care about non-virt scenarios and virtio doesn't, naturally vbus is a better fit for them as the code stands. Thanks for finally starting to acknowledge there's a benefit, at least. I think I've mentioned vbus' finer grained layers as helpful here, though I doubt the value of this. Hypervisors are added rarely, while devices and drivers are added (and modified) much more often. I don't buy the anything-to-anything promise. The ease in which a new hypervisor should be able to integrate into the stack is only one of vbus's many benefits. To be more precise, IMO virtio is designed to be a performance oriented ring-based driver interface that supports all types of hypervisors (e.g. shmem based kvm, and non-shmem based Xen). vbus is designed to be a high-performance generic shared-memory interconnect (for rings or otherwise) framework for environments where linux is the underpinning host (physical or virtual). They are distinctly different, but complementary (the former addresses the part of the front-end, and latter addresses the back-end, and a different part of the front-end). They're not truly complementary since they're incompatible. No, that is incorrect. Not to be rude, but for clarity: Complementary \Com`ple*menta*ry\, a. Serving to fill out or to complete; as, complementary numbers. [1913 Webster] Citation: www.dict.org IOW: Something being complementary has nothing to do with guest/host binary compatibility. 
virtio-pci and virtio-vbus are both equally complementary to virtio since they fill in the bottom layer of the virtio stack. So yes, vbus is truly complementary to virtio afaict. A 2.6.27 guest, or Windows guest with the existing virtio drivers, won't work over vbus. Binary compatibility with existing virtio drivers, while nice to have, is not a specific requirement nor goal. We will simply load an updated KMP/MSI into those guests and they will work again. As previously discussed, this is how more or less any system works today. It's like we are removing an old adapter card and adding a new one to uprev the silicon. Further, non-shmem virtio can't work over vbus. Actually I misspoke earlier when I said virtio works over non-shmem. Thinking about it some more, both virtio and vbus fundamentally require shared-memory, since sharing their metadata concurrently on both sides is their raison d'être. The difference is that virtio utilizes a pre-translation/mapping (via -add_buf) from the guest side. OTOH, vbus uses a post translation scheme (via memctx) from the host-side. If anything, vbus is actually more flexible because it doesn't assume the entire guest address space is directly mappable. In summary, your statement is incorrect (though it is my fault for putting that idea in your head). Since virtio is guest-oriented and host-agnostic, it can't ignore non-shared-memory hosts (even though it's unlikely virtio will be adopted there) Well, to be fair no one said it has to ignore them. Either virtio-vbus transport is present and available to the virtio stack, or it isn't. If its present, it may or may not publish objects for consumption. Providing a virtio-vbus transport in no way limits or degrades the existing capabilities of the virtio stack. It only enhances them. I digress. The whole point is moot since I realized that the non-shmem distinction isn't accurate anyway. 
They both require shared-memory for the metadata, and IIUC virtio requires the entire address space to be mappable whereas vbus only assumes the metadata is. In addition, the kvm-connector used in AlacrityVM's design strives to add value and improve performance via other mechanisms, such as dynamic allocation, interrupt coalescing (thus reducing exit-ratio, which is a serious issue in KVM) Do you have measurements of inter-interrupt coalescing rates (excluding intra-interrupt coalescing). I actually do not have a rig setup to explicitly test inter-interrupt rates at the moment. Once things stabilize for me, I will try to re-gather some numbers here. Last time I looked, however, there were some decent savings for inter as well. Inter rates are interesting because they are what tends to ramp up with IO load more than intra since guest interrupt mitigation techniques like NAPI often quell intra-rates naturally. This is especially true for data-center, cloud, hpc-grid, etc, kind of workloads (vs vanilla desktops, etc) that tend to have multiple IO
INFO: task kjournald:337 blocked for more than 120 seconds
Hello all, Has anybody found this problem before? I kept hitting this issue with a 2.6.31 guest kernel, even with a simple network test. INFO: task kjournald:337 blocked for more than 120 seconds. echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message. kjournald D 0041 0 337 2 0x My test is totally being blocked. Thanks Shirley
Re: kvm or qemu-kvm?
Ross Boylan wrote: http://www.linux-kvm.org/page/HOWTO1 says to build kvm I should get the latest kvm-release.tar.gz. http://www.linux-kvm.org/page/Downloads says If you want to use the latest version of KVM kernel modules and supporting userspace, you can download the latest version from http://sourceforge.net/project/showfiles.php?group_id=180599. That page shows the latest version is qemu-kvm-0.11.0.tar.gz. The most recent kvm-release.tar.gz appears to be for kvm-88. So which file should I start from? If you don't know what you want, you want qemu-kvm, which is based off a stable release of qemu.
buildbot failure in qemu-kvm on disable_kvm_x86_64_out_of_tree
The Buildbot has detected a new failure of disable_kvm_x86_64_out_of_tree on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_out_of_tree/builds/29 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_1 Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered this build Build Source Stamp: [branch master] HEAD Blamelist: BUILD FAILED: failed compile sincerely, -The Buildbot
buildbot failure in qemu-kvm on disable_kvm_x86_64_debian_5_0
The Buildbot has detected a new failure of disable_kvm_x86_64_debian_5_0 on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_debian_5_0/builds/80 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_1 Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered this build Build Source Stamp: [branch master] HEAD Blamelist: BUILD FAILED: failed compile sincerely, -The Buildbot
[PATCH] Fix warning in sync
Patch is self-explanatory.

commit 071a800cd07c2b9d13c7909aa99016d89a814ae6
Author: Zachary Amsden <zams...@redhat.com>
Date:   Wed Sep 30 17:03:16 2009 -1000

    Remove warning due to kvm_mmu_notifier_change_pte being static

    Signed-off-by: Zachary Amsden <zams...@redhat.com>

diff --git a/sync b/sync
index b09f629..0bbd488 100755
--- a/sync
+++ b/sync
@@ -97,6 +97,9 @@ def __hack(data):
             line = '#include <asm/types.h>'
         if match(r'\t\.change_pte.*kvm_mmu_notifier_change_pte,'):
             line = '#ifdef MMU_NOTIFIER_HAS_CHANGE_PTE\n' + line + '\n#endif'
+        if match(r'static void kvm_mmu_notifier_change_pte'):
+            line = sub(r'static ', '', line)
+            line = '#ifdef MMU_NOTIFIER_HAS_CHANGE_PTE\n' + 'static\n' + '#endif\n' + line
         line = sub(r'\bhrtimer_init\b', 'hrtimer_init_p', line)
         line = sub(r'\bhrtimer_start\b', 'hrtimer_start_p', line)
         line = sub(r'\bhrtimer_cancel\b', 'hrtimer_cancel_p', line)
Re: buildbot failure in qemu-kvm on disable_kvm_x86_64_debian_5_0
On Thursday 01 October 2009 04:05:40 am qemu-...@buildbot.b1-systems.de wrote: The Buildbot has detected a new failure of disable_kvm_x86_64_debian_5_0 on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_debian_5_0/builds/80 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_1 Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered this build Build Source Stamp: [branch master] HEAD Blamelist: BUILD FAILED: failed compile Please ignore the buildbot failures disable_kvm_x86_64_debian_5_0 (#80) and disable_kvm_x86_64_out_of_tree (#29). Two nightly builds (disable_kvm and out-of-tree_disable_kvm) got scheduled at the same time for the same buildslave, which caused memory pressure in the tiny buildslave VM. I will change that: out-of-tree should get build-tested nightly an hour later or so, to avoid two builds running at the same time. Best Regards, Daniel -- Daniel Gollub Geschaeftsfuehrer: Ralph Dehner FOSS Developer Unternehmenssitz: Vohburg B1 Systems GmbH Amtsgericht: Ingolstadt Mobil: +49-(0)-160 47 73 970 Handelsregister: HRB 3537 EMail: gol...@b1-systems.de http://www.b1-systems.de Adresse: B1 Systems GmbH, Osterfeldstraße 7, 85088 Vohburg
Re: linux-next: tree build failure
roel kluin roel.kl...@gmail.com 29.09.09 11:51 On Tue, Sep 29, 2009 at 11:28 AM, Jan Beulich jbeul...@novell.com wrote: Hollis Blanchard 09/29/09 2:00 AM First, I think there is a real bug here, and the code should read like this (to match the comment): /* type has to be known at build time for optimization */ -BUILD_BUG_ON(__builtin_constant_p(type)); +BUILD_BUG_ON(!__builtin_constant_p(type)); However, I get the same build error *both* ways, i.e. __builtin_constant_p(type) evaluates to both 0 and 1? Either that, or the new BUILD_BUG_ON() macro isn't working... No, at this point of the compilation process it's neither zero nor one, it's simply considered non-constant by the compiler at that stage (this builtin is used for optimization, not during parsing, and the error gets generated when the body of the function gets parsed, not when code gets generated from it). Jan then maybe if(__builtin_constant_p(type)) BUILD_BUG_ON(1); would work? Definitely not - this would result in the compiler *always* generating an error. Jan -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux-next: tree build failure
Hollis Blanchard holl...@us.ibm.com 30.09.09 01:39 On Tue, 2009-09-29 at 10:28 +0100, Jan Beulich wrote: Hollis Blanchard 09/29/09 2:00 AM First, I think there is a real bug here, and the code should read like this (to match the comment): /* type has to be known at build time for optimization */ -BUILD_BUG_ON(__builtin_constant_p(type)); +BUILD_BUG_ON(!__builtin_constant_p(type)); However, I get the same build error *both* ways, i.e. __builtin_constant_p(type) evaluates to both 0 and 1? Either that, or the new BUILD_BUG_ON() macro isn't working... No, at this point of the compilation process it's neither zero nor one, it's simply considered non-constant by the compiler at that stage (this builtin is used for optimization, not during parsing, and the error gets generated when the body of the function gets parsed, not when code gets generated from it). I think I see what you're saying. Do you have a fix to suggest? The one Rusty suggested the other day may help here. I don't like it as a drop-in replacement for BUILD_BUG_ON() though (due to it deferring the generated error to the linking stage), I'd rather view this as an improvement to MAYBE_BUILD_BUG_ON() (which should then be used here). Jan
Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4
On 09/29/2009 10:17 AM, Alexander Graf wrote: KVM for PowerPC only supports embedded cores at the moment. While it makes sense to virtualize on small machines, it's even more fun to do so on big boxes. So I figured we need KVM for PowerPC64 as well. This patchset implements KVM support for Book3s_64 hosts and guest support for Book3s_64 and G3/G4. To really make use of this, you also need a recent version of qemu. Looks good to my non-ppc eyes. I'd like to see this reviewed by the powerpc people, then it's good to go. TODO: - use MMU Notifiers What's the plan here? While not a requirement for merging, that's one of the kvm points of strength and I'd like to see it supported across the board. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4
On 30.09.2009, at 10:42, Avi Kivity wrote: On 09/29/2009 10:17 AM, Alexander Graf wrote: KVM for PowerPC only supports embedded cores at the moment. While it makes sense to virtualize on small machines, it's even more fun to do so on big boxes. So I figured we need KVM for PowerPC64 as well. This patchset implements KVM support for Book3s_64 hosts and guest support for Book3s_64 and G3/G4. To really make use of this, you also need a recent version of qemu. Looks good to my non-ppc eyes. I'd like to see this reviewed by the powerpc people, then it's good to go. TODO: - use MMU Notifiers What's the plan here? While not a requirement for merging, that's one of the kvm points of strength and I'd like to see it supported across the board. I'm having a deja vu :-). The plan is to get qemu ppc64 guest support in a shape where it can actually use the KVM support. As it is it's rather useless. When we have that, a PV interface would be needed to get things fast and then the next thing on my list is the MMU notifiers. Alex
Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4
On 09/30/2009 10:47 AM, Alexander Graf wrote: What's the plan here? While not a requirement for merging, that's one of the kvm points of strength and I'd like to see it supported across the board. I'm having a deja vu :-). Will probably get one on every repost. The plan is to get qemu ppc64 guest support in a shape where it can actually use the KVM support. As it is it's rather useless. When we have that, a PV interface would be needed to get things fast and then the next thing on my list is the MMU notifiers. Um. How slow is it today? What paths are problematic? mmu, context switch? Our experience with pv on x86 has been mostly negative. It's not trivial to get security right, it ended up slower than non-pv, and hardware obsoleted it fairly quickly. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4
On 30.09.2009, at 10:59, Avi Kivity wrote: On 09/30/2009 10:47 AM, Alexander Graf wrote: What's the plan here? While not a requirement for merging, that's one of the kvm points of strength and I'd like to see it supported across the board. I'm having a deja vu :-). Will probably get one on every repost. Yippie :) The plan is to get qemu ppc64 guest support into a shape where it can actually use the KVM support. As it is, it's rather useless. When we have that, a PV interface would be needed to get things fast, and then the next thing on my list is the MMU notifiers. Um. How slow is it today? What paths are problematic? mmu, context switch? Instruction emulation. X86 with virtualization extensions doesn't trap often, as most of the state can be safely handled within guest mode. With PPC we're basically running in ring 3 (called "problem state" in PPC speak), which traps all the time, because guests toggle interrupt enables or access SPRs that we don't really need to trap on, but only need to sync state with on #VMEXIT. So the PV idea here is to have a shared page between host and guest that contains guest-specific SPRs and other state (an MSR shadow, for example). That way the guest can patch itself to use that shared page, and KVM always knows about the most current state on #VMEXIT. At the same time we're reducing exits by a _lot_. A short kvm_stat during boot of a ppc32 guest on ppc64 shows what I'm talking about:

 dec                3224     168
 exits          18957500 1037240
 ext_intr             75       5
 halt_wakeup        6874       0
 inst_emu        8570503  818597
 ld                    0       0
 ld_slow               0       0
 mmio            8719444   26249
 pf_instruc       302572   35379
 pf_storage      9215970   86750
 queue_intr       354020   31482
 sig                7244     188
 sp_instruc       302541   35365
 sp_storage       370002   45370
 st                    0       0
 st_slow               0       0
 sysc             579075     342

As you can see, the bulk of exits are from MMIO and emulation. We certainly won't be able to get rid of all the emulation exits, but quite a bunch of them aren't really that useful. For MMIO we'll hopefully be able to use virtio.
Our experience with pv on x86 has been mostly negative. It's not trivial to get security right, it ended up slower than non-pv, and hardware obsoleted it fairly quickly. Yes, and I really don't want to overdo it. PV for mfmsr/mtmsr and mfspr/mtspr is really necessary. X86 simply has that in hardware. Alex
Re: [PATCH 26/27] Enable 32bit dirty log pointers on 64bit host
On 09/30/2009 02:04 PM, Arnd Bergmann wrote: On Tuesday 29 September 2009, Avi Kivity wrote:

 	r = -EINVAL;
 	if (log->slot >= KVM_MEMORY_SLOTS)
@@ -718,8 +719,15 @@ int kvm_get_dirty_log(struct kvm *kvm,
 	for (i = 0; !any && i < n/sizeof(long); ++i)
 		any = memslot->dirty_bitmap[i];

+#if defined(__BIG_ENDIAN) && defined(CONFIG_64BIT)
+	/* Need to convert user pointers */
+	if (test_thread_flag(TIF_32BIT))
+		target_bm = (void *)((u64)log->dirty_bitmap >> 32);
+	else
+#endif
+		target_bm = log->dirty_bitmap;
 	r = -EFAULT;
-	if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
+	if (copy_to_user(target_bm, memslot->dirty_bitmap, n))
 		goto out;
 	if (any)

Ah, that's much better. Plus a mental note not to put pointers in user-visible structures in the future. This can serve as a reminder :) It's still broken on s390, which 1. uses TIF_31BIT instead of TIF_32BIT 2. needs to call compat_ptr() to do a real conversion instead of a cast. The TIF_32BIT method is also not reliable. E.g. on x86_64 you are supposed to get the 32-bit ABI when calling through INT80 instead of syscall/sysenter, independent of the value of TIF_32BIT. A better way to do this is to add a separate compat_ioctl() method that converts this for you. The patch below is an example of the canonical way to do this. Not tested!
Signed-off-by: Arnd Bergmann <a...@arndb.de>
---
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 897bff3..20f88ad 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2297,6 +2297,49 @@ out:
 	return r;
 }

+#ifdef CONFIG_COMPAT
+struct compat_kvm_dirty_log {
+	__u32 slot;
+	__u32 padding1;
+	union {
+		compat_uptr_t dirty_bitmap; /* one bit per page */
+		__u64 padding2;
+	};
+};
+
+static long kvm_vm_compat_ioctl(struct file *filp,
+			unsigned int ioctl, unsigned long arg)
+{
+	struct kvm *kvm = filp->private_data;
+	int r;
+
+	if (kvm->mm != current->mm)
+		return -EIO;
+	switch (ioctl) {
+	case KVM_GET_DIRTY_LOG: {
+		struct compat_kvm_dirty_log compat_log;
+		struct kvm_dirty_log log;
+
+		r = -EFAULT;
+		if (copy_from_user(&compat_log, (void __user *)arg, sizeof log))
+			goto out;
+		log.slot	 = compat_log.slot;
+		log.padding1	 = compat_log.padding1;
+		log.padding2	 = compat_log.padding2;
+		log.dirty_bitmap = compat_ptr(compat_log.dirty_bitmap);
+
+		r = kvm_vm_ioctl_get_dirty_log(kvm, &log.log);
+		if (r)
+			goto out;
+		break;
+	}
+	default:
+		r = kvm_vm_ioctl(filp, ioctl, arg);
+	}
+
+out:
+	return r;
+}
+#endif
+
 static int kvm_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct page *page[1];
@@ -2331,7 +2374,7 @@ static int kvm_vm_mmap(struct file *file, struct vm_area_struct *vma)
 static struct file_operations kvm_vm_fops = {
 	.release	= kvm_vm_release,
 	.unlocked_ioctl = kvm_vm_ioctl,
-	.compat_ioctl	= kvm_vm_ioctl,
+	.compat_ioctl	= kvm_vm_compat_ioctl,
 	.mmap		= kvm_vm_mmap,
 };

This is a bit painful - I tried to avoid compat_ioctl. Maybe it's better to have dirty_bitmap_virt, given no existing users are impacted. -- error compiling committee.c: too many arguments to function
Re: [PATCH 26/27] Enable 32bit dirty log pointers on 64bit host
On 09/30/2009 03:17 PM, Avi Kivity wrote:

 static int kvm_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct page *page[1];
@@ -2331,7 +2374,7 @@ static int kvm_vm_mmap(struct file *file, struct vm_area_struct *vma)
 static struct file_operations kvm_vm_fops = {
 	.release	= kvm_vm_release,
 	.unlocked_ioctl = kvm_vm_ioctl,
-	.compat_ioctl	= kvm_vm_ioctl,
+	.compat_ioctl	= kvm_vm_compat_ioctl,
 	.mmap		= kvm_vm_mmap,
 };

This is a bit painful - I tried to avoid compat_ioctl. Maybe it's better to have dirty_bitmap_virt, given no existing users are impacted. But that misses compat_ptr(). So it looks like we'll need compat_ioctl. Patch looks fine, except s/log.log/log/. I'd also use sizeof(compat_log) instead of sizeof(log) to avoid frightening reviewers. -- error compiling committee.c: too many arguments to function