Re: kvm tuning guide

2009-09-30 Thread Avi Kivity

On 09/30/2009 07:09 AM, Nikola Ciprich wrote:

The default, IDE, is widely supported by guests but may be slow, especially with 
disk arrays. If your guest supports it, use the virtio interface:
Avi,
what is the status of the data integrity issues Christoph Hellwig summarized some time 
ago?
   


I don't know.  Christoph?


Is it safe to recommend virtio to newbies already?


I think so.


Shouldn't SCSI
be safer (where applicable)?
   


SCSI suffers from being untested, and I think doesn't truly offer the 
parallelism it appears to.
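For the newbies in question, a minimal qemu-kvm invocation that selects virtio 
for both disk and network (the image path and sizes are illustrative) looks 
something like:

    qemu-system-x86_64 -m 1024 -smp 2 \
        -drive file=/images/guest.img,if=virtio \
        -net nic,model=virtio -net user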



nik


On Tue, Sep 29, 2009 at 07:30:55PM +0200, Avi Kivity wrote:
   

I wrote a short tuning guide for kvm,
http://www.linux-kvm.org/page/Tuning_KVM.  It should all be well known
to the list, but a newbie is born every minute.  Please review and
expand!

--
error compiling committee.c: too many arguments to function




--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream

2009-09-30 Thread Avi Kivity

On 09/29/2009 10:45 PM, Mark McLoughlin wrote:

On Tue, 2009-05-05 at 09:56 +0100, Mark McLoughlin wrote:
   

This commit:

commit 559a8f45f34cc50d1a60b4f67a06614d506b2e01
Subject: Remove stray GSO code from virtio_net (Mark McLoughlin)

Removed some GSO code from upstream qemu.git, but it needs to
be re-instated in qemu-kvm.git.

Reported-by: Sridhar Samudrala s...@us.ibm.com
Signed-off-by: Mark McLoughlin mar...@redhat.com
---
  hw/virtio-net.c |    5 +++++
  1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index ac8e030..e5d7add 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -424,6 +424,11 @@ static int receive_filter(VirtIONet *n, const uint8_t 
*buf, int size)
  if (n->promisc)
  return 1;

+#ifdef TAP_VNET_HDR
+if (tap_has_vnet_hdr(n->vc->vlan->first_client))
+ptr += sizeof(struct virtio_net_hdr);
+#endif
+
  if (!memcmp(&ptr[12], vlan, sizeof(vlan))) {
  int vid = be16_to_cpup((uint16_t *)(ptr + 14)) & 0xfff;
  if (!(n->vlans[vid >> 5] & (1U << (vid & 0x1f))))
 

I'm not sure[1] how we didn't notice, but this has been broken on the
stable-0.10 branch since 0.10.3; please apply there too

   


Thanks, we'll queue it on stable-0.10.

Anthony/Glauber, is 0.10.7 in the works?  If not, we'll release it as 
0.10.6.1.



--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting

2009-09-30 Thread Avi Kivity

On 09/30/2009 03:01 AM, Zhai, Edwin wrote:

Avi,
I modified it according to your comments. The only thing I want to keep is 
the module params ple_gap/ple_window.  Although they are not per-guest, 
they can be used to find the right value, and to disable PLE for debugging 
purposes.


Fair enough, ACK.
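For anyone experimenting with the values, the parameters can be set at module 
load time (assuming kvm-intel exposes them as module params as in the patch, 
with ple_gap=0 disabling PLE):

    modprobe kvm-intel ple_gap=0           # disable pause-loop exiting
    modprobe kvm-intel ple_window=8192     # or tune the spin window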

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [Qemu-devel] Release plan for 0.12.0

2009-09-30 Thread Avi Kivity

On 09/30/2009 01:54 AM, Anthony Liguori wrote:

Hi,

Now that 0.11.0 is behind us, it's time to start thinking about 0.12.0.

I'd like to do a few things different this time around.  I don't think 
the -rc process went very well as I don't think we got more testing 
out of it.  I'd like to shorten the timeline for 0.12.0 a good bit.  
The 0.10 stable tree got pretty difficult to maintain toward the end 
of the cycle.  We also had a pretty huge amount of change between 0.10 
and 0.11 so I think a shorter cycle is warranted.


I think aiming for early to mid-December would give us roughly a 3 
month cycle and would align well with some of the Linux distribution 
cycles.  I'd like to limit things to a single -rc that lasted only for 
about a week.  This is enough time to fix most of the obvious issues I 
think.


I'd also like to try to enumerate some features for this release.  
Here's a short list of things I expect to see for this release 
(target-i386 centric).  Please add or comment on items that you'd 
either like to see in the release or are planning on working on.


o VMState conversion -- I expect most of the pc target to be completed
o qdev conversion -- I hope that we'll get most of the pc target 
completely converted to qdev

o storage live migration
o switch to SeaBIOS (need to finish porting features from Bochs)
o switch to gPXE (need to resolve slirp tftp server issue)
o KSM integration
o in-kernel APIC support for KVM
o guest SMP support for KVM
o updates to the default pc machine type


Machine monitor protocol.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



RE: migrate_set_downtime bug

2009-09-30 Thread Dietmar Maurer
 Since the problem you pinpointed does exist, I would suggest measuring
 the average load of the last,
 say, 10 iterations.

The last 10 iterations do not define a fixed time. I guess it is much more 
reasonable to measure the average over the last '10 seconds'.
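A minimal sketch of that idea (illustrative names, not the actual QEMU 
migration code) would reset a byte counter whenever the 10-second window 
expires and derive the bandwidth estimate from it:

    /* illustrative sketch, not actual QEMU migration code;
     * int64_t/uint64_t from <stdint.h> */
    static int64_t window_start_ns;
    static uint64_t window_bytes;

    static double bwidth_over_window(int64_t now_ns, uint64_t bytes_this_iter)
    {
        if (now_ns - window_start_ns > 10000000000LL) { /* 10 s window */
            window_start_ns = now_ns;
            window_bytes = 0;
        }
        window_bytes += bytes_this_iter;
        return (double)window_bytes / (double)(now_ns - window_start_ns + 1);
    }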

But usually a migration only takes about 10-30 seconds. So do you really want 
to add additional complexity?

- Dietmar



Re: [PATCH 04/47] KVM: x86: Disallow hypercalls for guest callers in rings 0

2009-09-30 Thread Jan Lübbe
Hi!

On Wed, 2009-08-26 at 13:29 +0300, Avi Kivity wrote:
 From: Jan Kiszka jan.kis...@siemens.com
 
 So far unprivileged guest callers running in ring 3 can issue, e.g., MMU
 hypercalls. Normally, such callers cannot provide any hand-crafted MMU
 command structure as it has to be passed by its physical address, but
 they can still crash the guest kernel by passing random addresses.
 
 To close the hole, this patch considers hypercalls valid only if issued
 from guest ring 0. This may still be relaxed on a per-hypercall basis in
 the future once required.
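For reference, the fix amounts to a CPL check at the top of the hypercall 
handler, along these lines (a sketch based on the description above; the 
exact context in kvm-72's x86.c may differ):

	if (kvm_x86_ops->get_cpl(vcpu) != 0) {
		ret = -KVM_EPERM;
		goto out;
	}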

Does kvm-72 (used by Debian and Ubuntu in stable releases) have the
problem? If yes, would the approach in this fix also work there?

Thanks,
Jan



[PATCH] Add two parameters for wait_for_login

2009-09-30 Thread Yolkfull Chow
Sometimes we need to log into a guest using different start_time and step_time values.

Signed-off-by: Yolkfull Chow yz...@redhat.com
---
 client/tests/kvm/kvm_test_utils.py |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/client/tests/kvm/kvm_test_utils.py 
b/client/tests/kvm/kvm_test_utils.py
index 601b350..0983003 100644
--- a/client/tests/kvm/kvm_test_utils.py
+++ b/client/tests/kvm/kvm_test_utils.py
@@ -43,7 +43,7 @@ def get_living_vm(env, vm_name):
 return vm
 
 
-def wait_for_login(vm, nic_index=0, timeout=240):
+def wait_for_login(vm, nic_index=0, timeout=240, start=0, step=2):
     """
     Try logging into a VM repeatedly.  Stop on success or when timeout expires.
 
@@ -54,8 +54,8 @@ def wait_for_login(vm, nic_index=0, timeout=240):
     """
     logging.info("Waiting for guest '%s' to be up..." % vm.name)
     session = kvm_utils.wait_for(lambda: vm.remote_login(nic_index=nic_index),
-                                 timeout, 0, 2)
+                                 timeout, start, step)
     if not session:
         raise error.TestFail("Could not log into guest '%s'" % vm.name)
-    logging.info("Logged in")
+    logging.info("Logged in '%s'" % vm.name)
     return session
-- 
1.6.2.5
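For illustration, a hypothetical caller that waits 10 seconds before the 
first login attempt and then retries every 5 seconds would do:

    session = kvm_test_utils.wait_for_login(vm, timeout=240, start=10, step=5)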



Re: [PATCH v4: kvm 1/4] Code motion. Separate timer intialization into an indepedent function.

2009-09-30 Thread Avi Kivity

On 09/29/2009 11:38 PM, Zachary Amsden wrote:

Signed-off-by: Zachary Amsden zams...@redhat.com
   



Looks good.

Is anything preventing us from unifying the constant_tsc and !same 
paths?  We could just do a quick check in the notifier, see the tsc 
frequency hasn't changed, and return.
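Something like the following early-out at the top of the notifier (an 
illustrative sketch, assuming the per-cpu cpu_tsc_khz value used by the 
kvmclock cpufreq notifier):

	/* illustrative sketch of the suggested early-out */
	if (freq->new == per_cpu(cpu_tsc_khz, freq->cpu))
		return 0;   /* TSC frequency unchanged, nothing to do */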


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [Qemu-devel] Release plan for 0.12.0

2009-09-30 Thread Michael Tokarev

Anthony Liguori wrote:
[]
Here's a short list of things I expect to see for this release 
(target-i386 centric).  Please add or comment on items that you'd either 
like to see in the release or are planning on working on.

[..]

o guest SMP support for KVM


Hmm.  What is this, can you elaborate a bit more please?
-smp nn is already here, no?

Thanks!

/mjt



RE: migrate_set_downtime bug

2009-09-30 Thread Dietmar Maurer
Another problem occurs when max_downtime is too short. This can result in a never 
ending migration task.

To reproduce, just play a video inside a VM and set max_downtime to 30ns.

Sure, one can argue that this behavior is expected.

But the following would avoid the problem:

+    if ((stage == 2) && (bytes_transferred > 2 * ram_bytes_total())) {
+        return 1;
+    }

Or do you think that is not reasonable?

- Dietmar

 -Original Message-
 From: Glauber Costa [mailto:glom...@redhat.com]
 Sent: Mittwoch, 30. September 2009 06:49
 To: Dietmar Maurer
 Cc: Anthony Liguori; kvm
 Subject: Re: migrate_set_downtime bug
 
 On Tue, Sep 29, 2009 at 06:36:57PM +0200, Dietmar Maurer wrote:
   Also, if this is really the case (buffered), then the bandwidth
 capping
   part
   of migration is also wrong.
  
   Have you compared the reported bandwidth to your actual bandwith ?
 I
   suspect
   the source of the problem can be that we're currently ignoring the
 time
   we take
   to transfer the state of the devices, and maybe it is not
 negligible.
  
 
  I have a 1GB network (e1000 card), and get values like bwidth=0.98 -
 which is much too high.
 The main reason for not using the whole migration time is that it can
 lead to values
 that are not very helpful in situations where the network load changes
 too much.
 
 Since the problem you pinpointed does exist, I would suggest measuring
 the average load of the last,
 say, 10 iterations. How would that work for you?





Re: [Qemu-devel] Release plan for 0.12.0

2009-09-30 Thread Avi Kivity

On 09/30/2009 10:53 AM, Michael Tokarev wrote:

Anthony Liguori wrote:
[]
Here's a short list of things I expect to see for this release 
(target-i386 centric).  Please add or comment on items that you'd 
either like to see in the release or are planning on working on.

[..]

o guest SMP support for KVM


Hmm.  What is this, can you elaborate a bit more please?
-smp nn is already here, no?



Only in qemu-kvm.git.  This is about qemu.git (which supports -smp, but 
not with kvm).


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [Qemu-devel] Release plan for 0.12.0

2009-09-30 Thread Carl-Daniel Hailfinger
Hi,

On 30.09.2009 01:54, Anthony Liguori wrote:
 Now that 0.11.0 is behind us, it's time to start thinking about 0.12.0.

 I'd also like to try to enumerate some features for this release. 
 Here's a short list of things I expect to see for this release
 (target-i386 centric).

 o switch to SeaBIOS (need to finish porting features from Bochs)

That switch is much appreciated because it also reduces the testing
matrix of those coreboot developers who boot test every commit with Qemu.

However, to run coreboot on Qemu with the same init sequence as on
simplified real hardware, we need Cache-as-RAM (CAR) support. This is
basically a mode where sizeof(cacheable area) <= sizeof(L2 cache), which
causes the processor to lock the cache and not pass any reads/writes
through to the RAM behind the cached area. The easiest way to implement
this would be to check the cache size criterion upon every MTRR
manipulation: map a chunk of fresh memory on top of the existing memory
(which may be RAM, ROM or unmapped) for every cacheable area, and if the
cacheable area starts to exceed the L2 cache size, discard all contents
of the memory mapped on top.
For additional correctness, the memory should not be discarded but
written back to the lower layer of memory if WBINVD (instead of INVD) or
CLFLUSH is called. That one is mostly sugar, though, and coreboot can
do without.
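As a rough model of that size check (illustrative names only, not existing 
QEMU code; L2_CACHE_SIZE is an assumed constant), the decision on every MTRR 
write could look like:

    /* Illustrative model of the CAR size check; not existing QEMU code. */
    #include <stdint.h>
    #include <stdbool.h>

    #define L2_CACHE_SIZE (512 * 1024)  /* assumed L2 size */
    #define MAX_RANGES 8

    struct car_range { uint64_t base, size; bool cacheable; };
    static struct car_range ranges[MAX_RANGES];

    /* Called after every MTRR update: if the total cacheable area still
     * fits in L2 the locked-cache overlay remains valid, otherwise its
     * contents must be discarded (INVD semantics). */
    static bool car_overlay_fits(void)
    {
        uint64_t total = 0;
        int i;

        for (i = 0; i < MAX_RANGES; i++)
            if (ranges[i].cacheable)
                total += ranges[i].size;
        return total <= L2_CACHE_SIZE;
    }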

Right now coreboot sets up the MTRRs correctly, but then (conditional on
Qemu) only uses areas which are known to be backed by RAM instead of the
areas designated by CAR.

I'd like to implement CAR support which builds on top of my MTRR code
which was merged some months ago (and I already have code to check for
total cacheable area size), but I need help with the memory mapping
stuff. How do I proceed? Clean up what I have and insert FIXME
comments where I don't know how to implement stuff so others can see the
code and comment on it?

Regards,
Carl-Daniel

-- 
http://www.hailfinger.org/



Re: migrate_set_downtime bug

2009-09-30 Thread Glauber Costa
On Wed, Sep 30, 2009 at 10:55:24AM +0200, Dietmar Maurer wrote:
 Another problem occurs when max_downtime is too short. This can result in 
 a never ending migration task.
 
 To reproduce, just play a video inside a VM and set max_downtime to 30ns.
 
 Sure, one can argue that this behavior is expected.
 
 But the following would avoid the problem:
 
 +    if ((stage == 2) && (bytes_transferred > 2 * ram_bytes_total())) {
 +        return 1;
 +    }
why 2 * ? 
This means we'll have to transfer the whole contents of RAM at least twice to 
hit this condition, right?

 
 Or do you think that is not reasonable?
 
 - Dietmar
 
  -Original Message-
  From: Glauber Costa [mailto:glom...@redhat.com]
  Sent: Mittwoch, 30. September 2009 06:49
  To: Dietmar Maurer
  Cc: Anthony Liguori; kvm
  Subject: Re: migrate_set_downtime bug
  
  On Tue, Sep 29, 2009 at 06:36:57PM +0200, Dietmar Maurer wrote:
Also, if this is really the case (buffered), then the bandwidth
  capping
part
of migration is also wrong.
   
Have you compared the reported bandwidth to your actual bandwith ?
  I
suspect
the source of the problem can be that we're currently ignoring the
  time
we take
to transfer the state of the devices, and maybe it is not
  negligible.
   
  
   I have a 1GB network (e1000 card), and get values like bwidth=0.98 -
  which is much too high.
  The main reason for not using the whole migration time is that it can
  lead to values
   that are not very helpful in situations where the network load changes
  too much.
  
   Since the problem you pinpointed does exist, I would suggest measuring
  the average load of the last,
  say, 10 iterations. How would that work for you?
 




[v4 KVM AUTOTEST PATCH] KVM test: client parallel test execution

2009-09-30 Thread Lucas Meneghel Rodrigues
From: Michael Goldish mgold...@redhat.com

This patch adds a control.parallel file that runs several test execution
pipelines in parallel.

The number of pipelines is set to the number of CPUs reported by /proc/cpuinfo.
It can be changed by modifying the control file.
The total amount of RAM defaults to 3/4 of what 'free -m' reports.

The scheduler's job is to make sure tests run in parallel only when there are
sufficient resources to allow it.  For example, a test that requires 2 CPUs
will not run together with a test that requires 3 CPUs on a 4 CPU machine.
The same logic applies to RAM.

Note that tests that require more CPUs and/or more RAM than the machine has are
allowed to run alone, e.g. a test that requires 3GB of RAM is allowed to run
on a machine with only 2GB of RAM, but no tests will run in parallel to it.
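A sketch of that admission rule (illustrative names, not the actual
kvm_scheduler.py API):

    # Sketch of the admission rule; names are illustrative.
    def can_start(test_cpus, test_mem, used_cpus, used_mem,
                  total_cpus, total_mem):
        # A test bigger than the machine may still run, but only alone.
        if test_cpus > total_cpus or test_mem > total_mem:
            return used_cpus == 0 and used_mem == 0
        # Otherwise it must fit into the currently free resources.
        return (used_cpus + test_cpus <= total_cpus and
                used_mem + test_mem <= total_mem)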

Currently TAP networking isn't supported by this scheduler because the main MAC
address pool must be divided between the pipelines (workers).  This should be
straightforward to do but I haven't had the time to do it yet.

scan_results.py can be used to list the test results during and after
execution.

v4:
 * Updated the install part to be in sync with the current control file
 * Blended this patch with the one that add scheduler parameters
 * Instead of custom code to figure out the number of CPUs, used an autotest utils
 function

Signed-off-by: Michael Goldish mgold...@redhat.com
---
 client/tests/kvm/control.parallel |  204 +
 client/tests/kvm/kvm_scheduler.py |  229 +
 client/tests/kvm/kvm_tests.cfg.sample |   18 +++-
 3 files changed, 449 insertions(+), 2 deletions(-)
 create mode 100644 client/tests/kvm/control.parallel
 create mode 100644 client/tests/kvm/kvm_scheduler.py

diff --git a/client/tests/kvm/control.parallel 
b/client/tests/kvm/control.parallel
new file mode 100644
index 000..5c1f20d
--- /dev/null
+++ b/client/tests/kvm/control.parallel
@@ -0,0 +1,204 @@
+AUTHOR = """
+u...@redhat.com (Uri Lublin)
+dru...@redhat.com (Dror Russo)
+mgold...@redhat.com (Michael Goldish)
+dh...@redhat.com (David Huff)
+aerom...@redhat.com (Alexey Eromenko)
+mbu...@redhat.com (Mike Burns)
+"""
+TIME = 'SHORT'
+NAME = 'KVM test'
+TEST_TYPE = 'client'
+TEST_CLASS = 'Virtualization'
+TEST_CATEGORY = 'Functional'
+
+DOC = """
+Executes the KVM test framework on a given host. This module is separated in
+minor functions, that execute different tests for doing Quality Assurance on
+KVM (both kernelspace and userspace) code.
+"""
+
+
+import sys, os, commands, re
+
+#-
+# set English environment (command output might be localized, need to be safe)
+#-
+os.environ['LANG'] = 'en_US.UTF-8'
+
+#-
+# Enable modules import from current directory (tests/kvm)
+#-
+pwd = os.path.join(os.environ['AUTODIR'],'tests/kvm')
+sys.path.append(pwd)
+
+# 
+# create required symlinks
+# 
+# When dispatching tests from autotest-server the links we need do not exist on
+# the host (the client). The following lines create those symlinks. Change
+# 'rootdir' here and/or mount appropriate directories in it.
+#
+# When dispatching tests on local host (client mode) one can either setup kvm
+# links, or same as server mode use rootdir and set all appropriate links and
+# mount-points there. For example, guest installation tests need to know where
+# to find the iso-files.
+#
+# We create the links only if not already exist, so if one already set up the
+# links for client/local run we do not touch the links.
+rootdir='/tmp/kvm_autotest_root'
+iso=os.path.join(rootdir, 'iso')
+images=os.path.join(rootdir, 'images')
+qemu=os.path.join(rootdir, 'qemu')
+qemu_img=os.path.join(rootdir, 'qemu-img')
+
+
+def link_if_not_exist(ldir, target, link_name):
+t = target
+l = os.path.join(ldir, link_name)
+if not os.path.exists(l):
+os.system('ln -s %s %s' % (t, l))
+
+# Create links only if not already exist
+link_if_not_exist(pwd, '../../', 'autotest')
+link_if_not_exist(pwd, iso, 'isos')
+link_if_not_exist(pwd, images, 'images')
+link_if_not_exist(pwd, qemu, 'qemu')
+link_if_not_exist(pwd, qemu_img, 'qemu-img')
+
+# 
+# Params that will be passed to the KVM install/build test
+# 
+params = {
+    "name": "build",
+    "shortname": "build",
+    "type": "build",
+    "mode": "release",
+    #"mode": "snapshot",
+    #"mode": "localtar",
+    #"mode": "localsrc",
+    #"mode": "git",
+    #"mode": "noinstall",
+    #"mode": "koji",
+
+## Are we going to load modules built by this test?
+## Defaults to 'yes', so if you are going to provide only userspace code to
+## be built by 

Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream

2009-09-30 Thread Glauber Costa
On Wed, Sep 30, 2009 at 08:24:18AM +0200, Avi Kivity wrote:
 On 09/29/2009 10:45 PM, Mark McLoughlin wrote:
 On Tue, 2009-05-05 at 09:56 +0100, Mark McLoughlin wrote:

 This commit:

 commit 559a8f45f34cc50d1a60b4f67a06614d506b2e01
 Subject: Remove stray GSO code from virtio_net (Mark McLoughlin)

 Removed some GSO code from upstream qemu.git, but it needs to
 be re-instated in qemu-kvm.git.

 Reported-by: Sridhar Samudrala s...@us.ibm.com
 Signed-off-by: Mark McLoughlin mar...@redhat.com
 ---
   hw/virtio-net.c |    5 +++++
   1 files changed, 5 insertions(+), 0 deletions(-)

 diff --git a/hw/virtio-net.c b/hw/virtio-net.c
 index ac8e030..e5d7add 100644
 --- a/hw/virtio-net.c
 +++ b/hw/virtio-net.c
 @@ -424,6 +424,11 @@ static int receive_filter(VirtIONet *n, const uint8_t 
 *buf, int size)
   if (n->promisc)
   return 1;

 +#ifdef TAP_VNET_HDR
 +if (tap_has_vnet_hdr(n->vc->vlan->first_client))
 +ptr += sizeof(struct virtio_net_hdr);
 +#endif
 +
   if (!memcmp(&ptr[12], vlan, sizeof(vlan))) {
   int vid = be16_to_cpup((uint16_t *)(ptr + 14)) & 0xfff;
   if (!(n->vlans[vid >> 5] & (1U << (vid & 0x1f))))
  
 I'm not sure[1] how we didn't notice, but this has been broken on the
 stable-0.10 branch since 0.10.3; please apply there too



 Thanks, we'll queue it on stable-0.10.

 Anthony/Glauber, is 0.10.7 in the works?  If not, we'll release it as  
 0.10.6.1.
Since it is just one patch, I don't see a problem in Anthony picking it directly
and making a new release.




Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream

2009-09-30 Thread Mark McLoughlin
On Wed, 2009-09-30 at 08:24 -0300, Glauber Costa wrote:
 On Wed, Sep 30, 2009 at 08:24:18AM +0200, Avi Kivity wrote:
  On 09/29/2009 10:45 PM, Mark McLoughlin wrote:
  On Tue, 2009-05-05 at 09:56 +0100, Mark McLoughlin wrote:
 
  This commit:
 
  commit 559a8f45f34cc50d1a60b4f67a06614d506b2e01
  Subject: Remove stray GSO code from virtio_net (Mark McLoughlin)
 
  Removed some GSO code from upstream qemu.git, but it needs to
  be re-instated in qemu-kvm.git.
 
   Reported-by: Sridhar Samudrala s...@us.ibm.com
   Signed-off-by: Mark McLoughlin mar...@redhat.com
  ---
hw/virtio-net.c |5 +
1 files changed, 5 insertions(+), 0 deletions(-)
 
  diff --git a/hw/virtio-net.c b/hw/virtio-net.c
  index ac8e030..e5d7add 100644
  --- a/hw/virtio-net.c
  +++ b/hw/virtio-net.c
  @@ -424,6 +424,11 @@ static int receive_filter(VirtIONet *n, const 
  uint8_t *buf, int size)
 if (n->promisc)
 return 1;

   +#ifdef TAP_VNET_HDR
   +if (tap_has_vnet_hdr(n->vc->vlan->first_client))
   +ptr += sizeof(struct virtio_net_hdr);
   +#endif
   +
 if (!memcmp(&ptr[12], vlan, sizeof(vlan))) {
 int vid = be16_to_cpup((uint16_t *)(ptr + 14)) & 0xfff;
 if (!(n->vlans[vid >> 5] & (1U << (vid & 0x1f))))
   
  I'm not sure[1] how we didn't notice, but this has been broken on the
  stable-0.10 branch since 0.10.3; please apply there too
 
 
 
  Thanks, we'll queue it on stable-0.10.
 
  Anthony/Glauber, is 0.10.7 in the works?  If not, we'll release it as  
  0.10.6.1.
  Since it is just one patch, I don't see a problem in Anthony picking it 
  directly
  and making a new release.

It's not for qemu.git, it's for qemu-kvm.git - see the changelog

Cheers,
Mark.



Build problem found during daily testing (09/30/09)

2009-09-30 Thread Lucas Meneghel Rodrigues
Today's git test failed due to a build problem:

09/30 04:53:37 ERROR|   kvm:0114| Test failed: Command make -j 4
failed, rc=2, Command returned non-zero exit status
* Command: 
make -j 4
Exit status: 2
Duration: 0

stdout:
make -C /lib/modules/2.6.29.6-217.2.8.fc11.x86_64/build M=`pwd` \
LINUXINCLUDE=-I`pwd`/include -Iinclude \
 -Iarch/x86/include -I`pwd`/include-compat -I`pwd`/x86 \
-include include/linux/autoconf.h \
-include `pwd`/x86/external-module-compat.h  \
$@
make[1]: Entering directory `/usr/src/kernels/2.6.29.6-217.2.8.fc11.x86_64'
  LD  /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/built-in.o
  CC [M]  /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/svm.o
  CC [M]  /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/vmx.o
  CC [M]  /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/vmx-debug.o
  CC [M]  /usr/local/autotest/tests/kvm/src/kvm_kmod/x86/kvm_main.o
make[1]: Leaving directory `/usr/src/kernels/2.6.29.6-217.2.8.fc11.x86_64'
stderr:
/usr/local/autotest/tests/kvm/src/kvm_kmod/x86/kvm_main.c:381: error: unknown 
field ‘change_pte’ specified in initializer
/usr/local/autotest/tests/kvm/src/kvm_kmod/x86/kvm_main.c:381: warning: 
initialization from incompatible pointer type
make[3]: *** [/usr/local/autotest/tests/kvm/src/kvm_kmod/x86/kvm_main.o] Error 1

Relevant commit hashes:

09/30 04:52:40 INFO | kvm_utils:0182| Commit hash for 
git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git is 
d80e68823cada7b6d850330da1edfdf8bff9e2e6 (v2.6.31-rc3-11538-gd80e688)
09/30 04:53:21 INFO | kvm_utils:0182| Commit hash for 
git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git is 
692d9aca97b865b0f7903565274a52606910f129 (kvm-88-1366-g692d9ac)
09/30 04:53:23 INFO | kvm_utils:0182| Commit hash for 
git://git.kernel.org/pub/scm/virt/kvm/kvm-kmod.git is 
b86de9524511f75bf9115047b7b57e1da86bfb37 (kvm-88-22-gb86de95)

If you need more info please let me know,

Lucas



Re: [KVM-AUTOTEST PATCH 1/2] Add KSM test

2009-09-30 Thread Dor Laor

On 09/29/2009 05:50 PM, Lucas Meneghel Rodrigues wrote:

On Fri, 2009-09-25 at 05:22 -0400, Jiri Zupka wrote:

- Dor Laor dl...@redhat.com wrote:


On 09/16/2009 04:09 PM, Jiri Zupka wrote:


- Dor Laor dl...@redhat.com wrote:


On 09/15/2009 09:58 PM, Jiri Zupka wrote:

After a quick review I have the following questions:
1. Why did you implement the guest tool in 'c' and not in

python?

 Python is much simpler and you can share some code with the

server.

 This 'test protocol' would also be easier to understand this

way.


We need speed and precise control over allocating memory in

pages.



2. IMHO there is no need to use select, you can do blocking

read.


We replace socket communication by interactive program

communication

via ssh/telnet



3. Also you can use plain malloc without the more complex ( a

bit)

mmap.


We need to address the memory pages exactly. We can't allow shifting of

the data in memory.

You can use the tmpfs+dd idea instead of the specific program as I
detailed before. Maybe some other binary can be used. My intention

is

to
simplify the test/environment as much as possible.



We need compatibility with other systems, like Windows etc.
We want to add support for other systems in the next version.


KSM is a host feature and should be agnostic to the guest.
Also I don't think your code will compile on windows...


Yes, I think you are right.


First of all, sorry, I am doing the best I can to carefully review all of
the patch queue, and as KSM is a more involved feature that I am not
very familiar with, I need a bit more time to review it!


But because we need to generate special data in pages in memory,
we need to use a script on the guest side of the test, because communication
over ssh is too slow to transfer many GB of special data to guests.

We can use an optimized C program which is 10x or more faster than
a python script on a native system. Heavy load on a virtual guest can
cause performance problems.


About code compiling under Windows, I guess making a native Windows C or
C++ program is an option. I generally agree with your reasoning; this
case seems to be better covered with a C program. Will get into it in
more detail ASAP...


We can use tmpfs, but with a python script to generate the special data.
We can't use dd with random data because we need to test some special cases
(e.g. change only the last 96B of a page).


What do you think about it?



I think it can be done with some simple scripting and it will be fast 
enough and more importantly, easier to understand and to change in the 
future.


Here is a short example for creating lots of identical pages that 
contain '0' apart from the last two bytes. If you run it in a single 
guest you should expect to save lots of memory. Then you can change the 
last bytes to a random value and see the memory consumption grow:

[Remember to cancel the guest swap to keep it in the guest ram]

dd if=/dev/zero of=template count=1 bs=4094
echo '1' >> template
cp template large_file
for ((i=0;i<10;i++)); do dd if=large_file of=large_file conv=notrunc 
oflag=append > /dev/null 2>&1 ; done


It creates a 4k*2^10 file with identical pages (since it's on tmpfs with 
no swap)
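To watch the effect on the host (assuming the kernel exposes the KSM sysfs 
interface), something like:

    watch cat /sys/kernel/mm/ksm/pages_shared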


Can you try it? It should be far simpler than the original option.

Thanks,
Dor


Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream

2009-09-30 Thread Anthony Liguori

Avi Kivity wrote:
Anthony/Glauber, is 0.10.7 in the works?  If not, we'll release it as 
0.10.6.1.


Yes.  I can release it very soon.

--
Regards,

Anthony Liguori



Re: [Qemu-devel] Release plan for 0.12.0

2009-09-30 Thread Anthony Liguori

Hi Isaku,

Isaku Yamahata wrote:

 o newer chipset (which is based on Q35 chipset)
 o multiple pci bus 
 o PCI express (MMCONFIG)

 o PCI express hot plug (not acpi based)
 o PCI express switch emulator

Although there is no PCIe emulated device at the moment, 
this will be a fundamental infrastructure for PCI express native

direct attach.
  


Your patches definitely deserve review/commit.  I'll make sure that 
happens for the 0.12 time frame.


Michael, could you help review some of the PCI patches?

Thanks,

--
Regards,

Anthony Liguori



Re: Release plan for 0.12.0

2009-09-30 Thread Anthony Liguori

Amit Shah wrote:

On (Tue) Sep 29 2009 [18:54:53], Anthony Liguori wrote:
  
  o multiport virtio-console support
  


Assuming we can get the kernel drivers straightened out, I think it's 
certainly reasonable for 0.12.


--
Regards,

Anthony Liguori



Re: [Qemu-devel] Release plan for 0.12.0

2009-09-30 Thread Anthony Liguori

Avi Kivity wrote:

On 09/30/2009 01:54 AM, Anthony Liguori wrote:

Hi,

Now that 0.11.0 is behind us, it's time to start thinking about 0.12.0.

I'd like to do a few things different this time around.  I don't 
think the -rc process went very well as I don't think we got more 
testing out of it.  I'd like to shorten the timeline for 0.12.0 a 
good bit.  The 0.10 stable tree got pretty difficult to maintain 
toward the end of the cycle.  We also had a pretty huge amount of 
change between 0.10 and 0.11 so I think a shorter cycle is warranted.


I think aiming for early to mid-December would give us roughly a 3 
month cycle and would align well with some of the Linux distribution 
cycles.  I'd like to limit things to a single -rc that lasted only 
for about a week.  This is enough time to fix most of the obvious 
issues I think.


I'd also like to try to enumerate some features for this release.  
Here's a short list of things I expect to see for this release 
(target-i386 centric).  Please add or comment on items that you'd 
either like to see in the release or are planning on working on.


o VMState conversion -- I expect most of the pc target to be completed
o qdev conversion -- I hope that we'll get most of the pc target 
completely converted to qdev

o storage live migration
o switch to SeaBIOS (need to finish porting features from Bochs)
o switch to gPXE (need to resolve slirp tftp server issue)
o KSM integration
o in-kernel APIC support for KVM
o guest SMP support for KVM
o updates to the default pc machine type


Machine monitor protocol.


If we're going to support the protocol for 0.12, I'd like to see most of the 
code merged by the end of October.


--
Regards,

Anthony Liguori



Re: [Qemu-devel] Release plan for 0.12.0

2009-09-30 Thread Anthony Liguori

Carl-Daniel Hailfinger wrote:

Hi,

On 30.09.2009 01:54, Anthony Liguori wrote:
  

Now that 0.11.0 is behind us, it's time to start thinking about 0.12.0.

I'd also like to try to enumerate some features for this release. 
Here's a short list of things I expect to see for this release

(target-i386 centric).

o switch to SeaBIOS (need to finish porting features from Bochs)



That switch is much appreciated because it also reduces the testing
matrix of those coreboot developers who boot test every commit with Qemu.

However, to run coreboot on Qemu with the same init sequence as on
simplified real hardware, we need Cache-as-RAM (CAR) support. This is
basically a mode where sizeof(cacheable area) <= sizeof(L2 cache), which
causes the processor to lock the cache and not pass any reads/writes
through to the RAM behind the cached area. The easiest way to implement
this would be to check the cache size criterion upon every MTRR
manipulation: map a chunk of fresh memory on top of the existing memory
(which may be RAM, ROM or unmapped) for every cacheable area, and if the
cacheable area starts to exceed the L2 cache size, discard all contents
of the memory mapped on top.
For additional correctness, the memory should not be discarded but
written back to the lower layer of memory if WBINVD (instead of INVD) or
CLFLUSH is called. That one is mostly sugar, though, and coreboot can
do without.
  


Do we really need coreboot to use the same init sequence?   coreboot is 
firmware and we don't necessarily run real firmware under QEMU.  It's a 
shortcut that lets us avoid a lot of complexity.



Right now coreboot sets up the MTRRs correctly, but then (conditional on
Qemu) only uses areas which are known to be backed by RAM instead of the
areas designated by CAR.

I'd like to implement CAR support which builds on top of my MTRR code
which was merged some months ago (and I already have code to check for
total cacheable area size), but I need help with the memory mapping
stuff. How do I proceed? Clean up what I have and insert FIXME
comments where I don't know how to implement stuff so others can see the
code and comment on it?
  


You could start there.  But from a higher level, I'm not sure I think a 
partial implementation of something like CAR is all that valuable since 
coreboot already runs under QEMU.



Regards,
Carl-Daniel

  

--

Regards,

Anthony Liguori



Re: [Qemu-devel] Release plan for 0.12.0

2009-09-30 Thread Luiz Capitulino
On Wed, 30 Sep 2009 08:41:23 +0200
Avi Kivity a...@redhat.com wrote:

 On 09/30/2009 01:54 AM, Anthony Liguori wrote:
  Hi,
 
  Now that 0.11.0 is behind us, it's time to start thinking about 0.12.0.
 
  I'd like to do a few things different this time around.  I don't think 
  the -rc process went very well as I don't think we got more testing 
  out of it.  I'd like to shorten the timeline for 0.12.0 a good bit.  
  The 0.10 stable tree got pretty difficult to maintain toward the end 
  of the cycle.  We also had a pretty huge amount of change between 0.10 
  and 0.11 so I think a shorter cycle is warranted.
 
  I think aiming for early to mid-December would give us roughly a 3 
  month cycle and would align well with some of the Linux distribution 
  cycles.  I'd like to limit things to a single -rc that lasted only for 
  about a week.  This is enough time to fix most of the obvious issues I 
  think.
 
  I'd also like to try to enumerate some features for this release.  
  Here's a short list of things I expect to see for this release 
  (target-i386 centric).  Please add or comment on items that you'd 
  either like to see in the release or are planning on working on.
 
  o VMState conversion -- I expect most of the pc target to be completed
  o qdev conversion -- I hope that we'll get most of the pc target 
  completely converted to qdev
  o storage live migration
  o switch to SeaBIOS (need to finish porting features from Bochs)
  o switch to gPXE (need to resolve slirp tftp server issue)
  o KSM integration
  o in-kernel APIC support for KVM
  o guest SMP support for KVM
  o updates to the default pc machine type
 
 Machine monitor protocol.

 Yeah, I was going to suggest it as well.


[PATCH 1/5] Nested VMX patch 1 implements vmon and vmoff

2009-09-30 Thread oritw
From: Orit Wasserman or...@il.ibm.com

---
 arch/x86/kvm/svm.c |3 -
 arch/x86/kvm/vmx.c |  217 +++-
 arch/x86/kvm/x86.c |6 +-
 arch/x86/kvm/x86.h |2 +
 4 files changed, 222 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2df9b45..3c1f22a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -124,9 +124,6 @@ static int npt = 1;
 
 module_param(npt, int, S_IRUGO);
 
-static int nested = 1;
-module_param(nested, int, S_IRUGO);
-
 static void svm_flush_tlb(struct kvm_vcpu *vcpu);
 static void svm_complete_interrupts(struct vcpu_svm *svm);
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 78101dd..71bd91a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -67,6 +67,11 @@ struct vmcs {
char data[0];
 };
 
+struct nested_vmx {
+   /* Has the level1 guest done vmxon? */
+   bool vmxon;
+};
+
 struct vcpu_vmx {
struct kvm_vcpu   vcpu;
struct list_head  local_vcpus_link;
@@ -114,6 +119,9 @@ struct vcpu_vmx {
ktime_t entry_time;
s64 vnmi_blocked_time;
u32 exit_reason;
+
+   /* Nested vmx */
+   struct nested_vmx nested;
 };
 
 static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
@@ -967,6 +975,95 @@ static void guest_write_tsc(u64 guest_tsc, u64 host_tsc)
 }
 
 /*
+ * Handles msr read for nested virtualization
+ */
+static int nested_vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index,
+ u64 *pdata)
+{
+   u64 vmx_msr = 0;
+
+   switch (msr_index) {
+   case MSR_IA32_FEATURE_CONTROL:
+   *pdata = 0;
+   break;
+   case MSR_IA32_VMX_BASIC:
+   *pdata = 0;
+   rdmsrl(MSR_IA32_VMX_BASIC, vmx_msr);
+   *pdata = (vmx_msr & 0x00cf);
+   break;
+   case MSR_IA32_VMX_PINBASED_CTLS:
+   rdmsrl(MSR_IA32_VMX_PINBASED_CTLS, vmx_msr);
+   *pdata = (PIN_BASED_EXT_INTR_MASK & vmcs_config.pin_based_exec_ctrl) |
+   (PIN_BASED_NMI_EXITING & vmcs_config.pin_based_exec_ctrl) |
+   (PIN_BASED_VIRTUAL_NMIS & vmcs_config.pin_based_exec_ctrl);
+   break;
+   case MSR_IA32_VMX_PROCBASED_CTLS:
+   {
+   u32 vmx_msr_high, vmx_msr_low;
+   u32 control = CPU_BASED_HLT_EXITING |
+#ifdef CONFIG_X86_64
+   CPU_BASED_CR8_LOAD_EXITING |
+   CPU_BASED_CR8_STORE_EXITING |
+#endif
+   CPU_BASED_CR3_LOAD_EXITING |
+   CPU_BASED_CR3_STORE_EXITING |
+   CPU_BASED_USE_IO_BITMAPS |
+   CPU_BASED_MOV_DR_EXITING |
+   CPU_BASED_USE_TSC_OFFSETING |
+   CPU_BASED_INVLPG_EXITING |
+   CPU_BASED_TPR_SHADOW |
+   CPU_BASED_USE_MSR_BITMAPS |
+   CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+
+   rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
+
+   control &= vmx_msr_high; /* bit == 0 in high word == must be zero */
+   control |= vmx_msr_low;  /* bit == 1 in low word  == must be one  */
+
+   *pdata = (CPU_BASED_HLT_EXITING & control) |
+#ifdef CONFIG_X86_64
+   (CPU_BASED_CR8_LOAD_EXITING & control) |
+   (CPU_BASED_CR8_STORE_EXITING & control) |
+#endif
+   (CPU_BASED_CR3_LOAD_EXITING & control) |
+   (CPU_BASED_CR3_STORE_EXITING & control) |
+   (CPU_BASED_USE_IO_BITMAPS & control) |
+   (CPU_BASED_MOV_DR_EXITING & control) |
+   (CPU_BASED_USE_TSC_OFFSETING & control) |
+   (CPU_BASED_INVLPG_EXITING & control);
+
+   if (cpu_has_secondary_exec_ctrls())
+   *pdata |= CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+
+   if (vm_need_tpr_shadow(vcpu->kvm))
+   *pdata |= CPU_BASED_TPR_SHADOW;
+   break;
+   }
+   case MSR_IA32_VMX_EXIT_CTLS:
+   *pdata = 0;
+#ifdef CONFIG_X86_64
+   *pdata |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
+#endif
+   break;
+   case MSR_IA32_VMX_ENTRY_CTLS:
+   *pdata = 0;
+   break;
+   case MSR_IA32_VMX_PROCBASED_CTLS2:
+   *pdata = 0;
+   if (vm_need_virtualize_apic_accesses(vcpu->kvm))
+   *pdata |= SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+   break;
+   case MSR_IA32_VMX_EPT_VPID_CAP:
+   *pdata = 0;
+   break;
+   default:
+   return 1;
+   }
+
+   return 0;
+}
+
+/*
  * Reads an msr value (of 'msr_index') into 'pdata'.
  * Returns 0 on success, non-0 otherwise.
  * Assumes vcpu_load() was already called.
@@ -1005,6 +1102,9 @@ static int 

[PATCH 2/5] Nested VMX patch 2 implements vmclear

2009-09-30 Thread oritw
From: Orit Wasserman or...@il.ibm.com

---
 arch/x86/kvm/vmx.c |   70 ---
 1 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 71bd91a..411cbdb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -61,15 +61,26 @@ module_param_named(unrestricted_guest,
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
-struct vmcs {
-   u32 revision_id;
-   u32 abort;
-   char data[0];
+struct __attribute__ ((__packed__)) level_state {
+   /* Has the level1 guest done vmclear? */
+   bool vmclear;
 };
 
 struct nested_vmx {
/* Has the level1 guest done vmxon? */
bool vmxon;
+
+   /*
+* Level 2 state : includes vmcs,registers and
+* a copy of vmcs12 for vmread/vmwrite
+*/
+   struct level_state *l2_state;
+};
+
+struct vmcs {
+   u32 revision_id;
+   u32 abort;
+   char data[0];
 };
 
 struct vcpu_vmx {
@@ -186,6 +197,8 @@ static struct kvm_vmx_segment_field {
 
 static void ept_save_pdptrs(struct kvm_vcpu *vcpu);
 
+static int create_l2_state(struct kvm_vcpu *vcpu);
+
 /*
  * Keep MSR_K6_STAR at the end, as setup_msrs() will try to optimize it
  * away by decrementing the array size.
@@ -1293,6 +1306,30 @@ static void vmclear_local_vcpus(void)
__vcpu_clear(vmx);
 }
 
+struct level_state *create_state(void)
+{
+   struct level_state *state = NULL;
+
+   state = kzalloc(sizeof(struct level_state), GFP_KERNEL);
+   if (!state) {
+   printk(KERN_INFO "Error create level state\n");
+   return NULL;
+   }
+   return state;
+}
+
+int create_l2_state(struct kvm_vcpu *vcpu)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   if (!vmx->nested.l2_state) {
+   vmx->nested.l2_state = create_state();
+   if (!vmx->nested.l2_state)
+   return -ENOMEM;
+   }
+
+   return 0;
+}
 
 /* Just like cpu_vmxoff(), but with the __kvm_handle_fault_on_reboot()
  * tricks.
@@ -3261,6 +3298,27 @@ static int handle_vmx_insn(struct kvm_vcpu *vcpu)
return 1;
 }
 
+static void clear_rflags_cf_zf(struct kvm_vcpu *vcpu)
+{
+   unsigned long rflags;
+   rflags = vmx_get_rflags(vcpu);
+   rflags &= ~(X86_EFLAGS_CF | X86_EFLAGS_ZF);
+   vmx_set_rflags(vcpu, rflags);
+}
+
+static int handle_vmclear(struct kvm_vcpu *vcpu)
+{
+   if (!nested_vmx_check_permission(vcpu))
+   return 1;
+
+   to_vmx(vcpu)->nested.l2_state->vmclear = 1;
+
+   skip_emulated_instruction(vcpu);
+   clear_rflags_cf_zf(vcpu);
+
+   return 1;
+}
+
 static int handle_vmoff(struct kvm_vcpu *vcpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -3310,6 +3368,8 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
 
vmx->nested.vmxon = 1;
 
+   create_l2_state(vcpu);
+
skip_emulated_instruction(vcpu);
return 1;
 }
@@ -3582,7 +3642,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu 
*vcpu) = {
[EXIT_REASON_HLT] = handle_halt,
[EXIT_REASON_INVLPG]  = handle_invlpg,
[EXIT_REASON_VMCALL]  = handle_vmcall,
-   [EXIT_REASON_VMCLEAR] = handle_vmx_insn,
+   [EXIT_REASON_VMCLEAR] = handle_vmclear,
[EXIT_REASON_VMLAUNCH]= handle_vmx_insn,
[EXIT_REASON_VMPTRLD] = handle_vmx_insn,
[EXIT_REASON_VMPTRST] = handle_vmx_insn,
-- 
1.6.0.4



[PATCH 3/5] Nested VMX patch 3 implements vmptrld and vmptrst

2009-09-30 Thread oritw
From: Orit Wasserman or...@il.ibm.com

---
 arch/x86/kvm/vmx.c |  468 ++--
 arch/x86/kvm/x86.c |3 +-
 2 files changed, 459 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 411cbdb..8c186e0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -61,20 +61,168 @@ module_param_named(unrestricted_guest,
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
+
+struct __attribute__ ((__packed__)) shadow_vmcs {
+   u32 revision_id;
+   u32 abort;
+   u16 virtual_processor_id;
+   u16 guest_es_selector;
+   u16 guest_cs_selector;
+   u16 guest_ss_selector;
+   u16 guest_ds_selector;
+   u16 guest_fs_selector;
+   u16 guest_gs_selector;
+   u16 guest_ldtr_selector;
+   u16 guest_tr_selector;
+   u16 host_es_selector;
+   u16 host_cs_selector;
+   u16 host_ss_selector;
+   u16 host_ds_selector;
+   u16 host_fs_selector;
+   u16 host_gs_selector;
+   u16 host_tr_selector;
+   u64 io_bitmap_a;
+   u64 io_bitmap_b;
+   u64 msr_bitmap;
+   u64 vm_exit_msr_store_addr;
+   u64 vm_exit_msr_load_addr;
+   u64 vm_entry_msr_load_addr;
+   u64 tsc_offset;
+   u64 virtual_apic_page_addr;
+   u64 apic_access_addr;
+   u64 ept_pointer;
+   u64 guest_physical_address;
+   u64 vmcs_link_pointer;
+   u64 guest_ia32_debugctl;
+   u64 guest_ia32_pat;
+   u64 guest_pdptr0;
+   u64 guest_pdptr1;
+   u64 guest_pdptr2;
+   u64 guest_pdptr3;
+   u64 host_ia32_pat;
+   u32 pin_based_vm_exec_control;
+   u32 cpu_based_vm_exec_control;
+   u32 exception_bitmap;
+   u32 page_fault_error_code_mask;
+   u32 page_fault_error_code_match;
+   u32 cr3_target_count;
+   u32 vm_exit_controls;
+   u32 vm_exit_msr_store_count;
+   u32 vm_exit_msr_load_count;
+   u32 vm_entry_controls;
+   u32 vm_entry_msr_load_count;
+   u32 vm_entry_intr_info_field;
+   u32 vm_entry_exception_error_code;
+   u32 vm_entry_instruction_len;
+   u32 tpr_threshold;
+   u32 secondary_vm_exec_control;
+   u32 vm_instruction_error;
+   u32 vm_exit_reason;
+   u32 vm_exit_intr_info;
+   u32 vm_exit_intr_error_code;
+   u32 idt_vectoring_info_field;
+   u32 idt_vectoring_error_code;
+   u32 vm_exit_instruction_len;
+   u32 vmx_instruction_info;
+   u32 guest_es_limit;
+   u32 guest_cs_limit;
+   u32 guest_ss_limit;
+   u32 guest_ds_limit;
+   u32 guest_fs_limit;
+   u32 guest_gs_limit;
+   u32 guest_ldtr_limit;
+   u32 guest_tr_limit;
+   u32 guest_gdtr_limit;
+   u32 guest_idtr_limit;
+   u32 guest_es_ar_bytes;
+   u32 guest_cs_ar_bytes;
+   u32 guest_ss_ar_bytes;
+   u32 guest_ds_ar_bytes;
+   u32 guest_fs_ar_bytes;
+   u32 guest_gs_ar_bytes;
+   u32 guest_ldtr_ar_bytes;
+   u32 guest_tr_ar_bytes;
+   u32 guest_interruptibility_info;
+   u32 guest_activity_state;
+   u32 guest_sysenter_cs;
+   u32 host_ia32_sysenter_cs;
+   unsigned long cr0_guest_host_mask;
+   unsigned long cr4_guest_host_mask;
+   unsigned long cr0_read_shadow;
+   unsigned long cr4_read_shadow;
+   unsigned long cr3_target_value0;
+   unsigned long cr3_target_value1;
+   unsigned long cr3_target_value2;
+   unsigned long cr3_target_value3;
+   unsigned long exit_qualification;
+   unsigned long guest_linear_address;
+   unsigned long guest_cr0;
+   unsigned long guest_cr3;
+   unsigned long guest_cr4;
+   unsigned long guest_es_base;
+   unsigned long guest_cs_base;
+   unsigned long guest_ss_base;
+   unsigned long guest_ds_base;
+   unsigned long guest_fs_base;
+   unsigned long guest_gs_base;
+   unsigned long guest_ldtr_base;
+   unsigned long guest_tr_base;
+   unsigned long guest_gdtr_base;
+   unsigned long guest_idtr_base;
+   unsigned long guest_dr7;
+   unsigned long guest_rsp;
+   unsigned long guest_rip;
+   unsigned long guest_rflags;
+   unsigned long guest_pending_dbg_exceptions;
+   unsigned long guest_sysenter_esp;
+   unsigned long guest_sysenter_eip;
+   unsigned long host_cr0;
+   unsigned long host_cr3;
+   unsigned long host_cr4;
+   unsigned long host_fs_base;
+   unsigned long host_gs_base;
+   unsigned long host_tr_base;
+   unsigned long host_gdtr_base;
+   unsigned long host_idtr_base;
+   unsigned long host_ia32_sysenter_esp;
+   unsigned long host_ia32_sysenter_eip;
+   unsigned long host_rsp;
+   unsigned long host_rip;
+};
+
 struct __attribute__ ((__packed__)) level_state {
/* Has the level1 guest done vmclear? */
bool vmclear;
+   u16 vpid;
+   u64 shadow_efer;
+   unsigned long cr2;
+   

[PATCH 4/5] Nested VMX patch 4 implements vmread and vmwrite

2009-09-30 Thread oritw
From: Orit Wasserman or...@il.ibm.com

---
 arch/x86/kvm/vmx.c |  591 +++-
 1 files changed, 589 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8c186e0..6a4c252 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -225,6 +225,21 @@ struct nested_vmx {
struct level_state *l1_state;
 };
 
+enum vmcs_field_type {
+   VMCS_FIELD_TYPE_U16 = 0,
+   VMCS_FIELD_TYPE_U64 = 1,
+   VMCS_FIELD_TYPE_U32 = 2,
+   VMCS_FIELD_TYPE_ULONG = 3
+};
+
+#define VMCS_FIELD_LENGTH_OFFSET 13
+#define VMCS_FIELD_LENGTH_MASK 0x6000
+
+static inline int vmcs_field_length(unsigned long field)
+{
+   return (VMCS_FIELD_LENGTH_MASK & field) >> 13;
+}
+
 struct vmcs {
u32 revision_id;
u32 abort;
@@ -288,6 +303,404 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu 
*vcpu)
return container_of(vcpu, struct vcpu_vmx, vcpu);
 }
 
+#define SHADOW_VMCS_OFFSET(x) offsetof(struct shadow_vmcs, x)
+
+static unsigned short vmcs_field_to_offset_table[HOST_RIP+1] = {
+
+   [VIRTUAL_PROCESSOR_ID] =
+   SHADOW_VMCS_OFFSET(virtual_processor_id),
+   [GUEST_ES_SELECTOR] =
+   SHADOW_VMCS_OFFSET(guest_es_selector),
+   [GUEST_CS_SELECTOR] =
+   SHADOW_VMCS_OFFSET(guest_cs_selector),
+   [GUEST_SS_SELECTOR] =
+   SHADOW_VMCS_OFFSET(guest_ss_selector),
+   [GUEST_DS_SELECTOR] =
+   SHADOW_VMCS_OFFSET(guest_ds_selector),
+   [GUEST_FS_SELECTOR] =
+   SHADOW_VMCS_OFFSET(guest_fs_selector),
+   [GUEST_GS_SELECTOR] =
+   SHADOW_VMCS_OFFSET(guest_gs_selector),
+   [GUEST_LDTR_SELECTOR] =
+   SHADOW_VMCS_OFFSET(guest_ldtr_selector),
+   [GUEST_TR_SELECTOR] =
+   SHADOW_VMCS_OFFSET(guest_tr_selector),
+   [HOST_ES_SELECTOR] =
+   SHADOW_VMCS_OFFSET(host_es_selector),
+   [HOST_CS_SELECTOR] =
+   SHADOW_VMCS_OFFSET(host_cs_selector),
+   [HOST_SS_SELECTOR] =
+   SHADOW_VMCS_OFFSET(host_ss_selector),
+   [HOST_DS_SELECTOR] =
+   SHADOW_VMCS_OFFSET(host_ds_selector),
+   [HOST_FS_SELECTOR] =
+   SHADOW_VMCS_OFFSET(host_fs_selector),
+   [HOST_GS_SELECTOR] =
+   SHADOW_VMCS_OFFSET(host_gs_selector),
+   [HOST_TR_SELECTOR] =
+   SHADOW_VMCS_OFFSET(host_tr_selector),
+   [IO_BITMAP_A] =
+   SHADOW_VMCS_OFFSET(io_bitmap_a),
+   [IO_BITMAP_A_HIGH] =
+   SHADOW_VMCS_OFFSET(io_bitmap_a)+4,
+   [IO_BITMAP_B] =
+   SHADOW_VMCS_OFFSET(io_bitmap_b),
+   [IO_BITMAP_B_HIGH] =
+   SHADOW_VMCS_OFFSET(io_bitmap_b)+4,
+   [MSR_BITMAP] =
+   SHADOW_VMCS_OFFSET(msr_bitmap),
+   [MSR_BITMAP_HIGH] =
+   SHADOW_VMCS_OFFSET(msr_bitmap)+4,
+   [VM_EXIT_MSR_STORE_ADDR] =
+   SHADOW_VMCS_OFFSET(vm_exit_msr_store_addr),
+   [VM_EXIT_MSR_STORE_ADDR_HIGH] =
+   SHADOW_VMCS_OFFSET(vm_exit_msr_store_addr)+4,
+   [VM_EXIT_MSR_LOAD_ADDR] =
+   SHADOW_VMCS_OFFSET(vm_exit_msr_load_addr),
+   [VM_EXIT_MSR_LOAD_ADDR_HIGH] =
+   SHADOW_VMCS_OFFSET(vm_exit_msr_load_addr)+4,
+   [VM_ENTRY_MSR_LOAD_ADDR] =
+   SHADOW_VMCS_OFFSET(vm_entry_msr_load_addr),
+   [VM_ENTRY_MSR_LOAD_ADDR_HIGH] =
+   SHADOW_VMCS_OFFSET(vm_entry_msr_load_addr)+4,
+   [TSC_OFFSET] =
+   SHADOW_VMCS_OFFSET(tsc_offset),
+   [TSC_OFFSET_HIGH] =
+   SHADOW_VMCS_OFFSET(tsc_offset)+4,
+   [VIRTUAL_APIC_PAGE_ADDR] =
+   SHADOW_VMCS_OFFSET(virtual_apic_page_addr),
+   [VIRTUAL_APIC_PAGE_ADDR_HIGH] =
+   SHADOW_VMCS_OFFSET(virtual_apic_page_addr)+4,
+   [APIC_ACCESS_ADDR] =
+   SHADOW_VMCS_OFFSET(apic_access_addr),
+   [APIC_ACCESS_ADDR_HIGH] =
+   SHADOW_VMCS_OFFSET(apic_access_addr)+4,
+   [EPT_POINTER] =
+   SHADOW_VMCS_OFFSET(ept_pointer),
+   [EPT_POINTER_HIGH] =
+   SHADOW_VMCS_OFFSET(ept_pointer)+4,
+   [GUEST_PHYSICAL_ADDRESS] =
+   SHADOW_VMCS_OFFSET(guest_physical_address),
+   [GUEST_PHYSICAL_ADDRESS_HIGH] =
+   SHADOW_VMCS_OFFSET(guest_physical_address)+4,
+   [VMCS_LINK_POINTER] =
+   SHADOW_VMCS_OFFSET(vmcs_link_pointer),
+   [VMCS_LINK_POINTER_HIGH] =
+   SHADOW_VMCS_OFFSET(vmcs_link_pointer)+4,
+   [GUEST_IA32_DEBUGCTL] =
+   SHADOW_VMCS_OFFSET(guest_ia32_debugctl),
+   [GUEST_IA32_DEBUGCTL_HIGH] =
+   SHADOW_VMCS_OFFSET(guest_ia32_debugctl)+4,
+   [GUEST_IA32_PAT] =
+   SHADOW_VMCS_OFFSET(guest_ia32_pat),
+   [GUEST_IA32_PAT_HIGH] =
+   SHADOW_VMCS_OFFSET(guest_ia32_pat)+4,
+   [GUEST_PDPTR0] =
+   SHADOW_VMCS_OFFSET(guest_pdptr0),
+   [GUEST_PDPTR0_HIGH] =
+   

[PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume

2009-09-30 Thread oritw
From: Orit Wasserman or...@il.ibm.com

---
 arch/x86/kvm/vmx.c | 1173 ++--
 1 files changed, 1148 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6a4c252..e814029 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -209,6 +209,7 @@ struct __attribute__ ((__packed__)) level_state {
struct vmcs *vmcs;
int cpu;
int launched;
+   bool first_launch;
 };
 
 struct nested_vmx {
@@ -216,6 +217,12 @@ struct nested_vmx {
bool vmxon;
/* What is the location of the  vmcs l1 keeps for l2? (in level1 gpa) */
u64 vmptr;
+   /* Are we running nested guest */
+   bool nested_mode;
+   /* L1 requested VMLAUNCH or VMRESUME but we didn't run L2 yet */
+   bool nested_run_pending;
+   /* flag indicating if there was a valid IDT after exiting from l2 */
+   bool nested_valid_idt;
/*
 * Level 2 state : includes vmcs,registers and
 * a copy of vmcs12 for vmread/vmwrite
@@ -240,6 +247,10 @@ static inline int vmcs_field_length(unsigned long field)
return (VMCS_FIELD_LENGTH_MASK & field) >> 13;
 }
 
+#define NESTED_VM_EXIT_CONTROLS_MASK (~(VM_EXIT_LOAD_IA32_PAT | \
+   VM_EXIT_SAVE_IA32_PAT))
+#define NESTED_VM_ENTRY_CONTROLS_MASK (~(VM_ENTRY_LOAD_IA32_PAT | \
+VM_ENTRY_IA32E_MODE))
 struct vmcs {
u32 revision_id;
u32 abort;
@@ -303,6 +314,12 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu 
*vcpu)
return container_of(vcpu, struct vcpu_vmx, vcpu);
 }
 
+static inline struct shadow_vmcs *get_shadow_vmcs(struct kvm_vcpu *vcpu)
+{
+   WARN_ON(!to_vmx(vcpu)->nested.l2_state->shadow_vmcs);
+   return to_vmx(vcpu)->nested.l2_state->shadow_vmcs;
+}
+
 #define SHADOW_VMCS_OFFSET(x) offsetof(struct shadow_vmcs, x)
 
 static unsigned short vmcs_field_to_offset_table[HOST_RIP+1] = {
@@ -822,8 +839,16 @@ static struct kvm_vmx_segment_field {
 static void ept_save_pdptrs(struct kvm_vcpu *vcpu);
 
 static int nested_vmx_check_permission(struct kvm_vcpu *vcpu);
+static int nested_vmx_check_exception(struct vcpu_vmx *vmx, unsigned nr,
+ bool has_error_code, u32 error_code);
+static int nested_vmx_intr(struct kvm_vcpu *vcpu);
 static int create_l1_state(struct kvm_vcpu *vcpu);
 static int create_l2_state(struct kvm_vcpu *vcpu);
+static int launch_guest(struct kvm_vcpu *vcpu);
+static int nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu);
+static int nested_vmx_exit_handled(struct kvm_vcpu *vcpu, bool kvm_override);
+static int nested_vmx_vmexit(struct kvm_vcpu *vcpu,
+bool is_interrupt);
 
 /*
  * Keep MSR_K6_STAR at the end, as setup_msrs() will try to optimize it
@@ -940,6 +965,18 @@ static inline bool cpu_has_vmx_ept_2m_page(void)
	return !!(vmx_capability.ept & VMX_EPT_2MB_PAGE_BIT);
 }
 
+static inline int is_exception(u32 intr_info)
+{
+	return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
+		== (INTR_TYPE_HARD_EXCEPTION | INTR_INFO_VALID_MASK);
+}
+
+static inline int is_nmi(u32 intr_info)
+{
+	return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
+		== (INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK);
+}
+
 static inline int cpu_has_vmx_invept_individual_addr(void)
 {
	return !!(vmx_capability.ept & VMX_EPT_EXTENT_INDIVIDUAL_BIT);
@@ -990,6 +1027,51 @@ static inline bool report_flexpriority(void)
return flexpriority_enabled;
 }
 
+static inline int nested_cpu_has_vmx_tpr_shadow(struct kvm_vcpu *vcpu)
+{
+	return cpu_has_vmx_tpr_shadow() &&
+		get_shadow_vmcs(vcpu)->cpu_based_vm_exec_control &
+		CPU_BASED_TPR_SHADOW;
+}
+
+static inline int nested_cpu_has_secondary_exec_ctrls(struct kvm_vcpu *vcpu)
+{
+	return cpu_has_secondary_exec_ctrls() &&
+		get_shadow_vmcs(vcpu)->cpu_based_vm_exec_control &
+		CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
+}
+
+static inline bool nested_vm_need_virtualize_apic_accesses(struct kvm_vcpu
+							   *vcpu)
+{
+	return get_shadow_vmcs(vcpu)->secondary_vm_exec_control &
+		SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+}
+
+static inline int nested_cpu_has_vmx_ept(struct kvm_vcpu *vcpu)
+{
+	return get_shadow_vmcs(vcpu)->
+		secondary_vm_exec_control & SECONDARY_EXEC_ENABLE_EPT;
+}
+
+static inline int nested_cpu_has_vmx_vpid(struct kvm_vcpu *vcpu)
+{
+	return get_shadow_vmcs(vcpu)->secondary_vm_exec_control &
+		SECONDARY_EXEC_ENABLE_VPID;
+}
+
+static inline int nested_cpu_has_vmx_pat(struct kvm_vcpu *vcpu)
+{
+	return get_shadow_vmcs(vcpu)->vm_entry_controls &
+		VM_ENTRY_LOAD_IA32_PAT;
+}
+
+static inline int nested_cpu_has_vmx_msr_bitmap(struct kvm_vcpu *vcpu)
+{
+   return 

Nested VMX support v2

2009-09-30 Thread oritw
The following patches implement nested VMX support. The patches enable a guest
to use the VMX APIs in order to run its own nested guest (i.e., enable running
other hypervisors which use VMX under KVM). The current patches support running
Linux under a nested KVM using shadow page table (with bypass_guest_pf
disabled). SMP support was fixed.  Reworking EPT support to mesh cleanly with
the current shadow paging design per Avi's comments is a work-in-progress.  

The current patches only support a single nested hypervisor, which can only run
a single guest (multiple guests are work in progress). Only 64-bit nested
hypervisors are supported.

Additional patches for running Windows under nested KVM, and Linux under nested
VMware server(!), are currently running in the lab. We are in the process of
forward-porting those patches to -tip.

These patches were written by:
 Orit Wasserman, or...@il.ibm.com
 Ben-Ami Yassor, ben...@il.ibm.com
 Abel Gordon, ab...@il.ibm.com
 Muli Ben-Yehuda, m...@il.ibm.com
 
With contributions by:
 Anthony Liguori, aligu...@us.ibm.com
 Mike Day, m...@us.ibm.com

This work was inspired by the nested SVM support by Alexander Graf and Joerg
Roedel.

Changes since v2:
Added check to nested_vmx_get_msr.
Static initialization of the vmcs_field_to_offset_table array.
Use the memory allocated by L1 for VMCS12 to store the shadow vmcs.
Some optimization to the prepare_vmcs_12 function.

vpid allocation will be updated with the multiguest support (work in progress).
We are working on fixing the cr0.TS handling; it works for nested kvm but not 
for VMware Server.


Re: Release plan for 0.12.0

2009-09-30 Thread Amit Shah
On (Wed) Sep 30 2009 [08:04:17], Anthony Liguori wrote:
 Amit Shah wrote:
 On (Tue) Sep 29 2009 [18:54:53], Anthony Liguori wrote:
 o multiport virtio-console support
   

 Assuming we can get the kernel drivers straightened out, I think it's  
 certainly reasonable for 0.12.

The kernel drivers are in fine shape.

Amit


Re: [PATCH v4: kvm 4/4] Fix hotplug of CPUs for KVM.

2009-09-30 Thread Marcelo Tosatti
On Tue, Sep 29, 2009 at 11:38:37AM -1000, Zachary Amsden wrote:
 Both VMX and SVM require per-cpu memory allocation, which is done at module
 init time, for only online cpus.
 
 Backend was not allocating enough structure for all possible CPUs, so
 new CPUs coming online could not be hardware enabled.
 
 Signed-off-by: Zachary Amsden zams...@redhat.com
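The usual shape of such a fix is to iterate over possible rather than online
cpus at module init; a rough sketch (the init-helper name is assumed, this is
not the actual hunk):

	for_each_possible_cpu(cpu) {	/* was: for_each_online_cpu(cpu) */
		r = svm_cpu_init(cpu);	/* per-cpu backend allocation */
		if (r)
			goto err;
	}

That way a cpu that comes online later already has its structures in place.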

Applied all, thanks.



Re: [Qemu-devel] Release plan for 0.12.0

2009-09-30 Thread Michael S. Tsirkin
On Wed, Sep 30, 2009 at 08:03:20AM -0500, Anthony Liguori wrote:
 Hi Isaku,

 Isaku Yamahata wrote:
  o newer chipset (which is based on Q35 chipset)
  o multiple pci bus
  o PCI express (MMCONFIG)
  o PCI express hot plug (not acpi based)
  o PCI express switch emulator

 Although there is no PCIe emulated device at the moment, this will be a 
 fundamental infrastructure for PCI express native
 direct attach.
   

 Your patches definitely deserve review/commit.  I'll make sure that  
 happens for the 0.12 time frame.

 Michael, could you help review some of the PCI patches?

Yes, I am doing this and have sent comments already.
The only thing I have not looked at yet is the new express file.

 Thanks,

 -- 
 Regards,

 Anthony Liguori


Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream

2009-09-30 Thread Christoph Hellwig
I might sound like a broken record, but why isn't the full GSO support
for virtio-net upstream in qemu?



Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream

2009-09-30 Thread Avi Kivity

On 09/30/2009 03:51 PM, Christoph Hellwig wrote:

I might sound like a broken record, but why isn't the full GSO support
for virtio-net upstream in qemu?
   


IIRC the current hacks are not upstream quality.

The problem (again IIRC) is that the guest and host negotiate a 
protocol, but the qemu vlan model doesn't have a guest and a host, it 
has peers (possibly more than two), so a lot of translation has to take 
place if you have one peer supporting a guest feature and another not.


IMO the best way out is to drop the vlan model.  It has its uses, but 
they can all be implemented in other ways, and all have minor usage 
compared to the business of getting data into and out of a guest.
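For illustration, with today's command-line syntax two peers join a vlan like
this (nothing in the model distinguishes the guest side from the host side):

	qemu -net nic,model=virtio,vlan=0 -net tap,vlan=0

A third peer (another -net tap,vlan=0, or -net dump for capturing) can join
the same vlan, and only some of those peers may understand the offload
metadata the guest negotiated -- hence the translation problem.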


--
error compiling committee.c: too many arguments to function



Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream

2009-09-30 Thread Mark McLoughlin
On Wed, 2009-09-30 at 15:55 +0200, Avi Kivity wrote:
 On 09/30/2009 03:51 PM, Christoph Hellwig wrote:
  I might sound like a broken record, but why isn't the full GSO support
  for virtio-net upstream in qemu?
 
 
 IIRC the current hacks are not upstream quality.
 
 The problem (again IIRC) is that the guest and host negotiate a 
 protocol, but the qemu vlan model doesn't have a guest and a host, it 
 has peers (possibly more than two), so a lot of translation has to take 
 place if you have one peer supporting a guest feature and another not.

Right.

 IMO the best way out is to drop the vlan model.  It has its uses, but 
 they can all be implemented in other ways, and all have minor usage 
 compared to the business of getting data into and out of a guest.

I think we should keep the vlan stuff, just de-emphasise it.

I'm planning on adding -hostnet and -nic arguments, which would not use
vlans by default but rather connect the nic directly to the host side.

The QemuOpts conversion of -net which is waiting to be merged is the
first stage of that.

Cheers,
Mark.



RE: migrate_set_downtime bug

2009-09-30 Thread Dietmar Maurer
 On Wed, Sep 30, 2009 at 10:55:24AM +0200, Dietmar Maurer wrote:
  Another problem occurs when max_downtime is too short. This can
 result in a never-ending migration task.
 
  To reproduce just play a video inside a VM and set max_downtime to
 30ns
 
  Sure, one can argument that this behavior is expected.
 
  But the following would avoid the problem:
 
  +if ((stage == 2) && (bytes_transferred > 2*ram_bytes_total())) {
  +return 1;
  +}
 why 2 * ?
 This means we'll have to transfer the whole contents of RAM at least
 twice to hit this condition, right?

Yes, this is just an arbitrary limit. 

- Dietmar



virt-install: hda disks?

2009-09-30 Thread James Brackinshaw
Hi,

Not sure if this is the right place to ask this.

I'm getting hda disks by default with kvm under RHEL5.4 using
virt-install. This seems an odd default. Is there a reason for hda
disks over sda disks? Can I change this?

Thanks,

JB


Re: [Qemu-devel] Release plan for 0.12.0

2009-09-30 Thread Anthony Liguori

Luiz Capitulino wrote:

On Tue, 29 Sep 2009 18:54:53 -0500
Anthony Liguori aligu...@us.ibm.com wrote:

  
I think aiming for early to mid-December would give us roughly a 3 month 
cycle and would align well with some of the Linux distribution cycles.  
I'd like to limit things to a single -rc that lasted only for about a 
week.  This is enough time to fix most of the obvious issues I think.



 How do you plan to do it? I mean, are you going to create a separate branch
or make master the -rc?

 Creating a separate branch (which is what we do today, iiuc) makes it
get less attention, freezing master for a certain period is the best
way to stabilize.

 Is this what you had in mind?
  

What do people think?

One reason I branch is because some people care a bit less about 
releases so it makes the process non-disruptive to them.  If the other 
maintainers agreed though, I would certainly like to have the master 
branch essentially frozen for the week before the release.


--
Regards,

Anthony Liguori



Re: Release plan for 0.12.0

2009-09-30 Thread Anthony Liguori

Amit Shah wrote:

On (Wed) Sep 30 2009 [08:04:17], Anthony Liguori wrote:
  

Amit Shah wrote:


On (Tue) Sep 29 2009 [18:54:53], Anthony Liguori wrote:
o multiport virtio-console support
  
  
Assuming we can get the kernel drivers straightened out, I think it's  
certainly reasonable for 0.12.



The kernel drivers are in fine shape.
  


I meant on track for inclusion into the appropriate tree.  Looking for 
an Ack/Nack from Rusty.  That's been the general policy for all virtio 
changes btw.  Nothing specific to virtio-console.



Amit
  



--
Regards,

Anthony Liguori



Re: Release plan for 0.12.0

2009-09-30 Thread Amit Shah
On (Wed) Sep 30 2009 [09:47:22], Anthony Liguori wrote:
 Amit Shah wrote:
 On (Wed) Sep 30 2009 [08:04:17], Anthony Liguori wrote:
   
 Amit Shah wrote:
 
 On (Tue) Sep 29 2009 [18:54:53], Anthony Liguori wrote:
 o multiport virtio-console support
 
 Assuming we can get the kernel drivers straightened out, I think it's 
  certainly reasonable for 0.12.

 The kernel drivers are in fine shape.

 I meant on track for inclusion into the appropriate tree.  Looking for  
 an Ack/Nack from Rusty.  That's been the general policy for all virtio  
 changes btw.  Nothing specific to virtio-console.

That's fine.

Amit


Re: virt-install: hda disks?

2009-09-30 Thread Cole Robinson
On 09/30/2009 10:28 AM, James Brackinshaw wrote:
 Hi,
 
 Not sure if this is the right place to ask this.
 

virt-install questions should be directed to virt-tools-l...@redhat.com

 I'm getting hda disks by default with kvm under RHEL5.4 using
 virt-install. This seems an odd default. Is there a reason for hda
 disks over sda disks? Can I change this?
 

virt-install/libvirt defaults to IDE for disk devices (as does directly
launching qemu or kvm). These disks will show up in a RHEL5 guest as
/dev/hda, etc. In newer distros, these disks show up as /dev/sda, etc.
It's just a matter of the RHEL5 stack being older than the hdX -> sdX
change.

If you want to use scsi disks via virt-install, you can use:

virt-install --disk ...,bus=scsi

Though AIUI it's generally considered buggy at the qemu level, and may
even be disabled in RHEL5.4

- Cole


Re: virt-install: hda disks?

2009-09-30 Thread James Brackinshaw
On Wed, Sep 30, 2009 at 4:51 PM, Cole Robinson crobi...@redhat.com wrote:
 On 09/30/2009 10:28 AM, James Brackinshaw wrote:
 Hi,

 Not sure if this is the right place to ask this.


 virt-install questions should be directed to virt-tools-l...@redhat.com

Thanks.

 I'm getting hda disks by default with kvm under RHEL5.4 using
 virt-install. This seems an odd default. Is there a reason for hda
 disks over sda disks? Can I change this?


 virt-install/libvirt defaults to IDE for disk devices (as does directly
 launching qemu or kvm). These disks will show up in a RHEL5 guest as
 /dev/hda, etc. In newer distros, these disks show up as /dev/sda, etc.
 It's just a matter of the RHEL5 stack being older than the hdX -> sdX
 change.

 If you want to use scsi disks via virt-install, you can use:

 virt-install --disk ...,bus=scsi

 Though AIUI it's generally considered buggy at the qemu level, and may
 even be disabled in RHEL5.4

 - Cole

Is virtio stable and recommended?


Re: virt-install: hda disks?

2009-09-30 Thread Cole Robinson
On 09/30/2009 11:11 AM, James Brackinshaw wrote:
 On Wed, Sep 30, 2009 at 4:51 PM, Cole Robinson crobi...@redhat.com wrote:
 On 09/30/2009 10:28 AM, James Brackinshaw wrote:
 Hi,

 Not sure if this is the right place to ask this.


 virt-install questions should be directed to virt-tools-l...@redhat.com
 
 Thanks.
 
 I'm getting hda disks by default with kvm under RHEL5.4 using
 virt-install. This seems an odd default. Is there a reason for hda
 disks over sda disks? Can I change this?


 virt-install/libvirt defaults to IDE for disk devices (as does directly
 launching qemu or kvm). These disks will show up in a RHEL5 guest as
 /dev/hda, etc. In newer distros, these disks show up as /dev/sda, etc.
 It's just a matter of the RHEL5 stack being older than the hdX - sdX
 change.

 If you want to use scsi disks via virt-install, you can use:

 virt-install --disk ...,bus=scsi

 Though AIUI it's generally considered buggy at the qemu level, and may
 even be disabled in RHEL5.4

 - Cole
 
 Is virtio stable and recommended?

Sorry, forgot about that. If you are installing a RHEL5.4 guest, use

virt-install --disk ...,model=virtio

or

virt-install --os-variant virtio26

which will take care of disk and networking defaults.
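For reference, the resulting libvirt domain XML carries a virtio target bus,
roughly like this (file path illustrative):

<disk type='file' device='disk'>
  <source file='/var/lib/libvirt/images/guest.img'/>
  <target dev='vda' bus='virtio'/>
</disk>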

- Cole


Re: virt-install: hda disks?

2009-09-30 Thread James Brackinshaw
 virt-install --os-variant virtio26

 which will take care of disk and networking defaults.

 - Cole


Ah. For networking, is this in addition to, or instead of <model
type='e1000'/>?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virt-install: hda disks?

2009-09-30 Thread Cole Robinson
On 09/30/2009 11:18 AM, James Brackinshaw wrote:
 virt-install --os-variant virtio26

 which will take care of disk and networking defaults.

 - Cole

 
  Ah. For networking, is this in addition to, or instead of <model
  type='e1000'/>?

Instead.


Re: [Qemu-devel] Release plan for 0.12.0

2009-09-30 Thread Luiz Capitulino
On Wed, 30 Sep 2009 17:03:23 +0200
Fred Leeflang fr...@dutchie.org wrote:

 2009/9/30 Anthony Liguori aligu...@us.ibm.com
 
  Luiz Capitulino wrote:
 
  On Tue, 29 Sep 2009 18:54:53 -0500
  Anthony Liguori aligu...@us.ibm.com wrote:
 
 
 
  I think aiming for early to mid-December would give us roughly a 3 month
  cycle and would align well with some of the Linux distribution cycles.  
  I'd
  like to limit things to a single -rc that lasted only for about a week.
   This is enough time to fix most of the obvious issues I think.
 
 
 
   How do you plan to do it? I mean, are you going to create a separate
  branch
  or make master the -rc?
 
   Creating a separate branch (which is what we do today, iiuc) makes it
  get less attention, freezing master for a certain period is the best
  way to stabilize.
 
   Is this what you had in mind?
 
 
  What do people think?
 
  One reason I branch is because some people care a bit less about releases
  so it makes the process non-disruptive to them.  If the other maintainers
  agreed though, I would certainly like to have the master branch essentially
  frozen for the week before the release.
 
 
 freezing is only necessary if you need time to gather all the patches, build
 and test them together etc. 

 Not exactly, freezing is done to stop/slowdown writing new code and focus
on bug fixing for a period of time.

 This is not only needed for a release, but projects should always try
to find the best balance between 'number of bugs' and 'feature addition rate'.

 If you don't feel you or the developers need to
 do that to get a reliable release out I think it only halts developers
 without any clear reason to do so. Calling 'attention' to a release is not a
 clear reason IMO.

 Having a functional and relatively stable release is not only
important, but it's the ultimate goal IMO.

 Obviously we should take care not to take extremes. No QEMU release
will be 100% bug free, that's why we have stables.


Re: [PATCH] Fix last 2 K&R prototypes

2009-09-30 Thread Marcelo Tosatti
On Wed, Sep 30, 2009 at 01:07:27AM +0200, Juan Quintela wrote:
 The rest of the cases are already fixed in upstream qemu
 
 Signed-off-by: Juan Quintela quint...@redhat.com
 ---
  hw/device-assignment.c |2 +-
  qemu-kvm.c |2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)
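For reference, the change is of this form (function name invented for
illustration):

-static void assigned_dev_foo();		/* K&R style: parameters unspecified */
+static void assigned_dev_foo(void);	/* ANSI prototype */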

Applied, thanks.



[PATCH] KVM: Fix task switch back link handling (take 2)

2009-09-30 Thread Juan Quintela
Now, also remove the prev_task_link setting in save_state_to_tss16.

  commit b237ac37a149e8b56436fabf093532483bff13b0
  Author: Gleb Natapov g...@redhat.com
  Date:   Mon Mar 30 16:03:24 2009 +0300

KVM: Fix task switch back link handling.

CC: Gleb Natapov g...@redhat.com
Signed-off-by: Juan Quintela quint...@redhat.com
---
 arch/x86/kvm/x86.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fedac9d..e5ed2cd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4196,7 +4196,6 @@ static void save_state_to_tss16(struct kvm_vcpu *vcpu,
	tss->ss = get_segment_selector(vcpu, VCPU_SREG_SS);
	tss->ds = get_segment_selector(vcpu, VCPU_SREG_DS);
	tss->ldt = get_segment_selector(vcpu, VCPU_SREG_LDTR);
-	tss->prev_task_link = get_segment_selector(vcpu, VCPU_SREG_TR);
 }

 static int load_state_from_tss16(struct kvm_vcpu *vcpu,
-- 
1.6.2.5



Re: [PATCH v4: kvm 1/4] Code motion. Separate timer initialization into an independent function.

2009-09-30 Thread Zachary Amsden

On 09/29/2009 10:45 PM, Avi Kivity wrote:

On 09/29/2009 11:38 PM, Zachary Amsden wrote:

Signed-off-by: Zachary Amsdenzams...@redhat.com



Looks good.

Is anything preventing us from unifying the constant_tsc and !same 
paths?  We could just do a quick check in the notifier, see the tsc 
frequency hasn't changed, and return.


Actually, yes.  On constant_tsc processors, the processor frequency may 
still change; however, the TSC frequency does not change with it.
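The quick check Avi suggested would look roughly like this in the notifier --
a sketch only, and the adjust-helper name is made up:

static int kvmclock_cpufreq_notifier(struct notifier_block *nb,
				     unsigned long val, void *data)
{
	/*
	 * On constant_tsc parts the core clock may have changed, but the
	 * TSC did not follow it, so there is nothing to recompute.
	 */
	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
		return 0;

	return kvmclock_adjust_for_new_freq(data);	/* name assumed */
}

...but the catch is exactly that the notifier reports processor frequency,
not TSC frequency, as discussed below.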


I actually have both of these kinds of processors (freq changes with 
constant TSC and freq changes with variable TSC) so I was able to test 
both of these cases.


Zach


Re: [PATCH v4: kvm 1/4] Code motion. Separate timer initialization into an independent function.

2009-09-30 Thread Avi Kivity

On 09/30/2009 05:51 PM, Zachary Amsden wrote:
Is anything preventing us from unifying the constant_tsc and !same 
paths?  We could just do a quick check in the notifier, see the tsc 
frequency hasn't changed, and return.



Actually, yes.  On constant_tsc processors, the processor frequency 
may still change; however, the TSC frequency does not change with it.


I actually have both of these kinds of processors (freq changes with 
constant TSC and freq changes with variable TSC) so I was able to test 
both of these cases.


If the API allows us to query the tsc frequency, it would simply return 
the same values in all cases, which we'd ignore.


--
error compiling committee.c: too many arguments to function



Re: kvm tuning guide

2009-09-30 Thread Christoph Hellwig
On Wed, Sep 30, 2009 at 08:20:35AM +0200, Avi Kivity wrote:
 On 09/30/2009 07:09 AM, Nikola Ciprich wrote:
 The default, IDE, is highly supported by guests but may be slow, especially 
 with disk arrays. If your guest supports it, use the virtio interface:
 Avi,
 what is the status of data integrity issues Chris Hellwig summarized some 
 time ago?


 I don't know.  Christoph?

On the qemu side everything is in git HEAD now, but I'm not sure about
the qemu-0.11 release as I haven't really followed it.

For the guest kernel the virtio cache flush support is now in mainline
(past-2.6.31).  For the host kernel side about 2/3 of the fixes are now
in mainline (past-2.6.31) with the others hopefully getting in this
merge window.


 Is it safe to recommend virtio to newbies already?

 I think so.

I wouldn't.  At least not for people caring about their data.  It will
take a while to promote the guest side fixes to all the interesting
guests.  IDE has the major advantage that cache flush support has been
around in the guest driver for a long time so we only need to fix the
host side which is a lot easier.
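Until the fixes have propagated, the conservative setup keeps the host page
cache out of the write path; one way to do that (option names as in current
qemu; the effect for any given guest is still worth verifying):

	qemu -drive file=disk.img,if=virtio,cache=writethrough

cache=none (O_DIRECT) is the other integrity-friendly choice; both avoid
depending on guest-side cache flush support that may not be there yet.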



Re: [PATCH v4: kvm 1/4] Code motion. Separate timer initialization into an independent function.

2009-09-30 Thread Zachary Amsden

On 09/30/2009 05:56 AM, Avi Kivity wrote:

On 09/30/2009 05:51 PM, Zachary Amsden wrote:

If the API allows us to query the tsc frequency, it would simply 
return the same values in all cases, which we'd ignore.


The API only allows querying the processor frequency.  In the 
constant_tsc case, the highest processor frequency is likely going to be 
the actual TSC frequency, but I don't think it's a guarantee; 
theoretically, it could be faster on normal hardware ... or slower on 
overclocked hardware with an externally clocked TSC.


Zach


Re: [PATCH v4: kvm 1/4] Code motion. Separate timer initialization into an independent function.

2009-09-30 Thread Zachary Amsden

On 09/30/2009 06:11 AM, Avi Kivity wrote:

On 09/30/2009 06:06 PM, Zachary Amsden wrote:

On 09/30/2009 05:56 AM, Avi Kivity wrote:

On 09/30/2009 05:51 PM, Zachary Amsden wrote:

If the API allows us to query the tsc frequency, it would simply 
return the same values in all cases, which we'd ignore.


The API only allows querying the processor frequency.  In the 
constant_tsc case, the highest processor frequency is likely going to 
be the actual TSC frequency, but I don't think it's a guarantee; 
theoretically, it could be faster on normal hardware ... or slower on 
overclocked hardware with an externally clocked TSC.


Well we could add a new API then (or a new tscfreq notifier).  Those 
conditionals don't belong in client code.


It's possible... but it's also possible to run without cpufreq enabled, 
which won't work properly unless the cpufreq code is aware of the 
measured tsc_khz...  this could be a little ugly architecture wise given 
the big melting pot of generic code and vendor / arch specific code here.


Since we're already very hardware dependent and one of the few clients 
who care, it seems okay to leave it as is for now.


Zach


Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting

2009-09-30 Thread Marcelo Tosatti
On Wed, Sep 30, 2009 at 09:01:51AM +0800, Zhai, Edwin wrote:
 Avi,
 I modified it according to your comments. The only thing I want to keep is  
 the module param ple_gap/window.  Although they are not per-guest, they  
 can be used to find the right value, and to disable PLE for debug purposes.
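Since they are declared with module_param(..., S_IRUGO), the current values
can at least be inspected at runtime, e.g. (module name assumed):

	cat /sys/module/kvm_intel/parameters/ple_gap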

 Thanks,


 Avi Kivity wrote:
 On 09/28/2009 11:33 AM, Zhai, Edwin wrote:
   
 Avi Kivity wrote:
 
  +#define KVM_VMX_DEFAULT_PLE_GAP    41
 +#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
 +static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
 +module_param(ple_gap, int, S_IRUGO);
 +
 +static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
 +module_param(ple_window, int, S_IRUGO);

 Shouldn't be __read_mostly since they're read very rarely  
 (__read_mostly should be for variables that are very often read, 
 and rarely written).
   
  In general, they are read-only, except that experienced users may try  
  different parameters for perf tuning.
 


 __read_mostly doesn't just mean it's read mostly.  It also means it's  
 read often.  Otherwise it's just wasting space in hot cachelines.

   
 I'm not even sure they should be parameters.
   
  For different spinlocks in different OSes, and for different workloads,  
  we need different parameters for tuning. It's similar to the 
  enable_ept parameter.
 

 No, global parameters don't work for tuning workloads and guests since  
 they cannot be modified on a per-guest basis.  enable_ept is only 
 useful for debugging and testing.

   
 +set_current_state(TASK_INTERRUPTIBLE);
  +schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
 +
 
 Please add a tracepoint for this (since it can cause significant  
 change in behaviour),   
 Isn't trace_kvm_exit(exit_reason, ...) enough? We can tell the PLE  
 vmexit from other vmexits.
 

 Right.  I thought of the software spinlock detector, but that's another 
 problem.

 I think you can drop the sleep_time parameter, it can be part of the  
 function.  Also kvm_vcpu_sleep() is confusing, we also sleep on halt.   
 Please call it kvm_vcpu_on_spin() or something (since that's what the  
 guest is doing).

kvm_vcpu_on_spin() should add the vcpu to vcpu->wq (so a new pending
interrupt wakes it up immediately).
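A sketch of that shape -- the wait queue is the existing vcpu->wq used by
halt; the timeout value here is arbitrary:

void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu)
{
	ktime_t expires = ktime_add_ns(ktime_get(), 100000);	/* ~100us */
	DEFINE_WAIT(wait);

	/* queue on vcpu->wq first so a pending interrupt wakes us at once */
	prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
	schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
	finish_wait(&vcpu->wq, &wait);
}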

Do you (and/or Mark) have any numbers for non-vcpu-overcommitted guests?




Re: migrate_set_downtime bug

2009-09-30 Thread Glauber Costa
On Wed, Sep 30, 2009 at 04:11:32PM +0200, Dietmar Maurer wrote:
  On Wed, Sep 30, 2009 at 10:55:24AM +0200, Dietmar Maurer wrote:
   Another problem occurs when max_downtime is too short. This can
  result in a never-ending migration task.
  
   To reproduce just play a video inside a VM and set max_downtime to
  30ns
  
   Sure, one can argument that this behavior is expected.
  
   But the following would avoid the problem:
  
   +if ((stage == 2) && (bytes_transferred > 2*ram_bytes_total())) {
   +return 1;
   +}
  why 2 * ?
  This means we'll have to transfer the whole contents of RAM at least
  twice to hit this condition, right?
 
 Yes, this is just an arbitrary limit. 
I don't know. If we are going for a limit, I would prefer a limit of pages yet
to transfer, not pages already transferred.
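As a rough sketch of that alternative (ram_save_remaining() already exists in
the migration code; the threshold name is invented):

	if (stage == 2 &&
	    ram_save_remaining() * TARGET_PAGE_SIZE < max_final_chunk)
		return 1;	/* small enough; let stage 3 finish it */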

However, the very reason this whole thing was written in the first place was
to leave choices to management tools on top of qemu, not qemu itself. So I
would say yes: if you set the limit to 30ns, you asked for it never finishing.

Your first patch is okay, though.


Re: Release plan for 0.12.0

2009-09-30 Thread Juan Quintela
Anthony Liguori aligu...@us.ibm.com wrote:
 Luiz Capitulino wrote:
 On Tue, 29 Sep 2009 18:54:53 -0500
 Anthony Liguori aligu...@us.ibm.com wrote:

   
 I think aiming for early to mid-December would give us roughly a 3
 month cycle and would align well with some of the Linux
 distribution cycles.  I'd like to limit things to a single -rc that
 lasted only for about a week.  This is enough time to fix most of
 the obvious issues I think.
 

  How do you plan to do it? I mean, are you going to create a separate branch
 or make master the -rc?

  Creating a separate branch (which is what we do today, iiuc) makes it
 get less attention, freezing master for a certain period is the best
 way to stabilize.

  Is this what you had in mind?
   
 What do people think?

 One reason I branch is because some people care a bit less about
 releases so it makes the process non-disruptive to them.  If the other
 maintainers agreed though, I would certainly like to have the master
 branch essentially frozen for the week before the release.

I am not a maintainer, but I still think that it is a good idea :)

Later, Juan.


RE: migrate_set_downtime bug

2009-09-30 Thread Dietmar Maurer
+if ((stage == 2) && (bytes_transferred > 2*ram_bytes_total())) {
+return 1;
+}
   why 2 * ?
   This means we'll have to transfer the whole contents of RAM at
 least
   twice to hit this condition, right?
 
  Yes, this is just an arbitrary limit.
 I don't know. If we are going for a limit, I would prefere a limit of
 pages yet to transfer,
 not pages already transferred.
 
 However, the very reason this whole thing was written in the first
 place, was to leave choices
 to management tools ontop of qemu, not qemu itself. So I would say yes,
 if you set limit for 30ns,
 you asked for it never finishing.

I'm just thinking of common scenarios like 'maintenance mode', where all VMs
should migrate to another host. An endless migration task can make that fail.

For me, it is totally unclear what value I should set for 'max_downtime' to
avoid that behavior.

- Dietmar




Re: [PATCH 1/1] qemu-kvm: virtio-net: Re-instate GSO code removed upstream

2009-09-30 Thread Gerd Hoffmann

On 09/30/09 15:59, Mark McLoughlin wrote:

I'm planning on adding -hostnet and -nic arguments, which would not use
vlans by default but rather connect the nic directly to the host side.


No new -nic argument please.  We should just finalize the qdev-ifycation 
of the nic drivers, then you'll do either


  -device e1000,vlan=nr

or

  -device e1000,hostnet=name

and be done with it.

cheers,
  Gerd



Re: [Qemu-devel] Release plan for 0.12.0

2009-09-30 Thread Blue Swirl
On Wed, Sep 30, 2009 at 6:59 PM, Carl-Daniel Hailfinger
c-d.hailfinger.devel.2...@gmx.net wrote:
 On 30.09.2009 15:07, Anthony Liguori wrote:
 Carl-Daniel Hailfinger wrote:
 However, to run coreboot on Qemu with the same init sequence as on
 simplified real hardware, we need Cache-as-RAM (CAR) support. [...]

 Do we really need coreboot to use the same init sequence?   coreboot
 is firmware and we don't necessarily run real firmware under QEMU.
 It's a short cut that lets us avoid a lot of complexity.

 I know that some people were running 440BX BIOS images for real hardware
 on Qemu and they got pretty far.

 The complexity would be limited to the MTRR code and unless there were
 major architectural changes in mapping RAM to address ranges, no other
 code (except VM save and VM restore) should get even a single line changed.

 Right now coreboot sets up the MTRRs correctly, but then (conditional on
 Qemu) only uses areas which are known to be backed by RAM instead of the
 areas designated by CAR.

 I'd like to implement CAR support which builds on top of my MTRR code
 which was merged some months ago (and I already have code to check for
 total cacheable area size), but I need help with the memory mapping
 stuff. How do I proceed? Clean up what I have and insert FIXME
 comments where I don't know how to implement stuff so others can see the
 code and comment on it?

 You could start there.  But from a higher level, I'm not sure I think
 a partial implementation of something like CAR is all that valuable
 since coreboot already runs under QEMU.

 It only runs if WORKAROUND_QEMU is defined (maybe not exactly that name,
 but you get the point). The code in coreboot calculates MTRR settings to
 cover the place where the stack will be. To workaround missing CAR in
 Qemu, it then has to recalculate the stack location to be able to
 actually use the stack. That forces coreboot to keep two stack base
 variables and to completely replace the generic logic which switches off
 CAR.

 I hope the explanation above didn't offend you, I just tried to clarify
 why working CAR is such a big deal for coreboot.

 If you want either a full CAR implementation or no CAR implementation, I
 can write a patch which implements full CAR, but then I need to hook
 WBINVD, INVD and CLFLUSH. Neither instruction is executed often enough
 to show up in any profile. Besides that, for anything not using CAR
 (everything after the firmware), the penalty is a simple test of a
 boolean variable per WBINVD/INVD/CLFLUSH.

The CAR mode could affect only translation so that special CAR
versions of the WBINVD etc. instructions are selected. On switch to
normal mode, the TBs need to be flushed.

Instead of your memory mapping approach (which should work) you could
also try using different memory access functions in CAR mode. It may
be more difficult, though.
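A sketch of the boolean-gated variant, assuming the flush instructions are
routed through TCG helpers (the flag and the writeback routine are invented
names):

void helper_wbinvd(void)
{
    /* a no-op for TCG today; in CAR mode, write the cache-as-RAM
       window back to memory and drop it */
    if (unlikely(env->car_active))
        car_writeback_and_disable(env);
}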


Re: [Qemu-devel] Release plan for 0.12.0

2009-09-30 Thread Gerd Hoffmann

On 09/30/09 16:45, Anthony Liguori wrote:

One reason I branch is because some people care a bit less about
releases so it makes the process non-disruptive to them. If the other
maintainers agreed though, I would certainly like to have the master
branch essentially frozen for the week before the release.


We had much longer disruptions without a release freeze, so why worry 
about a single week?  One week freeze is short enougth that the 
disruption isn't a big issue.  It will help testing the to-be-released 
code.  Go for it.


cheers,
  Gerd


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-30 Thread Gregory Haskins
Avi Kivity wrote:
 On 09/26/2009 12:32 AM, Gregory Haskins wrote:

 I realize in retrospect that my choice of words above implies vbus _is_
 complete, but this is not what I was saying.  What I was trying to
 convey is that vbus is _more_ complete.  Yes, in either case some kind
 of glue needs to be written.  The difference is that vbus implements
 more of the glue generally, and leaves less required to be customized
 for each iteration.



 No argument there.  Since you care about non-virt scenarios and virtio
 doesn't, naturally vbus is a better fit for them as the code stands.
  
 Thanks for finally starting to acknowledge there's a benefit, at least.

 
 I think I've mentioned vbus' finer grained layers as helpful here,
 though I doubt the value of this.  Hypervisors are added rarely, while
 devices and drivers are added (and modified) much more often.  I don't
 buy the anything-to-anything promise.

The ease with which a new hypervisor can integrate into the
stack is only one of vbus's many benefits.

 
 To be more precise, IMO virtio is designed to be a performance oriented
 ring-based driver interface that supports all types of hypervisors (e.g.
 shmem based kvm, and non-shmem based Xen).  vbus is designed to be a
 high-performance generic shared-memory interconnect (for rings or
 otherwise) framework for environments where linux is the underpinning
 host (physical or virtual).  They are distinctly different, but
 complementary (the former addresses the part of the front-end, and
 latter addresses the back-end, and a different part of the front-end).

 
 They're not truly complementary since they're incompatible.

No, that is incorrect.  Not to be rude, but for clarity:

  Complementary \Com`ple*menta*ry\, a.
 Serving to fill out or to complete; as, complementary
 numbers.
 [1913 Webster]

Citation: www.dict.org

IOW: Something being complementary has nothing to do with guest/host
binary compatibility.  virtio-pci and virtio-vbus are both equally
complementary to virtio since they fill in the bottom layer of the
virtio stack.

So yes, vbus is truly complementary to virtio afaict.

 A 2.6.27 guest, or Windows guest with the existing virtio drivers, won't work
 over vbus.

Binary compatibility with existing virtio drivers, while nice to have,
is not a specific requirement nor goal.  We will simply load an updated
KMP/MSI into those guests and they will work again.  As previously
discussed, this is how more or less any system works today.  It's like
we are removing an old adapter card and adding a new one to uprev the
silicon.

  Further, non-shmem virtio can't work over vbus.

Actually I misspoke earlier when I said virtio works over non-shmem.
Thinking about it some more, both virtio and vbus fundamentally require
shared-memory, since sharing their metadata concurrently on both sides
is their raison d'être.

The difference is that virtio utilizes a pre-translation/mapping (via
->add_buf) from the guest side.  OTOH, vbus uses a post-translation
scheme (via memctx) from the host-side.  If anything, vbus is actually
more flexible because it doesn't assume the entire guest address space
is directly mappable.

In summary, your statement is incorrect (though it is my fault for
putting that idea in your head).

  Since
 virtio is guest-oriented and host-agnostic, it can't ignore
 non-shared-memory hosts (even though it's unlikely virtio will be
 adopted there)

Well, to be fair no one said it has to ignore them.  Either virtio-vbus
transport is present and available to the virtio stack, or it isn't.  If
its present, it may or may not publish objects for consumption.
Providing a virtio-vbus transport in no way limits or degrades the
existing capabilities of the virtio stack.  It only enhances them.

I digress.  The whole point is moot since I realized that the non-shmem
distinction isn't accurate anyway.  They both require shared-memory for
the metadata, and IIUC virtio requires the entire address space to be
mappable whereas vbus only assumes the metadata is.

 
 In addition, the kvm-connector used in AlacrityVM's design strives to
 add value and improve performance via other mechanisms, such as dynamic
   allocation, interrupt coalescing (thus reducing exit-ratio, which is a
 serious issue in KVM)
 
 Do you have measurements of inter-interrupt coalescing rates (excluding
 intra-interrupt coalescing).

I actually do not have a rig setup to explicitly test inter-interrupt
rates at the moment.  Once things stabilize for me, I will try to
re-gather some numbers here.  Last time I looked, however, there were
some decent savings for inter as well.

Inter rates are interesting because they are what tends to ramp up with
IO load more than intra since guest interrupt mitigation techniques like
NAPI often quell intra-rates naturally.  This is especially true for
data-center, cloud, hpc-grid, etc, kind of workloads (vs vanilla
desktops, etc) that tend to have multiple IO 

INFO: task journal:337 blocked for more than 120 seconds

2009-09-30 Thread Shirley Ma
Hello all,

Has anybody seen this problem before? I keep hitting this issue with a 2.6.31
guest kernel, even with a simple network test.

INFO: task kjournald:337 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

kjournald   D 0041  0   337 2 0x

My test is completely blocked.

Thanks
Shirley



Re: kvm or qemu-kvm?

2009-09-30 Thread Charles Duffy

Ross Boylan wrote:

http://www.linux-kvm.org/page/HOWTO1 says to build kvm I should get the
latest kvm-release.tar.gz.

http://www.linux-kvm.org/page/Downloads says If you want to use the
latest version of KVM kernel modules and supporting userspace, you can
download the latest version from
http://sourceforge.net/project/showfiles.php?group_id=180599.;
That page shows the latest version is qemu-kvm-0.11.0.tar.gz.

The most recent kvm-release.tar.gz appears to be for kvm-88.

So which file should I start from?


If you don't know what you want, you want qemu-kvm, which is based off a 
stable release of qemu.




buildbot failure in qemu-kvm on disable_kvm_x86_64_out_of_tree

2009-09-30 Thread qemu-kvm
The Buildbot has detected a new failure of disable_kvm_x86_64_out_of_tree on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_out_of_tree/builds/29

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered this 
build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



buildbot failure in qemu-kvm on disable_kvm_x86_64_debian_5_0

2009-09-30 Thread qemu-kvm
The Buildbot has detected a new failure of disable_kvm_x86_64_debian_5_0 on 
qemu-kvm.
Full details are available at:
 
http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_debian_5_0/builds/80

Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/

Buildslave for this Build: b1_qemu_kvm_1

Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered this 
build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



[PATCH] Fix warning in sync

2009-09-30 Thread Zachary Amsden

Patch is self-explanatory
commit 071a800cd07c2b9d13c7909aa99016d89a814ae6
Author: Zachary Amsden zams...@redhat.com
Date:   Wed Sep 30 17:03:16 2009 -1000

Remove warning due to kvm_mmu_notifier_change_pte being static

Signed-off-by: Zachary Amsden zams...@redhat.com

diff --git a/sync b/sync
index b09f629..0bbd488 100755
--- a/sync
+++ b/sync
@@ -97,6 +97,9 @@ def __hack(data):
     line = '#include <asm/types.h>'
 if match(r'\t\.change_pte.*kvm_mmu_notifier_change_pte,'):
 line = '#ifdef MMU_NOTIFIER_HAS_CHANGE_PTE\n' + line + '\n#endif'
+if match(r'static void kvm_mmu_notifier_change_pte'):
+line = sub(r'static ', '', line)
+line = '#ifdef MMU_NOTIFIER_HAS_CHANGE_PTE\n' + 'static\n' + '#endif\n' + line
 line = sub(r'\bhrtimer_init\b', 'hrtimer_init_p', line)
 line = sub(r'\bhrtimer_start\b', 'hrtimer_start_p', line)
 line = sub(r'\bhrtimer_cancel\b', 'hrtimer_cancel_p', line)
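For the new match, the emitted text comes out as (reconstructed from the
substitutions above; argument list elided):

#ifdef MMU_NOTIFIER_HAS_CHANGE_PTE
static
#endif
void kvm_mmu_notifier_change_pte(...)

i.e. on kernels without MMU_NOTIFIER_HAS_CHANGE_PTE the function loses its
static linkage, so it is no longer flagged as defined-but-unused.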


Re: buildbot failure in qemu-kvm on disable_kvm_x86_64_debian_5_0

2009-09-30 Thread Daniel Gollub
On Thursday 01 October 2009 04:05:40 am qemu-...@buildbot.b1-systems.de wrote:
 The Buildbot has detected a new failure of disable_kvm_x86_64_debian_5_0 on
  qemu-kvm. Full details are available at:
  http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_debian_
 5_0/builds/80
 
 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/
 
 Buildslave for this Build: b1_qemu_kvm_1
 
 Build Reason: The Nightly scheduler named 'nightly_disable_kvm' triggered
  this build Build Source Stamp: [branch master] HEAD
 Blamelist:
 
 BUILD FAILED: failed compile


Please ignore buildbot failure disable_kvm_x86_64_debian_5_0 (#80) and  
disable_kvm_x86_64_out_of_tree (#29)

Two nightly builds got scheduled at the same time (disable_kvm and out-of-
tree_disable_kvm) for the same buildslave, which caused memory pressure in 
the tiny buildslave VM. Will change that: out-of-tree should get (nightly) 
build-tested an hour later or so, to avoid two builds at the same time.

Best Regards,
Daniel


-- 
Daniel GollubGeschaeftsfuehrer: Ralph Dehner
FOSS Developer   Unternehmenssitz:  Vohburg
B1 Systems GmbH  Amtsgericht:   Ingolstadt
Mobil: +49-(0)-160 47 73 970 Handelsregister:   HRB 3537
EMail: gol...@b1-systems.de  http://www.b1-systems.de

Adresse: B1 Systems GmbH, Osterfeldstraße 7, 85088 Vohburg
http://pgpkeys.pca.dfn.de/pks/lookup?op=getsearch=0xED14B95C2F8CA78D




Re: linux-next: tree build failure

2009-09-30 Thread Jan Beulich
 roel kluin roel.kl...@gmail.com 29.09.09 11:51 
On Tue, Sep 29, 2009 at 11:28 AM, Jan Beulich jbeul...@novell.com wrote:
 Hollis Blanchard  09/29/09 2:00 AM 
First, I think there is a real bug here, and the code should read like
this (to match the comment):
/* type has to be known at build time for optimization */
-BUILD_BUG_ON(__builtin_constant_p(type));
+BUILD_BUG_ON(!__builtin_constant_p(type));

However, I get the same build error *both* ways, i.e.
__builtin_constant_p(type) evaluates to both 0 and 1? Either that, or
the new BUILD_BUG_ON() macro isn't working...

 No, at this point of the compilation process it's neither zero nor one,
 it's simply considered non-constant by the compiler at that stage
 (this builtin is used for optimization, not during parsing, and the
 error gets generated when the body of the function gets parsed,
 not when code gets generated from it).

 Jan

then maybe

if(__builtin_constant_p(type))
BUILD_BUG_ON(1);

would work?

Definitely not - this would result in the compiler *always* generating an
error.

Jan



Re: linux-next: tree build failure

2009-09-30 Thread Jan Beulich
 Hollis Blanchard holl...@us.ibm.com 30.09.09 01:39 
On Tue, 2009-09-29 at 10:28 +0100, Jan Beulich wrote:
  Hollis Blanchard  09/29/09 2:00 AM 
 First, I think there is a real bug here, and the code should read like
 this (to match the comment):
 /* type has to be known at build time for optimization */
 -BUILD_BUG_ON(__builtin_constant_p(type));
 +BUILD_BUG_ON(!__builtin_constant_p(type));
 
 However, I get the same build error *both* ways, i.e.
 __builtin_constant_p(type) evaluates to both 0 and 1? Either that, or
 the new BUILD_BUG_ON() macro isn't working...
 
 No, at this point of the compilation process it's neither zero nor one,
 it's simply considered non-constant by the compiler at that stage
 (this builtin is used for optimization, not during parsing, and the
 error gets generated when the body of the function gets parsed,
 not when code gets generated from it).

I think I see what you're saying. Do you have a fix to suggest?

The one Rusty suggested the other day may help here. I don't like it
as a drop-in replacement for BUILD_BUG_ON() though (due to it
deferring the error generated to the linking stage), I'd rather view
this as an improvement to MAYBE_BUILD_BUG_ON() (which should
then be used here).
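For reference, the link-time variant being referred to has roughly this shape
(assumed form, not the actual posting):

extern void __build_bug_on_failed(void);	/* deliberately never defined */
#define MAYBE_BUILD_BUG_ON(cond)		\
	do {					\
		if (cond)			\
			__build_bug_on_failed();\
	} while (0)

If the condition folds to false during optimization the call disappears;
otherwise the undefined reference fails at link time rather than at parse
time.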

Jan



Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4

2009-09-30 Thread Avi Kivity

On 09/29/2009 10:17 AM, Alexander Graf wrote:

KVM for PowerPC only supports embedded cores at the moment.

While it makes sense to virtualize on small machines, it's even more fun
to do so on big boxes. So I figured we need KVM for PowerPC64 as well.

This patchset implements KVM support for Book3s_64 hosts and guest support
for Book3s_64 and G3/G4.

To really make use of this, you also need a recent version of qemu.
   


Looks good to my non-ppc eyes.  I'd like to see this reviewed by the 
powerpc people, then it's good to go.




TODO:

  - use MMU Notifiers
   



What's the plan here?  While not a requirement for merging, that's one 
of the kvm points of strength and I'd like to see it supported across 
the board.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4

2009-09-30 Thread Alexander Graf


On 30.09.2009, at 10:42, Avi Kivity wrote:


On 09/29/2009 10:17 AM, Alexander Graf wrote:

KVM for PowerPC only supports embedded cores at the moment.

While it makes sense to virtualize on small machines, it's even  
more fun
to do so on big boxes. So I figured we need KVM for PowerPC64 as  
well.


This patchset implements KVM support for Book3s_64 hosts and guest  
support

for Book3s_64 and G3/G4.

To really make use of this, you also need a recent version of qemu.



Looks good to my non-ppc eyes.  I'd like to see this reviewed by the  
powerpc people, then it's good to go.




TODO:

 - use MMU Notifiers




What's the plan here?  While not a requirement for merging, that's  
one of the kvm points of strength and I'd like to see it supported  
across the board.


I'm having a deja vu :-).

The plan is to get qemu ppc64 guest support in a shape where it can  
actually use the KVM support. As it is it's rather useless.
When we have that, a PV interface would be needed to get things fast  
and then the next thing on my list is the MMU notifiers.


Alex



Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4

2009-09-30 Thread Avi Kivity

On 09/30/2009 10:47 AM, Alexander Graf wrote:


What's the plan here?  While not a requirement for merging, that's 
one of the kvm points of strength and I'd like to see it supported 
across the board.



I'm having a deja vu :-).


Will probably get one on every repost.



The plan is to get qemu ppc64 guest support in a shape where it can 
actually use the KVM support. As it is it's rather useless.
When we have that, a PV interface would be needed to get things fast 
and then the next thing on my list is the MMU notifiers.


Um.  How slow is it today?  What paths are problematic? mmu, context switch?

Our experience with pv on x86 has been mostly negative.  It's not 
trivial to get security right, it ended up slower than non-pv, and 
hardware obsoleted it fairly quickly.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v4

2009-09-30 Thread Alexander Graf


On 30.09.2009, at 10:59, Avi Kivity wrote:


On 09/30/2009 10:47 AM, Alexander Graf wrote:


What's the plan here?  While not a requirement for merging, that's  
one of the kvm points of strength and I'd like to see it supported  
across the board.



I'm having a deja vu :-).


Will probably get one on every repost.


Yippie :)

The plan is to get qemu ppc64 guest support in a shape where it can  
actually use the KVM support. As it is it's rather useless.
When we have that, a PV interface would be needed to get things  
fast and then the next thing on my list is the MMU notifiers.


Um.  How slow is it today?  What paths are problematic? MMU, context
switch?


Instruction emulation.

x86 with virtualization extensions doesn't trap often, as most of the
state can be handled safely within guest mode.
With PPC we're basically running in ring 3 (called "problem state" in
PPC speak), which traps all the time because guests change the
interrupt flag or access SPRs that we don't really need to trap on,
but only need to sync state with on #VMEXIT.


So the PV idea here is to have a shared page between host and guest
that contains guest-specific SPRs and other state (an MSR shadow, for
example). That way the guest can patch itself to use that shared page,
and KVM always knows about the most current state on #VMEXIT. At the
same time we're reducing exits by a _lot_.
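
Concretely, the shared page could look something like the sketch below
(illustrative only: the struct name, field set and layout are
assumptions, not a settled ABI):

struct kvmppc_shared_page {
	__u64 msr;		/* MSR shadow; guest flips EE here instead of trapping */
	__u64 sprg0;		/* SPRs that only need syncing on #VMEXIT */
	__u64 sprg1;
	__u64 sprg2;
	__u64 sprg3;
	__u64 srr0;		/* save/restore registers for interrupt delivery */
	__u64 srr1;
	__u64 dar;		/* fault address */
	__u32 int_pending;	/* host-side "interrupt pending" flag */
};

A patched guest would then read and write these fields with plain loads
and stores, and KVM would only fold them back into the real registers
on the next exit.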


A short kvm_stat during boot of a ppc32 guest on ppc64 shows what I'm  
talking about:


 dec                    3224       168
 exits              18957500   1037240
 ext_intr                 75         5
 halt_wakeup            6874         0
 inst_emu            8570503    818597
 ld                        0         0
 ld_slow                   0         0
 mmio                8719444     26249
 pf_instruc           302572     35379
 pf_storage          9215970     86750
 queue_intr           354020     31482
 sig                    7244       188
 sp_instruc           302541     35365
 sp_storage           370002     45370
 st                        0         0
 st_slow                   0         0
 sysc                 579075       342


As you can see, the bulk of the exits are from MMIO and emulation.

We certainly won't be able to get rid of all the emulation exits, but  
quite a bunch of them aren't really that useful.


For MMIO we'll hopefully be able to use virtio.

Our experience with pv on x86 has been mostly negative.  It's not  
trivial to get security right, it ended up slower than non-pv, and  
hardware obsoleted it fairly quickly.


Yes, and I really don't want to overdo it. PV for mfmsr/mtmsr and
mfspr/mtspr is really necessary; x86 simply has that in hardware.
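
As a rough illustration of the patching (a sketch only: the reserved
register, offset and helper name are made up, and a 32-bit guest is
assumed, hence lwz):

#define MSR_SHARED_OFF	0	/* assumed offset of the MSR shadow in the shared page */

static void kvm_patch_mfmsr(u32 *insn)
{
	u32 rd = (*insn >> 21) & 0x1f;	/* destination GPR of the trapping mfmsr */

	/* lwz rD, MSR_SHARED_OFF(r30): primary opcode 32; r30 is assumed
	 * to be reserved by the guest to hold the shared page address */
	*insn = (32u << 26) | (rd << 21) | (30u << 16) | MSR_SHARED_OFF;
	flush_icache_range((unsigned long)insn,
			   (unsigned long)insn + sizeof(*insn));
}

Reading the MSR then costs a single load instead of a trap-and-emulate
round trip; writes need more care, since e.g. re-enabling EE with an
interrupt pending still has to reach the host.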


Alex


Re: [PATCH 26/27] Enable 32bit dirty log pointers on 64bit host

2009-09-30 Thread Avi Kivity

On 09/30/2009 02:04 PM, Arnd Bergmann wrote:

On Tuesday 29 September 2009, Avi Kivity wrote:
   

 	r = -EINVAL;
 	if (log->slot >= KVM_MEMORY_SLOTS)
@@ -718,8 +719,15 @@ int kvm_get_dirty_log(struct kvm *kvm,
 	for (i = 0; !any && i < n/sizeof(long); ++i)
 		any = memslot->dirty_bitmap[i];
 
+#if defined(__BIG_ENDIAN) && defined(CONFIG_64BIT)
+	/* Need to convert user pointers */
+	if (test_thread_flag(TIF_32BIT))
+		target_bm = (void *)((u64)log->dirty_bitmap >> 32);
+	else
+#endif
+		target_bm = log->dirty_bitmap;
 	r = -EFAULT;
-	if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
+	if (copy_to_user(target_bm, memslot->dirty_bitmap, n))
 		goto out;
 
 	if (any)

Ah, that's much better.  Plus a mental note not to put pointers in
user-visible structures in the future.  This can serve as a reminder :)
 

It's still broken on s390, which

1. uses TIF_31BIT instead of TIF_32BIT
2. needs to call compat_ptr() to do a real conversion instead of a cast (see the sketch below)
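
For reference, on s390 compat_ptr() has to mask off the topmost bit,
because 31-bit user addresses carry it as an address-mode artifact; a
plain cast keeps the bit and yields a wrong pointer. Roughly (a sketch
of the s390 definition, not quoted verbatim):

static inline void __user *compat_ptr(compat_uptr_t uptr)
{
	return (void __user *)(unsigned long)(uptr & 0x7fffffffUL);
}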

The TIF_32BIT method is also not reliable. E.g. on x86_64 you are supposed
to get the 32-bit ABI when calling through INT80 instead of syscall/sysenter,
independent of the value of TIF_32BIT.

A better way to do this is to add a separate compat_ioctl() method that
converts this for you.

The patch below is an example of the canonical way to do this. Not tested!

Signed-off-by: Arnd Bergmann <a...@arndb.de>
---

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 897bff3..20f88ad 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2297,6 +2297,49 @@ out:
 	return r;
 }
 
+#ifdef CONFIG_COMPAT
+struct compat_kvm_dirty_log {
+	__u32 slot;
+	__u32 padding1;
+	union {
+		compat_uptr_t dirty_bitmap; /* one bit per page */
+		__u64 padding2;
+	};
+};
+
+static long kvm_vm_compat_ioctl(struct file *filp,
+			unsigned int ioctl, unsigned long arg)
+{
+	struct kvm *kvm = filp->private_data;
+	int r;
+
+	if (kvm->mm != current->mm)
+		return -EIO;
+	switch (ioctl) {
+	case KVM_GET_DIRTY_LOG: {
+		struct compat_kvm_dirty_log compat_log;
+		struct kvm_dirty_log log;
+
+		r = -EFAULT;
+		if (copy_from_user(&compat_log, (void __user *)arg, sizeof log))
+			goto out;
+		log.slot = compat_log.slot;
+		log.padding1 = compat_log.padding1;
+		log.padding2 = compat_log.padding2;
+		log.dirty_bitmap = compat_ptr(compat_log.dirty_bitmap);
+
+		r = kvm_vm_ioctl_get_dirty_log(kvm, &log.log);
+		if (r)
+			goto out;
+		break;
+	}
+	default:
+		r = kvm_vm_ioctl(filp, ioctl, arg);
+	}
+
+out:
+	return r;
+}
+#endif
+
 static int kvm_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct page *page[1];
@@ -2331,7 +2374,7 @@ static int kvm_vm_mmap(struct file *file, struct vm_area_struct *vma)
 static struct file_operations kvm_vm_fops = {
 	.release	= kvm_vm_release,
 	.unlocked_ioctl = kvm_vm_ioctl,
-	.compat_ioctl   = kvm_vm_ioctl,
+	.compat_ioctl   = kvm_vm_compat_ioctl,
 	.mmap		= kvm_vm_mmap,
 };
   


This is a bit painful - I tried to avoid compat_ioctl.  Maybe it's 
better to have dirty_bitmap_virt, given no existing users are impacted.
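
One possible shape of that idea (a sketch only, mirroring the existing
kvm_dirty_log layout; dirty_bitmap_virt is the new fixed-width field):

struct kvm_dirty_log {
	__u32 slot;
	__u32 padding1;
	union {
		void __user *dirty_bitmap;	/* legacy, native pointer width */
		__u64 dirty_bitmap_virt;	/* fixed 64-bit user address */
		__u64 padding2;
	};
};

With a __u64 the structure layout stops depending on the userspace
pointer size, so no compat conversion of the struct itself is needed.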


--
error compiling committee.c: too many arguments to function



Re: [PATCH 26/27] Enable 32bit dirty log pointers on 64bit host

2009-09-30 Thread Avi Kivity

On 09/30/2009 03:17 PM, Avi Kivity wrote:

  static int kvm_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
  {
  	struct page *page[1];
@@ -2331,7 +2374,7 @@ static int kvm_vm_mmap(struct file *file, struct vm_area_struct *vma)
  static struct file_operations kvm_vm_fops = {
  	.release	= kvm_vm_release,
  	.unlocked_ioctl = kvm_vm_ioctl,
-	.compat_ioctl   = kvm_vm_ioctl,
+	.compat_ioctl   = kvm_vm_compat_ioctl,
  	.mmap		= kvm_vm_mmap,
  };


This is a bit painful - I tried to avoid compat_ioctl.  Maybe it's 
better to have dirty_bitmap_virt, given no existing users are impacted.




But that misses compat_ptr().  So it looks like we'll need compat_ioctl.

Patch looks fine, except s/log.log/log/.  I'd also use sizeof(compat_log)
instead of sizeof(log) to avoid frightening reviewers.
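
Spelled out against the patch above, those two fixes would be:

-		if (copy_from_user(&compat_log, (void __user *)arg, sizeof log))
+		if (copy_from_user(&compat_log, (void __user *)arg,
+				   sizeof(compat_log)))

-		r = kvm_vm_ioctl_get_dirty_log(kvm, &log.log);
+		r = kvm_vm_ioctl_get_dirty_log(kvm, &log);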


--
error compiling committee.c: too many arguments to function
