Re: I/O performance of VirtIO

2009-10-13 Thread Jan Kiszka
Michael Tokarev wrote:
 René Pfeiffer wrote:
 Hello!

 I just tested qemu-kvm-0.11.0 with the KVM module of kernel 2.6.31.1. I
 noticed that the I/O performance of an unattended stock Debian Lenny
 install dropped somehow. The test machines ran with kvm-88 and 2.6.30.x
 before. The difference is very noticeable (went from about 5 minutes up
 to 15-25 minutes). The two test machines have different CPUs (one is an
 Intel Core2 CPU, the other runs with an AMD Athlon 64 X2 Dual).

 Is this the effect of added code regarding caching/data integrity to the
 VirtIO block layer or somewhere else? The qemu-system-x86_64 seems to
 hang a lot more in heavy I/O (showing 'D' in top/htop).

 The command line is quite straight-forward:
 qemu-system-x86_64 -drive file=debian.qcow2,if=virtio,boot=on -cdrom \
 /srv/isos/debian-502-i386-netinst.iso -smp 2 -boot d -m 512 -net nic \
 -net user -usb
   ^
 
 Care to try with something more real than user-level networking?
 You're using netinstall which - apparently - tries to use some
 networking d/loading components etc, and userlevel networking is
 known to be very very slow

It can be particularly slow if you use in-kernel irqchips and the
default NIC emulation (up to 10 times slower), some effect I always
wanted to understand on a rainy day. So, when you actually want -net
user, try -no-kvm-irqchip.

Jan





Re: kernel bug in kvm_intel

2009-10-13 Thread Avi Kivity

On 10/12/2009 08:42 PM, Andrew Theurer wrote:

On Sun, 2009-10-11 at 07:19 +0200, Avi Kivity wrote:
   

On 10/09/2009 10:04 PM, Andrew Theurer wrote:
 

This is on latest master branch on kvm.git and qemu-kvm.git, running
12 Windows Server2008 VMs, and using oprofile.  I ran again without
oprofile and did not get the BUG.  I am wondering if anyone else is
seeing this.

Thanks,

-Andrew

   

Oct  9 11:55:13 virtvictory-eth0 kernel: BUG: unable to handle kernel
paging request at 9fe9a2b4
Oct  9 11:55:13 virtvictory-eth0 kernel: IP: [a02e1af1]
vmx_vcpu_run+0x26d/0x64f [kvm_intel]
 

Can you run this through objdump or gdb to see what source this
corresponds to?

 

Somewhere here I think (?)

objdump -d
   



Look at the address where vmx_vcpu_run starts, add 0x26d, and show the 
surrounding code.


Thinking about it, it probably _is_ what you showed, due to module page 
alignment.  But please verify this; I can't reconcile the fault address 
(9fe9a2b) with %rsp at the time of the fault.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Biweekly KVM Test report, kernel 94252... qemu 5cc3c...

2009-10-13 Thread Xu, Jiajun
On Monday, October 05, 2009 7:00 PM Avi Kivity wrote:

 On 09/29/2009 05:34 AM, Xu, Jiajun wrote:
 Hi All,
 
 This is the biweekly KVM testing report against the latest kvm.git
 94252a58662dc4ca6191eac479efb40e0716865c and qemu-kvm.git
 5cc3cfb6c2254483ae324da407a13307fe7355f3.
 
 The qemu-kvm tree build issue is fixed by qemu commit
 781774b38c90797add71d029b7fbee43200c66d4.
 No other new bugs were found in these two weeks. There are
 7 old bugs open in bug tracking.
 
 
 Seven Old Issues:
 
 1. Guest hang with exhausted IRQ sources error if 8 VFs assigned
 
 https://sourceforge.net/tracker/?func=detail&aid=2847560&group_id=180599&atid=893831
 
 
 Does the attached patch fix this issue?

With the attached patch, the VF cannot be enabled, with the following error:

igb :01:00.0: can't find IRQ for PCI INT A; probably buggy MP table
igb :01:00.0: setting latency timer to 64
igb :01:00.0: irq 88 for MSI/MSI-X
igb :01:00.0: irq 89 for MSI/MSI-X
igb :01:00.0: irq 90 for MSI/MSI-X
igb :01:00.0: irq 91 for MSI/MSI-X
igb :01:00.0: irq 92 for MSI/MSI-X
igb :01:00.0: irq 93 for MSI/MSI-X
igb :01:00.0: irq 94 for MSI/MSI-X
igb :01:00.0: irq 95 for MSI/MSI-X
igb :01:00.0: irq 96 for MSI/MSI-X
igb :01:00.0: Intel(R) Gigabit Ethernet Network Connection
igb :01:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:30:48:cb:79:e8
igb :01:00.0: eth0: PBA No: 0010ff-0ff
igb :01:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
igb :01:00.1: can't find IRQ for PCI INT B; probably buggy MP table
igb :01:00.1: setting latency timer to 64
igb :01:00.1: irq 97 for MSI/MSI-X
igb :01:00.1: irq 98 for MSI/MSI-X
igb :01:00.1: irq 99 for MSI/MSI-X
igb :01:00.1: irq 100 for MSI/MSI-X
igb :01:00.1: irq 101 for MSI/MSI-X
igb :01:00.1: irq 102 for MSI/MSI-X
igb :01:00.1: irq 103 for MSI/MSI-X
igb :01:00.1: irq 104 for MSI/MSI-X
igb :01:00.1: irq 105 for MSI/MSI-X
igb :01:00.1: Intel(R) Gigabit Ethernet Network Connection
igb :01:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:30:48:cb:79:e9
igb :01:00.1: eth1: PBA No: 0010ff-0ff
igb :01:00.1: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)


Best Regards
Jiajun


Re: QemuOpts changes breaks multiple nic options

2009-10-13 Thread Mark McLoughlin
Hi Tom,

On Mon, 2009-10-12 at 17:05 -0500, Tom Lendacky wrote:
 The recent change to QemuOpts for the -net nic option breaks specifying
 -net nic,... more than once.  The net_init_nic function's return value in
 net.c is a table index, which is non-zero after the first time it is called.
 The qemu_opts_foreach function in qemu-option.c receives the non-zero return
 value and stops processing further -net options (like associated -net tap
 options).
 
 It looks like the usb net function makes use of the index value, so the fix
 might best be to have qemu_opts_foreach check for a return code < 0 as being
 an error?

Thanks for the report; I sent a patch to qemu-devel yesterday:

  http://lists.gnu.org/archive/html/qemu-devel/2009-10/msg01070.html

Cheers,
Mark.
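
The failure mode Tom describes can be reproduced in a few lines. This is only a sketch under stated assumptions: `opts_foreach`, `init_net` and the option strings are hypothetical Python stand-ins, not the QEMU functions, and the `rc < 0` variant is the fix suggested in the report, not necessarily what the posted patch does.

```python
OPTS = ["nic,vlan=0", "tap,vlan=0", "nic,vlan=1", "tap,vlan=1"]


def opts_foreach(opts, func, is_error):
    """Call func on each option; stop at the first result is_error() flags."""
    for opt in opts:
        rc = func(opt)
        if is_error(rc):
            return rc
    return 0


def processed(is_error):
    """Return the options that were actually visited under a given error test."""
    seen = []
    nic_table = []

    def init_net(opt):
        seen.append(opt)
        if opt.startswith("nic"):
            nic_table.append(opt)
            # Table index: 0 for the first NIC, 1 for the second, and so on.
            return len(nic_table) - 1
        return 0

    opts_foreach(OPTS, init_net, is_error)
    return seen


print(processed(lambda rc: rc != 0))  # stops once the second NIC returns index 1
print(processed(lambda rc: rc < 0))   # visits every option
```

With the "non-zero means error" test the second tap option is never reached, which matches the symptom in the report.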



[PATCH] Little bug fix in pci_hotplug.py

2009-10-13 Thread Yolkfull Chow
If the command times out, the returned status can be None, which the
following check misses:

if s:
   ...

Thanks Jason Wang for pointing this out.

Signed-off-by: Yolkfull Chow yz...@redhat.com
---
 client/tests/kvm/tests/pci_hotplug.py |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/tests/pci_hotplug.py b/client/tests/kvm/tests/pci_hotplug.py
index 01d9447..3ad9ea2 100644
--- a/client/tests/kvm/tests/pci_hotplug.py
+++ b/client/tests/kvm/tests/pci_hotplug.py
@@ -83,7 +83,7 @@ def run_pci_hotplug(test, params, env):
 
     # Test the newly added device
     s, o = session.get_command_status_output(params.get("pci_test_cmd"))
-    if s:
+    if s != 0:
         raise error.TestFail("Check for %s device failed after PCI hotplug. "
                              "Output: %s" % (test_type, o))
 
-- 
1.6.2.5
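
The distinction this patch relies on is plain Python truthiness. A minimal sketch, assuming hypothetical helper names `failed_old` and `failed_new` and the status convention from the description above (0 for success, non-zero for failure, None for a timeout):

```python
def failed_old(status):
    # Original check: None (timeout) is falsy, so it is treated like success.
    return bool(status)


def failed_new(status):
    # Patched check: anything other than an exit status of 0 counts as failure.
    return status != 0


for status in (0, 1, None):
    print(status, failed_old(status), failed_new(status))
```

Only the None row differs between the two checks, which is exactly the timeout case the patch is meant to catch.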



Re: [PATCH 1/1] virtio: adding __devexit to virtballoon_remove

2009-10-13 Thread Rusty Russell
Thanks, I already have this from Uwe.

Cheers,
Rusty.


[KVM-AUTOTEST PATCH 1/3] KVM test: kvm_subprocess.py: do not start tail thread by default

2009-10-13 Thread Michael Goldish
Start the tail thread only if the user specifies a non-None output_func or
termination_func.

Signed-off-by: Michael Goldish mgold...@redhat.com
---
 client/tests/kvm/kvm_subprocess.py |   14 --
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/kvm_subprocess.py 
b/client/tests/kvm/kvm_subprocess.py
index 2ac062a..ede8081 100755
--- a/client/tests/kvm/kvm_subprocess.py
+++ b/client/tests/kvm/kvm_subprocess.py
@@ -596,9 +596,10 @@ class kvm_tail(kvm_spawn):
         self.output_prefix = output_prefix
 
         # Start the thread in the background
+        self.tail_thread = None
         self.__thread_kill_requested = False
-        self.tail_thread = threading.Thread(None, self._tail)
-        self.tail_thread.start()
+        if termination_func or output_func:
+            self._start_thread()
 
 
     def __getinitargs__(self):
@@ -617,6 +618,8 @@ class kvm_tail(kvm_spawn):
                 Must take a single parameter -- the exit status.
         """
         self.termination_func = termination_func
+        if termination_func and not self.tail_thread:
+            self._start_thread()
 
 
     def set_termination_params(self, termination_params):
@@ -637,6 +640,8 @@ class kvm_tail(kvm_spawn):
                 output from the process.  Must take a single string parameter.
         """
         self.output_func = output_func
+        if output_func and not self.tail_thread:
+            self._start_thread()
 
 
     def set_output_params(self, output_params):
@@ -726,6 +731,11 @@ class kvm_tail(kvm_spawn):
 pass
 
 
+    def _start_thread(self):
+        self.tail_thread = threading.Thread(None, self._tail)
+        self.tail_thread.start()
+
+
     def _join_thread(self):
         # Wait for the tail thread to exit
         if self.tail_thread:
-- 
1.5.4.1
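
The lazy-start behaviour of this patch can be sketched in isolation. The `Tail` class below is a hypothetical stand-in, not the kvm_tail code: its `_tail` loop just waits on an event instead of tracking process output, and only `set_output_func` is shown.

```python
import threading


class Tail:
    def __init__(self, output_func=None, termination_func=None):
        self.output_func = output_func
        self.termination_func = termination_func
        self.tail_thread = None
        self._stop_event = threading.Event()
        # Start the background thread only if somebody will consume its work.
        if output_func or termination_func:
            self._start_thread()

    def set_output_func(self, output_func):
        self.output_func = output_func
        # A callback set later starts the thread on demand, exactly once.
        if output_func and not self.tail_thread:
            self._start_thread()

    def _start_thread(self):
        self.tail_thread = threading.Thread(target=self._tail)
        self.tail_thread.start()

    def _tail(self):
        # Placeholder for the real output-tracking loop.
        self._stop_event.wait()

    def close(self):
        self._stop_event.set()
        if self.tail_thread:
            self.tail_thread.join()


quiet = Tail()
print(quiet.tail_thread)  # no thread until a callback is set
quiet.close()
```

A session created without callbacks therefore carries no thread, which is what later allows it to be released when it is no longer referenced.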



[KVM-AUTOTEST PATCH 2/3] KVM test: kvm_subprocess.py: use only unbound methods as close() hooks

2009-10-13 Thread Michael Goldish
close() will pass 'self' as a parameter to the hook functions, i.e. it will
call hook(self) instead of just hook(), thus allowing the use of unbound
methods rather than bound ones.

This allows us to avoid self referencing: if a bound method is used, a
reference to it is kept in the class instance, and if the method is bound to
the same instance then we have a self-reference that prevents garbage
collection.

Signed-off-by: Michael Goldish mgold...@redhat.com
---
 client/tests/kvm/kvm_subprocess.py |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/kvm_subprocess.py 
b/client/tests/kvm/kvm_subprocess.py
index ede8081..a625315 100755
--- a/client/tests/kvm/kvm_subprocess.py
+++ b/client/tests/kvm/kvm_subprocess.py
@@ -490,7 +490,7 @@ class kvm_spawn:
 _wait(self.lock_server_running_filename)
         # Call all cleanup routines
         for hook in self.close_hooks:
-            hook()
+            hook(self)
         # Close reader file descriptors
         for fd in self.reader_fds.values():
             try:
@@ -583,7 +583,7 @@ class kvm_tail(kvm_spawn):
 
         # Add a reader and a close hook
         self._add_reader("tail")
-        self._add_close_hook(self._join_thread)
+        self._add_close_hook(kvm_tail._join_thread)
 
         # Init the superclass
         kvm_spawn.__init__(self, command, id, echo, linesep)
-- 
1.5.4.1
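
The self-reference described in the commit message can be observed directly. A minimal sketch, assuming a hypothetical `Spawn` class that is not the kvm_subprocess code; in Python 3 the "unbound method" is simply the function taken from the class. The result relies on CPython reference counting, so it is an illustration rather than a portable guarantee.

```python
import gc
import weakref


class Spawn:
    def __init__(self, use_bound_hook):
        # A bound method holds a reference to self, so storing it on self
        # creates a cycle; the plain function from the class does not.
        hook = self._cleanup if use_bound_hook else Spawn._cleanup
        self.close_hooks = [hook]
        self.closed = False

    def _cleanup(self):
        self.closed = True

    def close(self):
        for hook in self.close_hooks:
            if getattr(hook, "__self__", None) is None:
                hook(self)  # unbound hook: pass the instance explicitly
            else:
                hook()      # bound hook already carries the instance


def freed_without_gc(use_bound_hook):
    """Return True if dropping the last name frees the object immediately."""
    gc.disable()  # keep the cycle collector out of the experiment
    try:
        obj = Spawn(use_bound_hook)
        ref = weakref.ref(obj)
        del obj
        return ref() is None
    finally:
        gc.enable()
        gc.collect()  # clean up the cycle left by the bound-hook case


print(freed_without_gc(False), freed_without_gc(True))
```

With the unbound hook the object disappears as soon as its last reference is dropped; with the bound hook it survives until the cycle collector runs.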



[KVM-AUTOTEST PATCH 3/3] KVM test: kvm_subprocess.py: automatically close unreferenced shell sessions

2009-10-13 Thread Michael Goldish
Note that if a session has a tracking thread (i.e. if output_func or
termination_func are set to something other than None) then the session will
not be garbage collected (it must be closed explicitly by the test).

Signed-off-by: Michael Goldish mgold...@redhat.com
---
 client/tests/kvm/kvm_subprocess.py |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/kvm_subprocess.py 
b/client/tests/kvm/kvm_subprocess.py
index a625315..859aa2b 100755
--- a/client/tests/kvm/kvm_subprocess.py
+++ b/client/tests/kvm/kvm_subprocess.py
@@ -1010,6 +1010,10 @@ class kvm_shell_session(kvm_expect):
self.status_test_command)
 
 
+    def __del__(self):
+        self.close()
+
+
     def set_prompt(self, prompt):
         """
         Set the prompt attribute for later use by read_up_to_prompt.
-- 
1.5.4.1



[PATCH 1/2] Complete cpu initialization before signaling main thread.

2009-10-13 Thread Gleb Natapov
Otherwise some cpus may start executing code before others
are fully initialized.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 qemu-kvm.c |   26 --
 1 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 62ca050..3765818 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1954,18 +1954,6 @@ static void process_irqchip_events(CPUState *env)
 
 static int kvm_main_loop_cpu(CPUState *env)
 {
-setup_kernel_sigmask(env);
-
-pthread_mutex_lock(&qemu_mutex);
-
-kvm_arch_init_vcpu(env);
-#ifdef TARGET_I386
-kvm_tpr_vcpu_start(env);
-#endif
-
-cpu_single_env = env;
-kvm_arch_load_regs(env);
-
 while (1) {
 int run_cpu = !is_cpu_stopped(env);
 if (run_cpu && !kvm_irqchip_in_kernel(kvm_context)) {
@@ -2003,15 +1991,25 @@ static void *ap_main_loop(void *_env)
 on_vcpu(env, kvm_arch_do_ioperm, data);
 #endif
 
-/* signal VCPU creation */
+setup_kernel_sigmask(env);
+
 pthread_mutex_lock(&qemu_mutex);
+cpu_single_env = env;
+
+kvm_arch_init_vcpu(env);
+#ifdef TARGET_I386
+kvm_tpr_vcpu_start(env);
+#endif
+
+kvm_arch_load_regs(env);
+
+/* signal VCPU creation */
 current_env->created = 1;
 pthread_cond_signal(&qemu_vcpu_cond);
 
 /* and wait for machine initialization */
 while (!qemu_system_ready)
 qemu_cond_wait(&qemu_system_cond);
-pthread_mutex_unlock(&qemu_mutex);
 
 kvm_main_loop_cpu(env);
 return NULL;
-- 
1.6.3.3
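
The ordering this patch enforces can be sketched with a condition variable. The Python below is only an illustration under stated assumptions (`Vcpu`, `ap_main` and `start_vcpu` are hypothetical names, not qemu-kvm code): the worker finishes its own initialization while holding the lock and only then sets `created` and signals the waiting main thread.

```python
import threading

vcpu_cond = threading.Condition()


class Vcpu:
    def __init__(self):
        self.initialized = False
        self.created = False


def ap_main(vcpu):
    with vcpu_cond:
        # Stand-in for the per-vcpu setup that the patch moves ahead of
        # the "created" signal (arch init, register load).
        vcpu.initialized = True
        vcpu.created = True
        vcpu_cond.notify()


def start_vcpu():
    """Start a worker and report whether it was initialized when signalled."""
    vcpu = Vcpu()
    worker = threading.Thread(target=ap_main, args=(vcpu,))
    with vcpu_cond:
        worker.start()
        while not vcpu.created:
            vcpu_cond.wait()
        ready = vcpu.initialized
    worker.join()
    return ready


print(all(start_vcpu() for _ in range(10)))
```

Because `created` is only set after initialization and under the same lock, the main thread can never observe a created but uninitialized vcpu.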



Re: [KVM-AUTOTEST,01/17] Add new module kvm_subprocess

2009-10-13 Thread Michael Goldish

- Chen Cao k...@redhat.com wrote:

 On Mon, Oct 12, 2009 at 09:07:45AM -0400, Michael Goldish wrote:
  You're right, currently the sessions must be closed explicitly.
  This is due to the fact that both qemu and ssh/telnet are handled by
 the
  same code, and qemu has to keep running in the background if we want
 to
  pass it from one test to another.
  
  To deal with this, first, we should use try..finally blocks to close
 all
  sessions in tests.  As far as I know all existing tests (or at least
 most
  of them) already do this.
  
  Second, we can add a destructor function to kvm_shell_session that
 will
  close the session automatically when it's no longer referenced.  At
 the
  moment this won't work because there's a thread running in the
 background
  tracking output from the session, but this thread is usually not
 needed
  for ssh/telnet (it's needed mainly for qemu), so we can selectively
 get
  rid of it and allow the reference count to drop to zero when the
 test exits,
  thus allowing the destructor to be called.
  
  I'll think of a way to do the second thing, and if it works, maybe
 we won't
  need the first.  But for now every test should close its sessions
 explicitly.
  
  BTW, I'm not sure I understand why cleaning up the sessions should
 be
  exhausting in the case you presented.  You can just wrap everything
 in one
  big try..finally block:
  
  session = ...
  
  try:
try:
except:
try:
except:
...
  finally:
session.close()
  
 
 Thanks for your explanation.
 
 It is just boring and error-prone to add lots of
 '(dst|src|tmp|etc)*session.close()' to our code (the internal version)
 in different files and a large number of functions. Also, some of the
 'sessions' are in try...except blocks, and some are not.
 
 We have to make sure where we started the sessions and to close all
 of
 them when they are not needed any longer. I feel it's a little weird
 that we have to do the garbage-collection-like work while using
 python.
 
 So, since this is a known issue, or more precisely a limitation in 'ease of
 use', I'm looking forward to your improvement, and I think we will
 also try to work it out at the same time.
 
 Thanks again for your help.
 
 
 Cao, Chen
 2009-10-13

OK, agreed.  I posted 3 patches that should fix this but I've only given
them minimal testing.  Let me know what you think.

Thanks,
Michael

  - Original Message -
  From: Chen Cao k...@redhat.com
  To: Michael Goldish mgold...@redhat.com
  Cc: autot...@test.kernel.org, kvm@vger.kernel.org
  Sent: Monday, October 12, 2009 8:55:59 AM (GMT+0200) Auto-Detected
  Subject: Re: [KVM-AUTOTEST,01/17] Add new module kvm_subprocess
  
  
  Hi, Michael,
  
  I found that if the sessions initialized using kvm_subprocess are
 not closed,
  the processes will never exit, and /tmp/kvm_spawn will be filled
 with the
  temporary files.
  
  And we can find in the code,
  # kvm_subprocess.py
  ...
  # Read from child and write to files/pipes
  while True:
      check_termination = False
      # Make a list of reader pipes whose buffers are not empty
      fds = [fd for (i, fd) in enumerate(reader_fds) if buffers[i]]
      # Wait until there's something to do
      r, w, x = select.select([shell_fd, inpipe_fd], fds, [], 0.5)
      # If a reader pipe is ready for writing --
      for (i, fd) in enumerate(reader_fds):
          if fd in w:
              bytes_written = os.write(fd, buffers[i])
              buffers[i] = buffers[i][bytes_written:]
      # If there's data to read from the child process --
      if shell_fd in r:
          try:
              data = os.read(shell_fd, 16384)
          except OSError:
              data = ""
          if not data:
              check_termination = True
          # Remove carriage returns from the data -- they often cause
          # trouble and are normally not needed
          data = data.replace("\r", "")
          output_file.write(data)
          output_file.flush()
          for i in range(len(readers)):
              buffers[i] += data
      # If os.read() raised an exception or there was nothing to read --
      if check_termination or shell_fd not in r:
          pid, status = os.waitpid(shell_pid, os.WNOHANG)
          if pid:
              status = os.WEXITSTATUS(status)
              break
      # If there's data to read from the client --
      if inpipe_fd in r:
          data = os.read(inpipe_fd, 1024)
          os.write(shell_fd, data)
  ...
  
  that if session.close() is not called, we will loop in the 'while'
 forever.
  
  So, user have to make sure that unnecessary sessions are all
 killed,
  otherwise, running some testcase(s) for huge number of times will
 

Re: [PATCH] allow userspace to adjust kvmclock offset

2009-10-13 Thread Glauber Costa
On Mon, Oct 12, 2009 at 10:53:26AM +0200, Avi Kivity wrote:
 On 10/06/2009 07:24 PM, Glauber Costa wrote:
 When we migrate a kvm guest that uses pvclock between two hosts, we may
 suffer a large skew. This is because there can be significant differences
 between the monotonic clock of the hosts involved. When a new host with
 a much larger monotonic time starts running the guest, the view of time
 will be significantly impacted.

 Situation is much worse when we do the opposite, and migrate to a host with
 a smaller monotonic clock.

 This new proposed ioctl will allow userspace to inform us what is the 
 monotonic
 clock value in the source host, so we can keep the time skew short, and more
 importantly, never goes backwards.



 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index f8f8900..0cd5ad8 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -546,6 +546,7 @@ struct kvm_irqfd {
   #define KVM_CREATE_PIT2   _IOW(KVMIO, 0x77, struct kvm_pit_config)
   #define KVM_SET_BOOT_CPU_ID   _IO(KVMIO, 0x78)
   #define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
 +#define KVM_ADJUST_CLOCK  _IOW(KVMIO, 0x7a, __u64)


 Please change to a struct with some reserved space.
Ok, can do it.
 

 Do we want an absolute or relative adjustment?
What exactly do you mean?


[PATCH 2/2] Don't sync mpstate to/from kernel when unneeded.

2009-10-13 Thread Gleb Natapov
mp_state, unlike other cpu state, can be changed not only from the vcpu
context it belongs to, but by other vcpus too. That makes loading it
from the kernel and saving it back unsafe if the mp_state value is changed
inside the kernel between the load and the save. For example, vcpu 1 loads
mp_state into user-space and the state is RUNNING; vcpu 0 sends INIT/SIPI
to vcpu 1, so the in-kernel mp_state becomes SIPI; vcpu 1 saves its
user-space copy into the kernel and calls vcpu_run(). The SIPI state is lost.

The patch copies mp_state into the kernel only when it is known that the
in-kernel value is outdated. This happens on reset and vmload.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 hw/apic.c |1 +
 monitor.c |2 ++
 qemu-kvm.c|9 -
 qemu-kvm.h|1 -
 target-i386/machine.c |3 +++
 5 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 2952675..729 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -512,6 +512,7 @@ void apic_init_reset(CPUState *env)
 if (kvm_enabled() && qemu_kvm_irqchip_in_kernel()) {
 env->mp_state
 = env->halted ? KVM_MP_STATE_UNINITIALIZED : KVM_MP_STATE_RUNNABLE;
+kvm_load_mpstate(env);
 }
 #endif
 }
diff --git a/monitor.c b/monitor.c
index 7f0f5a9..dd8f2ca 100644
--- a/monitor.c
+++ b/monitor.c
@@ -350,6 +350,7 @@ static CPUState *mon_get_cpu(void)
 mon_set_cpu(0);
 }
 cpu_synchronize_state(cur_mon->mon_cpu);
+kvm_save_mpstate(cur_mon->mon_cpu);
 return cur_mon->mon_cpu;
 }
 
@@ -377,6 +378,7 @@ static void do_info_cpus(Monitor *mon)
 
 for(env = first_cpu; env != NULL; env = env->next_cpu) {
 cpu_synchronize_state(env);
+kvm_save_mpstate(env);
 monitor_printf(mon, "%c CPU #%d:",
 (env == mon->mon_cpu) ? '*' : ' ',
 env->cpu_index);
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 3765818..2a1e0ff 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1609,11 +1609,6 @@ static void on_vcpu(CPUState *env, void (*func)(void *data), void *data)
 void kvm_arch_get_registers(CPUState *env)
 {
 kvm_arch_save_regs(env);
-   kvm_arch_save_mpstate(env);
-#ifdef KVM_CAP_MP_STATE
-   if (kvm_irqchip_in_kernel(kvm_context))
-   env->halted = (env->mp_state == KVM_MP_STATE_HALTED);
-#endif
 }
 
 static void do_kvm_cpu_synchronize_state(void *_env)
@@ -1707,6 +1702,10 @@ static void kvm_do_save_mpstate(void *_env)
 CPUState *env = _env;
 
 kvm_arch_save_mpstate(env);
+#ifdef KVM_CAP_MP_STATE
+if (kvm_irqchip_in_kernel(kvm_context))
+env->halted = (env->mp_state == KVM_MP_STATE_HALTED);
+#endif
 }
 
 void kvm_save_mpstate(CPUState *env)
diff --git a/qemu-kvm.h b/qemu-kvm.h
index d6748c7..e2a87b8 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -1186,7 +1186,6 @@ void kvm_arch_get_registers(CPUState *env);
 static inline void kvm_arch_put_registers(CPUState *env)
 {
 kvm_load_registers(env);
-kvm_load_mpstate(env);
 }
 
 void kvm_cpu_synchronize_state(CPUState *env);
diff --git a/target-i386/machine.c b/target-i386/machine.c
index e640dad..16d9c57 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -324,6 +324,7 @@ static void cpu_pre_save(void *opaque)
 int i, bit;
 
 cpu_synchronize_state(env);
+kvm_save_mpstate(env);
 
 /* FPU */
 env->fpus_vmstate = (env->fpus & ~0x3800) | (env->fpstt & 0x7) << 11;
@@ -385,6 +386,8 @@ static int cpu_post_load(void *opaque, int version_id)
 }
 
 tlb_flush(env, 1);
+kvm_load_mpstate(env);
+
 return 0;
 }
 
-- 
1.6.3.3
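
The lost update from the commit message can be replayed step by step. This is a toy model under stated assumptions: `Vcpu`, `run_sequence` and the string constants are hypothetical and only mirror the sequence described above, not the actual qemu-kvm data structures.

```python
RUNNING = "RUNNING"
SIPI = "SIPI"


class Vcpu:
    def __init__(self):
        self.kernel_mp_state = RUNNING
        self.user_mp_state = None


def run_sequence(always_write_back):
    """Replay the race and return the in-kernel mp_state at vcpu_run() time."""
    vcpu1 = Vcpu()
    # vcpu 1 loads mp_state into user-space.
    vcpu1.user_mp_state = vcpu1.kernel_mp_state
    # vcpu 0 sends INIT/SIPI to vcpu 1; only the in-kernel value changes.
    vcpu1.kernel_mp_state = SIPI
    # Old behaviour: the stale user-space copy is written back unconditionally.
    if always_write_back:
        vcpu1.kernel_mp_state = vcpu1.user_mp_state
    return vcpu1.kernel_mp_state


print(run_sequence(always_write_back=True))
print(run_sequence(always_write_back=False))
```

Skipping the write-back unless user-space really holds the newer value (reset, vmload) keeps the SIPI visible to the kernel.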



Re: [PATCH] allow userspace to adjust kvmclock offset

2009-10-13 Thread Avi Kivity

On 10/13/2009 03:28 PM, Glauber Costa wrote:



Do we want an absolute or relative adjustment?
 

What exactly do you mean?
   


Absolute adjustment: clock = t
Relative adjustment: clock += t


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH] allow userspace to adjust kvmclock offset

2009-10-13 Thread Glauber Costa
On Tue, Oct 13, 2009 at 03:31:08PM +0300, Avi Kivity wrote:
 On 10/13/2009 03:28 PM, Glauber Costa wrote:

 Do we want an absolute or relative adjustment?
  
 What exactly do you mean?


 Absolute adjustment: clock = t
 Relative adjustment: clock += t
The delta is absolute, but the adjustment in the clock is relative.

So we take the difference between what userspace is passing us and what
we currently have, then add it as a relative offset, so we can make sure we
won't go backwards or suffer too big a skew.



Re: [Autotest] [PATCH] Little bug fix in pci_hotplug.py

2009-10-13 Thread Lucas Meneghel Rodrigues
Applied, thanks!

On Tue, Oct 13, 2009 at 6:13 AM, Yolkfull Chow yz...@redhat.com wrote:
 If command executed timeout, the return value of status could be None,
 which is missed in judge statement:

 if s:
   ...

 Thanks Jason Wang for pointing this out.

 Signed-off-by: Yolkfull Chow yz...@redhat.com
 ---
  client/tests/kvm/tests/pci_hotplug.py |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

 diff --git a/client/tests/kvm/tests/pci_hotplug.py 
 b/client/tests/kvm/tests/pci_hotplug.py
 index 01d9447..3ad9ea2 100644
 --- a/client/tests/kvm/tests/pci_hotplug.py
 +++ b/client/tests/kvm/tests/pci_hotplug.py
 @@ -83,7 +83,7 @@ def run_pci_hotplug(test, params, env):

     # Test the newly added device
      s, o = session.get_command_status_output(params.get("pci_test_cmd"))
  -    if s:
  +    if s != 0:
          raise error.TestFail("Check for %s device failed after PCI hotplug. "
                               "Output: %s" % (test_type, o))

 --
 1.6.2.5

 ___
 Autotest mailing list
 autot...@test.kernel.org
 http://test.kernel.org/cgi-bin/mailman/listinfo/autotest




-- 
Lucas


[PATCH] v2: allow userspace to adjust kvmclock offset

2009-10-13 Thread Glauber Costa
When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.

The situation is much worse when we do the opposite, and migrate to a host
with a smaller monotonic clock.

This newly proposed ioctl lets userspace tell us the monotonic clock value
of the source host, so we can keep the time skew small and, more importantly,
make sure the clock never goes backwards.

[ v2: uses a struct with a padding ]

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/x86.c  |   20 +++-
 include/linux/kvm.h |6 ++
 3 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 179a919..c9b0d9f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -410,6 +410,7 @@ struct kvm_arch{
 
unsigned long irq_sources_bitmap;
u64 vm_init_tsc;
+   s64 kvmclock_offset;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9601bc6..1b6c193 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -699,7 +699,8 @@ static void kvm_write_guest_time(struct kvm_vcpu *v)
/* With all the info we got, fill in the values */
 
        vcpu->hv_clock.system_time = ts.tv_nsec +
-                                    (NSEC_PER_SEC * (u64)ts.tv_sec);
+                                    (NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset;
+
/*
 * The interface expects us to write an even number signaling that the
 * update is finished. Since the guest won't see the intermediate
@@ -2441,6 +2442,23 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = 0;
break;
}
+   case KVM_ADJUST_CLOCK: {
+   struct timespec now;
+   struct kvm_adjust_clock user_ns;
+   u64 now_ns;
+   long delta;
+
+   r =  -EFAULT;
+   if (copy_from_user(&user_ns, argp, sizeof(user_ns)))
+   goto out;
+
+   r = 0;
+   ktime_get_ts(&now);
+   now_ns = timespec_to_ns(&now);
+   delta = user_ns.clock - now_ns;
+   kvm->arch.kvmclock_offset = delta;
+   break;  
+   }
default:
;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index f8f8900..c07fc23 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -497,6 +497,11 @@ struct kvm_irqfd {
__u8  pad[20];
 };
 
+struct kvm_adjust_clock {
+   __u64 clock;
+   __u64 pad[2];
+};
+
 /*
  * ioctls for VM fds
  */
@@ -546,6 +551,7 @@ struct kvm_irqfd {
 #define KVM_CREATE_PIT2   _IOW(KVMIO, 0x77, struct kvm_pit_config)
 #define KVM_SET_BOOT_CPU_ID   _IO(KVMIO, 0x78)
 #define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
+#define KVM_ADJUST_CLOCK _IOW(KVMIO, 0x7a, struct kvm_adjust_clock)
 
 /*
  * ioctls for vcpu fds
-- 
1.6.2.2
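
The arithmetic in the patch above can be checked with a small worked example. This is a sketch in Python, not kernel code: `clock_offset` and `guest_system_time` are hypothetical helpers that mirror the `delta` computation in the ioctl and the offset added in kvm_write_guest_time(), and the uptimes are made-up numbers.

```python
NSEC_PER_SEC = 10**9


def clock_offset(source_clock_ns, dest_now_ns):
    # Mirrors "delta = user_ns.clock - now_ns" on the destination host.
    return source_clock_ns - dest_now_ns


def guest_system_time(dest_now_ns, offset_ns):
    # Mirrors the guest-visible time: host monotonic clock plus the offset.
    return dest_now_ns + offset_ns


# Assumed scenario: the source clock read 5000 s at migration time and the
# destination host's monotonic clock reads only 100 s.
source_at_migration = 5000 * NSEC_PER_SEC
offset = clock_offset(source_at_migration, 100 * NSEC_PER_SEC)

# Ten seconds later on the destination host.
later = guest_system_time(110 * NSEC_PER_SEC, offset)
print(offset // NSEC_PER_SEC, (later - source_at_migration) // NSEC_PER_SEC)
```

The guest sees time continue from the source value even though the destination's own monotonic clock is far smaller, which is the backwards-jump case the patch description calls out.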



Re: kernel bug in kvm_intel

2009-10-13 Thread Andrew Theurer
On Tue, 2009-10-13 at 08:50 +0200, Avi Kivity wrote:
 On 10/12/2009 08:42 PM, Andrew Theurer wrote:
  On Sun, 2009-10-11 at 07:19 +0200, Avi Kivity wrote:
 
  On 10/09/2009 10:04 PM, Andrew Theurer wrote:
   
  This is on latest master branch on kvm.git and qemu-kvm.git, running
  12 Windows Server2008 VMs, and using oprofile.  I ran again without
  oprofile and did not get the BUG.  I am wondering if anyone else is
  seeing this.
 
  Thanks,
 
  -Andrew
 
 
  Oct  9 11:55:13 virtvictory-eth0 kernel: BUG: unable to handle kernel
  paging request at 9fe9a2b4
  Oct  9 11:55:13 virtvictory-eth0 kernel: IP: [a02e1af1]
  vmx_vcpu_run+0x26d/0x64f [kvm_intel]
   
  Can you run this through objdump or gdb to see what source this
  corresponds to?
 
   
  Somewhere here I think (?)
 
  objdump -d
 
 
 
 Look at the address where vmx_vcpu_run starts, add 0x26d, and show the 
 surrounding code.
 
 Thinking about it, it probably _is_ what you showed, due to module page 
 alignment.  But please verify this; I can't reconcile the fault address 
 (9fe9a2b) with %rsp at the time of the fault.

Here is the start of the function:

 3884 <vmx_vcpu_run>:
 3884:   55                      push   %rbp
 3885:   48 89 e5                mov    %rsp,%rbp

and 0x26d later is 0x3af1:

 3ad2:   4c 8b b1 88 01 00 00    mov    0x188(%rcx),%r14
 3ad9:   4c 8b b9 90 01 00 00    mov    0x190(%rcx),%r15
 3ae0:   48 8b 89 20 01 00 00    mov    0x120(%rcx),%rcx
 3ae7:   75 05                   jne    3aee <vmx_vcpu_run+0x26a>
 3ae9:   0f 01 c2                vmlaunch
 3aec:   eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
 3aee:   0f 01 c3                vmresume
 3af1:   48 87 0c 24             xchg   %rcx,(%rsp)
 3af5:   48 89 81 18 01 00 00    mov    %rax,0x118(%rcx)
 3afc:   48 89 99 30 01 00 00    mov    %rbx,0x130(%rcx)
 3b03:   ff 34 24                pushq  (%rsp)
 3b06:   8f 81 20 01 00 00       popq   0x120(%rcx)
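
The offset arithmetic can be double-checked mechanically. A small sketch using only values quoted in this thread; the variable names are illustrative, and the page-offset comparison assumes 4 KiB pages.

```python
FUNC_START = 0x3884    # vmx_vcpu_run in the objdump listing
OOPS_OFFSET = 0x26d    # from "vmx_vcpu_run+0x26d/0x64f" in the oops
OOPS_IP = 0xa02e1af1   # IP as printed in the quoted log line

fault_ip = FUNC_START + OOPS_OFFSET
print(hex(fault_ip))

# With page-aligned module loading, the low 12 bits of the run-time IP
# should match the low 12 bits of the address in the object file.
print(hex(OOPS_IP & 0xFFF), hex(fault_ip & 0xFFF))
```

Both checks agree with the listing: the faulting instruction is the xchg right after vmlaunch/vmresume.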


-Andrew



Re: kernel bug in kvm_intel

2009-10-13 Thread Marcelo Tosatti
On Tue, Oct 13, 2009 at 08:50:07AM +0200, Avi Kivity wrote:
 On 10/12/2009 08:42 PM, Andrew Theurer wrote:
 On Sun, 2009-10-11 at 07:19 +0200, Avi Kivity wrote:

 On 10/09/2009 10:04 PM, Andrew Theurer wrote:
  
 This is on latest master branch on kvm.git and qemu-kvm.git, running
 12 Windows Server2008 VMs, and using oprofile.  I ran again without
 oprofile and did not get the BUG.  I am wondering if anyone else is
 seeing this.

 Thanks,

 -Andrew


 Oct  9 11:55:13 virtvictory-eth0 kernel: BUG: unable to handle kernel
 paging request at 9fe9a2b4
 Oct  9 11:55:13 virtvictory-eth0 kernel: IP: [a02e1af1]
 vmx_vcpu_run+0x26d/0x64f [kvm_intel]
  
 Can you run this through objdump or gdb to see what source this
 corresponds to?

  
 Somewhere here I think (?)

 objdump -d



 Look at the address where vmx_vcpu_run starts, add 0x26d, and show the  
 surrounding code.

 Thinking about it, it probably _is_ what you showed, due to module page  
 alignment.  But please verify this; I can't reconcile the fault address  
 (9fe9a2b) with %rsp at the time of the fault.

There are some scary errata in the Xeon X55xx spec update, such as AAK56:
a corrupted RSP can be pushed on the stack when an event is injected right
after VMExit (this includes NMIs, which oprofile uses).

Andrew, you might want to make sure the firmware/BIOS is up to date on this
machine before reproducing.



[PATCH] device assignment rom fixups

2009-10-13 Thread Gerd Hoffmann
Use new rom loading infrastructure.
Devices can simply register option roms now.

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 hw/device-assignment.c |  144 ---
 hw/device-assignment.h |1 -
 hw/pc.c|3 -
 3 files changed, 61 insertions(+), 87 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 237060f..6f792db 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -37,6 +37,7 @@
 #include "sysemu.h"
 #include "console.h"
 #include "device-assignment.h"
+#include "loader.h"
 
 /* From linux/ioport.h */
 #define IORESOURCE_IO   0x0100  /* Resource type */
@@ -56,6 +57,8 @@
 #define DEBUG(fmt, ...) do { } while(0)
 #endif
 
+static void assigned_dev_load_option_rom(AssignedDevice *dev);
+
 static uint32_t guest_to_host_ioport(AssignedDevRegion *region, uint32_t addr)
 {
     return region->u.r_baseport + (addr - region->e_physbase);
@@ -1168,6 +1171,7 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
     if (assigned_dev_register_msix_mmio(dev))
         goto assigned_out;
 
+    assigned_dev_load_option_rom(dev);
     return 0;
 
 assigned_out:
@@ -1329,11 +1333,10 @@ struct option_rom_pci_header {
  * both 2KB and target page size.
  */
 #define OPTION_ROM_ALIGN(x) (((x) + 2047) & ~2047)
-static int scan_option_rom(uint8_t devfn, void *roms, ram_addr_t offset)
+static void scan_option_rom(const char *name, uint8_t devfn, void *roms)
 {
-    int i, size, total_size;
+    int i, size;
     uint8_t csum;
-    ram_addr_t addr;
     struct option_rom_header *rom;
     struct option_rom_pci_header *pcih;
 
@@ -1362,29 +1365,12 @@ static int scan_option_rom(uint8_t devfn, void *roms, ram_addr_t offset)
 
         rom = (struct option_rom_header *)((char *)rom + size);
     }
-
-    return 0;
+    return;
 
  found:
-    /* The size should be both 2K-aligned and page-aligned */
-    total_size = (TARGET_PAGE_SIZE < 2048)
-                  ? OPTION_ROM_ALIGN(size + 1)
-                  : TARGET_PAGE_ALIGN(size + 1);
-
-    /* Size of all available ram space is 0x1 (0xd to 0xe) */
-    if ((offset + total_size) > 0x1u) {
-        fprintf(stderr, "Option ROM size %x exceeds available space\n", size);
-        return 0;
-    }
-
-    addr = qemu_ram_alloc(total_size);
-    cpu_register_physical_memory(0xd + offset, total_size, addr | IO_MEM_ROM);
-
-    /* Write ROM data and devfn to phys_addr */
-    cpu_physical_memory_write_rom(0xd + offset, (uint8_t *)rom, size);
-    cpu_physical_memory_write_rom(0xd + offset + size, &devfn, 1);
-
-    return total_size;
+    rom_add_blob(name ? name : "assigned device", rom, size,
+                 PC_ROM_MIN_OPTION, PC_ROM_MAX, PC_ROM_ALIGN);
+    return;
 }
 
 /*
@@ -1392,75 +1378,67 @@ static int scan_option_rom(uint8_t devfn, void *roms, ram_addr_t offset)
  * load the corresponding ROM data to RAM. If an error occurs while loading an
  * option ROM, we just ignore that option ROM and continue with the next one.
  */
-ram_addr_t assigned_dev_load_option_roms(ram_addr_t rom_base_offset)
+static void assigned_dev_load_option_rom(AssignedDevice *dev)
 {
-    ram_addr_t offset = rom_base_offset;
-    AssignedDevice *dev;
-
-    QLIST_FOREACH(dev, &devs, next) {
-        int size, len;
-        void *buf;
-        FILE *fp;
-        uint8_t i = 1;
-        char rom_file[64];
+    int size, len;
+    void *buf;
+    FILE *fp;
+    uint8_t i = 1;
+    char rom_file[64];
 
-        snprintf(rom_file, sizeof(rom_file),
-                 "/sys/bus/pci/devices/:%02x:%02x.%01x/rom",
-                 dev->host.bus, dev->host.dev, dev->host.func);
+    snprintf(rom_file, sizeof(rom_file),
+             "/sys/bus/pci/devices/:%02x:%02x.%01x/rom",
+             dev->host.bus, dev->host.dev, dev->host.func);
 
-        if (access(rom_file, F_OK))
-            continue;
-
-        /* Write something to the ROM file to enable it */
-        fp = fopen(rom_file, "wb");
-        if (fp == NULL)
-            continue;
-        len = fwrite(&i, 1, 1, fp);
-        fclose(fp);
-        if (len != 1)
-            continue;
-
-        /* The file has to be closed and reopened, otherwise it won't work */
-        fp = fopen(rom_file, "rb");
-        if (fp == NULL)
-            continue;
+    if (access(rom_file, F_OK))
+        return;
 
-        fseek(fp, 0, SEEK_END);
-        size = ftell(fp);
-        fseek(fp, 0, SEEK_SET);
+    /* Write something to the ROM file to enable it */
+    fp = fopen(rom_file, "wb");
+    if (fp == NULL)
+        return;
+    len = fwrite(&i, 1, 1, fp);
+    fclose(fp);
+    if (len != 1)
+        return;
 
-        buf = malloc(size);
-        if (buf == NULL) {
-            fclose(fp);
-            continue;
-        }
+    /* The file has to be closed and reopened, otherwise it won't work */
+    fp = fopen(rom_file, "rb");
+    if (fp == NULL)
+        return;
 
-        fread(buf, size, 1, fp);
-        if (!feof(fp) || 
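
(The patch is cut off here in the archive.) The code it removes padded each
option ROM image, plus one trailing devfn byte, up to a 2 KB boundary with
OPTION_ROM_ALIGN. A minimal sketch of that rounding follows, assuming the
macro is the usual "round up to 2048" mask; the helper name
option_rom_padded_size is hypothetical and the cast to size_t is an addition
so the sketch also works for sizes wider than int.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of 2048 (2 KB), as OPTION_ROM_ALIGN does. */
#define OPTION_ROM_ALIGN(x) (((x) + 2047) & ~(size_t)2047)

/*
 * Hypothetical helper: space needed for a ROM image of rom_size bytes plus
 * the single devfn byte that the old code appended after the image.
 */
static size_t option_rom_padded_size(size_t rom_size)
{
    assert(rom_size > 0);
    return OPTION_ROM_ALIGN(rom_size + 1);
}
```

A 2047-byte image still fits into one 2 KB slot, while a 2048-byte image
needs two because of the extra devfn byte.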

[PATCH] fix quoting in configure

2009-10-13 Thread Gerd Hoffmann

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 configure |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 2341772..b0d5bd9 100755
--- a/configure
+++ b/configure
@@ -1414,7 +1414,7 @@ if test $kvm_cap_pit != no ; then
 #endif
 int main(void) { return 0; }
 EOF
-  if compile_prog $kvm_cflags ; then
+  if compile_prog $kvm_cflags ; then
 kvm_cap_pit=yes
   else
 if test $kvm_cap_pit = yes ; then
@@ -1438,7 +1438,7 @@ if test $kvm_cap_device_assignment != no ; then
 #endif
 int main(void) { return 0; }
 EOF
-  if compile_prog $kvm_cflags  ; then
+  if compile_prog $kvm_cflags  ; then
 kvm_cap_device_assignment=yes
   else
 if test $kvm_cap_device_assignment = yes ; then
-- 
1.6.2.5



Re: Problem booting guest with Linux 2.6.3x

2009-10-13 Thread Daniel Bareiro
Hi Michael.

On Saturday, 10 October 2009 20:10:16 +0400,
Michael Tokarev wrote:

 As far as I can tell, the disks that are passed with -hdX in KVM-88
 are mapped as SATA/SCSI devices in 2.6.31.2 guests. With the stock
 Linux 2.6.26 kernel they are mapped as IDE disks. Could this be due
 to some change in the kernel code related to KVM?

 It has nothing to do with kvm.  It's different kernel options, all
 kernels since very early 2.6.x are able to see ide disks as hdX or
 sdX, depending on the kernel options and modules loaded.  There are
 2 drivers for each IDE controller - IDE/ATA one, which creates hdX,
 and PATA one which creates sdX.

 From what I have been reading, I have the impression that the newest
 kernels hand this disk naming over to libata. Could it be that libata
 was still experimental, and not yet production-ready, in the 2.6.26
 Debian stock kernel?
 
 Debian stock kernel config does not enable ata devices, only ide ones.

Apparently the Debian GNU/Linux stock kernels apply a patch [1] that
enables libata only for SATA controllers.

It surprises me that the Debian developers still apply this patch, given
that 2.6.31 is the latest stable kernel branch from kernel.org. I had
thought that libata was sufficiently stable by now.

Thanks for your reply.

Regards,
Daniel

[1] 
http://svn.debian.org/viewsvn/kernel/dists/trunk/linux-2.6/debian/patches/debian/drivers-ata-ata_piix-postpone-pata.patch?revision=13847&view=markup
-- 
Fingerprint: BFB3 08D6 B4D1 31B2 72B9  29CE 6696 BF1B 14E6 1D37
Powered by Debian GNU/Linux Squeeze - Linux user #188.598




[PATCH 3/4] KVM: x86: Add support for KVM_GET/SET_VCPU_STATE

2009-10-13 Thread Jan Kiszka
Add support for getting/setting MSRs, CPUID tree, and the LAPIC via the
new VCPU state interface. Here, too, the existing IOCTLs are converted to
use the new infrastructure internally.

The MSR interface has to be extended to pass back the number of
processed MSRs via the header structure instead of the return code, as
the latter is not available with the new IOCTL. The semantics of the
original KVM_GET/SET_MSRS are not affected by this change.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

 arch/x86/include/asm/kvm.h |8 +-
 arch/x86/kvm/x86.c |  209 
 2 files changed, 138 insertions(+), 79 deletions(-)

diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index f02e87a..1b184c3 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -150,7 +150,7 @@ struct kvm_msr_entry {
 /* for KVM_GET_MSRS and KVM_SET_MSRS */
 struct kvm_msrs {
__u32 nmsrs; /* number of msrs in entries */
-   __u32 pad;
+   __u32 nprocessed; /* return value: successfully processed entries */
 
struct kvm_msr_entry entries[0];
 };
@@ -251,4 +251,10 @@ struct kvm_reinject_control {
__u8 pit_reinject;
__u8 reserved[31];
 };
+
+/* for KVM_GET/SET_VCPU_STATE */
+#define KVM_X86_VCPU_MSRS  1000
+#define KVM_X86_VCPU_CPUID 1001
+#define KVM_X86_VCPU_LAPIC 1002
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 839b1c5..733e2d3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1179,11 +1179,11 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
 static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
  int (*do_msr)(struct kvm_vcpu *vcpu,
unsigned index, u64 *data),
- int writeback)
+ int writeback, int write_nprocessed)
 {
struct kvm_msrs msrs;
struct kvm_msr_entry *entries;
-   int r, n;
+   int r;
unsigned size;
 
r = -EFAULT;
@@ -1204,15 +1204,22 @@ static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs,
 	if (copy_from_user(entries, user_msrs->entries, size))
 		goto out_free;
 
-	r = n = __msr_io(vcpu, &msrs, entries, do_msr);
+	r = __msr_io(vcpu, &msrs, entries, do_msr);
 	if (r < 0)
 		goto out_free;
 
+	msrs.nprocessed = r;
+
 	r = -EFAULT;
+	if (write_nprocessed &&
+	    copy_to_user(&user_msrs->nprocessed, &msrs.nprocessed,
+			 sizeof(msrs.nprocessed)))
+		goto out_free;
+
 	if (writeback && copy_to_user(user_msrs->entries, entries, size))
 		goto out_free;
 
-	r = n;
+	r = msrs.nprocessed;
 
 out_free:
vfree(entries);
@@ -1785,55 +1792,36 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 {
 	struct kvm_vcpu *vcpu = filp->private_data;
void __user *argp = (void __user *)arg;
+   struct kvm_vcpu_substate substate;
int r;
-   struct kvm_lapic_state *lapic = NULL;
 
switch (ioctl) {
-   case KVM_GET_LAPIC: {
-   lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
-
-   r = -ENOMEM;
-   if (!lapic)
-   goto out;
-   r = kvm_vcpu_ioctl_get_lapic(vcpu, lapic);
-   if (r)
-   goto out;
-   r = -EFAULT;
-   if (copy_to_user(argp, lapic, sizeof(struct kvm_lapic_state)))
-   goto out;
-   r = 0;
+	case KVM_GET_LAPIC:
+		substate.type = KVM_X86_VCPU_LAPIC;
+		substate.offset = 0;
+		r = kvm_arch_vcpu_get_substate(vcpu, argp, &substate);
 		break;
-   }
-   case KVM_SET_LAPIC: {
-   lapic = kmalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
-   r = -ENOMEM;
-   if (!lapic)
-   goto out;
-   r = -EFAULT;
-   if (copy_from_user(lapic, argp, sizeof(struct kvm_lapic_state)))
-   goto out;
-   r = kvm_vcpu_ioctl_set_lapic(vcpu, lapic);
-   if (r)
-   goto out;
-   r = 0;
+	case KVM_SET_LAPIC:
+		substate.type = KVM_X86_VCPU_LAPIC;
+		substate.offset = 0;
+		r = kvm_arch_vcpu_set_substate(vcpu, argp, &substate);
 		break;
-   }
 	case KVM_INTERRUPT: {
 		struct kvm_interrupt irq;
 
 		r = -EFAULT;
 		if (copy_from_user(&irq, argp, sizeof irq))
-			goto out;
+			break;
 		r = kvm_vcpu_ioctl_interrupt(vcpu, &irq);
 		if (r)
-			goto out;
+			break;
 		r = 0;
 		break;
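
(The patch is cut off here in the archive.) The nprocessed field described in
the commit message can be illustrated with a small user-space mock. This is a
sketch under assumptions, not kernel code: the mock_ types, mock_read_msr and
the "index >= 0x100 fails" rule are invented for the example, and the loop
assumes that processing stops at the first MSR that fails.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for struct kvm_msr_entry / struct kvm_msrs. */
struct mock_msr_entry {
    uint32_t index;
    uint64_t data;
};

struct mock_msrs {
    uint32_t nmsrs;       /* number of entries supplied by the caller */
    uint32_t nprocessed;  /* out: entries handled before the first failure */
    struct mock_msr_entry *entries;
};

typedef int (*mock_do_msr)(uint32_t index, uint64_t *data);

/* Invented accessor: "reads" an MSR, failing for indexes >= 0x100. */
static int mock_read_msr(uint32_t index, uint64_t *data)
{
    if (index >= 0x100)
        return 1;
    *data = (uint64_t)index * 2;
    return 0;
}

/*
 * Process the entries in order and report the count through the header
 * instead of the return value, which is reserved for success/error.
 */
static int mock_msr_io(struct mock_msrs *msrs, mock_do_msr do_msr)
{
    uint32_t i;

    assert(msrs->entries != 0 || msrs->nmsrs == 0);
    for (i = 0; i < msrs->nmsrs; i++)
        if (do_msr(msrs->entries[i].index, &msrs->entries[i].data))
            break;

    msrs->nprocessed = i;
    return 0;
}
```

With this layout a caller can tell "call succeeded, but only the first two
MSRs were handled" apart from a hard error, which is what the new IOCTL
needs since its return code is no longer the entry count.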
  

[PATCH 4/4] KVM: x86: Add VCPU substate for NMI states

2009-10-13 Thread Jan Kiszka
This plugs an NMI-related hole in the VCPU synchronization between
kernel and user space. So far, neither pending NMIs nor the NMI inhibit
mask were properly read or set, which could cause problems on
vmsave/restore, live migration and system reset. Fix it by making use
of the new VCPU substate interface.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

 arch/x86/include/asm/kvm.h  |7 +++
 arch/x86/include/asm/kvm_host.h |2 ++
 arch/x86/kvm/svm.c  |   22 ++
 arch/x86/kvm/vmx.c  |   30 ++
 arch/x86/kvm/x86.c  |   26 ++
 5 files changed, 87 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 1b184c3..fd5713a 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -256,5 +256,12 @@ struct kvm_reinject_control {
 #define KVM_X86_VCPU_MSRS  1000
 #define KVM_X86_VCPU_CPUID 1001
 #define KVM_X86_VCPU_LAPIC 1002
+#define KVM_X86_VCPU_NMI   1003
+
+struct kvm_nmi_state {
+   __u8 pending;
+   __u8 masked;
+   __u8 pad1[2];
+};
 
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 179a919..d22a0cd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -513,6 +513,8 @@ struct kvm_x86_ops {
unsigned char *hypercall_addr);
void (*set_irq)(struct kvm_vcpu *vcpu);
void (*set_nmi)(struct kvm_vcpu *vcpu);
+   int (*get_nmi_mask)(struct kvm_vcpu *vcpu);
+   void (*set_nmi_mask)(struct kvm_vcpu *vcpu, int masked);
void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr,
bool has_error_code, u32 error_code);
int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 279a2ae..67ff5f1 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2456,6 +2456,26 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
 		!(svm->vcpu.arch.hflags & HF_NMI_MASK);
 }
 
+static int svm_get_nmi_mask(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	return !!(svm->vcpu.arch.hflags & HF_NMI_MASK);
+}
+
+static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, int masked)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (masked) {
+		svm->vcpu.arch.hflags |= HF_NMI_MASK;
+		svm->vmcb->control.intercept |= (1UL << INTERCEPT_IRET);
+	} else {
+		svm->vcpu.arch.hflags &= ~HF_NMI_MASK;
+		svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET);
+	}
+}
+
 static int svm_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
@@ -2897,6 +2917,8 @@ static struct kvm_x86_ops svm_x86_ops = {
.queue_exception = svm_queue_exception,
.interrupt_allowed = svm_interrupt_allowed,
.nmi_allowed = svm_nmi_allowed,
+   .get_nmi_mask = svm_get_nmi_mask,
+   .set_nmi_mask = svm_set_nmi_mask,
.enable_nmi_window = enable_nmi_window,
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 70020e5..5dd766b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2619,6 +2619,34 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
GUEST_INTR_STATE_NMI));
 }
 
+static int vmx_get_nmi_mask(struct kvm_vcpu *vcpu)
+{
+	if (!cpu_has_virtual_nmis())
+		return to_vmx(vcpu)->soft_vnmi_blocked;
+	else
+		return !!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
+			  GUEST_INTR_STATE_NMI);
+}
+
+static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, int masked)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (!cpu_has_virtual_nmis()) {
+		if (vmx->soft_vnmi_blocked != masked) {
+			vmx->soft_vnmi_blocked = masked;
+			vmx->vnmi_blocked_time = 0;
+		}
+	} else {
+		if (masked)
+			vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
+				      GUEST_INTR_STATE_NMI);
+		else
+			vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO,
+					GUEST_INTR_STATE_NMI);
+	}
+}
+
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
 	return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
@@ -3957,6 +3985,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
.queue_exception = vmx_queue_exception,
.interrupt_allowed = vmx_interrupt_allowed,
.nmi_allowed = vmx_nmi_allowed,
+   .get_nmi_mask = vmx_get_nmi_mask,
+   .set_nmi_mask = vmx_set_nmi_mask,
.enable_nmi_window = enable_nmi_window,
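
(The patch is cut off here in the archive.) The get/set mask pattern used by
both vendor back ends boils down to testing, setting and clearing one flag
bit and normalizing the result to 0 or 1. A stand-alone sketch follows;
MOCK_HF_NMI_MASK is a placeholder bit chosen for the example and not the
kernel's HF_NMI_MASK value, and mock_nmi_state only mirrors the layout of the
kvm_nmi_state struct from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder flag bit; the real HF_NMI_MASK value is defined by the kernel. */
#define MOCK_HF_NMI_MASK (1u << 5)

/* Same field layout as struct kvm_nmi_state in the patch. */
struct mock_nmi_state {
    uint8_t pending;
    uint8_t masked;
    uint8_t pad1[2];
};

/* Normalize the flag test to 0/1 so it fits the one-byte "masked" field. */
static int mock_get_nmi_mask(uint32_t hflags)
{
    return !!(hflags & MOCK_HF_NMI_MASK);
}

/* Return hflags with the NMI mask bit set or cleared. */
static uint32_t mock_set_nmi_mask(uint32_t hflags, int masked)
{
    assert(masked == 0 || masked == 1);
    if (masked)
        return hflags | MOCK_HF_NMI_MASK;
    return hflags & ~MOCK_HF_NMI_MASK;
}
```

The explicit padding keeps the substate at a fixed four bytes, so its size
does not depend on compiler-inserted padding.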

[PATCH 1/4] KVM: Reorder IOCTLs in main kvm.h

2009-10-13 Thread Jan Kiszka
Obviously, people tend to extend this header at the bottom - more or
less blindly. Ensure that deprecated stuff gets its own corner again by
moving things to the top. Also add some comments and reindent IOCTLs to
make them more readable and reduce the risk of number collisions.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

 include/linux/kvm.h |  228 ++-
 1 files changed, 114 insertions(+), 114 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index f8f8900..7d8c382 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -14,12 +14,76 @@
 
 #define KVM_API_VERSION 12
 
-/* for KVM_TRACE_ENABLE, deprecated */
+/* *** Deprecated interfaces *** */
+
+#define KVM_TRC_SHIFT   16
+
+#define KVM_TRC_ENTRYEXIT       (1 << KVM_TRC_SHIFT)
+#define KVM_TRC_HANDLER         (1 << (KVM_TRC_SHIFT + 1))
+
+#define KVM_TRC_VMENTRY (KVM_TRC_ENTRYEXIT + 0x01)
+#define KVM_TRC_VMEXIT  (KVM_TRC_ENTRYEXIT + 0x02)
+#define KVM_TRC_PAGE_FAULT  (KVM_TRC_HANDLER + 0x01)
+
+#define KVM_TRC_HEAD_SIZE   12
+#define KVM_TRC_CYCLE_SIZE  8
+#define KVM_TRC_EXTRA_MAX   7
+
+#define KVM_TRC_INJ_VIRQ (KVM_TRC_HANDLER + 0x02)
+#define KVM_TRC_REDELIVER_EVT(KVM_TRC_HANDLER + 0x03)
+#define KVM_TRC_PEND_INTR(KVM_TRC_HANDLER + 0x04)
+#define KVM_TRC_IO_READ  (KVM_TRC_HANDLER + 0x05)
+#define KVM_TRC_IO_WRITE (KVM_TRC_HANDLER + 0x06)
+#define KVM_TRC_CR_READ  (KVM_TRC_HANDLER + 0x07)
+#define KVM_TRC_CR_WRITE (KVM_TRC_HANDLER + 0x08)
+#define KVM_TRC_DR_READ  (KVM_TRC_HANDLER + 0x09)
+#define KVM_TRC_DR_WRITE (KVM_TRC_HANDLER + 0x0A)
+#define KVM_TRC_MSR_READ (KVM_TRC_HANDLER + 0x0B)
+#define KVM_TRC_MSR_WRITE(KVM_TRC_HANDLER + 0x0C)
+#define KVM_TRC_CPUID(KVM_TRC_HANDLER + 0x0D)
+#define KVM_TRC_INTR (KVM_TRC_HANDLER + 0x0E)
+#define KVM_TRC_NMI  (KVM_TRC_HANDLER + 0x0F)
+#define KVM_TRC_VMMCALL  (KVM_TRC_HANDLER + 0x10)
+#define KVM_TRC_HLT  (KVM_TRC_HANDLER + 0x11)
+#define KVM_TRC_CLTS (KVM_TRC_HANDLER + 0x12)
+#define KVM_TRC_LMSW (KVM_TRC_HANDLER + 0x13)
+#define KVM_TRC_APIC_ACCESS  (KVM_TRC_HANDLER + 0x14)
+#define KVM_TRC_TDP_FAULT(KVM_TRC_HANDLER + 0x15)
+#define KVM_TRC_GTLB_WRITE   (KVM_TRC_HANDLER + 0x16)
+#define KVM_TRC_STLB_WRITE   (KVM_TRC_HANDLER + 0x17)
+#define KVM_TRC_STLB_INVAL   (KVM_TRC_HANDLER + 0x18)
+#define KVM_TRC_PPC_INSTR(KVM_TRC_HANDLER + 0x19)
+
 struct kvm_user_trace_setup {
-   __u32 buf_size; /* sub_buffer size of each per-cpu */
-   __u32 buf_nr; /* the number of sub_buffers of each per-cpu */
+   __u32 buf_size;
+   __u32 buf_nr;
 };
 
+#define __KVM_DEPRECATED_MAIN_W_0x06 \
+   _IOW(KVMIO, 0x06, struct kvm_user_trace_setup)
+#define __KVM_DEPRECATED_MAIN_0x07 _IO(KVMIO, 0x07)
+#define __KVM_DEPRECATED_MAIN_0x08 _IO(KVMIO, 0x08)
+
+#define __KVM_DEPRECATED_VM_R_0x70 _IOR(KVMIO, 0x70, struct kvm_assigned_irq)
+
+struct kvm_breakpoint {
+   __u32 enabled;
+   __u32 padding;
+   __u64 address;
+};
+
+struct kvm_debug_guest {
+   __u32 enabled;
+   __u32 pad;
+   struct kvm_breakpoint breakpoints[4];
+   __u32 singlestep;
+};
+
+#define __KVM_DEPRECATED_VCPU_W_0x87 _IOW(KVMIO, 0x87, struct kvm_debug_guest)
+
+/* *** End of deprecated interfaces *** */
+
+
 /* for KVM_CREATE_MEMORY_REGION */
 struct kvm_memory_region {
__u32 slot;
@@ -329,24 +393,6 @@ struct kvm_ioeventfd {
__u8  pad[36];
 };
 
-#define KVM_TRC_SHIFT   16
-/*
- * kvm trace categories
- */
-#define KVM_TRC_ENTRYEXIT       (1 << KVM_TRC_SHIFT)
-#define KVM_TRC_HANDLER         (1 << (KVM_TRC_SHIFT + 1)) /* only 12 bits */
-
-/*
- * kvm trace action
- */
-#define KVM_TRC_VMENTRY (KVM_TRC_ENTRYEXIT + 0x01)
-#define KVM_TRC_VMEXIT  (KVM_TRC_ENTRYEXIT + 0x02)
-#define KVM_TRC_PAGE_FAULT  (KVM_TRC_HANDLER + 0x01)
-
-#define KVM_TRC_HEAD_SIZE   12
-#define KVM_TRC_CYCLE_SIZE  8
-#define KVM_TRC_EXTRA_MAX   7
-
 #define KVMIO 0xAE
 
 /*
@@ -367,12 +413,10 @@ struct kvm_ioeventfd {
  */
 #define KVM_GET_VCPU_MMAP_SIZE    _IO(KVMIO,   0x04) /* in bytes */
 #define KVM_GET_SUPPORTED_CPUID   _IOWR(KVMIO, 0x05, struct kvm_cpuid2)
-/*
- * ioctls for kvm trace
- */
-#define KVM_TRACE_ENABLE          _IOW(KVMIO, 0x06, struct kvm_user_trace_setup)
-#define KVM_TRACE_PAUSE   _IO(KVMIO,  0x07)
-#define KVM_TRACE_DISABLE _IO(KVMIO,  0x08)
+#define KVM_TRACE_ENABLE  __KVM_DEPRECATED_MAIN_W_0x06
+#define KVM_TRACE_PAUSE   __KVM_DEPRECATED_MAIN_0x07
+#define KVM_TRACE_DISABLE __KVM_DEPRECATED_MAIN_0x08
+
 /*
  * Extension capability list.
  */
@@ -500,52 +544,54 @@ struct kvm_irqfd {
 /*
  * ioctls for VM fds
  */
-#define KVM_SET_MEMORY_REGION _IOW(KVMIO, 0x40, struct kvm_memory_region)
+#define KVM_SET_MEMORY_REGION   
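
(The patch is cut off here in the archive.) The deprecated trace constants
moved by this patch encode a category in the bits at and above KVM_TRC_SHIFT
and an action number below it. The sketch below recomputes a few of them; the
#defines are copied from the patch, while the three trc_ helpers are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define KVM_TRC_SHIFT       16
#define KVM_TRC_ENTRYEXIT   (1 << KVM_TRC_SHIFT)
#define KVM_TRC_HANDLER     (1 << (KVM_TRC_SHIFT + 1))
#define KVM_TRC_VMEXIT      (KVM_TRC_ENTRYEXIT + 0x02)
#define KVM_TRC_PAGE_FAULT  (KVM_TRC_HANDLER + 0x01)

/* Hypothetical helper: category bits of a trace event id. */
static uint32_t trc_category(uint32_t event)
{
    return event & ~((1u << KVM_TRC_SHIFT) - 1);
}

/* Hypothetical helper: action number of a trace event id. */
static uint32_t trc_action(uint32_t event)
{
    return event & ((1u << KVM_TRC_SHIFT) - 1);
}

/* Hypothetical helper: combine a category and an action into an event id. */
static uint32_t trc_make_event(uint32_t category, uint32_t action)
{
    /* The action must not spill into the category bits. */
    assert(action < (1u << KVM_TRC_SHIFT));
    return category + action;
}
```

For instance, trc_make_event(KVM_TRC_HANDLER, 0x0F) reproduces the value the
header assigns to KVM_TRC_NMI.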

[PATCH 0/4] Extensible VCPU state IOCTL

2009-10-13 Thread Jan Kiszka
As you may have noticed, we are constantly adding IOCTLs as yet another
state field has to be exchanged between kernel and user space. I was
about to add one for the missing hidden NMI states (pending and masked),
but Avi suggested taking this chance to invent a more easily
extensible interface.

And here comes my suggestion for VCPU states. Please see patch 2 for
details on this approach; patch 4 demonstrates what extensions may look
like in the future.

I will follow up with a patch against qemu upstream to convert
kvm_arch_get/put_registers to the new interface, i.e. query/set all
substates via one IOCTL when available. I did not convert qemu-kvm, only
added support for the NMI substate, as the corresponding code will
likely be modified to use the upstream implementation anyway.

Comments are welcome, as are suggestions for further substates to be
added in this round.

Jan


Find this series also at git://git.kiszka.org/linux-kvm.git queues/vcpu-state

Jan Kiszka (4):
  KVM: Reorder IOCTLs in main kvm.h
  KVM: Add unified KVM_GET/SET_VCPU_STATE IOCTL
  KVM: x86: Add support for KVM_GET/SET_VCPU_STATE
  KVM: x86: Add VCPU substate for NMI states

 arch/ia64/kvm/kvm-ia64.c|   12 ++
 arch/powerpc/kvm/powerpc.c  |   12 ++
 arch/s390/kvm/kvm-s390.c|   12 ++
 arch/x86/include/asm/kvm.h  |   15 ++-
 arch/x86/include/asm/kvm_host.h |2 +
 arch/x86/kvm/svm.c  |   22 +++
 arch/x86/kvm/vmx.c  |   30 
 arch/x86/kvm/x86.c  |  243 -
 include/linux/kvm.h |  246 +--
 include/linux/kvm_host.h|5 +
 virt/kvm/kvm_main.c |  318 +++---
 11 files changed, 637 insertions(+), 280 deletions(-)




[PATCH 2/4] KVM: Add unified KVM_GET/SET_VCPU_STATE IOCTL

2009-10-13 Thread Jan Kiszka
Add a new IOCTL pair to retrieve or set the VCPU state in one chunk.
More precisely, the IOCTL is able to process a list of substates to be
read or written. This list is easily extensible without breaking the
existing ABI, thus we will no longer have to add new IOCTLs when we
discover a missing VCPU state field or want to support new hardware
features.

This patch establishes the generic infrastructure for KVM_GET/
SET_VCPU_STATE and adds support for the generic substates REGS, SREGS,
FPU, and MP. To avoid code duplication, the entry point for the
corresponding original IOCTLs are converted to make use of the new
infrastructure internally, too.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

 arch/ia64/kvm/kvm-ia64.c   |   12 ++
 arch/powerpc/kvm/powerpc.c |   12 ++
 arch/s390/kvm/kvm-s390.c   |   12 ++
 arch/x86/kvm/x86.c |   12 ++
 include/linux/kvm.h|   24 +++
 include/linux/kvm_host.h   |5 +
 virt/kvm/kvm_main.c|  318 +++-
 7 files changed, 303 insertions(+), 92 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 5fdeec5..c3450a6 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1991,3 +1991,15 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
vcpu_put(vcpu);
return r;
 }
+
+int kvm_arch_vcpu_get_substate(struct kvm_vcpu *vcpu, uint8_t __user *arg_base,
+  struct kvm_vcpu_substate *substate)
+{
+   return -EINVAL;
+}
+
+int kvm_arch_vcpu_set_substate(struct kvm_vcpu *vcpu, uint8_t __user *arg_base,
+  struct kvm_vcpu_substate *substate)
+{
+   return -EINVAL;
+}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 5902bbc..3336ad5 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -436,3 +436,15 @@ int kvm_arch_init(void *opaque)
 void kvm_arch_exit(void)
 {
 }
+
+int kvm_arch_vcpu_get_substate(struct kvm_vcpu *vcpu, uint8_t __user *arg_base,
+  struct kvm_vcpu_substate *substate)
+{
+   return -EINVAL;
+}
+
+int kvm_arch_vcpu_set_substate(struct kvm_vcpu *vcpu, uint8_t __user *arg_base,
+  struct kvm_vcpu_substate *substate)
+{
+   return -EINVAL;
+}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 5445058..978ed6c 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -450,6 +450,18 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
return -EINVAL; /* not implemented yet */
 }
 
+int kvm_arch_vcpu_get_substate(struct kvm_vcpu *vcpu, uint8_t __user *arg_base,
+  struct kvm_vcpu_substate *substate)
+{
+   return -EINVAL;
+}
+
+int kvm_arch_vcpu_set_substate(struct kvm_vcpu *vcpu, uint8_t __user *arg_base,
+  struct kvm_vcpu_substate *substate)
+{
+   return -EINVAL;
+}
+
 static void __vcpu_run(struct kvm_vcpu *vcpu)
 {
 	memcpy(&vcpu->arch.sie_block->gg14, &vcpu->arch.guest_gprs[14], 16);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 11a6f2f..839b1c5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4662,6 +4662,18 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_put_guest_fpu);
 
+int kvm_arch_vcpu_get_substate(struct kvm_vcpu *vcpu, uint8_t __user *arg_base,
+  struct kvm_vcpu_substate *substate)
+{
+   return -EINVAL;
+}
+
+int kvm_arch_vcpu_set_substate(struct kvm_vcpu *vcpu, uint8_t __user *arg_base,
+  struct kvm_vcpu_substate *substate)
+{
+   return -EINVAL;
+}
+
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 {
 	if (vcpu->arch.time_page) {
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 7d8c382..da81b89 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -393,6 +393,26 @@ struct kvm_ioeventfd {
__u8  pad[36];
 };
 
+/* for KVM_GET_VCPU_STATE and KVM_SET_VCPU_STATE */
+#define KVM_VCPU_REGS  0
+#define KVM_VCPU_SREGS 1
+#define KVM_VCPU_FPU   2
+#define KVM_VCPU_MP    3
+
+struct kvm_vcpu_substate {
+   __u32 type;
+   __u32 pad;
+   __s64 offset;
+};
+
+#define KVM_MAX_VCPU_SUBSTATES 64
+
+struct kvm_vcpu_state {
+   __u32 nsubstates; /* number of elements in substates */
+   __u32 nprocessed; /* return value: successfully processed substates */
+   struct kvm_vcpu_substate substates[0];
+};
+
 #define KVMIO 0xAE
 
 /*
@@ -480,6 +500,7 @@ struct kvm_ioeventfd {
 #endif
 #define KVM_CAP_IOEVENTFD 36
 #define KVM_CAP_SET_IDENTITY_MAP_ADDR 37
+#define KVM_CAP_VCPU_STATE 38
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -642,6 +663,9 @@ struct kvm_irqfd {
 /* IA64 stack access */
 #define KVM_IA64_VCPU_GET_STACK   _IOR(KVMIO,  0x9a, void *)
 #define KVM_IA64_VCPU_SET_STACK   _IOW(KVMIO,  0x9b, void *)
+/* Available 
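
(The patch is cut off here in the archive.) To make the substate list more
concrete, here is a user-space sketch of how a caller might lay out one
request buffer. It rests on assumptions that the truncated patch does not
confirm: that each offset is counted in bytes from the start of the state
structure, and that payloads may simply be packed behind the substate table.
The struct names drop the kvm_ prefix to mark them as mock-ups, and
vcpu_state_alloc is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_VCPU_SUBSTATES 64

/* Mock-ups of kvm_vcpu_substate / kvm_vcpu_state with fixed-width types. */
struct vcpu_substate {
    uint32_t type;
    uint32_t pad;
    int64_t offset;   /* assumed: byte offset from the start of vcpu_state */
};

struct vcpu_state {
    uint32_t nsubstates;  /* number of elements in substates */
    uint32_t nprocessed;  /* filled in by the kernel on return */
    struct vcpu_substate substates[];
};

/*
 * Hypothetical helper: allocate one zeroed buffer holding the header, the
 * substate table and all payloads, and fill in types and offsets.
 * Payload alignment is not handled here; callers would pick suitable sizes.
 */
static struct vcpu_state *vcpu_state_alloc(const uint32_t *types,
                                           const size_t *sizes, uint32_t n)
{
    size_t header = sizeof(struct vcpu_state) + n * sizeof(struct vcpu_substate);
    size_t total = header;
    struct vcpu_state *state;
    uint32_t i;

    assert(n <= MAX_VCPU_SUBSTATES);
    for (i = 0; i < n; i++)
        total += sizes[i];

    state = calloc(1, total);
    if (state == NULL)
        return NULL;

    state->nsubstates = n;
    total = header;
    for (i = 0; i < n; i++) {
        state->substates[i].type = types[i];
        state->substates[i].offset = (int64_t)total;
        total += sizes[i];
    }
    return state;
}
```

Adding a new substate then only means appending one more table entry and
payload, which is the extensibility argument made in the commit message.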

[RFC][PATCH] kvm: x86: Add support for KVM_GET/PUT_VCPU_STATE

2009-10-13 Thread Jan Kiszka
This is a demonstration patch for the new KVM IOCTLs proposed in [1]. It
converts upstream kvm to use this in favor of the individual IOCTLs to
get/set VCPU registers and related states. It works, fixes the missing
NMI state handling but, of course, only makes sense if the interface is
accepted by kvm.

[1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/41550

---

 kvm-all.c |2 
 kvm.h |2 
 target-i386/cpu.h |1 
 target-i386/kvm.c |  507 +++--
 target-i386/machine.c |1 
 target-ppc/kvm.c  |4 
 6 files changed, 294 insertions(+), 223 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 48ae26c..31bc2f8 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -150,6 +150,7 @@ static void kvm_reset_vcpu(void *opaque)
 {
 CPUState *env = opaque;
 
+kvm_arch_reset_vcpu(env);
 if (kvm_arch_put_registers(env)) {
         fprintf(stderr, "Fatal: kvm vcpu reset failed\n");
 abort();
@@ -201,6 +202,7 @@ int kvm_init_vcpu(CPUState *env)
 ret = kvm_arch_init_vcpu(env);
 if (ret == 0) {
 qemu_register_reset(kvm_reset_vcpu, env);
+kvm_arch_reset_vcpu(env);
 ret = kvm_arch_put_registers(env);
 }
 err:
diff --git a/kvm.h b/kvm.h
index e7d5beb..6a82f6a 100644
--- a/kvm.h
+++ b/kvm.h
@@ -93,6 +93,8 @@ int kvm_arch_init(KVMState *s, int smp_cpus);
 
 int kvm_arch_init_vcpu(CPUState *env);
 
+void kvm_arch_reset_vcpu(CPUState *env);
+
 struct kvm_guest_debug;
 struct kvm_debug_exit_arch;
 
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 5929d28..37823fe 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -693,6 +693,7 @@ typedef struct CPUX86State {
 /* For KVM */
 uint64_t interrupt_bitmap[256 / 64];
 uint32_t mp_state;
+uint32_t nmi_pending;
 
 /* in order to simplify APIC support, we leave this pointer to the
user */
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index aa90eff..05ff97a 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -221,6 +221,11 @@ int kvm_arch_init_vcpu(CPUState *env)
 return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, cpuid_data);
 }
 
+void kvm_arch_reset_vcpu(CPUState *env)
+{
+    env->nmi_pending = 0;
+}
+
 static int kvm_has_msr_star(CPUState *env)
 {
 static int has_msr_star;
@@ -346,113 +351,93 @@ static void kvm_getput_reg(__u64 *kvm_reg, target_ulong *qemu_reg, int set)
         *qemu_reg = *kvm_reg;
 }
 
-static int kvm_getput_regs(CPUState *env, int set)
+static void kvm_getput_regs(CPUState *env, struct kvm_regs *regs, int set)
 {
-    struct kvm_regs regs;
-    int ret = 0;
-
-    if (!set) {
-        ret = kvm_vcpu_ioctl(env, KVM_GET_REGS, &regs);
-        if (ret < 0)
-            return ret;
-    }
-
-    kvm_getput_reg(&regs.rax, &env->regs[R_EAX], set);
-    kvm_getput_reg(&regs.rbx, &env->regs[R_EBX], set);
-    kvm_getput_reg(&regs.rcx, &env->regs[R_ECX], set);
-    kvm_getput_reg(&regs.rdx, &env->regs[R_EDX], set);
-    kvm_getput_reg(&regs.rsi, &env->regs[R_ESI], set);
-    kvm_getput_reg(&regs.rdi, &env->regs[R_EDI], set);
-    kvm_getput_reg(&regs.rsp, &env->regs[R_ESP], set);
-    kvm_getput_reg(&regs.rbp, &env->regs[R_EBP], set);
+    kvm_getput_reg(&regs->rax, &env->regs[R_EAX], set);
+    kvm_getput_reg(&regs->rbx, &env->regs[R_EBX], set);
+    kvm_getput_reg(&regs->rcx, &env->regs[R_ECX], set);
+    kvm_getput_reg(&regs->rdx, &env->regs[R_EDX], set);
+    kvm_getput_reg(&regs->rsi, &env->regs[R_ESI], set);
+    kvm_getput_reg(&regs->rdi, &env->regs[R_EDI], set);
+    kvm_getput_reg(&regs->rsp, &env->regs[R_ESP], set);
+    kvm_getput_reg(&regs->rbp, &env->regs[R_EBP], set);
 #ifdef TARGET_X86_64
-    kvm_getput_reg(&regs.r8, &env->regs[8], set);
-    kvm_getput_reg(&regs.r9, &env->regs[9], set);
-    kvm_getput_reg(&regs.r10, &env->regs[10], set);
-    kvm_getput_reg(&regs.r11, &env->regs[11], set);
-    kvm_getput_reg(&regs.r12, &env->regs[12], set);
-    kvm_getput_reg(&regs.r13, &env->regs[13], set);
-    kvm_getput_reg(&regs.r14, &env->regs[14], set);
-    kvm_getput_reg(&regs.r15, &env->regs[15], set);
+    kvm_getput_reg(&regs->r8, &env->regs[8], set);
+    kvm_getput_reg(&regs->r9, &env->regs[9], set);
+    kvm_getput_reg(&regs->r10, &env->regs[10], set);
+    kvm_getput_reg(&regs->r11, &env->regs[11], set);
+    kvm_getput_reg(&regs->r12, &env->regs[12], set);
+    kvm_getput_reg(&regs->r13, &env->regs[13], set);
+    kvm_getput_reg(&regs->r14, &env->regs[14], set);
+    kvm_getput_reg(&regs->r15, &env->regs[15], set);
 #endif
 
-    kvm_getput_reg(&regs.rflags, &env->eflags, set);
-    kvm_getput_reg(&regs.rip, &env->eip, set);
-
-    if (set)
-        ret = kvm_vcpu_ioctl(env, KVM_SET_REGS, &regs);
-
-    return ret;
+    kvm_getput_reg(&regs->rflags, &env->eflags, set);
+    kvm_getput_reg(&regs->rip, &env->eip, set);
 }
 
-static int kvm_put_fpu(CPUState *env)
+static void kvm_put_fpu(CPUState *env, struct kvm_fpu *fpu)
 {
-    struct kvm_fpu fpu;
     int i;
 
-    memset(&fpu, 0, sizeof fpu);
-    fpu.fsw = env->fpus & ~(7 << 11);
-    fpu.fsw |= (env->fpstt & 7) << 11;
-    fpu.fcw = 

Re: [PATCH] include stdlib.h in qemu-kvm.h

2009-10-13 Thread Marcelo Tosatti
On Thu, Oct 08, 2009 at 03:53:59PM -0300, Glauber Costa wrote:
 abort() needs it. Build with kvm disabled breaks without it.
 
 Signed-off-by: Glauber Costa glom...@redhat.com

Applied, thanks.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH qemu-kvm] Enable UFO on virtio-net and tap devices

2009-10-13 Thread Marcelo Tosatti
On Fri, Oct 09, 2009 at 08:11:28AM +0100, Mark McLoughlin wrote:
 On Thu, 2009-10-08 at 15:31 -0700, Sridhar Samudrala wrote:
  On Thu, 2009-10-08 at 11:07 +0100, Mark McLoughlin wrote:
   On Wed, 2009-10-07 at 14:50 -0700, Sridhar Samudrala wrote:
linux 2.6.32 includes UDP fragmentation offload support in software. 
So we can enable UFO on the host tap device if supported and allow 
setting
UFO on virtio-net in the guest.
   
   Hmm, we really need to detect whether the host has tuntap UFO support
   before advertising it to the guest. Maybe in net_tap_fd_init() we should
   toggle TUN_F_UFO back and forth and check for EINVAL?
  
  Sure. Here is an updated patch that checks for UFO support in host.
 
 Looks good to me, thanks
 
 Acked-by: Mark McLoughlin mar...@redhat.com

Applied, thanks.



Re: Modifying RAM during runtime on guest

2009-10-13 Thread Daniel Bareiro
Hi Jim.

On Wednesday, 07 October 2009 14:21:15 -0400,
Jim Paris wrote:

I noticed no-one answered this, and I just ran into the same
thing myself. As Avi pointed out earlier, it is a guest bug, and
upgrading the guest to 2.6.27 should fix it:

  http://www.mail-archive.com/kvm@vger.kernel.org/msg10849.html

   At the moment I don't have Internet connectivity at home, but as
   soon as possible I will download the necessary software to compile
   2.6.27 or later and then tell you the result of the tests.
 
  After compiling Linux 2.6.30.3 the Debian way on a Debian GNU/Linux
  Lenny guest, booting the guest with this kernel freezes at the
  "Loading, please wait..." message.
  
  In logs I don't get entries of the bootstrapping process with 2.6.30
  (I think it is because the process in itself didn't start). Can it
  be due to a bug using 2.6.30.3 in guest with host KVM-88?
 
 2.6.30.3 should work fine, there must be some other problem.  If you
 remove quiet from the boot command line you should see the kernel
 messages which may indicate the problem.  I'd also recommend just
 trying a standard prebuilt Debian kernel.

   http://packages.debian.org/squeeze/linux-image-2.6.30-1-amd64

As we discussed in this thread [1], the problem was due to a patch that
Debian developers applied to their stock kernels, enabling libata only
for systems that have a SATA controller.

For that reason the Debian stock kernel saw the disks as hdX, while the
2.6.31.2 and 2.6.30.3 kernels I compiled myself saw them as sdX. After
booting with 2.6.3x, the panic is no longer observed when restoring the
memory to its initial value.

Thanks for your reply.

Regards,
Daniel

[1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/41158
-- 
Fingerprint: BFB3 08D6 B4D1 31B2 72B9  29CE 6696 BF1B 14E6 1D37
Powered by Debian GNU/Linux Squeeze - Linux user #188.598




Re: [PATCH 00/10] Clean up vcpu context structure

2009-10-13 Thread Marcelo Tosatti
On Fri, Oct 09, 2009 at 03:03:08PM -0300, Glauber Costa wrote:
 This series aims at cleaning up the vcpu_context structure. I am not
 removing the fd field yet, because it is used in the ioctls, and I want
 to do that separately.
 
 But after this series, this structure exists only as a way to hold the
 file descriptor, and is much cleaner and much closer to upstream qemu
 than before.

Applied, thanks. 


Re: [PATCH 1/1] kvm/mmu: Resolve compile warning

2009-10-13 Thread Marcelo Tosatti
javier,

This is fixed in the -next branch of kvm.git. Thanks.

On Sun, Oct 11, 2009 at 02:28:23AM -0400, javier martinez canillas wrote:
 I got this compile warning with today linux-next:
 
 arch/x86/kvm/mmu.c: In function ‘kvm_set_pte_rmapp’:
 arch/x86/kvm/mmu.c:770: warning: cast to pointer from integer of different 
 size
 arch/x86/kvm/mmu.c: In function ‘kvm_set_spte_hva’:
 arch/x86/kvm/mmu.c:849: warning: cast from pointer to integer of different 
 size
 
 This patch solves the issue:
 
 Signed-off-by: Javier Martinez Canillas martinez.jav...@gmail.com



Re: [PATCH] KVM: x86: Drop unneeded CONFIG_HAS_IOMEM check

2009-10-13 Thread Marcelo Tosatti
On Mon, Oct 12, 2009 at 08:51:40AM +0200, Jan Kiszka wrote:
 This (broken) check dates back to the days when this code was shared
 across architectures. x86 has IOMEM, so drop it.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Applied, thanks.



Re: [PATCH 1/2] Complete cpu initialization before signaling main thread.

2009-10-13 Thread Marcelo Tosatti
On Tue, Oct 13, 2009 at 02:17:19PM +0200, Gleb Natapov wrote:
 Otherwise some cpus may start executing code before others
 are fully initialized.
 
 Signed-off-by: Gleb Natapov g...@redhat.com
 ---
  qemu-kvm.c |   26 --
  1 files changed, 12 insertions(+), 14 deletions(-)
 
 diff --git a/qemu-kvm.c b/qemu-kvm.c
 index 62ca050..3765818 100644
 --- a/qemu-kvm.c
 +++ b/qemu-kvm.c
 @@ -1954,18 +1954,6 @@ static void process_irqchip_events(CPUState *env)
  
  static int kvm_main_loop_cpu(CPUState *env)
  {
 -    setup_kernel_sigmask(env);
 -
 -    pthread_mutex_lock(&qemu_mutex);
 -
 -    kvm_arch_init_vcpu(env);
 -#ifdef TARGET_I386
 -    kvm_tpr_vcpu_start(env);
 -#endif
 -
 -    cpu_single_env = env;
 -    kvm_arch_load_regs(env);
 -
      while (1) {
          int run_cpu = !is_cpu_stopped(env);
          if (run_cpu && !kvm_irqchip_in_kernel(kvm_context)) {
 @@ -2003,15 +1991,25 @@ static void *ap_main_loop(void *_env)
          on_vcpu(env, kvm_arch_do_ioperm, data);
  #endif
  
 -    /* signal VCPU creation */
 +    setup_kernel_sigmask(env);
 +
      pthread_mutex_lock(&qemu_mutex);
 +    cpu_single_env = env;
 +
 +    kvm_arch_init_vcpu(env);
 +#ifdef TARGET_I386
 +    kvm_tpr_vcpu_start(env);
 +#endif
 +
 +    kvm_arch_load_regs(env);
 +
 +    /* signal VCPU creation */
      current_env->created = 1;
      pthread_cond_signal(&qemu_vcpu_cond);
  
      /* and wait for machine initialization */
      while (!qemu_system_ready)
          qemu_cond_wait(&qemu_system_cond);
 -    pthread_mutex_unlock(&qemu_mutex);

You don't set cpu_single_env after reacquiring 
qemu_mutex here (via qemu_cond_wait).



Re: [PATCH 1/2] Complete cpu initialization before signaling main thread.

2009-10-13 Thread Marcelo Tosatti
On Tue, Oct 13, 2009 at 03:19:08PM -0300, Marcelo Tosatti wrote:
  @@ -2003,15 +1991,25 @@ static void *ap_main_loop(void *_env)
           on_vcpu(env, kvm_arch_do_ioperm, data);
   #endif
   
  -    /* signal VCPU creation */
  +    setup_kernel_sigmask(env);
  +
       pthread_mutex_lock(&qemu_mutex);
  +    cpu_single_env = env;
  +
  +    kvm_arch_init_vcpu(env);
  +#ifdef TARGET_I386
  +    kvm_tpr_vcpu_start(env);
  +#endif
  +
  +    kvm_arch_load_regs(env);
  +
  +    /* signal VCPU creation */
       current_env->created = 1;
       pthread_cond_signal(&qemu_vcpu_cond);
   
       /* and wait for machine initialization */
       while (!qemu_system_ready)
           qemu_cond_wait(&qemu_system_cond);
  -    pthread_mutex_unlock(&qemu_mutex);
 
 You don't set cpu_single_env after reacquiring 
 qemu_mutex here (via qemu_cond_wait).
 

Also, I'm curious about the failure.

Why, say, should the BSP care about other CPUs' register state while
doing MP init?

MP state is set via apic_reset, which happens before qemu_system_ready
is set.




Re: [PATCH 1/2] Complete cpu initialization before signaling main thread.

2009-10-13 Thread Gleb Natapov
On Tue, Oct 13, 2009 at 03:19:08PM -0300, Marcelo Tosatti wrote:
 On Tue, Oct 13, 2009 at 02:17:19PM +0200, Gleb Natapov wrote:
  Otherwise some cpus may start executing code before others
  are fully initialized.
  
  Signed-off-by: Gleb Natapov g...@redhat.com
  ---
   qemu-kvm.c |   26 --
   1 files changed, 12 insertions(+), 14 deletions(-)
  
  diff --git a/qemu-kvm.c b/qemu-kvm.c
  index 62ca050..3765818 100644
  --- a/qemu-kvm.c
  +++ b/qemu-kvm.c
  @@ -1954,18 +1954,6 @@ static void process_irqchip_events(CPUState *env)
   
   static int kvm_main_loop_cpu(CPUState *env)
   {
  -    setup_kernel_sigmask(env);
  -
  -    pthread_mutex_lock(&qemu_mutex);
  -
  -    kvm_arch_init_vcpu(env);
  -#ifdef TARGET_I386
  -    kvm_tpr_vcpu_start(env);
  -#endif
  -
  -    cpu_single_env = env;
  -    kvm_arch_load_regs(env);
  -
       while (1) {
           int run_cpu = !is_cpu_stopped(env);
           if (run_cpu && !kvm_irqchip_in_kernel(kvm_context)) {
  @@ -2003,15 +1991,25 @@ static void *ap_main_loop(void *_env)
           on_vcpu(env, kvm_arch_do_ioperm, data);
   #endif
   
  -    /* signal VCPU creation */
  +    setup_kernel_sigmask(env);
  +
       pthread_mutex_lock(&qemu_mutex);
  +    cpu_single_env = env;
  +
  +    kvm_arch_init_vcpu(env);
  +#ifdef TARGET_I386
  +    kvm_tpr_vcpu_start(env);
  +#endif
  +
  +    kvm_arch_load_regs(env);
  +
  +    /* signal VCPU creation */
       current_env->created = 1;
       pthread_cond_signal(&qemu_vcpu_cond);
   
       /* and wait for machine initialization */
       while (!qemu_system_ready)
           qemu_cond_wait(&qemu_system_cond);
  -    pthread_mutex_unlock(&qemu_mutex);
 
 You don't set cpu_single_env after reacquiring 
 qemu_mutex here (via qemu_cond_wait).
Hmm, as far as I can see it is not used any more until the kvm_run call.
But maybe we should set it anyway.

--
Gleb.


Re: [PATCH 2/2] Don't sync mpstate to/from kernel when unneeded.

2009-10-13 Thread Marcelo Tosatti
On Tue, Oct 13, 2009 at 02:17:20PM +0200, Gleb Natapov wrote:
 mp_state, unlike other cpu state, can be changed not only from the vcpu
 context it belongs to, but by other vcpus too. That makes loading it
 from the kernel and saving it back unsafe if the mp_state value changes
 inside the kernel between load and save. For example, vcpu 1 loads
 mp_state into user-space and the state is RUNNING, vcpu 0 sends
 INIT/SIPI to vcpu 1 so the in-kernel mp_state becomes SIPI, vcpu 1 saves
 the user-space copy into the kernel and calls vcpu_run(). The SIPI state
 is lost.
 
 The patch copies mp_state into the kernel only when it is known that
 the in-kernel value is outdated. This happens on reset and vmload.
 
 Signed-off-by: Gleb Natapov g...@redhat.com
 ---
  hw/apic.c |1 +
  monitor.c |2 ++
  qemu-kvm.c|9 -
  qemu-kvm.h|1 -
  target-i386/machine.c |3 +++
  5 files changed, 10 insertions(+), 6 deletions(-)
 
 diff --git a/hw/apic.c b/hw/apic.c
 index 2952675..729 100644
 --- a/hw/apic.c
 +++ b/hw/apic.c
 @@ -512,6 +512,7 @@ void apic_init_reset(CPUState *env)
      if (kvm_enabled() && qemu_kvm_irqchip_in_kernel()) {
          env->mp_state
              = env->halted ? KVM_MP_STATE_UNINITIALIZED : KVM_MP_STATE_RUNNABLE;
 +        kvm_load_mpstate(env);
      }
  #endif
  }
 diff --git a/monitor.c b/monitor.c
 index 7f0f5a9..dd8f2ca 100644
 --- a/monitor.c
 +++ b/monitor.c
 @@ -350,6 +350,7 @@ static CPUState *mon_get_cpu(void)
          mon_set_cpu(0);
      }
      cpu_synchronize_state(cur_mon->mon_cpu);
 +    kvm_save_mpstate(cur_mon->mon_cpu);
      return cur_mon->mon_cpu;
  }
  
 @@ -377,6 +378,7 @@ static void do_info_cpus(Monitor *mon)
  
      for(env = first_cpu; env != NULL; env = env->next_cpu) {
          cpu_synchronize_state(env);
 +        kvm_save_mpstate(env);
          monitor_printf(mon, "%c CPU #%d:",
                         (env == mon->mon_cpu) ? '*' : ' ',
                         env->cpu_index);
 diff --git a/qemu-kvm.c b/qemu-kvm.c
 index 3765818..2a1e0ff 100644
 --- a/qemu-kvm.c
 +++ b/qemu-kvm.c
 @@ -1609,11 +1609,6 @@ static void on_vcpu(CPUState *env, void (*func)(void *data), void *data)
  void kvm_arch_get_registers(CPUState *env)
  {
      kvm_arch_save_regs(env);
 -    kvm_arch_save_mpstate(env);
 -#ifdef KVM_CAP_MP_STATE
 -    if (kvm_irqchip_in_kernel(kvm_context))
 -        env->halted = (env->mp_state == KVM_MP_STATE_HALTED);
 -#endif

Why don't you keep saving it here (so there's no need to do it
explicitly elsewhere), and only load it explicitly?



Re: [PATCH 2/2] Don't sync mpstate to/from kernel when unneeded.

2009-10-13 Thread Gleb Natapov
On Tue, Oct 13, 2009 at 03:36:13PM -0300, Marcelo Tosatti wrote:
 On Tue, Oct 13, 2009 at 02:17:20PM +0200, Gleb Natapov wrote:
  mp_state, unlike other cpu state, can be changed not only from the vcpu
  context it belongs to, but by other vcpus too. That makes loading it
  from the kernel and saving it back unsafe if the mp_state value changes
  inside the kernel between load and save. For example, vcpu 1 loads
  mp_state into user-space and the state is RUNNING, vcpu 0 sends
  INIT/SIPI to vcpu 1 so the in-kernel mp_state becomes SIPI, vcpu 1 saves
  the user-space copy into the kernel and calls vcpu_run(). The SIPI state
  is lost.
  
  The patch copies mp_state into the kernel only when it is known that
  the in-kernel value is outdated. This happens on reset and vmload.
  
  Signed-off-by: Gleb Natapov g...@redhat.com
  ---
   hw/apic.c |1 +
   monitor.c |2 ++
   qemu-kvm.c|9 -
   qemu-kvm.h|1 -
   target-i386/machine.c |3 +++
   5 files changed, 10 insertions(+), 6 deletions(-)
  
  diff --git a/hw/apic.c b/hw/apic.c
  index 2952675..729 100644
  --- a/hw/apic.c
  +++ b/hw/apic.c
  @@ -512,6 +512,7 @@ void apic_init_reset(CPUState *env)
       if (kvm_enabled() && qemu_kvm_irqchip_in_kernel()) {
           env->mp_state
               = env->halted ? KVM_MP_STATE_UNINITIALIZED : KVM_MP_STATE_RUNNABLE;
  +        kvm_load_mpstate(env);
       }
   #endif
   }
  diff --git a/monitor.c b/monitor.c
  index 7f0f5a9..dd8f2ca 100644
  --- a/monitor.c
  +++ b/monitor.c
  @@ -350,6 +350,7 @@ static CPUState *mon_get_cpu(void)
           mon_set_cpu(0);
       }
       cpu_synchronize_state(cur_mon->mon_cpu);
  +    kvm_save_mpstate(cur_mon->mon_cpu);
       return cur_mon->mon_cpu;
   }
   
  @@ -377,6 +378,7 @@ static void do_info_cpus(Monitor *mon)
   
       for(env = first_cpu; env != NULL; env = env->next_cpu) {
           cpu_synchronize_state(env);
  +        kvm_save_mpstate(env);
           monitor_printf(mon, "%c CPU #%d:",
                          (env == mon->mon_cpu) ? '*' : ' ',
                          env->cpu_index);
  diff --git a/qemu-kvm.c b/qemu-kvm.c
  index 3765818..2a1e0ff 100644
  --- a/qemu-kvm.c
  +++ b/qemu-kvm.c
  @@ -1609,11 +1609,6 @@ static void on_vcpu(CPUState *env, void (*func)(void *data), void *data)
   void kvm_arch_get_registers(CPUState *env)
   {
       kvm_arch_save_regs(env);
  -    kvm_arch_save_mpstate(env);
  -#ifdef KVM_CAP_MP_STATE
  -    if (kvm_irqchip_in_kernel(kvm_context))
  -        env->halted = (env->mp_state == KVM_MP_STATE_HALTED);
  -#endif
 
 Why don't you keep saving it here (so there's no need to do it
 explicitly elsewhere), and only load it explicitly?
To keep kvm_arch_get_registers/kvm_arch_set_registers symmetric I guess.

--
Gleb.


Re: [PATCH 1/1] kvm/mmu: Resolve compile warning

2009-10-13 Thread Javier Martinez Canillas
Thank you. Sorry for the noise.

Best regards

-
Javier Martínez Canillas
+595 981 88 66 58



On Tue, Oct 13, 2009 at 1:10 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
 javier,

 This is fixed in the -next branch of kvm.git. Thanks.

 On Sun, Oct 11, 2009 at 02:28:23AM -0400, javier martinez canillas wrote:
 I got this compile warning with today linux-next:

 arch/x86/kvm/mmu.c: In function ‘kvm_set_pte_rmapp’:
 arch/x86/kvm/mmu.c:770: warning: cast to pointer from integer of different 
 size
 arch/x86/kvm/mmu.c: In function ‘kvm_set_spte_hva’:
 arch/x86/kvm/mmu.c:849: warning: cast from pointer to integer of different 
 size

 This patch solves the issue:

 Signed-off-by: Javier Martinez Canillas martinez.jav...@gmail.com




[PATCH v2] qemu-kvm: Fix configure to respect --kerneldir

2009-10-13 Thread Jan Kiszka
This simplifies working with new features without having to update the
locally mirrored headers. It also reduces the diff to upstream.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
v2: Rebase over git head

 configure |   46 --
 1 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/configure b/configure
index 2341772..fdefcf6 100755
--- a/configure
+++ b/configure
@@ -1346,24 +1346,7 @@ fi
 ##
 # kvm probe
 if test $kvm != no ; then
-  case $cpu in
-  i386 | x86_64)
-kvm_arch=x86
-;;
-  ppc)
-kvm_arch=powerpc
-;;
-  *)
-kvm_arch=$cpu
-;;
-  esac
-
-  kvm_cflags=-I$source_path/kvm/include
-  kvm_cflags=$kvm_cflags -include $source_path/kvm/include/linux/config.h
-  kvm_cflags=$kvm_cflags -I$source_path/kvm/include/$kvm_arch
-  kvm_cflags=$kvm_cflags -idirafter $source_path/compat
-
-  cat > $TMPC <<EOF
+cat > $TMPC <<EOF
 #include <linux/kvm.h>
 #if !defined(KVM_API_VERSION) || KVM_API_VERSION < 12 || KVM_API_VERSION > 12
 #error Invalid KVM version
@@ -1379,6 +1362,33 @@ if test $kvm != no ; then
 #endif
 int main(void) { return 0; }
 EOF
+  if test $kerneldir !=  ; then
+  kvm_cflags=-I$kerneldir/include
+  if test \( $cpu = i386 -o $cpu = x86_64 \) \
+ -a -d $kerneldir/arch/x86/include ; then
+kvm_cflags=$kvm_cflags -I$kerneldir/arch/x86/include
+   elif test $cpu = ppc -a -d $kerneldir/arch/powerpc/include ; then
+   kvm_cflags=$kvm_cflags -I$kerneldir/arch/powerpc/include
+elif test -d $kerneldir/arch/$cpu/include ; then
+kvm_cflags=$kvm_cflags -I$kerneldir/arch/$cpu/include
+  fi
+  else
+  case $cpu in
+  i386 | x86_64)
+kvm_arch=x86
+;;
+  ppc)
+kvm_arch=powerpc
+;;
+  *)
+kvm_arch=$cpu
+;;
+  esac
+  kvm_cflags=-I$source_path/kvm/include
+  kvm_cflags=$kvm_cflags -include $source_path/kvm/include/linux/config.h
+  kvm_cflags=$kvm_cflags -I$source_path/kvm/include/$kvm_arch
+  fi
+  kvm_cflags=$kvm_cflags -idirafter $source_path/compat
   if compile_prog $kvm_cflags  ; then
 kvm=yes
   else





Re: [Autotest] [PATCH] Add a kvm test guest_s4 which supports both Linux and Windows platform

2009-10-13 Thread Lucas Meneghel Rodrigues
Hi Yolkfull and Chen:

Thanks for your test! I have some comments and doubts to clear, most
of them are about content of the messages delivered for the user and
some other details.

On Sun, Sep 27, 2009 at 6:11 AM, Yolkfull Chow yz...@redhat.com wrote:
 For this case, Ken Cao wrote the linux part previously and I did extensive
 modifications on Windows platform support.

 Signed-off-by: Ken Cao k...@redhat.com
 Signed-off-by: Yolkfull Chow yz...@redhat.com
 ---
  client/tests/kvm/kvm_tests.cfg.sample |   14 +++
  client/tests/kvm/tests/guest_s4.py    |   66 
 +
  2 files changed, 80 insertions(+), 0 deletions(-)
  create mode 100644 client/tests/kvm/tests/guest_s4.py

 diff --git a/client/tests/kvm/kvm_tests.cfg.sample 
 b/client/tests/kvm/kvm_tests.cfg.sample
 index 285a38f..f9ecb61 100644
 --- a/client/tests/kvm/kvm_tests.cfg.sample
 +++ b/client/tests/kvm/kvm_tests.cfg.sample
 @@ -94,6 +94,14 @@ variants:
     - linux_s3:     install setup
         type = linux_s3

 +    - guest_s4:
 +        type = guest_s4
 +        check_s4_support_cmd = grep -q disk /sys/power/state
 +        test_s4_cmd = cd /tmp/;nohup tcpdump -q -t ip host localhost
 +        check_s4_cmd = pgrep tcpdump
 +        set_s4_cmd = echo disk  /sys/power/state
 +        kill_test_s4_cmd = pkill tcpdump
 +
     - timedrift:    install setup
         type = timedrift
         extra_params +=  -rtc-td-hack
 @@ -382,6 +390,12 @@ variants:
             # Alternative host load:
             #host_load_command = dd if=/dev/urandom of=/dev/null
             host_load_instances = 8
 +        guest_s4:
 +            check_s4_support_cmd = powercfg /hibernate on
 +            test_s4_cmd = start /B ping -n 3000 localhost
 +            check_s4_cmd = tasklist | find /I ping
 +            set_s4_cmd = rundll32.exe PowrProf.dll, SetSuspendState
 +            kill_test_s4_cmd = taskkill /IM ping.exe /F

         variants:
             - Win2000:
 diff --git a/client/tests/kvm/tests/guest_s4.py 
 b/client/tests/kvm/tests/guest_s4.py
 new file mode 100644
 index 000..5d8fbdf
 --- /dev/null
 +++ b/client/tests/kvm/tests/guest_s4.py
 @@ -0,0 +1,66 @@
 +import logging, time
 +from autotest_lib.client.common_lib import error
 +import kvm_test_utils, kvm_utils
 +
 +
 +def run_guest_s4(test, params, env):
 +    """
 +    Suspend guest to disk, supports both Linux & Windows OSes.
 +
 +    @param test: kvm test object.
 +    @param params: Dictionary with test parameters.
 +    @param env: Dictionary with the test environment.
 +    """
 +    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
 +    session = kvm_test_utils.wait_for_login(vm)
 +
 +    logging.info("Checking whether VM supports S4")
 +    status = session.get_command_status(params.get("check_s4_support_cmd"))
 +    if status is None:
 +        logging.error("Failed to check if S4 exists")
 +    elif status != 0:
 +        raise error.TestFail("Guest does not support S4")
 +
 +    logging.info("Waiting for a while for X to start...")

Yes, generally X starts a bit later than the SSH service, so I
understand the sleep being here, however:

 * In fact we are waiting for all services of the guest to be up and
functional, so depending on the level of load, I don't think 10s is
gonna make it. So I suggest something >= 30s
 * It's also true that just waiting for a given time and hoping that it
will be OK kinda sucks, so ideally we need to write utility functions
to establish as well as possible when all services of a host are fully
booted up. Stated this way, it looks simple, but it's not.

Autotest experience suggests that there's no real sane way to
determine when a linux box is booted up, but we can take a
semi-rational approach and verify if all services for the current run
level have the status up or a similar approach. For windows, I was
talking to Yaniv Kaul and it seems that processing the output of the
'sc query' command might give what we want. Bottom line, I'd like to
add a TODO item, and write a function to establish (fairly confidently)
that a windows/linux guest is booted up.
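
A rough sketch of what such a function could build on: poll a
readiness check until it succeeds or a timeout expires, instead of
sleeping for a fixed time. The wait_until() helper and the
services_up() check are hypothetical names made up for this example,
not existing autotest API; a real check would have to inspect the
guest's services, which is the hard part.

```python
import time


def wait_until(check, timeout, step=1.0, clock=time.time, sleep=time.sleep):
    """Poll check() until it returns a true value or timeout seconds pass.

    Returns the last value of check(), so the caller can tell success
    from a timeout.
    """
    deadline = clock() + timeout
    result = check()
    while not result and clock() < deadline:
        sleep(step)
        result = check()
    return result


if __name__ == "__main__":
    # Hypothetical readiness check: pretend the guest is ready on the
    # third poll.
    polls = {"count": 0}

    def services_up():
        polls["count"] += 1
        return polls["count"] >= 3

    print(wait_until(services_up, timeout=5, step=0.01))
```

The timeout then only bounds the worst case, so a generous value does
not slow down guests that boot quickly.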

 +    time.sleep(10)
 +
 +    # Start up a program(tcpdump for linux OS & ping for M$ OS), as a flag.
 +    # If the program died after suspend, then fails this testcase.
 +    test_s4_cmd = params.get("test_s4_cmd")
 +    session.sendline(test_s4_cmd)
 +
 +    # Get the second session to start S4
 +    session2 = kvm_test_utils.wait_for_login(vm)
 +
 +    check_s4_cmd = params.get("check_s4_cmd")
 +    if session2.get_command_status(check_s4_cmd):
 +        raise error.TestError("Failed to launch %s background" % test_s4_cmd)
 +    logging.info("Launched command background in guest: %s" % test_s4_cmd)
 +
 +    # Implement S4
 +    logging.info("Start suspend to disk now...")
 +    session2.sendline(params.get("set_s4_cmd"))
 +
 +    if not kvm_utils.wait_for(vm.is_dead, 360, 30, 2):
 +        raise error.TestFail("VM refuse to go down,suspend failed")
 +    logging.info("VM suspended successfully.")
 

Re: [Autotest] [PATCH 4/6] KVM test: Add unattended install script

2009-10-13 Thread Ryan Harper
* Lucas Meneghel Rodrigues l...@redhat.com [2009-10-09 15:41]:
 In order to make it possible to prepare the environment
 for the guests installation, we have to:
 
 

 +class UnattendedInstall(object):
 +    """
 +    Creates a floppy disk image that will contain a config file for unattended
 +    OS install. Optionally, sets up a PXE install server using qemu built in
 +    TFTP and DHCP servers to install a particular operating system. The
 +    parameters to the script are retrieved from environment variables.
 +    """
 +    def __init__(self):
 +        """
 +        Gets params from environment variables and sets class attributes.
 +        """
 +        script_dir = os.path.dirname(sys.modules[__name__].__file__)
 +        kvm_test_dir = os.path.abspath(os.path.join(script_dir, ".."))
 +        images_dir = os.path.join(kvm_test_dir, 'images')
 +        self.deps_dir = os.path.join(kvm_test_dir, 'deps')
 +        self.unattended_dir = os.path.join(kvm_test_dir, 'unattended')
 +
 +        try:
 +            tftp_root = os.environ['KVM_TEST_tftp']
 +            self.tftp_root = os.path.join(images_dir, tftp_root)

Testing this out, the directory is just slightly wrong.  My tftp_root
value ends up being:

/home/rharper/work/git/autotest/client/tests/kvm/images/images/tftpboot

The tftp param is built in kvm_vm.py by combining root_dir
(/home/rharper/work/git/autotest/client/tests/kvm) + the tftp value from
kvm_tests.cfg (defaults to 'images/tftpboot').  So if we want to keep
the relative tftp path in kvm_tests.cfg, then I think we need to update
the unattended script.

I think what we want instead is:

self.tftp_root = os.path.join(kvm_test_dir, tftp_root)

I made this small change and I can now run the fc11 unattended install.
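
For illustration, the doubled directory can be reproduced with a few
lines. This sketch assumes the KVM_TEST_tftp value seen by the script
is the relative default 'images/tftpboot'; it uses posixpath (the
POSIX flavour of os.path) only so the result is the same on any
platform.

```python
import posixpath

# Paths taken from the report above; variable names mirror the script.
kvm_test_dir = "/home/rharper/work/git/autotest/client/tests/kvm"
images_dir = posixpath.join(kvm_test_dir, "images")
tftp_root = "images/tftpboot"  # assumed value of KVM_TEST_tftp

# Current code: joins onto images_dir, so 'images' appears twice.
doubled = posixpath.join(images_dir, tftp_root)
# Proposed change: join onto kvm_test_dir instead.
fixed = posixpath.join(kvm_test_dir, tftp_root)

print(doubled)
print(fixed)
```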

diff --git a/client/tests/kvm/scripts/unattended.py 
b/client/tests/kvm/scripts/unattended.py
index 6ceeef1..febea6e 100755
--- a/client/tests/kvm/scripts/unattended.py
+++ b/client/tests/kvm/scripts/unattended.py
@@ -33,7 +33,7 @@ class UnattendedInstall(object):
 
         try:
             tftp_root = os.environ['KVM_TEST_tftp']
-            self.tftp_root = os.path.join(images_dir, tftp_root)
+            self.tftp_root = os.path.join(kvm_test_dir, tftp_root)
             if not os.path.isdir(self.tftp_root):
                 os.makedirs(self.tftp_root)
         except KeyError:


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com


[PATCH] v3: allow userspace to adjust kvmclock offset

2009-10-13 Thread Glauber Costa
When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.

Situation is much worse when we do the opposite, and migrate to a host with
a smaller monotonic clock.

This proposed ioctl will allow userspace to inform us what the monotonic
clock value in the source host is, so we can keep the time skew small and,
more importantly, make sure time never goes backwards. Userspace may also
need to read the current clock data, since from the first migration
onwards, it won't be reflected by a simple call to clock_gettime() anymore.
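
The arithmetic behind the two ioctls can be sketched as follows. This
is a toy model in Python, not the kernel code: set_clock() and
guest_clock() are names made up for the example, and the numbers are
invented.

```python
def set_clock(source_clock_ns, host_monotonic_ns):
    """Model of KVM_SET_CLOCK: the offset to store for this VM."""
    return source_clock_ns - host_monotonic_ns


def guest_clock(host_monotonic_ns, offset_ns):
    """Model of KVM_GET_CLOCK: host monotonic time plus the offset."""
    return host_monotonic_ns + offset_ns


# Invented numbers: the source host has been up far longer than the
# destination host.
source_clock_ns = 5000 * 10**9
dest_monotonic_ns = 100 * 10**9

offset = set_clock(source_clock_ns, dest_monotonic_ns)
print(offset)
# Right after migration the guest sees the source value again...
print(guest_clock(dest_monotonic_ns, offset))
# ...and one second later it has advanced by exactly one second.
print(guest_clock(dest_monotonic_ns + 10**9, offset))
```

Migrating in the opposite direction (to a host with a larger monotonic
clock) simply yields a negative offset.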

[ v2: uses a struct with a padding ]
[ v3: provide an ioctl to get clock data too ]

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/x86.c  |   35 ++-
 include/linux/kvm.h |7 +++
 3 files changed, 42 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 179a919..c9b0d9f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -410,6 +410,7 @@ struct kvm_arch{
 
unsigned long irq_sources_bitmap;
u64 vm_init_tsc;
+   s64 kvmclock_offset;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9601bc6..58a380a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -699,7 +699,8 @@ static void kvm_write_guest_time(struct kvm_vcpu *v)
/* With all the info we got, fill in the values */
 
         vcpu->hv_clock.system_time = ts.tv_nsec +
-                                     (NSEC_PER_SEC * (u64)ts.tv_sec);
+                                     (NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset;
+
/*
 * The interface expects us to write an even number signaling that the
 * update is finished. Since the guest won't see the intermediate
@@ -2441,6 +2442,38 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = 0;
break;
}
+        case KVM_SET_CLOCK: {
+                struct timespec now;
+                struct kvm_clock_data user_ns;
+                u64 now_ns;
+                long delta;
+
+                r = -EFAULT;
+                if (copy_from_user(&user_ns, argp, sizeof(user_ns)))
+                        goto out;
+
+                r = 0;
+                ktime_get_ts(&now);
+                now_ns = timespec_to_ns(&now);
+                delta = user_ns.clock - now_ns;
+                kvm->arch.kvmclock_offset = delta;
+                break;
+        }
+        case KVM_GET_CLOCK: {
+                struct timespec now;
+                struct kvm_clock_data user_ns;
+                u64 now_ns;
+
+                ktime_get_ts(&now);
+                now_ns = timespec_to_ns(&now);
+                user_ns.clock = kvm->arch.kvmclock_offset + now_ns;
+
+                if (copy_to_user(argp, &user_ns, sizeof(user_ns)))
+                        r = -EFAULT;
+
+                break;
+        }
+
default:
;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index f8f8900..ad0ecbc 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -497,6 +497,11 @@ struct kvm_irqfd {
__u8  pad[20];
 };
 
+struct kvm_clock_data {
+   __u64 clock;
+   __u64 pad[2];
+};
+
 /*
  * ioctls for VM fds
  */
@@ -546,6 +551,8 @@ struct kvm_irqfd {
 #define KVM_CREATE_PIT2           _IOW(KVMIO, 0x77, struct kvm_pit_config)
 #define KVM_SET_BOOT_CPU_ID        _IO(KVMIO, 0x78)
 #define KVM_IOEVENTFD             _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
+#define KVM_SET_CLOCK             _IOW(KVMIO, 0x7a, struct kvm_clock_data)
+#define KVM_GET_CLOCK             _IOW(KVMIO, 0x7b, struct kvm_clock_data)
 
 /*
  * ioctls for vcpu fds
-- 
1.6.2.2



Re: [PATCH] v3: allow userspace to adjust kvmclock offset

2009-10-13 Thread Frederik Deweerdt
On Tue, Oct 13, 2009 at 04:55:05PM -0400, Glauber Costa wrote:
 +        case KVM_SET_CLOCK: {
 +                struct timespec now;
 +                struct kvm_clock_data user_ns;
 +                u64 now_ns;
 +                long delta;

Shouldn't that read s64? I guess such a large value won't happen in
practice, but the 32-bit case would truncate the value differently in
the subtraction below.
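
To make the concern concrete, here is a small Python model of the
assignment, not kernel code: to_signed() is a helper made up for the
example and the clock values are invented. It compares storing the
u64 difference in a 64-bit signed type with storing it in a 32-bit
long.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    mask = (1 << bits) - 1
    value &= mask
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


user_clock_ns = 5000 * 10**9   # invented clock value passed in by userspace
now_ns = 100 * 10**9           # invented monotonic time on the new host

# The u64 subtraction wraps modulo 2**64.
diff = (user_clock_ns - now_ns) & (2**64 - 1)

print(to_signed(diff, 64))  # delta as s64 (or as a 64-bit long)
print(to_signed(diff, 32))  # delta as a 32-bit long: upper bits are dropped
```

Any difference beyond roughly 2.1 seconds worth of nanoseconds no
longer fits in 32 signed bits.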

Regards,
Frederik
 +
 +                r = -EFAULT;
 +                if (copy_from_user(&user_ns, argp, sizeof(user_ns)))
 +                        goto out;
 +
 +                r = 0;
 +                ktime_get_ts(&now);
 +                now_ns = timespec_to_ns(&now);
 +                delta = user_ns.clock - now_ns;
 +                kvm->arch.kvmclock_offset = delta;
 +                break;
 +        }
 +        case KVM_GET_CLOCK: {
 +                struct timespec now;
 +                struct kvm_clock_data user_ns;
 +                u64 now_ns;
 +
 +                ktime_get_ts(&now);
 +                now_ns = timespec_to_ns(&now);
 +                user_ns.clock = kvm->arch.kvmclock_offset + now_ns;
 +
 +                if (copy_to_user(argp, &user_ns, sizeof(user_ns)))
 +                        r = -EFAULT;
 +
 +                break;
 +        }
 +
   default:
   ;
   }
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index f8f8900..ad0ecbc 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -497,6 +497,11 @@ struct kvm_irqfd {
   __u8  pad[20];
  };
  
 +struct kvm_clock_data {
 + __u64 clock;
 + __u64 pad[2];
 +};
 +
  /*
   * ioctls for VM fds
   */
 @@ -546,6 +551,8 @@ struct kvm_irqfd {
  #define KVM_CREATE_PIT2     _IOW(KVMIO, 0x77, struct kvm_pit_config)
  #define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78)
  #define KVM_IOEVENTFD       _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
 +#define KVM_SET_CLOCK       _IOW(KVMIO, 0x7a, struct kvm_clock_data)
 +#define KVM_GET_CLOCK       _IOW(KVMIO, 0x7b, struct kvm_clock_data)
  
  /*
   * ioctls for vcpu fds
 -- 
 1.6.2.2
 


Re: [Autotest] [PATCH 4/6] KVM test: Add unattended install script

2009-10-13 Thread Lucas Meneghel Rodrigues
On Tue, Oct 13, 2009 at 5:52 PM, Ryan Harper ry...@us.ibm.com wrote:
 * Lucas Meneghel Rodrigues l...@redhat.com [2009-10-09 15:41]:
 In order to make it possible to prepare the environment
 for the guests installation, we have to:



 +class UnattendedInstall(object):
 +    """
 +    Creates a floppy disk image that will contain a config file for unattended
 +    OS install. Optionally, sets up a PXE install server using qemu built in
 +    TFTP and DHCP servers to install a particular operating system. The
 +    parameters to the script are retrieved from environment variables.
 +    """
 +    def __init__(self):
 +        """
 +        Gets params from environment variables and sets class attributes.
 +        """
 +        script_dir = os.path.dirname(sys.modules[__name__].__file__)
 +        kvm_test_dir = os.path.abspath(os.path.join(script_dir, ".."))
 +        images_dir = os.path.join(kvm_test_dir, 'images')
 +        self.deps_dir = os.path.join(kvm_test_dir, 'deps')
 +        self.unattended_dir = os.path.join(kvm_test_dir, 'unattended')
 +
 +        try:
 +            tftp_root = os.environ['KVM_TEST_tftp']
 +            self.tftp_root = os.path.join(images_dir, tftp_root)

 Testing this out, the directory is just slightly wrong.  My tftp_root
 value ends up being:

 /home/rharper/work/git/autotest/client/tests/kvm/images/images/tftpboot

 The tftp param is built in kvm_vm.py by combining root_dir
 (/home/rharper/work/git/autotest/client/tests/kvm) + the tftp value from
 kvm_tests.cfg (defaults to 'images/tftpboot').  So if we want to keep
 the relative tftp path in kvm_tests.cfg, then I think we need to update
 the unattended script.

 I think what we want instead is:

 self.tftp_root = os.path.join(kvm_test_dir, tftp_root)

 I made this small change and I can now run the fc11 unattended install.
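
The doubled directory is easy to reproduce in isolation. A small sketch, assuming the environment variable carries the relative default 'images/tftpboot' from kvm_tests.cfg; the two helper names are made up for illustration and do not exist in unattended.py:

```python
import posixpath


def old_tftp_root(kvm_test_dir, tftp):
    """Behaviour before the fix: join onto the images directory."""
    images_dir = posixpath.join(kvm_test_dir, "images")
    return posixpath.join(images_dir, tftp)


def new_tftp_root(kvm_test_dir, tftp):
    """Behaviour after the fix: join onto the kvm test directory."""
    return posixpath.join(kvm_test_dir, tftp)


kvm_test_dir = "/home/rharper/work/git/autotest/client/tests/kvm"
tftp = "images/tftpboot"  # relative default, per kvm_tests.cfg

print(old_tftp_root(kvm_test_dir, tftp))  # .../kvm/images/images/tftpboot
print(new_tftp_root(kvm_test_dir, tftp))  # .../kvm/images/tftpboot
```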

Thanks for pointing this out! I thought I had fixed this in the final
patchset version I committed, but it turns out I hadn't :)

Committed as

http://autotest.kernel.org/changeset/3842

 diff --git a/client/tests/kvm/scripts/unattended.py b/client/tests/kvm/scripts/unattended.py
 index 6ceeef1..febea6e 100755
 --- a/client/tests/kvm/scripts/unattended.py
 +++ b/client/tests/kvm/scripts/unattended.py
 @@ -33,7 +33,7 @@ class UnattendedInstall(object):

         try:
             tftp_root = os.environ['KVM_TEST_tftp']
 -            self.tftp_root = os.path.join(images_dir, tftp_root)
 +            self.tftp_root = os.path.join(kvm_test_dir, tftp_root)
             if not os.path.isdir(self.tftp_root):
                 os.makedirs(self.tftp_root)
         except KeyError:


 --
 Ryan Harper
 Software Engineer; Linux Technology Center
 IBM Corp., Austin, Tx
 ry...@us.ibm.com
 ___
 Autotest mailing list
 autot...@test.kernel.org
 http://test.kernel.org/cgi-bin/mailman/listinfo/autotest




-- 
Lucas


Re: [PATCH] device assignment rom fixups

2009-10-13 Thread Marcelo Tosatti
On Tue, Oct 13, 2009 at 05:20:34PM +0200, Gerd Hoffmann wrote:
 Use new rom loading infrastructure.
 Devices can simply register option roms now.
 
 Signed-off-by: Gerd Hoffmann kra...@redhat.com

Applied, thanks.




Re: [PATCH v2] qemu-kvm: Fix configure to respect --kerneldir

2009-10-13 Thread Marcelo Tosatti
On Tue, Oct 13, 2009 at 09:01:09PM +0200, Jan Kiszka wrote:
 This simplifies working with new features without having to update the
 locally mirrored headers. It also reduces the diff to upstream.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Applied, thanks.



Re: sync guest calls made async on host - SQLite performance

2009-10-13 Thread Christoph Hellwig
On Sun, Oct 11, 2009 at 11:16:42AM +0200, Avi Kivity wrote:
if scsi is used, you incur the cost of virtualization,
if virtio is used, your guests fsyncs incur less cost.
 
 So back to the question to the kvm team.  It appears that with the 
 stock KVM setup customers who need higher data integrity (through 
 fsync) should steer away from virtio for the moment.
 
 Is that assessment correct?
 
 
 Christoph, wasn't there a bug where the guest didn't wait for requests 
 in response to a barrier request?

Can't remember anything like that.  The bug was the complete lack of
cache flush infrastructure for virtio, and the lack of advertising a
volatile write cache on ide.



Re: sync guest calls made async on host - SQLite performance

2009-10-13 Thread Anthony Liguori

Matthew Tippett wrote:

Thanks Duncan for reproducing the behavior outside myself and Phoronix.

I dug deeper into the actual syscalls being made by sqlite.  The 
salient part of the behaviour is small sequential writes followed by a 
fdatasync (effectively a metadata-free fsync).
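
A minimal sketch of that I/O pattern, for anyone who wants to reproduce it without sqlite. This is not the Phoronix test itself: the record size and count are arbitrary assumptions, and os.fsync is used as a fallback on platforms where os.fdatasync is unavailable.

```python
import os
import tempfile


def write_with_sync(path, records):
    """Append each record and force it to stable storage before the next one."""
    # fdatasync flushes file data without forcing unrelated metadata updates.
    sync = getattr(os, "fdatasync", os.fsync)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for record in records:
            os.write(fd, record)
            sync(fd)
    finally:
        os.close(fd)
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    # 100 small sequential writes, each followed by a sync.
    size = write_with_sync(os.path.join(tmp, "journal.bin"), [b"x" * 64] * 100)
    print(size)
```

Timing this loop on the host and inside guests with different disk interfaces and cache modes should show whether each sync actually waits for the storage.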


As Dustin indicates,

   if scsi is used, you incur the cost of virtualization,
   if virtio is used, your guests fsyncs incur less cost.

So back to the question to the kvm team.  It appears that with the 
stock KVM setup customers who need higher data integrity (through 
fsync) should steer away from virtio for the moment.


Is that assessment correct?


No, it's an absurd assessment.

You have additional layers of caching happening because you're running a 
guest from a filesystem on the host.


A benchmark running under a guest that happens to be faster than the 
host does not indicate anything.  It could be that the benchmark is 
poorly written.


What operation, specifically, do you think is not behaving properly 
under kvm?  ext4 (karmic's default filesystem) does not enable barriers 
by default so it's unlikely this is anything barrier related.



Regards,

Matthew


Regards,

Anthony Liguori


Re: sync guest calls made async on host - SQLite performance

2009-10-13 Thread Matthew Tippett



No, it's an absurd assessment.

You have additional layers of caching happening because you're running a 
guest from a filesystem on the host.


Comments below.

A benchmark running under a guest that happens to be faster than the 
host does not indicate anything.  It could be that the benchmark is 
poorly written.


I believe that I have removed the benchmark from discussion, we are now 
looking at semantics of small writes followed by


What operation, specifically, do you think is not behaving properly 
under kvm?  ext4 (karmic's default filesystem) does not enable barriers 
by default so it's unlikely this is anything barrier related.




Re-quoting me from two replies ago.

===
I dug deeper into the actual syscalls being made by sqlite.  The salient 
part of the behaviour is small sequential writes followed by a
fdatasync (effectively a metadata-free fsync).
===

And quoting from Dustin

===
I have tried this, exactly as you have described.  The tests took:

 * 1162.08033204 seconds on native hardware
 * 2306.68306303 seconds in a kvm using if=scsi disk
 * 405.382308006 seconds in a kvm using if=virtio
===

And finally Christoph

===
Can't remember anything like that.  The bug was the complete lack of
cache flush infrastructure for virtio, and the lack of advertising a
volatile write cache on ide.
===

The _Operation_ that I believe is not behaving as expected is fdatasync 
under virtio. I understand your position that this is not a bug, but a 
configuration/packaging issue.


So I'll put it to you differently.  When a Linux guest issues an fsync or 
fdatasync, what should occur?


o If the system has been configured in writeback mode then you don't 
worry about getting the data to the disk, so when the hypervisor has 
received the data, be happy with it.


o If the system is configured in writethrough mode, shouldn't the 
hypervisor look to get the data to disk ASAP?  Whether this is 
immediately, or batched with other data, I'll leave it to you guys.


As mentioned above, I am not saying it is a bug in KVM; it may well be 
a poor choice of configuration options within distributions.  From what 
I can interpret from the above, scsi with writethrough is the safest 
model to go for.  By extension, for enterprise workloads where data 
integrity is more critical, the default configuration of KVM under Ubuntu 
and possibly other distributions may be a poor choice.


Regards,

Matthew


Re: [Autotest] [PATCH] Add a kvm test guest_s4 which supports both Linux and Windows platform

2009-10-13 Thread Yolkfull Chow
On Tue, Oct 13, 2009 at 05:29:40PM -0300, Lucas Meneghel Rodrigues wrote:
 Hi Yolkfull and Chen:
 
 Thanks for your test! I have some comments and doubts to clear, most
 of them are about content of the messages delivered for the user and
 some other details.
 
 On Sun, Sep 27, 2009 at 6:11 AM, Yolkfull Chow yz...@redhat.com wrote:
  For this case, Ken Cao wrote the linux part previously and I did extensive
  modifications on Windows platform support.
 
  Signed-off-by: Ken Cao k...@redhat.com
  Signed-off-by: Yolkfull Chow yz...@redhat.com
  ---
   client/tests/kvm/kvm_tests.cfg.sample |   14 +++
   client/tests/kvm/tests/guest_s4.py    |   66 
  +
   2 files changed, 80 insertions(+), 0 deletions(-)
   create mode 100644 client/tests/kvm/tests/guest_s4.py
 
  diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
  index 285a38f..f9ecb61 100644
  --- a/client/tests/kvm/kvm_tests.cfg.sample
  +++ b/client/tests/kvm/kvm_tests.cfg.sample
  @@ -94,6 +94,14 @@ variants:
      - linux_s3:     install setup
          type = linux_s3
 
  +    - guest_s4:
  +        type = guest_s4
  +        check_s4_support_cmd = grep -q disk /sys/power/state
  +        test_s4_cmd = cd /tmp/;nohup tcpdump -q -t ip host localhost
  +        check_s4_cmd = pgrep tcpdump
  +        set_s4_cmd = echo disk > /sys/power/state
  +        kill_test_s4_cmd = pkill tcpdump
  +
      - timedrift:    install setup
          type = timedrift
          extra_params +=  -rtc-td-hack
  @@ -382,6 +390,12 @@ variants:
              # Alternative host load:
              #host_load_command = dd if=/dev/urandom of=/dev/null
              host_load_instances = 8
  +        guest_s4:
  +            check_s4_support_cmd = powercfg /hibernate on
  +            test_s4_cmd = start /B ping -n 3000 localhost
  +            check_s4_cmd = tasklist | find /I "ping"
  +            set_s4_cmd = rundll32.exe PowrProf.dll, SetSuspendState
  +            kill_test_s4_cmd = taskkill /IM ping.exe /F
 
          variants:
              - Win2000:
  diff --git a/client/tests/kvm/tests/guest_s4.py b/client/tests/kvm/tests/guest_s4.py
  new file mode 100644
  index 000..5d8fbdf
  --- /dev/null
  +++ b/client/tests/kvm/tests/guest_s4.py
  @@ -0,0 +1,66 @@
  +import logging, time
  +from autotest_lib.client.common_lib import error
  +import kvm_test_utils, kvm_utils
  +
  +
  +def run_guest_s4(test, params, env):
  +    """
  +    Suspend guest to disk, supports both Linux & Windows OSes.
  +
  +    @param test: kvm test object.
  +    @param params: Dictionary with test parameters.
  +    @param env: Dictionary with the test environment.
  +    """
  +    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
  +    session = kvm_test_utils.wait_for_login(vm)
  +
  +    logging.info("Checking whether VM supports S4")
  +    status = session.get_command_status(params.get("check_s4_support_cmd"))
  +    if status is None:
  +        logging.error("Failed to check if S4 exists")
  +    elif status != 0:
  +        raise error.TestFail("Guest does not support S4")
  +
  +    logging.info("Waiting for a while for X to start...")
 
 Yes, generally X starts a bit later than the SSH service, so I
 understand the time being here, however:
 
  * In fact we are waiting for all services of the guest to be up and
 functional, so depending on the level of load, I don't think 10s is
 gonna make it. So I suggest something >= 30s

Yeah, reasonable; we did ignore the case of a loaded guest. But as
you mentioned, it depends on the level of workload, so 30s may not be
enough either. Your idea of writing a utility function that waits for
some services to be up is a good one, I think. It could be something
like:

def wait_services_up(services_list):
...

and for this case:

wait_services_up(["Xorg"]) for Linux and
wait_services_up(["explorer.exe"]) for Windows.
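
A rough sketch of what such a helper might look like. Everything here is hypothetical: the is_running callable, the timeout and step parameters, and the return convention are assumptions for discussion, not existing kvm_utils API.

```python
import time


def wait_services_up(is_running, services_list, timeout=30.0, step=1.0):
    """
    Poll until every service in services_list is reported as running.

    @param is_running: callable taking a service name, returning True/False
            (hypothetical; the test would supply a guest-specific check).
    @param services_list: names of the services/processes to wait for.
    @param timeout: give up after this many seconds.
    @param step: seconds to sleep between polls.
    @return: list of services still missing (empty list means success).
    """
    deadline = time.time() + timeout
    while True:
        missing = [name for name in services_list if not is_running(name)]
        if not missing or time.time() >= deadline:
            return missing
        time.sleep(step)
```

In the test, is_running could wrap the existing session object, for example by checking that session.get_command_status() returns 0 for a pgrep on Linux or a tasklist filter on Windows. The caller can then raise error.TestFail if the returned list is non-empty instead of sleeping for a fixed time.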

  * It's also true that just waiting for a given time and hoping that it
 will be OK kinda sucks, so ideally we need to write utility functions
 to establish as well as possible when all services of a host are fully
 booted up. Stated this way, it looks simple, but it's not.
 
 Autotest experience suggests that there's no real sane way to
 determine when a linux box is booted up, but we can take a
 semi-rational approach and verify if all services for the current run
 level have the status up or a similar approach. For windows, I was
 talking to Yaniv Kaul and it seems that processing the output of the
 'sc query' command might give what we want. Bottom line, I'd like to
 add a TODO item, and write a function to establish (fairly confidently)
 that a windows/linux guest is booted up.
 
  +    time.sleep(10)
  +
  +    # Start up a program (tcpdump for Linux OS & ping for M$ OS), as a flag.
  +    # If the program died after suspend, then fails this testcase.
  +    test_s4_cmd = params.get("test_s4_cmd")
  +    session.sendline(test_s4_cmd)
  +
  +    # Get the 

Re: sync guest calls made async on host - SQLite performance

2009-10-13 Thread Dustin Kirkland
On Tue, Oct 13, 2009 at 9:09 PM, Matthew Tippett tippe...@gmail.com wrote:
 I believe that I have removed the benchmark from discussion, we are now
 looking at semantics of small writes followed by
...
 And quoting from Dustin

 ===
 I have tried this, exactly as you have described.  The tests took:

  * 1162.08033204 seconds on native hardware
  * 2306.68306303 seconds in a kvm using if=scsi disk
  * 405.382308006 seconds in a kvm using if=virtio

Hang on now...

My timings are from running the Phoronix test *as you described*.  I
have not looked at what magic is happening inside of this Phoronix
test.  I am most certainly *not* speaking as to the quality or
legitimacy of the test.

:-Dustin


[PATCH][RFC] Xen PV-on-HVM guest support

2009-10-13 Thread Ed Swierk
As we discussed a while back, support for Xen PV-on-HVM guests can be
implemented almost entirely in userspace, except for handling one
annoying MSR that maps a Xen hypercall blob into guest address space.

A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future.  Thus this patch
adds special support for the Xen HVM MSR.

At Avi's suggestion[1] I implemented a new ioctl, KVM_XEN_HVM_CONFIG,
that lets userspace tell KVM which MSR the guest will write to, as well
as the starting address and size of the hypercall blobs (one each for
32-bit and 64-bit) that userspace has loaded from files.  When the guest
writes to the MSR, KVM copies one page of the blob from userspace to the
guest.
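
As a sketch of how the MSR value is interpreted in the patch's xen_hvm_config(): the low bits of the written value select the page within the blob and the page-aligned remainder is the guest physical address to fill. Python is used here only to model the bit arithmetic; 4 KiB pages are assumed and the helper names are illustrative.

```python
PAGE_SIZE = 4096                             # assumption: 4 KiB pages (x86)
PAGE_MASK = ~(PAGE_SIZE - 1) & (2**64 - 1)   # model of the kernel's PAGE_MASK


def decode_msr_write(data):
    """Split the 64-bit MSR value into (blob page index, guest page address)."""
    pnum = data & (PAGE_SIZE - 1)   # same as data & ~PAGE_MASK
    paddr = data & PAGE_MASK
    return pnum, paddr


def blob_offset(pnum, blob_pages):
    """Byte offset of page pnum in the userspace blob, or None if out of range."""
    if pnum >= blob_pages:
        return None
    return pnum * PAGE_SIZE


pnum, paddr = decode_msr_write(0x12345002)
print(pnum, hex(paddr), blob_offset(pnum, 4))
```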

I've tested this patch against a hacked-up version of Gerd's userspace
code[2]; I'm happy to share those hacks if anyone is interested.

[1] http://www.mail-archive.com/kvm@vger.kernel.org/msg16065.html
[2]
http://git.et.redhat.com/?p=qemu-kraxel.git;a=log;h=refs/heads/xenner.v5

Signed-off-by: Ed Swierk eswi...@aristanetworks.com

---
diff -BurN a/include/asm-x86/kvm.h b/include/asm-x86/kvm.h
--- a/include/asm-x86/kvm.h 2009-10-13 20:40:55.0 -0700
+++ b/include/asm-x86/kvm.h 2009-10-13 20:21:07.0 -0700
@@ -59,6 +59,7 @@
 #define __KVM_HAVE_MSIX
 #define __KVM_HAVE_MCE
 #define __KVM_HAVE_PIT_STATE2
+#define __KVM_HAVE_XEN_HVM
 
 /* Architectural interrupt line count. */
 #define KVM_NR_INTERRUPTS 256
diff -BurN a/include/linux/kvm.h b/include/linux/kvm.h
--- a/include/linux/kvm.h   2009-10-13 20:40:55.0 -0700
+++ b/include/linux/kvm.h   2009-10-13 20:21:26.0 -0700
@@ -476,6 +476,9 @@
 #endif
 #define KVM_CAP_IOEVENTFD 36
 #define KVM_CAP_SET_IDENTITY_MAP_ADDR 37
+#ifdef __KVM_HAVE_XEN_HVM
+#define KVM_CAP_XEN_HVM 90
+#endif
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -528,6 +531,14 @@
 };
 #endif
 
+#ifdef KVM_CAP_XEN_HVM
+struct kvm_xen_hvm_config {
+   __u32 msr;
+   __u64 blob_addr[2];
+   __u8 blob_size[2];
+};
+#endif
+
 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
 
 struct kvm_irqfd {
@@ -586,6 +597,7 @@
 #define KVM_CREATE_PIT2     _IOW(KVMIO, 0x77, struct kvm_pit_config)
 #define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78)
 #define KVM_IOEVENTFD       _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
+#define KVM_XEN_HVM_CONFIG  _IOW(KVMIO, 0xa1, struct kvm_xen_hvm_config)
 
 /*
  * ioctls for vcpu fds
diff -BurN a/include/linux/kvm_host.h b/include/linux/kvm_host.h
--- a/include/linux/kvm_host.h  2009-10-13 20:40:55.0 -0700
+++ b/include/linux/kvm_host.h  2009-10-13 20:27:03.0 -0700
@@ -236,6 +236,10 @@
unsigned long mmu_notifier_seq;
long mmu_notifier_count;
 #endif
+
+#ifdef KVM_CAP_XEN_HVM
+   struct kvm_xen_hvm_config xen_hvm_config;
+#endif
 };
 
 /* The guest did something we don't support. */
diff -BurN a/x86/x86.c b/x86/x86.c
--- a/x86/x86.c 2009-10-13 20:40:58.0 -0700
+++ b/x86/x86.c 2009-10-13 20:33:49.0 -0700
@@ -875,6 +875,33 @@
return 0;
 }
 
+#ifdef KVM_CAP_XEN_HVM
+static int xen_hvm_config(struct kvm_vcpu *vcpu, u64 data)
+{
+   int blob = !!(vcpu->arch.shadow_efer & EFER_LME);
+   u32 pnum = data & ~PAGE_MASK;
+   u64 paddr = data & PAGE_MASK;
+   u8 *page;
+   int r = 1;
+   printk(KERN_INFO "kvm: loading xen hvm blob %d page %d at %llx\n",
+          blob, pnum, paddr);
+   if (pnum >= vcpu->kvm->xen_hvm_config.blob_size[blob])
+       goto out;
+   page = kzalloc(PAGE_SIZE, GFP_KERNEL);
+   if (!page)
+       goto out;
+   if (copy_from_user(page, (u8 *)vcpu->kvm->xen_hvm_config.blob_addr[blob]
+                      + pnum * PAGE_SIZE, PAGE_SIZE))
+       goto out_free;
+   kvm_write_guest(vcpu->kvm, paddr, page, PAGE_SIZE);
+   r = 0;
+out_free:
+   kfree(page);
+out:
+   return r;
+}
+#endif
+
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
switch (msr) {
@@ -990,6 +1017,10 @@
"0x%x data 0x%llx\n", msr, data);
break;
default:
+#ifdef KVM_CAP_XEN_HVM
+   if (msr && (msr == vcpu->kvm->xen_hvm_config.msr))
+       return xen_hvm_config(vcpu, data);
+#endif
if (!ignore_msrs) {
pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n",
msr, data);
@@ -2453,6 +2484,17 @@
r = 0;
break;
}
+#ifdef KVM_CAP_XEN_HVM
+   case KVM_XEN_HVM_CONFIG: {
+   r = -EFAULT;
+   printk(KERN_INFO "kvm: configuring xen hvm\n");
+   if (copy_from_user(&kvm->xen_hvm_config, argp,
+                      sizeof(struct kvm_xen_hvm_config)))
+   goto out;
+   r = 0;
+   break;
+   }
+#endif
default:
;
}



Added VM Exit on RDTSC, trouble handling in userspace

2009-10-13 Thread Kurt Kiefer

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi all,

In short, I have a need for trapping RDTSC with a VM Exit and this  
works, but I'm having trouble handling it in userspace. I have added  
the hooks I need (I only care about VMX right now), but a piece of the  
puzzle is missing and I don't know which. When I go back to userspace,  
it's triggering a different (faulty) execution vs. handling only in  
the kernel. Here's what I've done:



1. Added the CPU_BASED_RDTSC_EXITING flag to  
MSR_IA32_VMX_PROCBASED_CTLS in vmx.c:setup_vmcs_config()



2. Defined KVM_EXIT_RDTSC, and hooked into EXIT_REASON_RDTSC my  
handler for the exit:


static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu,
  struct kvm_run *kvm_run) = {
// ...
  [EXIT_REASON_RDTSC]   = handle_rdtsc,
// ...
}

static int handle_rdtsc(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
  u64 data;

  if (vmx_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER, &data)) {
    kvm_inject_gp(vcpu, 0);
    return 1;
  }

  vcpu->run->exit_reason = KVM_EXIT_RDTSC;
  vcpu->arch.regs[VCPU_REGS_RAX] = data & -1u;
  vcpu->arch.regs[VCPU_REGS_RDX] = (data >> 32) & -1u;

  skip_emulated_instruction(vcpu);

  // flag a need for userspace intervention
  // note: this works when we return 1 and we don't involve userspace
  return 0;
}


3. Handle KVM_EXIT_RDTSC in libkvm.c:kvm_run() :

case KVM_EXIT_RDTSC:
  r = handle_rdtsc_usp(kvm, vcpu, env);
  break;

via a handler where I do _nothing_ :

static int handle_rdtsc_usp(kvm_context_t kvm, int vcpu, void *data)
{
  return 0;
}
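
For reference, the register split done in handle_rdtsc() in step 2 can be modelled as follows. This is only a sketch of the arithmetic (RDTSC returns the low half of the counter in EAX and the high half in EDX), written in Python so it can be checked outside the kernel; the function names are made up.

```python
MASK32 = 0xFFFFFFFF  # what '-1u' evaluates to for a 32-bit unsigned int


def split_tsc(tsc):
    """Return (eax, edx) for a 64-bit time stamp counter value."""
    eax = tsc & MASK32
    edx = (tsc >> 32) & MASK32
    return eax, edx


def join_tsc(eax, edx):
    """Rebuild the 64-bit counter the way the guest does after RDTSC."""
    return (edx << 32) | eax


eax, edx = split_tsc(0x123456789ABCDEF0)
print(hex(eax), hex(edx))  # 0x9abcdef0 0x12345678
```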



All well and good, right? I can add print statements to my userspace  
handle_rdtsc_usp() and see I get in there just fine. However, when I  
try to boot Linux, the following code is called over and over and  
over, and Linux will never load:


Breakpoint 4, 0xc01103d3 in ?? ()
(gdb) x/10i $rip-10
0xc01103c9: lea    0x0(%rdi,%riz,1),%edi
0xc01103d0: push   %rbp
0xc01103d1: mov    %esp,%ebp
0xc01103d3: rdtsc
0xc01103d5: pop    %rbp
0xc01103d6: retq

If I only handle the exit in the kernel (by returning 1 from  
handle_rdtsc()), everything works and Linux will load! I counted the  
number of RDTSC exits before linux fully loads to be somewhere around  
20. If I exit all the way to userspace (return 0 in my  
handle_rdtsc()) that count is infinitely surpassed in number of exits,  
wall time, and the value of RDTSC.


So is anything glaringly wrong with my modifications? Maybe there is  
some extra state that needs to be restored on VM entry? Is there  
an interrupt flag that needs to be cleared? Maybe I need to do  
something with kvm_run.if_flag or  
kvm_run.ready_for_interrupt_injection? Please, I need help, I'm losing  
sleep over this!



Thanks,

Kurt
-BEGIN PGP SIGNATURE-
Version: GnuPG/MacGPG2 v2.0.12 (Darwin)

iEYEARECAAYFAkrVZvQACgkQYFGmU9mnI1FqvgCcC/+PswoXHQ5kVgv5tC6UadiA
KKgAoKrLgsYSJN0+1d0pox9vzsLHoQIc
=cQzR
-END PGP SIGNATURE-