Re: Clock jumps

2010-05-25 Thread Gleb Natapov
Adding kvm to CC.

On Mon, May 24, 2010 at 04:06:32PM +, Orion Poplawski wrote:
 I have a KVM virtual machine running 2.6.33.4-95.fc13.x86_64 on a CentOS 5.5
 host whose clock jumps about 8-12 hours a couple times a day.  I have no idea
 what is causing it.  Fedora 12 and CentOS 5.5 KVM machines run fine on the
 same host.  Is there any debugging I can enable to see what is jumping the
 clock?
 
 kvm-clock: cpu 0, msr 0:1ba4741, boot clock
 kvm-clock: cpu 0, msr 0:1e15741, primary cpu clock
 Switching to clocksource kvm-clock
 rtc_cmos 00:01: setting system clock to 2010-05-20 16:59:48 UTC (1274374788)
 
 Thanks,
 
  Orion
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] KVM: VMX: Enable XSAVE/XRSTORE for guest

2010-05-25 Thread Sheng Yang
On Monday 24 May 2010 21:36:12 Avi Kivity wrote:
 On 05/24/2010 01:03 PM, Sheng Yang wrote:
  From: Dexuan Cui dexuan@intel.com
  
  Enable XSAVE/XRSTORE for guest.
  
  Change from V3:
  1. Enforced the assumption that host OS would use all available xstate
  bits.
  2. Various fixes, addressed Avi's comments.
  
  I am still not clear about why we need to reload guest xcr0 when
  cr4.osxsave set...
 
 When cr4.osxsave=0, then the guest executes with the host xcr0 (since
 xgetbv will trap; this is similar to the guest running with the host fpu
 if cr0.ts=0).  So if cr4.osxsave transitions, we need to transition xcr0
 as well.

Yes...
 
  @@ -3354,6 +3356,29 @@ static int handle_wbinvd(struct kvm_vcpu *vcpu)
  
      return 1;
  }
  
  +static int handle_xsetbv(struct kvm_vcpu *vcpu)
  +{
  +   u64 new_bv = kvm_read_edx_eax(vcpu);
  +
  +   if (kvm_register_read(vcpu, VCPU_REGS_RCX) != 0)
  +           goto err;
  +   if (vmx_get_cpl(vcpu) != 0)
  +           goto err;
  +   if (!(new_bv & XSTATE_FP))
  +           goto err;
  +   if ((new_bv & XSTATE_YMM) && !(new_bv & XSTATE_SSE))
  +           goto err;
  +   if (new_bv & ~XCNTXT_MASK)
  +           goto err;
 
 Ok.  This means we must update kvm immediately when XCNTXT_MASK changes.
 
 (Otherwise we would use KVM_XCNTXT_MASK which is always smaller than
 XCNTXT_MASK).

I guess use host_xcr0 here is better?
 
  +   vcpu->arch.xcr0 = new_bv;
  +   xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);
  +   skip_emulated_instruction(vcpu);
  +   return 1;
  +err:
  +   kvm_inject_gp(vcpu, 0);
  +   return 1;
  +}
  +
  
  
  @@ -4124,6 +4176,8 @@ int kvm_arch_init(void *opaque)
  
  perf_register_guest_info_callbacks(&kvm_guest_cbs);
  
  +   host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
  +
  
  return 0;
 
 Will fault on old cpu.

...
 
EXPORT_SYMBOL_GPL(fx_init);
  
  @@ -5134,6 +5195,12 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
  
      vcpu->guest_fpu_loaded = 1;
      unlazy_fpu(current);
  
  +   /*
  +    * Restore all possible states in the guest,
  +    * and assume host would use all available bits.
  +    */
  +   if (cpu_has_xsave && vcpu->arch.xcr0)
  +           xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0);
  
      fpu_restore_checking(&vcpu->arch.guest_fpu);
 
 I think we need to reload xcr0 now to the guest's value.
 
  trace_kvm_fpu(1);

}
  
  @@ -5144,6 +5211,13 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
  
              return;
  
      vcpu->guest_fpu_loaded = 0;
  
  +   /*
  +    * Save all possible states in the guest,
  +    * and assume host would use all available bits.
  +    * Also load host_xcr0 for host usage.
  +    */
  +   if (cpu_has_xsave && vcpu->arch.xcr0)
  +           xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0);
  
      fpu_save_init(&vcpu->arch.guest_fpu);
      ++vcpu->stat.fpu_reload;
      set_bit(KVM_REQ_DEACTIVATE_FPU, &vcpu->requests);
 
 This might be unnecessary.
 
 So far xcr0 life cycle is almost that of
 save_host_state()/load_host_state(), but not exactly.  When loading the
 guest fpu we switch temporarily to host xcr0, then we have to switch
 back, but only if gcr4.osxsave.  When saving the guest fpu, we're
 already using the host xcr0:
 
  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
  {
      kvm_x86_ops->vcpu_put(vcpu);
      kvm_put_guest_fpu(vcpu);
  }
 
 One way to simplify this is to have a vcpu->guest_xcr0_loaded flag and
 check it when needed.  So the transition matrix is:
 
     save_host_state: if gcr4.osxsave, set guest_xcr0_loaded, load it
     set gcr4.osxsave: ditto
     clear gcr4.osxsave: do nothing
     load_host_state: if guest_xcr0_loaded, clear it, reload host xcr0
     fpu switching: if (switched) switch; reload fpu; if (switched) switch
 
 may be simplified if we move xcr0 reload back to guest entry (... :)
 but make it lazy:
 
     save_host_state: nothing
     set cr4.osxsave: nothing
     clear cr4.osxsave: nothing
     guest entry: if (gcr4.osxsave && !guest_xcr0_loaded) {
         guest_xcr0_loaded = true, load gxcr0 }
     load_host_state: if (guest_xcr0_loaded) { guest_xcr0_loaded = false;
         load host xcr0 }
     fpu switching:  if (guest_xcr0_loaded) { guest_xcr0_loaded = false;
         load host xcr0 }, do fpu stuff
 
 So we delay xcr0 reload as late as possible for both entry and exit.

I think I got it. But why we need do it at load_host_state()? I guess just put
code before fpu testing in kvm_put_guest_fpu() is fine?

--
regards
Yang, Sheng


Re: [Qemu-devel] KVM call agenda for May 25

2010-05-25 Thread Gautham R Shenoy
On Mon, May 24, 2010 at 05:21:04PM -0700, Chris Wright wrote:
 Please send in any agenda items you are interested in covering.
 

Sorry for the delayed response.

If the community is interested, I would
like to discuss the Generic Asynchronous task offloading framework
patches posted to the community on 24th May 2010.
URL:http://lists.gnu.org/archive/html/qemu-devel/2010-05/msg02227.html

Brief Description: The patch series extracts out the task offloading
framework code from posix-aio-compat.c which is currently being used
only by the paio subsystem to create a generic task offloading framework
that could be used by other subsystems within qemu. Currently virtio-9p
and asynchronous-encoding from vnc server can make use of the generic
framework.

Points for discussion:
- Is a generic task offloading framework the way to go for subsystems
  such as virtio-9p, which would like to emulate the AIO behaviour
  that allows us to free the vcpu thread to handle any other guest requests?

- Currently the AIO helper threads indicate the completion of the task
  to the IO thread by sending a SIGUSR2, the handler for which does a
  write() to the file descriptor on which the IO thread is waiting using
  select(). Should we use this signal-handling mechanism to communicate
  between the generic asynchronous helper threads and the IO thread?
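
For reference, the completion path in the second point is the classic
self-pipe pattern. Below is a minimal sketch of it in C. All names
(notify_fds, completion_handler, setup_notifier, wait_for_completion) are
made up for illustration and are not taken from posix-aio-compat.c; the
real code may differ in how it sets up and drains the descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical names for illustration; not the QEMU implementation. */
static int notify_fds[2] = { -1, -1 };   /* [0] read end, [1] write end */

/* Signal handler: restricted to async-signal-safe calls (write). */
static void completion_handler(int signum)
{
    int saved_errno = errno;
    char byte = 0;

    (void)signum;
    if (write(notify_fds[1], &byte, 1) < 0) {
        /* EAGAIN means a wakeup is already queued; nothing else to do. */
    }
    errno = saved_errno;
}

/* Create a non-blocking pipe and install the SIGUSR2 handler.
 * Returns 0 on success, -1 on failure. */
static int setup_notifier(void)
{
    struct sigaction sa;
    int i;

    if (pipe(notify_fds) < 0)
        return -1;
    for (i = 0; i < 2; i++) {
        int flags = fcntl(notify_fds[i], F_GETFL);
        if (flags < 0 || fcntl(notify_fds[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = completion_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL);
}

/* IO-thread side: wait up to timeout_ms for a notification.
 * Returns 1 if one arrived (the pipe is drained), 0 on timeout, -1 on error.
 * The timeout is restarted in full if select() is interrupted. */
static int wait_for_completion(int timeout_ms)
{
    char buf[64];
    int ret;

    assert(notify_fds[0] >= 0 && timeout_ms >= 0);
    for (;;) {
        fd_set rfds;
        struct timeval tv;

        FD_ZERO(&rfds);
        FD_SET(notify_fds[0], &rfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        ret = select(notify_fds[0] + 1, &rfds, NULL, NULL, &tv);
        if (ret >= 0 || errno != EINTR)
            break;
    }
    if (ret <= 0)
        return ret;
    while (read(notify_fds[0], buf, sizeof(buf)) > 0)
        ;   /* drain every queued wakeup byte */
    return 1;
}
```

The pipe exists because select() cannot wait on a signal directly; the
handler turns the signal into a readable descriptor. A helper thread that
can call write() itself would not need the signal at all, which is part of
what the question above is asking.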


 If we have a lack of agenda items I'll cancel the week's call.
 
 thanks,
 -chris
 

-- 
Thanks and Regards
gautham


Re: [PATCH v4] KVM: VMX: Enable XSAVE/XRSTORE for guest

2010-05-25 Thread Avi Kivity

On 05/25/2010 09:28 AM, Sheng Yang wrote:



@@ -3354,6 +3356,29 @@ static int handle_wbinvd(struct kvm_vcpu *vcpu)

    return 1;
}

+static int handle_xsetbv(struct kvm_vcpu *vcpu)
+{
+   u64 new_bv = kvm_read_edx_eax(vcpu);
+
+   if (kvm_register_read(vcpu, VCPU_REGS_RCX) != 0)
+           goto err;
+   if (vmx_get_cpl(vcpu) != 0)
+           goto err;
+   if (!(new_bv & XSTATE_FP))
+           goto err;
+   if ((new_bv & XSTATE_YMM) && !(new_bv & XSTATE_SSE))
+           goto err;
+   if (new_bv & ~XCNTXT_MASK)
+           goto err;
   

Ok.  This means we must update kvm immediately when XCNTXT_MASK changes.

(Otherwise we would use KVM_XCNTXT_MASK which is always smaller than
XCNTXT_MASK).
 

I guess use host_xcr0 here is better?
   


Yes - it might be smaller than XCNTXT_MASK
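
To make the point concrete, here is a rough standalone sketch of the
checks with host_xcr0 used as the mask. xcr0_is_valid() is a made-up
helper name, not something in the patch, and the bit positions are the
architectural XCR0 bits (x87 = bit 0, SSE = bit 1, YMM = bit 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural XCR0 feature bits. */
#define XSTATE_FP   (1ULL << 0)   /* x87 state, must always be set */
#define XSTATE_SSE  (1ULL << 1)
#define XSTATE_YMM  (1ULL << 2)   /* AVX state, requires SSE */

/* Hypothetical helper: may the guest load new_bv into XCR0, given the
 * value the host itself runs with? */
static bool xcr0_is_valid(uint64_t new_bv, uint64_t host_xcr0)
{
    /* A usable host XCR0 always has the x87 bit set. */
    assert(host_xcr0 & XSTATE_FP);

    if (!(new_bv & XSTATE_FP))
        return false;
    if ((new_bv & XSTATE_YMM) && !(new_bv & XSTATE_SSE))
        return false;
    /* Reject anything the host has not enabled for itself. */
    if (new_bv & ~host_xcr0)
        return false;
    return true;
}
```

With a compile-time mask the last check could accept a bit the running
host never enabled; masking with the value read at init time avoids that.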


may be simplified if we move xcr0 reload back to guest entry (... :)
but make it lazy:

save_host_state: nothing
set cr4.osxsave: nothing
clear cr4.osxsave: nothing
guest entry: if (gcr4.osxsave && !guest_xcr0_loaded) {
guest_xcr0_loaded = true, load gxcr0 }
load_host_state: if (guest_xcr0_loaded) { guest_xcr0_loaded = false;
load host xcr0 }
fpu switching:  if (guest_xcr0_loaded) { guest_xcr0_loaded = false;
load host xcr0 }, do fpu stuff

So we delay xcr0 reload as late as possible for both entry and exit.
 

I think I got it. But why we need do it at load_host_state()? I guess just put
code before fpu testing in kvm_put_guest_fpu() is fine?
   


Right, load_host_state() is bad because it is vmx specific.  
kvm_put_guest_fpu() (or perhaps kvm_arch_vcpu_put()) is better.
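
A toy model of the lazy scheme, only to show the flag logic. The struct
and function names are invented for this sketch (hw_xcr0 stands in for
the real register, xcr0_writes counts the expensive reloads); it is not
the code that would go into kvm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state involved; not a kvm structure. */
struct vcpu_model {
    uint64_t host_xcr0;
    uint64_t guest_xcr0;
    bool guest_osxsave;        /* guest CR4.OSXSAVE */
    bool guest_xcr0_loaded;
    uint64_t hw_xcr0;          /* stands in for the real XCR0 */
    unsigned xcr0_writes;      /* number of (expensive) reloads */
};

static void load_xcr0(struct vcpu_model *v, uint64_t val)
{
    assert(val & 1);           /* the x87 bit is set in any valid XCR0 */
    v->hw_xcr0 = val;
    v->xcr0_writes++;
}

/* Guest entry: switch to the guest value only when it is needed and not
 * already loaded. */
static void guest_entry(struct vcpu_model *v)
{
    if (v->guest_osxsave && !v->guest_xcr0_loaded) {
        v->guest_xcr0_loaded = true;
        load_xcr0(v, v->guest_xcr0);
    }
}

/* Called before the host touches FPU state, e.g. from the put-FPU path. */
static void restore_host_xcr0(struct vcpu_model *v)
{
    if (v->guest_xcr0_loaded) {
        v->guest_xcr0_loaded = false;
        load_xcr0(v, v->host_xcr0);
    }
}
```

Repeated guest entries without an intervening FPU switch cost no extra
reloads, and a guest that never sets cr4.osxsave never triggers one.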


--
error compiling committee.c: too many arguments to function



Re: [PATCH] vhost: Fix host panic if ioctl called with wrong index

2010-05-25 Thread Michael S. Tsirkin
On Tue, May 25, 2010 at 11:10:36AM +0530, Krishna Kumar wrote:
 From: Krishna Kumar krkum...@in.ibm.com
 
 Missed a boundary value check in vhost_set_vring. The host panics if
 idx == nvqs is used in ioctl commands in vhost_virtqueue_init.
 
 Signed-off-by: Krishna Kumar krkum...@in.ibm.com

Thanks, applied.

 ---
  drivers/vhost/vhost.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff -ruNp org/drivers/vhost/vhost.c new/drivers/vhost/vhost.c
 --- org/drivers/vhost/vhost.c 2010-05-24 09:25:57.0 +0530
 +++ new/drivers/vhost/vhost.c 2010-05-24 09:26:53.0 +0530
 @@ -374,7 +374,7 @@ static long vhost_set_vring(struct vhost
      r = get_user(idx, idxp);
      if (r < 0)
              return r;
 -    if (idx > d->nvqs)
 +    if (idx >= d->nvqs)
              return -ENOBUFS;
  
      vq = d->vqs + idx;
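
The off-by-one can be shown in isolation. The following is a minimal
sketch with stand-in types (struct ring, struct dev and get_ring() are
hypothetical and only mirror the shape of the vhost code): valid indices
run from 0 to nvqs - 1, so idx == nvqs has to be rejected as well.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct ring { int placeholder; };

/* Hypothetical stand-in for a device that owns nvqs rings. */
struct dev {
    struct ring *vqs;
    unsigned int nvqs;
};

/* Returns 0 and stores the ring pointer in *out, or -ENOBUFS when idx is
 * out of range. Using ">" here would let idx == nvqs through and hand
 * back a pointer one element past the array. */
static int get_ring(const struct dev *d, unsigned int idx, struct ring **out)
{
    assert(d != NULL && out != NULL);
    if (idx >= d->nvqs)
        return -ENOBUFS;
    *out = d->vqs + idx;
    return 0;
}
```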


Re: [Qemu-devel] [PATCH 1/5] trace: Add trace-events file for declaring trace events

2010-05-25 Thread Stefan Hajnoczi
On Mon, May 24, 2010 at 11:20 PM, Anthony Liguori
aligu...@linux.vnet.ibm.com wrote:
 +# check if trace backend exists
 +
 +sh tracetool --$trace_backend --check-backend > /dev/null 2> /dev/null


 This will fail if objdir != srcdir.  You have to qualify tracetool with the
 path to srcdir.

Thanks Anthony, fixed on my branch.  I'll resend a v2 together with other fixes.

Stefan


Re: Windows guest debugging on KVM/Qemu

2010-05-25 Thread Avi Kivity

On 05/24/2010 11:07 PM, Neo Jia wrote:

hi,

I am using KVM/Qemu to debug my Windows guest according to the KVM wiki
page (http://www.linux-kvm.org/page/WindowsGuestDrivers/GuestDebugging).
It works for me, and I need only one Windows guest: I bind its
serial port to a TCP port and run Virtual Serial Ports Emulator on
my Windows dev machine.

The problem is that this kind of connection is really slow. Is there
any known issue with the KVM serial port driver? There was a good
discussion about the same issue one year ago. I am not sure whether
there has been any improvement since then.
   


How slow?  Can you measure it (without a debugger, just guest-to-guest 
file transfer)?


slirp used to be ridiculously slow but some recent change made it fairly 
fast.  Probably a missing wakeup, perhaps serial has the same problem.  
In any case I recommend testing with qemu-kvm.git master.


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Re: [RFC PATCH] AMD IOMMU emulation

2010-05-25 Thread Joerg Roedel
On Mon, May 24, 2010 at 08:10:16PM +, Blue Swirl wrote:
 On Mon, May 24, 2010 at 3:40 PM, Joerg Roedel j...@8bytes.org wrote:
  +
  +#define MMIO_SIZE               0x2028
 
  This size should be a power-of-two value. In this case probably 0x4000.
 
 Not really, the devices can reserve regions of any size. There were
 some implementation deficiencies in earlier versions of QEMU, where
 the whole page would be reserved anyway, but this limitation has been
 removed long time ago.

The drivers for AMD IOMMU expect that to be 0x4000. At least the Linux
driver maps the MMIO region with this size. So the emulation should
reserve this amount of MMIO space too.
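
For what it's worth, 0x4000 is simply 0x2028 rounded up to the next power
of two. A small sketch of that arithmetic; roundup_pow2() is a helper
written for this example, not an existing QEMU or kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two. v must be in 1 .. 2^31. */
static uint32_t roundup_pow2(uint32_t v)
{
    assert(v != 0 && v <= 0x80000000u);
    v--;
    /* Smear the highest set bit into every lower position. */
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    return v + 1;
}
```

So a define along the lines of roundup_pow2(0x2028) would give the size
the Linux driver maps.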

Joerg



[RFC PATCH 19/23] Introduce ft_tranx_ready(), and modify migrate_fd_put_ready() when ft_mode is on.

2010-05-25 Thread Yoshiaki Tamura
Introduce ft_tranx_ready() which kicks the FT transaction cycle.  When
ft_mode is on, migrate_fd_put_ready() would open ft_transaction file
and turn on event_tap.  To end or cancel ft_transaction, ft_mode and
event_tap are turned off.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration.c |   78 --
 1 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/migration.c b/migration.c
index 2adf7ad..5b90d37 100644
--- a/migration.c
+++ b/migration.c
@@ -21,6 +21,7 @@
 #include "qemu_socket.h"
 #include "block-migration.h"
 #include "qemu-objects.h"
+#include "event-tap.h"
 
 //#define DEBUG_MIGRATION
 
@@ -375,6 +376,49 @@ void migrate_fd_connect(FdMigrationState *s)
     migrate_fd_put_ready(s);
 }
 
+static int ft_tranx_ready(void)
+{
+    FdMigrationState *s = migrate_to_fms(current_migration);
+    int ret = -1;
+
+    if (ft_mode != FT_TRANSACTION && ft_mode != FT_INIT) {
+        return ret;
+    }
+
+    if (qemu_transaction_begin(s->file) < 0) {
+        fprintf(stderr, "tranx_begin failed\n");
+        goto error_out;
+    }
+
+    /* make the VM state consistent by flushing outstanding requests. */
+    vm_stop(0);
+    qemu_aio_flush();
+    bdrv_flush_all();
+
+    if (qemu_savevm_state_all(s->mon, s->file) < 0) {
+        fprintf(stderr, "savevm_state_all failed\n");
+        goto error_out;
+    }
+
+    if (qemu_transaction_commit(s->file) < 0) {
+        fprintf(stderr, "tranx_commit failed\n");
+        goto error_out;
+    }
+
+    ret = 0;
+    goto unpause_out;
+
+error_out:
+    ft_mode = FT_OFF;
+    qemu_savevm_state_cancel(s->mon, s->file);
+    migrate_fd_cleanup(s);
+    event_tap_unregister();
+
+unpause_out:
+    vm_start();
+    return ret;
+}
+
+
 void migrate_fd_put_ready(void *opaque)
 {
 FdMigrationState *s = opaque;
@@ -402,8 +446,30 @@ void migrate_fd_put_ready(void *opaque)
         } else {
             state = MIG_STATE_COMPLETED;
         }
-        migrate_fd_cleanup(s);
-        s->state = state;
+
+        if (ft_mode && state == MIG_STATE_COMPLETED) {
+            /* close buffered_file and open ft_transaction.
+             * Note: file descriptor won't get closed,
+             * but reused by ft_transaction. */
+            socket_set_block(s->fd);
+            socket_set_nodelay(s->fd);
+            qemu_fclose(s->file);
+            s->file = qemu_fopen_ops_ft_tranx(s,
+                                              migrate_fd_put_buffer,
+                                              migrate_fd_get_buffer,
+                                              migrate_fd_close,
+                                              1);
+
+            /* events are tapped from now. */
+            event_tap_register(ft_tranx_ready);
+
+            if (old_vm_running) {
+                vm_start();
+            }
+        } else {
+            migrate_fd_cleanup(s);
+            s->state = state;
+        }
     }
 }
 }
 
@@ -423,8 +489,14 @@ void migrate_fd_cancel(MigrationState *mig_state)
     DPRINTF("cancelling migration\n");
 
     s->state = MIG_STATE_CANCELLED;
-    qemu_savevm_state_cancel(s->mon, s->file);
 
+    if (ft_mode == FT_TRANSACTION) {
+        qemu_transaction_cancel(s->file);
+        ft_mode = FT_OFF;
+        event_tap_unregister();
+    }
+
+    qemu_savevm_state_cancel(s->mon, s->file);
     migrate_fd_cleanup(s);
 }
 
-- 
1.7.0.31.g1df487



[RFC PATCH 23/23] Add a parser to accept FT migration incoming mode.

2010-05-25 Thread Yoshiaki Tamura
The option looks like, -incoming protocol:address:port,ft_mode

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration.c |   14 +-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/migration.c b/migration.c
index 3334650..a4850f9 100644
--- a/migration.c
+++ b/migration.c
@@ -42,7 +42,19 @@ static MigrationState *current_migration;
 
 void qemu_start_incoming_migration(const char *uri)
 {
-    const char *p;
+    const char *p = uri;
+
+    /* check ft_mode option  */
+    while (*p != '\0') {
+        if (*p == ',') {
+            p++;
+            if (!strcmp(p, "ft_mode")) {
+                ft_mode = FT_INIT;
+                break;
+            }
+        }
+        p++;
+    }
 
     if (strstart(uri, "tcp:", &p))
         tcp_start_incoming_migration(p);
-- 
1.7.0.31.g1df487
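
A standalone sketch of the option check, for readers who want to try the
parsing outside QEMU. uri_has_ft_mode() is a hypothetical helper, not
part of this patch; it assumes the option is spelled exactly ",ft_mode"
and comes last in the URI, and it additionally reports the URI length
without the option.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if uri ends with ",ft_mode".
 * If len_out is non-NULL it receives the length of uri without that
 * option, so the caller can hand only "protocol:address:port" onward. */
static bool uri_has_ft_mode(const char *uri, size_t *len_out)
{
    static const char opt[] = ",ft_mode";
    const size_t optlen = sizeof(opt) - 1;
    size_t len;
    bool found;

    assert(uri != NULL);
    len = strlen(uri);
    found = len >= optlen && strcmp(uri + len - optlen, opt) == 0;
    if (len_out)
        *len_out = found ? len - optlen : len;
    return found;
}
```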



[RFC PATCH 16/23] Insert event_tap_mmio() to cpu_physical_memory_rw().

2010-05-25 Thread Yoshiaki Tamura
Record mmio write event to replay it upon failover.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 exec.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/exec.c b/exec.c
index d5c2a05..e9ed477 100644
--- a/exec.c
+++ b/exec.c
@@ -44,6 +44,7 @@
 #include "hw/hw.h"
 #include "osdep.h"
 #include "kvm.h"
+#include "event-tap.h"
 #if defined(CONFIG_USER_ONLY)
 #include <qemu.h>
 #include <signal.h>
@@ -3373,6 +3374,9 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
                 io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
                 if (p)
                     addr1 = (addr & ~TARGET_PAGE_MASK) + p->region_offset;
+
+                event_tap_mmio(addr, buf, len);
+
                 /* XXX: could force cpu_single_env to NULL to avoid
                    potential bugs */
                 if (l >= 4 && ((addr1 & 3) == 0)) {
-- 
1.7.0.31.g1df487



[RFC PATCH 17/23] Skip assert() when event_tap_state weren't EVENT_TAP_OFF.

2010-05-25 Thread Yoshiaki Tamura
Skip assert(!cpu_single_env) in resume_all_threads() when
event_tap_state isn't EVENT_TAP_OFF.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 qemu-kvm.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 1414f49..e28bf59 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -18,6 +18,7 @@
 #include "compatfd.h"
 #include "gdbstub.h"
 #include "monitor.h"
+#include "event-tap.h"
 
 #include "qemu-kvm.h"
 #include "libkvm.h"
@@ -1770,7 +1771,8 @@ static void resume_all_threads(void)
 {
     CPUState *penv = first_cpu;
 
-    assert(!cpu_single_env);
+    if (event_tap_get_state() == EVENT_TAP_OFF)
+        assert(!cpu_single_env);
 
     while (penv) {
         penv->stop = 0;
-- 
1.7.0.31.g1df487



[RFC PATCH 04/23] Use cpu_physical_memory_get_dirty_range() to check multiple dirty pages.

2010-05-25 Thread Yoshiaki Tamura
Modifies ram_save_block() and ram_save_remaining() to use
cpu_physical_memory_get_dirty_range() to check multiple dirty and non-dirty
pages at once.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 vl.c |   52 +---
 1 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/vl.c b/vl.c
index 729c955..70a8aed 100644
--- a/vl.c
+++ b/vl.c
@@ -2779,7 +2779,8 @@ static int ram_save_block(QEMUFile *f)
     static ram_addr_t current_addr = 0;
     ram_addr_t saved_addr = current_addr;
     ram_addr_t addr = 0;
-    int found = 0;
+    ram_addr_t dirty_rams[HOST_LONG_BITS];
+    int i, found = 0;
 
     while (addr < last_ram_offset) {
         if (kvm_enabled() && current_addr == 0) {
@@ -2791,28 +2792,33 @@ static int ram_save_block(QEMUFile *f)
                 return 0;
             }
         }
-        if (cpu_physical_memory_get_dirty(current_addr, MIGRATION_DIRTY_FLAG)) {
+        if ((found = cpu_physical_memory_get_dirty_range(
+                 current_addr, last_ram_offset, dirty_rams, HOST_LONG_BITS,
+                 MIGRATION_DIRTY_FLAG))) {
             uint8_t *p;
 
-            cpu_physical_memory_reset_dirty(current_addr,
-                                            current_addr + TARGET_PAGE_SIZE,
-                                            MIGRATION_DIRTY_FLAG);
+            for (i = 0; i < found; i++) {
+                ram_addr_t page_addr = dirty_rams[i];
+                cpu_physical_memory_reset_dirty(page_addr,
+                                                page_addr + TARGET_PAGE_SIZE,
+                                                MIGRATION_DIRTY_FLAG);
 
-            p = qemu_get_ram_ptr(current_addr);
+                p = qemu_get_ram_ptr(page_addr);
 
-            if (is_dup_page(p, *p)) {
-                qemu_put_be64(f, current_addr | RAM_SAVE_FLAG_COMPRESS);
-                qemu_put_byte(f, *p);
-            } else {
-                qemu_put_be64(f, current_addr | RAM_SAVE_FLAG_PAGE);
-                qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
+                if (is_dup_page(p, *p)) {
+                    qemu_put_be64(f, page_addr | RAM_SAVE_FLAG_COMPRESS);
+                    qemu_put_byte(f, *p);
+                } else {
+                    qemu_put_be64(f, page_addr | RAM_SAVE_FLAG_PAGE);
+                    qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
+                }
             }
 
-            found = 1;
             break;
+        } else {
+            addr += dirty_rams[0];
+            current_addr = (saved_addr + addr) % last_ram_offset;
         }
-        addr += TARGET_PAGE_SIZE;
-        current_addr = (saved_addr + addr) % last_ram_offset;
     }
 
     return found;
 return found;
@@ -2822,12 +2828,20 @@ static uint64_t bytes_transferred;
 
 static ram_addr_t ram_save_remaining(void)
 {
-    ram_addr_t addr;
+    ram_addr_t addr = 0;
     ram_addr_t count = 0;
+    ram_addr_t dirty_rams[HOST_LONG_BITS];
+    int found = 0;
 
-    for (addr = 0; addr < last_ram_offset; addr += TARGET_PAGE_SIZE) {
-        if (cpu_physical_memory_get_dirty(addr, MIGRATION_DIRTY_FLAG))
-            count++;
+    while (addr < last_ram_offset) {
+        if ((found = cpu_physical_memory_get_dirty_range(
+                 addr, last_ram_offset, dirty_rams, HOST_LONG_BITS,
+                 MIGRATION_DIRTY_FLAG))) {
+            count += found;
+            addr = dirty_rams[found - 1] + TARGET_PAGE_SIZE;
+        } else {
+            addr += dirty_rams[0];
+        }
     }
 
     return count;
-- 
1.7.0.31.g1df487
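
To illustrate why checking several pages per call helps, here is a rough
sketch of a range scan over a one-bit-per-page bitmap. Everything in it
(get_dirty_range(), PG_SHIFT, WORD_BITS, the bitmap layout) is an
assumption made for the example. The actual contract of
cpu_physical_memory_get_dirty_range() is defined in patch 02/23 and is
not reproduced here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define PG_SHIFT   12
#define PG_SIZE    ((uint64_t)1 << PG_SHIFT)
#define WORD_BITS  (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: collect up to max dirty page addresses in
 * [start, end) from a bitmap holding one bit per page.
 * Returns the number of addresses written to out. */
static int get_dirty_range(const unsigned long *bitmap, uint64_t start,
                           uint64_t end, uint64_t *out, int max)
{
    uint64_t page = start >> PG_SHIFT;
    uint64_t last = end >> PG_SHIFT;
    int found = 0;

    assert(bitmap != NULL && out != NULL && max > 0);
    while (page < last && found < max) {
        unsigned long word = bitmap[page / WORD_BITS];

        if (word == 0) {
            /* Whole word clean: skip to the next word in one step. */
            page = (page / WORD_BITS + 1) * WORD_BITS;
            continue;
        }
        if (word & (1UL << (page % WORD_BITS)))
            out[found++] = page << PG_SHIFT;
        page++;
    }
    return found;
}
```

The gain comes from the word == 0 branch: a clean region costs one test
per machine word instead of one call per page.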



[RFC PATCH 00/23] Kemari for KVM v0.1.1

2010-05-25 Thread Yoshiaki Tamura
Hi,

This patch series is a revised version of Kemari for KVM, which incorporates
comments on the previous post.  The current code is based on qemu-kvm.git
2b644fd0e737407133c88054ba498e772ce01f27.

In contrast to the previous version, this series doesn't require any
modifications to KVM.  The I/O events are captured in the net/block layer
instead of the device emulation layer.  The transmission/transaction protocol
and most of the control logic are implemented in QEMU.

We prepared a demonstration video again.  This time the guest is Windows XP
without virtio drivers.  The demonstration scenario is:

1. Play with a guest VM (This guest has e1000 and ide)
# The guest image should be on NFS/SAN.
2. Start the incoming side with -incoming protocol:address:port,ft_mode
3. Start Kemari to synchronize the VM by running the following command in QEMU.
Just add the -k option to the usual migrate command.
migrate -d -k tcp:192.168.0.20:
4. Check the status by calling info migrate.
5. Go back to the VM to play the pinball.
6. Kill the VM. (VNC client also disappears)
7. Press c to continue the VM on the other host.
8. Bring up the VNC client (Sorry, it pops outside of video capture.)
9. Confirm that the pinball works, then shutdown.

http://www.osrg.net/kemari/download/kemari-kvm-winxp.mov

The repository contains all patches we're sending with this message.  For those
who want to try, please pull the following repository.

git://kemari.git.sourceforge.net/gitroot/kemari/kemari

The changes from v0.1 -> v0.1.1 are:

- Events are tapped in the net/block layer instead of the device emulation
  layer.
- Introduce a new option for -incoming to accept FT transaction.
- Removed writev() support from QEMUFile and FdMigrationState for now.  I will
  post this work in a different series.
- Modified virtio-blk save/load handler to send inuse variable to
  correctly replay.
- Removed configure --enable-ft-mode.
- Removed unnecessary check for qemu_realloc().

I hope people like this approach, and I am looking forward to
suggestions/comments.

Thanks,

Yoshi

Yoshiaki Tamura (23):
  Modify DIRTY_FLAG value and introduce DIRTY_IDX to use as indexes of
bit-based phys_ram_dirty.
  Introduce cpu_physical_memory_get_dirty_range().
  Use cpu_physical_memory_set_dirty_range() to update phys_ram_dirty.
  Use cpu_physical_memory_get_dirty_range() to check multiple dirty
pages.
  Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and
qemu_clear_buffer().
  Introduce read() to FdMigrationState.
  Introduce skip_header parameter to qemu_loadvm_state().
  Introduce some socket util functions.
  Introduce fault tolerant VM transaction QEMUFile and ft_mode.
  Introduce util functions to control ft_transaction from savevm layer.
  Introduce qemu_savevm_state_all().
  Insert event-tap callbacks to net/block layer.
  Introduce event-tap.
  Call init handler of event-tap at main().
  Insert event_tap_ioport() to ioport_write().
  Insert event_tap_mmio() to cpu_physical_memory_rw().
  Skip assert() when event_tap_state weren't EVENT_TAP_OFF.
  Call event_tap_replay() at vm_start().
  Introduce ft_tranx_ready(), and modify migrate_fd_put_ready() when
ft_mode is on.
  Modify tcp_accept_incoming_migration() to handle ft_mode, and add a
hack not to close fd when ft_mode is enabled.
  virtio-blk: Modify save/load handler to handle inuse variable.
  Introduce -k option to enable FT migration mode (Kemari).
  Add a parser to accept FT migration incoming mode.

 Makefile.objs    |    1 +
 Makefile.target  |    1 +
 block.c          |   22 +++
 block.h          |    4 +
 cpu-all.h        |  134 -
 event-tap.c      |  184
 event-tap.h      |   32
 exec.c           |  131 +
 ft_transaction.c |  418 ++
 ft_transaction.h |   54 +++
 hw/hw.h          |    7 +
 hw/virtio.c      |    8 +-
 ioport.c         |    2 +
 migration-exec.c |    2 +-
 migration-fd.c   |    2 +-
 migration-tcp.c  |   52 +++-
 migration-unix.c |    2 +-
 migration.c      |  110 ++-
 migration.h      |    3 +
 net/queue.c      |   18 +++
 net/queue.h      |    3 +
 osdep.c          |   13 ++
 qemu-char.c      |   25 +++-
 qemu-kvm.c       |   23 ++--
 qemu-monitor.hx  |    7 +-
 qemu_socket.h    |    4 +
 savevm.c         |  146 +--
 sysemu.h         |    3 +-
 vl.c             |   57 +---
 29 files changed, 1371 insertions(+), 97 deletions(-)
 create mode 100644 event-tap.c
 create mode 100644 event-tap.h
 create mode 100644 ft_transaction.c
 create mode 100644 ft_transaction.h



[RFC PATCH 05/23] Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and qemu_clear_buffer().

2010-05-25 Thread Yoshiaki Tamura
Currently buf size is fixed at 32KB.  It would be useful if it could
be flexible.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/hw.h  |2 ++
 savevm.c |   21 -
 2 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index 05131a0..fc9ed29 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -61,6 +61,8 @@ void qemu_fflush(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
+void *qemu_realloc_buffer(QEMUFile *f, int size);
+void qemu_clear_buffer(QEMUFile *f);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
diff --git a/savevm.c b/savevm.c
index 2fd3de6..b9bb9f4 100644
--- a/savevm.c
+++ b/savevm.c
@@ -174,7 +174,8 @@ struct QEMUFile {
                            when reading */
     int buf_index;
     int buf_size; /* 0 when writing */
-    uint8_t buf[IO_BUF_SIZE];
+    int buf_max_size;
+    uint8_t *buf;
 
     int has_error;
 };
@@ -424,6 +425,9 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
     f->get_rate_limit = get_rate_limit;
     f->is_write = 0;
 
+    f->buf_max_size = IO_BUF_SIZE;
+    f->buf = qemu_mallocz(sizeof(uint8_t) * f->buf_max_size);
+
     return f;
 }
 
@@ -454,6 +458,20 @@ void qemu_fflush(QEMUFile *f)
     }
 }
 
+void *qemu_realloc_buffer(QEMUFile *f, int size)
+{
+    f->buf_max_size = size;
+    f->buf = qemu_realloc(f->buf, f->buf_max_size);
+
+    return f->buf;
+}
+
+void qemu_clear_buffer(QEMUFile *f)
+{
+    f->buf_size = f->buf_index = f->buf_offset = 0;
+    memset(f->buf, 0, f->buf_max_size);
+}
+
 static void qemu_fill_buffer(QEMUFile *f)
 {
     int len;
@@ -479,6 +497,7 @@ int qemu_fclose(QEMUFile *f)
     qemu_fflush(f);
     if (f->close)
         ret = f->close(f->opaque);
+    qemu_free(f->buf);
     qemu_free(f);
     return ret;
 }
-- 
1.7.0.31.g1df487
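
The same idea in plain C, as a minimal sketch outside QEMU. struct dynbuf
and its functions are invented for this example. One difference worth
noting: plain realloc() can return NULL, so the sketch keeps the old
pointer until the new one is known to be good, which is an assumption
about how a caller without an aborting allocator would want to behave.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; not the QEMUFile layout. */
struct dynbuf {
    uint8_t *data;
    size_t size;      /* bytes in use */
    size_t max_size;  /* bytes allocated */
};

static int dynbuf_init(struct dynbuf *b, size_t initial)
{
    assert(b != NULL && initial > 0);
    b->data = calloc(1, initial);
    b->size = 0;
    b->max_size = b->data ? initial : 0;
    return b->data ? 0 : -1;
}

/* Append len bytes, doubling the capacity until they fit.
 * Returns 0 on success; on failure the buffer is left unchanged. */
static int dynbuf_append(struct dynbuf *b, const void *src, size_t len)
{
    assert(b != NULL && b->max_size > 0);
    if (len > b->max_size - b->size) {
        size_t new_max = b->max_size;
        uint8_t *p;

        while (len > new_max - b->size)
            new_max *= 2;
        p = realloc(b->data, new_max);
        if (p == NULL)
            return -1;
        b->data = p;
        b->max_size = new_max;
    }
    memcpy(b->data + b->size, src, len);
    b->size += len;
    return 0;
}

static void dynbuf_free(struct dynbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->size = b->max_size = 0;
}
```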



[RFC PATCH 09/23] Introduce fault tolerant VM transaction QEMUFile and ft_mode.

2010-05-25 Thread Yoshiaki Tamura
This code implements VM transaction protocol.  Like buffered_file, it
sits between savevm and migration layer.  With this architecture, VM
transaction protocol is implemented mostly independent from other
existing code.
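
As a reading aid, ft_tranx_put_buffer() in the diff below emits, per
chunk, a 16-bit transaction state, a 16-bit transaction id, a 32-bit
sequence number, a 32-bit payload size, and then the payload. The sketch
here packs that layout into a caller-supplied buffer. frame_pack() and
the big-endian byte order are assumptions made for the example; the patch
itself writes the fields as they sit in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HDR_LEN 12   /* 2 + 2 + 4 + 4 bytes */

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)(v & 0xff);
}

/* Hypothetical helper: write header plus payload into out.
 * Returns the total frame length, or 0 if out is too small. */
static size_t frame_pack(uint8_t *out, size_t out_len, uint16_t state,
                         uint16_t id, uint32_t seq,
                         const uint8_t *payload, uint32_t size)
{
    assert(out != NULL && (payload != NULL || size == 0));
    if (out_len < FRAME_HDR_LEN || out_len - FRAME_HDR_LEN < size)
        return 0;
    put_be16(out, state);
    put_be16(out + 2, id);
    put_be32(out + 4, seq);
    put_be32(out + 8, size);
    if (size)
        memcpy(out + FRAME_HDR_LEN, payload, size);
    return FRAME_HDR_LEN + (size_t)size;
}
```

A fixed byte order only matters if the two hosts can differ in
endianness; between identical hosts the in-memory layout works as well.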

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 Makefile.objs|1 +
 ft_transaction.c |  418 ++
 ft_transaction.h |   54 +++
 migration.c  |3 +
 4 files changed, 476 insertions(+), 0 deletions(-)
 create mode 100644 ft_transaction.c
 create mode 100644 ft_transaction.h

diff --git a/Makefile.objs b/Makefile.objs
index b73e2cb..4388fb3 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -78,6 +78,7 @@ common-obj-y += qemu-char.o savevm.o #aio.o
 common-obj-y += msmouse.o ps2.o
 common-obj-y += qdev.o qdev-properties.o
 common-obj-y += qemu-config.o block-migration.o
+common-obj-y += ft_transaction.o
 
 common-obj-$(CONFIG_BRLAPI) += baum.o
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
diff --git a/ft_transaction.c b/ft_transaction.c
new file mode 100644
index 000..92dc681
--- /dev/null
+++ b/ft_transaction.c
@@ -0,0 +1,418 @@
+/*
+ * Fault tolerant VM transaction QEMUFile
+ *
+ * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation. 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * This source code is based on buffered_file.c.
+ * Copyright IBM, Corp. 2008
+ * Authors:
+ *  Anthony Liguori   aligu...@us.ibm.com
+ */
+
+#include "qemu-common.h"
+#include "hw/hw.h"
+#include "qemu-timer.h"
+#include "sysemu.h"
+#include "qemu-char.h"
+#include "ft_transaction.h"
+
+// #define DEBUG_FT_TRANSACTION
+
+typedef struct QEMUFileFtTranx
+{
+    FtTranxPutBufferFunc *put_buffer;
+    FtTranxGetBufferFunc *get_buffer;
+    FtTranxCloseFunc *close;
+    void *opaque;
+    QEMUFile *file;
+    int has_error;
+    int is_sender;
+    int buf_max_size;
+    enum QEMU_VM_TRANSACTION_STATE tranx_state;
+    uint16_t tranx_id;
+    uint32_t seq;
+} QEMUFileFtTranx;
+
+#define IO_BUF_SIZE 32768
+
+#ifdef DEBUG_FT_TRANSACTION
+#define dprintf(fmt, ...) \
+    do { printf("ft_transaction: " fmt, ## __VA_ARGS__); } while (0)
+#else
+#define dprintf(fmt, ...) \
+    do { } while (0)
+#endif
+
+static ssize_t ft_tranx_flush_buffer(void *opaque, void *buf, int size)
+{
+    QEMUFileFtTranx *s = opaque;
+    size_t offset = 0;
+    ssize_t len;
+
+    while (offset < size) {
+        len = s->put_buffer(s->opaque, (uint8_t *)buf + offset, size - offset);
+
+        if (len <= 0) {
+            fprintf(stderr, "ft transaction flush buffer failed \n");
+            s->has_error = 1;
+            offset = -EINVAL;
+            break;
+        }
+
+        offset += len;
+    }
+
+    return offset;
+}
+
+static int ft_tranx_send_header(QEMUFileFtTranx *s)
+{
+    int ret = -1;
+
+    dprintf("send header %d\n", s->tranx_state);
+
+    ret = ft_tranx_flush_buffer(s, &s->tranx_state, sizeof(uint16_t));
+    if (ret < 0) {
+        goto out;
+    }
+    ret = ft_tranx_flush_buffer(s, &s->tranx_id, sizeof(uint16_t));
+
+out:
+    return ret;
+}
+
+static int ft_tranx_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileFtTranx *s = opaque;
+    ssize_t ret = -1;
+
+    if (s->has_error) {
+        fprintf(stderr, "flush when error, bailing\n");
+        return -EINVAL;
+    }
+
+    ret = ft_tranx_send_header(s);
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = ft_tranx_flush_buffer(s, &s->seq, sizeof(s->seq));
+    if (ret < 0) {
+        goto out;
+    }
+    s->seq++;
+
+    ret = ft_tranx_flush_buffer(s, &size, sizeof(uint32_t));
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = ft_tranx_flush_buffer(s, (uint8_t *)buf, size);
+
+out:
+    return ret;
+}
+
+#if 0
+static int ft_tranx_put_vector(void *opaque, struct iovec *vector, int64_t pos, int count)
+{
+    QEMUFileFtTranx *s = opaque;
+    ssize_t ret = -1;
+    int i;
+    uint32_t size = 0;
+
+    dprintf("putting %d vectors at %" PRId64 "\n", count, pos);
+
+    if (s->has_error) {
+        dprintf("put vector when error, bailing\n");
+        return -EINVAL;
+    }
+
+    ret = ft_tranx_send_header(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = ft_tranx_flush_buffer(s, &s->seq, sizeof(s->seq));
+    if (ret < 0) {
+        return ret;
+    }
+    s->seq++;
+
+    for (i = 0; i < count; i++)
+        size += vector[i].iov_len;
+
+    ret = ft_tranx_flush_buffer(s, &size, sizeof(uint32_t));
+    if (ret < 0) {
+        return ret;
+    }
+
+    while (count > 0) {
+        /*
+         * It will continue calling put_vector even if count > IOV_MAX.
+         */
+        ret = s->put_vector(s->opaque, vector,
+            ((count>IOV_MAX)?IOV_MAX:count));
+
+        if (ret <= 0) {
+            fprintf(stderr, "ft transaction putting vector\n");
+            s->has_error = 1;
+
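
The framing that ft_tranx_put_buffer() writes above (state, transaction id,
sequence number, size, then payload) can be sketched outside QEMU roughly as
follows.  This is only an illustration under assumptions: ft_frame_pack() and
FT_FRAME_HDR_LEN are hypothetical names that do not appear in the patch, and the
fields are copied in host byte order because that is what the posted code
appears to do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header length: 16-bit state, 16-bit transaction id,
 * 32-bit sequence number, 32-bit payload size. */
#define FT_FRAME_HDR_LEN 12

/* Pack one frame into out in the same field order as ft_tranx_put_buffer().
 * Returns the number of bytes written, or -1 if out is too small. */
static long ft_frame_pack(uint8_t *out, size_t out_len,
                          uint16_t state, uint16_t id, uint32_t seq,
                          const uint8_t *payload, uint32_t size)
{
    size_t off = 0;

    assert(out != NULL);
    if (out_len < (size_t)FT_FRAME_HDR_LEN + size) {
        return -1;
    }

    /* Fields are copied in host byte order (an assumption taken from the
     * raw writes in the patch, not a documented wire format). */
    memcpy(out + off, &state, sizeof(state));
    off += sizeof(state);
    memcpy(out + off, &id, sizeof(id));
    off += sizeof(id);
    memcpy(out + off, &seq, sizeof(seq));
    off += sizeof(seq);
    memcpy(out + off, &size, sizeof(size));
    off += sizeof(size);

    if (size > 0) {
        memcpy(out + off, payload, size);
        off += size;
    }
    return (long)off;
}
```

A receiver would read the fixed 12-byte header first and then exactly `size`
payload bytes; sender and receiver would need the same endianness for this
layout to round-trip.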

[RFC PATCH 11/23] Introduce qemu_savevm_state_all().

2010-05-25 Thread Yoshiaki Tamura
Introduce qemu_savevm_state_all() to send the memory and device info
together, while avoiding cancelling memory state tracking.
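
For orientation, the per-section header that qemu_savevm_state_all() emits
before each device's data (section type, section id, length-prefixed ID string,
instance id, version id) can be sketched as a standalone packer.  This is a
sketch under assumptions: section_header_pack() and put_be32() are hypothetical
helpers, and the value used for the "full" section type is assumed rather than
taken from this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed value of QEMU_VM_SECTION_FULL; check savevm.c in your tree. */
#define SKETCH_VM_SECTION_FULL 0x04

/* Store a 32-bit value in big-endian order, as qemu_put_be32() does. */
static size_t put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return 4;
}

/* Returns the bytes written, or 0 if idstr is too long for the one-byte
 * length prefix or the buffer is too small. */
static size_t section_header_pack(uint8_t *out, size_t out_len, uint8_t type,
                                  uint32_t section_id, const char *idstr,
                                  uint32_t instance_id, uint32_t version_id)
{
    size_t len, off = 0;

    assert(out != NULL && idstr != NULL);
    len = strlen(idstr);
    if (len > 255 || out_len < 1 + 4 + 1 + len + 4 + 4) {
        return 0;
    }

    out[off++] = type;                       /* section type */
    off += put_be32(out + off, section_id);  /* section id */
    out[off++] = (uint8_t)len;               /* ID string length */
    memcpy(out + off, idstr, len);           /* ID string, no terminator */
    off += len;
    off += put_be32(out + off, instance_id);
    off += put_be32(out + off, version_id);
    return off;
}
```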

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 savevm.c |   60 
 sysemu.h |1 +
 2 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/savevm.c b/savevm.c
index 81cb711..25ccbb8 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1468,6 +1468,66 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
 return 0;
 }
 
+int qemu_savevm_state_all(Monitor *mon, QEMUFile *f)
+{
+    SaveStateEntry *se;
+
+    QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+        int len;
+
+        if (se->save_live_state == NULL)
+            continue;
+
+        /* Section type */
+        qemu_put_byte(f, QEMU_VM_SECTION_START);
+        qemu_put_be32(f, se->section_id);
+
+        /* ID string */
+        len = strlen(se->idstr);
+        qemu_put_byte(f, len);
+        qemu_put_buffer(f, (uint8_t *)se->idstr, len);
+
+        qemu_put_be32(f, se->instance_id);
+        qemu_put_be32(f, se->version_id);
+        if (ft_mode == FT_INIT) {
+            /* This is workaround. */
+            se->save_live_state(mon, f, QEMU_VM_SECTION_START, se->opaque);
+        } else {
+            se->save_live_state(mon, f, QEMU_VM_SECTION_PART, se->opaque);
+        }
+    }
+
+    ft_mode = FT_TRANSACTION;
+    QTAILQ_FOREACH(se, &savevm_handlers, entry) {
+        int len;
+
+        if (se->save_state == NULL && se->vmsd == NULL)
+            continue;
+
+        /* Section type */
+        qemu_put_byte(f, QEMU_VM_SECTION_FULL);
+        qemu_put_be32(f, se->section_id);
+
+        /* ID string */
+        len = strlen(se->idstr);
+        qemu_put_byte(f, len);
+        qemu_put_buffer(f, (uint8_t *)se->idstr, len);
+
+        qemu_put_be32(f, se->instance_id);
+        qemu_put_be32(f, se->version_id);
+
+        vmstate_save(f, se);
+    }
+
+    qemu_put_byte(f, QEMU_VM_EOF);
+
+    if (qemu_file_has_error(f))
+        return -EIO;
+
+    return 0;
+}
+
+
 void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f)
 {
 SaveStateEntry *se;
diff --git a/sysemu.h b/sysemu.h
index 6c1441f..df314bb 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -67,6 +67,7 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int 
blk_enable,
 int shared);
 int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f);
 int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f);
+int qemu_savevm_state_all(Monitor *mon, QEMUFile *f);
 void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f);
 int qemu_loadvm_state(QEMUFile *f, int skip_header);
 
-- 
1.7.0.31.g1df487



[RFC PATCH 00/23] Kemari for KVM v0.1.1

2010-05-25 Thread Yoshiaki Tamura
Hi,

This patch series is a revised version of Kemari for KVM that incorporates the
comments on the previous post.  The current code is based on qemu-kvm.git
2b644fd0e737407133c88054ba498e772ce01f27.

In contrast to the previous version, this series doesn't require any
modifications to KVM.  The I/O events are captured in the net/block layer instead
of the device emulation layer.  The transmission/transaction protocol and most of
the control logic are implemented in QEMU.

We prepared a demonstration video again.  This time the guest is Windows XP
without virtio drivers.  The demonstration scenario is:

1. Play with a guest VM (this guest has e1000 and ide).
   # The guest image should be on NFS/SAN.
2. Start the incoming side with -incoming protocol:address:port,ft_mode
3. Start Kemari to synchronize the VM by running the following command in QEMU.
   Just add the -k option to the usual migrate command.
   migrate -d -k tcp:192.168.0.20:
4. Check the status by calling info migrate.
5. Go back to the VM to play the pinball.
6. Kill the VM. (The VNC client also disappears.)
7. Press c to continue the VM on the other host.
8. Bring up the VNC client. (Sorry, it pops up outside of the video capture.)
9. Confirm that the pinball works, then shut down.

http://www.osrg.net/kemari/download/kemari-kvm-winxp.mov

The repository contains all patches we're sending with this message.  For those
who want to try, pull the following repository.

git://kemari.git.sourceforge.net/gitroot/kemari/kemari

The changes from v0.1 -> v0.1.1 are:

- Events are tapped in the net/block layer instead of the device emulation layer.
- Introduce a new option for -incoming to accept FT transaction.
- Removed writev() support from QEMUFile and FdMigrationState for now.  I will
  post this work in a separate series.
- Modified the virtio-blk save/load handler to send the inuse variable so that
  replay works correctly.
- Removed configure --enable-ft-mode.
- Removed unnecessary check for qemu_realloc().

I hope people like this approach, and I'm looking forward to suggestions/comments.

Thanks,

Yoshi

Yoshiaki Tamura (23):
  Modify DIRTY_FLAG value and introduce DIRTY_IDX to use as indexes of
bit-based phys_ram_dirty.
  Introduce cpu_physical_memory_get_dirty_range().
  Use cpu_physical_memory_set_dirty_range() to update phys_ram_dirty.
  Use cpu_physical_memory_get_dirty_range() to check multiple dirty
pages.
  Make QEMUFile buf expandable, and introduce qemu_realloc_buffer() and
qemu_clear_buffer().
  Introduce read() to FdMigrationState.
  Introduce skip_header parameter to qemu_loadvm_state().
  Introduce some socket util functions.
  Introduce fault tolerant VM transaction QEMUFile and ft_mode.
  Introduce util functions to control ft_transaction from savevm layer.
  Introduce qemu_savevm_state_all().
  Insert event-tap callbacks to net/block layer.
  Introduce event-tap.
  Call init handler of event-tap at main().
  Insert event_tap_ioport() to ioport_write().
  Insert event_tap_mmio() to cpu_physical_memory_rw().
  Skip assert() when event_tap_state isn't EVENT_TAP_OFF.
  Call event_tap_replay() at vm_start().
  Introduce ft_tranx_ready(), and modify migrate_fd_put_ready() when
ft_mode is on.
  Modify tcp_accept_incoming_migration() to handle ft_mode, and add a
hack not to close fd when ft_mode is enabled.
  virtio-blk: Modify save/load handler to handle inuse variable.
  Introduce -k option to enable FT migration mode (Kemari).
  Add a parser to accept FT migration incoming mode.

 Makefile.objs|1 +
 Makefile.target  |1 +
 block.c  |   22 +++
 block.h  |4 +
 cpu-all.h|  134 -
 event-tap.c  |  184 
 event-tap.h  |   32 
 exec.c   |  131 +
 ft_transaction.c |  418 ++
 ft_transaction.h |   54 +++
 hw/hw.h  |7 +
 hw/virtio.c  |8 +-
 ioport.c |2 +
 migration-exec.c |2 +-
 migration-fd.c   |2 +-
 migration-tcp.c  |   52 +++-
 migration-unix.c |2 +-
 migration.c  |  110 ++-
 migration.h  |3 +
 net/queue.c  |   18 +++
 net/queue.h  |3 +
 osdep.c  |   13 ++
 qemu-char.c  |   25 +++-
 qemu-kvm.c   |   23 ++--
 qemu-monitor.hx  |7 +-
 qemu_socket.h|4 +
 savevm.c |  146 +--
 sysemu.h |3 +-
 vl.c |   57 +---
 29 files changed, 1371 insertions(+), 97 deletions(-)
 create mode 100644 event-tap.c
 create mode 100644 event-tap.h
 create mode 100644 ft_transaction.c
 create mode 100644 ft_transaction.h



[RFC PATCH 13/23] Introduce event-tap.

2010-05-25 Thread Yoshiaki Tamura
event-tap controls when to start ft transaction, and inserts callbacks
to the net/block.
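
The on/off/suspend transitions that event-tap.c implements below can be
summarized with a small standalone state machine.  This is a minimal sketch,
assuming only the transitions visible in the patch; struct tap, the tap_*()
functions and sample_cb() are hypothetical names, and the real code also
registers the callback with the block and net layers.

```c
#include <assert.h>
#include <stddef.h>

enum tap_state { TAP_OFF, TAP_ON, TAP_SUSPEND };

struct tap {
    enum tap_state state;
    int (*cb)(void);
};

/* Example callback; a real one would start an FT transaction. */
static int sample_cb(void)
{
    return 0;
}

/* OFF -> ON, only with a valid callback. */
static int tap_register(struct tap *t, int (*cb)(void))
{
    assert(t != NULL);
    if (cb == NULL || t->state != TAP_OFF) {
        return -1;
    }
    t->cb = cb;
    t->state = TAP_ON;
    return 0;
}

/* Any state other than OFF -> OFF. */
static int tap_unregister(struct tap *t)
{
    assert(t != NULL);
    if (t->state == TAP_OFF) {
        return -1;
    }
    t->cb = NULL;
    t->state = TAP_OFF;
    return 0;
}

/* ON -> SUSPEND; ignored in any other state. */
static void tap_suspend(struct tap *t)
{
    assert(t != NULL);
    if (t->state == TAP_ON) {
        t->state = TAP_SUSPEND;
    }
}

/* SUSPEND -> ON; ignored in any other state. */
static void tap_resume(struct tap *t)
{
    assert(t != NULL);
    if (t->state == TAP_SUSPEND) {
        t->state = TAP_ON;
    }
}
```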

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 Makefile.target |1 +
 event-tap.c |  184 +++
 event-tap.h |   32 ++
 3 files changed, 217 insertions(+), 0 deletions(-)
 create mode 100644 event-tap.c
 create mode 100644 event-tap.h

diff --git a/Makefile.target b/Makefile.target
index 82caf20..a49b21f 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -188,6 +188,7 @@ obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 # MSI-X depends on kvm for interrupt injection,
 # so moved it from Makefile.objs to Makefile.target for now
 obj-y += msix.o
+obj-y += event-tap.o
 
 obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
 LIBS+=-lz
diff --git a/event-tap.c b/event-tap.c
new file mode 100644
index 000..5d3a338
--- /dev/null
+++ b/event-tap.c
@@ -0,0 +1,184 @@
+/*
+ * Event Tap functions for QEMU
+ *
+ * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation. 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "block.h"
+#include "ioport.h"
+#include "osdep.h"
+#include "hw/hw.h"
+#include "net/queue.h"
+#include "event-tap.h"
+
+static enum EVENT_TAP_STATE event_tap_state = EVENT_TAP_OFF;
+
+typedef struct EventTapIOport {
+uint32_t address;
+uint32_t data;
+int index;
+} EventTapIOport;
+
+#define MMIO_BUF_SIZE 8
+
+typedef struct EventTapMMIO {
+uint64_t address;
+uint8_t buf[MMIO_BUF_SIZE];
+int len;
+} EventTapMMIO;
+
+#define EVENT_TAP_IOPORT 1
+#define EVENT_TAP_MMIO   2
+
+typedef struct EventTapLog {
+int mode;
+union {
+EventTapIOport ioport ;
+EventTapMMIO mmio;
+};
+} EventTapLog;
+
+static EventTapLog last_event_tap;
+
+int event_tap_register(int (*cb)(void))
+{
+if (cb == NULL || event_tap_state != EVENT_TAP_OFF)
+return -1;
+
+bdrv_event_tap_register(cb);
+qemu_net_event_tap_register(cb);
+event_tap_state = EVENT_TAP_ON;
+
+return 0;
+}
+
+int event_tap_unregister(void)
+{
+if (event_tap_state == EVENT_TAP_OFF)
+return -1;
+
+bdrv_event_tap_unregister();
+qemu_net_event_tap_unregister();
+event_tap_state = EVENT_TAP_OFF;
+
+return 0;
+}
+
+void event_tap_suspend(void)
+{
+if (event_tap_state == EVENT_TAP_ON)
+event_tap_state = EVENT_TAP_SUSPEND;
+}
+
+void event_tap_resume(void)
+{
+if (event_tap_state == EVENT_TAP_SUSPEND)
+event_tap_state = EVENT_TAP_ON;
+}
+
+int event_tap_get_state(void)
+{
+return event_tap_state;
+}
+
+void event_tap_ioport(int index, uint32_t address, uint32_t data)
+{
+if (event_tap_state != EVENT_TAP_ON) {
+return;
+}
+
+last_event_tap.mode = EVENT_TAP_IOPORT;
+last_event_tap.ioport.index = index;
+last_event_tap.ioport.address = address;
+last_event_tap.ioport.data = data;
+}
+
+void event_tap_mmio(uint64_t address, uint8_t *buf, int len)
+{
+    if (event_tap_state != EVENT_TAP_ON || len > MMIO_BUF_SIZE) {
+        return;
+    }
+
+    last_event_tap.mode = EVENT_TAP_MMIO;
+    last_event_tap.mmio.address = address;
+    last_event_tap.mmio.len = len;
+    memcpy(last_event_tap.mmio.buf, buf, len);
+}
+
+static void event_tap_reset(void)
+{
+    memset(&last_event_tap, 0, sizeof(last_event_tap));
+}
+
+void event_tap_replay(void)
+{
+    if (event_tap_state != EVENT_TAP_REPLAY) {
+        return;
+    }
+
+    switch (last_event_tap.mode) {
+    case EVENT_TAP_IOPORT:
+        switch (last_event_tap.ioport.index) {
+        case 0:
+            cpu_outb(last_event_tap.ioport.address, last_event_tap.ioport.data);
+            break;
+        case 1:
+            cpu_outw(last_event_tap.ioport.address, last_event_tap.ioport.data);
+            break;
+        case 2:
+            cpu_outl(last_event_tap.ioport.address, last_event_tap.ioport.data);
+            break;
+        }
+        event_tap_reset();
+        break;
+    case EVENT_TAP_MMIO:
+        cpu_physical_memory_rw(last_event_tap.mmio.address,
+                               last_event_tap.mmio.buf,
+                               last_event_tap.mmio.len, 1);
+        event_tap_reset();
+        break;
+    }
+}
+
+static void event_tap_save(QEMUFile *f, void *opaque)
+{
+qemu_put_byte(f, last_event_tap.mode);
+
+if (last_event_tap.mode == EVENT_TAP_IOPORT) {
+qemu_put_be32(f, last_event_tap.ioport.index);
+qemu_put_be32(f, last_event_tap.ioport.address);
+qemu_put_byte(f, last_event_tap.ioport.data);
+} else {
+qemu_put_be64(f, last_event_tap.mmio.address);
+qemu_put_byte(f, last_event_tap.mmio.len);
+qemu_put_buffer(f, last_event_tap.mmio.buf, last_event_tap.mmio.len);
+}
+}
+
+static int event_tap_load(QEMUFile *f, void *opaque, int version_id)
+{
+last_event_tap.mode = qemu_get_byte(f);
+
+if (last_event_tap.mode == 

[RFC PATCH 22/23] Introduce -k option to enable FT migration mode (Kemari).

2010-05-25 Thread Yoshiaki Tamura
When -k option is set to migrate command, it will turn on ft_mode to
start FT migration mode (Kemari).

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration.c |3 +++
 qemu-monitor.hx |7 ---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/migration.c b/migration.c
index 5b90d37..3334650 100644
--- a/migration.c
+++ b/migration.c
@@ -71,6 +71,9 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
 return -1;
 }
 
+    if (qdict_get_int(qdict, "ft"))
+        ft_mode = FT_INIT;
+
     if (strstart(uri, "tcp:", &p)) {
         s = tcp_start_outgoing_migration(mon, p, max_throttle, detach,
                                          (int)qdict_get_int(qdict, "blk"),
diff --git a/qemu-monitor.hx b/qemu-monitor.hx
index 16c45b7..22b72d9 100644
--- a/qemu-monitor.hx
+++ b/qemu-monitor.hx
@@ -765,13 +765,14 @@ ETEXI
 
 {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-        .params     = "[-d] [-b] [-i] uri",
+        .args_type  = "detach:-d,blk:-b,inc:-i,ft:-k,uri:s",
+        .params     = "[-d] [-b] [-i] [-k] uri",
         .help       = "migrate to URI (using -d to not wait for completion)"
                       "\n\t\t\t -b for migration without shared storage with"
                       " full copy of disk\n\t\t\t -i for migration without "
                       "shared storage with incremental copy of disk "
-                      "(base image shared between src and destination)",
+                      "(base image shared between src and destination)"
+                      "\n\t\t\t -k for FT migration mode (Kemari)",
         .user_print = monitor_user_noop,
         .mhandler.cmd_new = do_migrate,
     },
-- 
1.7.0.31.g1df487



[RFC PATCH 03/23] Use cpu_physical_memory_set_dirty_range() to update phys_ram_dirty.

2010-05-25 Thread Yoshiaki Tamura
Modifies kvm_get_dirty_pages_log_range to use
cpu_physical_memory_set_dirty_range() to update the row of the
bit-based phys_ram_dirty bitmap at once.
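
To make the change concrete: the loop removed by this patch visited each set
bit of a bitmap word and marked one page at a time, while the new code passes
the whole word to cpu_physical_memory_set_dirty_range(), which is defined in
another patch of the series and not shown here.  The sketch below only
illustrates how set bits in one word map to page numbers; dirty_pages_in_word()
is a hypothetical helper, not part of QEMU.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Collect the page numbers of all set bits in one bitmap word.
 * word_index is the position of the word in the bitmap, so bit j of
 * word i corresponds to page i * (bits per long) + j.
 * Returns how many page numbers were stored in pages[]. */
static size_t dirty_pages_in_word(unsigned long word, unsigned long word_index,
                                  unsigned long *pages, size_t max_pages)
{
    const unsigned long bits = sizeof(unsigned long) * CHAR_BIT;
    unsigned long j;
    size_t n = 0;

    assert(pages != NULL);
    for (j = 0; j < bits && n < max_pages; j++) {
        if (word & (1ul << j)) {
            pages[n++] = word_index * bits + j;
        }
    }
    return n;
}
```

Handing the whole word to a range helper lets the dirty bitmap be updated one
row at a time instead of page by page, which seems to be the point of the
change.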

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 qemu-kvm.c |   19 +++
 1 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 29365a9..1414f49 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2323,8 +2323,8 @@ static int kvm_get_dirty_pages_log_range(unsigned long 
start_addr,
  unsigned long offset,
  unsigned long mem_size)
 {
-unsigned int i, j;
-unsigned long page_number, addr, addr1, c;
+unsigned int i;
+unsigned long page_number, addr, addr1;
 ram_addr_t ram_addr;
 unsigned int len = ((mem_size / TARGET_PAGE_SIZE) + HOST_LONG_BITS - 1) /
 HOST_LONG_BITS;
@@ -2335,16 +2335,11 @@ static int kvm_get_dirty_pages_log_range(unsigned long 
start_addr,
  */
     for (i = 0; i < len; i++) {
         if (bitmap[i] != 0) {
-            c = leul_to_cpu(bitmap[i]);
-            do {
-                j = ffsl(c) - 1;
-                c &= ~(1ul << j);
-                page_number = i * HOST_LONG_BITS + j;
-                addr1 = page_number * TARGET_PAGE_SIZE;
-                addr = offset + addr1;
-                ram_addr = cpu_get_physical_page_desc(addr);
-                cpu_physical_memory_set_dirty(ram_addr);
-            } while (c != 0);
+            page_number = i * HOST_LONG_BITS;
+            addr1 = page_number * TARGET_PAGE_SIZE;
+            addr = offset + addr1;
+            ram_addr = cpu_get_physical_page_desc(addr);
+            cpu_physical_memory_set_dirty_range(ram_addr, leul_to_cpu(bitmap[i]));
         }
     }
 return 0;
-- 
1.7.0.31.g1df487



[RFC PATCH 10/23] Introduce util functions to control ft_transaction from savevm layer.

2010-05-25 Thread Yoshiaki Tamura
To utilize ft_transaction function, savevm needs interfaces to be
exported.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/hw.h  |5 +
 savevm.c |   41 +
 2 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index fc9ed29..5a48a91 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -54,6 +54,8 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc 
*put_buffer,
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
+QEMUFile *qemu_fopen_transaction(int fd);
+QEMUFile *qemu_fopen_tranx_sender(void *opaque);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
@@ -63,6 +65,9 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int 
size);
 void qemu_put_byte(QEMUFile *f, int v);
 void *qemu_realloc_buffer(QEMUFile *f, int size);
 void qemu_clear_buffer(QEMUFile *f);
+int qemu_transaction_begin(QEMUFile *f);
+int qemu_transaction_commit(QEMUFile *f);
+int qemu_transaction_cancel(QEMUFile *f);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
diff --git a/savevm.c b/savevm.c
index 2ab883b..81cb711 100644
--- a/savevm.c
+++ b/savevm.c
@@ -82,6 +82,7 @@
 #include "migration.h"
 #include "qemu_socket.h"
 #include "qemu-queue.h"
+#include "ft_transaction.h"
 
 /* point to the block driver where the snapshots are managed */
 static BlockDriverState *bs_snapshots;
@@ -207,6 +208,21 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, 
int64_t pos, int size)
 return len;
 }
 
+static ssize_t socket_put_buffer(void *opaque, const void *buf, size_t size)
+{
+    QEMUFileSocket *s = opaque;
+    ssize_t len;
+
+    do {
+        len = send(s->fd, (void *)buf, size, 0);
+    } while (len == -1 && socket_error() == EINTR);
+
+    if (len == -1)
+        len = -socket_error();
+
+    return len;
+}
+
 static int socket_close(void *opaque)
 {
 QEMUFileSocket *s = opaque;
@@ -335,6 +351,16 @@ QEMUFile *qemu_fopen_socket(int fd)
     return s->file;
 }
 
+QEMUFile *qemu_fopen_transaction(int fd)
+{
+    QEMUFileSocket *s = qemu_mallocz(sizeof(QEMUFileSocket));
+
+    s->fd = fd;
+    s->file = qemu_fopen_ops_ft_tranx(s, socket_put_buffer, socket_get_buffer,
+                                      socket_close, 0);
+    return s->file;
+}
+
 static int file_put_buffer(void *opaque, const uint8_t *buf,
 int64_t pos, int size)
 {
@@ -472,6 +498,21 @@ void qemu_clear_buffer(QEMUFile *f)
     memset(f->buf, 0, f->buf_max_size);
 }
 
+int qemu_transaction_begin(QEMUFile *f)
+{
+    return qemu_ft_tranx_begin(f->opaque);
+}
+
+int qemu_transaction_commit(QEMUFile *f)
+{
+    return qemu_ft_tranx_commit(f->opaque);
+}
+
+int qemu_transaction_cancel(QEMUFile *f)
+{
+    return qemu_ft_tranx_cancel(f->opaque);
+}
+
 static void qemu_fill_buffer(QEMUFile *f)
 {
 int len;
-- 
1.7.0.31.g1df487



[RFC PATCH 20/23] Modify tcp_accept_incoming_migration() to handle ft_mode, and add a hack not to close fd when ft_mode is enabled.

2010-05-25 Thread Yoshiaki Tamura
When ft_mode is set in the header, tcp_accept_incoming_migration()
receives ft_transaction iteratively.  We also need a hack not to close
the fd before moving to ft_transaction mode, so that we can reuse the fd
for it.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration-tcp.c |   36 +++-
 1 files changed, 35 insertions(+), 1 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index 767a2f1..a5d9b6d 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -18,6 +18,7 @@
 #include "sysemu.h"
 #include "buffered_file.h"
 #include "block.h"
+#include "ft_transaction.h"
 
 //#define DEBUG_MIGRATION_TCP
 
@@ -55,7 +56,8 @@ static int socket_read(FdMigrationState *s, const void * buf, 
size_t size)
 static int tcp_close(FdMigrationState *s)
 {
     DPRINTF("tcp_close\n");
-    if (s->fd != -1) {
+    /* FIX ME: accessing ft_mode here isn't clean */
+    if (s->fd != -1 && ft_mode != FT_INIT) {
         close(s->fd);
         s->fd = -1;
     }
@@ -181,6 +183,38 @@ static void tcp_accept_incoming_migration(void *opaque)
         fprintf(stderr, "load of migration failed\n");
         goto out_fopen;
     }
+
+    /* ft_mode is set by qemu_loadvm_state(). */
+    if (ft_mode == FT_INIT) {
+        /* close normal QEMUFile first before reusing connection. */
+        qemu_fclose(f);
+        socket_set_nodelay(c);
+        socket_set_timeout(c, 5);
+        /* don't autostart to avoid split brain. */
+        autostart = 0;
+
+        f = qemu_fopen_transaction(c);
+        if (f == NULL) {
+            fprintf(stderr, "could not qemu_fopen transaction\n");
+            goto out;
+        }
+
+        /* need to wait sender to setup. */
+        if (qemu_transaction_begin(f) < 0) {
+            goto out_fopen;
+        }
+
+        /* loop until transaction breaks */
+        while ((ft_mode != FT_OFF) && (ret == 0)) {
+            ret = qemu_loadvm_state(f, 1);
+        }
+
+        /* if migrate_cancel was called at the sender  */
+        if (ft_mode == FT_OFF) {
+            goto out_fopen;
+        }
+    }
+
     qemu_announce_self();
     DPRINTF("successfully loaded vm state\n");
 
-- 
1.7.0.31.g1df487



[RFC PATCH 14/23] Call init handler of event-tap at main().

2010-05-25 Thread Yoshiaki Tamura
Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 vl.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index 70a8aed..56d12c7 100644
--- a/vl.c
+++ b/vl.c
@@ -169,6 +169,8 @@ int main(int argc, char **argv)
 
 #include "qemu-queue.h"
 
+#include "event-tap.h"
+
 //#define DEBUG_NET
 //#define DEBUG_SLIRP
 
@@ -5949,6 +5951,8 @@ int main(int argc, char **argv, char **envp)
 
 blk_mig_init();
 
+event_tap_init();
+
 if (default_cdrom) {
 /* we always create the cdrom drive, even if no disk is there */
 drive_add(NULL, CDROM_ALIAS);
-- 
1.7.0.31.g1df487



[RFC PATCH 15/23] Insert event_tap_ioport() to ioport_write().

2010-05-25 Thread Yoshiaki Tamura
Record ioport event to replay it upon failover.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 ioport.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/ioport.c b/ioport.c
index 53dd87a..ad7a017 100644
--- a/ioport.c
+++ b/ioport.c
@@ -26,6 +26,7 @@
  */
 
 #include "ioport.h"
+#include "event-tap.h"
 
 /***/
 /* IO Port */
@@ -75,6 +76,7 @@ static void ioport_write(int index, uint32_t address, 
uint32_t data)
 default_ioport_writel
 };
 IOPortWriteFunc *func = ioport_write_table[index][address];
+event_tap_ioport(index, address, data);
 if (!func)
 func = default_func[index];
 func(ioport_opaque[address], address, data);
-- 
1.7.0.31.g1df487



[RFC PATCH 08/23] Introduce some socket util functions.

2010-05-25 Thread Yoshiaki Tamura
Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 osdep.c   |   13 +
 qemu-char.c   |   25 -
 qemu_socket.h |4 
 3 files changed, 41 insertions(+), 1 deletions(-)

diff --git a/osdep.c b/osdep.c
index 3bab79a..63444e7 100644
--- a/osdep.c
+++ b/osdep.c
@@ -201,6 +201,12 @@ void socket_set_nonblock(int fd)
     ioctlsocket(fd, FIONBIO, &opt);
 }
 
+void socket_set_block(int fd)
+{
+    unsigned long opt = 0;
+    ioctlsocket(fd, FIONBIO, &opt);
+}
+
 int inet_aton(const char *cp, struct in_addr *ia)
 {
     uint32_t addr = inet_addr(cp);
@@ -223,6 +229,13 @@ void socket_set_nonblock(int fd)
     fcntl(fd, F_SETFL, f | O_NONBLOCK);
 }
 
+void socket_set_block(int fd)
+{
+    int f;
+    f = fcntl(fd, F_GETFL);
+    fcntl(fd, F_SETFL, f & ~O_NONBLOCK);
+}
+
 void qemu_set_cloexec(int fd)
 {
 int f;
diff --git a/qemu-char.c b/qemu-char.c
index 4169492..ccdf394 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2092,12 +2092,35 @@ static void tcp_chr_telnet_init(int fd)
     send(fd, (char *)buf, 3, 0);
 }
 
-static void socket_set_nodelay(int fd)
+void socket_set_delay(int fd)
+{
+    int val = 0;
+    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (char *)&val, sizeof(val));
+}
+
+void socket_set_nodelay(int fd)
 {
     int val = 1;
     setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (char *)&val, sizeof(val));
 }
 
+void socket_set_timeout(int fd, int s)
+{
+    struct timeval tv = {
+        .tv_sec = s,
+        .tv_usec = 0
+    };
+    /* Set socket_timeout */
+    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO,
+                   &tv, sizeof(tv)) < 0) {
+        fprintf(stderr, "failed to set SO_RCVTIMEO\n");
+    }
+    if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO,
+                   &tv, sizeof(tv)) < 0) {
+        fprintf(stderr, "failed to set SO_SNDTIMEO\n");
+    }
+}
+
 static void tcp_chr_accept(void *opaque)
 {
 CharDriverState *chr = opaque;
diff --git a/qemu_socket.h b/qemu_socket.h
index 7ee46ac..8eae465 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -35,6 +35,10 @@ int inet_aton(const char *cp, struct in_addr *ia);
 int qemu_socket(int domain, int type, int protocol);
 int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
 void socket_set_nonblock(int fd);
+void socket_set_block(int fd);
+void socket_set_nodelay(int fd);
+void socket_set_delay(int fd);
+void socket_set_timeout(int fd, int s);
 int send_all(int fd, const void *buf, int len1);
 
 /* New, ipv6-ready socket helper functions, see qemu-sockets.c */
-- 
1.7.0.31.g1df487
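
As a side note on the POSIX variant of socket_set_block() in the patch above:
clearing O_NONBLOCK is a read-modify-write of the file status flags, and the
mask has to be applied with `&`.  The following is a minimal sketch with error
checking added; fd_set_nonblock() and fd_set_block() are hypothetical names,
and the patch itself ignores the fcntl() return values.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch a descriptor to non-blocking mode.  Returns 0 or -1. */
static int fd_set_nonblock(int fd)
{
    int f;

    assert(fd >= 0);
    f = fcntl(fd, F_GETFL);
    if (f == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, f | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Switch a descriptor back to blocking mode by clearing O_NONBLOCK
 * while leaving the other status flags untouched.  Returns 0 or -1. */
static int fd_set_block(int fd)
{
    int f;

    assert(fd >= 0);
    f = fcntl(fd, F_GETFL);
    if (f == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFL, f & ~O_NONBLOCK) == -1 ? -1 : 0;
}
```

The same pair works on any descriptor that supports status flags, so it can be
tried on a pipe without setting up a TCP connection.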



[RFC PATCH 18/23] Call event_tap_replay() at vm_start().

2010-05-25 Thread Yoshiaki Tamura
Call event_tap_replay() at vm_start() to replay the last ioport/mmio
event upon failover.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 vl.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index 56d12c7..762440d 100644
--- a/vl.c
+++ b/vl.c
@@ -3094,6 +3094,7 @@ void vm_start(void)
 vm_state_notify(1, 0);
 qemu_rearm_alarm_timer(alarm_timer);
 resume_all_vcpus();
+event_tap_replay();
 }
 }
 
-- 
1.7.0.31.g1df487



[RFC PATCH 07/23] Introduce skip_header parameter to qemu_loadvm_state().

2010-05-25 Thread Yoshiaki Tamura
Introduce skip_header parameter to qemu_loadvm_state() so that it can
be called iteratively without reading the header.
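
The effect of skip_header can be illustrated with a standalone check over an
in-memory buffer.  This is a sketch under assumptions: struct stream,
stream_get_be32() and load_check_header() are hypothetical, and the magic and
version constants are assumed values that should be checked against savevm.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values of QEMU_VM_FILE_MAGIC and the version constants. */
#define SKETCH_FILE_MAGIC          0x5145564dU
#define SKETCH_FILE_VERSION_COMPAT 0x00000002U
#define SKETCH_FILE_VERSION        0x00000003U

struct stream {
    const uint8_t *buf;
    size_t len;
    size_t pos;
};

/* Read a big-endian 32-bit value; missing bytes read as zero. */
static uint32_t stream_get_be32(struct stream *s)
{
    uint32_t v = 0;
    int i;

    assert(s != NULL);
    for (i = 0; i < 4; i++) {
        v <<= 8;
        if (s->pos < s->len) {
            v |= s->buf[s->pos++];
        }
    }
    return v;
}

/* Validate magic and version unless the caller asks to skip them,
 * which is what repeated loads on one connection need. */
static int load_check_header(struct stream *s, int skip_header)
{
    uint32_t v;

    if (!skip_header) {
        v = stream_get_be32(s);
        if (v != SKETCH_FILE_MAGIC) {
            return -EINVAL;
        }
        v = stream_get_be32(s);
        if (v == SKETCH_FILE_VERSION_COMPAT) {
            return -ENOTSUP;
        }
        if (v != SKETCH_FILE_VERSION) {
            return -ENOTSUP;
        }
    }
    return 0;
}
```

With skip_header set, nothing is consumed from the stream, so the first byte
read afterwards is the section type of the next transaction.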

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 migration-exec.c |2 +-
 migration-fd.c   |2 +-
 migration-tcp.c  |2 +-
 migration-unix.c |2 +-
 savevm.c |   24 +---
 sysemu.h |2 +-
 6 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/migration-exec.c b/migration-exec.c
index 3edc026..5839a6d 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -113,7 +113,7 @@ static void exec_accept_incoming_migration(void *opaque)
 QEMUFile *f = opaque;
 int ret;
 
-    ret = qemu_loadvm_state(f);
+    ret = qemu_loadvm_state(f, 0);
     if (ret < 0) {
         fprintf(stderr, "load of migration failed\n");
         goto err;
diff --git a/migration-fd.c b/migration-fd.c
index 0cc74ad..0e97ed0 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -106,7 +106,7 @@ static void fd_accept_incoming_migration(void *opaque)
 QEMUFile *f = opaque;
 int ret;
 
-    ret = qemu_loadvm_state(f);
+    ret = qemu_loadvm_state(f, 0);
     if (ret < 0) {
         fprintf(stderr, "load of migration failed\n");
         goto err;
diff --git a/migration-tcp.c b/migration-tcp.c
index cffc4df..767a2f1 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -176,7 +176,7 @@ static void tcp_accept_incoming_migration(void *opaque)
 goto out;
 }
 
-    ret = qemu_loadvm_state(f);
+    ret = qemu_loadvm_state(f, 0);
     if (ret < 0) {
         fprintf(stderr, "load of migration failed\n");
         goto out_fopen;
diff --git a/migration-unix.c b/migration-unix.c
index b7aab38..dd99a73 100644
--- a/migration-unix.c
+++ b/migration-unix.c
@@ -168,7 +168,7 @@ static void unix_accept_incoming_migration(void *opaque)
 goto out;
 }
 
-    ret = qemu_loadvm_state(f);
+    ret = qemu_loadvm_state(f, 0);
     if (ret < 0) {
         fprintf(stderr, "load of migration failed\n");
         goto out_fopen;
diff --git a/savevm.c b/savevm.c
index b9bb9f4..2ab883b 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1489,7 +1489,7 @@ typedef struct LoadStateEntry {
 int version_id;
 } LoadStateEntry;
 
-int qemu_loadvm_state(QEMUFile *f)
+int qemu_loadvm_state(QEMUFile *f, int skip_header)
 {
 QLIST_HEAD(, LoadStateEntry) loadvm_handlers =
 QLIST_HEAD_INITIALIZER(loadvm_handlers);
@@ -1498,17 +1498,19 @@ int qemu_loadvm_state(QEMUFile *f)
 unsigned int v;
 int ret;
 
-    v = qemu_get_be32(f);
-    if (v != QEMU_VM_FILE_MAGIC)
-        return -EINVAL;
+    if (!skip_header) {
+        v = qemu_get_be32(f);
+        if (v != QEMU_VM_FILE_MAGIC)
+            return -EINVAL;
 
-    v = qemu_get_be32(f);
-    if (v == QEMU_VM_FILE_VERSION_COMPAT) {
-        fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
-        return -ENOTSUP;
+        v = qemu_get_be32(f);
+        if (v == QEMU_VM_FILE_VERSION_COMPAT) {
+            fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
+            return -ENOTSUP;
+        }
+        if (v != QEMU_VM_FILE_VERSION)
+            return -ENOTSUP;
     }
-    if (v != QEMU_VM_FILE_VERSION)
-        return -ENOTSUP;
 
 while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
 uint32_t instance_id, version_id, section_id;
@@ -1833,7 +1835,7 @@ int load_vmstate(Monitor *mon, const char *name)
         monitor_printf(mon, "Could not open VM state file\n");
         return -EINVAL;
     }
-    ret = qemu_loadvm_state(f);
+    ret = qemu_loadvm_state(f, 0);
     qemu_fclose(f);
     if (ret < 0) {
         monitor_printf(mon, "Error %d while loading VM state\n", ret);
diff --git a/sysemu.h b/sysemu.h
index 647a468..6c1441f 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -68,7 +68,7 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int 
blk_enable,
 int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f);
 int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f);
 void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f);
-int qemu_loadvm_state(QEMUFile *f);
+int qemu_loadvm_state(QEMUFile *f, int skip_header);
 
 void qemu_errors_to_file(FILE *fp);
 void qemu_errors_to_mon(Monitor *mon);
-- 
1.7.0.31.g1df487



[RFC PATCH 12/23] Insert event-tap callbacks to net/block layer.

2010-05-25 Thread Yoshiaki Tamura
Introduce event-tap callbacks to functions which actually fire outputs
at net/block layer.  By synchronizing VMs before outputs are fired, we
can failover to the receiver upon failure.
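
The hook pattern this patch adds to block.c and net/queue.c (a single optional
callback invoked right before an output is fired) can be sketched in isolation.
This is only an illustration under assumptions: out_tap, counting_tap() and
deliver() are hypothetical stand-ins, and the real callback would synchronize
the VM state rather than bump a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Optional callback run before every output; NULL means "no tap". */
static int (*out_tap)(void);

/* Example callback that only counts how often it was invoked. */
static int tap_calls;

static int counting_tap(void)
{
    tap_calls++;
    return 0;
}

/* Like bdrv_event_tap_register(): the first registration wins. */
static void out_tap_register(int (*cb)(void))
{
    if (out_tap == NULL) {
        out_tap = cb;
    }
}

static void out_tap_unregister(void)
{
    out_tap = NULL;
}

/* Stand-in for the function that actually fires an output: the tap runs
 * first, then the data is "delivered" (here, just its size is returned). */
static long deliver(const char *data, size_t size)
{
    assert(data != NULL);
    if (out_tap != NULL) {
        out_tap();
    }
    return (long)size;
}
```

Because the tap runs before the output leaves the host, the receiver has seen
the corresponding state before any externally visible effect occurs.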

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 block.c |   22 ++
 block.h |4 
 net/queue.c |   18 ++
 net/queue.h |3 +++
 4 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 31d1ba4..cf73c47 100644
--- a/block.c
+++ b/block.c
@@ -59,6 +59,8 @@ BlockDriverState *bdrv_first;
 
 static BlockDriver *first_drv;
 
+static int (*bdrv_event_tap)(void);
+
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
@@ -787,6 +789,10 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num,
 set_dirty_bitmap(bs, sector_num, nb_sectors, 1);
 }
 
+if (bdrv_event_tap != NULL) {
+bdrv_event_tap();
+}
+
 return drv->bdrv_write(bs, sector_num, buf, nb_sectors);
 }
 
@@ -1851,6 +1857,10 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs)
 MultiwriteCB *mcb;
 int i;
 
+if (bdrv_event_tap != NULL) {
+bdrv_event_tap();
+}
+
 if (num_reqs == 0) {
 return 0;
 }
@@ -2277,3 +2287,15 @@ int64_t bdrv_get_dirty_count(BlockDriverState *bs)
 {
 return bs->dirty_count;
 }
+
+void bdrv_event_tap_register(int (*cb)(void))
+{
+if (bdrv_event_tap == NULL) {
+bdrv_event_tap = cb;
+}
+}
+
+void bdrv_event_tap_unregister(void)
+{
+bdrv_event_tap = NULL;
+}
diff --git a/block.h b/block.h
index edf5704..b5139db 100644
--- a/block.h
+++ b/block.h
@@ -207,4 +207,8 @@ int bdrv_get_dirty(BlockDriverState *bs, int64_t sector);
 void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
   int nr_sectors);
 int64_t bdrv_get_dirty_count(BlockDriverState *bs);
+
+void bdrv_event_tap_register(int (*cb)(void));
+void bdrv_event_tap_unregister(void);
+
 #endif
diff --git a/net/queue.c b/net/queue.c
index 2ea6cd0..a542efe 100644
--- a/net/queue.c
+++ b/net/queue.c
@@ -57,6 +57,8 @@ struct NetQueue {
 unsigned delivering : 1;
 };
 
+static int (*net_event_tap)(void);
+
 NetQueue *qemu_new_net_queue(NetPacketDeliver *deliver,
  NetPacketDeliverIOV *deliver_iov,
  void *opaque)
@@ -151,6 +153,8 @@ static ssize_t qemu_net_queue_deliver(NetQueue *queue,
 ssize_t ret = -1;
 
 queue->delivering = 1;
+if (net_event_tap)
+net_event_tap();
 ret = queue->deliver(sender, flags, data, size, queue->opaque);
 queue->delivering = 0;
 
@@ -166,6 +170,8 @@ static ssize_t qemu_net_queue_deliver_iov(NetQueue *queue,
 ssize_t ret = -1;
 
 queue->delivering = 1;
+if (net_event_tap)
+net_event_tap();
 ret = queue->deliver_iov(sender, flags, iov, iovcnt, queue->opaque);
 queue->delivering = 0;
 
@@ -258,3 +264,15 @@ void qemu_net_queue_flush(NetQueue *queue)
 qemu_free(packet);
 }
 }
+
+void qemu_net_event_tap_register(int (*cb)(void))
+{
+if (net_event_tap == NULL) {
+net_event_tap = cb;
+}
+}
+
+void qemu_net_event_tap_unregister(void)
+{
+net_event_tap = NULL;
+}
diff --git a/net/queue.h b/net/queue.h
index a31958e..5b031c1 100644
--- a/net/queue.h
+++ b/net/queue.h
@@ -68,4 +68,7 @@ ssize_t qemu_net_queue_send_iov(NetQueue *queue,
 void qemu_net_queue_purge(NetQueue *queue, VLANClientState *from);
 void qemu_net_queue_flush(NetQueue *queue);
 
+void qemu_net_event_tap_register(int (*cb)(void));
+void qemu_net_event_tap_unregister(void);
+
 #endif /* QEMU_NET_QUEUE_H */
-- 
1.7.0.31.g1df487
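
For readers following the series, the register/unregister pattern used above
for both bdrv_event_tap and net_event_tap can be sketched in isolation.  This
is only an illustration, not code from the patch: event_tap, fire_output(),
count_sync() and the two counters are hypothetical names.  It assumes a single
global callback slot where the first registration wins, as in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for bdrv_event_tap / net_event_tap. */
static int (*event_tap)(void);

/* First registration wins; later calls are ignored until unregister. */
void event_tap_register(int (*cb)(void))
{
    assert(cb != NULL);
    if (event_tap == NULL) {
        event_tap = cb;
    }
}

void event_tap_unregister(void)
{
    event_tap = NULL;
}

/* Hypothetical output path: run the tap (if any), then "fire" the output. */
static int outputs_fired;

int fire_output(void)
{
    if (event_tap != NULL) {
        event_tap();    /* e.g. synchronize with the secondary VM first */
    }
    return ++outputs_fired;
}

/* Hypothetical callback that only counts how often it was invoked. */
static int sync_calls;

int count_sync(void)
{
    return ++sync_calls;
}
```

Note that in this sketch, as in the patch, the callback's return value is
ignored at the call site, so a failed synchronization would not stop the
output from being fired.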



[RFC PATCH 02/23] Introduce cpu_physical_memory_get_dirty_range().

2010-05-25 Thread Yoshiaki Tamura
It checks the first row of the dirty bitmap and puts the dirty addresses
in the array.  If the first row is empty, it skips ahead to the first
dirty row or the end address, and puts the skipped length in the first
entry of the array.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp
---
 cpu-all.h |4 +++
 exec.c|   67 +
 2 files changed, 71 insertions(+), 0 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index 3f8762d..27187d4 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -1007,6 +1007,10 @@ static inline void cpu_physical_memory_mask_dirty_range(ram_addr_t start,
 }
 }
 
+int cpu_physical_memory_get_dirty_range(ram_addr_t start, ram_addr_t end, 
+ram_addr_t *dirty_rams, int length,
+int dirty_flags);
+
 void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
  int dirty_flags);
 void cpu_tlb_update_dirty(CPUState *env);
diff --git a/exec.c b/exec.c
index bf8d703..d5c2a05 100644
--- a/exec.c
+++ b/exec.c
@@ -1962,6 +1962,73 @@ static inline void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry,
 }
 }
 
+/* It checks the first row and puts dirty addrs in the array.
+   If the first row is empty, it skips to the first dirty row
+   or the end addr, and puts the length in the first entry of the array. */
+int cpu_physical_memory_get_dirty_range(ram_addr_t start, ram_addr_t end,
+                                        ram_addr_t *dirty_rams, int length,
+                                        int dirty_flag)
+{
+    unsigned long p = 0, page_number;
+    ram_addr_t addr;
+    ram_addr_t s_idx = (start >> TARGET_PAGE_BITS) / HOST_LONG_BITS;
+    ram_addr_t e_idx = (end >> TARGET_PAGE_BITS) / HOST_LONG_BITS;
+    int i, j, offset, dirty_idx = dirty_flag_to_idx(dirty_flag);
+
+    /* mask bits before the start addr */
+    offset = (start >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1);
+    cpu_physical_memory_sync_master(s_idx);
+    p |= phys_ram_dirty[dirty_idx][s_idx] & ~((1UL << offset) - 1);
+
+    if (s_idx == e_idx) {
+        /* mask bits after the end addr */
+        offset = (end >> TARGET_PAGE_BITS) & (HOST_LONG_BITS - 1);
+        p &= (1UL << offset) - 1;
+    }
+
+    if (p == 0) {
+        /* when the row is empty */
+        ram_addr_t skip;
+        if (s_idx == e_idx) {
+            skip = end;
+        } else {
+            /* skip empty rows */
+            while (s_idx < e_idx) {
+                s_idx++;
+                cpu_physical_memory_sync_master(s_idx);
+
+                if (phys_ram_dirty[dirty_idx][s_idx] != 0) {
+                    break;
+                }
+            }
+            skip = (s_idx * HOST_LONG_BITS * TARGET_PAGE_SIZE);
+        }
+        dirty_rams[0] = skip - start;
+        i = 0;
+
+    } else if (p == ~0UL) {
+        /* when the row is fully dirtied */
+        addr = start;
+        for (i = 0; i < length; i++) {
+            dirty_rams[i] = addr;
+            addr += TARGET_PAGE_SIZE;
+        }
+    } else {
+        /* when the row is partially dirtied */
+        i = 0;
+        do {
+            j = ffsl(p) - 1;
+            p &= ~(1UL << j);
+            page_number = s_idx * HOST_LONG_BITS + j;
+            addr = page_number * TARGET_PAGE_SIZE;
+            dirty_rams[i] = addr;
+            i++;
+        } while (p != 0 && i < length);
+    }
+
+    return i;
+}
+
 /* Note: start and end must be within the same ram block.  */
 void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
  int dirty_flags)
-- 
1.7.0.31.g1df487
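
The "partially dirtied" branch above is the core of the function: it walks the
set bits of one bitmap row and turns each bit into a page address.  A
standalone sketch of just that step follows.  It is an illustration under
stated assumptions, not the patch's code: the DEMO_* constants, the 4 KiB page
size and the function names are hypothetical, and a portable bit-scan loop is
used in place of ffsl().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_PAGE_BITS 12                        /* assumed 4 KiB pages */
#define DEMO_PAGE_SIZE (1UL << DEMO_PAGE_BITS)
#define DEMO_LONG_BITS (sizeof(unsigned long) * CHAR_BIT)  /* bits per row */

/* Index of the lowest set bit; word must be non-zero. */
static unsigned lowest_set_bit(unsigned long word)
{
    unsigned bit = 0;

    assert(word != 0);
    while ((word & 1UL) == 0) {
        word >>= 1;
        bit++;
    }
    return bit;
}

/* Write the addresses of the dirty pages in one bitmap row into out[],
   at most max entries, and return how many entries were written. */
size_t dirty_pages_in_row(unsigned long row, size_t row_idx,
                          unsigned long *out, size_t max)
{
    size_t n = 0;

    while (row != 0 && n < max) {
        unsigned bit = lowest_set_bit(row);

        row &= ~(1UL << bit);    /* clear the bit just reported */
        out[n++] = (row_idx * DEMO_LONG_BITS + bit) * DEMO_PAGE_SIZE;
    }
    return n;
}
```

Unlike the do/while loop in the patch, this sketch checks the bound before the
first store, so a call with max == 0 writes nothing.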



[RFC PATCH 21/23] virtio-blk: Modify save/load handler to handle inuse variable.

2010-05-25 Thread Yoshiaki Tamura
Change the type of inuse to uint16_t, let save/load handle it, and rewind
last_avail_idx by inuse on load if there are outstanding requests still
being emulated.

Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp
---
 hw/virtio.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 7c020a3..502929c 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -70,7 +70,7 @@ struct VirtQueue
 VRing vring;
 target_phys_addr_t pa;
 uint16_t last_avail_idx;
-int inuse;
+uint16_t inuse;
 uint16_t vector;
 void (*handle_output)(VirtIODevice *vdev, VirtQueue *vq);
 };
@@ -641,6 +641,7 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
 qemu_put_be32(f, vdev->vq[i].vring.num);
 qemu_put_be64(f, vdev->vq[i].pa);
 qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
+qemu_put_be16s(f, &vdev->vq[i].inuse);
 if (vdev->binding->save_queue)
 vdev->binding->save_queue(vdev->binding_opaque, i, f);
 }
@@ -678,6 +679,11 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f)
 vdev->vq[i].vring.num = qemu_get_be32(f);
 vdev->vq[i].pa = qemu_get_be64(f);
 qemu_get_be16s(f, &vdev->vq[i].last_avail_idx);
+qemu_get_be16s(f, &vdev->vq[i].inuse);
+
+/* revert last_avail_idx if there are outstanding emulation. */
+vdev->vq[i].last_avail_idx -= vdev->vq[i].inuse;
+vdev->vq[i].inuse = 0;
 
 if (vdev->vq[i].pa) {
 virtqueue_init(&vdev->vq[i]);
-- 
1.7.0.31.g1df487
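
A note on why inuse becomes uint16_t: the rewind on load relies on 16-bit
modular arithmetic, so that last_avail_idx wraps correctly when it is smaller
than inuse.  A minimal sketch, assuming only the two fields touched by the
patch (struct queue_state and rewind_outstanding() are hypothetical names, not
QEMU code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the per-queue state touched by the patch. */
struct queue_state {
    uint16_t last_avail_idx;  /* next avail-ring entry the device will read */
    uint16_t inuse;           /* requests popped but not yet completed */
};

/* Rewind last_avail_idx so that outstanding requests are fetched again
   after load; the subtraction wraps modulo 2^16, like the ring index. */
void rewind_outstanding(struct queue_state *q)
{
    assert(q != NULL);
    q->last_avail_idx = (uint16_t)(q->last_avail_idx - q->inuse);
    q->inuse = 0;
}
```

For example, with last_avail_idx = 2 and inuse = 5 the index wraps to 65533,
which is five entries behind 2 on a 16-bit ring index.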



Re: Gentoo guest with smp: emerge freeze while recompile world

2010-05-25 Thread Avi Kivity

On 05/24/2010 12:15 AM, Riccardo wrote:



Please try with kvmclock disabled.
 

I have recompiled gentoo-sources-2.6.34 without kvm-clock:
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
# dmesg | grep clock
[0.00] hpet clockevent registered
[0.661050] Switching to clocksource tsc

And with this kernel everything works fine! (emerge -e world)
It's a problem in kvm-clock for kernels >= 2.6.32
   


Can you provide the traces with kvmclock enabled so we can see what went
wrong?


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH 1/5] trace: Add trace-events file for declaring trace events

2010-05-25 Thread Avi Kivity

On 05/25/2010 01:07 AM, Anthony Liguori wrote:


Interesting approach as it lets us defer the tracing backend decision.


Also, it's compatible with the multiplatform nature of qemu.

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/24/2010 10:38 PM, Anthony Liguori wrote:



- Building a plugin API seems a bit simpler to me, although I'm not sure
  if I've understood the idea correctly:
  The block layer already has some kind of API (.bdrv_file_open,
  .bdrv_read).  We could simply compile the block drivers as shared
  objects and create a method for loading the necessary modules at
  runtime.


That approach would be a recipe for disaster.   We would have to 
introduce a new, reduced functionality block API that was supported 
for plugins.  Otherwise, the only way a plugin could keep up with our 
API changes would be if it was in tree which defeats the purpose of 
having plugins.


We could guarantee API/ABI stability in a stable branch but not across 
releases.


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/24/2010 10:16 PM, Anthony Liguori wrote:

On 05/24/2010 06:56 AM, Avi Kivity wrote:

On 05/24/2010 02:42 PM, MORITA Kazutaka wrote:



The server would be local and talk over a unix domain socket, perhaps
anonymous.

nbd has other issues though, such as requiring a copy and no 
support for

metadata operations such as snapshot and file size extension.


Sorry, my explanation was unclear.  I'm not sure how running servers
on localhost can solve the problem.


The local server can convert from the local (nbd) protocol to the 
remote (sheepdog, ceph) protocol.



What I wanted to say was that we cannot specify the image of VM. With
nbd protocol, command line arguments are as follows:

  $ qemu nbd:hostname:port

As this syntax shows, with nbd protocol the client cannot pass the VM
image name to the server.


We would extend it to allow it to connect to a unix domain socket:

  qemu nbd:unix:/path/to/socket


nbd is a no-go because it only supports a single, synchronous I/O 
operation at a time and has no mechanism for extensibility.


If we go this route, I think two options are worth considering.  The 
first would be a purely socket based approach where we just accepted 
the extra copy.


The other potential approach would be shared memory based.  We export 
all guest ram as shared memory along with a small bounce buffer pool.  
We would then use a ring queue (potentially even using virtio-blk) and 
an eventfd for notification.


We can't actually export guest memory unless we allocate it as a shared 
memory object, which has many disadvantages.  The only way to export 
anonymous memory now is vmsplice(), which is fairly limited.





The server at the other end would associate the socket with a 
filename and forward it to the server using the remote protocol.


However, I don't think nbd would be a good protocol.  My preference 
would be for a plugin API, or for a new local protocol that uses 
splice() to avoid copies.


I think a good shared memory implementation would be preferable to 
plugins.  I think it's worth attempting to do a plugin interface for 
the block layer but I strongly suspect it would not be sufficient.


I would not want to see plugins that interacted with BlockDriverState 
directly, for instance.  We change it far too often.  Our main loop 
functions are also not terribly stable so I'm not sure how we would 
handle that (unless we forced all block plugins to be in a separate 
thread).


If we manage to make a good long-term stable plugin API, it would be a 
good candidate for the block layer itself.


Some OSes manage to have a stable block driver ABI, so it should be 
possible, if difficult.


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/24/2010 10:19 PM, Anthony Liguori wrote:

On 05/24/2010 06:03 AM, Avi Kivity wrote:

On 05/24/2010 11:27 AM, Stefan Hajnoczi wrote:

On Sun, May 23, 2010 at 1:01 PM, Avi Kivitya...@redhat.com  wrote:

On 05/21/2010 12:29 AM, Anthony Liguori wrote:
I'd be more interested in enabling people to build these types of 
storage

systems without touching qemu.

Both sheepdog and ceph ultimately transmit I/O over a socket to a 
central

daemon, right?

That incurs an extra copy.

Besides a shared memory approach, I wonder if the splice() family of
syscalls could be used to send/receive data through a storage daemon
without the daemon looking at or copying the data?


Excellent idea.


splice() eventually requires a copy.  You cannot splice() to linux-aio 
so you'd have to splice() to a temporary buffer and then call into 
linux-aio.  With shared memory, you can avoid ever bringing the data 
into memory via O_DIRECT and linux-aio.


If the final destination is a socket, then you end up queuing guest 
memory as an skbuff.  In theory we could do an aio splice to block 
devices but I don't think that's realistic given our experience with aio 
changes.


--
error compiling committee.c: too many arguments to function



Re: [qemu-kvm tests PATCH] qemu-kvm tests cleanup

2010-05-25 Thread Avi Kivity

On 05/15/2010 11:12 AM, Asias He wrote:

fix test/x86/msr.c fail to build on i386

   


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 2/2 v2] KVM: MMU: allow more page become unsync at getting sp time

2010-05-25 Thread Avi Kivity

On 05/24/2010 10:41 AM, Xiao Guangrong wrote:

Allow more page become asynchronous at getting sp time, if need create new
shadow page for gfn but it not allow unsync(level > 1), we should unsync all
gfn's unsync page
   


Both applied, thanks.

--
error compiling committee.c: too many arguments to function



RE: ixgbe: macvlan on PF/VF when SRIOV is enabled

2010-05-25 Thread Shirley Ma
On Mon, 2010-05-24 at 10:54 -0700, Rose, Gregory V wrote:
 We look forward to it and will be happy to provide feedback.
I have submitted the patch to make macvlan on PF work when SRIOV is
enabled.

 One thing you can do is allocate VFs and then load the VF driver in
 your host domain and then assign each of them a macvlan filter.  You'd
 get a similar effect.

That's what I am trying to make work for macvlan on VFs in the host
domain. I need to add VF secondary addresses to the address filter, right?

Do you have any aggregate performance comparison between multiple
macvlans on a PF and a single macvlan per VF in the host domain? I will
run some tests to figure it out. If you have some data to share that
would be great.

Thanks
Shirley



Re: [PATCH] VMX: Fix and improve guest state validity checks

2010-05-25 Thread Avi Kivity

On 05/13/2010 11:15 PM, Mohammed Gamal wrote:

On Thu, May 13, 2010 at 9:24 AM, Avi Kivitya...@redhat.com  wrote:
   

On 05/11/2010 07:52 PM, Mohammed Gamal wrote:
 

- Add 's' and 'g' field checks on segment registers
- Correct SS checks for request and descriptor privilege levels

Signed-off-by: Mohammed Gamalm.gamal...@gmail.com
---
  arch/x86/kvm/vmx.c |   73 +++
  1 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 777e00d..9805c2a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2121,16 +2121,30 @@ static bool stack_segment_valid(struct kvm_vcpu *vcpu)
vmx_get_segment(vcpu, &ss, VCPU_SREG_SS);
ss_rpl = ss.selector & SELECTOR_RPL_MASK;

-   if (ss.unusable)
+   if (ss.dpl != ss_rpl) /* DPL != RPL */
+   return false;
+
+   if (ss.unusable) /* Short-circuit */
return true;

   

If ss.unusable, do the dpl and rpl have any meaning?
 

The idea is that dpl and rpl are checked on vmentry regardless of
whether ss is usable or not. While the other checks are performed only
if ss is usable.
   


Any reference to back this up?  I think rpl is valid regardless of 
ss.unusable (i.e. loading selector 0003 results in an unusable segment 
with rpl=3), but I don't see how dpl can be valid in an unusable segment.
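
To make the selector 0003 example concrete: the RPL lives in the low two bits
of the selector itself, so it can be read even when the segment is unusable,
whereas the DPL comes from the descriptor.  A small sketch of the selector
side only; the DEMO_* masks and helper names are illustrative, not the
kernel's definitions:

```c
#include <stdint.h>

#define DEMO_SELECTOR_RPL_MASK 0x0003u  /* bits 1:0: requested privilege level */
#define DEMO_SELECTOR_IDX_MASK 0xfffcu  /* bits 15:2: table indicator + index */

/* RPL is carried in the selector value, independent of any descriptor. */
unsigned selector_rpl(uint16_t selector)
{
    return selector & DEMO_SELECTOR_RPL_MASK;
}

/* A null selector has index 0 and TI 0; its RPL bits may still be set. */
int selector_is_null(uint16_t selector)
{
    return (selector & DEMO_SELECTOR_IDX_MASK) == 0;
}
```

So selector 0x0003 is a null selector (no descriptor is loaded for it) and
still reports rpl = 3, which is the case described above.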


--
error compiling committee.c: too many arguments to function



Re: KVM call agenda for May 18

2010-05-25 Thread Avi Kivity

On 05/19/2010 11:20 AM, Christoph Hellwig wrote:


It's time we get a proper bugzilla.qemu.org for both qemu and qemu-kvm
that can be used sanely.  If you ask nicely you might even get a virtual
instance of bugzilla.kernel.org which works quite nicely.
   


That would be my preference too but there's a limit to how much we can 
juggle the bug database around.


--
error compiling committee.c: too many arguments to function



Re: host panic on kernel 2.6.34

2010-05-25 Thread Avi Kivity

Copying netdev, bridge mailing lists.

On 05/24/2010 11:23 AM, Hao, Xudong wrote:

Hi all
I built the latest kvm (37dec075a7854f0f550540bf3b9bbeef37c11e2a), based on
kernel 2.6.34. After loading the kvm and kvm_intel modules and running
/etc/init.d/kvm start, the system panics a few minutes later.

kernel: 2.6.34
kvm: 37dec075a7854f0f550540bf3b9bbeef37c11e2a
qemu-kvm: 69dd59a66aaf56d1e8e4c96d0a0923c9cf8f79a0

BUG: unable to handle kernel NULL pointer dereference at 0018
IP: [f914c05b] br_mdb_ip_get+0x2e/0x1aa [bridge]
*pdpt = 35fbb001 *pde = 
Oops:  [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
Modules linked in: bridge stp autofs4 hidp rfcomm l2cap crc16 bluetooth rfkill ]

Pid: 0, comm: swapper Not tainted 2.6.34 #1 X7DWA/X7DWA
EIP: 0060:[f914c05b] EFLAGS: 00010246 CPU: 0
EIP is at br_mdb_ip_get+0x2e/0x1aa [bridge]
EAX: c5801d40 EBX: c5801d40 ECX: faef EDX: 
ESI: f67e03c0 EDI: f5249200 EBP: c5801c94 ESP: c5801c80
  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
Process swapper (pid: 0, ti=c5801000 task=c07f2fe0 task.ti=c07de000)
Stack:
  c5801d40  c5801d40 f67e03c0 f5249200 c5801cb0 f914c6fd fff90006
0  f67e0940 f6326740 f627e064 f67e03c0 c5801d78 f914dd0c f76af140 f6326740
0  f5249200 f67e03c0 0014 f6326758 c5801d54 c08eb440 c5801cf4 c5801d00
Call Trace:
  [f914c6fd] ? br_multicast_leave_group+0x52/0x128 [bridge]
  [f914dd0c] ? br_multicast_rcv+0x6dc/0xe90 [bridge]
  [c0650420] ? fib_lookup+0x2c/0x3a
  [c064cd15] ? fib_validate_source+0x29d/0x2b4
  [c0621175] ? nf_hook_slow+0x3b/0x92
  [f9147b39] ? br_handle_frame_finish+0x53/0x17e [bridge]
  [f914b880] ? br_nf_pre_routing_finish+0x264/0x27c [bridge]
  [c0621175] ? nf_hook_slow+0x3b/0x92
  [f914b61c] ? br_nf_pre_routing_finish+0x0/0x27c [bridge]
  [f914bf6f] ? br_nf_pre_routing+0x553/0x570 [bridge]
  [c0621107] ? nf_iterate+0x2f/0x62
  [f9147ae6] ? br_handle_frame_finish+0x0/0x17e [bridge]
  [c0621175] ? nf_hook_slow+0x3b/0x92
  [f9147ae6] ? br_handle_frame_finish+0x0/0x17e [bridge]
  [f9147dda] ? br_handle_frame+0x176/0x198 [bridge]
  [f9147ae6] ? br_handle_frame_finish+0x0/0x17e [bridge]
  [c060643b] ? __netif_receive_skb+0x29a/0x37e
  [c0607023] ? dev_gro_receive+0xfd/0x1d2
  [c0606e03] ? netif_receive_skb+0x61/0x67
  [c0607199] ? __napi_gro_receive+0xa1/0xba
  [c0606e7e] ? napi_skb_finish+0x1e/0x33
  [c0607201] ? napi_gro_receive+0x20/0x24
  [f8867cfc] ? igb_poll+0x706/0xa39 [igb]
  [c06093b2] ? net_rx_action+0x97/0x13b
  [c0430641] ? __do_softirq+0x80/0xf4
  [c04305c1] ? __do_softirq+0x0/0xf4
  IRQ
  [c04305bf] ? irq_exit+0x29/0x2b
  [c040373e] ? do_IRQ+0x85/0x9b
  [c0402ca9] ? common_interrupt+0x29/0x30
  [c0407c4f] ? mwait_idle+0x4c/0x52
  [c0401a08] ? cpu_idle+0x3a/0x4e
  [c066cf16] ? rest_init+0x62/0x64
  [c08248dd] ? start_kernel+0x2c2/0x2c7
  [c08240b3] ? i386_start_kernel+0xb3/0xb8
Code: 57 56 53 83 ec 08 89 45 f0 89 55 ec 8b 42 10 66 83 f8 08 74 0e 31 db 66 3
EIP: [f914c05b] br_mdb_ip_get+0x2e/0x1aa [bridge] SS:ESP 0068:c5801c80
CR2: 0018
---[ end trace 907f878ab4cd8031 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 0, comm: swapper Tainted: G  D 2.6.34 #1
Call Trace:
  [c042c31b] panic+0x3e/0xaa
  [c0681caa] oops_end+0x8c/0x9b
  [c041e710] no_context+0x153/0x15d
  [c041e8a2] __bad_area_nosemaphore+0xe5/0xed
  [c041e90e] bad_area_nosemaphore+0xd/0x13
  [c06838b0] do_page_fault+0x375/0x37d
  [c0650420] ? fib_lookup+0x2c/0x3a
  [c0624431] ? ip_route_input_common+0x695/0xf2f
  [c068353b] ? do_page_fault+0x0/0x37d
  [c06813d6] error_code+0x66/0x6c
  [c068353b] ? do_page_fault+0x0/0x37d
  [f914c05b] ? br_mdb_ip_get+0x2e/0x1aa [bridge]
  [f914c6fd] br_multicast_leave_group+0x52/0x128 [bridge]
  [f914dd0c] br_multicast_rcv+0x6dc/0xe90 [bridge]
  [c0650420] ? fib_lookup+0x2c/0x3a
  [c064cd15] ? fib_validate_source+0x29d/0x2b4
  [c0621175] ? nf_hook_slow+0x3b/0x92
  [f9147b39] br_handle_frame_finish+0x53/0x17e [bridge]
  [f914b880] br_nf_pre_routing_finish+0x264/0x27c [bridge]
  [c0621175] ? nf_hook_slow+0x3b/0x92
  [f914b61c] ? br_nf_pre_routing_finish+0x0/0x27c [bridge]
  [f914bf6f] br_nf_pre_routing+0x553/0x570 [bridge]
  [c0621107] nf_iterate+0x2f/0x62
  [f9147ae6] ? br_handle_frame_finish+0x0/0x17e [bridge]
  [c0621175] nf_hook_slow+0x3b/0x92
  [f9147ae6] ? br_handle_frame_finish+0x0/0x17e [bridge]
  [f9147dda] br_handle_frame+0x176/0x198 [bridge]
  [f9147ae6] ? br_handle_frame_finish+0x0/0x17e [bridge]
  [c060643b] __netif_receive_skb+0x29a/0x37e
  [c0607023] ? dev_gro_receive+0xfd/0x1d2
  [c0606e03] netif_receive_skb+0x61/0x67
  [c0607199] ? __napi_gro_receive+0xa1/0xba
  [c0606e7e] napi_skb_finish+0x1e/0x33
  [c0607201] napi_gro_receive+0x20/0x24
  [f8867cfc] igb_poll+0x706/0xa39 [igb]
  [c06093b2] net_rx_action+0x97/0x13b
  [c0430641] __do_softirq+0x80/0xf4
  [c04305c1] ? __do_softirq+0x0/0xf4
  IRQ   [c04305bf] ? irq_exit+0x29/0x2b
  [c040373e] ? do_IRQ+0x85/0x9b
  [c0402ca9] ? common_interrupt+0x29/0x30
  [c0407c4f] ? 

[PATCH v2 0/7] Tracing backends

2010-05-25 Thread Stefan Hajnoczi
Following the RFC discussion, here are updated patches that I propose for review and merge:

The following patches against qemu.git allow static trace events to be declared
in QEMU.  Trace events use a lightweight syntax and are independent of the
backend tracing system (e.g. LTTng UST).

Supported backends are:
 * my trivial tracer (simple)
 * LTTng Userspace Tracer (ust)
 * no tracer (nop, the default)

The ./configure option to choose a backend is --trace-backend=.

Main point of this patchset: adding new trace events is easy and we can switch
between backends without modifying the code.

These patches are also available at:
http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/tracing

v2:
[PATCH 1/7] trace: Add trace-events file for declaring trace events
 * Use $source_path/tracetool in ./configure
 * Include qemu-common.h in trace.h so common types are available

[PATCH 2/7] trace: Support disabled events in trace-events
 * New in v2: makes it easy to build only a subset of trace events

[PATCH 3/7] trace: Add simple built-in tracing backend
 * Make simpletrace.py parse trace-events instead of generating Python

[PATCH 4/7] trace: Add LTTng Userspace Tracer backend

[PATCH 5/7] trace: Trace qemu_malloc() and qemu_vmalloc()
 * Record pointer result from allocation functions

[PATCH 6/7] trace: Trace virtio-blk, multiwrite, and paio_submit

[PATCH 7/7] trace: Trace virtqueue operations
 * New in v2: observe virtqueue buffer add/remove and notifies



[PATCH 2/7] trace: Support disabled events in trace-events

2010-05-25 Thread Stefan Hajnoczi
Sometimes it is useful to disable a trace event.  Removing the event
from trace-events is not enough since source code will call the
trace_*() function for the event.

This patch makes it easy to build without specific trace events by
marking them disabled in trace-events:

disable multiwrite_cb(void *mcb, int ret) "mcb %p ret %d"

This builds without the multiwrite_cb trace event.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
v2:
 * This patch is new in v2

 trace-events |4 +++-
 tracetool|   10 --
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/trace-events b/trace-events
index a37d3cc..5efaa86 100644
--- a/trace-events
+++ b/trace-events
@@ -12,10 +12,12 @@
 #
 # Format of a trace event:
 #
-# name(type1 arg1[, type2 arg2] ...) "format-string"
+# [disable] name(type1 arg1[, type2 arg2] ...) "format-string"
 #
 # Example: qemu_malloc(size_t size) "size %zu"
 #
+# The disable keyword will build without the trace event.
+#
 # The name must be a valid as a C function name.
 #
 # Types should be standard C types.  Use void * for pointers because the trace
diff --git a/tracetool b/tracetool
index 766a9ba..53d3612 100755
--- a/tracetool
+++ b/tracetool
@@ -110,7 +110,7 @@ linetoc_end_nop()
 # Process stdin by calling begin, line, and end functions for the backend
 convert()
 {
-local begin process_line end
+local begin process_line end str disable
 begin="lineto$1_begin_$backend"
 process_line="lineto$1_$backend"
 end="lineto$1_end_$backend"
@@ -123,8 +123,14 @@ convert()
 str=${str%%#*}
 test -z "$str" && continue
 
+# Process the line.  The nop backend handles disabled lines.
+disable=${str%%disable*}
 echo
-"$process_line" "$str"
+if test -z "$disable"; then
+"lineto$1_nop" "${str##disable}"
+else
+"$process_line" "$str"
+fi
 done
 
 echo
-- 
1.7.1
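
For reviewers less familiar with shell parameter expansion: ${str%%disable*}
is empty exactly when the line starts with "disable", and ${str##disable}
drops that leading keyword.  The same check can be sketched in C as follows.
This is only an illustration of the logic, not part of tracetool;
strip_disable() is a hypothetical name, and unlike the shell version it also
skips the spaces after the keyword.

```c
#include <string.h>

/* If line begins with the "disable" keyword, return a pointer to the rest
   of the declaration (leading spaces skipped); otherwise return NULL. */
const char *strip_disable(const char *line)
{
    static const char keyword[] = "disable";
    size_t len = sizeof(keyword) - 1;

    if (strncmp(line, keyword, len) != 0) {
        return NULL;            /* enabled event: use the selected backend */
    }
    line += len;
    while (*line == ' ') {
        line++;
    }
    return line;                /* disabled event: hand this to the nop backend */
}
```

One thing this makes visible: a pure prefix test would also match an event
whose name merely begins with "disable", so such names would presumably need
to be avoided or the keyword matched together with its trailing space.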



[PATCH 1/7] trace: Add trace-events file for declaring trace events

2010-05-25 Thread Stefan Hajnoczi
This patch introduces the trace-events file where trace events can be
declared like so:

qemu_malloc(size_t size) "size %zu"
qemu_free(void *ptr) "ptr %p"

These trace event declarations are processed by a new tool called
tracetool to generate code for the trace events.  Trace event
declarations are independent of the backend tracing system (LTTng User
Space Tracing, ftrace markers, DTrace).

The default nop backend generates empty trace event functions.
Therefore trace events are disabled by default.
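
As a rough illustration of what "empty trace event functions" means for the
two declarations above, the nop backend's output could look like the sketch
below.  The exact generated text is an assumption here, and demo_malloc() and
demo_free() are hypothetical call sites rather than QEMU functions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed shape of nop-backend output for the two example events:
   empty inline functions that the compiler can drop entirely. */
static inline void trace_qemu_malloc(size_t size)
{
    (void)size;
}

static inline void trace_qemu_free(void *ptr)
{
    (void)ptr;
}

/* Hypothetical call sites: the trace_*() calls stay in the source no
   matter which backend is selected at configure time. */
void *demo_malloc(size_t size)
{
    trace_qemu_malloc(size);
    return malloc(size);
}

void demo_free(void *ptr)
{
    trace_qemu_free(ptr);
    free(ptr);
}
```

With a real backend selected, only the bodies of the trace_*() functions would
change; the call sites stay the same.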

The trace-events file serves two purposes:

1. Adding trace events is easy.  It is not necessary to understand the
   details of a backend tracing system.  The trace-events file is a
   single location where trace events can be declared without code
   duplication.

2. QEMU is not tightly coupled to one particular backend tracing system.
   In order to support tracing across QEMU host platforms and to
   anticipate new backend tracing systems that are currently maturing,
   it is important to be flexible and not tied to one system.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
v2:
 * Use $source_path/tracetool in ./configure
 * Include qemu-common.h in trace.h so common types are available

 .gitignore  |2 +
 Makefile|   17 -
 Makefile.objs   |5 ++
 Makefile.target |1 +
 configure   |   19 ++
 trace-events|   24 
 tracetool   |  165 +++
 7 files changed, 229 insertions(+), 4 deletions(-)
 create mode 100644 trace-events
 create mode 100755 tracetool

diff --git a/.gitignore b/.gitignore
index fdfe2f0..4644557 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,6 +2,8 @@ config-devices.*
 config-all-devices.*
 config-host.*
 config-target.*
+trace.h
+trace.c
 *-softmmu
 *-darwin-user
 *-linux-user
diff --git a/Makefile b/Makefile
index 7986bf6..a9f79a9 100644
--- a/Makefile
+++ b/Makefile
@@ -1,6 +1,6 @@
 # Makefile for QEMU.
 
-GENERATED_HEADERS = config-host.h
+GENERATED_HEADERS = config-host.h trace.h
 
 ifneq ($(wildcard config-host.mak),)
 # Put the all: rule here so that config-host.mak can contain dependencies.
@@ -130,16 +130,24 @@ bt-host.o: QEMU_CFLAGS += $(BLUEZ_CFLAGS)
 
 iov.o: iov.c iov.h
 
+trace.h: trace-events
+	$(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -h < $< > $@,"  GEN   $@")
+
+trace.c: trace-events
+	$(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -c < $< > $@,"  GEN   $@")
+
+trace.o: trace.c
+
 ##
 
 qemu-img.o: qemu-img-cmds.h
 qemu-img.o qemu-tool.o qemu-nbd.o qemu-io.o: $(GENERATED_HEADERS)
 
-qemu-img$(EXESUF): qemu-img.o qemu-tool.o qemu-error.o $(block-obj-y) $(qobject-obj-y)
+qemu-img$(EXESUF): qemu-img.o qemu-tool.o qemu-error.o $(trace-obj-y) $(block-obj-y) $(qobject-obj-y)
 
-qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o $(block-obj-y) $(qobject-obj-y)
+qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o $(trace-obj-y) $(block-obj-y) $(qobject-obj-y)
 
-qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(block-obj-y) $(qobject-obj-y)
+qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(trace-obj-y) $(block-obj-y) $(qobject-obj-y)
 
 qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
 	$(call quiet-command,sh $(SRC_PATH)/hxtool -h < $< > $@,"  GEN   $@")
@@ -157,6 +165,7 @@ clean:
rm -f *.o *.d *.a $(TOOLS) TAGS cscope.* *.pod *~ */*~
rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d 
net/*.o net/*.d
rm -f qemu-img-cmds.h
+   rm -f trace.c trace.h
$(MAKE) -C tests clean
for d in $(ALL_SUBDIRS) libhw32 libhw64 libuser libdis libdis-user; do \
if test -d $$d; then $(MAKE) -C $$d $@ || exit 1; fi; \
diff --git a/Makefile.objs b/Makefile.objs
index 1a942e5..20e709e 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -251,6 +251,11 @@ libdis-$(CONFIG_S390_DIS) += s390-dis.o
 libdis-$(CONFIG_SH4_DIS) += sh4-dis.o
 libdis-$(CONFIG_SPARC_DIS) += sparc-dis.o
 
+##
+# trace
+
+trace-obj-y = trace.o
+
 vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS)
 
 vl.o: QEMU_CFLAGS+=$(SDL_CFLAGS)
diff --git a/Makefile.target b/Makefile.target
index fda5bf3..8f7b564 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -293,6 +293,7 @@ $(obj-y) $(obj-$(TARGET_BASE_ARCH)-y): $(GENERATED_HEADERS)
 
 obj-y += $(addprefix ../, $(common-obj-y))
 obj-y += $(addprefix ../libdis/, $(libdis-y))
+obj-y += $(addprefix ../, $(trace-obj-y))
 obj-y += $(libobj-y)
 obj-y += $(addprefix $(HWDIR)/, $(hw-obj-y))
 
diff --git a/configure b/configure
index 3cd2c5f..e94e113 100755
--- a/configure
+++ b/configure
@@ -299,6 +299,7 @@ pkgversion=
 check_utests=no
 user_pie=no
 zero_malloc=
+trace_backend=nop
 
 # OS specific
 if check_define __linux__ ; then
@@ -494,6 +495,8 @@ for opt do
   ;;
   --target-list=*) target_list=$optarg
   ;;
+  

[PATCH 4/7] trace: Add LTTng Userspace Tracer backend

2010-05-25 Thread Stefan Hajnoczi
This patch adds LTTng Userspace Tracer (UST) backend support.  The UST
system requires no kernel support but libust and liburcu must be
installed.

$ ./configure --trace-backend=ust
$ make

Start the UST daemon:
$ ustd &

List available tracepoints and enable some:
$ ustctl --list-markers $(pgrep qemu)
[...]
{PID: 5458, channel/marker: ust/paio_submit, state: 0, fmt: acb %p
opaque %p sector_num %lu nb_sectors %lu type %lu 0x4b32ba}
$ ustctl --enable-marker ust/paio_submit $(pgrep qemu)

Run the trace:
$ ustctl --create-trace $(pgrep qemu)
$ ustctl --start-trace $(pgrep qemu)
[...]
$ ustctl --stop-trace $(pgrep qemu)
$ ustctl --destroy-trace $(pgrep qemu)

Trace results can be viewed using lttv-gui.

More information about UST:
http://lttng.org/ust
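
For readers who want to see what the --ust backend produces per event, here is a rough Python sketch of the header output that linetoh_ust() in the diff below generates. The helper name ust_header_lines and the regular expression are assumptions made for illustration; the real generator is the tracetool shell script, and this sketch ignores events declared with a bare "void" argument list.

```python
import re

# Hypothetical stand-in for tracetool's get_name/get_args/get_argnames helpers.
EVENT_RE = re.compile(r'([a-zA-Z0-9_]+)\(([^)]*)\)\s+"([^"]*)"')

def ust_header_lines(event_line):
    """Return the two header lines emitted for one trace-events line."""
    m = EVENT_RE.match(event_line.strip())
    if m is None:
        return []
    name, args, _fmt = m.groups()
    # Argument names are the last token of each argument, minus any '*'.
    argnames = ", ".join(a.split()[-1].lstrip("*") for a in args.split(","))
    return [
        "DECLARE_TRACE(ust_%s, TPPROTO(%s), TPARGS(%s));" % (name, args, argnames),
        "#define trace_%s trace_ust_%s" % (name, name),
    ]

for out in ust_header_lines('virtio_irq(void *vq) "vq %p"'):
    print(out)
```

The #define is what lets callers keep writing trace_virtio_irq(vq) regardless of which backend was configured.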

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 configure |5 +++-
 tracetool |   77 +++-
 2 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index 7d2c69b..675d0fc 100755
--- a/configure
+++ b/configure
@@ -829,7 +829,7 @@ echo   --enable-docsenable documentation build
 echo   --disable-docs   disable documentation build
 echo   --disable-vhost-net  disable vhost-net acceleration support
 echo   --enable-vhost-net   enable vhost-net acceleration support
-echo   --trace-backend=BTrace backend nop simple
+echo   --trace-backend=BTrace backend nop simple ust
 echo 
 echo NOTE: The object files are built at the place where configure is 
launched
 exit 1
@@ -2302,6 +2302,9 @@ bsd)
 esac
 
 echo "TRACE_BACKEND=$trace_backend" >> $config_host_mak
+if test "$trace_backend" = "ust"; then
+  LIBS="-lust $LIBS"
+fi
 
 tools=
 if test `expr $target_list : .*softmmu.*` != 0 ; then
diff --git a/tracetool b/tracetool
index f094ddc..9ea9c08 100755
--- a/tracetool
+++ b/tracetool
@@ -3,12 +3,13 @@
 usage()
 {
 cat >&2 <<EOF
-usage: $0 [--nop | --simple] [-h | -c]
+usage: $0 [--nop | --simple | --ust] [-h | -c]
 Generate tracing code for a file on stdin.
 
 Backends:
   --nop Tracing disabled
   --simple  Simple built-in backend
+  --ust LTTng User Space Tracing backend
 
 Output formats:
   -hGenerate .h file
@@ -179,6 +180,78 @@ linetoc_end_simple()
 return
 }
 
+linetoh_begin_ust()
+{
+    echo "#include <ust/tracepoint.h>"
+}
+
+linetoh_ust()
+{
+    local name args argnames
+    name=$(get_name "$1")
+    args=$(get_args "$1")
+    argnames=$(get_argnames "$1")
+
+    cat <<EOF
+DECLARE_TRACE(ust_$name, TPPROTO($args), TPARGS($argnames));
+#define trace_$name trace_ust_$name
+EOF
+}
+
+linetoh_end_ust()
+{
+    # Clean up after UST headers which pollute the namespace
+    cat <<EOF
+#undef mutex_lock
+#undef mutex_unlock
+EOF
+}
+
+linetoc_begin_ust()
+{
+    cat <<EOF
+#include <ust/marker.h>
+#include "trace.h"
+EOF
+}
+
+linetoc_ust()
+{
+    local name args argnames fmt
+    name=$(get_name "$1")
+    args=$(get_args "$1")
+    argnames=$(get_argnames "$1")
+    fmt=$(get_fmt "$1")
+
+    cat <<EOF
+DEFINE_TRACE(ust_$name);
+
+static void ust_${name}_probe($args)
+{
+    trace_mark(ust, $name, "$fmt", $argnames);
+}
+EOF
+
+    # Collect names for later
+    names="$names $name"
+}
+
+linetoc_end_ust()
+{
+    cat <<EOF
+static void __attribute__((constructor)) trace_init(void)
+{
+EOF
+
+    for name in $names; do
+        cat <<EOF
+    register_trace_ust_$name(ust_${name}_probe);
+EOF
+    done
+
+    echo "}"
+}
+
 # Process stdin by calling begin, line, and end functions for the backend
 convert()
 {
@@ -228,7 +301,7 @@ tracetoc()
 
 # Choose backend
 case $1 in
---nop | --simple) backend=${1#--} ;;
+--nop | --simple | --ust) backend=${1#--} ;;
 *) usage ;;
 esac
 shift
-- 
1.7.1



[PATCH 7/7] trace: Trace virtqueue operations

2010-05-25 Thread Stefan Hajnoczi
This patch adds trace events for virtqueue operations including
adding/removing buffers, notifying the guest, and receiving a notify
from the guest.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
v2:
 * This patch is new in v2

 hw/virtio.c  |8 
 trace-events |8 
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index 4475bb3..a5741ae 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -13,6 +13,7 @@
 
 #include <inttypes.h>
 
+#include "trace.h"
 #include "virtio.h"
 #include "sysemu.h"
 
@@ -205,6 +206,8 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
 unsigned int offset;
 int i;
 
+trace_virtqueue_fill(vq, elem, len, idx);
+
 offset = 0;
 for (i = 0; i < elem->in_num; i++) {
 size_t size = MIN(len - offset, elem->in_sg[i].iov_len);
@@ -232,6 +235,7 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 {
 /* Make sure buffer is written before we update index. */
 wmb();
+trace_virtqueue_flush(vq, count);
 vring_used_idx_increment(vq, count);
 vq->inuse -= count;
 }
@@ -422,6 +426,7 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
 
 vq->inuse++;
 
+trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
 return elem->in_num + elem->out_num;
 }
 
@@ -560,6 +565,7 @@ int virtio_queue_get_num(VirtIODevice *vdev, int n)
 void virtio_queue_notify(VirtIODevice *vdev, int n)
 {
 if (n < VIRTIO_PCI_QUEUE_MAX && vdev->vq[n].vring.desc) {
+trace_virtio_queue_notify(vdev, n, &vdev->vq[n]);
 vdev->vq[n].handle_output(vdev, &vdev->vq[n]);
 }
 }
@@ -597,6 +603,7 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
 
 void virtio_irq(VirtQueue *vq)
 {
+trace_virtio_irq(vq);
 vq->vdev->isr |= 0x01;
 virtio_notify_vector(vq->vdev, vq->vector);
 }
@@ -609,6 +616,7 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
  (vq->inuse || vring_avail_idx(vq) != vq->last_avail_idx)))
 return;
 
+trace_virtio_notify(vdev, vq);
 vdev->isr |= 0x01;
 virtio_notify_vector(vdev, vq->vector);
 }
diff --git a/trace-events b/trace-events
index 48415f8..a533414 100644
--- a/trace-events
+++ b/trace-events
@@ -35,6 +35,14 @@ qemu_memalign(size_t alignment, size_t size, void *ptr) 
alignment %zu size %zu
 qemu_valloc(size_t size, void *ptr) "size %zu ptr %p"
 qemu_vfree(void *ptr) "ptr %p"
 
+# hw/virtio.c
+virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u"
+virtqueue_flush(void *vq, unsigned int count) "vq %p count %u"
+virtqueue_pop(void *vq, void *elem, unsigned int in_num, unsigned int out_num) "vq %p elem %p in_num %u out_num %u"
+virtio_queue_notify(void *vdev, int n, void *vq) "vdev %p n %d vq %p"
+virtio_irq(void *vq) "vq %p"
+virtio_notify(void *vdev, void *vq) "vdev %p vq %p"
+
 # block.c
 multiwrite_cb(void *mcb, int ret) "mcb %p ret %d"
 bdrv_aio_multiwrite(void *mcb, int num_callbacks, int num_reqs) "mcb %p num_callbacks %d num_reqs %d"
-- 
1.7.1



[PATCH 6/7] trace: Trace virtio-blk, multiwrite, and paio_submit

2010-05-25 Thread Stefan Hajnoczi
This patch adds trace events that make it possible to observe
virtio-blk.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 block.c|7 +++
 hw/virtio-blk.c|7 +++
 posix-aio-compat.c |2 ++
 trace-events   |   14 ++
 4 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 0b0966c..56db112 100644
--- a/block.c
+++ b/block.c
@@ -23,6 +23,7 @@
  */
 #include "config-host.h"
 #include "qemu-common.h"
+#include "trace.h"
 #include "monitor.h"
 #include "block_int.h"
 #include "module.h"
@@ -1922,6 +1923,8 @@ static void multiwrite_cb(void *opaque, int ret)
 {
 MultiwriteCB *mcb = opaque;
 
+trace_multiwrite_cb(mcb, ret);
+
 if (ret < 0 && !mcb->error) {
 mcb->error = ret;
 multiwrite_user_cb(mcb);
@@ -2065,6 +2068,8 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs)
 // Check for mergable requests
 num_reqs = multiwrite_merge(bs, reqs, num_reqs, mcb);
 
+trace_bdrv_aio_multiwrite(mcb, mcb->num_callbacks, num_reqs);
+
 // Run the aio requests
 for (i = 0; i < num_reqs; i++) {
 acb = bdrv_aio_writev(bs, reqs[i].sector, reqs[i].qiov,
@@ -2075,9 +2080,11 @@ int bdrv_aio_multiwrite(BlockDriverState *bs, BlockRequest *reqs, int num_reqs)
 // submitted yet. Otherwise we'll wait for the submitted AIOs to
 // complete and report the error in the callback.
 if (mcb->num_requests == 0) {
+trace_bdrv_aio_multiwrite_earlyfail(mcb);
 reqs[i].error = -EIO;
 goto fail;
 } else {
+trace_bdrv_aio_multiwrite_latefail(mcb, i);
 mcb->num_requests++;
 multiwrite_cb(mcb, -EIO);
 break;
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 5d7f1a2..706f109 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -13,6 +13,7 @@
 
 #include "qemu-common.h"
 #include "sysemu.h"
+#include "trace.h"
 #include "virtio-blk.h"
 #include "block_int.h"
 #ifdef __linux__
@@ -50,6 +51,8 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
 {
 VirtIOBlock *s = req->dev;
 
+trace_virtio_blk_req_complete(req, status);
+
 req->in->status = status;
 virtqueue_push(s->vq, &req->elem, req->qiov.size + sizeof(*req->in));
 virtio_notify(&s->vdev, s->vq);
@@ -87,6 +90,8 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
 {
 VirtIOBlockReq *req = opaque;
 
+trace_virtio_blk_rw_complete(req, ret);
+
 if (ret) {
 int is_read = !(req->out->type & VIRTIO_BLK_T_OUT);
 if (virtio_blk_handle_rw_error(req, -ret, is_read))
@@ -263,6 +268,8 @@ static void virtio_blk_handle_flush(BlockRequest *blkreq, int *num_writes,
 static void virtio_blk_handle_write(BlockRequest *blkreq, int *num_writes,
 VirtIOBlockReq *req, BlockDriverState **old_bs)
 {
+trace_virtio_blk_handle_write(req, req->out->sector, req->qiov.size / 512);
+
 if (req->out->sector & req->dev->sector_mask) {
 virtio_blk_rw_complete(req, -EIO);
 return;
diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index b43c531..c2200fe 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -25,6 +25,7 @@
 #include "qemu-queue.h"
 #include "osdep.h"
 #include "qemu-common.h"
+#include "trace.h"
 #include "block_int.h"
 
 #include "block/raw-posix-aio.h"
@@ -583,6 +584,7 @@ BlockDriverAIOCB *paio_submit(BlockDriverState *bs, int fd,
 acb->next = posix_aio_state->first_aio;
 posix_aio_state->first_aio = acb;
 
+trace_paio_submit(acb, opaque, sector_num, nb_sectors, type);
 qemu_paio_submit(acb);
 return &acb->common;
 }
diff --git a/trace-events b/trace-events
index 3fde0c6..48415f8 100644
--- a/trace-events
+++ b/trace-events
@@ -34,3 +34,17 @@ qemu_free(void *ptr) "ptr %p"
 qemu_memalign(size_t alignment, size_t size, void *ptr) "alignment %zu size %zu ptr %p"
 qemu_valloc(size_t size, void *ptr) "size %zu ptr %p"
 qemu_vfree(void *ptr) "ptr %p"
+
+# block.c
+multiwrite_cb(void *mcb, int ret) "mcb %p ret %d"
+bdrv_aio_multiwrite(void *mcb, int num_callbacks, int num_reqs) "mcb %p num_callbacks %d num_reqs %d"
+bdrv_aio_multiwrite_earlyfail(void *mcb) "mcb %p"
+bdrv_aio_multiwrite_latefail(void *mcb, int i) "mcb %p i %d"
+
+# hw/virtio-blk.c
+virtio_blk_req_complete(void *req, int status) "req %p status %d"
+virtio_blk_rw_complete(void *req, int ret) "req %p ret %d"
+virtio_blk_handle_write(void *req, unsigned long sector, unsigned long nsectors) "req %p sector %lu nsectors %lu"
+
+# posix-aio-compat.c
+paio_submit(void *acb, void *opaque, unsigned long sector_num, unsigned long nb_sectors, unsigned long type) "acb %p opaque %p sector_num %lu nb_sectors %lu type %lu"
-- 
1.7.1



[PATCH 5/7] trace: Trace qemu_malloc() and qemu_vmalloc()

2010-05-25 Thread Stefan Hajnoczi
It is often useful to instrument memory management functions in order to
find leaks or performance problems.  This patch adds trace events for
the memory allocation primitives.
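
As an illustration of the leak-hunting use case, here is a minimal Python sketch that pairs allocation and free events from a decoded trace. It assumes the records have already been decoded into tuples of (event name, arguments...) in the argument order given in trace-events; the function name find_leaks and the sample data are hypothetical and not part of this series.

```python
def find_leaks(records):
    """Return {ptr: size} for allocations that were never freed."""
    live = {}
    for rec in records:
        name = rec[0]
        if name == "qemu_malloc":        # (size, ptr)
            _, size, ptr = rec
            live[ptr] = size
        elif name == "qemu_realloc":     # (ptr, size, newptr)
            _, old_ptr, size, new_ptr = rec
            live.pop(old_ptr, None)
            live[new_ptr] = size
        elif name == "qemu_free":        # (ptr)
            live.pop(rec[1], None)
    return live

# Made-up sample trace, for illustration only.
records = [
    ("qemu_malloc", 64, 0x1000),
    ("qemu_malloc", 128, 0x2000),
    ("qemu_realloc", 0x1000, 256, 0x3000),
    ("qemu_free", 0x2000),
]
for ptr, size in find_leaks(records).items():
    print("unfreed: ptr %#x size %d" % (ptr, size))
```

Recording the returned pointer in the allocation events (the v2 change noted below) is what makes this pairing possible.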

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
v2:
 * Record pointer result from allocation functions

 osdep.c   |   24 ++--
 qemu-malloc.c |   12 ++--
 trace-events  |   10 ++
 3 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/osdep.c b/osdep.c
index abbc8a2..a6b7726 100644
--- a/osdep.c
+++ b/osdep.c
@@ -50,6 +50,7 @@
 #endif
 
 #include "qemu-common.h"
+#include "trace.h"
 #include "sysemu.h"
 #include "qemu_socket.h"
 
@@ -71,25 +72,34 @@ static void *oom_check(void *ptr)
 #if defined(_WIN32)
 void *qemu_memalign(size_t alignment, size_t size)
 {
+void *ptr;
+
 if (!size) {
 abort();
 }
-return oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE));
+ptr = oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE));
+trace_qemu_memalign(alignment, size, ptr);
+return ptr;
 }
 
 void *qemu_vmalloc(size_t size)
 {
+void *ptr;
+
 /* FIXME: this is not exactly optimal solution since VirtualAlloc
has 64Kb granularity, but at least it guarantees us that the
memory is page aligned. */
 if (!size) {
 abort();
 }
-return oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE));
+ptr = oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE));
+trace_qemu_vmalloc(size, ptr);
+return ptr;
 }
 
 void qemu_vfree(void *ptr)
 {
+trace_qemu_vfree(ptr);
 VirtualFree(ptr, 0, MEM_RELEASE);
 }
 
@@ -97,21 +107,22 @@ void qemu_vfree(void *ptr)
 
 void *qemu_memalign(size_t alignment, size_t size)
 {
+void *ptr;
 #if defined(_POSIX_C_SOURCE) && !defined(__sun__)
 int ret;
-void *ptr;
 ret = posix_memalign(&ptr, alignment, size);
 if (ret != 0) {
 fprintf(stderr, "Failed to allocate %zu B: %s\n",
 size, strerror(ret));
 abort();
 }
-return ptr;
 #elif defined(CONFIG_BSD)
-return oom_check(valloc(size));
+ptr = oom_check(valloc(size));
 #else
-return oom_check(memalign(alignment, size));
+ptr = oom_check(memalign(alignment, size));
 #endif
+trace_qemu_memalign(alignment, size, ptr);
+return ptr;
 }
 
 /* alloc shared memory pages */
@@ -122,6 +133,7 @@ void *qemu_vmalloc(size_t size)
 
 void qemu_vfree(void *ptr)
 {
+trace_qemu_vfree(ptr);
 free(ptr);
 }
 
diff --git a/qemu-malloc.c b/qemu-malloc.c
index 6cdc5de..72de60a 100644
--- a/qemu-malloc.c
+++ b/qemu-malloc.c
@@ -22,6 +22,7 @@
  * THE SOFTWARE.
  */
 #include "qemu-common.h"
+#include "trace.h"
 #include <stdlib.h>
 
 static void *oom_check(void *ptr)
@@ -39,6 +40,7 @@ void *get_mmap_addr(unsigned long size)
 
 void qemu_free(void *ptr)
 {
+trace_qemu_free(ptr);
 free(ptr);
 }
 
@@ -53,18 +55,24 @@ static int allow_zero_malloc(void)
 
 void *qemu_malloc(size_t size)
 {
+void *ptr;
 if (!size && !allow_zero_malloc()) {
 abort();
 }
-return oom_check(malloc(size ? size : 1));
+ptr = oom_check(malloc(size ? size : 1));
+trace_qemu_malloc(size, ptr);
+return ptr;
 }
 
 void *qemu_realloc(void *ptr, size_t size)
 {
+void *newptr;
 if (!size && !allow_zero_malloc()) {
 abort();
 }
-return oom_check(realloc(ptr, size ? size : 1));
+newptr = oom_check(realloc(ptr, size ? size : 1));
+trace_qemu_realloc(ptr, size, newptr);
+return newptr;
 }
 
 void *qemu_mallocz(size_t size)
diff --git a/trace-events b/trace-events
index 5efaa86..3fde0c6 100644
--- a/trace-events
+++ b/trace-events
@@ -24,3 +24,13 @@
 # system may not have the necessary headers included.
 #
 # The format-string should be a sprintf()-compatible format string.
+
+# qemu-malloc.c
+qemu_malloc(size_t size, void *ptr) "size %zu ptr %p"
+qemu_realloc(void *ptr, size_t size, void *newptr) "ptr %p size %zu newptr %p"
+qemu_free(void *ptr) "ptr %p"
+
+# osdep.c
+qemu_memalign(size_t alignment, size_t size, void *ptr) "alignment %zu size %zu ptr %p"
+qemu_valloc(size_t size, void *ptr) "size %zu ptr %p"
+qemu_vfree(void *ptr) "ptr %p"
-- 
1.7.1



[PATCH 3/7] trace: Add simple built-in tracing backend

2010-05-25 Thread Stefan Hajnoczi
This patch adds a simple tracer which produces binary trace files and is
built into QEMU.  The main purpose of this patch is to show how new
tracing backends can be added to tracetool.

To try out the simple backend:

./configure --trace-backend=simple
make

After running QEMU you can pretty-print the trace:

./simpletrace.py trace-events /tmp/trace.log
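
For orientation, here is a small Python sketch of the on-disk record layout, assuming each record is six native unsigned longs (the event number followed by x1..x5), as in the TraceRecord struct in simpletrace.c below. The names RECORD_FMT and iter_records are illustrative only, and the sample bytes are made up rather than taken from a real trace.

```python
import struct

# Assumed layout: event, x1, x2, x3, x4, x5 as native unsigned longs.
RECORD_FMT = "LLLLLL"
RECORD_LEN = struct.calcsize(RECORD_FMT)

def iter_records(data):
    """Yield one tuple per complete record found in a bytes buffer."""
    for off in range(0, len(data) - RECORD_LEN + 1, RECORD_LEN):
        yield struct.unpack_from(RECORD_FMT, data, off)

# Two fabricated records standing in for the contents of /tmp/trace.log.
sample = (struct.pack(RECORD_FMT, 3, 0x1000, 64, 0, 0, 0) +
          struct.pack(RECORD_FMT, 5, 0x1000, 0, 0, 0, 0))
for rec in iter_records(sample):
    print(rec)
```

Note that simpletrace.c only writes to the file when its 64 KiB in-memory buffer fills up, so a short QEMU run may leave /tmp/trace.log empty or missing.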

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
I intend for this tracing backend to be replaced by something based on Prerna's
work.  For now it is useful for basic tracing.

v2:
 * Make simpletrace.py parse trace-events instead of generating Python

 .gitignore |1 +
 Makefile.objs  |3 ++
 configure  |2 +-
 simpletrace.c  |   64 ++
 simpletrace.py |   53 ++
 tracetool  |   78 +--
 6 files changed, 197 insertions(+), 4 deletions(-)
 create mode 100644 simpletrace.c
 create mode 100755 simpletrace.py

diff --git a/.gitignore b/.gitignore
index 4644557..5128452 100644
--- a/.gitignore
+++ b/.gitignore
@@ -39,6 +39,7 @@ qemu-monitor.texi
 *.log
 *.pdf
 *.pg
+*.pyc
 *.toc
 *.tp
 *.vr
diff --git a/Makefile.objs b/Makefile.objs
index 20e709e..7cb40ac 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -255,6 +255,9 @@ libdis-$(CONFIG_SPARC_DIS) += sparc-dis.o
 # trace
 
 trace-obj-y = trace.o
+ifeq ($(TRACE_BACKEND),simple)
+trace-obj-y += simpletrace.o
+endif
 
 vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS)
 
diff --git a/configure b/configure
index e94e113..7d2c69b 100755
--- a/configure
+++ b/configure
@@ -829,7 +829,7 @@ echo   --enable-docsenable documentation build
 echo   --disable-docs   disable documentation build
 echo   --disable-vhost-net  disable vhost-net acceleration support
 echo   --enable-vhost-net   enable vhost-net acceleration support
-echo   --trace-backend=BTrace backend nop
+echo   --trace-backend=BTrace backend nop simple
 echo 
 echo NOTE: The object files are built at the place where configure is 
launched
 exit 1
diff --git a/simpletrace.c b/simpletrace.c
new file mode 100644
index 000..2fec4d3
--- /dev/null
+++ b/simpletrace.c
@@ -0,0 +1,64 @@
+#include <stdlib.h>
+#include <stdio.h>
+#include "trace.h"
+
+typedef struct {
+    unsigned long event;
+    unsigned long x1;
+    unsigned long x2;
+    unsigned long x3;
+    unsigned long x4;
+    unsigned long x5;
+} TraceRecord;
+
+enum {
+    TRACE_BUF_LEN = 64 * 1024 / sizeof(TraceRecord),
+};
+
+static TraceRecord trace_buf[TRACE_BUF_LEN];
+static unsigned int trace_idx;
+static FILE *trace_fp;
+
+static void trace(TraceEvent event, unsigned long x1,
+                  unsigned long x2, unsigned long x3,
+                  unsigned long x4, unsigned long x5) {
+    TraceRecord *rec = &trace_buf[trace_idx];
+    rec->event = event;
+    rec->x1 = x1;
+    rec->x2 = x2;
+    rec->x3 = x3;
+    rec->x4 = x4;
+    rec->x5 = x5;
+
+    if (++trace_idx == TRACE_BUF_LEN) {
+        trace_idx = 0;
+
+        if (!trace_fp) {
+            trace_fp = fopen("/tmp/trace.log", "w");
+        }
+        if (trace_fp) {
+            size_t result = fwrite(trace_buf, sizeof trace_buf, 1, trace_fp);
+            result = result;
+        }
+    }
+}
+
+void trace1(TraceEvent event, unsigned long x1) {
+    trace(event, x1, 0, 0, 0, 0);
+}
+
+void trace2(TraceEvent event, unsigned long x1, unsigned long x2) {
+    trace(event, x1, x2, 0, 0, 0);
+}
+
+void trace3(TraceEvent event, unsigned long x1, unsigned long x2, unsigned long x3) {
+    trace(event, x1, x2, x3, 0, 0);
+}
+
+void trace4(TraceEvent event, unsigned long x1, unsigned long x2, unsigned long x3, unsigned long x4) {
+    trace(event, x1, x2, x3, x4, 0);
+}
+
+void trace5(TraceEvent event, unsigned long x1, unsigned long x2, unsigned long x3, unsigned long x4, unsigned long x5) {
+    trace(event, x1, x2, x3, x4, x5);
+}
diff --git a/simpletrace.py b/simpletrace.py
new file mode 100755
index 000..d6631ba
--- /dev/null
+++ b/simpletrace.py
@@ -0,0 +1,53 @@
+#!/usr/bin/env python
+import sys
+import struct
+import re
+
+trace_fmt = 'LLLLLL'
+trace_len = struct.calcsize(trace_fmt)
+event_re  = re.compile(r'(disable\s+)?([a-zA-Z0-9_]+)\(([^)]*)\)\s+"([^"]*)"')
+
+def parse_events(fobj):
+    def get_argnames(args):
+        return tuple(arg.split()[-1].lstrip('*') for arg in args.split(','))
+
+    events = {}
+    event_num = 0
+    for line in fobj:
+        m = event_re.match(line.strip())
+        if m is None:
+            continue
+
+        disable, name, args, fmt = m.groups()
+        if disable:
+            continue
+
+        events[event_num] = (name,) + get_argnames(args)
+        event_num += 1
+    return events
+
+def read_record(fobj):
+    s = fobj.read(trace_len)
+    if len(s) != trace_len:
+        return None
+    return struct.unpack(trace_fmt, s)
+
+def format_record(events, rec):
+    event = events[rec[0]]
+fields = 

Re: [PATCH] VMX: Fix and improve guest state validity checks

2010-05-25 Thread Mohammed Gamal
On Tue, May 25, 2010 at 12:37 PM, Avi Kivity a...@redhat.com wrote:
 On 05/13/2010 11:15 PM, Mohammed Gamal wrote:

 On Thu, May 13, 2010 at 9:24 AM, Avi Kivitya...@redhat.com  wrote:


 On 05/11/2010 07:52 PM, Mohammed Gamal wrote:


 - Add 's' and 'g' field checks on segment registers
 - Correct SS checks for request and descriptor privilege levels

 Signed-off-by: Mohammed Gamalm.gamal...@gmail.com
 ---
  arch/x86/kvm/vmx.c |   73
 +++
  1 files changed, 67 insertions(+), 6 deletions(-)

 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 777e00d..9805c2a 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -2121,16 +2121,30 @@ static bool stack_segment_valid(struct kvm_vcpu
 *vcpu)
         vmx_get_segment(vcpu, &ss, VCPU_SREG_SS);
         ss_rpl = ss.selector & SELECTOR_RPL_MASK;

 -       if (ss.unusable)
 +       if (ss.dpl != ss_rpl) /* DPL != RPL */
 +               return false;
 +
 +       if (ss.unusable) /* Short-circuit */
                return true;



 If ss.unusable, do the dpl and rpl have any meaning?


 The idea is that dpl and rpl are checked on vmentry regardless of
 whether ss is usable or not. While the other checks are performed only
 if ss is usable.


 Any reference to back this up?  I think rpl is valid regardless of
 ss.unusable (i.e. loading selector 0003 results in an unusable segment with
 rpl=3), but I don't see how dpl can be valid in an unusable segment.

Intel 64 and IA-32 Architectures Software Developer’s Manual Volume
3B, System Programming Guide, Part 2, Chapter 22, Section 22.3.1.2:
Checks on Guest Segment Registers.
Note that the DS, ES, FS, and GS checks apply only when the segment is
usable, whereas some of the SS checks apply whether or not the segment
is usable.
 --
 error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Kevin Wolf
On 23.05.2010 14:01, Avi Kivity wrote:
 On 05/21/2010 12:29 AM, Anthony Liguori wrote:

 I'd be more interested in enabling people to build these types of 
 storage systems without touching qemu.

 Both sheepdog and ceph ultimately transmit I/O over a socket to a 
 central daemon, right? 
 
 That incurs an extra copy.
 
 So could we not standardize a protocol for this that both sheepdog and 
 ceph could implement?
 
 The protocol already exists, nbd.  It doesn't support snapshotting etc. 
 but we could extend it.
 
 But IMO what's needed is a plugin API for the block layer.

What would it buy us, apart from more downstreams and having to maintain
a stable API and ABI? Hiding block drivers somewhere else doesn't make
them stop existing, they just might not be properly integrated, but
rather hacked in to fit that limited stable API.

Kevin


Re: [Qemu-devel] Re: irq problems after live migration with 0.12.4

2010-05-25 Thread Peter Lieven

Michael Tokarev wrote:

23.05.2010 13:55, Peter Lieven wrote:

Hi,

after live migrating ubuntu 9.10 server (2.6.31-14-server) and suse
linux 10.1 (2.6.16.13-4-smp), the guest sometimes runs into irq
problems. i mention these 2 guest OSes because that is where i have
seen the error; other guests are likely affected as well.


on the host i run 2.6.33.3 (kernel+mod) and qemu-kvm 0.12.4.

i started a vm with:
/usr/bin/qemu-kvm-0.12.4  -net 
tap,vlan=141,script=no,downscript=no,ifname=tap0 -net 
nic,vlan=141,model=e1000,macaddr=52:54:00:ff:00:72   -drive 
file=/dev/sdb,if=ide,boot=on,cache=none,aio=native  -m 1024 -cpu 
qemu64,model_id='Intel(R) Xeon(R) CPU   E5430  @ 2.66GHz'  
-monitor tcp:0:4001,server,nowait -vnc :1 -name 
'migration-test-9-10'  -boot order=dc,menu=on  -k de  -incoming 
tcp:172.21.55.22:5001  -pidfile /var/run/qemu/vm-155.pid  -mem-path 
/hugepages -mem-prealloc  -rtc base=utc,clock=host -usb -usbdevice 
tablet


for testing i have a clean ubuntu 9.10 server 64-bit install and
created a small script which fetches a dvd iso from a local server and
checks its md5sum in an endless loop.


the download performance is approx. 50MB/s on that vm.

to trigger the error i did several migrations of the vm over the last
few days. finally i ended up with the following oops in the guest:


[64442.298521] irq 10: nobody cared (try booting with the irqpoll option)
[64442.299175] Pid: 0, comm: swapper Not tainted 2.6.31-14-server #48-Ubuntu

[64442.299179] Call Trace:
[64442.299185]IRQ   [810b4b96] __report_bad_irq+0x26/0xa0
[64442.299227]  [810b4d9c] note_interrupt+0x18c/0x1d0
[64442.299232]  [810b5415] handle_fasteoi_irq+0xd5/0x100
[64442.299244]  [81014bdd] handle_irq+0x1d/0x30
[64442.299246]  [810140b7] do_IRQ+0x67/0xe0
[64442.299249]  [810129d3] ret_from_intr+0x0/0x11
[64442.299266]  [810b3234] ? handle_IRQ_event+0x24/0x160
[64442.299269]  [810b529f] ? handle_edge_irq+0xcf/0x170
[64442.299271]  [81014bdd] ? handle_irq+0x1d/0x30
[64442.299273]  [810140b7] ? do_IRQ+0x67/0xe0
[64442.299275]  [810129d3] ? ret_from_intr+0x0/0x11
[64442.299290]  [81526b14] ? _spin_unlock_irqrestore+0x14/0x20
[64442.299302]  [8133257c] ? scsi_dispatch_cmd+0x16c/0x2d0
[64442.299307]  [8133963a] ? scsi_request_fn+0x3aa/0x500
[64442.299322]  [8125fafc] ? __blk_run_queue+0x6c/0x150
[64442.299324]  [8125fcbb] ? blk_run_queue+0x2b/0x50
[64442.299327]  [8133899f] ? scsi_run_queue+0xcf/0x2a0
[64442.299336]  [81339a0d] ? scsi_next_command+0x3d/0x60
[64442.299338]  [8133a21b] ? scsi_end_request+0xab/0xb0
[64442.299340]  [8133a50e] ? scsi_io_completion+0x9e/0x4d0
[64442.299348]  [81036419] ? default_spin_lock_flags+0x9/0x10
[64442.299351]  [8133224d] ? scsi_finish_command+0xbd/0x130
[64442.299353]  [8133aa95] ? scsi_softirq_done+0x145/0x170
[64442.299356]  [81264e6d] ? blk_done_softirq+0x7d/0x90
[64442.299368]  [810651fd] ? __do_softirq+0xbd/0x200
[64442.299370]  [810131ac] ? call_softirq+0x1c/0x30
[64442.299372]  [81014b85] ? do_softirq+0x55/0x90
[64442.299374]  [81064f65] ? irq_exit+0x85/0x90
[64442.299376]  [810140c0] ? do_IRQ+0x70/0xe0
[64442.299379]  [810129d3] ? ret_from_intr+0x0/0x11
[64442.299380]EOI   [810356f6] ? native_safe_halt+0x6/0x10
[64442.299390]  [8101a20c] ? default_idle+0x4c/0xe0
[64442.299395]  [815298f5] ? atomic_notifier_call_chain+0x15/0x20
[64442.299398]  [81010e02] ? cpu_idle+0xb2/0x100
[64442.299406]  [815123c6] ? rest_init+0x66/0x70
[64442.299424]  [81838047] ? start_kernel+0x352/0x35b
[64442.299427]  [8183759a] ? x86_64_start_reservations+0x125/0x129
[64442.299429]  [81837698] ? x86_64_start_kernel+0xfa/0x109
[64442.299433] handlers:
[64442.299840] [ab80] (e1000_intr+0x0/0x190 [e1000])
[64442.300046] Disabling IRQ #10


See also LP bug #584131 (https://bugs.launchpad.net/bugs/584131)
and original Debian bug#580649 (http://bugs.debian.org/580649)

Not sure if they're related...

/mjt

michael, do you have any ideas what i could do to debug what's happening?
looking at launchpad and the debian bug tracker i found other bugs
with a possibly related problem, so this issue might be more widespread...

thanks
peter


Re: [PATCH] VMX: Fix and improve guest state validity checks

2010-05-25 Thread Avi Kivity

On 05/25/2010 01:36 PM, Mohammed Gamal wrote:



Any reference to back this up?  I think rpl is valid regardless of
ss.unusable (i.e. loading selector 0003 results in an unusable segment with
rpl=3), but I don't see how dpl can be valid in an unusable segment.

 

Intel 64 and IA-32 Architectures Software Developer’s Manual Volume
3B, System Programming Guide, Part 2, Chapter 22, Section 22.3.1.2:
Checks on Guest Segment Registers.
Note that the DS, ES, FS, and GS checks apply only when the segment is
usable, whereas some of the SS checks apply whether or not the segment
is usable.
   


Strange, but consistent with


  If the unusable bit is 1, the base address, the segment limit, and the
  remainder of the access rights are undefined after VM entry. The only
  exceptions are the following:
  — Bits 3:0 of the base address for SS are cleared to 0.
  — SS.DPL: always loaded from the SS access-rights field. This will be
   the current privilege level (CPL) after the VM entry completes.
  — SS.B: set to 1.
  — The base addresses for FS and GS: always loaded. On processors
   that support Intel 64 architecture, the values loaded for base
   addresses for FS and GS are also manifest in the FS.base and
   GS.base MSRs.
  — The base address for LDTR on processors that support Intel 64 archi-
   tecture: set to an undefined but canonical value.
  — Bits 63:32 of the base addresses for SS, DS, and ES on processors
   that support Intel 64 architecture: cleared to 0.


So you are right.

Seems to me we can simplify vmx_get_cpl() on this basis to look at ss.dpl.
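
The check ordering discussed above can be sketched as follows. This is a simplified Python model of the ordering in the proposed stack_segment_valid() change, not the kernel code: other_checks_ok is a hypothetical placeholder for the remaining usable-segment checks, and the mask value 0x03 is assumed from the two low selector bits that hold the RPL.

```python
SELECTOR_RPL_MASK = 0x03  # low two bits of a selector hold the RPL

def stack_segment_valid(selector, dpl, unusable, other_checks_ok=True):
    """Model of the check order: DPL/RPL first, then the unusable shortcut."""
    rpl = selector & SELECTOR_RPL_MASK
    if dpl != rpl:       # checked even when SS is unusable
        return False
    if unusable:         # remaining checks only apply to a usable SS
        return True
    return other_checks_ok

print(stack_segment_valid(0x0003, 3, unusable=True))   # DPL == RPL
print(stack_segment_valid(0x0003, 0, unusable=True))   # DPL != RPL
```

The second call is the case the old ordering would have let through, since it returned early for an unusable SS before comparing DPL and RPL.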

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] Re: [RFC PATCH] AMD IOMMU emulation

2010-05-25 Thread Eduard - Gabriel Munteanu
On Tue, May 25, 2010 at 10:39:22AM +0200, Joerg Roedel wrote:
 On Mon, May 24, 2010 at 08:10:16PM +, Blue Swirl wrote:
  On Mon, May 24, 2010 at 3:40 PM, Joerg Roedel j...@8bytes.org wrote:
   +
   +#define MMIO_SIZE               0x2028
  
   This size should be a power-of-two value. In this case probably 0x4000.
  
  Not really, the devices can reserve regions of any size. There were
  some implementation deficiencies in earlier versions of QEMU, where
  the whole page would be reserved anyway, but this limitation has been
  removed long time ago.
 
 The drivers for AMD IOMMU expect that to be 0x4000. At least the Linux
 driver maps the MMIO region with this size. So the emulation should
 reserve this amount of MMIO space too.
 
   Joerg

Yeah, I'll change that, since I already reserve 0x4000 bytes in SeaBIOS
for it (I did that to deal with the 16 KiB alignment requirement).
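
As a quick check of the arithmetic, rounding the register-space size 0x2028 up to the next power of two gives 0x4000, which is the 16 KiB figure mentioned above. The helper name round_up_pow2 is just for illustration.

```python
def round_up_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p <<= 1
    return p

MMIO_SIZE = 0x2028
print(hex(round_up_pow2(MMIO_SIZE)))  # prints 0x4000
```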


Eduard



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 02:02 PM, Kevin Wolf wrote:





So could we not standardize a protocol for this that both sheepdog and
ceph could implement?
   

The protocol already exists, nbd.  It doesn't support snapshotting etc.
but we could extend it.

But IMO what's needed is a plugin API for the block layer.
 

What would it buy us, apart from more downstreams and having to maintain
a stable API and ABI?


Currently if someone wants to add a new block format, they have to 
upstream it and wait for a new qemu to be released.  With a plugin API, 
they can add a new block format to an existing, supported qemu.



Hiding block drivers somewhere else doesn't make
them stop existing, they just might not be properly integrated, but
rather hacked in to fit that limited stable API.
   


They would hack it to fit the current API, and hack the API in qemu.git 
to fit their requirements for the next release.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 1/2] VMX: Properly return error to userspace on vmentry failure

2010-05-25 Thread Avi Kivity

On 05/24/2010 01:01 AM, Mohammed Gamal wrote:

The vmexit handler returns KVM_EXIT_UNKNOWN since there is no handler
for vmentry failures. This intercepts vmentry failures and returns
KVM_EXIT_FAIL_ENTRY to userspace instead.

Signed-off-by: Mohammed Gamalm.gamal...@gmail.com
---
  arch/x86/kvm/vmx.c |7 +++
  1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 99ae513..4edcffb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3665,6 +3665,13 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
        if (enable_ept && is_paging(vcpu))
                vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);

+       if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
+               vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
+               vcpu->run->fail_entry.hardware_entry_failure_reason
+                       = exit_reason & ~VMX_EXIT_REASONS_FAILED_VMENTRY;
+               return 0;
+       }
+
        if (unlikely(vmx->fail)) {
                vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
                vcpu->run->fail_entry.hardware_entry_failure_reason
   


How does the user distinguish between KVM_EXIT_FAIL_ENTRY due to an exit
reason with bit 31 set and vmlaunch/vmresume failure (vmx->fail set)?  We
need separate exit codes (with documentation in api.txt).


--
error compiling committee.c: too many arguments to function



Re: [PATCH 2/2] VMX: Add constant for invalid guest state exit reason

2010-05-25 Thread Avi Kivity

On 05/24/2010 01:01 AM, Mohammed Gamal wrote:

For the sake of completeness, this patch adds a symbolic
constant for VMX exit reason 0x21 (invalid guest state).
   


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 1/2] VMX: Properly return error to userspace on vmentry failure

2010-05-25 Thread Mohammed Gamal
On Tue, May 25, 2010 at 2:45 PM, Avi Kivity a...@redhat.com wrote:
 On 05/24/2010 01:01 AM, Mohammed Gamal wrote:

 The vmexit handler returns KVM_EXIT_UNKNOWN since there is no handler
 for vmentry failures. This intercepts vmentry failures and returns
 KVM_EXIT_FAIL_ENTRY to userspace instead.

 Signed-off-by: Mohammed Gamalm.gamal...@gmail.com
 ---
  arch/x86/kvm/vmx.c |    7 +++
  1 files changed, 7 insertions(+), 0 deletions(-)

 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 99ae513..4edcffb 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -3665,6 +3665,13 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
         if (enable_ept && is_paging(vcpu))
                 vcpu->arch.cr3 = vmcs_readl(GUEST_CR3);

 +       if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
 +               vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
 +               vcpu->run->fail_entry.hardware_entry_failure_reason
 +                       = exit_reason & ~VMX_EXIT_REASONS_FAILED_VMENTRY;
 +               return 0;
 +       }
 +
         if (unlikely(vmx->fail)) {
                 vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
                 vcpu->run->fail_entry.hardware_entry_failure_reason


 How does the user distinguish between KVM_EXIT_FAIL_ENTRY due to an exit
 reason with bit 31 set and vmlaunch/vmresume failure (vmx->fail set)?  We
 need separate exit codes (with documentation in api.txt).

In both cases the vm fails entry, and I don't think the hardware entry
failure reason codes would overlap between the vmx->fail case and exit
reasons with bit 31 set, so why should there be such distinction
between both cases?

 --
 error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Christoph Hellwig
On Tue, May 25, 2010 at 02:25:53PM +0300, Avi Kivity wrote:
 Currently if someone wants to add a new block format, they have to  
 upstream it and wait for a new qemu to be released.  With a plugin API,  
 they can add a new block format to an existing, supported qemu.

So?  Unless we want a stable driver ABI which I fundamentally oppose as
it would make block driver development hell they'd have to wait for
a new release of the block layer.  It's really just going to be a lot
of pain for no major gain.  qemu releases are frequent enough, and if
users care enough they can also easily patch qemu.



Re: [PATCH 7/7] trace: Trace virtqueue operations

2010-05-25 Thread Avi Kivity

On 05/25/2010 01:24 PM, Stefan Hajnoczi wrote:

This patch adds trace events for virtqueue operations including
adding/removing buffers, notifying the guest, and receiving a notify
from the guest.

diff --git a/trace-events b/trace-events
index 48415f8..a533414 100644
--- a/trace-events
+++ b/trace-events
@@ -35,6 +35,14 @@ qemu_memalign(size_t alignment, size_t size, void *ptr) 
alignment %zu size %zu
  qemu_valloc(size_t size, void *ptr) size %zu ptr %p
  qemu_vfree(void *ptr) ptr %p

+# hw/virtio.c
+virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) vq %p elem %p len %u idx %u
+virtqueue_flush(void *vq, unsigned int count) vq %p count %u
+virtqueue_pop(void *vq, void *elem, unsigned int in_num, unsigned int out_num) vq %p elem %p in_num %u out_num %u
+virtio_queue_notify(void *vdev, int n, void *vq) vdev %p n %d vq %p
+virtio_irq(void *vq) vq %p
+virtio_notify(void *vdev, void *vq) vdev %p vq %p
+
   



Those %ps are more or less useless.  We need better ways of identifying 
them.


Linux uses %pTYPE to pretty print arbitrary types.  We could do 
something similar (not the same since we don't want our own printf 
implementation).


--
error compiling committee.c: too many arguments to function



Re: [PATCH 1/2] VMX: Properly return error to userspace on vmentry failure

2010-05-25 Thread Avi Kivity

On 05/25/2010 03:01 PM, Mohammed Gamal wrote:


How does the user distinguish between KVM_EXIT_FAIL_ENTRY due to an exit
reason with bit 31 set and vmlaunch/vmresume failure (vmx->fail set)?  We
need separate exit codes (with documentation in api.txt).
 

In both cases the vm fails entry, and I don't think the hardware entry
failure reason codes would overlap between the vmx->fail case and exit
reasons with bit 31 set, so why should there be such distinction
between both cases?
   


Only 5 more error codes (28-33) and we have overlap.

If you return the new codes with bit 31 still set then we can use the 
existing KVM_EXIT_FAIL_ENTRY.
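
A minimal sketch of how userspace might tell the two cases apart under that proposal, assuming the reported failure reason keeps bit 31 set for failed-VM-entry exit reasons and leaves it clear for the vmlaunch/vmresume (vmx->fail) error codes. The macro, enum and function names here are illustrative only and are not part of the KVM API:

```c
#include <stdint.h>

/* Assumption: mirrors the bit tested by VMX_EXIT_REASONS_FAILED_VMENTRY. */
#define ENTRY_FAILED_BIT (UINT64_C(1) << 31)

enum entry_failure_kind {
    ENTRY_FAIL_INSTRUCTION_ERROR, /* vmlaunch/vmresume itself failed */
    ENTRY_FAIL_EXIT_REASON        /* exit reason reported with bit 31 set */
};

/* Classify a hardware_entry_failure_reason value by looking at bit 31. */
static enum entry_failure_kind classify_entry_failure(uint64_t reason)
{
    return (reason & ENTRY_FAILED_BIT) ? ENTRY_FAIL_EXIT_REASON
                                       : ENTRY_FAIL_INSTRUCTION_ERROR;
}

/* Strip the marker bit to recover the underlying code. */
static uint32_t entry_failure_code(uint64_t reason)
{
    return (uint32_t)(reason & ~ENTRY_FAILED_BIT);
}
```

With the marker bit preserved, the two code spaces cannot collide even if more error codes are defined later.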



--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 03:03 PM, Christoph Hellwig wrote:

On Tue, May 25, 2010 at 02:25:53PM +0300, Avi Kivity wrote:
   

Currently if someone wants to add a new block format, they have to
upstream it and wait for a new qemu to be released.  With a plugin API,
they can add a new block format to an existing, supported qemu.
 

So?  Unless we want a stable driver ABI which I fundamentally oppose as
it would make block driver development hell


We'd only freeze it for a major release.


they'd have to wait for
a new release of the block layer.  It's really just going to be a lot
of pain for no major gain.  qemu releases are frequent enough, and if
users care enough they can also easily patch qemu.
   


May not be so easy for them, they lose binary updates from their distro 
and have to keep repatching.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 1/2] VMX: Properly return error to userspace on vmentry failure

2010-05-25 Thread Mohammed Gamal
On Tue, May 25, 2010 at 3:10 PM, Avi Kivity a...@redhat.com wrote:
 On 05/25/2010 03:01 PM, Mohammed Gamal wrote:

 How does the user distinguish between KVM_EXIT_FAIL_ENTRY due to an exit
 reason with bit 31 set and vmlaunch/vmresume failure (vmx->fail set)?  We
 need separate exit codes (with documentation in api.txt).


 In both cases the vm fails entry, and I don't think the hardware entry
 failure reason codes would overlap between the vmx->fail case and exit
 reasons with bit 31 set, so why should there be such distinction
 between both cases?


 Only 5 more error codes (28-33) and we have overlap.

 If you return the new codes with bit 31 still set then we can use the
 existing KVM_EXIT_FAIL_ENTRY.

That'd be a better idea.



 --
 error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 04:14 AM, Avi Kivity wrote:

On 05/24/2010 10:38 PM, Anthony Liguori wrote:



- Building a plugin API seems a bit simpler to me, although I'm not
  sure I got the idea correctly:
  The block layer already has some kind of API (.bdrv_file_open,
  .bdrv_read). We could simply compile the block drivers as shared
  objects and create a method for loading the necessary modules at
  runtime.


That approach would be a recipe for disaster.   We would have to 
introduce a new, reduced functionality block API that was supported 
for plugins.  Otherwise, the only way a plugin could keep up with our 
API changes would be if it was in tree which defeats the purpose of 
having plugins.


We could guarantee API/ABI stability in a stable branch but not across 
releases.


We have releases every six months.  There would be tons of block plugins 
that didn't work for random sets of releases.  That creates a lot of 
user confusion and unhappiness.


Regards,

Anthony Liguori




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 06:25 AM, Avi Kivity wrote:

On 05/25/2010 02:02 PM, Kevin Wolf wrote:





So could we not standardize a protocol for this that both sheepdog and
ceph could implement?

The protocol already exists, nbd.  It doesn't support snapshotting etc.
but we could extend it.

But IMO what's needed is a plugin API for the block layer.

What would it buy us, apart from more downstreams and having to maintain
a stable API and ABI?


Currently if someone wants to add a new block format, they have to 
upstream it and wait for a new qemu to be released.  With a plugin 
API, they can add a new block format to an existing, supported qemu.


Whether we have a plugin or protocol based mechanism to implement block 
formats really ends up being just an implementation detail.


In order to implement either, we need to take a subset of block 
functionality that we feel we can support long term and expose that.  
Right now, that's basically just querying characteristics (like size and 
geometry) and asynchronous reads and writes.


A protocol based mechanism has the advantage of being more robust in the 
face of poorly written block backends so if it's possible to make it 
perform as well as a plugin, it's a preferable approach.


Plugins that just expose chunks of QEMU internal state directly (like 
BlockDriver) are a really bad idea IMHO.


Regards,

Anthony Liguori


Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 04:17 PM, Anthony Liguori wrote:

On 05/25/2010 04:14 AM, Avi Kivity wrote:

On 05/24/2010 10:38 PM, Anthony Liguori wrote:



- Building a plugin API seems a bit simpler to me, although I'm not
  sure I got the idea correctly:
  The block layer already has some kind of API (.bdrv_file_open,
  .bdrv_read). We could simply compile the block drivers as shared
  objects and create a method for loading the necessary modules at
  runtime.


That approach would be a recipe for disaster.   We would have to 
introduce a new, reduced functionality block API that was supported 
for plugins.  Otherwise, the only way a plugin could keep up with 
our API changes would be if it was in tree which defeats the purpose 
of having plugins.


We could guarantee API/ABI stability in a stable branch but not 
across releases.


We have releases every six months.  There would be tons of block 
plugins that didn't work for random sets of releases.  That creates a 
lot of user confusion and unhappiness.


The current situation is that those block format drivers only exist in 
qemu.git or as patches.  Surely that's even more unhappiness.


Confusion could be mitigated:

  $ qemu -module my-fancy-block-format-driver.so
  my-fancy-block-format-driver.so does not support this version of qemu 
(0.19.2).  Please contact my-fancy-block-format-driver-de...@example.org.
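
A rough sketch of the kind of load-time check that could produce such a message. Everything in it is hypothetical: qemu has no `-module` option or plugin descriptor today, and the struct layout, version constant and function name are invented for illustration.

```c
#include <stdio.h>

/* Hypothetical: block-plugin API version the running binary implements. */
#define HOST_BLOCK_API_VERSION 3u

/* Hypothetical descriptor a plugin would export. */
struct block_plugin_info {
    const char *name;        /* file name of the plugin */
    unsigned api_version;    /* API version the plugin was built against */
    const char *contact;     /* where users should report mismatches */
};

/* Return 0 if the plugin matches the host API; otherwise write a
 * human-readable explanation into err and return -1. */
static int check_block_plugin(const struct block_plugin_info *p,
                              char *err, size_t errlen)
{
    if (p->api_version == HOST_BLOCK_API_VERSION) {
        return 0;
    }
    snprintf(err, errlen,
             "%s does not support this version of the block API (%u). "
             "Please contact %s.",
             p->name, HOST_BLOCK_API_VERSION, p->contact);
    return -1;
}
```

Refusing to load on a mismatch, rather than trying to run, is what turns an obscure crash into a message the user can act on.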


The question is how many such block format drivers we expect.  We now 
have two in the pipeline (ceph, sheepdog), it's reasonable to assume 
we'll want an lvm2 driver and btrfs driver.  This is an area with a lot 
of activity and a relatively simple interface.


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread MORITA Kazutaka
At Mon, 24 May 2010 14:16:32 -0500,
Anthony Liguori wrote:
 
 On 05/24/2010 06:56 AM, Avi Kivity wrote:
  On 05/24/2010 02:42 PM, MORITA Kazutaka wrote:
 
  The server would be local and talk over a unix domain socket, perhaps
  anonymous.
 
  nbd has other issues though, such as requiring a copy and no support 
  for
  metadata operations such as snapshot and file size extension.
 
  Sorry, my explanation was unclear.  I'm not sure how running servers
  on localhost can solve the problem.
 
  The local server can convert from the local (nbd) protocol to the 
  remote (sheepdog, ceph) protocol.
 
  What I wanted to say was that we cannot specify the image of VM. With
  nbd protocol, command line arguments are as follows:
 
$ qemu nbd:hostname:port
 
  As this syntax shows, with nbd protocol the client cannot pass the VM
  image name to the server.
 
  We would extend it to allow it to connect to a unix domain socket:
 
qemu nbd:unix:/path/to/socket
 
 nbd is a no-go because it only supports a single, synchronous I/O 
 operation at a time and has no mechanism for extensibility.
 
 If we go this route, I think two options are worth considering.  The 
 first would be a purely socket based approach where we just accepted the 
 extra copy.
 
 The other potential approach would be shared memory based.  We export 
 all guest ram as shared memory along with a small bounce buffer pool.  
 We would then use a ring queue (potentially even using virtio-blk) and 
 an eventfd for notification.
 

The shared memory approach assumes that there is a local server that
can talk to the storage system.  But Ceph doesn't require a local
server, and Sheepdog would be extended to support VMs running outside
the storage system.  We could run a local daemon that works only as a
proxy, but I don't think that is a clean approach.  So I think a
socket based approach is the right way to go.
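
For reference, the ring-queue-plus-eventfd notification scheme from the quoted proposal could look roughly like the sketch below. It is a single-process illustration under several assumptions: the request layout, ring size and function names are made up, the ring lives in ordinary memory rather than a shared mapping, and a real cross-process version would need memory barriers and a way to pass the eventfd to the peer. Only `eventfd()`, `read()` and `write()` are real Linux interfaces.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

#define RING_SIZE 8u /* hypothetical; must be a power of two */

struct blk_req {
    uint64_t sector;
    uint32_t len;
};

struct blk_ring {
    struct blk_req slots[RING_SIZE];
    uint32_t head; /* next slot the producer fills */
    uint32_t tail; /* next slot the consumer reads */
    int efd;       /* eventfd used to wake the consumer */
};

static int ring_init(struct blk_ring *r)
{
    r->head = 0;
    r->tail = 0;
    r->efd = eventfd(0, 0);
    return r->efd < 0 ? -1 : 0;
}

/* Queue one request and signal the consumer. Returns -1 if the ring is full
 * or the notification could not be written. */
static int ring_push(struct blk_ring *r, struct blk_req rq)
{
    uint64_t one = 1;

    if (r->head - r->tail == RING_SIZE) {
        return -1;
    }
    r->slots[r->head % RING_SIZE] = rq;
    r->head++;
    return write(r->efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Block until a notification arrives, then copy out up to max queued
 * requests. Returns the number copied, or -1 on error. Reading the eventfd
 * resets its counter, so callers should pass max >= RING_SIZE. */
static int ring_drain(struct blk_ring *r, struct blk_req *out, int max)
{
    uint64_t pending;
    int n = 0;

    if (read(r->efd, &pending, sizeof(pending)) != (ssize_t)sizeof(pending)) {
        return -1;
    }
    while (r->tail != r->head && n < max) {
        out[n++] = r->slots[r->tail % RING_SIZE];
        r->tail++;
    }
    return n;
}
```

Note that this still presumes a local consumer on the other side of the ring, which is exactly the requirement questioned above.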

BTW, is it required to design a common interface?  The way Sheepdog
replicates data is different from Ceph, so I think it is not possible
to define a common protocol as Christian says.

Regards,

Kazutaka

  The server at the other end would associate the socket with a filename 
  and forward it to the server using the remote protocol.
 
  However, I don't think nbd would be a good protocol.  My preference 
  would be for a plugin API, or for a new local protocol that uses 
  splice() to avoid copies.
 
 I think a good shared memory implementation would be preferable to 
 plugins.  I think it's worth attempting to do a plugin interface for the 
 block layer but I strongly suspect it would not be sufficient.
 
 I would not want to see plugins that interacted with BlockDriverState 
 directly, for instance.  We change it far too often.  Our main loop 
 functions are also not terribly stable so I'm not sure how we would 
 handle that (unless we forced all block plugins to be in a separate thread).
 


Re: [PATCH 7/7] trace: Trace virtqueue operations

2010-05-25 Thread Stefan Hajnoczi
On Tue, May 25, 2010 at 1:04 PM, Avi Kivity a...@redhat.com wrote:
 Those %ps are more or less useless.  We need better ways of identifying
 them.

You're right, the vq pointer is useless in isolation.  We don't know
which virtio device or which virtqueue number.

With the full context of a trace it would be possible to correlate the
vq pointer if we had trace events for vdev and vq setup.

Adding custom formatters could be tricky since the format string is
passed only to tracing backends that use it, like UST.  And UST uses
its own sprintf implementation which we don't have direct control
over.

I think we just need to guarantee that any pointer can be correlated
with previous trace entries that give context for that pointer.

Stefan


Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 08:25 AM, Avi Kivity wrote:

On 05/25/2010 04:17 PM, Anthony Liguori wrote:

On 05/25/2010 04:14 AM, Avi Kivity wrote:

On 05/24/2010 10:38 PM, Anthony Liguori wrote:



- Building a plugin API seems a bit simpler to me, although I'm not
  sure I got the idea correctly:
  The block layer already has some kind of API (.bdrv_file_open,
  .bdrv_read). We could simply compile the block drivers as shared
  objects and create a method for loading the necessary modules at
  runtime.


That approach would be a recipe for disaster.   We would have to 
introduce a new, reduced functionality block API that was supported 
for plugins.  Otherwise, the only way a plugin could keep up with 
our API changes would be if it was in tree which defeats the 
purpose of having plugins.


We could guarantee API/ABI stability in a stable branch but not 
across releases.


We have releases every six months.  There would be tons of block 
plugins that didn't work for random sets of releases.  That creates a 
lot of user confusion and unhappiness.


The current situation is that those block format drivers only exist in 
qemu.git or as patches.  Surely that's even more unhappiness.


Confusion could be mitigated:

  $ qemu -module my-fancy-block-format-driver.so
  my-fancy-block-format-driver.so does not support this version of 
qemu (0.19.2).  Please contact 
my-fancy-block-format-driver-de...@example.org.


The question is how many such block format drivers we expect.  We now 
have two in the pipeline (ceph, sheepdog), it's reasonable to assume 
we'll want an lvm2 driver and btrfs driver.  This is an area with a 
lot of activity and a relatively simple interface.


If we expose a simple interface, I'm all for it.  But BlockDriver is not 
simple and things like the snapshotting API need love.


Of course, there's certainly a question of why we're solving this in 
qemu at all.  Wouldn't it be more appropriate to either (1) implement a 
kernel module for ceph/sheepdog if performance matters or (2) implement 
BUSE to complement FUSE and CUSE to enable proper userspace block devices?


If you want to use a block device within qemu, you almost certainly want 
to be able to manipulate it on the host using standard tools (like mount 
and parted) so it stands to reason that addressing this in the kernel 
makes more sense.


Regards,

Anthony Liguori



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 04:25 PM, Anthony Liguori wrote:
Currently if someone wants to add a new block format, they have to 
upstream it and wait for a new qemu to be released.  With a plugin 
API, they can add a new block format to an existing, supported qemu.



Whether we have a plugin or protocol based mechanism to implement 
block formats really ends up being just an implementation detail.


True.

In order to implement either, we need to take a subset of block 
functionality that we feel we can support long term and expose that.  
Right now, that's basically just querying characteristics (like size 
and geometry) and asynchronous reads and writes.


Unfortunately, you're right.

A protocol based mechanism has the advantage of being more robust in 
the face of poorly written block backends so if it's possible to make 
it perform as well as a plugin, it's a preferable approach.


May be hard due to difficulty of exposing guest memory.



Plugins that just expose chunks of QEMU internal state directly (like 
BlockDriver) are a really bad idea IMHO.


Also, we don't want to expose all of the qemu API.  We should default 
the visibility attribute to hidden and expose only select functions, 
perhaps under their own interface.  And no inlines.
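
As an illustration of that idea, a sketch using GCC's symbol visibility support: the object is built with `-fvisibility=hidden`, and only functions carrying an explicit attribute are exported. The `PLUGIN_API` macro and both function names are hypothetical and do not exist in qemu.

```c
/* Build with -fvisibility=hidden so that symbols are hidden by default. */
#if defined(__GNUC__)
#define PLUGIN_API __attribute__((visibility("default")))
#else
#define PLUGIN_API
#endif

/* Exported: part of the (hypothetical) stable plugin interface. */
PLUGIN_API int plugin_api_version(void);

/* Not exported when built with -fvisibility=hidden; free to change. */
int internal_helper(int x);

int internal_helper(int x)
{
    return x * 2;
}

PLUGIN_API int plugin_api_version(void)
{
    return 1;
}
```

Keeping inline functions out of the exported header matters for the same reason: an inline body is compiled into the plugin, so it would freeze whatever internal layout it touches.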


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 04:35 PM, Anthony Liguori wrote:

On 05/25/2010 08:31 AM, Avi Kivity wrote:
A protocol based mechanism has the advantage of being more robust in 
the face of poorly written block backends so if it's possible to 
make it perform as well as a plugin, it's a preferable approach.


May be hard due to difficulty of exposing guest memory.


If someone did a series to add plugins, I would expect a very strong 
argument as to why a shared memory mechanism was not possible or at 
least plausible.


I'm not sure I understand why shared memory is such a bad thing wrt 
KVM.  Can you elaborate?  Is it simply a matter of fork()?


fork() doesn't work with memory hotplug.  What else is there?

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 08:31 AM, Avi Kivity wrote:
A protocol based mechanism has the advantage of being more robust in 
the face of poorly written block backends so if it's possible to make 
it perform as well as a plugin, it's a preferable approach.


May be hard due to difficulty of exposing guest memory.


If someone did a series to add plugins, I would expect a very strong 
argument as to why a shared memory mechanism was not possible or at 
least plausible.


I'm not sure I understand why shared memory is such a bad thing wrt 
KVM.  Can you elaborate?  Is it simply a matter of fork()?




Plugins that just expose chunks of QEMU internal state directly (like 
BlockDriver) are a really bad idea IMHO.


Also, we don't want to expose all of the qemu API.  We should default 
the visibility attribute to hidden and expose only select functions, 
perhaps under their own interface.  And no inlines.


Yeah, if we did plugins, this would be a key requirement.

Regards,

Anthony Liguori


Re: [PATCH] add support for protocol driver create_options

2010-05-25 Thread Kevin Wolf
Am 24.05.2010 08:34, schrieb MORITA Kazutaka:
 At Fri, 21 May 2010 18:57:36 +0200,
 Kevin Wolf wrote:

 Am 20.05.2010 07:36, schrieb MORITA Kazutaka:
 +
 +/*
 + * Append an option list (list) to an option list (dest).
 + *
 + * If dest is NULL, a new copy of list is created.
 + *
 + * Returns a pointer to the first element of dest (or the newly allocated 
 copy)
 + */
 +QEMUOptionParameter *append_option_parameters(QEMUOptionParameter *dest,
 +QEMUOptionParameter *list)
 +{
 +size_t num_options, num_dest_options;
 +
 +num_options = count_option_parameters(dest);
 +num_dest_options = num_options;
 +
 +num_options += count_option_parameters(list);
 +
 +dest = qemu_realloc(dest, (num_options + 1) * 
 sizeof(QEMUOptionParameter));
 +
 +while (list && list->name) {
 +if (get_option_parameter(dest, list->name) == NULL) {
 +dest[num_dest_options++] = *list;

 You need to add a dest[num_dest_options].name = NULL; here. Otherwise
 the next loop iteration works on uninitialized memory and possibly an
 unterminated list. I got a segfault for that reason.

 
 I forgot to add it, sorry.
 Fixed version is below.
 
 Thanks,
 
 Kazutaka
 
 ==
 This patch enables protocol drivers to use their create options which
 are not supported by the format.  For example, protocol drivers can use
 a backing_file option with raw format.
 
 Signed-off-by: MORITA Kazutaka morita.kazut...@lab.ntt.co.jp

$ ./qemu-img create -f qcow2 -o cluster_size=4k /tmp/test.qcow2 4G
Unknown option 'cluster_size'
qemu-img: Invalid options for file format 'qcow2'.

I think you added another num_dest_options++ which shouldn't be there.
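
To make the termination issue concrete, here is a standalone sketch of the append logic with the terminator written exactly once per appended entry. It is not the patch itself: `Opt` is a hypothetical stand-in for QEMUOptionParameter, plain `realloc()` replaces qemu_realloc(), and the helper names are invented.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for QEMUOptionParameter: a list ends at name == NULL. */
typedef struct Opt {
    const char *name;
    int value;
} Opt;

static size_t count_opts(const Opt *list)
{
    size_t n = 0;

    while (list && list[n].name) {
        n++;
    }
    return n;
}

static const Opt *find_opt(const Opt *list, const char *name)
{
    for (; list && list->name; list++) {
        if (strcmp(list->name, name) == 0) {
            return list;
        }
    }
    return NULL;
}

/* Append the entries of list that dest does not already contain.
 * dest may be NULL. Returns the reallocated array, or NULL on failure. */
static Opt *append_opts(Opt *dest, const Opt *list)
{
    size_t n_dest = count_opts(dest);
    size_t n_total = n_dest + count_opts(list);
    Opt *out = realloc(dest, (n_total + 1) * sizeof(*out));

    if (!out) {
        return NULL;
    }
    /* Terminate before the first lookup: memory is uninitialized if dest was NULL. */
    out[n_dest].name = NULL;

    for (; list && list->name; list++) {
        if (find_opt(out, list->name) == NULL) {
            out[n_dest++] = *list;    /* the only increment */
            out[n_dest].name = NULL;  /* re-terminate without advancing */
        }
    }
    return out;
}
```

Incrementing the index a second time when writing the terminator would leave a NULL-named hole in the middle of the list, which hides every option after it and would explain an "Unknown option" error like the one shown.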

Kevin


Re: [PATCH 7/7] trace: Trace virtqueue operations

2010-05-25 Thread Avi Kivity

On 05/25/2010 04:27 PM, Stefan Hajnoczi wrote:

On Tue, May 25, 2010 at 1:04 PM, Avi Kivitya...@redhat.com  wrote:
   

Those %ps are more or less useless.  We need better ways of identifying
them.
 

You're right, the vq pointer is useless in isolation.  We don't know
which virtio device or which virtqueue number.

With the full context of a trace it would be possible to correlate the
vq pointer if we had trace events for vdev and vq setup.

Adding custom formatters could be tricky since the format string is
passed only to tracing backends that use it, like UST.  And UST uses
its own sprintf implementation which we don't have direct control
over.
   


Hm.  Perhaps we can convert %{type} to %p for backends which don't 
support it, and to whatever format they do support for those that do.
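
A small sketch of the fallback half of that conversion, rewriting each `%{type}` specifier to a plain `%p` for backends that cannot pretty-print. The `%{...}` syntax is only the notation floated here, not an existing qemu or UST feature, and the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copy fmt into out, replacing every "%{...}" with "%p" and leaving
 * everything else (including "%%") untouched. Returns 0 on success,
 * -1 if out is too small or a "%{" is never closed. */
static int downgrade_format(const char *fmt, char *out, size_t outlen)
{
    size_t o = 0;

    if (outlen == 0) {
        return -1;
    }
    while (*fmt) {
        if (fmt[0] == '%' && fmt[1] == '%') {
            /* Literal percent sign: copy both characters as-is. */
            if (o + 2 >= outlen) {
                return -1;
            }
            out[o++] = '%';
            out[o++] = '%';
            fmt += 2;
        } else if (fmt[0] == '%' && fmt[1] == '{') {
            const char *end = strchr(fmt + 2, '}');

            if (!end || o + 2 >= outlen) {
                return -1;
            }
            out[o++] = '%';
            out[o++] = 'p';
            fmt = end + 1;
        } else {
            if (o + 1 >= outlen) {
                return -1;
            }
            out[o++] = *fmt++;
        }
    }
    out[o] = '\0';
    return 0;
}
```

A backend that does understand typed pointers would use the same scan but emit its own specifier instead of `%p`.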


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Kevin Wolf
Am 25.05.2010 15:25, schrieb Anthony Liguori:
 On 05/25/2010 06:25 AM, Avi Kivity wrote:
 On 05/25/2010 02:02 PM, Kevin Wolf wrote:


 So could we not standardize a protocol for this that both sheepdog and
 ceph could implement?
 The protocol already exists, nbd.  It doesn't support snapshotting etc.
 but we could extend it.

 But IMO what's needed is a plugin API for the block layer.
 What would it buy us, apart from more downstreams and having to maintain
 a stable API and ABI?

 Currently if someone wants to add a new block format, they have to 
 upstream it and wait for a new qemu to be released.  With a plugin 
 API, they can add a new block format to an existing, supported qemu.
 
 Whether we have a plugin or protocol based mechanism to implement block 
 formats really ends up being just an implementation detail.
 
 In order to implement either, we need to take a subset of block 
 functionality that we feel we can support long term and expose that.  
 Right now, that's basically just querying characteristics (like size and 
 geometry) and asynchronous reads and writes.
 
 A protocol based mechanism has the advantage of being more robust in the 
 face of poorly written block backends so if it's possible to make it 
 perform as well as a plugin, it's a preferable approach.
 
 Plugins that just expose chunks of QEMU internal state directly (like 
 BlockDriver) are a really bad idea IMHO.

I'm still not convinced that we need either. I share Christoph's concern
that we would make our life harder for almost no gain. It's probably a
very small group of users (if it exists at all) that wants to add new
block drivers themselves, but at the same time can't run upstream qemu.

But if we were to decide that there's no way around it, I agree with you
that directly exposing the internal API isn't going to work.

Kevin


Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 08:36 AM, Avi Kivity wrote:


We'd need a kernel-level generic snapshot API for this eventually.

or (2) implement BUSE to complement FUSE and CUSE to enable proper 
userspace block devices.


Likely slow due to lots of copying.  Also needs a snapshot API.


The kernel could use splice.


(ABUSE was proposed a while ago by Zach).

If you want to use a block device within qemu, you almost certainly 
want to be able to manipulate it on the host using standard tools 
(like mount and parted) so it stands to reason that addressing this 
in the kernel makes more sense.


qemu-nbd also allows this.

This reasoning also applies to qcow2, btw.


I know.

Regards,

Anthony Liguori




Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 08:38 AM, Avi Kivity wrote:

On 05/25/2010 04:35 PM, Anthony Liguori wrote:

On 05/25/2010 08:31 AM, Avi Kivity wrote:
A protocol based mechanism has the advantage of being more robust 
in the face of poorly written block backends so if it's possible to 
make it perform as well as a plugin, it's a preferable approach.


May be hard due to difficulty of exposing guest memory.


If someone did a series to add plugins, I would expect a very strong 
argument as to why a shared memory mechanism was not possible or at 
least plausible.


I'm not sure I understand why shared memory is such a bad thing wrt 
KVM.  Can you elaborate?  Is it simply a matter of fork()?


fork() doesn't work in the face of memory hotplug.  What else is there?



Is it that fork() doesn't work or is it that fork() is very expensive?

Regards,

Anthony Liguori


Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 04:54 PM, Anthony Liguori wrote:

On 05/25/2010 08:36 AM, Avi Kivity wrote:


We'd need a kernel-level generic snapshot API for this eventually.

or (2) implement BUSE to complement FUSE and CUSE to enable proper 
userspace block devices.


Likely slow due to lots of copying.  Also needs a snapshot API.


The kernel could use splice.


Still can't make guest memory appear in (A)BUSE process memory without 
either mmu tricks (vmsplice in reverse) or a copy.  May be workable for 
an (A)BUSE driver that talks over a network, and thus can splice() its 
way out.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 7/7] trace: Trace virtqueue operations

2010-05-25 Thread Stefan Hajnoczi
On Tue, May 25, 2010 at 2:52 PM, Avi Kivity a...@redhat.com wrote:
 Hm.  Perhaps we can convert %{type} to %p for backends which don't support
 it, and to whatever format they do support for those that do.

True.

Stefan


Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 04:55 PM, Anthony Liguori wrote:

On 05/25/2010 08:38 AM, Avi Kivity wrote:

On 05/25/2010 04:35 PM, Anthony Liguori wrote:

On 05/25/2010 08:31 AM, Avi Kivity wrote:
A protocol based mechanism has the advantage of being more robust 
in the face of poorly written block backends so if it's possible 
to make it perform as well as a plugin, it's a preferable approach.


May be hard due to difficulty of exposing guest memory.


If someone did a series to add plugins, I would expect a very strong 
argument as to why a shared memory mechanism was not possible or at 
least plausible.


I'm not sure I understand why shared memory is such a bad thing wrt 
KVM.  Can you elaborate?  Is it simply a matter of fork()?


fork() doesn't work in the face of memory hotplug.  What else is there?



Is it that fork() doesn't work or is it that fork() is very expensive?


It doesn't work: fork() is done at block device creation time, which 
freezes the child memory map, while guest memory is allocated at hotplug 
time.


fork() actually isn't very expensive since we use MADV_DONTFORK 
(probably fast enough for everything except realtime).


It may be possible to do a processfd() which can be mmap()ed by another 
process to export anonymous memory using mmu notifiers, not sure how 
easy or mergeable that is.


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Kevin Wolf
Am 25.05.2010 15:25, schrieb Avi Kivity:
 On 05/25/2010 04:17 PM, Anthony Liguori wrote:
 On 05/25/2010 04:14 AM, Avi Kivity wrote:
 On 05/24/2010 10:38 PM, Anthony Liguori wrote:

 - Building a plugin API seems a bit simpler to me, although I'm not
   sure if I got the idea correctly:
   The block layer already has some kind of API (.bdrv_file_open,
   .bdrv_read). We could simply compile the block drivers as shared
   objects and create a method for loading the necessary modules at
   runtime.

 That approach would be a recipe for disaster.   We would have to 
 introduce a new, reduced functionality block API that was supported 
 for plugins.  Otherwise, the only way a plugin could keep up with 
 our API changes would be if it was in tree which defeats the purpose 
 of having plugins.

 We could guarantee API/ABI stability in a stable branch but not 
 across releases.

 We have releases every six months.  There would be tons of block 
 plugins that didn't work for random sets of releases.  That creates a 
 lot of user confusion and unhappiness.
 
 The current situation is that those block format drivers only exist in 
 qemu.git or as patches.  Surely that's even more unhappiness.

The difference is that in the current situation these drivers will be
part of the next qemu release, so the patch may be obsolete, but you
don't even need it any more.

If you start keeping block drivers outside qemu and don't even try to
integrate them, they'll stay external.

 Confusion could be mitigated:
 
$ qemu -module my-fancy-block-format-driver.so
my-fancy-block-format-driver.so does not support this version of qemu 
 (0.19.2).  Please contact my-fancy-block-format-driver-de...@example.org.
 
 The question is how many such block format drivers we expect.  We now 
 have two in the pipeline (ceph, sheepdog), it's reasonable to assume 
 we'll want an lvm2 driver and btrfs driver.  This is an area with a lot 
 of activity and a relatively simple interface.

What's the reason for not having these drivers upstream? Do we gain
anything by hiding them from our users and requiring them to install the
drivers separately from somewhere else?
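
For reference, the version check behind Avi's hypothetical -module error 
message above could be sketched roughly as follows.  Everything here is an 
assumption made for illustration: qemu has no such plugin descriptor, and 
the struct, the ABI numbers and the function names are invented.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical descriptor a block-format plugin would export. */
struct block_plugin_info {
    const char *name;       /* e.g. "my-fancy-block-format-driver.so" */
    unsigned abi_major;     /* incompatible changes bump this */
    unsigned abi_minor;     /* compatible additions bump this */
    const char *contact;    /* where users should report problems */
};

/* Hypothetical ABI version implemented by the running host binary. */
#define HOST_ABI_MAJOR 1u
#define HOST_ABI_MINOR 3u

/* A plugin fits if it targets the same major and no newer minor. */
int plugin_is_compatible(const struct block_plugin_info *info)
{
    assert(info != NULL);
    return info->abi_major == HOST_ABI_MAJOR &&
           info->abi_minor <= HOST_ABI_MINOR;
}

/* Returns 0 if the plugin may be used, -1 after printing a diagnostic. */
int check_plugin(const struct block_plugin_info *info)
{
    if (!plugin_is_compatible(info)) {
        fprintf(stderr,
                "%s does not support this version of qemu.  "
                "Please contact %s.\n",
                info->name, info->contact);
        return -1;
    }
    return 0;
}
```

Such a check only turns a crash into a readable error; it does not answer 
the question of who keeps the plugin working across releases.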

Kevin


[PATCH] KVM: x86: Propagate fpu_alloc errors

2010-05-25 Thread Jan Kiszka
Memory allocation may fail. Propagate such errors.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/include/asm/kvm_host.h |2 +-
 arch/x86/kvm/svm.c  |7 ++-
 arch/x86/kvm/vmx.c  |4 +++-
 arch/x86/kvm/x86.c  |   11 +--
 4 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d08bb4a..0cd0f29 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -624,7 +624,7 @@ int kvm_pic_set_irq(void *opaque, int irq, int level);
 
 void kvm_inject_nmi(struct kvm_vcpu *vcpu);
 
-void fx_init(struct kvm_vcpu *vcpu);
+int fx_init(struct kvm_vcpu *vcpu);
 
 void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu);
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 4af2c12..5f25e59 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -903,13 +903,18 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	svm->asid_generation = 0;
 	init_vmcb(svm);
 
-	fx_init(&svm->vcpu);
+	err = fx_init(&svm->vcpu);
+	if (err)
+		goto free_page4;
+
 	svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
 	if (kvm_vcpu_is_bsp(&svm->vcpu))
 		svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
 
 	return &svm->vcpu;
 
+free_page4:
+	__free_page(hsave_page);
 free_page3:
 	__free_pages(nested_msrpm_pages, MSRPM_ALLOC_ORDER);
 free_page2:
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 99ae513..61bdae3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2661,7 +2661,9 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 		msr |= MSR_IA32_APICBASE_BSP;
 	kvm_set_apic_base(&vmx->vcpu, msr);
 
-	fx_init(&vmx->vcpu);
+	ret = fx_init(&vmx->vcpu);
+	if (ret != 0)
+		goto out;
 
 	seg_setup(VCPU_SREG_CS);
 	/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7be1d36..e773d93 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5113,12 +5113,19 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
 	return 0;
 }
 
-void fx_init(struct kvm_vcpu *vcpu)
+int fx_init(struct kvm_vcpu *vcpu)
 {
-	fpu_alloc(&vcpu->arch.guest_fpu);
+	int err;
+
+	err = fpu_alloc(&vcpu->arch.guest_fpu);
+	if (err)
+		return err;
+
 	fpu_finit(&vcpu->arch.guest_fpu);
 
 	vcpu->arch.cr0 |= X86_CR0_ET;
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(fx_init);
 
-- 
1.6.0.2


[PATCH] KVM: svm: Drop unused local variable

2010-05-25 Thread Jan Kiszka
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/kvm/svm.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5f25e59..3c03c36 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1491,8 +1491,6 @@ static void svm_handle_mce(struct vcpu_svm *svm)
 		 * Erratum 383 triggered. Guest state is corrupt so kill the
 		 * guest.
 		 */
-		struct kvm_run *kvm_run = svm->vcpu.run;
-
 		pr_err("KVM: Guest triggered AMD Erratum 383\n");
 
 		set_bit(KVM_REQ_TRIPLE_FAULT, &svm->vcpu.requests);
-- 
1.6.0.2


Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 08:57 AM, Avi Kivity wrote:

On 05/25/2010 04:54 PM, Anthony Liguori wrote:

On 05/25/2010 08:36 AM, Avi Kivity wrote:


We'd need a kernel-level generic snapshot API for this eventually.

or (2) implement BUSE to complement FUSE and CUSE to enable proper 
userspace block devices.


Likely slow due to lots of copying.  Also needs a snapshot API.


The kernel could use splice.


Still can't make guest memory appear in (A)BUSE process memory without 
either mmu tricks (vmsplice in reverse) or a copy.  May be workable 
for an (A)BUSE driver that talks over a network, and thus can splice() 
its way out.


splice() actually takes an offset parameter, so it may be possible to treat 
that offset parameter as a file offset.  That would essentially allow 
you to implement a splice()-based thread pool where splice() replaces 
preadv/pwritev.


It's not quite linux-aio, but it should take you pretty far.   I think 
the main point is that the problem of allowing block plugins to qemu is 
the same as block plugins for the kernel.  The kernel doesn't provide a 
stable interface (and we probably can't for the same reasons) and it's 
generally discouraged from a code quality perspective.


That said, making an external program work well as a block backend is 
identical to making userspace block devices fast.


Regards,

Anthony Liguori



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 08:55 AM, Avi Kivity wrote:

On 05/25/2010 04:53 PM, Kevin Wolf wrote:


I'm still not convinced that we need either. I share Christoph's concern
that we would make our life harder for almost no gain. It's probably a
very small group of users (if it exists at all) that wants to add new
block drivers themselves, but at the same time can't run upstream qemu.



The first part of your argument may be true, but the second isn't.  No 
user can run upstream qemu.git.  It's not tested or supported, and has 
no backwards compatibility guarantees.


Yes, it does have backwards compatibility guarantees.

Regards,

Anthony Liguori



[PATCH] vhost-net: fix reversed logic in mask notifiers

2010-05-25 Thread Michael S. Tsirkin
When guest notifier is assigned, we set mask notifier,
which will assign kvm irqfd.
When guest notifier is unassigned, mask notifier is unset,
which should unassign kvm irqfd.

The way to do this is to call mask notifier telling it to mask the vector.
This is skipped if the vector is already masked, since masking has
already unassigned the irqfd.

The logic in unassign was reversed, which left kvm irqfd assigned.

This patch is qemu-kvm only as irqfd is not upstream.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reported-by: Amit Shah amit.s...@redhat.com
---
 hw/msix.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 8f9a621..1398680 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -617,6 +617,7 @@ int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
     assert(opaque);
     assert(!dev->msix_mask_notifier_opaque[vector]);
 
+    /* Unmask the new notifier unless vector is masked. */
     if (msix_is_masked(dev, vector)) {
         return 0;
     }
@@ -638,12 +639,13 @@ int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
     assert(dev->msix_mask_notifier);
     assert(dev->msix_mask_notifier_opaque[vector]);
 
+    /* Mask the old notifier unless it is already masked. */
     if (msix_is_masked(dev, vector)) {
         return 0;
     }
     r = dev->msix_mask_notifier(dev, vector,
                                 dev->msix_mask_notifier_opaque[vector],
-                                msix_is_masked(dev, vector));
+                                !msix_is_masked(dev, vector));
     if (r < 0) {
         return r;
     }
-- 
1.7.1.12.g42b7f


Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 09:01 AM, Avi Kivity wrote:

On 05/25/2010 04:55 PM, Anthony Liguori wrote:

On 05/25/2010 08:38 AM, Avi Kivity wrote:

On 05/25/2010 04:35 PM, Anthony Liguori wrote:

On 05/25/2010 08:31 AM, Avi Kivity wrote:
A protocol based mechanism has the advantage of being more robust 
in the face of poorly written block backends so if it's possible 
to make it perform as well as a plugin, it's a preferable approach.


May be hard due to difficulty of exposing guest memory.


If someone did a series to add plugins, I would expect a very 
strong argument as to why a shared memory mechanism was not 
possible or at least plausible.


I'm not sure I understand why shared memory is such a bad thing wrt 
KVM.  Can you elaborate?  Is it simply a matter of fork()?


fork() doesn't work in the face of memory hotplug.  What else is there?



Is it that fork() doesn't work or is it that fork() is very expensive?


It doesn't work, fork() is done at block device creation time, which 
freezes the child memory map, while guest memory is allocated at 
hotplug time.


Now I'm confused.  I thought you were saying shared memory somehow 
affects fork().  If you're talking about shared memory inheritance via 
fork(), that's less important.  You can also pass /dev/shm fd's via 
SCM_RIGHTS to establish shared memory segments dynamically.
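
As a rough sketch of the fd passing mentioned above, the two helpers below 
send and receive a single descriptor over a connected AF_UNIX socket using 
an SCM_RIGHTS control message.  The helper names and the union are made up 
for this example; the descriptor could just as well refer to a /dev/shm 
object that the receiver then mmap()s.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctrl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor `fd` over connected Unix socket `sock`; 0 on success. */
int send_fd(int sock, int fd)
{
    char byte = 0;  /* at least one byte of real data must accompany it */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`; returns the new fd or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

This works for any memory that is backed by a file descriptor; it does not 
by itself help with anonymous mappings, which have no descriptor to pass.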


Regards,

Anthony Liguori

fork() actually isn't very expensive since we use MADV_DONTFORK 
(probably fast enough for everything except realtime).


It may be possible to do a processfd() which can be mmap()ed by 
another process to export anonymous memory using mmu notifiers, not 
sure how easy or mergeable that is.






Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Kevin Wolf
Am 25.05.2010 15:55, schrieb Avi Kivity:
 On 05/25/2010 04:53 PM, Kevin Wolf wrote:

 I'm still not convinced that we need either. I share Christoph's concern
 that we would make our life harder for almost no gain. It's probably a
 very small group of users (if it exists at all) that wants to add new
 block drivers themselves, but at the same time can't run upstream qemu.


 
 The first part of your argument may be true, but the second isn't.  No 
 user can run upstream qemu.git.  It's not tested or supported, and has 
 no backwards compatibility guarantees.

The second part was basically meant to say developers don't count here.

Kevin


Re: [PATCH] KVM: svm: Drop unused local variable

2010-05-25 Thread Roedel, Joerg
Ah right, thanks :)

On Tue, May 25, 2010 at 10:02:15AM -0400, Jan Kiszka wrote:
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Acked-by: Joerg Roedel joerg.roe...@amd.com

 ---
  arch/x86/kvm/svm.c |2 --
  1 files changed, 0 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index 5f25e59..3c03c36 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -1491,8 +1491,6 @@ static void svm_handle_mce(struct vcpu_svm *svm)
  		 * Erratum 383 triggered. Guest state is corrupt so kill the
  		 * guest.
  		 */
 -		struct kvm_run *kvm_run = svm->vcpu.run;
 -
  		pr_err("KVM: Guest triggered AMD Erratum 383\n");
  
  		set_bit(KVM_REQ_TRIPLE_FAULT, &svm->vcpu.requests);
 -- 
 1.6.0.2
 



Re: [PATCH] vhost-net: fix reversed logic in mask notifiers

2010-05-25 Thread Amit Shah
On (Tue) May 25 2010 [17:00:43], Michael S. Tsirkin wrote:
 When guest notifier is assigned, we set mask notifier,
 which will assign kvm irqfd.
 When guest notifier is unassigned, mask notifier is unset,
 which should unassign kvm irqfd.
 
 The way to do this is to call mask notifier telling it to mask the vector.
 This, unless vector is already masked which unassigns irqfd already.
 
 The logic in unassign was reversed, which left kvm irqfd assigned.
 
 This patch is qemu-kvm only as irqfd is not upstream.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 Reported-by: Amit Shah amit.s...@redhat.com

Acked-by: Amit Shah amit.s...@redhat.com

Amit


Re: [PATCH] vhost-net: fix reversed logic in mask notifiers

2010-05-25 Thread Juan Quintela
Michael S. Tsirkin m...@redhat.com wrote:
 When guest notifier is assigned, we set mask notifier,
 which will assign kvm irqfd.
 When guest notifier is unassigned, mask notifier is unset,
 which should unassign kvm irqfd.

 The way to do this is to call mask notifier telling it to mask the vector.
 This, unless vector is already masked which unassigns irqfd already.

 The logic in unassign was reversed, which left kvm irqfd assigned.

 This patch is qemu-kvm only as irqfd is not upstream.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 Reported-by: Amit Shah amit.s...@redhat.com
 ---
  hw/msix.c |4 +++-
  1 files changed, 3 insertions(+), 1 deletions(-)

 diff --git a/hw/msix.c b/hw/msix.c
 index 8f9a621..1398680 100644
 --- a/hw/msix.c
 +++ b/hw/msix.c
 @@ -617,6 +617,7 @@ int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
      assert(opaque);
      assert(!dev->msix_mask_notifier_opaque[vector]);
  
 +    /* Unmask the new notifier unless vector is masked. */
      if (msix_is_masked(dev, vector)) {
          return 0;
      }
 @@ -638,12 +639,13 @@ int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
      assert(dev->msix_mask_notifier);
      assert(dev->msix_mask_notifier_opaque[vector]);
  
 +    /* Mask the old notifier unless it is already masked. */
      if (msix_is_masked(dev, vector)) {
          return 0;
      }
      r = dev->msix_mask_notifier(dev, vector,
                                  dev->msix_mask_notifier_opaque[vector],
 -                                msix_is_masked(dev, vector));
 +                                !msix_is_masked(dev, vector));

Why not just put a 1 here?

we have:

if (msix_is_masked())
   return 0
r = msix_mask_notifier(., !msix_is_masked());

i.e. at that point msix_is_masked() is false, or we really, really need
locking.

Putting a !foo when we know that it needs to be 1 looks strange.

Later, Juan.

PS.  Yes, I already asked in a previous version to just have two
methods, mask/unmask.  We know at call time which one we need.
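
The point can be sketched outside of qemu with stand-in types (the struct 
and function names below are invented for this example and are not the 
hw/msix.c API): once the early return has filtered out the masked case, 
the negated check can only evaluate to true, so a constant says the same 
thing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device state; not qemu's PCIDevice. */
struct fake_dev {
    bool vector_masked;     /* what msix_is_masked() would report */
    int last_masked_arg;    /* last value given to the notifier, -1 if none */
};

/* Stand-in notifier: just records the `masked` argument it was given. */
static int fake_notifier(struct fake_dev *dev, bool masked)
{
    dev->last_masked_arg = masked;
    return 0;
}

/* Mirrors the shape of the unset path discussed above. */
int unset_mask_notifier(struct fake_dev *dev)
{
    assert(dev != NULL);

    if (dev->vector_masked)
        return 0;           /* already masked: nothing to undo */

    /*
     * vector_masked is known to be false here, so !vector_masked is
     * always true; passing the constant states the intent directly.
     */
    return fake_notifier(dev, true);
}
```

Whether the constant or the expression reads better is the style question 
being debated; the behaviour is the same as long as nothing changes the 
mask state between the check and the call.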


Re: [PATCH] vhost-net: fix reversed logic in mask notifiers

2010-05-25 Thread Michael S. Tsirkin
On Tue, May 25, 2010 at 04:37:36PM +0200, Juan Quintela wrote:
 Michael S. Tsirkin m...@redhat.com wrote:
  When guest notifier is assigned, we set mask notifier,
  which will assign kvm irqfd.
  When guest notifier is unassigned, mask notifier is unset,
  which should unassign kvm irqfd.
 
  The way to do this is to call mask notifier telling it to mask the vector.
  This, unless vector is already masked which unassigns irqfd already.
 
  The logic in unassign was reversed, which left kvm irqfd assigned.
 
  This patch is qemu-kvm only as irqfd is not upstream.
 
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  Reported-by: Amit Shah amit.s...@redhat.com
  ---
   hw/msix.c |4 +++-
   1 files changed, 3 insertions(+), 1 deletions(-)
 
  diff --git a/hw/msix.c b/hw/msix.c
  index 8f9a621..1398680 100644
  --- a/hw/msix.c
  +++ b/hw/msix.c
  @@ -617,6 +617,7 @@ int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
       assert(opaque);
       assert(!dev->msix_mask_notifier_opaque[vector]);
   
  +    /* Unmask the new notifier unless vector is masked. */
       if (msix_is_masked(dev, vector)) {
           return 0;
       }
  @@ -638,12 +639,13 @@ int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
       assert(dev->msix_mask_notifier);
       assert(dev->msix_mask_notifier_opaque[vector]);
   
  +    /* Mask the old notifier unless it is already masked. */
       if (msix_is_masked(dev, vector)) {
           return 0;
       }
       r = dev->msix_mask_notifier(dev, vector,
                                   dev->msix_mask_notifier_opaque[vector],
  -                                msix_is_masked(dev, vector));
  +                                !msix_is_masked(dev, vector));
 
 Why don't put just a 1 here?
 
 we have:
 
 if (msix_is_masked())
return 0
 r = msix_mask_notifier(., !msix_is_masked());
 
 i.e. at that point msix_is_masked() is false, or we really, really needs
 locking.
 
 Puttting a !foo, when we know that it needs to be an 1 looks strange.
 
 Later, Juan.
 
 PD.  Yes, I already asked in a previous version to just have two
 methods, mask/unmask.  we now at call time which one we need.


I find msix_is_masked clearer here than true, since you don't need
to look up the definition to understand what this 'true' stands for.
The value is clear from the code above. What do you think?

-- 
MST


Re: [PATCH] vhost-net: fix reversed logic in mask notifiers

2010-05-25 Thread Gerd Hoffmann

On 05/25/10 16:00, Michael S. Tsirkin wrote:

When guest notifier is assigned, we set mask notifier,
which will assign kvm irqfd.
When guest notifier is unassigned, mask notifier is unset,
which should unassign kvm irqfd.

The way to do this is to call mask notifier telling it to mask the vector.
This, unless vector is already masked which unassigns irqfd already.

The logic in unassign was reversed, which left kvm irqfd assigned.

This patch is qemu-kvm only as irqfd is not upstream.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reported-by: Amit Shah amit.s...@redhat.com


Acked-by: Gerd Hoffmann kra...@redhat.com

cheers,
  Gerd



Re: [PATCH v2 00/10] Redirct and make use of the guest serial console

2010-05-25 Thread Lucas Meneghel Rodrigues
On Tue, 2010-05-11 at 17:03 +0800, Jason Wang wrote:
 The guest console is useful for failure troubleshooting, especially for
 failures that produce a call trace. And as we plan to push the network related
 tests in the next few weeks, we found the serial session is more
 reliable during network testing. So this patchset logs the guest
 serial output through the redirected serial port of the guest and also enables
 logging into the guest through the serial console. I only open the
 serial console for Linux; I will do some investigation on Windows
 guests. 
 
 Change from v1:
 
 - Coding style improvement according to the suggestions from Michael Goldish
 - Improve the username sending handling in remote_login()
 - Change the matching re of login to [Ll]ogin:\s*$
 - Check whether the vm is already dead in the dumping thread
 - Return none rather than raising an exception on an unknown shell_client
 - Keep tty0 for all Linux guests
 - Enable the serial console in unattended installation
 - Add a helper to check whether panic information has occurred
 - Keep process() at its original location in preprocess()

Jason, after a long conversation I've had with Michael during the
previous week, we reached some common points:

1 - We believe it is possible to both log in *and* log serial
console output. That will require changes to kvm_subprocess and might
take a little bit more time.
2 - We know you guys are depending on this patchset to be accepted in
order to proceed with the network related cases. However, we ask for a
little more patience, and we'd like to get your opinions on the patches
that we are going to roll out. This way we can get to a better solution
for all of us.

So, please bear with us and I'll try to see with Michael and Dor if we
can prioritize this work to not block work items for you guys.

Cheers,

Lucas



Re: [PATCH] vhost-net: fix reversed logic in mask notifiers

2010-05-25 Thread Juan Quintela
Michael S. Tsirkin m...@redhat.com wrote:
 On Tue, May 25, 2010 at 04:37:36PM +0200, Juan Quintela wrote:

 we have:
 
 if (msix_is_masked())
return 0
 r = msix_mask_notifier(., !msix_is_masked());
 
 i.e. at that point msix_is_masked() is false, or we really, really needs
 locking.
 
 Puttting a !foo, when we know that it needs to be an 1 looks strange.
 
 Later, Juan.
 
 PD.  Yes, I already asked in a previous version to just have two
 methods, mask/unmask.  we now at call time which one we need.


 I find msix_is_masked clearer here than true since you don't need
 to look up definition to understand what this 'true' stands for.
 The value is clear from code above. What do you think?

I prefer the change, but it is up to you.

at that point, we are using !msix_masked() to mean true

i.e. we know that msix_masked() is false.  What you want to do is mask.

Later, Juan.


Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 05:05 PM, Anthony Liguori wrote:

On 05/25/2010 09:01 AM, Avi Kivity wrote:

On 05/25/2010 04:55 PM, Anthony Liguori wrote:

On 05/25/2010 08:38 AM, Avi Kivity wrote:

On 05/25/2010 04:35 PM, Anthony Liguori wrote:

On 05/25/2010 08:31 AM, Avi Kivity wrote:
A protocol based mechanism has the advantage of being more 
robust in the face of poorly written block backends so if it's 
possible to make it perform as well as a plugin, it's a 
preferable approach.


May be hard due to difficulty of exposing guest memory.


If someone did a series to add plugins, I would expect a very 
strong argument as to why a shared memory mechanism was not 
possible or at least plausible.


I'm not sure I understand why shared memory is such a bad thing 
wrt KVM.  Can you elaborate?  Is it simply a matter of fork()?


fork() doesn't work in the face of memory hotplug.  What else is 
there?




Is it that fork() doesn't work or is it that fork() is very expensive?


It doesn't work, fork() is done at block device creation time, which 
freezes the child memory map, while guest memory is allocated at 
hotplug time.


Now I'm confused.  I thought you were saying shared memory somehow 
affects fork().  If you're talking about shared memory inheritance via 
fork(), that's less important. 


The latter.  Why is it less important?  If you don't inherit the memory, 
you can't access it.


You can also pass /dev/shm fd's via SCM_RIGHTS to establish shared 
memory segments dynamically.


Doesn't work for anonymous memory.


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Avi Kivity

On 05/25/2010 05:09 PM, Kevin Wolf wrote:



The first part of your argument may be true, but the second isn't.  No
user can run upstream qemu.git.  It's not tested or supported, and has
no backwards compatibility guarantees.
 

The second part was basically meant to say developers don't count here.
   


Agreed.

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm

2010-05-25 Thread Anthony Liguori

On 05/25/2010 10:00 AM, Avi Kivity wrote:
The latter.  Why is it less important?  If you don't inherit the 
memory, you can't access it.


You can also pass /dev/shm fd's via SCM_RIGHTS to establish shared 
memory segments dynamically.


Doesn't work for anonymous memory.


What's wrong with /dev/shm memory?

Regards,

Anthony Liguori




Re: [PATCH] vhost-net: fix reversed logic in mask notifiers

2010-05-25 Thread Michael S. Tsirkin
On Tue, May 25, 2010 at 04:58:15PM +0200, Juan Quintela wrote:
 Michael S. Tsirkin m...@redhat.com wrote:
  On Tue, May 25, 2010 at 04:37:36PM +0200, Juan Quintela wrote:
 
  we have:
  
  if (msix_is_masked())
 return 0
  r = msix_mask_notifier(., !msix_is_masked());
  
  i.e. at that point msix_is_masked() is false, or we really, really needs
  locking.
  
  Puttting a !foo, when we know that it needs to be an 1 looks strange.
  
  Later, Juan.
  
  PD.  Yes, I already asked in a previous version to just have two
  methods, mask/unmask.  we now at call time which one we need.
 
 
  I find msix_is_masked clearer here than true since you don't need
  to look up definition to understand what this 'true' stands for.
  The value is clear from code above. What do you think?
 
 I preffer the change, but it is up to you.
 
 at that point, we are using !msix_masked() to mean true
 
 i.e. we know that msix_masked() is false.  What you want to do is mask.
 
 Later, Juan.

Right. I guess I'll keep it as is; when I look at it with a fresh mind
next time, I'll clean it all up.

-- 
MST

