Re: [PATCH v2] virtio_blk: unlock vblk->lock during kick

2012-06-04 Thread Christian Borntraeger
On 04/06/12 03:57, Rusty Russell wrote:
 Unfortunately, this conflicts with Asias He's deadlock fix, which has
 us just using the block-layer-supplied spinlock.
 
 If we drop the lock around the kick as you suggest, we're playing with
 fire.  All the virtio backends have an atomic notify, so they're OK,
 and the block layer *looks* safe at a glance, but there's no assurances.

Well, the kick itself returns early, but in the host every other action
is already asynchronously running - not caring about the guest locks at all.
So if removing the lock around the kick causes a problem, then the problem
is already present, no?

Christian



Re: [Qemu-devel] [PATCH v3 00/16] net: hub-based networking

2012-06-04 Thread Markus Armbruster
Anthony Liguori anth...@codemonkey.ws writes:

 On 05/29/2012 04:14 PM, Markus Armbruster wrote:
 Luiz Capitulino lcapitul...@redhat.com writes:

 On Mon, 28 May 2012 12:17:04 +0100
 Stefan Hajnoczi stefa...@linux.vnet.ibm.com wrote:

 What we need to decide is whether it's okay to drop QEMU VLANs
 completely and change dump command-line syntax?

 I'd vote for dropping it.

 I think vlan-hub doesn't hurt anyone because the code has been isolated
 and we keep backwards compatibility.  So I'd personally still go the
 vlan-hub route for QEMU 1.x.

 Just to make it clear: I'm not against this series. I'm against having
 the functionality in qemu. If we want to keep the functionality, then I
 completely agree that this series is the way to go.

 I agree with Luiz: if we want to reimplement that much of networking
 within QEMU, this series does it in a much better way than VLANs, but
 I'd rather not do it at all.

 Just advice, not a strong objection.

 Doesn't the same logic apply to reimplementing file systems?
 Shouldn't we drop qcow3 in favor of using btrfs?

btrfs isn't ready for production, so this is a hypothetical question.

 It's easy to make the NIH argument when it's a feature you don't care about.

 A lot of people use vlans.  It's the only way -net socket is useful
 too.  Just because most KVM/libvirt users don't doesn't mean they
 aren't an important feature to preserve.

I specifically asked for evidence of actual use of VLANs, and which uses
of VLANs can't be readily upgraded to better-performing external
solutions.  Your assertion that they are used a lot isn't a full answer,
but it's (slightly) better than nothing.

 I would strongly nack any attempt to remove vlans w/o providing some
 mechanism for backwards compatibility which is exactly what this patch
 series does.

Roma locuta, causa finita. (Rome has spoken; the case is closed.)


Re: [PATCH] Documentation/kvm : Add documentation on Hypercalls

2012-06-04 Thread Raghavendra K T

On 06/04/2012 09:30 AM, Rob Landley wrote:

On 05/31/2012 12:46 PM, H. Peter Anvin wrote:

On 05/31/2012 01:01 AM, Raghavendra K T wrote:

+
+TODO:
+1. more information on input and output needed?
+2. Add more detail to purpose of hypercalls.



1. definitely, including the hypercall ABI.

-hpa



I was wondering about that. It looks like
Documentation/virtual/kvm/api.txt might cover some of that already in
section 5, but it doesn't look complete...

Also, could I get a 00-INDEX file for this directory explaining what
these individual files are? I think api.txt is supposed to be
host-side API for controlling a guest VM (from userspace via ioctls,
looks like),


api.txt has plenty of information; apart from the features related to the
guest-side API (it generally covers controlling / querying which PV features
are available, which may also involve the host querying the guest's cpuid),
it also covers capabilities related to the host.

 and hypercalls.txt is guest-side API for poking the host.

A hypercall can be used by the guest to request some action from the host
or to exchange information with it.
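
For concreteness, a minimal guest-side sketch (a hypothetical demo module,
not part of the patch) of how such a hypercall is issued with the
kvm_hypercall* helpers from asm/kvm_para.h:

/* Hypothetical example: issue a KVM hypercall from an x86 Linux guest. */
#include <linux/module.h>
#include <linux/kernel.h>
#include <asm/kvm_para.h>       /* kvm_para_available(), kvm_hypercall0() */

static int __init hc_demo_init(void)
{
        long ret;

        if (!kvm_para_available())      /* not running under KVM */
                return -ENODEV;

        /* The hypercall number goes in rax; the host's return value comes
         * back in rax and is what kvm_hypercall0() returns. */
        ret = kvm_hypercall0(KVM_HC_VAPIC_POLL_IRQ);
        pr_info("KVM_HC_VAPIC_POLL_IRQ returned %ld\n", ret);
        return 0;
}
module_init(hc_demo_init);
MODULE_LICENSE("GPL");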




How someone would write host-side code that _responds_ to a hypercall, I
have no idea.  (It goes in the host kernel?)

Rob





Re: [PATCH v2] virtio_blk: unlock vblk->lock during kick

2012-06-04 Thread Stefan Hajnoczi
On Mon, Jun 04, 2012 at 11:27:39AM +0930, Rusty Russell wrote:
 On Wed, 30 May 2012 15:39:05 +0200, Christian Borntraeger 
 borntrae...@de.ibm.com wrote:
  On 30/05/12 15:19, Stefan Hajnoczi wrote:
   Holding the vblk->lock across kick causes poor scalability in SMP
   guests.  If one CPU is doing virtqueue kick and another CPU touches the
   vblk->lock it will have to spin until virtqueue kick completes.
   
   This patch reduces system% CPU utilization in SMP guests that are
   running multithreaded I/O-bound workloads.  The improvements are small
   but show as iops and SMP are increased.
  
  Funny, recently I got a bug report regarding spinlock lockup
  (see http://lkml.indiana.edu/hypermail/linux/kernel/1205.3/02201.html)
  Turned out that blk_done was called on many guest cpus while the guest
  was heavily paging on one virtio block device. (and the guest had much
  more cpus than the host)
  This patch will probably reduce the pressure for those cases as well.
  we can then finish requests if somebody else is doing the kick.
  
  IIRC there were some other approaches to address this lock holding during
  kick but this looks like the less intrusive one.
  
   Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
  Acked-by: Christian Borntraeger borntrae...@de.ibm.com
 
 Unfortunately, this conflicts with Asias He's deadlock fix, which has
 us just using the block-layer-supplied spinlock.
 
 If we drop the lock around the kick as you suggest, we're playing with
 fire.  All the virtio backends have an atomic notify, so they're OK,
 and the block layer *looks* safe at a glance, but there's no assurances.

There are assurances:

Documentation/block/biodoc.txt:
  "Drivers are free to drop the queue lock themselves, if required."

Other drivers, including rbd, nbd, cciss, and probably more, drop the
lock too.

Stefan



Re: [PATCH v3] virtio_blk: unlock vblk->lock during kick

2012-06-04 Thread Asias He

On 06/01/2012 05:13 PM, Stefan Hajnoczi wrote:

Holding the vblk->lock across kick causes poor scalability in SMP
guests.  If one CPU is doing virtqueue kick and another CPU touches the
vblk->lock it will have to spin until virtqueue kick completes.

This patch reduces system% CPU utilization in SMP guests that are
running multithreaded I/O-bound workloads.  The improvements are small
but show as iops and SMP are increased.

Khoa Huynh k...@us.ibm.com  provided initial performance data that
indicates this optimization is worthwhile at high iops.

Asias He as...@redhat.com  reports the following fio results:

Host: Linux 3.4.0+ #302 SMP x86_64 GNU/Linux
Guest: same as host kernel

Average 3 runs:
with locked kick
read    iops=119907.50 bw=59954.00 runt=35018.50 io=2048.00
write   iops=217187.00 bw=108594.00 runt=19312.00 io=2048.00
read    iops=33948.00 bw=16974.50 runt=186820.50 io=3095.70
write   iops=35014.00 bw=17507.50 runt=181151.00 io=3095.70
clat (usec) max=3484.10 avg=121085.38 stdev=174416.11 min=0.00
clat (usec) max=3438.30 avg=59863.35 stdev=116607.69 min=0.00
clat (usec) max=3745.65 avg=454501.30 stdev=332699.00 min=0.00
clat (usec) max=4089.75 avg=442374.99 stdev=304874.62 min=0.00
cpu sys=615.12 majf=24080.50 ctx=64253616.50 usr=68.08 minf=17907363.00
cpu sys=1235.95 majf=23389.00 ctx=59788148.00 usr=98.34 minf=20020008.50
cpu sys=764.96 majf=28414.00 ctx=848279274.00 usr=36.39 minf=19737254.00
cpu sys=714.13 majf=21853.50 ctx=854608972.00 usr=33.56 minf=18256760.50

with unlocked kick
read    iops=118559.00 bw=59279.66 runt=35400.66 io=2048.00
write   iops=227560.00 bw=113780.33 runt=18440.00 io=2048.00
read    iops=34567.66 bw=17284.00 runt=183497.33 io=3095.70
write   iops=34589.33 bw=17295.00 runt=183355.00 io=3095.70
clat (usec) max=3485.56 avg=121989.58 stdev=197355.15 min=0.00
clat (usec) max=3222.33 avg=57784.11 stdev=141002.89 min=0.00
clat (usec) max=4060.93 avg=447098.65 stdev=315734.33 min=0.00
clat (usec) max=3656.30 avg=447281.70 stdev=314051.33 min=0.00
cpu sys=683.78 majf=24501.33 ctx=64435364.66 usr=68.91 minf=17907893.33
cpu sys=1218.24 majf=25000.33 ctx=60451475.00 usr=101.04 minf=19757720.00
cpu sys=740.39 majf=24809.00 ctx=845290443.66 usr=37.25 minf=19349958.33
cpu sys=723.63 majf=27597.33 ctx=850199927.33 usr=35.35 minf=19092343.00

FIO config file:

[global]
exec_prerun=echo 3 > /proc/sys/vm/drop_caches
group_reporting
norandommap
ioscheduler=noop
thread
bs=512
size=4MB
direct=1
filename=/dev/vdb
numjobs=256
ioengine=aio
iodepth=64
loops=3

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
Other block drivers (cciss, rbd, nbd) use spin_unlock_irq() so I followed that.
To me this seems wrong: blk_run_queue() uses spin_lock_irqsave() but we enable
irqs with spin_unlock_irq().  If the caller of blk_run_queue() had irqs
disabled and we enable them again this could be a problem, right?  Can someone
more familiar with kernel locking comment?


blk_run_queue() is not used in our code path. We use __blk_run_queue().
The code path is:


generic_make_request() -> q->make_request_fn() -> blk_queue_bio()
-> __blk_run_queue() -> q->request_fn() -> do_virtblk_request().


__blk_run_queue() is called with interrupts disabled and the queue lock
held. In blk_queue_bio(), __blk_run_queue() is protected by
spin_lock_irq(q->queue_lock).


The locking in the block layer seems a bit confusing, e.g. in
block/blk-core.c there is a mix of spin_lock_irq() and spin_lock_irqsave() pairs.
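
To make the lock/IRQ symmetry concrete, here is a simplified sketch
(hypothetical shape only, not the actual kernel code) of what is being
described: the block core enters the driver's request_fn with q->queue_lock
held and IRQs off, so the driver can pair spin_unlock_irq()/spin_lock_irq()
around the expensive notify:

/* Hypothetical sketch of the call shape discussed above, not kernel source. */
static void block_core_submit(struct request_queue *q)  /* ~ blk_queue_bio() */
{
        spin_lock_irq(q->queue_lock);    /* IRQs disabled, queue lock held */
        q->request_fn(q);                /* e.g. do_virtblk_request() */
        spin_unlock_irq(q->queue_lock);
}

static void example_request_fn(struct request_queue *q) /* driver side */
{
        /* entered with q->queue_lock held, IRQs off */
        bool need_notify = queue_pending_requests(q);    /* hypothetical helper */

        if (need_notify) {
                spin_unlock_irq(q->queue_lock);  /* drop lock, re-enable IRQs */
                notify_backend();                /* hypothetical: e.g. virtqueue_notify() */
                spin_lock_irq(q->queue_lock);
        }
}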




  drivers/block/virtio_blk.c |   10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 774c31d..d674977 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -199,8 +199,14 @@ static void do_virtblk_request(struct request_queue *q)
issued++;
}

-	if (issued)
-		virtqueue_kick(vblk->vq);
+	if (!issued)
+		return;
+
+	if (virtqueue_kick_prepare(vblk->vq)) {
+		spin_unlock_irq(vblk->disk->queue->queue_lock);
+		virtqueue_notify(vblk->vq);
+		spin_lock_irq(vblk->disk->queue->queue_lock);
+	}
  }

  /* return id (s/n) string for *disk to *id_str



--
Asias


Re: [PATCH v2] virtio_blk: unlock vblk->lock during kick

2012-06-04 Thread Asias He

On 06/04/2012 09:57 AM, Rusty Russell wrote:

On Wed, 30 May 2012 15:39:05 +0200, Christian 
Borntraeger borntrae...@de.ibm.com wrote:

On 30/05/12 15:19, Stefan Hajnoczi wrote:

Holding the vblk->lock across kick causes poor scalability in SMP
guests.  If one CPU is doing virtqueue kick and another CPU touches the
vblk->lock it will have to spin until virtqueue kick completes.

This patch reduces system% CPU utilization in SMP guests that are
running multithreaded I/O-bound workloads.  The improvements are small
but show as iops and SMP are increased.


Funny, recently I got a bug report regarding spinlock lockup
(see http://lkml.indiana.edu/hypermail/linux/kernel/1205.3/02201.html)
Turned out that blk_done was called on many guest cpus while the guest
was heavily paging on one virtio block device. (and the guest had much
more cpus than the host)
This patch will probably reduce the pressure for those cases as well.
we can then finish requests if somebody else is doing the kick.

IIRC there were some other approaches to address this lock holding during
kick but this looks like the less intrusive one.


Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

Acked-by: Christian Borntraeger borntrae...@de.ibm.com


Unfortunately, this conflicts with Asias He's deadlock fix, which has
us just using the block-layer-supplied spinlock.


There is a v3 which resolves this conflict.



If we drop the lock around the kick as you suggest, we're playing with
fire.  All the virtio backends have an atomic notify, so they're OK,
and the block layer *looks* safe at a glance, but there's no assurances.

It seems like a workaround to the fact that we don't have hcall-backed
spinlocks like Xen, or that our virtio device is too laggy.

Cheers,
Rusty.





--
Asias


Re: [PATCH] Documentation/kvm : Add documentation on Hypercalls

2012-06-04 Thread Raghavendra K T

On 05/31/2012 11:14 PM, Randy Dunlap wrote:

On 05/31/2012 01:01 AM, Raghavendra K T wrote:


From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

Thanks Alex for KVM_HC_FEATURES inputs and Jan for VAPIC_POLL_IRQ

Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
diff --git a/Documentation/virtual/kvm/hypercalls.txt 
b/Documentation/virtual/kvm/hypercalls.txt
new file mode 100644
index 000..c79335a
--- /dev/null
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -0,0 +1,46 @@
+KVM Hypercalls Documentation
+===
+The template for each hypercall is:
+1. Hypercall name, value.
+2. Architecture(s)
+3. Status (deprecated, obsolete, active)
+4. Purpose
+
+1. KVM_HC_VAPIC_POLL_IRQ
+
+Value: 1
+Architecture: x86



No Status, does that imply Active ?



Oh! It was unintentionally left out. Will add that.



Re: KVM entry failed, hardware error

2012-06-04 Thread Gleb Natapov
On Sun, Jun 03, 2012 at 06:25:33PM +0200, Johannes Bauer wrote:
 Therefore, I've uploaded the compressed trace.dat file, so you can maybe
 have a look why the report tool barfs and interpret it correctly. I
 can't figure it out. The trace is here:
 
 http://spornkuller.de/trace.dat.bz2
 
I can read this trace. Can you do "info pci" in qemu's monitor
after the failure? What is your command line?

--
Gleb.


Re: [PATCH v2] virtio_blk: unlock vblk->lock during kick

2012-06-04 Thread Asias He

On 06/04/2012 09:57 AM, Rusty Russell wrote:

On Wed, 30 May 2012 15:39:05 +0200, Christian 
Borntraeger borntrae...@de.ibm.com wrote:

On 30/05/12 15:19, Stefan Hajnoczi wrote:

Holding the vblk->lock across kick causes poor scalability in SMP
guests.  If one CPU is doing virtqueue kick and another CPU touches the
vblk->lock it will have to spin until virtqueue kick completes.

This patch reduces system% CPU utilization in SMP guests that are
running multithreaded I/O-bound workloads.  The improvements are small
but show as iops and SMP are increased.


Funny, recently I got a bug report regarding spinlock lockup
(see http://lkml.indiana.edu/hypermail/linux/kernel/1205.3/02201.html)
Turned out that blk_done was called on many guest cpus while the guest
was heavily paging on one virtio block device. (and the guest had much
more cpus than the host)
This patch will probably reduce the pressure for those cases as well.
we can then finish requests if somebody else is doing the kick.

IIRC there were some other approaches to address this lock holding during
kick but this looks like the less intrusive one.


Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

Acked-by: Christian Borntraeger borntrae...@de.ibm.com


Unfortunately, this conflicts with Asias He's deadlock fix, which has
us just using the block-layer-supplied spinlock.

If we drop the lock around the kick as you suggest, we're playing with
fire.  All the virtio backends have an atomic notify, so they're OK,
and the block layer *looks* safe at a glance, but there's no assurances.


Why are we playing with fire if we drop the lock around the kick?
Which one do you think is unsafe: calling virtqueue_notify() with the
lock dropped, or dropping the lock in q->request_fn()?



It seems like a workaround to the fact that we don't have hcall-backed
spinlocks like Xen, or that our virtio device is too laggy.

Cheers,
Rusty.





--
Asias


Re: KVM entry failed, hardware error

2012-06-04 Thread Avi Kivity
On 06/04/2012 11:53 AM, Gleb Natapov wrote:
 On Sun, Jun 03, 2012 at 06:25:33PM +0200, Johannes Bauer wrote:
 Therefore, I've uploaded the compressed trace.dat file, so you can maybe
 have a look why the report tool barfs and interpret it correctly. I
 can't figure it out. The trace is here:
 
 http://spornkuller.de/trace.dat.bz2
 
 I can read this trace. Can you do "info pci" in qemu's monitor
 after the failure? What is your command line?

Also after the failure:

 x/256b 0x2b



-- 
error compiling committee.c: too many arguments to function


[PATCH v2 02/41] arch_init: export RAM_SAVE_xxx flags for postcopy

2012-06-04 Thread Isaku Yamahata
Those constants will be also used by postcopy.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |7 ---
 arch_init.h |7 +++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 38e0173..bd4e61e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -88,13 +88,6 @@ const uint32_t arch_type = QEMU_ARCH;
 /***/
 /* ram save/restore */
 
-#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */
-#define RAM_SAVE_FLAG_COMPRESS 0x02
-#define RAM_SAVE_FLAG_MEM_SIZE 0x04
-#define RAM_SAVE_FLAG_PAGE 0x08
-#define RAM_SAVE_FLAG_EOS  0x10
-#define RAM_SAVE_FLAG_CONTINUE 0x20
-
 #ifdef __ALTIVEC__
 #include <altivec.h>
 #define VECTYPE        vector unsigned char
diff --git a/arch_init.h b/arch_init.h
index c7cb94a..7cc3fa7 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -30,4 +30,11 @@ int tcg_available(void);
 int kvm_available(void);
 int xen_available(void);
 
+#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */
+#define RAM_SAVE_FLAG_COMPRESS 0x02
+#define RAM_SAVE_FLAG_MEM_SIZE 0x04
+#define RAM_SAVE_FLAG_PAGE 0x08
+#define RAM_SAVE_FLAG_EOS  0x10
+#define RAM_SAVE_FLAG_CONTINUE 0x20
+
 #endif
-- 
1.7.1.1
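
For context, these flags are OR'd into the 64-bit address word that precedes
each RAM chunk on the migration stream; a simplified sketch (paraphrasing the
existing ram_save_block() behaviour, not new code in this patch) of how a
sender tags a page:

/* Simplified illustration of how the RAM_SAVE_FLAG_* bits are used. */
static void save_one_page(QEMUFile *f, ram_addr_t offset, uint8_t *p, int cont)
{
    /* cont is RAM_SAVE_FLAG_CONTINUE when the page belongs to the same
     * RAMBlock as the previous one, else 0. */
    if (is_dup_page(p)) {
        /* page is a single repeated byte: send the tagged offset + that byte */
        qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
        qemu_put_byte(f, *p);
    } else {
        /* normal page: send the tagged offset + the full page contents */
        qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_PAGE);
        qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
    }
}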



[PATCH v2 01/41] arch_init: export sort_ram_list() and ram_save_block()

2012-06-04 Thread Isaku Yamahata
This will be used by postcopy.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |4 ++--
 migration.h |2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index a9e8b74..38e0173 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -164,7 +164,7 @@ static int is_dup_page(uint8_t *page)
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
-static int ram_save_block(QEMUFile *f)
+int ram_save_block(QEMUFile *f)
 {
 RAMBlock *block = last_block;
 ram_addr_t offset = last_offset;
@@ -273,7 +273,7 @@ static int block_compar(const void *a, const void *b)
     return strcmp((*ablock)->idstr, (*bblock)->idstr);
 }
 
-static void sort_ram_list(void)
+void sort_ram_list(void)
 {
 RAMBlock *block, *nblock, **blocks;
 int n;
diff --git a/migration.h b/migration.h
index 2e9ca2e..8b9509c 100644
--- a/migration.h
+++ b/migration.h
@@ -76,6 +76,8 @@ uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
 
+void sort_ram_list(void);
+int ram_save_block(QEMUFile *f);
 int ram_save_live(QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);
 
-- 
1.7.1.1



[PATCH v2 12/41] arch_init: factor out setting last_block, last_offset

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   13 -
 arch_init.h |1 +
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 2617478..22d9691 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -203,6 +203,12 @@ int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t 
offset)
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
+void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset)
+{
+last_block = block;
+last_offset = offset;
+}
+
 int ram_save_block(QEMUFile *f)
 {
 RAMBlock *block = last_block;
@@ -230,9 +236,7 @@ int ram_save_block(QEMUFile *f)
 }
 } while (block != last_block || offset != last_offset);
 
-last_block = block;
-last_offset = offset;
-
+ram_save_set_last_block(block, offset);
 return bytes_sent;
 }
 
@@ -349,8 +353,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 if (stage == 1) {
 bytes_transferred = 0;
 last_block_sent = NULL;
-last_block = NULL;
-last_offset = 0;
+ram_save_set_last_block(NULL, 0);
 sort_ram_list();
 
 /* Make sure all dirty bits are set */
diff --git a/arch_init.h b/arch_init.h
index 7f5c77a..15548cd 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -40,6 +40,7 @@ int xen_available(void);
 #define RAM_SAVE_VERSION_ID 4 /* currently version 4 */
 
 #if defined(NEED_CPU_H)  !defined(CONFIG_USER_ONLY)
+void ram_save_set_last_block(RAMBlock *block, ram_addr_t offset);
 int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 RAMBlock *ram_find_block(const char *id, uint8_t len);
 void *ram_load_host_from_stream_offset(QEMUFile *f,
-- 
1.7.1.1



[PATCH v2 18/41] QEMUFile: add qemu_file_fd() for later use

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 qemu-file.h |1 +
 savevm.c|   12 
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index 331ac8b..98a8023 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -71,6 +71,7 @@ QEMUFile *qemu_fopen_socket(int fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
+int qemu_file_fd(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_buffered_file_drain(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index fb47529..cba1a69 100644
--- a/savevm.c
+++ b/savevm.c
@@ -178,6 +178,7 @@ struct QEMUFile {
 uint8_t buf[IO_BUF_SIZE];
 
 int last_error;
+int fd; /* -1 means fd isn't associated */
 };
 
 typedef struct QEMUFileStdio
@@ -276,6 +277,7 @@ QEMUFile *qemu_popen(FILE *stdio_file, const char *mode)
         s->file = qemu_fopen_ops(s, stdio_put_buffer, NULL, stdio_pclose,
                                  NULL, NULL, NULL);
     }
+    s->file->fd = fileno(stdio_file);
     return s->file;
 }
 
@@ -291,6 +293,7 @@ QEMUFile *qemu_popen_cmd(const char *command, const char 
*mode)
 return qemu_popen(popen_file, mode);
 }
 
+/* TODO: replace this with qemu_file_fd() */
 int qemu_stdio_fd(QEMUFile *f)
 {
 QEMUFileStdio *p;
@@ -325,6 +328,7 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
         s->file = qemu_fopen_ops(s, stdio_put_buffer, NULL, stdio_fclose,
                                  NULL, NULL, NULL);
     }
+    s->file->fd = fd;
     return s->file;
 
 fail:
@@ -339,6 +343,7 @@ QEMUFile *qemu_fopen_socket(int fd)
     s->fd = fd;
     s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close,
                              NULL, NULL, NULL);
+    s->file->fd = fd;
     return s->file;
 }
 
@@ -381,6 +386,7 @@ QEMUFile *qemu_fopen(const char *filename, const char *mode)
         s->file = qemu_fopen_ops(s, NULL, file_get_buffer, stdio_fclose,
                                  NULL, NULL, NULL);
     }
+    s->file->fd = fileno(s->stdio_file);
     return s->file;
 fail:
 g_free(s);
@@ -431,10 +437,16 @@ QEMUFile *qemu_fopen_ops(void *opaque, 
QEMUFilePutBufferFunc *put_buffer,
     f->set_rate_limit = set_rate_limit;
     f->get_rate_limit = get_rate_limit;
     f->is_write = 0;
+    f->fd = -1;
 
 return f;
 }
 
+int qemu_file_fd(QEMUFile *f)
+{
+    return f->fd;
+}
+
 int qemu_file_get_error(QEMUFile *f)
 {
     return f->last_error;
-- 
1.7.1.1



[PATCH v2 21/41] savevm: rename QEMUFileSocket to QEMUFileFD, socket_close to fd_close

2012-06-04 Thread Isaku Yamahata
Later the structure will be shared.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 savevm.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/savevm.c b/savevm.c
index 4b560b3..2fb0c3e 100644
--- a/savevm.c
+++ b/savevm.c
@@ -187,14 +187,14 @@ typedef struct QEMUFileStdio
 QEMUFile *file;
 } QEMUFileStdio;
 
-typedef struct QEMUFileSocket
+typedef struct QEMUFileFD
 {
 QEMUFile *file;
-} QEMUFileSocket;
+} QEMUFileFD;
 
 static int socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
 {
-QEMUFileSocket *s = opaque;
+QEMUFileFD *s = opaque;
 ssize_t len;
 
 do {
@@ -207,9 +207,9 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, 
int64_t pos, int size)
 return len;
 }
 
-static int socket_close(void *opaque)
+static int fd_close(void *opaque)
 {
-QEMUFileSocket *s = opaque;
+QEMUFileFD *s = opaque;
 g_free(s);
 return 0;
 }
@@ -325,9 +325,9 @@ fail:
 
 QEMUFile *qemu_fopen_socket(int fd)
 {
-QEMUFileSocket *s = g_malloc0(sizeof(QEMUFileSocket));
+QEMUFileFD *s = g_malloc0(sizeof(QEMUFileFD));
 
-    s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close,
+    s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, fd_close,
                              NULL, NULL, NULL);
     s->file->fd = fd;
     return s->file;
-- 
1.7.1.1



[PATCH v2 22/41] savevm/QEMUFile: introduce qemu_fopen_fd

2012-06-04 Thread Isaku Yamahata
Introduce nonblocking fd read backend of QEMUFile.
This will be used by postcopy live migration.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 qemu-file.h |1 +
 savevm.c|   40 
 2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index 1a12e7d..af5b123 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -68,6 +68,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc 
*put_buffer,
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
+QEMUFile *qemu_fopen_fd(int fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_file_fd(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index 2fb0c3e..5640614 100644
--- a/savevm.c
+++ b/savevm.c
@@ -207,6 +207,35 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, 
int64_t pos, int size)
 return len;
 }
 
+static int fd_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+    QEMUFileFD *s = opaque;
+    ssize_t len = 0;
+
+    while (size > 0) {
+        ssize_t ret = read(s->file->fd, buf, size);
+        if (ret == -1) {
+            if (errno == EINTR) {
+                continue;
+            }
+            if (len == 0) {
+                len = -errno;
+            }
+            break;
+        }
+
+        if (ret == 0) {
+            /* the write end of the pipe is closed */
+            break;
+        }
+        len += ret;
+        buf += ret;
+        size -= ret;
+    }
+
+    return len;
+}
+
 static int fd_close(void *opaque)
 {
 QEMUFileFD *s = opaque;
@@ -333,6 +362,17 @@ QEMUFile *qemu_fopen_socket(int fd)
 return s-file;
 }
 
+QEMUFile *qemu_fopen_fd(int fd)
+{
+    QEMUFileFD *s = g_malloc0(sizeof(*s));
+
+    fcntl_setfl(fd, O_NONBLOCK);
+    s->file = qemu_fopen_ops(s, NULL, fd_get_buffer, fd_close,
+                             NULL, NULL, NULL);
+    s->file->fd = fd;
+    return s->file;
+}
+
 static int file_put_buffer(void *opaque, const uint8_t *buf,
 int64_t pos, int size)
 {
-- 
1.7.1.1
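
A hedged usage sketch (illustration only, not part of the patch) of the new
helper, with a plain pipe standing in for the channel it will eventually wrap:

/* Illustrative only: wrap the read end of a pipe in the new QEMUFile. */
#include <unistd.h>
#include "qemu-file.h"

static int demo_qemu_fopen_fd(void)
{
    int fds[2];
    uint8_t byte = 0x42;

    if (pipe(fds) < 0) {
        return -1;
    }

    QEMUFile *reader = qemu_fopen_fd(fds[0]);  /* fd is switched to O_NONBLOCK */

    /* Feed one byte in from the write end, then pull it back out through
     * the QEMUFile wrapper (fd_get_buffer() does the nonblocking read). */
    if (write(fds[1], &byte, 1) != 1) {
        return -1;
    }
    int v = qemu_get_byte(reader);

    qemu_fclose(reader);   /* fd_close() frees the wrapper but not the fd */
    close(fds[0]);
    close(fds[1]);
    return v;              /* 0x42 on success */
}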



[PATCH v2 17/41] savevm, buffered_file: introduce method to drain buffer of buffered file

2012-06-04 Thread Isaku Yamahata
Introduce a new method to drain the buffer of QEMUBufferedFile.
When postcopy migration, buffer size can increase unboundedly.
To keep the buffer size reasonably small, introduce the method to
wait for buffer to drain.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 buffered_file.c |   20 +++-
 buffered_file.h |1 +
 qemu-file.h |1 +
 savevm.c|7 +++
 4 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index f170aa0..a38caec 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -170,6 +170,15 @@ static int buffered_put_buffer(void *opaque, const uint8_t 
*buf, int64_t pos, in
 return offset;
 }
 
+static void buffered_drain(QEMUFileBuffered *s)
+{
+    while (!qemu_file_get_error(s->file) && s->buffer_size) {
+        buffered_flush(s);
+        if (s->freeze_output)
+            s->wait_for_unfreeze(s->opaque);
+    }
+}
+
 static int buffered_close(void *opaque)
 {
 QEMUFileBuffered *s = opaque;
@@ -177,11 +186,7 @@ static int buffered_close(void *opaque)
 
     DPRINTF("closing\n");
 
-    while (!qemu_file_get_error(s->file) && s->buffer_size) {
-        buffered_flush(s);
-        if (s->freeze_output)
-            s->wait_for_unfreeze(s->opaque);
-    }
+    buffered_drain(s);
 
     ret = s->close(s->opaque);
 
@@ -291,3 +296,8 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque,
 
     return s->file;
 }
+
+void qemu_buffered_file_drain_buffer(void *buffered_file)
+{
+    buffered_drain(buffered_file);
+}
diff --git a/buffered_file.h b/buffered_file.h
index 98d358b..cd8e1e8 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -26,5 +26,6 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque, size_t 
xfer_limit,
   BufferedPutReadyFunc *put_ready,
   BufferedWaitForUnfreezeFunc 
*wait_for_unfreeze,
   BufferedCloseFunc *close);
+void qemu_buffered_file_drain_buffer(void *buffered_file);
 
 #endif
diff --git a/qemu-file.h b/qemu-file.h
index 880ef4b..331ac8b 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -72,6 +72,7 @@ QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
+void qemu_buffered_file_drain(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
diff --git a/savevm.c b/savevm.c
index 2992f97..fb47529 100644
--- a/savevm.c
+++ b/savevm.c
@@ -85,6 +85,7 @@
 #include "cpus.h"
 #include "memory.h"
 #include "qmp-commands.h"
+#include "buffered_file.h"
 
 #define SELF_ANNOUNCE_ROUNDS 5
 
@@ -477,6 +478,12 @@ void qemu_fflush(QEMUFile *f)
 }
 }
 
+void qemu_buffered_file_drain(QEMUFile *f)
+{
+    qemu_fflush(f);
+    qemu_buffered_file_drain_buffer(f->opaque);
+}
+
 static void qemu_fill_buffer(QEMUFile *f)
 {
 int len;
-- 
1.7.1.1



[PATCH v2 20/41] savevm/QEMUFileSocket: drop duplicated member fd

2012-06-04 Thread Isaku Yamahata
fd is already stored in QEMUFile so drop duplicated member
QEMUFileSocket::fd.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 savevm.c |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/savevm.c b/savevm.c
index ec9f5d0..4b560b3 100644
--- a/savevm.c
+++ b/savevm.c
@@ -189,7 +189,6 @@ typedef struct QEMUFileStdio
 
 typedef struct QEMUFileSocket
 {
-int fd;
 QEMUFile *file;
 } QEMUFileSocket;
 
@@ -199,7 +198,7 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, 
int64_t pos, int size)
 ssize_t len;
 
 do {
-        len = qemu_recv(s->fd, buf, size, 0);
+        len = qemu_recv(s->file->fd, buf, size, 0);
     } while (len == -1 && socket_error() == EINTR);
 
 if (len == -1)
@@ -328,7 +327,6 @@ QEMUFile *qemu_fopen_socket(int fd)
 {
 QEMUFileSocket *s = g_malloc0(sizeof(QEMUFileSocket));
 
-    s->fd = fd;
     s->file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close,
                              NULL, NULL, NULL);
     s->file->fd = fd;
-- 
1.7.1.1



[PATCH v2 24/41] migration: export migrate_fd_completed() and migrate_fd_cleanup()

2012-06-04 Thread Isaku Yamahata
This will be used by postcopy migration.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 migration.c |4 ++--
 migration.h |2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/migration.c b/migration.c
index 753addb..48a8f68 100644
--- a/migration.c
+++ b/migration.c
@@ -159,7 +159,7 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 
 /* shared migration helpers */
 
-static int migrate_fd_cleanup(MigrationState *s)
+int migrate_fd_cleanup(MigrationState *s)
 {
 int ret = 0;
 
@@ -187,7 +187,7 @@ void migrate_fd_error(MigrationState *s)
 migrate_fd_cleanup(s);
 }
 
-static void migrate_fd_completed(MigrationState *s)
+void migrate_fd_completed(MigrationState *s)
 {
     DPRINTF("setting completed state\n");
     if (migrate_fd_cleanup(s) < 0) {
diff --git a/migration.h b/migration.h
index 6cf4512..d0dd536 100644
--- a/migration.h
+++ b/migration.h
@@ -62,7 +62,9 @@ int fd_start_incoming_migration(const char *path);
 
 int fd_start_outgoing_migration(MigrationState *s, const char *fdname);
 
+int migrate_fd_cleanup(MigrationState *s);
 void migrate_fd_error(MigrationState *s);
+void migrate_fd_completed(MigrationState *s);
 
 void migrate_fd_connect(MigrationState *s);
 
-- 
1.7.1.1



[PATCH v2 15/41] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip

2012-06-04 Thread Isaku Yamahata
Those will be used by postcopy.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 qemu-file.h |3 +++
 savevm.c|6 +++---
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index 31b83f6..a285bef 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -88,6 +88,9 @@ void qemu_put_be32(QEMUFile *f, unsigned int v);
 void qemu_put_be64(QEMUFile *f, uint64_t v);
 int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size);
 int qemu_get_byte(QEMUFile *f);
+int qemu_peek_byte(QEMUFile *f, int offset);
+int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset);
+void qemu_file_skip(QEMUFile *f, int size);
 
 static inline unsigned int qemu_get_ubyte(QEMUFile *f)
 {
diff --git a/savevm.c b/savevm.c
index 2d18bab..8ad843f 100644
--- a/savevm.c
+++ b/savevm.c
@@ -588,14 +588,14 @@ void qemu_put_byte(QEMUFile *f, int v)
 qemu_fflush(f);
 }
 
-static void qemu_file_skip(QEMUFile *f, int size)
+void qemu_file_skip(QEMUFile *f, int size)
 {
     if (f->buf_index + size <= f->buf_size) {
         f->buf_index += size;
 }
 }
 
-static int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
+int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
 {
 int pending;
 int index;
@@ -643,7 +643,7 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
 return done;
 }
 
-static int qemu_peek_byte(QEMUFile *f, int offset)
+int qemu_peek_byte(QEMUFile *f, int offset)
 {
     int index = f->buf_index + offset;
 
-- 
1.7.1.1



[PATCH v2 33/41] postcopy: introduce -postcopy and -postcopy-flags option

2012-06-04 Thread Isaku Yamahata
This patch prepares for postcopy live migration.
It introduces the -postcopy option and its internal flag, migration_postcopy.
It also introduces -postcopy-flags for changing the behavior of incoming
postcopy, mainly for benchmarking/debugging.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 migration.h |3 +++
 qemu-options.hx |   22 ++
 vl.c|8 
 3 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/migration.h b/migration.h
index 59e6e68..4bbcf06 100644
--- a/migration.h
+++ b/migration.h
@@ -103,4 +103,7 @@ void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);
 
+extern bool incoming_postcopy;
+extern unsigned long incoming_postcopy_flags;
+
 #endif
diff --git a/qemu-options.hx b/qemu-options.hx
index 8b66264..a9af31e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2616,6 +2616,28 @@ STEXI
 Prepare for incoming migration, listen on @var{port}.
 ETEXI
 
+DEF("postcopy", 0, QEMU_OPTION_postcopy,
+    "-postcopy       postcopy incoming migration when -incoming is specified\n",
+    QEMU_ARCH_ALL)
+STEXI
+@item -postcopy
+@findex -postcopy
+start incoming migration in postcopy mode.
+ETEXI
+
+DEF("postcopy-flags", HAS_ARG, QEMU_OPTION_postcopy_flags,
+    "-postcopy-flags unsigned-int(flags)\n"
+    "                flags for postcopy incoming migration\n"
+    "                when -incoming and -postcopy are specified.\n"
+    "                This is for benchmark/debug purpose (default: 0)\n",
+    QEMU_ARCH_ALL)
+STEXI
+@item -postcopy-flags int
+@findex -postcopy-flags
+Specify flags for incoming postcopy migration when -incoming and -postcopy are
specified. This is for benchmark/debug purposes. (default: 0)
+ETEXI
+
 DEF(nodefaults, 0, QEMU_OPTION_nodefaults, \
 -nodefaults don't create default devices\n, QEMU_ARCH_ALL)
 STEXI
diff --git a/vl.c b/vl.c
index 62dc343..1674abb 100644
--- a/vl.c
+++ b/vl.c
@@ -189,6 +189,8 @@ int mem_prealloc = 0; /* force preallocation of physical 
target memory */
 int nb_nics;
 NICInfo nd_table[MAX_NICS];
 int autostart;
+bool incoming_postcopy = false; /* When -incoming is specified, postcopy mode */
+unsigned long incoming_postcopy_flags = 0; /* flags for postcopy incoming mode */
 static int rtc_utc = 1;
 static int rtc_date_offset = -1; /* -1 means no change */
 QEMUClock *rtc_clock;
@@ -3115,6 +3117,12 @@ int main(int argc, char **argv, char **envp)
 incoming = optarg;
 runstate_set(RUN_STATE_INMIGRATE);
 break;
+case QEMU_OPTION_postcopy:
+incoming_postcopy = true;
+break;
+case QEMU_OPTION_postcopy_flags:
+incoming_postcopy_flags = strtoul(optarg, NULL, 0);
+break;
 case QEMU_OPTION_nodefaults:
 default_serial = 0;
 default_parallel = 0;
-- 
1.7.1.1



[PATCH v2 35/41] postcopy: introduce helper functions for postcopy

2012-06-04 Thread Isaku Yamahata
This patch introduces helper function for postcopy to access
umem char device and to communicate between incoming-qemu and umemd.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
changes v1 - v2:
- code simplification
- make fault trigger more robust
- introduce struct umem_pages
---
 umem.c |  364 
 umem.h |  101 ++
 2 files changed, 465 insertions(+), 0 deletions(-)
 create mode 100644 umem.c
 create mode 100644 umem.h

diff --git a/umem.c b/umem.c
new file mode 100644
index 000..64eaab5
--- /dev/null
+++ b/umem.c
@@ -0,0 +1,364 @@
+/*
+ * umem.c: user process backed memory module for postcopy livemigration
+ *
+ * Copyright (c) 2011
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata yamahata at valinux co jp
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see http://www.gnu.org/licenses/.
+ */
+
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+
+#include <linux/umem.h>
+
+#include "bitops.h"
+#include "sysemu.h"
+#include "hw/hw.h"
+#include "umem.h"
+
+//#define DEBUG_UMEM
+#ifdef DEBUG_UMEM
+#include <sys/syscall.h>
+#define DPRINTF(format, ...)                                            \
+    do {                                                                \
+        printf("%d:%ld %s:%d " format, getpid(), syscall(SYS_gettid),   \
+               __func__, __LINE__, ## __VA_ARGS__);                     \
+    } while (0)
+#else
+#define DPRINTF(format, ...)    do { } while (0)
+#endif
+
+#define DEV_UMEM        "/dev/umem"
+
+UMem *umem_new(void *hostp, size_t size)
+{
+    struct umem_init uinit = {
+        .size = size,
+    };
+    UMem *umem;
+
+    assert((size % getpagesize()) == 0);
+    umem = g_new(UMem, 1);
+    umem->fd = open(DEV_UMEM, O_RDWR);
+    if (umem->fd < 0) {
+        perror("can't open " DEV_UMEM);
+        abort();
+    }
+
+    if (ioctl(umem->fd, UMEM_INIT, &uinit) < 0) {
+        perror("UMEM_INIT");
+        abort();
+    }
+    if (ftruncate(uinit.shmem_fd, uinit.size) < 0) {
+        perror("truncate(\"shmem_fd\")");
+        abort();
+    }
+
+    umem->nbits = 0;
+    umem->nsets = 0;
+    umem->faulted = NULL;
+    umem->page_shift = ffs(getpagesize()) - 1;
+    umem->shmem_fd = uinit.shmem_fd;
+    umem->size = uinit.size;
+    umem->umem = mmap(hostp, size, PROT_EXEC | PROT_READ | PROT_WRITE,
+                      MAP_PRIVATE | MAP_FIXED, umem->fd, 0);
+    if (umem->umem == MAP_FAILED) {
+        perror("mmap(UMem) failed");
+        abort();
+    }
+    return umem;
+}
+
+void umem_destroy(UMem *umem)
+{
+    if (umem->fd != -1) {
+        close(umem->fd);
+    }
+    if (umem->shmem_fd != -1) {
+        close(umem->shmem_fd);
+    }
+    g_free(umem->faulted);
+    g_free(umem);
+}
+
+void umem_get_page_request(UMem *umem, struct umem_pages *page_request)
+{
+    ssize_t ret = read(umem->fd, page_request->pgoffs,
+                       page_request->nr * sizeof(page_request->pgoffs[0]));
+    if (ret < 0) {
+        perror("daemon: umem read");
+        abort();
+    }
+    page_request->nr = ret / sizeof(page_request->pgoffs[0]);
+}
+
+void umem_mark_page_cached(UMem *umem, struct umem_pages *page_cached)
+{
+    const void *buf = page_cached->pgoffs;
+    ssize_t left = page_cached->nr * sizeof(page_cached->pgoffs[0]);
+
+    while (left > 0) {
+        ssize_t ret = write(umem->fd, buf, left);
+        if (ret == -1) {
+            if (errno == EINTR)
+                continue;
+
+            perror("daemon: umem write");
+            abort();
+        }
+
+        left -= ret;
+        buf += ret;
+    }
+}
+
+void umem_unmap(UMem *umem)
+{
+    munmap(umem->umem, umem->size);
+    umem->umem = NULL;
+}
+
+void umem_close(UMem *umem)
+{
+    close(umem->fd);
+    umem->fd = -1;
+}
+
+void *umem_map_shmem(UMem *umem)
+{
+    umem->nbits = umem->size >> umem->page_shift;
+    umem->nsets = 0;
+    umem->faulted = g_new0(unsigned long, BITS_TO_LONGS(umem->nbits));
+
+    umem->shmem = mmap(NULL, umem->size, PROT_READ | PROT_WRITE, MAP_SHARED,
+                       umem->shmem_fd, 0);
+    if (umem->shmem == MAP_FAILED) {
+        perror("daemon: mmap(\"shmem\")");
+        abort();
+    }
+    return umem->shmem;
+}
+
+void umem_unmap_shmem(UMem *umem)
+{
+    munmap(umem->shmem, umem->size);
+    umem->shmem = NULL;
+}
+
+void umem_remove_shmem(UMem *umem, size_t offset, size_t size)
+{
+    int s = offset >> umem->page_shift;
+  

[PATCH v2 40/41] migrate: add -m (movebg) option to migrate command

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 hmp-commands.hx  |5 +++--
 hmp.c|3 ++-
 migration.c  |8 +++-
 migration.h  |1 +
 qapi-schema.json |2 +-
 qmp-commands.hx  |2 +-
 savevm.c |1 +
 7 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 38e5c95..1912cb8 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -798,15 +798,16 @@ ETEXI
 
 {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s,"
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s,"
                       "forward:i?,backward:i?",
-        .params     = "[-d] [-b] [-i] [-p [-n] uri [forward] [backword]",
+        .params     = "[-d] [-b] [-i] [-p [-n] [-m] uri [forward] [backword]",
         .help       = "migrate to URI (using -d to not wait for completion)"
                       "\n\t\t\t -b for migration without shared storage with"
                       " full copy of disk\n\t\t\t -i for migration without "
                       "shared storage with incremental copy of disk "
                       "(base image shared between src and destination)"
                       "\n\t\t\t-p for migration with postcopy mode enabled"
+                      "\n\t\t\t-m for move background transfer of postcopy mode"
                       "\n\t\t\t-n for no background transfer of postcopy mode"
                       "\n\t\t\tforward: the number of pages to "
                       "forward-prefault when postcopy (default 0)"
diff --git a/hmp.c b/hmp.c
index 79a9c86..dd3f307 100644
--- a/hmp.c
+++ b/hmp.c
@@ -912,6 +912,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int blk = qdict_get_try_bool(qdict, "blk", 0);
     int inc = qdict_get_try_bool(qdict, "inc", 0);
     int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
+    int movebg = qdict_get_try_bool(qdict, "movebg", 0);
     int nobg = qdict_get_try_bool(qdict, "nobg", 0);
     int forward = qdict_get_try_int(qdict, "forward", 0);
     int backward = qdict_get_try_int(qdict, "backward", 0);
@@ -919,7 +920,7 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
 Error *err = NULL;
 
 qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
-!!postcopy, postcopy, !!nobg, nobg,
+!!postcopy, postcopy, !!movebg, movebg, !!nobg, nobg,
 !!forward, forward, !!backward, backward,
                 &err);
 if (err) {
diff --git a/migration.c b/migration.c
index e026085..c5e6820 100644
--- a/migration.c
+++ b/migration.c
@@ -422,7 +422,9 @@ void migrate_del_blocker(Error *reason)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
  bool has_inc, bool inc, bool has_detach, bool detach,
- bool has_postcopy, bool postcopy, bool has_nobg, bool nobg,
+ bool has_postcopy, bool postcopy,
+ bool has_movebg, bool movebg,
+ bool has_nobg, bool nobg,
  bool has_forward, int64_t forward,
  bool has_backward, int64_t backward,
  Error **errp)
@@ -432,6 +434,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 .blk = false,
 .shared = false,
 .postcopy = false,
+.movebg = false,
 .nobg = false,
 .prefault_forward = 0,
 .prefault_backward = 0,
@@ -448,6 +451,9 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 if (has_postcopy) {
 params.postcopy = postcopy;
 }
+if (has_movebg) {
+params.movebg = movebg;
+}
 if (has_nobg) {
 params.nobg = nobg;
 }
diff --git a/migration.h b/migration.h
index 9a9b9c6..1e98b20 100644
--- a/migration.h
+++ b/migration.h
@@ -23,6 +23,7 @@ struct MigrationParams {
 int blk;
 int shared;
 int postcopy;
+int movebg;
 int nobg;
 int64_t prefault_forward;
 int64_t prefault_backward;
diff --git a/qapi-schema.json b/qapi-schema.json
index 83c2170..ef2f48e 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1718,7 +1718,7 @@
 ##
 { 'command': 'migrate',
   'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
-   '*postcopy': 'bool', '*nobg': 'bool',
+   '*postcopy': 'bool', '*movebg': 'bool', '*nobg': 'bool',
'*forward': 'int', '*backward': 'int'} }
 
 # @xen-save-devices-state:
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7b5e5b7..5c9ecc8 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -469,7 +469,7 @@ EQMP
 
 {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,movebg:-m,nobg:-n,uri:s",
 .mhandler.cmd_new = qmp_marshal_input_migrate,
 },
 
diff --git a/savevm.c b/savevm.c
index 48b636d..19bb8f1 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1781,6 +1781,7 @@ static int 

[PATCH v2 37/41] postcopy: implement outgoing part of postcopy live migration

2012-06-04 Thread Isaku Yamahata
This patch implements postcopy live migration for the outgoing part.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
Changes v1 - v2:
- fix parameter to qemu_fdopen()
- handle QEMU_UMEM_REQ_EOC properly
  when in PO_STATE_ALL_PAGES_SENT, the QEMU_UMEM_REQ_EOC request was ignored;
  handle it properly.
- flush on-demand page unconditionally
- improve postcopy_outgoing_ram_save_live and postcopy_outgoing_begin()
- use qemu_fopen_fd
- use memory api instead of obsolete api
- segv in postcopy_outgoing_check_all_ram_sent()
- catch up qapi change
---
 arch_init.c   |   19 ++-
 migration-exec.c  |4 +
 migration-fd.c|   17 ++
 migration-postcopy-stub.c |   22 +++
 migration-postcopy.c  |  450 +
 migration-tcp.c   |   25 ++-
 migration-unix.c  |   26 ++-
 migration.c   |   32 +++-
 migration.h   |   12 ++
 savevm.c  |   22 ++-
 sysemu.h  |2 +-
 11 files changed, 614 insertions(+), 17 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 22d9691..3599e5c 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -154,6 +154,13 @@ static int is_dup_page(uint8_t *page)
 return 1;
 }
 
+static bool outgoing_postcopy = false;
+
+void ram_save_set_params(const MigrationParams *params, void *opaque)
+{
+    outgoing_postcopy = params->postcopy;
+}
+
 static RAMBlock *last_block_sent = NULL;
 static uint64_t bytes_transferred;
 
@@ -343,6 +350,15 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 uint64_t expected_time = 0;
 int ret;
 
+if (stage == 1) {
+bytes_transferred = 0;
+last_block_sent = NULL;
+ram_save_set_last_block(NULL, 0);
+}
+if (outgoing_postcopy) {
+return postcopy_outgoing_ram_save_live(f, stage, opaque);
+}
+
     if (stage < 0) {
 memory_global_dirty_log_stop();
 return 0;
@@ -351,9 +367,6 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 memory_global_sync_dirty_bitmap(get_system_memory());
 
 if (stage == 1) {
-bytes_transferred = 0;
-last_block_sent = NULL;
-ram_save_set_last_block(NULL, 0);
 sort_ram_list();
 
 /* Make sure all dirty bits are set */
diff --git a/migration-exec.c b/migration-exec.c
index 7f08b3b..a90da5c 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -64,6 +64,10 @@ int exec_start_outgoing_migration(MigrationState *s, const 
char *command)
 {
 FILE *f;
 
+    if (s->params.postcopy) {
+        return -ENOSYS;
+    }
+
     f = popen(command, "w");
     if (f == NULL) {
         DPRINTF("Unable to popen exec target\n");
diff --git a/migration-fd.c b/migration-fd.c
index 42b8162..83b5f18 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -90,6 +90,23 @@ int fd_start_outgoing_migration(MigrationState *s, const 
char *fdname)
     s->write = fd_write;
     s->close = fd_close;
 
+    if (s->params.postcopy) {
+        int flags = fcntl(s->fd, F_GETFL);
+        if ((flags & O_ACCMODE) != O_RDWR) {
+            goto err_after_open;
+        }
+
+        s->fd_read = dup(s->fd);
+        if (s->fd_read == -1) {
+            goto err_after_open;
+        }
+        s->file_read = qemu_fopen_fd(s->fd_read);
+        if (s->file_read == NULL) {
+            close(s->fd_read);
+            goto err_after_open;
+        }
+    }
+
 migrate_fd_connect(s);
 return 0;
 
diff --git a/migration-postcopy-stub.c b/migration-postcopy-stub.c
index f9ebcbe..9c64827 100644
--- a/migration-postcopy-stub.c
+++ b/migration-postcopy-stub.c
@@ -24,6 +24,28 @@
 #include sysemu.h
 #include migration.h
 
+int postcopy_outgoing_create_read_socket(MigrationState *s)
+{
+return -ENOSYS;
+}
+
+int postcopy_outgoing_ram_save_live(Monitor *mon,
+QEMUFile *f, int stage, void *opaque)
+{
+return -ENOSYS;
+}
+
+void *postcopy_outgoing_begin(MigrationState *ms)
+{
+return NULL;
+}
+
+int postcopy_outgoing_ram_save_background(Monitor *mon, QEMUFile *f,
+  void *postcopy)
+{
+return -ENOSYS;
+}
+
 int postcopy_incoming_init(const char *incoming, bool incoming_postcopy)
 {
 return -ENOSYS;
diff --git a/migration-postcopy.c b/migration-postcopy.c
index 5913e05..eb37094 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -177,6 +177,456 @@ static void postcopy_incoming_send_req(QEMUFile *f,
 }
 }
 
+static int postcopy_outgoing_recv_req_idstr(QEMUFile *f,
+                                            struct qemu_umem_req *req,
+                                            size_t *offset)
+{
+    int ret;
+
+    req->len = qemu_peek_byte(f, *offset);
+    *offset += 1;
+    if (req->len == 0) {
+        return -EAGAIN;
+    }
+    req->idstr = g_malloc((int)req->len + 1);
+    ret = qemu_peek_buffer(f, (uint8_t*)req->idstr, req->len, *offset);
+    *offset += ret;
+    if (ret != req->len) {
+        g_free(req->idstr);
+

[PATCH v2 38/41] postcopy/outgoing: add forward, backward option to specify the size of prefault

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 hmp-commands.hx  |   15 ++-
 hmp.c|3 +++
 migration.c  |   20 
 migration.h  |2 ++
 qapi-schema.json |3 ++-
 5 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 3c647f7..38e5c95 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -798,26 +798,31 @@ ETEXI
 
 {
         .name       = "migrate",
-        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
-        .params     = "[-d] [-b] [-i] [-p [-n]] uri",
+        .args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s,"
+                      "forward:i?,backward:i?",
+        .params     = "[-d] [-b] [-i] [-p [-n] uri [forward] [backword]",
         .help       = "migrate to URI (using -d to not wait for completion)"
                       "\n\t\t\t -b for migration without shared storage with"
                       " full copy of disk\n\t\t\t -i for migration without "
                       "shared storage with incremental copy of disk "
                       "(base image shared between src and destination)"
                       "\n\t\t\t-p for migration with postcopy mode enabled"
-                      "\n\t\t\t-n for no background transfer of postcopy mode",
+                      "\n\t\t\t-n for no background transfer of postcopy mode"
+                      "\n\t\t\tforward: the number of pages to "
+                      "forward-prefault when postcopy (default 0)"
+                      "\n\t\t\tbackward: the number of pages to "
+                      "backward-prefault when postcopy (default 0)",
 .mhandler.cmd = hmp_migrate,
 },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri}
+@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri} @var{forward} @var{backward}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
-b for migration with full copy of disk
-i for migration with incremental copy of disk (base image is shared)
-   -p for migration with postcopy mode enabled
+   -p for migration with postcopy mode enabled (forward/backward is 
prefault size when postcopy)
-n for migration with postcopy mode enabled without background transfer
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index d546a52..79a9c86 100644
--- a/hmp.c
+++ b/hmp.c
@@ -913,11 +913,14 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
     int inc = qdict_get_try_bool(qdict, "inc", 0);
     int postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
     int nobg = qdict_get_try_bool(qdict, "nobg", 0);
+    int forward = qdict_get_try_int(qdict, "forward", 0);
+    int backward = qdict_get_try_int(qdict, "backward", 0);
     const char *uri = qdict_get_str(qdict, "uri");
     Error *err = NULL;
 
     qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
                 !!postcopy, postcopy, !!nobg, nobg,
+                !!forward, forward, !!backward, backward,
                 &err);
 if (err) {
 monitor_printf(mon, migrate: %s\n, error_get_pretty(err));
diff --git a/migration.c b/migration.c
index e8be0d1..e026085 100644
--- a/migration.c
+++ b/migration.c
@@ -423,6 +423,8 @@ void migrate_del_blocker(Error *reason)
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
  bool has_inc, bool inc, bool has_detach, bool detach,
  bool has_postcopy, bool postcopy, bool has_nobg, bool nobg,
+ bool has_forward, int64_t forward,
+ bool has_backward, int64_t backward,
  Error **errp)
 {
 MigrationState *s = migrate_get_current();
@@ -431,6 +433,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 .shared = false,
 .postcopy = false,
 .nobg = false,
+.prefault_forward = 0,
+.prefault_backward = 0,
 };
 const char *p;
 int ret;
@@ -447,6 +451,22 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 if (has_nobg) {
 params.nobg = nobg;
 }
+    if (has_forward) {
+        if (forward < 0) {
+            error_set(errp, QERR_INVALID_PARAMETER_VALUE,
+                      "forward", "forward >= 0");
+            return;
+        }
+        params.prefault_forward = forward;
+    }
+    if (has_backward) {
+        if (backward < 0) {
+            error_set(errp, QERR_INVALID_PARAMETER_VALUE,
+                      "backward", "backward >= 0");
+            return;
+        }
+        params.prefault_backward = backward;
+    }
 
     if (s->state == MIG_STATE_ACTIVE) {
 error_set(errp, QERR_MIGRATION_ACTIVE);
diff --git a/migration.h b/migration.h
index 90f3bdf..9a9b9c6 100644
--- a/migration.h
+++ b/migration.h
@@ -24,6 +24,8 @@ struct MigrationParams {
 int shared;
 int postcopy;
 int nobg;
+int64_t prefault_forward;
+int64_t prefault_backward;
 };
 
 typedef struct MigrationState MigrationState;
diff --git a/qapi-schema.json b/qapi-schema.json
index 5861fb9..83c2170 100644

[PATCH v2 32/41] savevm: add new section that is used by postcopy

2012-06-04 Thread Isaku Yamahata
This is used by postcopy to tell the incoming side the total length of the
QEMU_VM_SECTION_FULL and QEMU_VM_SUBSECTION data sent by the outgoing side.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 savevm.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
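
For orientation, this patch only defines the QEMU_VM_POSTCOPY constant; a later
patch in the series is expected to emit it. Below is a purely illustrative
sketch of how the outgoing side could prefix the full-section data with its
total length; the framing and the variables (f, total_len, section_data) are
assumptions, only the QEMUFile helpers are existing API.

/* Illustrative only: announce the total length of the following
 * QEMU_VM_SECTION_FULL / QEMU_VM_SUBSECTION data to the incoming side.
 * The framing is an assumption; this patch merely adds the constant. */
qemu_put_byte(f, QEMU_VM_POSTCOPY);
qemu_put_be32(f, total_len);                 /* length of the section data */
qemu_put_buffer(f, section_data, total_len);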

diff --git a/savevm.c b/savevm.c
index 318ec61..3adabad 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1597,6 +1597,7 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
 #define QEMU_VM_SECTION_END  0x03
 #define QEMU_VM_SECTION_FULL 0x04
 #define QEMU_VM_SUBSECTION   0x05
+#define QEMU_VM_POSTCOPY 0x10
 
 bool qemu_savevm_state_blocked(Error **errp)
 {
-- 
1.7.1.1



[PATCH v2 39/41] postcopy/outgoing: implement prefault

2012-06-04 Thread Isaku Yamahata
When a page is requested, the surrounding pages are also sent.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 migration-postcopy.c |   56 +
 1 files changed, 51 insertions(+), 5 deletions(-)
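
As a reading aid, here is a minimal standalone sketch of the prefault
arithmetic the patch below implements, expressed in pages rather than bytes:
the requested page offset is shifted forward or backward by the prefault
distance and skipped if it would run off either end of the RAM block. The
helper and its parameter names are hypothetical; only the logic mirrors the
patch.

#include <stdbool.h>
#include <stdint.h>

/* pgoffset: page offset requested by the destination
 * prefault_pgoffset: distance (in pages) to prefault
 * forward: true = prefault ahead of the request, false = behind it
 * block_pages: number of pages in the RAM block being served */
static bool prefault_target(uint64_t pgoffset, uint64_t prefault_pgoffset,
                            bool forward, uint64_t block_pages,
                            uint64_t *out_pgoffset)
{
    if (forward) {
        pgoffset += prefault_pgoffset;
        if (pgoffset >= block_pages) {
            return false;                /* past the end of the block: skip */
        }
    } else {
        if (pgoffset < prefault_pgoffset) {
            return false;                /* would underflow: skip */
        }
        pgoffset -= prefault_pgoffset;
    }
    *out_pgoffset = pgoffset;            /* caller sends this page */
    return true;
}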

diff --git a/migration-postcopy.c b/migration-postcopy.c
index eb37094..6165657 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -353,6 +353,36 @@ int postcopy_outgoing_ram_save_live(QEMUFile *f, int 
stage, void *opaque)
 return ret;
 }
 
+static void postcopy_outgoing_ram_save_page(PostcopyOutgoingState *s,
+uint64_t pgoffset, bool *written,
+bool forward,
+int prefault_pgoffset)
+{
+ram_addr_t offset;
+int ret;
+
+if (forward) {
+pgoffset += prefault_pgoffset;
+} else {
+if (pgoffset  prefault_pgoffset) {
+return;
+}
+pgoffset -= prefault_pgoffset;
+}
+
+offset = pgoffset  TARGET_PAGE_BITS;
+if (offset = s-last_block_read-length) {
+assert(forward);
+assert(prefault_pgoffset  0);
+return;
+}
+
+ret = ram_save_page(s-mig_buffered_write, s-last_block_read, offset);
+if (ret  0) {
+*written = true;
+}
+}
+
 /*
  * return value
  *   0: continue postcopy mode
@@ -364,6 +394,7 @@ static int 
postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
 bool *written)
 {
 int i;
+uint64_t j;
 RAMBlock *block;
 
 DPRINTF(cmd %d state %d\n, req-cmd, s-state);
@@ -398,11 +429,26 @@ static int 
postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
 break;
 }
 for (i = 0; i  req-nr; i++) {
-DPRINTF(offs[%d] 0x%PRIx64\n, i, req-pgoffs[i]);
-int ret = ram_save_page(s-mig_buffered_write, s-last_block_read,
-req-pgoffs[i]  TARGET_PAGE_BITS);
-if (ret  0) {
-*written = true;
+DPRINTF(pgoffs[%d] 0x%PRIx64\n, i, req-pgoffs[i]);
+postcopy_outgoing_ram_save_page(s, req-pgoffs[i], written,
+true, 0);
+}
+/* forward prefault */
+for (j = 1; j = s-ms-params.prefault_forward; j++) {
+for (i = 0; i  req-nr; i++) {
+DPRINTF(pgoffs[%d] + 0x%PRIx64 0x%PRIx64\n,
+i, j, req-pgoffs[i] + j);
+postcopy_outgoing_ram_save_page(s, req-pgoffs[i], written,
+true, j);
+}
+}
+/* backward prefault */
+for (j = 1; j = s-ms-params.prefault_backward; j++) {
+for (i = 0; i  req-nr; i++) {
+DPRINTF(pgoffs[%d] - 0x%PRIx64 0x%PRIx64\n,
+i, j, req-pgoffs[i] - j);
+postcopy_outgoing_ram_save_page(s, req-pgoffs[i], written,
+false, j);
 }
 }
 break;
-- 
1.7.1.1



[PATCH v2 34/41] postcopy outgoing: add -p and -n option to migrate command

2012-06-04 Thread Isaku Yamahata
Add a -p option to the migrate command for postcopy mode and introduce a
postcopy parameter for migration to indicate that postcopy mode
is enabled.
Add a -n option for postcopy migration that disables the background
transfer.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
Changes v1 - v2:
- catch up for qapi change
---
 hmp-commands.hx  |   12 
 hmp.c|6 +-
 migration.c  |9 +
 migration.h  |2 ++
 qapi-schema.json |3 ++-
 qmp-commands.hx  |4 +++-
 savevm.c |2 ++
 7 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 18cb415..3c647f7 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -798,23 +798,27 @@ ETEXI
 
 {
 .name   = migrate,
-.args_type  = detach:-d,blk:-b,inc:-i,uri:s,
-.params = [-d] [-b] [-i] uri,
+.args_type  = detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s,
+.params = [-d] [-b] [-i] [-p [-n]] uri,
 .help   = migrate to URI (using -d to not wait for completion)
  \n\t\t\t -b for migration without shared storage with
   full copy of disk\n\t\t\t -i for migration without 
  shared storage with incremental copy of disk 
- (base image shared between src and destination),
+ (base image shared between src and destination)
+ \n\t\t\t-p for migration with postcopy mode enabled
+ \n\t\t\t-n for no background transfer of postcopy mode,
 .mhandler.cmd = hmp_migrate,
 },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] @var{uri}
+@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
-b for migration with full copy of disk
-i for migration with incremental copy of disk (base image is shared)
+   -p for migration with postcopy mode enabled
+   -n for migration with postcopy mode enabled without background transfer
 ETEXI
 
 {
diff --git a/hmp.c b/hmp.c
index bb0952e..d546a52 100644
--- a/hmp.c
+++ b/hmp.c
@@ -911,10 +911,14 @@ void hmp_migrate(Monitor *mon, const QDict *qdict)
 int detach = qdict_get_try_bool(qdict, detach, 0);
 int blk = qdict_get_try_bool(qdict, blk, 0);
 int inc = qdict_get_try_bool(qdict, inc, 0);
+int postcopy = qdict_get_try_bool(qdict, postcopy, 0);
+int nobg = qdict_get_try_bool(qdict, nobg, 0);
 const char *uri = qdict_get_str(qdict, uri);
 Error *err = NULL;
 
-qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false, err);
+qmp_migrate(uri, !!blk, blk, !!inc, inc, false, false,
+!!postcopy, postcopy, !!nobg, nobg,
+err);
 if (err) {
 monitor_printf(mon, migrate: %s\n, error_get_pretty(err));
 error_free(err);
diff --git a/migration.c b/migration.c
index 3b97aec..7ad62ef 100644
--- a/migration.c
+++ b/migration.c
@@ -388,12 +388,15 @@ void migrate_del_blocker(Error *reason)
 
 void qmp_migrate(const char *uri, bool has_blk, bool blk,
  bool has_inc, bool inc, bool has_detach, bool detach,
+ bool has_postcopy, bool postcopy, bool has_nobg, bool nobg,
  Error **errp)
 {
 MigrationState *s = migrate_get_current();
 MigrationParams params = {
 .blk = false,
 .shared = false,
+.postcopy = false,
+.nobg = false,
 };
 const char *p;
 int ret;
@@ -404,6 +407,12 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 if (has_inc) {
 params.shared = inc;
 }
+if (has_postcopy) {
+params.postcopy = postcopy;
+}
+if (has_nobg) {
+params.nobg = nobg;
+}
 
 if (s-state == MIG_STATE_ACTIVE) {
 error_set(errp, QERR_MIGRATION_ACTIVE);
diff --git a/migration.h b/migration.h
index 4bbcf06..091b446 100644
--- a/migration.h
+++ b/migration.h
@@ -22,6 +22,8 @@
 struct MigrationParams {
 int blk;
 int shared;
+int postcopy;
+int nobg;
 };
 
 typedef struct MigrationState MigrationState;
diff --git a/qapi-schema.json b/qapi-schema.json
index 2ca7195..5861fb9 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1717,7 +1717,8 @@
 # Since: 0.14.0
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' } }
+  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' ,
+   '*postcopy': 'bool', '*nobg': 'bool'} }
 
 # @xen-save-devices-state:
 #
diff --git a/qmp-commands.hx b/qmp-commands.hx
index db980fa..7b5e5b7 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -469,7 +469,7 @@ EQMP
 
 {
 .name   = migrate,
-.args_type  = detach:-d,blk:-b,inc:-i,uri:s,
+.args_type  = detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s,
 .mhandler.cmd_new = qmp_marshal_input_migrate,
 },
 
@@ -483,6 

[PATCH v2 41/41] migration/postcopy: add movebg mode

2012-06-04 Thread Isaku Yamahata
When movebg mode is enabled, the point from which background pages are sent
is moved to the page right after the on-demand (requested) page.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 migration-postcopy.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)
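
Restated as a small self-contained helper (all names here are placeholders;
only the arithmetic matches the patch below): after an on-demand request, the
background cursor jumps just past the forward-prefault window, clamped to the
last page of the block.

#include <stdint.h>

static uint64_t movebg_next_offset(uint64_t req_pgoff,
                                   uint64_t prefault_forward,
                                   uint64_t block_length,
                                   uint64_t page_size,
                                   unsigned page_bits)
{
    /* byte offset just past the requested page plus the prefault window */
    uint64_t next = (req_pgoff + prefault_forward) << page_bits;
    /* never point past the last page of the RAM block */
    uint64_t max = block_length - page_size;
    return next < max ? next : max;
}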

diff --git a/migration-postcopy.c b/migration-postcopy.c
index 6165657..3df88d7 100644
--- a/migration-postcopy.c
+++ b/migration-postcopy.c
@@ -442,6 +442,14 @@ static int 
postcopy_outgoing_handle_req(PostcopyOutgoingState *s,
 true, j);
 }
 }
+if (s-ms-params.movebg) {
+ram_addr_t last_offset =
+(req-pgoffs[req-nr - 1] + s-ms-params.prefault_forward) 
+TARGET_PAGE_BITS;
+last_offset = MIN(last_offset,
+  s-last_block_read-length - TARGET_PAGE_SIZE);
+ram_save_set_last_block(s-last_block_read, last_offset);
+}
 /* backward prefault */
 for (j = 1; j = s-ms-params.prefault_backward; j++) {
 for (i = 0; i  req-nr; i++) {
-- 
1.7.1.1



[PATCH v2 36/41] postcopy: implement incoming part of postcopy live migration

2012-06-04 Thread Isaku Yamahata
This patch implements the incoming part of postcopy live migration.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
Changes v3 - v4:
- fork umemd early to address qemu devices touching guest ram via
  post/pre_load
- code clean up on initialization
- Makefile.target
  migration-postcopy.c is target dependent due to TARGET_PAGE_xxx,
  so it can't be shared between target architectures.
- use qemu_fopen_fd
- introduce incoming_flags_use_umem_make_present flag
- use MADV_DONTNEED

Changes v2 - v3:
- make incoming socket nonblocking
- several clean ups
- Dropped QEMUFilePipe
- Moved QEMUFileNonblock to buffered_file
- Split out into umem/incoming/outgoing

Changes v1 - v2:
- make mig_read nonblocking when socket
- updates for umem device changes
---
 Makefile.target|5 +
 cpu-all.h  |7 +
 exec.c |   20 +-
 migration-exec.c   |4 +
 migration-fd.c |6 +
 .../linux/umem.h = migration-postcopy-stub.c  |   47 +-
 migration-postcopy.c   | 1267 
 migration.c|4 +
 migration.h|   13 +
 qemu-common.h  |1 +
 qemu-options.hx|5 +-
 savevm.c   |   43 +
 vl.c   |8 +-
 13 files changed, 1409 insertions(+), 21 deletions(-)
 copy linux-headers/linux/umem.h = migration-postcopy-stub.c (55%)
 create mode 100644 migration-postcopy.c

diff --git a/Makefile.target b/Makefile.target
index 1582904..618bd3e 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -4,6 +4,7 @@ GENERATED_HEADERS = config-target.h
 CONFIG_NO_PCI = $(if $(subst n,,$(CONFIG_PCI)),n,y)
 CONFIG_NO_KVM = $(if $(subst n,,$(CONFIG_KVM)),n,y)
 CONFIG_NO_XEN = $(if $(subst n,,$(CONFIG_XEN)),n,y)
+CONFIG_NO_POSTCOPY = $(if $(subst n,,$(CONFIG_POSTCOPY)),n,y)
 
 include ../config-host.mak
 include config-devices.mak
@@ -196,6 +197,10 @@ LIBS+=-lz
 
 obj-i386-$(CONFIG_KVM) += hyperv.o
 
+obj-$(CONFIG_POSTCOPY) += migration-postcopy.o
+obj-$(CONFIG_NO_POSTCOPY) += migration-postcopy-stub.o
+common-obj-$(CONFIG_POSTCOPY) += umem.o
+
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
 QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
 QEMU_CFLAGS += $(VNC_JPEG_CFLAGS)
diff --git a/cpu-all.h b/cpu-all.h
index ff7f827..e0956bc 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -486,6 +486,9 @@ extern ram_addr_t ram_size;
 /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
 #define RAM_PREALLOC_MASK   (1  0)
 
+/* RAM is allocated via umem for postcopy incoming mode */
+#define RAM_POSTCOPY_UMEM_MASK  (1  1)
+
 typedef struct RAMBlock {
 struct MemoryRegion *mr;
 uint8_t *host;
@@ -497,6 +500,10 @@ typedef struct RAMBlock {
 #if defined(__linux__)  !defined(TARGET_S390X)
 int fd;
 #endif
+
+#ifdef CONFIG_POSTCOPY
+UMem *umem;/* for incoming postcopy mode */
+#endif
 } RAMBlock;
 
 typedef struct RAMList {
diff --git a/exec.c b/exec.c
index 785..e5ff2ed 100644
--- a/exec.c
+++ b/exec.c
@@ -36,6 +36,7 @@
 #include arch_init.h
 #include memory.h
 #include exec-memory.h
+#include migration.h
 #if defined(CONFIG_USER_ONLY)
 #include qemu.h
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
@@ -2632,6 +2633,13 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void 
*host,
 new_block-host = host;
 new_block-flags |= RAM_PREALLOC_MASK;
 } else {
+#ifdef CONFIG_POSTCOPY
+if (incoming_postcopy) {
+ram_addr_t page_size = getpagesize();
+size = (size + page_size - 1)  ~(page_size - 1);
+mem_path = NULL;
+}
+#endif
 if (mem_path) {
 #if defined (__linux__)  !defined(TARGET_S390X)
 new_block-host = file_ram_alloc(new_block, size, mem_path);
@@ -2709,7 +2717,13 @@ void qemu_ram_free(ram_addr_t addr)
 QLIST_REMOVE(block, next);
 if (block-flags  RAM_PREALLOC_MASK) {
 ;
-} else if (mem_path) {
+}
+#ifdef CONFIG_POSTCOPY
+else if (block-flags  RAM_POSTCOPY_UMEM_MASK) {
+postcopy_incoming_ram_free(block-umem);
+}
+#endif
+else if (mem_path) {
 #if defined (__linux__)  !defined(TARGET_S390X)
 if (block-fd) {
 munmap(block-host, block-length);
@@ -2755,6 +2769,10 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
 } else {
 flags = MAP_FIXED;
 munmap(vaddr, length);
+if (block-flags  RAM_POSTCOPY_UMEM_MASK) {
+postcopy_incoming_qemu_pages_unmapped(addr, length);
+block-flags = ~RAM_POSTCOPY_UMEM_MASK;
+}
 if 

[PATCH v2 10/41] arch_init: simplify a bit by ram_find_block()

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   21 -
 exec.c  |   12 ++--
 2 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 9981abe..73bf250 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -432,11 +432,10 @@ void *ram_load_host_from_stream_offset(QEMUFile *f,
 qemu_get_buffer(f, (uint8_t *)id, len);
 id[len] = 0;
 
-QLIST_FOREACH(block, ram_list.blocks, next) {
-if (!strncmp(id, block-idstr, sizeof(id))) {
-*last_blockp = block;
-return memory_region_get_ram_ptr(block-mr) + offset;
-}
+block = ram_find_block(id, len);
+if (block) {
+*last_blockp = block;
+return memory_region_get_ram_ptr(block-mr) + offset;
 }
 
 fprintf(stderr, Can't find block %s!\n, id);
@@ -466,19 +465,15 @@ int ram_load_mem_size(QEMUFile *f, ram_addr_t 
total_ram_bytes)
 id[len] = 0;
 length = qemu_get_be64(f);
 
-QLIST_FOREACH(block, ram_list.blocks, next) {
-if (!strncmp(id, block-idstr, sizeof(id))) {
-if (block-length != length)
-return -EINVAL;
-break;
-}
-}
-
+block = ram_find_block(id, len);
 if (!block) {
 fprintf(stderr, Unknown ramblock \%s\, cannot 
 accept migration\n, id);
 return -EINVAL;
 }
+if (block-length != length) {
+return -EINVAL;
+}
 
 total_ram_bytes -= length;
 }
diff --git a/exec.c b/exec.c
index a0494c7..078a408 100644
--- a/exec.c
+++ b/exec.c
@@ -33,6 +33,7 @@
 #include kvm.h
 #include hw/xen.h
 #include qemu-timer.h
+#include arch_init.h
 #include memory.h
 #include exec-memory.h
 #if defined(CONFIG_USER_ONLY)
@@ -2609,12 +2610,11 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char 
*name, DeviceState *dev)
 }
 pstrcat(new_block-idstr, sizeof(new_block-idstr), name);
 
-QLIST_FOREACH(block, ram_list.blocks, next) {
-if (block != new_block  !strcmp(block-idstr, new_block-idstr)) {
-fprintf(stderr, RAMBlock \%s\ already registered, abort!\n,
-new_block-idstr);
-abort();
-}
+block = ram_find_block(new_block-idstr, strlen(new_block-idstr));
+if (block != new_block) {
+fprintf(stderr, RAMBlock \%s\ already registered, abort!\n,
+new_block-idstr);
+abort();
 }
 }
 
-- 
1.7.1.1



[PATCH v3 1/2] export necessary symbols

2012-06-04 Thread Isaku Yamahata
Cc: Andrea Arcangeli aarca...@redhat.com
Cc: Avi Kivity a...@redhat.com
Cc: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 mm/memcontrol.c |1 +
 mm/mempolicy.c  |1 +
 mm/shmem.c  |1 +
 3 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ac35bcc..265ba2f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2761,6 +2761,7 @@ int mem_cgroup_cache_charge(struct page *page, struct 
mm_struct *mm,
}
return ret;
 }
+EXPORT_SYMBOL_GPL(mem_cgroup_cache_charge);
 
 /*
  * While swap-in, try_charge - commit or cancel, the page is locked.
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f15c1b2..ede02e2 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1907,6 +1907,7 @@ retry_cpuset:
goto retry_cpuset;
return page;
 }
+EXPORT_SYMBOL_GPL(alloc_pages_vma);
 
 /**
  * alloc_pages_current - Allocate pages.
diff --git a/mm/shmem.c b/mm/shmem.c
index 585bd22..f2b8aa7 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3041,6 +3041,7 @@ int shmem_zero_setup(struct vm_area_struct *vma)
vma-vm_flags |= VM_CAN_NONLINEAR;
return 0;
 }
+EXPORT_SYMBOL_GPL(shmem_zero_setup);
 
 /**
  * shmem_read_mapping_page_gfp - read into page cache, using specified page 
allocation flags.
-- 
1.7.1.1



[PATCH v3 0/2] postcopy migration: umem: Linux char device for postcopy

2012-06-04 Thread Isaku Yamahata
This is the Linux kernel driver for qemu/kvm postcopy live migration.
It is used by the qemu/kvm postcopy live migration patches.

TODO:
- Consider FUSE/CUSE option
  So far several mmap patches for FUSE/CUSE are floating around (their
  purpose isn't different from ours, though). They haven't been merged
  upstream yet.
  The driver-specific part of the qemu patches is modularized, so I expect it
  wouldn't be difficult to switch the kernel driver to a CUSE-based driver.

ioctl commands:
UMEM_INIT: initialize umem device for qemu
UMEM_MAKE_VMA_ANONYMOUS: make the specified vma in the qemu process
 anonymous. This is _NOT_ implemented yet.
 I'm not sure whether this can be implemented or not.
---
Changes v2 - v3:
- make fault handler killable
- make use of read()/write()
- documentation

Changes version 1 - 2:
- make ioctl structures padded to align
- un-KVM
  KVM_VMEM - UMEM
- dropped some ioctl commands as Avi requested

Isaku Yamahata (2):
  export necessary symbols
  umem: chardevice for kvm postcopy

 Documentation/misc-devices/umem.txt |  303 
 drivers/char/Kconfig|   10 +
 drivers/char/Makefile   |1 +
 drivers/char/umem.c |  900 +++
 include/linux/umem.h|   42 ++
 mm/memcontrol.c |1 +
 mm/mempolicy.c  |1 +
 mm/shmem.c  |1 +
 8 files changed, 1259 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/misc-devices/umem.txt
 create mode 100644 drivers/char/umem.c
 create mode 100644 include/linux/umem.h



[PATCH v3 2/2] umem: chardevice for kvm postcopy

2012-06-04 Thread Isaku Yamahata
This is a character device that hooks page accesses.
A page fault in the area is propagated to another user process by
this char driver. That process then fills in the page contents and
resolves the page fault.

Cc: Andrea Arcangeli aarca...@redhat.com
Cc: Avi Kivity a...@redhat.com
Cc: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp

---
Changes v3 - v4:
- simplified umem_init: kill {a,}sync_req_max
- make fault handler killable even when core-dumping
- documentation

Changes v2 - v3:
- made fault handler killable
- allow O_LARGEFILE
- improve to handle FAULT_FLAG_ALLOW_RETRY
- smart on async fault
---
 Documentation/misc-devices/umem.txt |  303 
 drivers/char/Kconfig|   10 +
 drivers/char/Makefile   |1 +
 drivers/char/umem.c |  900 +++
 include/linux/umem.h|   42 ++
 5 files changed, 1256 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/misc-devices/umem.txt
 create mode 100644 drivers/char/umem.c
 create mode 100644 include/linux/umem.h

diff --git a/Documentation/misc-devices/umem.txt 
b/Documentation/misc-devices/umem.txt
new file mode 100644
index 000..61bba5f
--- /dev/null
+++ b/Documentation/misc-devices/umem.txt
@@ -0,0 +1,303 @@
+User process backed memory driver
+=
+
+Intro
+=
+The user process backed memory driver provides the /dev/umem device.
+This /dev/umem device is designed for some sort of distributed shared memory,
+especially post-copy live migration with KVM.
+
+A page fault in the area backed by this driver is propagated to a (separate)
+server process which serves the page contents. Usually the server process
+fetches the page contents from the remote machine; the faulting process then continues.
+
+
+Kernel-User protocol
+
+ioctl
+UMEM_INIT: Initialize the umem device with some parameters.
+  IN size: the area size in bytes (which is rounded up to page size)
+  OUT shmem_fd: the file descriptor of the tmpfs file that is associated with
+this umem device. It serves as the backing store of this umem device.
+
+mmap: Mapping the initialized umem device provides the area which
+  is served by the user process.
+  A fault in this area is propagated to the umem device via the read
+  system call.
+read: the kernel notifies the process that pages have faulted by returning
+  page offsets (in units of the page size) in u64 format.
+  The umem device is pollable for read.
+write: the process notifies the kernel that a page is ready to be accessed
+   by writing its page offset (in units of the page size) in u64 format.
+
+
+operation flow
+==
+
+|
+V
+  open(/dev/umem)
+|
+V
+  ioctl(UMEM_INIT)
+|
+V
+  Here we have two file descriptors to
+  umem device and shmem file
+|
+|  daemon process which serves
+|  page fault
+V
+  fork()---,
+|  |
+V  V
+  close(shmem) mmap(shmem file)
+|  |
+V  V
+  mmap(umem device)   close(shmem file)
+|  |
+V  |
+  close(umem device)   |
+|  |
+  now the setup is done|
+  work on the umem area|
+|  |
+V  V
+  access umem area (poll and) read(umem)
+|  |
+V  V
+  page fault -- read system call returns
+  block  page offsets
+   |
+   V
+create page contents
+(usually pull the page
+ from remote)
+write the page contents
+to the shmem which was
+mmapped above
+   |
+
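
To make the protocol described above concrete, here is a hedged, minimal
sketch of the daemon side in C. For brevity it folds the setup and the
fault-serving loop into one function (in the flow above the setup happens
before fork() and only the loop runs in the daemon child);
fill_page_from_remote() is a placeholder and all error handling is omitted.

#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/umem.h>        /* the header added by this patch series */

/* Placeholder: fetch one page worth of data from the migration source. */
extern void fill_page_from_remote(void *dst, uint64_t pgoff);

static int serve_umem(uint64_t area_size, long page_size)
{
    int umem_fd = open("/dev/umem", O_RDWR);
    struct umem_init uinit = { .size = area_size };

    ioctl(umem_fd, UMEM_INIT, &uinit);      /* returns shmem_fd (backing store) */
    uint8_t *backing = mmap(NULL, area_size, PROT_READ | PROT_WRITE,
                            MAP_SHARED, uinit.shmem_fd, 0);
    close(uinit.shmem_fd);

    for (;;) {
        uint64_t pgoffs[32];
        ssize_t len = read(umem_fd, pgoffs, sizeof(pgoffs)); /* faulted offsets */
        if (len <= 0) {
            break;
        }
        for (size_t i = 0; i < len / sizeof(pgoffs[0]); i++) {
            /* fill the page in the shmem backing, then report it present */
            fill_page_from_remote(backing + pgoffs[i] * page_size, pgoffs[i]);
        }
        write(umem_fd, pgoffs, len);        /* wakes up the faulting process */
    }
    munmap(backing, area_size);
    close(umem_fd);
    return 0;
}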

[PATCH v2 28/41] buffered_file: add qemu_file to read/write to buffer in memory

2012-06-04 Thread Isaku Yamahata
This is used by postcopy live migration.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 buffered_file.c |   50 ++
 buffered_file.h |   10 ++
 2 files changed, 60 insertions(+), 0 deletions(-)
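
A hedged usage sketch (not taken from the series) of the two helpers added
below: serialize into memory with the write side, then hand the raw buffer to
the read side, which takes ownership of it. Detaching the buffer before
closing the write side is an assumption about how the ownership hand-over
would be done.

/* Write side: collect savevm-style output into an in-memory buffer. */
QEMUFileBuf *wbuf = qemu_fopen_buf_write();
qemu_put_be32(wbuf->file, 42);            /* any qemu_put_*() calls */

/* Hand the accumulated bytes to the read side. */
uint8_t *data = wbuf->buf.buffer;
size_t size = wbuf->buf.buffer_size;
wbuf->buf.buffer = NULL;                  /* assumed: detach before close */
qemu_fclose(wbuf->file);                  /* buf_close() frees the QEMUFileBuf */

QEMUFile *rf = qemu_fopen_buf_read(data, size);  /* takes ownership of data */
uint32_t value = qemu_get_be32(rf);              /* reads back 42 */
qemu_fclose(rf);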

diff --git a/buffered_file.c b/buffered_file.c
index 5198923..4f0c98e 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -106,6 +106,56 @@ static void buffer_flush(QEMUBuffer *buf, QEMUFile *file,
 
 
 /***
+ * read/write to buffer on memory
+ */
+
+static int buf_close(void *opaque)
+{
+QEMUFileBuf *s = opaque;
+buffer_destroy(s-buf);
+g_free(s);
+return 0;
+}
+
+static int buf_put_buffer(void *opaque,
+  const uint8_t *buf, int64_t pos, int size)
+{
+QEMUFileBuf *s = opaque;
+buffer_append(s-buf, buf, size);
+return size;
+}
+
+QEMUFileBuf *qemu_fopen_buf_write(void)
+{
+QEMUFileBuf *s = g_malloc0(sizeof(*s));
+
+s-file = qemu_fopen_ops(s,  buf_put_buffer, NULL, buf_close,
+ NULL, NULL, NULL);
+return s;
+}
+
+static int buf_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size)
+{
+QEMUFileBuf *s = opaque;
+ssize_t len = MIN(size, s-buf.buffer_capacity - s-buf.buffer_size);
+memcpy(buf, s-buf.buffer + s-buf.buffer_size, len);
+s-buf.buffer_size += len;
+return len;
+}
+
+/* This get the ownership of buf. */
+QEMUFile *qemu_fopen_buf_read(uint8_t *buf, size_t size)
+{
+QEMUFileBuf *s = g_malloc0(sizeof(*s));
+s-buf.buffer = buf;
+s-buf.buffer_size = 0; /* this is used as index to read */
+s-buf.buffer_capacity = size;
+s-file = qemu_fopen_ops(s, NULL, buf_get_buffer, buf_close,
+ NULL, NULL, NULL);
+return s-file;
+}
+
+/***
  * Nonblocking write only file
  */
 static ssize_t nonblock_flush_buffer_putbuf(void *opaque,
diff --git a/buffered_file.h b/buffered_file.h
index 2712e01..9e28bef 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -24,6 +24,16 @@ struct QEMUBuffer {
 };
 typedef struct QEMUBuffer QEMUBuffer;
 
+struct QEMUFileBuf {
+QEMUFile *file;
+QEMUBuffer buf;
+};
+typedef struct QEMUFileBuf QEMUFileBuf;
+
+QEMUFileBuf *qemu_fopen_buf_write(void);
+/* This get the ownership of buf. */
+QEMUFile *qemu_fopen_buf_read(uint8_t *buf, size_t size);
+
 struct QEMUFileNonblock {
 int fd;
 QEMUFile *file;
-- 
1.7.1.1



[PATCH v2 29/41] umem.h: import Linux umem.h

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 linux-headers/linux/umem.h |   42 ++
 1 files changed, 42 insertions(+), 0 deletions(-)
 create mode 100644 linux-headers/linux/umem.h

diff --git a/linux-headers/linux/umem.h b/linux-headers/linux/umem.h
new file mode 100644
index 000..0cf7399
--- /dev/null
+++ b/linux-headers/linux/umem.h
@@ -0,0 +1,42 @@
+/*
+ * User process backed memory.
+ * This is mainly for KVM post copy.
+ *
+ * Copyright (c) 2011,
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata yamahata at valinux co jp
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see http://www.gnu.org/licenses/.
+ */
+
+#ifndef __LINUX_UMEM_H
+#define __LINUX_UMEM_H
+
+#include linux/types.h
+#include linux/ioctl.h
+
+struct umem_init {
+   __u64 size; /* in bytes */
+   __s32 shmem_fd;
+   __s32 padding;
+};
+
+#define UMEMIO 0x1E
+
+/* ioctl for umem fd */
+#define UMEM_INIT  _IOWR(UMEMIO, 0x0, struct umem_init)
+#define UMEM_MAKE_VMA_ANONYMOUS  _IO(UMEMIO, 0x1)
+
+#endif /* __LINUX_UMEM_H */
-- 
1.7.1.1



[PATCH v2 31/41] configure: add CONFIG_POSTCOPY option

2012-06-04 Thread Isaku Yamahata
Add configure options to enable/disable postcopy mode. No dynamic test yet.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 configure |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index 1f338f8..21de4cb 100755
--- a/configure
+++ b/configure
@@ -194,6 +194,7 @@ zlib=yes
 guest_agent=yes
 libiscsi=
 coroutine=
+postcopy=yes
 
 # parse CC options first
 for opt do
@@ -824,6 +825,10 @@ for opt do
   ;;
   --disable-guest-agent) guest_agent=no
   ;;
+  --enable-postcopy) postcopy=yes
+  ;;
+  --disable-postcopy) postcopy=no
+  ;;
   *) echo ERROR: unknown option $opt; show_help=yes
   ;;
   esac
@@ -1110,6 +1115,8 @@ echo   --disable-guest-agentdisable building of the 
QEMU Guest Agent
 echo   --enable-guest-agent enable building of the QEMU Guest Agent
 echo   --with-coroutine=BACKEND coroutine backend. Supported options:
 echogthread, ucontext, sigaltstack, windows
+echo   --disable-postcopy   disable postcopy mode for live migration
+echo   --enable-postcopy    enable postcopy mode for live migration
 echo 
 echo NOTE: The object files are built at the place where configure is 
launched
 exit 1
@@ -3029,6 +3036,7 @@ echo OpenGL support$opengl
 echo libiscsi support  $libiscsi
 echo build guest agent $guest_agent
 echo coroutine backend $coroutine_backend
+echo postcopy support  $postcopy
 
 if test $sdl_too_old = yes; then
 echo - Your SDL version is too old - please upgrade to have SDL support
@@ -3329,6 +3337,10 @@ if test $libiscsi = yes ; then
   echo CONFIG_LIBISCSI=y  $config_host_mak
 fi
 
+if test $postcopy = yes ; then
+  echo CONFIG_POSTCOPY=y  $config_host_mak
+fi
+
 # XXX: suppress that
 if [ $bsd = yes ] ; then
   echo CONFIG_BSD=y  $config_host_mak
-- 
1.7.1.1



[PATCH v2 25/41] migration: factor out parameters into MigrationParams

2012-06-04 Thread Isaku Yamahata
Introduce MigrationParams for parameters of migration.

Cc: Orit Wasserman owass...@redhat.com
Cc: Juan Quintela quint...@redhat.com
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
Changes v1 - v2:
- catch up qapi change
---
 block-migration.c |8 
 migration.c   |   21 +++--
 migration.h   |8 ++--
 qemu-common.h |1 +
 savevm.c  |   10 +++---
 sysemu.h  |2 +-
 vmstate.h |2 +-
 7 files changed, 35 insertions(+), 17 deletions(-)
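
A brief sketch of the calling convention after this change, using only what
the diff below introduces: callers bundle the flags into a MigrationParams
struct and pass a pointer down instead of separate ints.

MigrationParams params = {
    .blk = 1,        /* migrate with a full copy of the disk */
    .shared = 0,     /* no shared base image */
};
ret = qemu_savevm_state_begin(f, &params);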

diff --git a/block-migration.c b/block-migration.c
index fd2..b95b4e1 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -700,13 +700,13 @@ static int block_load(QEMUFile *f, void *opaque, int 
version_id)
 return 0;
 }
 
-static void block_set_params(int blk_enable, int shared_base, void *opaque)
+static void block_set_params(const MigrationParams *params, void *opaque)
 {
-block_mig_state.blk_enable = blk_enable;
-block_mig_state.shared_base = shared_base;
+block_mig_state.blk_enable = params-blk;
+block_mig_state.shared_base = params-shared;
 
 /* shared base means that blk_enable = 1 */
-block_mig_state.blk_enable |= shared_base;
+block_mig_state.blk_enable |= params-shared;
 }
 
 void blk_mig_init(void)
diff --git a/migration.c b/migration.c
index 48a8f68..3b97aec 100644
--- a/migration.c
+++ b/migration.c
@@ -352,7 +352,7 @@ void migrate_fd_connect(MigrationState *s)
   migrate_fd_close);
 
 DPRINTF(beginning savevm\n);
-ret = qemu_savevm_state_begin(s-file, s-blk, s-shared);
+ret = qemu_savevm_state_begin(s-file, s-params);
 if (ret  0) {
 DPRINTF(failed, %d\n, ret);
 migrate_fd_error(s);
@@ -361,15 +361,13 @@ void migrate_fd_connect(MigrationState *s)
 migrate_fd_put_ready(s);
 }
 
-static MigrationState *migrate_init(int blk, int inc)
+static MigrationState *migrate_init(const MigrationParams *params)
 {
 MigrationState *s = migrate_get_current();
 int64_t bandwidth_limit = s-bandwidth_limit;
 
 memset(s, 0, sizeof(*s));
-s-blk = blk;
-s-shared = inc;
-
+s-params = *params;
 s-bandwidth_limit = bandwidth_limit;
 s-state = MIG_STATE_SETUP;
 
@@ -393,9 +391,20 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
  Error **errp)
 {
 MigrationState *s = migrate_get_current();
+MigrationParams params = {
+.blk = false,
+.shared = false,
+};
 const char *p;
 int ret;
 
+if (has_blk) {
+params.blk = blk;
+}
+if (has_inc) {
+params.shared = inc;
+}
+
 if (s-state == MIG_STATE_ACTIVE) {
 error_set(errp, QERR_MIGRATION_ACTIVE);
 return;
@@ -410,7 +419,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 return;
 }
 
-s = migrate_init(blk, inc);
+s = migrate_init(params);
 
 if (strstart(uri, tcp:, p)) {
 ret = tcp_start_outgoing_migration(s, p, errp);
diff --git a/migration.h b/migration.h
index d0dd536..59e6e68 100644
--- a/migration.h
+++ b/migration.h
@@ -19,6 +19,11 @@
 #include notify.h
 #include error.h
 
+struct MigrationParams {
+int blk;
+int shared;
+};
+
 typedef struct MigrationState MigrationState;
 
 struct MigrationState
@@ -31,8 +36,7 @@ struct MigrationState
 int (*close)(MigrationState *s);
 int (*write)(MigrationState *s, const void *buff, size_t size);
 void *opaque;
-int blk;
-int shared;
+MigrationParams params;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/qemu-common.h b/qemu-common.h
index 91e0562..057c810 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -263,6 +263,7 @@ typedef struct EventNotifier EventNotifier;
 typedef struct VirtIODevice VirtIODevice;
 typedef struct QEMUSGList QEMUSGList;
 typedef struct SHPCDevice SHPCDevice;
+typedef struct MigrationParams MigrationParams;
 
 typedef uint64_t pcibus_t;
 
diff --git a/savevm.c b/savevm.c
index 5640614..318ec61 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1611,7 +1611,7 @@ bool qemu_savevm_state_blocked(Error **errp)
 return false;
 }
 
-int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, int shared)
+int qemu_savevm_state_begin(QEMUFile *f, const MigrationParams *params)
 {
 SaveStateEntry *se;
 int ret;
@@ -1620,7 +1620,7 @@ int qemu_savevm_state_begin(QEMUFile *f, int blk_enable, 
int shared)
 if(se-set_params == NULL) {
 continue;
}
-   se-set_params(blk_enable, shared, se-opaque);
+   se-set_params(params, se-opaque);
 }
 
 qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
@@ -1758,13 +1758,17 @@ void qemu_savevm_state_cancel(QEMUFile *f)
 static int qemu_savevm_state(QEMUFile *f)
 {
 int ret;
+MigrationParams params = {
+.blk = 0,
+.shared = 0,
+};
 
 if (qemu_savevm_state_blocked(NULL)) {
 ret = -EINVAL;
 goto out;
 }
 
-ret = 

[PATCH v2 23/41] migration.c: remove redundant line in migrate_init()

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 migration.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/migration.c b/migration.c
index 3f485d3..753addb 100644
--- a/migration.c
+++ b/migration.c
@@ -367,7 +367,6 @@ static MigrationState *migrate_init(int blk, int inc)
 int64_t bandwidth_limit = s-bandwidth_limit;
 
 memset(s, 0, sizeof(*s));
-s-bandwidth_limit = bandwidth_limit;
 s-blk = blk;
 s-shared = inc;
 
-- 
1.7.1.1



[PATCH v2 27/41] buffered_file: Introduce QEMUFileNonblock for nonblock write

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 buffered_file.c |  115 +++
 buffered_file.h |   13 ++
 2 files changed, 128 insertions(+), 0 deletions(-)
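
A hedged usage sketch of the nonblocking write file introduced below (fd, data
and len are placeholders): bytes that cannot be written immediately are
buffered, and the helpers let the caller flush opportunistically or wait for a
full drain.

QEMUFileNonblock *nb = qemu_fopen_nonblock(fd); /* fd is switched to O_NONBLOCK */

qemu_put_buffer(nb->file, data, len); /* may queue bytes if the fd would block */
nonblock_fflush(nb);                  /* try to push queued bytes out now */

if (nonblock_pending_size(nb) > 0) {
    nonblock_wait_for_flush(nb);      /* select() until everything is written */
}
qemu_fclose(nb->file);                /* nonblock_close() drains and frees */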

diff --git a/buffered_file.c b/buffered_file.c
index 22dd4c9..5198923 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -106,6 +106,121 @@ static void buffer_flush(QEMUBuffer *buf, QEMUFile *file,
 
 
 /***
+ * Nonblocking write only file
+ */
+static ssize_t nonblock_flush_buffer_putbuf(void *opaque,
+const void *data, size_t size)
+{
+QEMUFileNonblock *s = opaque;
+ssize_t ret = write(s-fd, data, size);
+if (ret == -1) {
+return -errno;
+}
+return ret;
+}
+
+static void nonblock_flush_buffer(QEMUFileNonblock *s)
+{
+buffer_flush(s-buf, s-file, s, nonblock_flush_buffer_putbuf);
+
+if (s-buf.buffer_size  0) {
+s-buf.freeze_output = true;
+}
+}
+
+static int nonblock_put_buffer(void *opaque,
+   const uint8_t *buf, int64_t pos, int size)
+{
+QEMUFileNonblock *s = opaque;
+int error;
+ssize_t len = 0;
+
+error = qemu_file_get_error(s-file);
+if (error) {
+return error;
+}
+
+nonblock_flush_buffer(s);
+error = qemu_file_get_error(s-file);
+if (error) {
+return error;
+}
+
+while (!s-buf.freeze_output  size  0) {
+ssize_t ret;
+assert(s-buf.buffer_size == 0);
+
+ret = write(s-fd, buf, size);
+if (ret == -1) {
+if (errno == EINTR) {
+continue;
+} else if (errno == EAGAIN) {
+s-buf.freeze_output = true;
+} else {
+qemu_file_set_error(s-file, errno);
+}
+break;
+}
+
+len += ret;
+buf += ret;
+size -= ret;
+}
+
+if (size  0) {
+buffer_append(s-buf, buf, size);
+len += size;
+}
+return len;
+}
+
+int nonblock_pending_size(QEMUFileNonblock *s)
+{
+return qemu_pending_size(s-file) + s-buf.buffer_size;
+}
+
+void nonblock_fflush(QEMUFileNonblock *s)
+{
+s-buf.freeze_output = false;
+nonblock_flush_buffer(s);
+if (!s-buf.freeze_output) {
+qemu_fflush(s-file);
+}
+}
+
+void nonblock_wait_for_flush(QEMUFileNonblock *s)
+{
+while (nonblock_pending_size(s)  0) {
+fd_set fds;
+FD_ZERO(fds);
+FD_SET(s-fd, fds);
+select(s-fd + 1, NULL, fds, NULL, NULL);
+
+nonblock_fflush(s);
+}
+}
+
+static int nonblock_close(void *opaque)
+{
+QEMUFileNonblock *s = opaque;
+nonblock_wait_for_flush(s);
+buffer_destroy(s-buf);
+g_free(s);
+return 0;
+}
+
+QEMUFileNonblock *qemu_fopen_nonblock(int fd)
+{
+QEMUFileNonblock *s = g_malloc0(sizeof(*s));
+
+s-fd = fd;
+fcntl_setfl(fd, O_NONBLOCK);
+s-file = qemu_fopen_ops(s, nonblock_put_buffer, NULL, nonblock_close,
+ NULL, NULL, NULL);
+return s;
+}
+
+/***
  * Buffered File
  */
 
diff --git a/buffered_file.h b/buffered_file.h
index d3ef546..2712e01 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -24,6 +24,19 @@ struct QEMUBuffer {
 };
 typedef struct QEMUBuffer QEMUBuffer;
 
+struct QEMUFileNonblock {
+int fd;
+QEMUFile *file;
+
+QEMUBuffer buf;
+};
+typedef struct QEMUFileNonblock QEMUFileNonblock;
+
+QEMUFileNonblock *qemu_fopen_nonblock(int fd);
+int nonblock_pending_size(QEMUFileNonblock *s);
+void nonblock_fflush(QEMUFileNonblock *s);
+void nonblock_wait_for_flush(QEMUFileNonblock *s);
+
 typedef ssize_t (BufferedPutFunc)(void *opaque, const void *data, size_t size);
 typedef void (BufferedPutReadyFunc)(void *opaque);
 typedef void (BufferedWaitForUnfreezeFunc)(void *opaque);
-- 
1.7.1.1



[PATCH v2 19/41] savevm/QEMUFile: drop qemu_stdio_fd

2012-06-04 Thread Isaku Yamahata
Now qemu_file_fd() replaces qemu_stdio_fd().

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 migration-exec.c |4 ++--
 migration-fd.c   |2 +-
 qemu-file.h  |1 -
 savevm.c |   12 
 4 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/migration-exec.c b/migration-exec.c
index 6c97db9..95e9779 100644
--- a/migration-exec.c
+++ b/migration-exec.c
@@ -98,7 +98,7 @@ static void exec_accept_incoming_migration(void *opaque)
 QEMUFile *f = opaque;
 
 process_incoming_migration(f);
-qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL);
+qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL);
 qemu_fclose(f);
 }
 
@@ -113,7 +113,7 @@ int exec_start_incoming_migration(const char *command)
 return -errno;
 }
 
-qemu_set_fd_handler2(qemu_stdio_fd(f), NULL,
+qemu_set_fd_handler2(qemu_file_fd(f), NULL,
 exec_accept_incoming_migration, NULL, f);
 
 return 0;
diff --git a/migration-fd.c b/migration-fd.c
index 50138ed..d9c13fe 100644
--- a/migration-fd.c
+++ b/migration-fd.c
@@ -104,7 +104,7 @@ static void fd_accept_incoming_migration(void *opaque)
 QEMUFile *f = opaque;
 
 process_incoming_migration(f);
-qemu_set_fd_handler2(qemu_stdio_fd(f), NULL, NULL, NULL, NULL);
+qemu_set_fd_handler2(qemu_file_fd(f), NULL, NULL, NULL, NULL);
 qemu_fclose(f);
 }
 
diff --git a/qemu-file.h b/qemu-file.h
index 98a8023..1a12e7d 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -70,7 +70,6 @@ QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
-int qemu_stdio_fd(QEMUFile *f);
 int qemu_file_fd(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_buffered_file_drain(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index cba1a69..ec9f5d0 100644
--- a/savevm.c
+++ b/savevm.c
@@ -293,18 +293,6 @@ QEMUFile *qemu_popen_cmd(const char *command, const char 
*mode)
 return qemu_popen(popen_file, mode);
 }
 
-/* TODO: replace this with qemu_file_fd() */
-int qemu_stdio_fd(QEMUFile *f)
-{
-QEMUFileStdio *p;
-int fd;
-
-p = (QEMUFileStdio *)f-opaque;
-fd = fileno(p-stdio_file);
-
-return fd;
-}
-
 QEMUFile *qemu_fdopen(int fd, const char *mode)
 {
 QEMUFileStdio *s;
-- 
1.7.1.1



[PATCH v2 30/41] update-linux-headers.sh: teach umem.h to update-linux-headers.sh

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 scripts/update-linux-headers.sh |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 9d2a4bc..2afdd54 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -43,7 +43,7 @@ done
 
 rm -rf $output/linux-headers/linux
 mkdir -p $output/linux-headers/linux
-for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h; do
+for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h umem.h; do
 cp $tmpdir/include/linux/$header $output/linux-headers/linux
 done
 if [ -L $linux/source ]; then
-- 
1.7.1.1



[PATCH v2 26/41] buffered_file: factor out buffer management logic

2012-06-04 Thread Isaku Yamahata
This patch factors out buffer management logic.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 buffered_file.c |  141 +-
 buffered_file.h |8 +++
 2 files changed, 94 insertions(+), 55 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index a38caec..22dd4c9 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -20,24 +20,6 @@
 #include buffered_file.h
 
 //#define DEBUG_BUFFERED_FILE
-
-typedef struct QEMUFileBuffered
-{
-BufferedPutFunc *put_buffer;
-BufferedPutReadyFunc *put_ready;
-BufferedWaitForUnfreezeFunc *wait_for_unfreeze;
-BufferedCloseFunc *close;
-void *opaque;
-QEMUFile *file;
-int freeze_output;
-size_t bytes_xfer;
-size_t xfer_limit;
-uint8_t *buffer;
-size_t buffer_size;
-size_t buffer_capacity;
-QEMUTimer *timer;
-} QEMUFileBuffered;
-
 #ifdef DEBUG_BUFFERED_FILE
 #define DPRINTF(fmt, ...) \
 do { printf(buffered-file:  fmt, ## __VA_ARGS__); } while (0)
@@ -46,57 +28,71 @@ typedef struct QEMUFileBuffered
 do { } while (0)
 #endif
 
-static void buffered_append(QEMUFileBuffered *s,
-const uint8_t *buf, size_t size)
-{
-if (size  (s-buffer_capacity - s-buffer_size)) {
-void *tmp;
-
-DPRINTF(increasing buffer capacity from %zu by %zu\n,
-s-buffer_capacity, size + 1024);
 
-s-buffer_capacity += size + 1024;
+/***
+ * buffer management
+ */
 
-tmp = g_realloc(s-buffer, s-buffer_capacity);
-if (tmp == NULL) {
-fprintf(stderr, qemu file buffer expansion failed\n);
-exit(1);
-}
+static void buffer_destroy(QEMUBuffer *s)
+{
+g_free(s-buffer);
+}
 
-s-buffer = tmp;
+static void buffer_consume(QEMUBuffer *s, size_t offset)
+{
+if (offset  0) {
+assert(s-buffer_size = offset);
+memmove(s-buffer, s-buffer + offset, s-buffer_size - offset);
+s-buffer_size -= offset;
 }
+}
 
+static void buffer_append(QEMUBuffer *s, const uint8_t *buf, size_t size)
+{
+#define BUF_SIZE_INC(32 * 1024) /* = IO_BUF_SIZE */
+int inc = size - (s-buffer_capacity - s-buffer_size);
+if (inc  0) {
+s-buffer_capacity += DIV_ROUND_UP(inc, BUF_SIZE_INC) * BUF_SIZE_INC;
+s-buffer = g_realloc(s-buffer, s-buffer_capacity);
+}
 memcpy(s-buffer + s-buffer_size, buf, size);
 s-buffer_size += size;
 }
 
-static void buffered_flush(QEMUFileBuffered *s)
+typedef ssize_t (BufferPutBuf)(void *opaque, const void *data, size_t size);
+
+static void buffer_flush(QEMUBuffer *buf, QEMUFile *file,
+ void *opaque, BufferPutBuf *put_buf)
 {
 size_t offset = 0;
 int error;
 
-error = qemu_file_get_error(s-file);
+error = qemu_file_get_error(file);
 if (error != 0) {
 DPRINTF(flush when error, bailing: %s\n, strerror(-error));
 return;
 }
 
-DPRINTF(flushing %zu byte(s) of data\n, s-buffer_size);
+DPRINTF(flushing %zu byte(s) of data\n, buf-buffer_size);
 
-while (offset  s-buffer_size) {
+while (offset  buf-buffer_size) {
 ssize_t ret;
 
-ret = s-put_buffer(s-opaque, s-buffer + offset,
-s-buffer_size - offset);
-if (ret == -EAGAIN) {
+ret = put_buf(opaque, buf-buffer + offset, buf-buffer_size - offset);
+if (ret == -EINTR) {
+continue;
+} else if (ret == -EAGAIN) {
 DPRINTF(backend not ready, freezing\n);
-s-freeze_output = 1;
+buf-freeze_output = true;
 break;
 }
 
-if (ret = 0) {
+if (ret  0) {
 DPRINTF(error flushing data, %zd\n, ret);
-qemu_file_set_error(s-file, ret);
+qemu_file_set_error(file, ret);
+break;
+} else if (ret == 0) {
+DPRINTF(ret == 0\n);
 break;
 } else {
 DPRINTF(flushed %zd byte(s)\n, ret);
@@ -104,9 +100,44 @@ static void buffered_flush(QEMUFileBuffered *s)
 }
 }
 
-DPRINTF(flushed %zu of %zu byte(s)\n, offset, s-buffer_size);
-memmove(s-buffer, s-buffer + offset, s-buffer_size - offset);
-s-buffer_size -= offset;
+DPRINTF(flushed %zu of %zu byte(s)\n, offset, buf-buffer_size);
+buffer_consume(buf, offset);
+}
+
+
+/***
+ * Buffered File
+ */
+
+typedef struct QEMUFileBuffered
+{
+BufferedPutFunc *put_buffer;
+BufferedPutReadyFunc *put_ready;
+BufferedWaitForUnfreezeFunc *wait_for_unfreeze;
+BufferedCloseFunc *close;
+void *opaque;
+QEMUFile *file;
+size_t bytes_xfer;
+size_t xfer_limit;
+QEMUTimer *timer;
+QEMUBuffer buf;
+} QEMUFileBuffered;
+
+static ssize_t buffered_flush_putbuf(void *opaque,
+ const 

[PATCH v2 14/41] exec.c: export last_ram_offset()

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 exec-obsolete.h |1 +
 exec.c  |4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/exec-obsolete.h b/exec-obsolete.h
index 792c831..fb21dd7 100644
--- a/exec-obsolete.h
+++ b/exec-obsolete.h
@@ -25,6 +25,7 @@
 
 #ifndef CONFIG_USER_ONLY
 
+ram_addr_t qemu_last_ram_offset(void);
 ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
MemoryRegion *mr);
 ram_addr_t qemu_ram_alloc(ram_addr_t size, MemoryRegion *mr);
diff --git a/exec.c b/exec.c
index 7f44893..785 100644
--- a/exec.c
+++ b/exec.c
@@ -2576,7 +2576,7 @@ static ram_addr_t find_ram_offset(ram_addr_t size)
 return offset;
 }
 
-static ram_addr_t last_ram_offset(void)
+ram_addr_t qemu_last_ram_offset(void)
 {
 RAMBlock *block;
 ram_addr_t last = 0;
@@ -2672,7 +2672,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void 
*host,
 QLIST_INSERT_HEAD(ram_list.blocks, new_block, next);
 
 ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
-   last_ram_offset()  TARGET_PAGE_BITS);
+qemu_last_ram_offset()  
TARGET_PAGE_BITS);
 memset(ram_list.phys_dirty + (new_block-offset  TARGET_PAGE_BITS),
0xff, size  TARGET_PAGE_BITS);
 
-- 
1.7.1.1



[PATCH v2 08/41] arch_init/ram_load: refactor ram_load

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   67 +-
 arch_init.h |1 +
 2 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index c861e30..bb0cd52 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -438,6 +438,41 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 return ram_load_host_from_stream_offset(f, offset, flags, block);
 }
 
+int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes)
+{
+/* Synchronize RAM block list */
+char id[256];
+ram_addr_t length;
+
+while (total_ram_bytes) {
+RAMBlock *block;
+uint8_t len;
+
+len = qemu_get_byte(f);
+qemu_get_buffer(f, (uint8_t *)id, len);
+id[len] = 0;
+length = qemu_get_be64(f);
+
+QLIST_FOREACH(block, ram_list.blocks, next) {
+if (!strncmp(id, block-idstr, sizeof(id))) {
+if (block-length != length)
+return -EINVAL;
+break;
+}
+}
+
+if (!block) {
+fprintf(stderr, Unknown ramblock \%s\, cannot 
+accept migration\n, id);
+return -EINVAL;
+}
+
+total_ram_bytes -= length;
+}
+
+return 0;
+}
+
 int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
 ram_addr_t addr;
@@ -456,35 +491,9 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
 
 if (flags  RAM_SAVE_FLAG_MEM_SIZE) {
 if (version_id == 4) {
-/* Synchronize RAM block list */
-char id[256];
-ram_addr_t length;
-ram_addr_t total_ram_bytes = addr;
-
-while (total_ram_bytes) {
-RAMBlock *block;
-uint8_t len;
-
-len = qemu_get_byte(f);
-qemu_get_buffer(f, (uint8_t *)id, len);
-id[len] = 0;
-length = qemu_get_be64(f);
-
-QLIST_FOREACH(block, ram_list.blocks, next) {
-if (!strncmp(id, block-idstr, sizeof(id))) {
-if (block-length != length)
-return -EINVAL;
-break;
-}
-}
-
-if (!block) {
-fprintf(stderr, Unknown ramblock \%s\, cannot 
-accept migration\n, id);
-return -EINVAL;
-}
-
-total_ram_bytes -= length;
+error = ram_load_mem_size(f, addr);
+if (error) {
+return error;
 }
 }
 }
diff --git a/arch_init.h b/arch_init.h
index 0a39082..507f110 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -45,6 +45,7 @@ void *ram_load_host_from_stream_offset(QEMUFile *f,
ram_addr_t offset,
int flags,
RAMBlock **last_blockp);
+int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes);
 #endif
 
 #endif
-- 
1.7.1.1



[PATCH v2 04/41] arch_init: refactor host_from_stream_offset()

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   25 ++---
 arch_init.h |7 +++
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 2a53f58..36ece1d 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -374,21 +374,22 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 return (stage == 2)  (expected_time = migrate_max_downtime());
 }
 
-static inline void *host_from_stream_offset(QEMUFile *f,
-ram_addr_t offset,
-int flags)
+void *ram_load_host_from_stream_offset(QEMUFile *f,
+   ram_addr_t offset,
+   int flags,
+   RAMBlock **last_blockp)
 {
-static RAMBlock *block = NULL;
+RAMBlock *block;
 char id[256];
 uint8_t len;
 
 if (flags  RAM_SAVE_FLAG_CONTINUE) {
-if (!block) {
+if (!(*last_blockp)) {
 fprintf(stderr, Ack, bad migration stream!\n);
 return NULL;
 }
 
-return memory_region_get_ram_ptr(block-mr) + offset;
+return memory_region_get_ram_ptr((*last_blockp)-mr) + offset;
 }
 
 len = qemu_get_byte(f);
@@ -396,14 +397,24 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 id[len] = 0;
 
 QLIST_FOREACH(block, ram_list.blocks, next) {
-if (!strncmp(id, block-idstr, sizeof(id)))
+if (!strncmp(id, block-idstr, sizeof(id))) {
+*last_blockp = block;
 return memory_region_get_ram_ptr(block-mr) + offset;
+}
 }
 
 fprintf(stderr, Can't find block %s!\n, id);
 return NULL;
 }
 
+static inline void *host_from_stream_offset(QEMUFile *f,
+ram_addr_t offset,
+int flags)
+{
+static RAMBlock *block = NULL;
+return ram_load_host_from_stream_offset(f, offset, flags, block);
+}
+
 int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
 ram_addr_t addr;
diff --git a/arch_init.h b/arch_init.h
index 456637d..d84eac7 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -39,4 +39,11 @@ int xen_available(void);
 
 #define RAM_SAVE_VERSION_ID 4 /* currently version 4 */
 
+#if defined(NEED_CPU_H)  !defined(CONFIG_USER_ONLY)
+void *ram_load_host_from_stream_offset(QEMUFile *f,
+   ram_addr_t offset,
+   int flags,
+   RAMBlock **last_blockp);
+#endif
+
 #endif
-- 
1.7.1.1



[PATCH v2 13/41] exec.c: factor out qemu_get_ram_ptr()

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 cpu-all.h |2 ++
 exec.c|   51 +--
 2 files changed, 31 insertions(+), 22 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index 028528f..ff7f827 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -508,6 +508,8 @@ extern RAMList ram_list;
 extern const char *mem_path;
 extern int mem_prealloc;
 
+RAMBlock *qemu_get_ram_block(ram_addr_t adar);
+
 /* Flags stored in the low bits of the TLB virtual address.  These are
defined so that fast path ram access is all zeros.  */
 /* Zero if TLB entry is valid.  */
diff --git a/exec.c b/exec.c
index 078a408..7f44893 100644
--- a/exec.c
+++ b/exec.c
@@ -2799,15 +2799,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
 }
 #endif /* !_WIN32 */
 
-/* Return a host pointer to ram allocated with qemu_ram_alloc.
-   With the exception of the softmmu code in this file, this should
-   only be used for local memory (e.g. video ram) that the device owns,
-   and knows it isn't going to access beyond the end of the block.
-
-   It should not be used for general purpose DMA.
-   Use cpu_physical_memory_map/cpu_physical_memory_rw instead.
- */
-void *qemu_get_ram_ptr(ram_addr_t addr)
+RAMBlock *qemu_get_ram_block(ram_addr_t addr)
 {
 RAMBlock *block;
 
@@ -2818,19 +2810,7 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
 QLIST_REMOVE(block, next);
 QLIST_INSERT_HEAD(ram_list.blocks, block, next);
 }
-if (xen_enabled()) {
-/* We need to check if the requested address is in the RAM
- * because we don't want to map the entire memory in QEMU.
- * In that case just map until the end of the page.
- */
-if (block-offset == 0) {
-return xen_map_cache(addr, 0, 0);
-} else if (block-host == NULL) {
-block-host =
-xen_map_cache(block-offset, block-length, 1);
-}
-}
-return block-host + (addr - block-offset);
+return block;
 }
 }
 
@@ -2841,6 +2821,33 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
 }
 
 /* Return a host pointer to ram allocated with qemu_ram_alloc.
+   With the exception of the softmmu code in this file, this should
+   only be used for local memory (e.g. video ram) that the device owns,
+   and knows it isn't going to access beyond the end of the block.
+
+   It should not be used for general purpose DMA.
+   Use cpu_physical_memory_map/cpu_physical_memory_rw instead.
+ */
+void *qemu_get_ram_ptr(ram_addr_t addr)
+{
+RAMBlock *block = qemu_get_ram_block(addr);
+
+if (xen_enabled()) {
+/* We need to check if the requested address is in the RAM
+ * because we don't want to map the entire memory in QEMU.
+ * In that case just map until the end of the page.
+ */
+if (block-offset == 0) {
+return xen_map_cache(addr, 0, 0);
+} else if (block-host == NULL) {
+block-host =
+xen_map_cache(block-offset, block-length, 1);
+}
+}
+return block-host + (addr - block-offset);
+}
+
+/* Return a host pointer to ram allocated with qemu_ram_alloc.
  * Same as qemu_get_ram_ptr but avoid reordering ramblocks.
  */
 void *qemu_safe_ram_ptr(ram_addr_t addr)
-- 
1.7.1.1



[PATCH v2 11/41] arch_init: factor out counting transferred bytes

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   24 
 1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 73bf250..2617478 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -155,8 +155,9 @@ static int is_dup_page(uint8_t *page)
 }
 
 static RAMBlock *last_block_sent = NULL;
+static uint64_t bytes_transferred;
 
-int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
+static int ram_save_page_int(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
 {
 MemoryRegion *mr = block-mr;
 uint8_t *p;
@@ -192,6 +193,13 @@ int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t 
offset)
 return TARGET_PAGE_SIZE;
 }
 
+int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
+{
+int bytes_sent = ram_save_page_int(f, block, offset);
+bytes_transferred += bytes_sent;
+return bytes_sent;
+}
+
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
@@ -228,8 +236,6 @@ int ram_save_block(QEMUFile *f)
 return bytes_sent;
 }
 
-static uint64_t bytes_transferred;
-
 static ram_addr_t ram_save_remaining(void)
 {
 RAMBlock *block;
@@ -357,11 +363,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 bwidth = qemu_get_clock_ns(rt_clock);
 
 while ((ret = qemu_file_rate_limit(f)) == 0) {
-int bytes_sent;
-
-bytes_sent = ram_save_block(f);
-bytes_transferred += bytes_sent;
-if (bytes_sent == 0) { /* no more blocks */
+if (ram_save_block(f) == 0) { /* no more blocks */
 break;
 }
 }
@@ -381,11 +383,9 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 
 /* try transferring iterative blocks of memory */
 if (stage == 3) {
-int bytes_sent;
-
 /* flush all remaining blocks regardless of rate limiting */
-while ((bytes_sent = ram_save_block(f)) != 0) {
-bytes_transferred += bytes_sent;
+while (ram_save_block(f) != 0) {
+/* nothing */
 }
 memory_global_dirty_log_stop();
 }
-- 
1.7.1.1



[PATCH v2 16/41] savevm: qemu_pending_size() to return pending buffered size

2012-06-04 Thread Isaku Yamahata
This will be used later by postcopy migration.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 qemu-file.h |1 +
 savevm.c|5 +
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/qemu-file.h b/qemu-file.h
index a285bef..880ef4b 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -91,6 +91,7 @@ int qemu_get_byte(QEMUFile *f);
 int qemu_peek_byte(QEMUFile *f, int offset);
 int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset);
 void qemu_file_skip(QEMUFile *f, int size);
+int qemu_pending_size(const QEMUFile *f);
 
 static inline unsigned int qemu_get_ubyte(QEMUFile *f)
 {
diff --git a/savevm.c b/savevm.c
index 8ad843f..2992f97 100644
--- a/savevm.c
+++ b/savevm.c
@@ -595,6 +595,11 @@ void qemu_file_skip(QEMUFile *f, int size)
 }
 }
 
+int qemu_pending_size(const QEMUFile *f)
+{
+return f-buf_size - f-buf_index;
+}
+
 int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
 {
 int pending;
-- 
1.7.1.1



[PATCH v2 05/41] arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   21 ++---
 migration.h |1 +
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 36ece1d..28e5abb 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -287,6 +287,19 @@ void sort_ram_list(void)
 g_free(blocks);
 }
 
+void ram_save_live_mem_size(QEMUFile *f)
+{
+RAMBlock *block;
+
+qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
+
+QLIST_FOREACH(block, ram_list.blocks, next) {
+qemu_put_byte(f, strlen(block-idstr));
+qemu_put_buffer(f, (uint8_t *)block-idstr, strlen(block-idstr));
+qemu_put_be64(f, block-length);
+}
+}
+
 int ram_save_live(QEMUFile *f, int stage, void *opaque)
 {
 ram_addr_t addr;
@@ -321,13 +334,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 
 memory_global_dirty_log_start();
 
-qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
-
-QLIST_FOREACH(block, ram_list.blocks, next) {
-qemu_put_byte(f, strlen(block-idstr));
-qemu_put_buffer(f, (uint8_t *)block-idstr, strlen(block-idstr));
-qemu_put_be64(f, block-length);
-}
+ram_save_live_mem_size(f);
 }
 
 bytes_transferred_last = bytes_transferred;
diff --git a/migration.h b/migration.h
index 8b9509c..e2e9b43 100644
--- a/migration.h
+++ b/migration.h
@@ -78,6 +78,7 @@ uint64_t ram_bytes_total(void);
 
 void sort_ram_list(void);
 int ram_save_block(QEMUFile *f);
+void ram_save_live_mem_size(QEMUFile *f);
 int ram_save_live(QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);
 
-- 
1.7.1.1



[PATCH v2 07/41] arch_init/ram_save_live: factor out ram_save_limit

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   28 
 migration.h |1 +
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 900cc8e..c861e30 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -311,9 +311,23 @@ void ram_save_live_mem_size(QEMUFile *f)
 }
 }
 
+void ram_save_memory_set_dirty(void)
+{
+RAMBlock *block;
+
+QLIST_FOREACH(block, ram_list.blocks, next) {
+ram_addr_t addr;
+for (addr = 0; addr  block-length; addr += TARGET_PAGE_SIZE) {
+if (!memory_region_get_dirty(block-mr, addr, TARGET_PAGE_SIZE,
+ DIRTY_MEMORY_MIGRATION)) {
+memory_region_set_dirty(block-mr, addr, TARGET_PAGE_SIZE);
+}
+}
+}
+}
+
 int ram_save_live(QEMUFile *f, int stage, void *opaque)
 {
-ram_addr_t addr;
 uint64_t bytes_transferred_last;
 double bwidth = 0;
 uint64_t expected_time = 0;
@@ -327,7 +341,6 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 memory_global_sync_dirty_bitmap(get_system_memory());
 
 if (stage == 1) {
-RAMBlock *block;
 bytes_transferred = 0;
 last_block_sent = NULL;
 last_block = NULL;
@@ -335,17 +348,8 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 sort_ram_list();
 
 /* Make sure all dirty bits are set */
-QLIST_FOREACH(block, ram_list.blocks, next) {
-for (addr = 0; addr  block-length; addr += TARGET_PAGE_SIZE) {
-if (!memory_region_get_dirty(block-mr, addr, TARGET_PAGE_SIZE,
- DIRTY_MEMORY_MIGRATION)) {
-memory_region_set_dirty(block-mr, addr, TARGET_PAGE_SIZE);
-}
-}
-}
-
+ram_save_memory_set_dirty();
 memory_global_dirty_log_start();
-
 ram_save_live_mem_size(f);
 }
 
diff --git a/migration.h b/migration.h
index e2e9b43..6cf4512 100644
--- a/migration.h
+++ b/migration.h
@@ -78,6 +78,7 @@ uint64_t ram_bytes_total(void);
 
 void sort_ram_list(void);
 int ram_save_block(QEMUFile *f);
+void ram_save_memory_set_dirty(void);
 void ram_save_live_mem_size(QEMUFile *f);
 int ram_save_live(QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);
-- 
1.7.1.1



[PATCH v2 09/41] arch_init: introduce helper function to find ram block with id string

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |   13 +
 arch_init.h |1 +
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index bb0cd52..9981abe 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -397,6 +397,19 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 return (stage == 2)  (expected_time = migrate_max_downtime());
 }
 
+RAMBlock *ram_find_block(const char *id, uint8_t len)
+{
+RAMBlock *block;
+
+QLIST_FOREACH(block, ram_list.blocks, next) {
+if (!strncmp(id, block-idstr, len)) {
+return block;
+}
+}
+
+return NULL;
+}
+
 void *ram_load_host_from_stream_offset(QEMUFile *f,
ram_addr_t offset,
int flags,
diff --git a/arch_init.h b/arch_init.h
index 507f110..7f5c77a 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -41,6 +41,7 @@ int xen_available(void);
 
 #if defined(NEED_CPU_H)  !defined(CONFIG_USER_ONLY)
 int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
+RAMBlock *ram_find_block(const char *id, uint8_t len);
 void *ram_load_host_from_stream_offset(QEMUFile *f,
ram_addr_t offset,
int flags,
-- 
1.7.1.1



[PATCH v2 03/41] arch_init/ram_save: introduce constant for ram save version = 4

2012-06-04 Thread Isaku Yamahata
Introduce RAM_SAVE_VERSION_ID to represent version_id for ram save format.

Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp
---
 arch_init.c |2 +-
 arch_init.h |2 ++
 vl.c|4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index bd4e61e..2a53f58 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -410,7 +410,7 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
 int flags;
 int error;
 
-if (version_id  4 || version_id  4) {
+if (version_id  4 || version_id  RAM_SAVE_VERSION_ID) {
 return -EINVAL;
 }
 
diff --git a/arch_init.h b/arch_init.h
index 7cc3fa7..456637d 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -37,4 +37,6 @@ int xen_available(void);
 #define RAM_SAVE_FLAG_EOS  0x10
 #define RAM_SAVE_FLAG_CONTINUE 0x20
 
+#define RAM_SAVE_VERSION_ID 4 /* currently version 4 */
+
 #endif
diff --git a/vl.c b/vl.c
index 23ab3a3..62dc343 100644
--- a/vl.c
+++ b/vl.c
@@ -3436,8 +3436,8 @@ int main(int argc, char **argv, char **envp)
 default_drive(default_sdcard, snapshot, machine-use_scsi,
   IF_SD, 0, SD_OPTS);
 
-register_savevm_live(NULL, ram, 0, 4, NULL, ram_save_live, NULL,
- ram_load, NULL);
+register_savevm_live(NULL, ram, 0, RAM_SAVE_VERSION_ID, NULL,
+ ram_save_live, NULL, ram_load, NULL);
 
 if (nb_numa_nodes  0) {
 int i;
-- 
1.7.1.1



[PATCH v2 00/41] postcopy live migration

2012-06-04 Thread Isaku Yamahata
After a long time, we have v2. This is the qemu part.
The Linux kernel part is sent separately.

Changes v1 -> v2:
- split up patches for review
- buffered file refactored
- many bug fixes
  Especially PV drivers can work with postcopy
- optimization/heuristic

Patches
1 - 30: refactoring existing code and preparation
31 - 37: implement postcopy itself (essential part)
38 - 41: some optimization/heuristic for postcopy

Intro
=
This patch series implements postcopy live migration.[1]
As discussed at KVM Forum 2011, a dedicated character device is used for
distributed shared memory between the migration source and destination.
Now we can discuss/benchmark/compare it with precopy. I believe there is
much room for improvement.

[1] http://wiki.qemu.org/Features/PostCopyLiveMigration


Usage
=
You need to load the umem character device on the host before starting migration.
Postcopy can be used with the tcg and kvm accelerators. The implementation depends
only on the Linux umem character device, and the driver-dependent code is split
into its own file.
I tested only the host page size == guest page size case, but the implementation
allows host page size != guest page size.

The following options are added with this patch series.
- incoming part
  command line options
  -postcopy [-postcopy-flags flags]
  where flags is for changing behavior for benchmark/debugging
  Currently the following flags are available
  0: default
  1: enable touching page request

  example:
  qemu -postcopy -incoming tcp:0: -monitor stdio -machine accel=kvm

- outgoing part
  options for migrate command 
  migrate [-p [-n] [-m]] URI [<prefault forward> [<prefault backward>]]
  -p: indicate postcopy migration
  -n: disable background transfer of pages; this is for benchmarking/debugging
  -m: move background transfer of postcopy mode
  prefault forward: the number of forward pages which are sent along with an
   on-demand page
  prefault backward: the number of backward pages which are sent along with an
   on-demand page

  example:
  migrate -p -n tcp:<dest ip address>:
  migrate -p -n -m tcp:<dest ip address>: 32 0


TODO

- benchmark/evaluation. Especially how async page fault affects the result.
- improve/optimization
  At the moment at least what I'm aware of is
  - making incoming socket non-blocking with thread
As page compression is coming, it is impractical to do a non-blocking read
and check whether the necessary data has been read.
  - touching pages in incoming qemu process by fd handler seems suboptimal.
creating dedicated thread?
  - outgoing handler seems suboptimal causing latency.
- consider on FUSE/CUSE possibility
- don't fork umemd, but create thread?

basic postcopy work flow

qemu on the destination
  |
  V
open(/dev/umem)
  |
  V
UMEM_INIT
  |
  V
Here we have two file descriptors to
umem device and shmem file
  |
  |  umemd
  |  daemon on the destination
  |
  Vcreate pipe to communicate
fork()---,
  |  |
  V  |
close(socket)V
close(shmem)  mmap(shmem file)
  |  |
  V  V
mmap(umem device) for guest RAM   close(shmem file)
  |  |
close(umem device)   |
  |  |
  V  |
wait for ready from daemon pipe-send ready message
  |  |
  | Here the daemon takes over 
send okpipe--- the owner of the socket
  | to the source  
  V  |
entering post copy stage |
start guest execution|
  |  |
  V  V
access guest RAM  read() to get faulted pages
  |  |
  V  V
page fault --page offset is returned
block|
 V
  pull page from the source
  write the page contents

[PATCH v2 06/41] arch_init: refactor ram_save_block()

2012-06-04 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp

---
Changes v1 -> v2:
- don't dereference last_block, which can be NULL,
  and avoid a possible infinite loop.
---
 arch_init.c |   82 +-
 arch_init.h |1 +
 2 files changed, 48 insertions(+), 35 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 28e5abb..900cc8e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -154,6 +154,44 @@ static int is_dup_page(uint8_t *page)
 return 1;
 }
 
+static RAMBlock *last_block_sent = NULL;
+
+int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
+{
+MemoryRegion *mr = block-mr;
+uint8_t *p;
+int cont;
+
+if (!memory_region_get_dirty(mr, offset, TARGET_PAGE_SIZE,
+ DIRTY_MEMORY_MIGRATION)) {
+return 0;
+}
+memory_region_reset_dirty(mr, offset, TARGET_PAGE_SIZE,
+  DIRTY_MEMORY_MIGRATION);
+
+cont = (block == last_block_sent) ? RAM_SAVE_FLAG_CONTINUE : 0;
+p = memory_region_get_ram_ptr(mr) + offset;
+last_block_sent = block;
+
+if (is_dup_page(p)) {
+qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
+if (!cont) {
+qemu_put_byte(f, strlen(block-idstr));
+qemu_put_buffer(f, (uint8_t *)block-idstr, strlen(block-idstr));
+}
+qemu_put_byte(f, *p);
+return 1;
+}
+
+qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_PAGE);
+if (!cont) {
+qemu_put_byte(f, strlen(block-idstr));
+qemu_put_buffer(f, (uint8_t *)block-idstr, strlen(block-idstr));
+}
+qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
+return TARGET_PAGE_SIZE;
+}
+
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
@@ -162,45 +200,14 @@ int ram_save_block(QEMUFile *f)
 RAMBlock *block = last_block;
 ram_addr_t offset = last_offset;
 int bytes_sent = 0;
-MemoryRegion *mr;
 
-if (!block)
+if (!block) {
 block = QLIST_FIRST(ram_list.blocks);
+last_block = block;
+}
 
 do {
-mr = block-mr;
-if (memory_region_get_dirty(mr, offset, TARGET_PAGE_SIZE,
-DIRTY_MEMORY_MIGRATION)) {
-uint8_t *p;
-int cont = (block == last_block) ? RAM_SAVE_FLAG_CONTINUE : 0;
-
-memory_region_reset_dirty(mr, offset, TARGET_PAGE_SIZE,
-  DIRTY_MEMORY_MIGRATION);
-
-p = memory_region_get_ram_ptr(mr) + offset;
-
-if (is_dup_page(p)) {
-qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
-if (!cont) {
-qemu_put_byte(f, strlen(block-idstr));
-qemu_put_buffer(f, (uint8_t *)block-idstr,
-strlen(block-idstr));
-}
-qemu_put_byte(f, *p);
-bytes_sent = 1;
-} else {
-qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_PAGE);
-if (!cont) {
-qemu_put_byte(f, strlen(block-idstr));
-qemu_put_buffer(f, (uint8_t *)block-idstr,
-strlen(block-idstr));
-}
-qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
-bytes_sent = TARGET_PAGE_SIZE;
-}
-
-break;
-}
+bytes_sent = ram_save_page(f, block, offset);
 
 offset += TARGET_PAGE_SIZE;
 if (offset = block-length) {
@@ -209,6 +216,10 @@ int ram_save_block(QEMUFile *f)
 if (!block)
 block = QLIST_FIRST(ram_list.blocks);
 }
+
+if (bytes_sent  0) {
+break;
+}
 } while (block != last_block || offset != last_offset);
 
 last_block = block;
@@ -318,6 +329,7 @@ int ram_save_live(QEMUFile *f, int stage, void *opaque)
 if (stage == 1) {
 RAMBlock *block;
 bytes_transferred = 0;
+last_block_sent = NULL;
 last_block = NULL;
 last_offset = 0;
 sort_ram_list();
diff --git a/arch_init.h b/arch_init.h
index d84eac7..0a39082 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -40,6 +40,7 @@ int xen_available(void);
 #define RAM_SAVE_VERSION_ID 4 /* currently version 4 */
 
 #if defined(NEED_CPU_H)  !defined(CONFIG_USER_ONLY)
+int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 void *ram_load_host_from_stream_offset(QEMUFile *f,
ram_addr_t offset,
int flags,
-- 
1.7.1.1



Re: [PATCH v3] virtio_blk: unlock vblk-lock during kick

2012-06-04 Thread Michael S. Tsirkin
On Fri, Jun 01, 2012 at 10:13:06AM +0100, Stefan Hajnoczi wrote:
 Holding the vblk-lock across kick causes poor scalability in SMP
 guests.  If one CPU is doing virtqueue kick and another CPU touches the
 vblk-lock it will have to spin until virtqueue kick completes.
 
 This patch reduces system% CPU utilization in SMP guests that are
 running multithreaded I/O-bound workloads.  The improvements are small
 but show as iops and SMP are increased.
 
 Khoa Huynh k...@us.ibm.com provided initial performance data that
 indicates this optimization is worthwhile at high iops.
 
 Asias He as...@redhat.com reports the following fio results:
 
 Host: Linux 3.4.0+ #302 SMP x86_64 GNU/Linux
 Guest: same as host kernel
 
 Average 3 runs:
 with locked kick
 readiops=119907.50 bw=59954.00 runt=35018.50 io=2048.00
 write   iops=217187.00 bw=108594.00 runt=19312.00 io=2048.00
 readiops=33948.00 bw=16974.50 runt=186820.50 io=3095.70
 write   iops=35014.00 bw=17507.50 runt=181151.00 io=3095.70
 clat (usec) max=3484.10 avg=121085.38 stdev=174416.11 min=0.00
 clat (usec) max=3438.30 avg=59863.35 stdev=116607.69 min=0.00
 clat (usec) max=3745.65 avg=454501.30 stdev=332699.00 min=0.00
 clat (usec) max=4089.75 avg=442374.99 stdev=304874.62 min=0.00
 cpu sys=615.12 majf=24080.50 ctx=64253616.50 usr=68.08 minf=17907363.00
 cpu sys=1235.95 majf=23389.00 ctx=59788148.00 usr=98.34 minf=20020008.50
 cpu sys=764.96 majf=28414.00 ctx=848279274.00 usr=36.39 minf=19737254.00
 cpu sys=714.13 majf=21853.50 ctx=854608972.00 usr=33.56 minf=18256760.50
 
 with unlocked kick
 readiops=118559.00 bw=59279.66 runt=35400.66 io=2048.00
 write   iops=227560.00 bw=113780.33 runt=18440.00 io=2048.00
 readiops=34567.66 bw=17284.00 runt=183497.33 io=3095.70
 write   iops=34589.33 bw=17295.00 runt=183355.00 io=3095.70
 clat (usec) max=3485.56 avg=121989.58 stdev=197355.15 min=0.00
 clat (usec) max=3222.33 avg=57784.11 stdev=141002.89 min=0.00
 clat (usec) max=4060.93 avg=447098.65 stdev=315734.33 min=0.00
 clat (usec) max=3656.30 avg=447281.70 stdev=314051.33 min=0.00
 cpu sys=683.78 majf=24501.33 ctx=64435364.66 usr=68.91 minf=17907893.33
 cpu sys=1218.24 majf=25000.33 ctx=60451475.00 usr=101.04 minf=19757720.00
 cpu sys=740.39 majf=24809.00 ctx=845290443.66 usr=37.25 minf=19349958.33
 cpu sys=723.63 majf=27597.33 ctx=850199927.33 usr=35.35 minf=19092343.00
 
 FIO config file:
 
 [global]
 exec_prerun=echo 3  /proc/sys/vm/drop_caches
 group_reporting
 norandommap
 ioscheduler=noop
 thread
 bs=512
 size=4MB
 direct=1
 filename=/dev/vdb
 numjobs=256
 ioengine=aio
 iodepth=64
 loops=3
 
 Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 ---
 Other block drivers (cciss, rbd, nbd) use spin_unlock_irq() so I followed 
 that.
 To me this seems wrong: blk_run_queue() uses spin_lock_irqsave() but we enable
 irqs with spin_unlock_irq().  If the caller of blk_run_queue() had irqs
 disabled and we enable them again this could be a problem, right?  Can someone
 more familiar with kernel locking comment?
 
  drivers/block/virtio_blk.c |   10 --
  1 file changed, 8 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
 index 774c31d..d674977 100644
 --- a/drivers/block/virtio_blk.c
 +++ b/drivers/block/virtio_blk.c
 @@ -199,8 +199,14 @@ static void do_virtblk_request(struct request_queue *q)
   issued++;
   }
  
 - if (issued)
 - virtqueue_kick(vblk-vq);
 + if (!issued)
 + return;
 +
 + if (virtqueue_kick_prepare(vblk-vq)) {
 + spin_unlock_irq(vblk-disk-queue-queue_lock);
 + virtqueue_notify(vblk-vq);

If blk_done runs and completes the request at this point,
can hot unplug then remove the queue?
If yes will we get a use after free?

 + spin_lock_irq(vblk-disk-queue-queue_lock);
 + }
  }
  
  /* return id (s/n) string for *disk to *id_str
 -- 
 1.7.10
 
 ___
 Virtualization mailing list
 virtualizat...@lists.linux-foundation.org
 https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3] virtio_blk: unlock vblk-lock during kick

2012-06-04 Thread Michael S. Tsirkin
On Fri, Jun 01, 2012 at 10:13:06AM +0100, Stefan Hajnoczi wrote:
 Other block drivers (cciss, rbd, nbd) use spin_unlock_irq() so I followed 
 that.
 To me this seems wrong: blk_run_queue() uses spin_lock_irqsave() but we enable
 irqs with spin_unlock_irq().  If the caller of blk_run_queue() had irqs
 disabled and we enable them again this could be a problem, right?  Can someone
 more familiar with kernel locking comment?

Why take the risk?  What's the advantage of enabling them here? VCPU is
not running while the hypervisor is processing the notification anyway.
And the next line returns from the function so the interrupts will get
enabled.

  drivers/block/virtio_blk.c |   10 --
  1 file changed, 8 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
 index 774c31d..d674977 100644
 --- a/drivers/block/virtio_blk.c
 +++ b/drivers/block/virtio_blk.c
 @@ -199,8 +199,14 @@ static void do_virtblk_request(struct request_queue *q)
   issued++;
   }
  
 - if (issued)
 - virtqueue_kick(vblk-vq);
 + if (!issued)
 + return;
 +
 + if (virtqueue_kick_prepare(vblk-vq)) {
 + spin_unlock_irq(vblk-disk-queue-queue_lock);
 + virtqueue_notify(vblk-vq);
 + spin_lock_irq(vblk-disk-queue-queue_lock);
 + }
  }
  
  /* return id (s/n) string for *disk to *id_str
 -- 
 1.7.10
 
 ___
 Virtualization mailing list
 virtualizat...@lists.linux-foundation.org
 https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] KVM: Use IRQF_ONESHOT for assigned device MSI interrupts

2012-06-04 Thread Thomas Gleixner
On Sun, 3 Jun 2012, Avi Kivity wrote:

 On 06/01/2012 09:26 PM, Jan Kiszka wrote:
  
  you suggesting we need a request_edge_threaded_only_irq() API?  Thanks,
  
  I'm just wondering if that restriction for threaded IRQs is really
  necessary for all use cases we have. Threaded MSIs do not appear to me
  like have to be handled that conservatively, but maybe I'm missing some
  detail.
  
 
 btw, I'm hoping we can unthread assigned MSIs.  If the delivery is
 unicast, we can precalculate everything and all the handler has to do is
 set the IRR, KVM_REQ_EVENT, and kick the vcpu.  All of these can be done
 from interrupt context with just RCU locking.

There is really no need to run MSI/MSI-X interrupts threaded for
KVM. I've been running the patch below for quite some time and it works like
a charm.

Thanks,

tglx

Index: linux-2.6/virt/kvm/assigned-dev.c
===
--- linux-2.6.orig/virt/kvm/assigned-dev.c
+++ linux-2.6/virt/kvm/assigned-dev.c
@@ -105,7 +105,7 @@ static irqreturn_t kvm_assigned_dev_thre
 }
 
 #ifdef __KVM_HAVE_MSI
-static irqreturn_t kvm_assigned_dev_thread_msi(int irq, void *dev_id)
+static irqreturn_t kvm_assigned_dev_msi_handler(int irq, void *dev_id)
 {
struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
 
@@ -117,7 +117,7 @@ static irqreturn_t kvm_assigned_dev_thre
 #endif
 
 #ifdef __KVM_HAVE_MSIX
-static irqreturn_t kvm_assigned_dev_thread_msix(int irq, void *dev_id)
+static irqreturn_t kvm_assigned_dev_msix_handler(int irq, void *dev_id)
 {
struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
int index = find_index_from_host_irq(assigned_dev, irq);
@@ -346,9 +346,8 @@ static int assigned_device_enable_host_m
}
 
dev-host_irq = dev-dev-irq;
-   if (request_threaded_irq(dev-host_irq, NULL,
-kvm_assigned_dev_thread_msi, 0,
-dev-irq_name, dev)) {
+   if (request_irq(dev-host_irq, kvm_assigned_dev_msi_handler, 0,
+   dev-irq_name, dev)) {
pci_disable_msi(dev-dev);
return -EIO;
}
@@ -373,9 +372,9 @@ static int assigned_device_enable_host_m
return r;
 
for (i = 0; i  dev-entries_nr; i++) {
-   r = request_threaded_irq(dev-host_msix_entries[i].vector,
-NULL, kvm_assigned_dev_thread_msix,
-0, dev-irq_name, dev);
+   r = request_irq(dev-host_msix_entries[i].vector,
+   kvm_assigned_dev_msix_handler, 0,
+   dev-irq_name, dev);
if (r)
goto err;
}


Re: [PATCH] KVM: Use IRQF_ONESHOT for assigned device MSI interrupts

2012-06-04 Thread Jan Kiszka
On 2012-06-04 13:21, Thomas Gleixner wrote:
 On Sun, 3 Jun 2012, Avi Kivity wrote:
 
 On 06/01/2012 09:26 PM, Jan Kiszka wrote:

 you suggesting we need a request_edge_threaded_only_irq() API?  Thanks,

 I'm just wondering if that restriction for threaded IRQs is really
 necessary for all use cases we have. Threaded MSIs do not appear to me
 like have to be handled that conservatively, but maybe I'm missing some
 detail.


 btw, I'm hoping we can unthread assigned MSIs.  If the delivery is
 unicast, we can precalculate everything and all the handler has to do is
 set the IRR, KVM_REQ_EVENT, and kick the vcpu.  All of these can be done
 from interrupt context with just RCU locking.
 
 There is really no need to run MSI/MSI-X interrupts threaded for
 KVM. I'm running the patch below for quite some time and it works like
 a charm.
 
 Thanks,
 
   tglx
 
 Index: linux-2.6/virt/kvm/assigned-dev.c
 ===
 --- linux-2.6.orig/virt/kvm/assigned-dev.c
 +++ linux-2.6/virt/kvm/assigned-dev.c
 @@ -105,7 +105,7 @@ static irqreturn_t kvm_assigned_dev_thre
  }
  
  #ifdef __KVM_HAVE_MSI
 -static irqreturn_t kvm_assigned_dev_thread_msi(int irq, void *dev_id)
 +static irqreturn_t kvm_assigned_dev_msi_handler(int irq, void *dev_id)
  {
   struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
  
 @@ -117,7 +117,7 @@ static irqreturn_t kvm_assigned_dev_thre
  #endif
  
  #ifdef __KVM_HAVE_MSIX
 -static irqreturn_t kvm_assigned_dev_thread_msix(int irq, void *dev_id)
 +static irqreturn_t kvm_assigned_dev_msix_handler(int irq, void *dev_id)
  {
   struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
   int index = find_index_from_host_irq(assigned_dev, irq);
 @@ -346,9 +346,8 @@ static int assigned_device_enable_host_m
   }
  
   dev-host_irq = dev-dev-irq;
 - if (request_threaded_irq(dev-host_irq, NULL,
 -  kvm_assigned_dev_thread_msi, 0,
 -  dev-irq_name, dev)) {
 + if (request_irq(dev-host_irq, kvm_assigned_dev_msi_handler, 0,
 + dev-irq_name, dev)) {
   pci_disable_msi(dev-dev);
   return -EIO;
   }
 @@ -373,9 +372,9 @@ static int assigned_device_enable_host_m
   return r;
  
   for (i = 0; i  dev-entries_nr; i++) {
 - r = request_threaded_irq(dev-host_msix_entries[i].vector,
 -  NULL, kvm_assigned_dev_thread_msix,
 -  0, dev-irq_name, dev);
 + r = request_irq(dev-host_msix_entries[i].vector,
 + kvm_assigned_dev_msix_handler, 0,
 + dev-irq_name, dev);
   if (r)
   goto err;
   }

This may work in practice but has two conceptual problems:
 - we do not want a potential broadcast to all VCPUs to run in
   a host IRQ handler
 - crazy user space could have configured the route to end up in the
   PIC or IOAPIC, and both are not hard-IRQ safe (this should probably
   be caught on setup)

So this shortcut requires some checks before being applied to a specific
MSI/MSI-X vector.
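
A hypothetical sketch of the kind of setup-time check meant here; the helper
msi_route_targets_lapic_only() is assumed, not an existing KVM API, and the
handler names follow the existing code and the patch above:

static int example_enable_host_msi(struct kvm_assigned_dev_kernel *dev)
{
	int r;

	/* Only take the non-threaded fast path when the MSI route is known
	 * to target the in-kernel LAPIC directly, never the emulated
	 * PIC/IOAPIC, which take non-hardirq-safe locks. */
	if (msi_route_targets_lapic_only(dev->kvm, dev->guest_irq))
		r = request_irq(dev->host_irq, kvm_assigned_dev_msi_handler, 0,
				dev->irq_name, dev);
	else
		r = request_threaded_irq(dev->host_irq, NULL,
					 kvm_assigned_dev_thread_msi,
					 IRQF_ONESHOT, dev->irq_name, dev);
	return r;
}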


Leaving KVM aside, my general question remains whether threaded MSI handlers
of all devices really need to apply IRQF_ONESHOT even though they should have
no use for it.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


kvm segfaults and bad page state in 3.4.0

2012-06-04 Thread Fengguang Wu
Hi,

I'm running lots of kvm instances for kernel boot tests.
Unfortunately the test system itself is not stable enough; I got scary
errors in both kvm and the host kernel, like this:

[294025.795382] kvm used greatest stack depth: 2896 bytes left
[310388.622083] kvm[1864]: segfault at c ip 7f498e9f6a81 sp 
7f4994b9fca0 error 4 in kvm[7f498e96+33b000]
[310692.050589] kvm[4332]: segfault at 10 ip 7fca662620b9 sp 
7fca70472af0 error 6 in kvm[7fca661cc000+33b000]
[312608.950120] kvm[18931]: segfault at 8 ip 7f95962a10a5 sp 
7f959d777170 error 4 in kvm[7f959620b000+33b000]
[312622.941640] kvm[19123]: segfault at 10 ip 7f406f5580b9 sp 
7f4077d8b350 error 6 in kvm[7f406f4c2000+33b000]
[313917.860951] kvm[28789]: segfault at c ip 7f718f4dfa81 sp 
7f7198459520 error 4 in kvm[7f718f449000+33b000]
[313919.177192] kvm used greatest stack depth: 2864 bytes left
[314061.390945] kvm used greatest stack depth: 2208 bytes left
[327479.676068] BUG: Bad page state in process kvm  pfn:59ac9
[327479.676455] page:ea000166b240 count:0 mapcount:0 mapping:  
(null) index:0x7fd346bc6
[327479.677083] page flags: 0x114(referenced|dirty)
[327479.677575] Modules linked in:
[327479.677897] Pid: 11423, comm: kvm Not tainted 3.4.0 #131
[327479.678272] Call Trace:
[327479.678538]  [81107343] bad_page+0xe6/0xfb
[327479.678897]  [811079c6] get_page_from_freelist+0x534/0x6f6
[327479.679314]  [81107d92] __alloc_pages_nodemask+0x20a/0x75e
[327479.679729]  [8108e121] ? finish_task_switch+0x4c/0xf6
[327479.680136]  [81143477] ? lookup_page_cgroup_used+0xe/0x24
[327479.680548]  [811079b5] ? get_page_from_freelist+0x523/0x6f6
[327479.680970]  [811367c8] alloc_pages_current+0xd2/0xf3
[327479.681369]  [811012e4] __page_cache_alloc+0xa1/0xae
[327479.681761]  [8110b144] __do_page_cache_readahead+0x107/0x20b
[327479.682188]  [8110b0cc] ? __do_page_cache_readahead+0x8f/0x20b
[327479.682615]  [811293b0] ? anon_vma_prepare+0xb4/0x137
[327479.683010]  [8110b521] ra_submit+0x21/0x25
[327479.683375]  [81102f7a] filemap_fault+0x18a/0x383
[327479.683757]  [8111d6b3] __do_fault+0xc8/0x451
[327479.684128]  [81120103] handle_pte_fault+0x2de/0x844
[327479.684522]  [8114446e] ? mem_cgroup_count_vm_event+0x1a/0x96
[327479.684944]  [811218ac] handle_mm_fault+0x1a6/0x1bb
[327479.685339]  [819b8c12] do_page_fault+0x405/0x42a
[327479.685722]  [8112619e] ? do_mmap_pgoff+0x299/0x2f3
[327479.686115]  [813fe03d] ? trace_hardirqs_off_thunk+0x3a/0x3c
[327479.686534]  [819b5b45] page_fault+0x25/0x30
[327479.686898] Disabling lock debugging due to kernel taint

The same host kernel, in another test box:

[770644.256817] kvm_get_msr_common: 2123 callbacks suppressed
[770644.257475] kvm: 31889: cpu0 unhandled rdmsr: 0x2
[770644.258103] kvm: 31889: cpu0 unhandled rdmsr: 0x3
[770644.258707] kvm: 31889: cpu0 unhandled rdmsr: 0x4
[770644.259322] kvm: 31889: cpu0 unhandled rdmsr: 0x5
[770644.259914] kvm: 31889: cpu0 unhandled rdmsr: 0x6
[770644.260499] kvm: 31889: cpu0 unhandled rdmsr: 0x7
[770644.261108] kvm: 31889: cpu0 unhandled rdmsr: 0x8
[770644.261700] kvm: 31889: cpu0 unhandled rdmsr: 0x9
[770644.262302] kvm: 31889: cpu0 unhandled rdmsr: 0xa
[770644.262883] kvm: 31889: cpu0 unhandled rdmsr: 0xb
[909290.636655] kvm[31619]: segfault at 40 ip 7fcb3d8c4254 sp 
7fcb41bcaec0 error 4 in kvm[7fcb3d82e000+33b000]

Please drop me hints if I can help debugging it (a week later, after
returning from LinuxCon Japan), thank you.

Thanks,
Fengguang


[PATCH] KVM: do not iterate over all VMs in mmu_shrink()

2012-06-04 Thread Gleb Natapov
mmu_shrink() needlessly iterates over all VMs even though it will not
attempt to free mmu pages from more than one of them. Fix that, and also
check the used mmu pages count outside of the VM lock to skip inactive VMs faster.

Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c2fef8e..d1d477a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3945,7 +3945,6 @@ static void kvm_mmu_remove_some_alloc_mmu_pages(struct 
kvm *kvm,
 static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 {
struct kvm *kvm;
-   struct kvm *kvm_freed = NULL;
int nr_to_scan = sc-nr_to_scan;
 
if (nr_to_scan == 0)
@@ -3957,22 +3956,30 @@ static int mmu_shrink(struct shrinker *shrink, struct 
shrink_control *sc)
int idx;
LIST_HEAD(invalid_list);
 
+   /*
+* n_used_mmu_pages is accessed without holding kvm-mmu_lock
+* here. We may skip a VM instance errorneosly, but we do not
+* want to shrink a VM that only started to populate its MMU
+* anyway.
+*/
+   if (kvm-arch.n_used_mmu_pages  0) {
+   if (!nr_to_scan--)
+   break;
+   continue;
+   }
+
idx = srcu_read_lock(kvm-srcu);
spin_lock(kvm-mmu_lock);
-   if (!kvm_freed  nr_to_scan  0 
-   kvm-arch.n_used_mmu_pages  0) {
-   kvm_mmu_remove_some_alloc_mmu_pages(kvm,
-   invalid_list);
-   kvm_freed = kvm;
-   }
-   nr_to_scan--;
 
+   kvm_mmu_remove_some_alloc_mmu_pages(kvm, invalid_list);
kvm_mmu_commit_zap_page(kvm, invalid_list);
+
spin_unlock(kvm-mmu_lock);
srcu_read_unlock(kvm-srcu, idx);
+
+   list_move_tail(kvm-vm_list, vm_list);
+   break;
}
-   if (kvm_freed)
-   list_move_tail(kvm_freed-vm_list, vm_list);
 
raw_spin_unlock(kvm_lock);
 
--
Gleb.


Re: kvm segfaults and bad page state in 3.4.0

2012-06-04 Thread Gleb Natapov
On Mon, Jun 04, 2012 at 07:46:03PM +0800, Fengguang Wu wrote:
 Hi,
 
 I'm running lots of kvm instances for doing kernel boot tests.
 Unfortunately the test system itself is not stable enough, I got scary
 errors in both kvm and the host kernel. Like this. 
 
What do you mean by "in both kvm and the host kernel"? Do you have
similar oopses inside your guests? If yes, can you post one?

--
Gleb.


Re: [PATCH v2 00/41] postcopy live migration

2012-06-04 Thread Anthony Liguori

On 06/04/2012 05:57 PM, Isaku Yamahata wrote:

After the long time, we have v2. This is qemu part.
The linux kernel part is sent separatedly.

Changes v1 -  v2:
- split up patches for review
- buffered file refactored
- many bug fixes
   Espcially PV drivers can work with postcopy
- optimization/heuristic

Patches
1 - 30: refactoring exsiting code and preparation
31 - 37: implement postcopy itself (essential part)
38 - 41: some optimization/heuristic for postcopy

Intro
=
This patch series implements postcopy live migration.[1]
As discussed at KVM forum 2011, dedicated character device is used for
distributed shared memory between migration source and destination.
Now we can discuss/benchmark/compare with precopy. I believe there are
much rooms for improvement.

[1] http://wiki.qemu.org/Features/PostCopyLiveMigration


Usage
=
You need load umem character device on the host before starting migration.
Postcopy can be used for tcg and kvm accelarator. The implementation depend
on only linux umem character device. But the driver dependent code is split
into a file.
I tested only host page size == guest page size case, but the implementation
allows host page size != guest page size case.

The following options are added with this patch series.
- incoming part
   command line options
   -postcopy [-postcopy-flagsflags]
   where flags is for changing behavior for benchmark/debugging
   Currently the following flags are available
   0: default
   1: enable touching page request

   example:
   qemu -postcopy -incoming tcp:0: -monitor stdio -machine accel=kvm

- outging part
   options for migrate command
   migrate [-p [-n] [-m]] URI [prefault forward  [prefault backword]]
   -p: indicate postcopy migration
   -n: disable background transferring pages: This is for benchmark/debugging
   -m: move background transfer of postcopy mode
   prefault forward: The number of forward pages which is sent with on-demand
   prefault backward: The number of backward pages which is sent with
on-demand

   example:
   migrate -p -n tcp:dest ip address:
   migrate -p -n -m tcp:dest ip address: 32 0


TODO

- benchmark/evaluation. Especially how async page fault affects the result.


I don't mean to beat on a dead horse, but I really don't understand the point of 
postcopy migration other than the fact that it's possible.  It's a lot of code 
and a new ABI in an area where we already have too much difficulty maintaining 
our ABI.


Without a compelling real world case with supporting benchmarks for why we need 
postcopy and cannot improve precopy, I'm against merging this.


Regards,

Anthony Liguori



Re: [PATCH] KVM: Use IRQF_ONESHOT for assigned device MSI interrupts

2012-06-04 Thread Thomas Gleixner
On Mon, 4 Jun 2012, Jan Kiszka wrote:
 On 2012-06-04 13:21, Thomas Gleixner wrote:
 So this shortcut requires some checks before being applied to a specific
 MSI/MSI-X vector.
 
 
 Taking KVM aside, my general question remains if threaded MSI handlers
 of all devices really need to apply IRQF_ONESHOT though they should have
 no use for it.

In theory no, but we had more than one incident where threaded irqs
w/o a primary handler and w/o IRQF_ONESHOT led to full system
starvation. Linus requested this sanity check and I think it's sane
and required.

In fact it's a non issue for MSI. MSI uses handle_edge_irq which does
not mask the interrupt. IRQF_ONESHOT is a noop for that flow handler.

Thanks,

tglx


Re: [Qemu-devel] [PATCH v3 00/16] net: hub-based networking

2012-06-04 Thread Luiz Capitulino
On Mon, 04 Jun 2012 12:56:41 +0800
Anthony Liguori anth...@codemonkey.ws wrote:

 On 05/25/2012 08:53 PM, Luiz Capitulino wrote:
  On Fri, 25 May 2012 13:01:37 +0100
  Stefan Hajnoczistefa...@gmail.com  wrote:
 
  I agree it would be nice to drop entirely but I don't feel happy doing
  that to users who might have QEMU buried in scripts somewhere.  One
  day they upgrade packages and suddenly their stuff doesn't work
  anymore.
 
  This is very similar to kqemu and I don't think we regret having dropped it.
 
 You couldn't imagine the number of complaints I got from users about dropping 
 kqemu.  It caused me considerable pain.  Complaints ranged from down right 
 hostile (I had to involve the Launchpad admins at one point because of a 
 particular user) to entirely sympathetic.
 
 kqemu wasn't just a maintenance burden, it was preventing large guest memory 
 support in KVM guests.  There was no simple way around it without breaking 
 kqemu 
 ABI and making significant changes to the kqemu module.
 
 Dropping features is only something that should be approached lightly and 
 certainly not something that should be done just because you don't like a 
 particular bit of code.

It's not just because I don't like the code. Afaik, there are better external
tools that seem to do exactly the same thing (and even seem to do it better).

But as Markus said in the other thread, it's just advice, not strong objection.


Re: Fwd: [Qemu-devel] [PATCH v2 00/41] postcopy live migration

2012-06-04 Thread Isaku Yamahata
On Mon, Jun 04, 2012 at 05:01:30AM -0700, Chegu Vinod wrote:
 Hello Isaku Yamahata,

Hi.

 I just saw your patches..Would it be possible to email me a tar bundle of 
 these
 patches (makes it easier to apply the patches to a copy of the upstream 
 qemu.git)

I uploaded them to github for those who are interested in it.

git://github.com/yamahata/qemu.git qemu-postcopy-june-04-2012
git://github.com/yamahata/linux-umem.git  linux-umem-june-04-2012 


 BTW, I am also curious if you have considered using any kind of RDMA features 
 for
 optimizing the page-faults during postcopy ?

Yes, RDMA is an interesting topic. Can you share your use cases/concerns/issues
so that we can collaborate?
You may want to see Benoit's results. As far as I know, he has not published
his code yet.

thanks,

 Thanks
 Vinod



 --

 Message: 1
 Date: Mon,  4 Jun 2012 18:57:02 +0900
 From: Isaku Yamahatayamah...@valinux.co.jp
 To: qemu-de...@nongnu.org, kvm@vger.kernel.org
 Cc: benoit.hud...@gmail.com, aarca...@redhat.com, aligu...@us.ibm.com,
   quint...@redhat.com, stefa...@gmail.com, t.hirofu...@aist.go.jp,
   dl...@redhat.com, satoshi.i...@aist.go.jp,  
 mdr...@linux.vnet.ibm.com,
   yoshikawa.tak...@oss.ntt.co.jp, owass...@redhat.com, a...@redhat.com,
   pbonz...@redhat.com
 Subject: [Qemu-devel] [PATCH v2 00/41] postcopy live migration
 Message-ID:cover.1338802190.git.yamah...@valinux.co.jp

 After the long time, we have v2. This is qemu part.
 The linux kernel part is sent separatedly.

 Changes v1 -  v2:
 - split up patches for review
 - buffered file refactored
 - many bug fixes
   Espcially PV drivers can work with postcopy
 - optimization/heuristic

 Patches
 1 - 30: refactoring exsiting code and preparation
 31 - 37: implement postcopy itself (essential part)
 38 - 41: some optimization/heuristic for postcopy

 Intro
 =
 This patch series implements postcopy live migration.[1]
 As discussed at KVM forum 2011, dedicated character device is used for
 distributed shared memory between migration source and destination.
 Now we can discuss/benchmark/compare with precopy. I believe there are
 much rooms for improvement.

 [1] http://wiki.qemu.org/Features/PostCopyLiveMigration


 Usage
 =
 You need load umem character device on the host before starting migration.
 Postcopy can be used for tcg and kvm accelarator. The implementation depend
 on only linux umem character device. But the driver dependent code is split
 into a file.
 I tested only host page size == guest page size case, but the implementation
 allows host page size != guest page size case.

 The following options are added with this patch series.
 - incoming part
   command line options
   -postcopy [-postcopy-flagsflags]
   where flags is for changing behavior for benchmark/debugging
   Currently the following flags are available
   0: default
   1: enable touching page request

   example:
   qemu -postcopy -incoming tcp:0: -monitor stdio -machine accel=kvm

 - outging part
   options for migrate command
   migrate [-p [-n] [-m]] URI [prefault forward  [prefault backword]]
   -p: indicate postcopy migration
   -n: disable background transferring pages: This is for benchmark/debugging
   -m: move background transfer of postcopy mode
   prefault forward: The number of forward pages which is sent with on-demand
   prefault backward: The number of backward pages which is sent with
on-demand

   example:
   migrate -p -n tcp:dest ip address:
   migrate -p -n -m tcp:dest ip address: 32 0


 TODO
 
 - benchmark/evaluation. Especially how async page fault affects the result.
 - improve/optimization
   At the moment at least what I'm aware of is
   - making incoming socket non-blocking with thread
 As page compression is comming, it is impractical to non-blocking read
 and check if the necessary data is read.
   - touching pages in incoming qemu process by fd handler seems suboptimal.
 creating dedicated thread?
   - outgoing handler seems suboptimal causing latency.
 - consider on FUSE/CUSE possibility
 - don't fork umemd, but create thread?

 basic postcopy work flow
 
 qemu on the destination
   |
   V
 open(/dev/umem)
   |
   V
 UMEM_INIT
   |
   V
 Here we have two file descriptors to
 umem device and shmem file
   |
   |  umemd
   |  daemon on the destination
   |
   Vcreate pipe to communicate
 fork()---,
   |  |
   V  |
 close(socket)V
 close(shmem)  

Re: [PATCH] KVM: Use IRQF_ONESHOT for assigned device MSI interrupts

2012-06-04 Thread Jan Kiszka
On 2012-06-04 15:07, Thomas Gleixner wrote:
 On Mon, 4 Jun 2012, Jan Kiszka wrote:
 On 2012-06-04 13:21, Thomas Gleixner wrote:
 So this shortcut requires some checks before being applied to a specific
 MSI/MSI-X vector.


 Taking KVM aside, my general question remains if threaded MSI handlers
 of all devices really need to apply IRQF_ONESHOT though they should have
 no use for it.
 
 In theory no, but we had more than one incident, where threaded irqs
 w/o a primary handler and w/o IRQF_ONEHSOT lead to full system
 starvation. Linus requested this sanity check and I think it's sane
 and required.

OK.

 
 In fact it's a non issue for MSI. MSI uses handle_edge_irq which does
 not mask the interrupt. IRQF_ONESHOT is a noop for that flow handler.

Isn't irq_finalize_oneshot processed for all flows?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH] KVM: Use IRQF_ONESHOT for assigned device MSI interrupts

2012-06-04 Thread Thomas Gleixner
On Mon, 4 Jun 2012, Jan Kiszka wrote:

 On 2012-06-04 15:07, Thomas Gleixner wrote:
  On Mon, 4 Jun 2012, Jan Kiszka wrote:
  On 2012-06-04 13:21, Thomas Gleixner wrote:
  So this shortcut requires some checks before being applied to a specific
  MSI/MSI-X vector.
 
 
  Taking KVM aside, my general question remains if threaded MSI handlers
  of all devices really need to apply IRQF_ONESHOT though they should have
  no use for it.
  
  In theory no, but we had more than one incident, where threaded irqs
  w/o a primary handler and w/o IRQF_ONEHSOT lead to full system
  starvation. Linus requested this sanity check and I think it's sane
  and required.
 
 OK.
 
  
  In fact it's a non issue for MSI. MSI uses handle_edge_irq which does
  not mask the interrupt. IRQF_ONESHOT is a noop for that flow handler.
 
 Isn't irq_finalize_oneshot processes for all flows?

Right, forgot about that. The only way we can avoid that is to
get a hint from the underlying irq chip / handler setup with an extra
flag telling the core that it's safe to avoid the ONESHOT/finalize
magic.

Thanks,

tglx


Executing script in VM, from KVM

2012-06-04 Thread Michael Johns
Hi list,

I have recently started using KVM, and I'm looking to see how I might
be able to use and adapt KVM for my own purposes, to do things more
easily on some of the virtual machines I use.

To this end, I was wondering if it is possible to execute a script or
run a command or a process within the virtual machines from KVM/QEMU?

Thanks,

Michael


Re: [PATCH v2 00/41] postcopy live migration

2012-06-04 Thread Isaku Yamahata
On Mon, Jun 04, 2012 at 08:37:04PM +0800, Anthony Liguori wrote:
 On 06/04/2012 05:57 PM, Isaku Yamahata wrote:
 After the long time, we have v2. This is qemu part.
 The linux kernel part is sent separatedly.

 Changes v1 -  v2:
 - split up patches for review
 - buffered file refactored
 - many bug fixes
Espcially PV drivers can work with postcopy
 - optimization/heuristic

 Patches
 1 - 30: refactoring exsiting code and preparation
 31 - 37: implement postcopy itself (essential part)
 38 - 41: some optimization/heuristic for postcopy

 Intro
 =
 This patch series implements postcopy live migration.[1]
 As discussed at KVM forum 2011, dedicated character device is used for
 distributed shared memory between migration source and destination.
 Now we can discuss/benchmark/compare with precopy. I believe there are
 much rooms for improvement.

 [1] http://wiki.qemu.org/Features/PostCopyLiveMigration


 Usage
 =
 You need load umem character device on the host before starting migration.
 Postcopy can be used for tcg and kvm accelarator. The implementation depend
 on only linux umem character device. But the driver dependent code is split
 into a file.
 I tested only host page size == guest page size case, but the implementation
 allows host page size != guest page size case.

 The following options are added with this patch series.
 - incoming part
command line options
-postcopy [-postcopy-flagsflags]
where flags is for changing behavior for benchmark/debugging
Currently the following flags are available
0: default
1: enable touching page request

example:
qemu -postcopy -incoming tcp:0: -monitor stdio -machine accel=kvm

 - outging part
options for migrate command
migrate [-p [-n] [-m]] URI [prefault forward  [prefault backword]]
-p: indicate postcopy migration
-n: disable background transferring pages: This is for benchmark/debugging
-m: move background transfer of postcopy mode
prefault forward: The number of forward pages which is sent with 
 on-demand
prefault backward: The number of backward pages which is sent with
 on-demand

example:
migrate -p -n tcp:dest ip address:
migrate -p -n -m tcp:dest ip address: 32 0


 TODO
 
 - benchmark/evaluation. Especially how async page fault affects the result.

 I don't mean to beat on a dead horse, but I really don't understand the 
 point of postcopy migration other than the fact that it's possible.  It's 
 a lot of code and a new ABI in an area where we already have too much 
 difficulty maintaining our ABI.

 Without a compelling real world case with supporting benchmarks for why 
 we need postcopy and cannot improve precopy, I'm against merging this.

Some new results are available at 
https://events.linuxfoundation.org/images/stories/pdf/lcjp2012_yamahata_postcopy.pdf

Precopy assumes that the network bandwidth is wide enough and that the number
of dirty pages converges. That doesn't always hold true.

- planned migration
  Predictability of the total migration time is important.

- dynamic consolidation
  In cloud use cases, the resources of a physical machine are usually
  overcommitted.
  When a physical machine becomes overloaded, some VMs are moved to another
  physical host to balance the load.
  Precopy can't move VMs promptly, and compression makes things worse.

- inter data center migration
  With L2-over-L3 technology, it has become common to create a virtual
  data center that actually spans multiple physical data centers.
  It is useful to migrate VMs across physical data centers for disaster
  recovery.
  The network bandwidth between DCs is narrower than in the LAN case, so the
  precopy assumption wouldn't hold.

- In cases where network bandwidth is limited by QoS,
  the precopy assumption doesn't hold.
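
A small illustrative sketch (not from this series) of the convergence test
that precopy effectively relies on in the cases above; it mirrors the
expected_time vs. migrate_max_downtime() comparison in arch_init.c, but all
names here are hypothetical and a nonzero bandwidth estimate is assumed:

#include <stdbool.h>
#include <stdint.h>

static bool precopy_converges(uint64_t dirtied_bytes_per_iter,
                              uint64_t link_bytes_per_sec,
                              uint64_t max_downtime_ns)
{
    /* Time needed to resend what the guest dirtied during one pass.
     * If the guest dirties memory faster than the link can drain it,
     * this never drops below the allowed downtime and precopy never
     * finishes on its own. */
    uint64_t expected_ns =
        dirtied_bytes_per_iter * 1000000000ull / link_bytes_per_sec;

    return expected_ns <= max_downtime_ns;
}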


thanks,
-- 
yamahata


[PULL] qemu-kvm: merge with latest upstream

2012-06-04 Thread Jan Kiszka
The following changes since commit 75b4a94628c5035b5c350eff4a4edf923b5dc33a:

  pci-assign: Drop kvm_assigned_irq::host_irq initialization (2012-06-01 
20:51:10 -0300)

are available in the git repository at:
  git://git.kiszka.org/qemu-kvm.git queues/qemu-merge

This is the back-merge of post-1.1 upstream master with latest kvm
changes into qemu-kvm. It switches to upstream MSI/MSI-X handling and
replaces the MSI-X masking notifier with upstream's MSI vector notifier.
The switch of the latter feature is done in a single merge as creating
intermediate steps would have required the writing of quite a bit of
temporary code.

Please test and merge.

Anthony Liguori (4):
  Update version for 1.1.0 release
  Update version to open the 1.2 development branch
  Merge remote-tracking branch 'kwolf/for-anthony' into staging
  Merge remote-tracking branch 'qemu-kvm/uq/master' into staging

Avi Kivity (1):
  kvm: update vmxcap for EPT A/D, INVPCID, RDRAND, VMFUNC

Christian Borntraeger (1):
  virtio-blk: Fix geometry sector calculation

Daniel Verkamp (1):
  ahci: SATA FIS is 20 bytes, not 0x20

Jan Kiszka (33):
  Introduce MSIMessage structure
  kvm: Refactor KVMState::max_gsi to gsi_count
  kvm: Introduce basic MSI support for in-kernel irqchips
  pc: Enable MSI support at APIC level
  kvm: Update kernel headers
  kvm: Enable in-kernel irqchip support by default
  kvm: x86: Wire up MSI support for in-kernel irqchip
  kvm: Add support for direct MSI injections
  msix: Introduce vector notifiers
  kvm: Rename kvm_irqchip_add_route to kvm_irqchip_add_irq_route
  kvm: Introduce kvm_irqchip_add_msi_route
  kvm: Publicize kvm_irqchip_release_virq
  kvm: Make kvm_irqchip_commit_routes an internal service
  msix: Factor out msix_get_message
  msix: Invoke msix_handle_mask_update on msix_mask_all
  kvm: Introduce kvm_irqchip_add/remove_irqfd
  kvm: Enable use of kvm_irqchip_in_kernel in hwlib code
  msix: Add msix_nr_vectors_allocated
  virtio/vhost: Add support for KVM in-kernel MSI injection
  Merge commit '4e2e4e6355959a1af011167b0db5ac7ffd3adf94'
  Merge commit '04fa27f5ae5f025424bb7b88d3453c46e8900102'
  Merge commit '08a82ac01cb5409480128f8e1f144557d99b74a3'
  Merge commit 'ffb8d4296e01f0ead3ba81b08a34637c5bbff0da'
  Merge commit '4a3adebb1854d48f0c67958e164c6b2f29d44064'
  Merge commit '287d55c6769c3a38e9083b103cb148fb38858b3a' into 
queues/qemu-merge
  qemu-kvm: Drop redundant MSI hooks
  Merge commit 'bc4caf49c7fee6d1e063df32ca7554e5b98bfc89' into 
queues/qemu-merge
  Merge commit '5b5f1330da2d7e5b5cbde8c60738774b2bd8692f' into 
queues/qemu-merge
  Merge commit '7d37d351dffee60fc7048bbfd8573421f15eb724' into 
queues/qemu-merge
  Merge commit '8cc9b43f7c5f826b39af4b012ad89bb55faac29c' into 
queues/qemu-merge
  qemu-kvm: Drop redundant MSI-X hooks
  qemu-kvm: Drop unused kvm_msi_message_* services
  qemu-kvm: Remove unused kvm stubs

Jason Wang (1):
  Revert rtl8139: do the network/host communication only in normal 
operating mode

Jim Meyering (1):
  block: prevent snapshot mode $TMPDIR symlink attack

MORITA Kazutaka (1):
  sheepdog: fix return value of do_load_save_vm_state

Peter A. G. Crosthwaite (1):
  target-microblaze: lwx/swx: first implementation

Stefan Weil (1):
  virtio: Fix compiler warning for non Linux hosts

 VERSION   |2 +-
 block/sheepdog.c  |   10 +-
 hw/apic.c |3 +
 hw/device-assignment.c|   18 +--
 hw/ide/ahci.c |4 +-
 hw/kvm/apic.c |   34 +-
 hw/msi.c  |  127 
 hw/msi.h  |6 +-
 hw/msix.c |  263 ++---
 hw/msix.h |8 +-
 hw/pc.c   |   13 --
 hw/pc_piix.c  |   14 +--
 hw/pci.c  |1 -
 hw/pci.h  |   25 +---
 hw/virtio-blk.c   |   21 +++-
 hw/virtio-pci.c   |  132 ++---
 hw/virtio-pci.h   |6 +
 hw/xen.h  |   10 --
 hw/xen_apic.c |5 +
 kvm-all.c |  253 +++-
 kvm-stub.c|   35 ++
 kvm.h |   36 ++
 linux-headers/linux/kvm.h |   38 ++
 qemu-common.h |1 +
 qemu-kvm.c|   88 +-
 scripts/kvm/vmxcap|   13 ++
 target-microblaze/cpu.c   |1 +
 target-microblaze/cpu.h   |4 +
 target-microblaze/helper.c|2 +
 target-microblaze/translate.c |   62 +-
 30 files changed, 648 insertions(+), 587 deletions(-)

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: Biweekly KVM Test report, kernel 51bfd299... qemu a1fce560...

2012-06-04 Thread Kevin Wolf
Am 01.06.2012 10:31, schrieb Kevin Wolf:
 Am 01.06.2012 09:57, schrieb Ren, Yongjie:
 -Original Message-
 From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
 Sent: Thursday, May 31, 2012 4:28 AM
 To: Ren, Yongjie
 Cc: Kevin Wolf; Avi Kivity; kvm@vger.kernel.org; Liu, RongrongX
 Subject: Re: Biweekly KVM Test report, kernel 51bfd299... qemu
 a1fce560...

 On Tue, May 22, 2012 at 07:40:29AM +, Ren, Yongjie wrote:
 -Original Message-
 From: Kevin Wolf [mailto:kw...@redhat.com]
 Sent: Monday, May 21, 2012 11:30 PM
 To: Ren, Yongjie
 Cc: Avi Kivity; kvm@vger.kernel.org; Liu, RongrongX
 Subject: Re: Biweekly KVM Test report, kernel 51bfd299... qemu
 a1fce560...

 Am 21.05.2012 11:45, schrieb Ren, Yongjie:
 -Original Message-
 From: Kevin Wolf [mailto:kw...@redhat.com]
 Sent: Monday, May 21, 2012 5:05 PM
 To: Avi Kivity
 Cc: Ren, Yongjie; kvm@vger.kernel.org
 Subject: Re: Biweekly KVM Test report, kernel 51bfd299... qemu
 a1fce560...

 Am 21.05.2012 10:27, schrieb Avi Kivity:
 On 05/21/2012 06:34 AM, Ren, Yongjie wrote:
 Hi All,

 This is KVM upstream test result against kvm.git
 51bfd2998113e1f8ce8dcf853407b76a04b5f2a0 based on kernel
 3.4.0-rc7,
 and qemu-kvm.git a1fce560c0e5f287ed65d2aaadb3e59578aaa983.

 We found 1 new bug and 1 bug got fixed in the past two weeks.

 New issue (1):
 1. disk error when guest boot up via qcow2 image
   https://bugs.launchpad.net/qemu/+bug/1002121
   -- Should be a regression on qemu-kvm.


 Kevin, is this the known regression in qcow2 or something new?

 If the commit ID is right, it must be something new. The regression
 that
 Marcelo found was fixed in 54e68143.

 Yes, it's right. This should be a new regression.
 I looked at the comment of 54e68143, and found it was not related
 the
 issue I reported.

 The Launchpad bug refers to commit e54f008ef, which doesn't
 include
 this
 fix indeed. So was the test repeated with a more current qemu-kvm
 version after filing the bug in Launchpad, or is the commit ID in this
 mail wrong?

 Latest commit 3fd9fedb in qemu-kvm master tree still has this issue.
 And, the commit ID provided in Launchpad is correct.

 Can you please check if the bug exists in upstream qemu.git as well?

 This bug doesn't exist on upstream qemu.git with latest commit:
 fd4567d9.
 So, it should only exists on qemu-kvm tree.

 Please bisect manually (not using git bisect), with the attached list of
 commits. These are the qemu -> qemu-kvm merge commits in the range
 described as bad/good.

 The 1st bad commit in your attached list is abc551bd
 More detailed info:
 171d2f2249a360d7d623130d3aa991418c53716d   good
 fd453a24166e36a3d376c9bc221e520e3ee425afgood
 abc551bd456cf0407fa798395d83dc5aa35f6dbb bad
 823ccf41509baa197dd6a3bef63837a6cf101ad8 bad
 
 Thanks, this points to the qcow2 v3 changes. Let's try to find the exact
 culprit. I have rebased the qcow2 patches on top of that good merge
 (fd453a24). Please apply the attached mbox on top of this merge:
 
 git checkout fd453a24
 git am qcow2v3.mbox
 
 You can then start a git bisect between your new HEAD and fd453a24.

Another thing you could try is if reverting e82dabd and bef0fd59 fixes
the problem for you.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: [Qemu-devel] [PATCH v2 00/41] postcopy live migration

2012-06-04 Thread Chegu Vinod

On 6/4/2012 6:13 AM, Isaku Yamahata wrote:

On Mon, Jun 04, 2012 at 05:01:30AM -0700, Chegu Vinod wrote:

Hello Isaku Yamahata,

Hi.


I just saw your patches..Would it be possible to email me a tar bundle of these
patches (makes it easier to apply the patches to a copy of the upstream 
qemu.git)

I uploaded them to github for those who are interested in it.

git://github.com/yamahata/qemu.git qemu-postcopy-june-04-2012
git://github.com/yamahata/linux-umem.git  linux-umem-june-04-2012



Thanks for the pointer...

BTW, I am also curious if you have considered using any kind of RDMA features 
for
optimizing the page-faults during postcopy ?

Yes, RDMA is interesting topic. Can we share your use case/concern/issues?



Looking at large sized guests (256GB and higher)  running cpu/memory 
intensive enterprise workloads.
The  concerns are the same...i.e. having a predictable total migration 
time, minimal downtime/freeze-time and of course minimal service 
degradation to the workload(s) in the VM or the co-located VM's...


How large of a guest have you tested your changes with and what kind of 
workloads have you used so far ?



Thus we can collaborate.
You may want to see Benoit's results.


Yes, I have already seen some of Benoit's results.

Hence the question about use of RDMA techniques for post copy.


As far as I know, he has not published
his code yet.


Thanks
Vinod



thanks,


Thanks
Vinod



--

Message: 1
Date: Mon,  4 Jun 2012 18:57:02 +0900
From: Isaku Yamahatayamah...@valinux.co.jp
To: qemu-de...@nongnu.org, kvm@vger.kernel.org
Cc: benoit.hud...@gmail.com, aarca...@redhat.com, aligu...@us.ibm.com,
quint...@redhat.com, stefa...@gmail.com, t.hirofu...@aist.go.jp,
dl...@redhat.com, satoshi.i...@aist.go.jp,  
mdr...@linux.vnet.ibm.com,
yoshikawa.tak...@oss.ntt.co.jp, owass...@redhat.com, a...@redhat.com,
pbonz...@redhat.com
Subject: [Qemu-devel] [PATCH v2 00/41] postcopy live migration
Message-ID:cover.1338802190.git.yamah...@valinux.co.jp

After a long time, we have v2. This is the qemu part.
The linux kernel part is sent separately.

Changes v1 -> v2:
- split up patches for review
- buffered file refactored
- many bug fixes
   Especially PV drivers can work with postcopy
- optimization/heuristic

Patches
1 - 30: refactoring existing code and preparation
31 - 37: implement postcopy itself (essential part)
38 - 41: some optimization/heuristic for postcopy

Intro
=
This patch series implements postcopy live migration.[1]
As discussed at KVM forum 2011, dedicated character device is used for
distributed shared memory between migration source and destination.
Now we can discuss/benchmark/compare with precopy. I believe there is
much room for improvement.

[1] http://wiki.qemu.org/Features/PostCopyLiveMigration


Usage
=
You need to load the umem character device on the host before starting migration.
Postcopy can be used for the tcg and kvm accelerators. The implementation depends
only on the linux umem character device, but the driver-dependent code is split
into a separate file.
I tested only the host page size == guest page size case, but the implementation
allows the host page size != guest page size case.

The following options are added with this patch series.
- incoming part
   command line options
   -postcopy [-postcopy-flags <flags>]
   where flags is for changing behavior for benchmark/debugging
   Currently the following flags are available
   0: default
   1: enable touching page request

   example:
   qemu -postcopy -incoming tcp:0: -monitor stdio -machine accel=kvm

- outgoing part
   options for migrate command
   migrate [-p [-n] [-m]] URI [<prefault forward> [<prefault backward>]]
   -p: indicate postcopy migration
   -n: disable background transferring pages: This is for benchmark/debugging
   -m: move background transfer of postcopy mode
   prefault forward: The number of forward pages which is sent with on-demand
   prefault backward: The number of backward pages which is sent with
on-demand

   example:
   migrate -p -n tcp:dest ip address:
   migrate -p -n -m tcp:dest ip address: 32 0


TODO

- benchmark/evaluation. Especially how async page fault affects the result.
- improve/optimization
   At the moment at least what I'm aware of is
   - making incoming socket non-blocking with thread
 As page compression is coming, it is impractical to do a non-blocking read
 and check if the necessary data has been read.
   - touching pages in incoming qemu process by fd handler seems suboptimal.
 creating dedicated thread?
   - outgoing handler seems suboptimal causing latency.
- consider on FUSE/CUSE possibility
- don't fork umemd, but create thread?

basic postcopy work flow

 qemu on the destination
   |
   V
 open(/dev/umem)
   |
   V
 UMEM_INIT
   |
 

[Bug 43339] New: Wrong Pci-Bridge Header Type check.

2012-06-04 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=43339

   Summary: Wrong Pci-Bridge Header Type check.
   Product: Virtualization
   Version: unspecified
Kernel Version: 3.4
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: kvm
AssignedTo: virtualization_...@kernel-bugs.osdl.org
ReportedBy: ve...@ispras.ru
Regression: No


I have found a bug in file virt/kvm/assigned-device.c

670 /* Don't allow bridges to be assigned */
671 pci_read_config_byte(dev, PCI_HEADER_TYPE, &header_type);
672 if ((header_type & PCI_HEADER_TYPE) != PCI_HEADER_TYPE_NORMAL) {
673 r = -EPERM;
674 goto out_put;
675 }

This code doesn't correctly check whether the device is a PCI bridge. In my case:

header_type is 1, the default value for a PCI bridge
PCI_HEADER_TYPE is 14 (0xE)
PCI_HEADER_TYPE_NORMAL is 0

So, 1 & 0xE == 0, thus KVM assigns the PCI bridge device to the VM successfully.
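
For reference, here is a small standalone sketch of the failure mode and of one
possible corrected check, assuming the intent is to compare the header type
field (bits 0-6; bit 7 is the multi-function flag) against
PCI_HEADER_TYPE_NORMAL instead of masking with the register offset. This is
only an illustration of the report, not the actual kernel fix:

#include <stdio.h>

#define PCI_HEADER_TYPE         0x0e  /* config space offset, not a mask */
#define PCI_HEADER_TYPE_NORMAL  0
#define PCI_HEADER_TYPE_BRIDGE  1

int main(void)
{
	unsigned char header_type = PCI_HEADER_TYPE_BRIDGE; /* what a P2P bridge reports */

	/* Buggy check from the report: masking with the register offset. */
	if ((header_type & PCI_HEADER_TYPE) != PCI_HEADER_TYPE_NORMAL)
		printf("buggy check: bridge rejected\n");
	else
		printf("buggy check: bridge wrongly accepted (1 & 0x0e == 0)\n");

	/* Corrected check: mask off only the multi-function bit (0x80). */
	if ((header_type & 0x7f) != PCI_HEADER_TYPE_NORMAL)
		printf("fixed check: bridge rejected as expected\n");

	return 0;
}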

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] kvm tools: Increase AIO_MAX to 256

2012-06-04 Thread Asias He
The queue size for virtio_blk is 256 and AIO_MAX is 32, so we might run
short of available aio events if the guest issues > 32 requests
simultaneously. The following error is observed when the guest runs a stressed
I/O workload.

  Info: disk_image__read error: total=-11

To fix this, let's increase the aio events limit.

Signed-off-by: Asias He asias.he...@gmail.com
---
 tools/kvm/disk/core.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/kvm/disk/core.c b/tools/kvm/disk/core.c
index ed338e7..d1d2d59 100644
--- a/tools/kvm/disk/core.c
+++ b/tools/kvm/disk/core.c
@@ -6,7 +6,7 @@
 #include <sys/eventfd.h>
 #include <sys/poll.h>
 
-#define AIO_MAX 32
+#define AIO_MAX 256
 
 int debug_iodelay;
 
-- 
1.7.10.2
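
For context, a minimal standalone sketch of the relationship between this
constant and the kernel AIO context: the nr_events value passed to io_setup()
bounds how many requests can be in flight, and submissions beyond that can fail
with -EAGAIN. The names below are illustrative, not taken from tools/kvm; build
with -laio.

#include <libaio.h>
#include <stdio.h>

#define AIO_MAX 256

int main(void)
{
	io_context_t ctx = 0;
	int ret;

	/* Reserve room for up to AIO_MAX in-flight requests; io_submit()
	 * on this context can return -EAGAIN once the slots are used up. */
	ret = io_setup(AIO_MAX, &ctx);
	if (ret < 0) {
		fprintf(stderr, "io_setup failed: %d\n", ret);
		return 1;
	}
	io_destroy(ctx);
	return 0;
}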

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] kvm tools: Restart io_submit if it returns EAGAIN

2012-06-04 Thread Asias He
Keep trying if io_submit returns EAGAIN. No need to fail the request.

Signed-off-by: Asias He asias.he...@gmail.com
---
 tools/kvm/util/read-write.c |   16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/util/read-write.c b/tools/kvm/util/read-write.c
index 55473ba..44709df 100644
--- a/tools/kvm/util/read-write.c
+++ b/tools/kvm/util/read-write.c
@@ -322,23 +322,33 @@ int aio_pwritev(io_context_t ctx, struct iocb *iocb, int 
fd, const struct iovec
off_t offset, int ev, void *param)
 {
struct iocb *ios[1] = { iocb };
+   int ret;
 
io_prep_pwritev(iocb, fd, iov, iovcnt, offset);
io_set_eventfd(iocb, ev);
iocb->data = param;
 
-   return io_submit(ctx, 1, ios);
+restart:
+   ret = io_submit(ctx, 1, ios);
+   if (ret == -EAGAIN)
+   goto restart;
+   return ret;
 }
 
 int aio_preadv(io_context_t ctx, struct iocb *iocb, int fd, const struct iovec 
*iov, int iovcnt,
off_t offset, int ev, void *param)
 {
struct iocb *ios[1] = { iocb };
+   int ret;
 
io_prep_preadv(iocb, fd, iov, iovcnt, offset);
io_set_eventfd(iocb, ev);
iocb->data = param;
 
-   return io_submit(ctx, 1, ios);
+restart:
+   ret = io_submit(ctx, 1, ios);
+   if (ret == -EAGAIN)
+   goto restart;
+   return ret;
 }
-#endif
\ No newline at end of file
+#endif
-- 
1.7.10.2
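
As an aside, the same restart-on-EAGAIN idea written as a generic,
self-contained helper (purely illustrative, not part of the tools/kvm API). A
busy loop like this can spin while the context stays full, which is why raising
AIO_MAX in the previous patch keeps such retries rare:

#include <errno.h>
#include <libaio.h>

/* Resubmit until the kernel accepts the request; io_submit() returns
 * -EAGAIN while the AIO context has no free slots. */
int io_submit_retry(io_context_t ctx, long nr, struct iocb **ios)
{
	int ret;

	do {
		ret = io_submit(ctx, nr, ios);
	} while (ret == -EAGAIN);

	return ret;
}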

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: [Qemu-devel] [PATCH v2 00/41] postcopy live migration

2012-06-04 Thread Isaku Yamahata
On Mon, Jun 04, 2012 at 07:27:25AM -0700, Chegu Vinod wrote:
 On 6/4/2012 6:13 AM, Isaku Yamahata wrote:
 On Mon, Jun 04, 2012 at 05:01:30AM -0700, Chegu Vinod wrote:
 Hello Isaku Yamahata,
 Hi.

 I just saw your patches..Would it be possible to email me a tar bundle of 
 these
 patches (makes it easier to apply the patches to a copy of the upstream 
 qemu.git)
 I uploaded them to github for those who are interested in it.

 git://github.com/yamahata/qemu.git qemu-postcopy-june-04-2012
 git://github.com/yamahata/linux-umem.git  linux-umem-june-04-2012


 Thanks for the pointer...
 BTW, I am also curious if you have considered using any kind of RDMA 
 features for
 optimizing the page-faults during postcopy ?
 Yes, RDMA is interesting topic. Can we share your use case/concern/issues?


 Looking at large sized guests (256GB and higher)  running cpu/memory  
 intensive enterprise workloads.
 The  concerns are the same...i.e. having a predictable total migration  
 time, minimal downtime/freeze-time and of course minimal service  
 degradation to the workload(s) in the VM or the co-located VM's...

 How large of a guest have you tested your changes with and what kind of  
 workloads have you used so far ?

Only up to a several-GB VM. Of course we'd like to benchmark with a really
huge VM (several hundred GB), but it's somewhat difficult.


 Thus we can collaborate.
 You may want to see Benoit's results.

 Yes, I have already seen some of Benoit's results.

Great.

 Hence the question about use of RDMA techniques for post copy.

So far my implementation doesn't used RDMA.

 As far as I know, he has not published
 his code yet.

 Thanks
 Vinod


 thanks,

 Thanks
 Vinod



 --

 Message: 1
 Date: Mon,  4 Jun 2012 18:57:02 +0900
 From: Isaku Yamahatayamah...@valinux.co.jp
 To: qemu-de...@nongnu.org, kvm@vger.kernel.org
 Cc: benoit.hud...@gmail.com, aarca...@redhat.com, aligu...@us.ibm.com,
 quint...@redhat.com, stefa...@gmail.com, t.hirofu...@aist.go.jp,
 dl...@redhat.com, satoshi.i...@aist.go.jp,  
 mdr...@linux.vnet.ibm.com,
 yoshikawa.tak...@oss.ntt.co.jp, owass...@redhat.com, a...@redhat.com,
 pbonz...@redhat.com
 Subject: [Qemu-devel] [PATCH v2 00/41] postcopy live migration
 Message-ID:cover.1338802190.git.yamah...@valinux.co.jp

 After a long time, we have v2. This is the qemu part.
 The linux kernel part is sent separately.

 Changes v1 -> v2:
 - split up patches for review
 - buffered file refactored
 - many bug fixes
    Especially PV drivers can work with postcopy
 - optimization/heuristic

 Patches
 1 - 30: refactoring existing code and preparation
 31 - 37: implement postcopy itself (essential part)
 38 - 41: some optimization/heuristic for postcopy

 Intro
 =
 This patch series implements postcopy live migration.[1]
 As discussed at KVM forum 2011, dedicated character device is used for
 distributed shared memory between migration source and destination.
 Now we can discuss/benchmark/compare with precopy. I believe there is
 much room for improvement.

 [1] http://wiki.qemu.org/Features/PostCopyLiveMigration


 Usage
 =
 You need to load the umem character device on the host before starting migration.
 Postcopy can be used for the tcg and kvm accelerators. The implementation depends
 only on the linux umem character device, but the driver-dependent code is split
 into a separate file.
 I tested only the host page size == guest page size case, but the implementation
 allows the host page size != guest page size case.

 The following options are added with this patch series.
 - incoming part
command line options
    -postcopy [-postcopy-flags <flags>]
where flags is for changing behavior for benchmark/debugging
Currently the following flags are available
0: default
1: enable touching page request

example:
qemu -postcopy -incoming tcp:0: -monitor stdio -machine accel=kvm

 - outgoing part
options for migrate command
    migrate [-p [-n] [-m]] URI [<prefault forward> [<prefault backward>]]
-p: indicate postcopy migration
-n: disable background transferring pages: This is for 
 benchmark/debugging
-m: move background transfer of postcopy mode
prefault forward: The number of forward pages which is sent with 
 on-demand
prefault backward: The number of backward pages which is sent with
 on-demand

example:
migrate -p -n tcp:dest ip address:
migrate -p -n -m tcp:dest ip address: 32 0


 TODO
 
 - benchmark/evaluation. Especially how async page fault affects the result.
 - improve/optimization
At the moment at least what I'm aware of is
- making incoming socket non-blocking with thread
  As page compression is coming, it is impractical to do a non-blocking read
  and check if the necessary data has been read.
- touching pages in incoming qemu process by fd handler seems suboptimal.
  creating dedicated thread?
- outgoing handler seems 

[PATCH] kvm tools: Simplify disk read write function name

2012-06-04 Thread Asias He
We read and write in sectors by default. It makes little sense to add
the extra _sector suffix to the read and write ops/function names.

Signed-off-by: Asias He asias.he...@gmail.com
---
 tools/kvm/disk/blk.c   |4 ++--
 tools/kvm/disk/core.c  |8 
 tools/kvm/disk/qcow.c  |   12 ++--
 tools/kvm/disk/raw.c   |   18 +-
 tools/kvm/include/kvm/disk-image.h |   16 
 5 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/tools/kvm/disk/blk.c b/tools/kvm/disk/blk.c
index 1b06155..cf853c1 100644
--- a/tools/kvm/disk/blk.c
+++ b/tools/kvm/disk/blk.c
@@ -7,8 +7,8 @@
  * raw image and blk dev are similar, so reuse raw image ops.
  */
 static struct disk_image_operations blk_dev_ops = {
-   .read_sector= raw_image__read_sector,
-   .write_sector   = raw_image__write_sector,
+   .read   = raw_image__read,
+   .write  = raw_image__write,
 };
 
 static bool is_mounted(struct stat *st)
diff --git a/tools/kvm/disk/core.c b/tools/kvm/disk/core.c
index d1d2d59..e214359 100644
--- a/tools/kvm/disk/core.c
+++ b/tools/kvm/disk/core.c
@@ -193,8 +193,8 @@ ssize_t disk_image__read(struct disk_image *disk, u64 
sector, const struct iovec
if (debug_iodelay)
msleep(debug_iodelay);
 
-   if (disk->ops->read_sector) {
-   total = disk->ops->read_sector(disk, sector, iov, iovcount, 
param);
+   if (disk->ops->read) {
+   total = disk->ops->read(disk, sector, iov, iovcount, param);
 if (total < 0) {
 pr_info("disk_image__read error: total=%ld\n", 
 (long)total);
return total;
@@ -221,12 +221,12 @@ ssize_t disk_image__write(struct disk_image *disk, u64 
sector, const struct iove
if (debug_iodelay)
msleep(debug_iodelay);
 
-   if (disk->ops->write_sector) {
+   if (disk->ops->write) {
 /*
  * Try writev based operation first
  */

-   total = disk->ops->write_sector(disk, sector, iov, iovcount, 
param);
+   total = disk->ops->write(disk, sector, iov, iovcount, param);
 if (total < 0) {
 pr_info("disk_image__write error: total=%ld\n", 
 (long)total);
return total;
diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c
index 23f11f2..b3221c4 100644
--- a/tools/kvm/disk/qcow.c
+++ b/tools/kvm/disk/qcow.c
@@ -1180,15 +1180,15 @@ static int qcow_disk_close(struct disk_image *disk)
 }
 
 static struct disk_image_operations qcow_disk_readonly_ops = {
-   .read_sector= qcow_read_sector,
-   .close  = qcow_disk_close,
+   .read   = qcow_read_sector,
+   .close  = qcow_disk_close,
 };
 
 static struct disk_image_operations qcow_disk_ops = {
-   .read_sector= qcow_read_sector,
-   .write_sector   = qcow_write_sector,
-   .flush  = qcow_disk_flush,
-   .close  = qcow_disk_close,
+   .read   = qcow_read_sector,
+   .write  = qcow_write_sector,
+   .flush  = qcow_disk_flush,
+   .close  = qcow_disk_close,
 };
 
 static int qcow_read_refcount_table(struct qcow *q)
diff --git a/tools/kvm/disk/raw.c b/tools/kvm/disk/raw.c
index d2df814..42ca9f1 100644
--- a/tools/kvm/disk/raw.c
+++ b/tools/kvm/disk/raw.c
@@ -6,7 +6,7 @@
 #include libaio.h
 #endif
 
-ssize_t raw_image__read_sector(struct disk_image *disk, u64 sector, const 
struct iovec *iov,
+ssize_t raw_image__read(struct disk_image *disk, u64 sector, const struct 
iovec *iov,
int iovcount, void *param)
 {
u64 offset = sector << SECTOR_SHIFT;
@@ -21,7 +21,7 @@ ssize_t raw_image__read_sector(struct disk_image *disk, u64 
sector, const struct
 #endif
 }
 
-ssize_t raw_image__write_sector(struct disk_image *disk, u64 sector, const 
struct iovec *iov,
+ssize_t raw_image__write(struct disk_image *disk, u64 sector, const struct 
iovec *iov,
int iovcount, void *param)
 {
u64 offset = sector << SECTOR_SHIFT;
@@ -36,7 +36,7 @@ ssize_t raw_image__write_sector(struct disk_image *disk, u64 
sector, const struc
 #endif
 }
 
-ssize_t raw_image__read_sector_mmap(struct disk_image *disk, u64 sector, const 
struct iovec *iov,
+ssize_t raw_image__read_mmap(struct disk_image *disk, u64 sector, const struct 
iovec *iov,
int iovcount, void *param)
 {
u64 offset = sector << SECTOR_SHIFT;
@@ -54,7 +54,7 @@ ssize_t raw_image__read_sector_mmap(struct disk_image *disk, 
u64 sector, const s
return total;
 }
 
-ssize_t raw_image__write_sector_mmap(struct disk_image *disk, u64 sector, 
const struct iovec *iov,
+ssize_t raw_image__write_mmap(struct disk_image *disk, u64 sector, const 
struct iovec *iov,
int iovcount, void *param)
 {
u64 offset = sector  

[PATCH] kvm tools: Process virtio blk requests in separate thread

2012-06-04 Thread Asias He
All blk requests are processed in notify_vq(), which runs in the context of
the ioeventfd thread: ioeventfd__thread(). The processing in notify_vq() may
take a long time to complete, and all devices share the single ioeventfd
thread, so this might block other devices' notify_vq() from being called and
starve other devices.

This patch makes virtio blk's notify_vq() just notify the blk thread
instead of doing the real hard read/write work. Tests show that the
overhead of the notification operations is small.

The reasons for using a dedicated thread instead of the thread pool
follow:

1) In the thread pool model, each job handling operation,
thread_pool__do_job(), takes about 6 or 7 mutex_{lock,unlock} ops. Most
of the mutexes are global (job_mutex) and are contended by the threads
in the pool. That's fine for the non performance-critical virtio devices,
such as console, rng, etc., but it's not optimal for net and blk devices.

2) Using dedicated threads to handle blk requests opens the door for
the user to set a different IO priority for the blk threads.

3) It also reduces the contention between net and blk devices if they
do not share the thread pool.

Signed-off-by: Asias He asias.he...@gmail.com
---
 tools/kvm/virtio/blk.c |   26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/tools/kvm/virtio/blk.c b/tools/kvm/virtio/blk.c
index da92094..e0dc37d 100644
--- a/tools/kvm/virtio/blk.c
+++ b/tools/kvm/virtio/blk.c
@@ -49,6 +49,11 @@ struct blk_dev {
 
struct virt_queue   vqs[NUM_VIRT_QUEUES];
struct blk_dev_req  reqs[VIRTIO_BLK_QUEUE_SIZE];
+
+   pthread_t   io_thread;
+   int io_efd;
+
+   struct kvm  *kvm;
 };
 
 static LIST_HEAD(bdevs);
@@ -174,11 +179,26 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, 
u32 pfn)
return 0;
 }
 
+static void *virtio_blk_thread(void *dev)
+{
+   struct blk_dev *bdev = dev;
+   u64 data;
+
+   while (1) {
+   read(bdev->io_efd, &data, sizeof(u64));
+   virtio_blk_do_io(bdev->kvm, &bdev->vqs[0], bdev);
+   }
+
+   pthread_exit(NULL);
+   return NULL;
+}
+
 static int notify_vq(struct kvm *kvm, void *dev, u32 vq)
 {
struct blk_dev *bdev = dev;
+   u64 data = 1;
 
-   virtio_blk_do_io(kvm, &bdev->vqs[vq], bdev);
+   write(bdev->io_efd, &data, sizeof(data));
 
return 0;
 }
@@ -233,6 +253,8 @@ static int virtio_blk__init_one(struct kvm *kvm, struct 
disk_image *disk)
.capacity   = disk->size / SECTOR_SIZE,
.seg_max= DISK_SEG_MAX,
},
+   .io_efd = eventfd(0, 0),
+   .kvm= kvm,
};
 
virtio_init(kvm, bdev, bdev-vdev, blk_dev_virtio_ops,
@@ -247,6 +269,8 @@ static int virtio_blk__init_one(struct kvm *kvm, struct 
disk_image *disk)
 
disk_image__set_callback(bdev->disk, virtio_blk_complete);
 
+   pthread_create(&bdev->io_thread, NULL, virtio_blk_thread, bdev);
+
if (compat_id != -1)
compat_id = compat__add_message(virtio-blk device was not 
detected,
While you have requested a 
virtio-blk device, 
-- 
1.7.10.2
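
As a side note on point 2) above: once the blk device has its own thread, its
I/O priority can be tuned independently of the rest of the process. A
standalone sketch of how that could look (the constants mirror linux/ioprio.h;
this is an illustration, not part of the patch):

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#define IOPRIO_CLASS_SHIFT	13
#define IOPRIO_CLASS_BE		2	/* best-effort class */
#define IOPRIO_WHO_PROCESS	1
#define IOPRIO_PRIO_VALUE(cls, data)	(((cls) << IOPRIO_CLASS_SHIFT) | (data))

int main(void)
{
	/* who-id 0 means the calling thread; level 7 is the lowest
	 * best-effort priority. */
	long ret = syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
			   IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 7));

	if (ret)
		perror("ioprio_set");
	else
		printf("I/O priority lowered for this thread\n");
	return 0;
}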

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Process virtio blk requests in separate thread

2012-06-04 Thread Cyrill Gorcunov
On Mon, Jun 04, 2012 at 11:40:53PM +0800, Asias He wrote:
  
 +static void *virtio_blk_thread(void *dev)
 +{
 + struct blk_dev *bdev = dev;
 + u64 data;
 +
 + while (1) {
 + read(bdev->io_efd, &data, sizeof(u64));
 + virtio_blk_do_io(bdev->kvm, &bdev->vqs[0], bdev);
 + }
 +
 + pthread_exit(NULL);
 + return NULL;
 +}

I must admit I don't understand this code ;) The data gets read into a
stack variable forever?

Cyrill
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Process virtio blk requests in separate thread

2012-06-04 Thread Sasha Levin
On Mon, 2012-06-04 at 23:40 +0800, Asias He wrote:
 All blk requests are processed in notify_vq(), which runs in the context of
 the ioeventfd thread: ioeventfd__thread(). The processing in notify_vq() may
 take a long time to complete, and all devices share the single ioeventfd
 thread, so this might block other devices' notify_vq() from being called and
 starve other devices.

We're using native vectored AIO for processing blk requests, so I'm
not certain if there's any point in giving the blk device its own thread
for handling that.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvm segfaults and bad page state in 3.4.0

2012-06-04 Thread Fengguang Wu
On Mon, Jun 04, 2012 at 08:35:30PM +0800, Fengguang Wu wrote:
 Hi Gleb,
 
 On Mon, Jun 04, 2012 at 02:56:50PM +0300, Gleb Natapov wrote:
  On Mon, Jun 04, 2012 at 07:46:03PM +0800, Fengguang Wu wrote:
   Hi,
   
   I'm running lots of kvm instances for doing kernel boot tests.
   Unfortunately the test system itself is not stable enough, I got scary
   errors in both kvm and the host kernel. Like this. 
   
  What do you mean by in both kvm and the host kernel. Do you have
 
 I mean the host side's kvm user space process and kernel seem to have 
 problems.
 
  similar Oopses inside your guests? If yes can you post one?
 
 There are all kinds of problems in the guest kernel, too. Probably I
 built in too many modules (take a debian config and s/=m/=y/) and
 enabled too many debug options. Many of the bugs I ran into have
 already been reported by others in LKML. Here are more weird things.

Two more boot errors..

storvsc device driver (from Microsoft..) bug:

[  108.445777] hv_vmbus: registering driver storvsc
[  108.498750] [ cut here ]
[  108.502649] kernel BUG at /c/kernel-tests/intel/drivers/base/driver.c:227!
[  108.502649] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
[  108.502649] CPU 0 
[  108.502649] Modules linked in:
[  108.502649] 
[  108.502649] Pid: 1, comm: swapper/0 Not tainted 3.2.0-rt13+ #1 Bochs Bochs
[  108.502649] RIP: 0010:[8197f395]  [8197f395] 
driver_register+0x24/0x116
[  108.502649] RSP: 0018:8800162c5e60  EFLAGS: 00010246
[  108.502649] RAX: 84131c40 RBX: 8411e580 RCX: 25232522
[  108.502649] RDX:  RSI: 82dac59f RDI: 8411e580
[  108.502649] RBP: 8800162c5ea0 R08: 0002 R09: 84f32270
[  108.502649] R10:  R11:  R12: 
[  108.502649] R13: 83aeeeff R14:  R15: 
[  108.502649] FS:  () GS:88001740() 
knlGS:
[  108.502649] CS:  0010 DS:  ES:  CR0: 8005003b
[  108.502649] CR2:  CR3: 03e12000 CR4: 06f0
[  108.502649] DR0:  DR1:  DR2: 
[  108.502649] DR3:  DR6: 0ff0 DR7: 0400
[  108.502649] Process swapper/0 (pid: 1, threadinfo 8800162c4000, task 
8800162c0040)
[  108.502649] Stack:
[  108.502649]  8800162c5eb0 8800162c5e70 8800162c5e80 
8411e560
[  108.502649]   83aeeeff  

[  108.502649]  8800162c5ed0 827e3b18 83e6eda8 
845d6460
[  108.502649] Call Trace:
[  108.502649]  [827e3b18] __vmbus_driver_register+0x4a/0x5c
[  108.502649]  [8445fcc4] ? rtsx_init+0x29/0x29
[  108.502649]  [8445fcf9] storvsc_drv_init+0x35/0x3f
[  108.502649]  [81002099] do_one_initcall+0x7f/0x13a
[  108.502649]  [843e4caa] kernel_init+0xce/0x148
[  108.502649]  [82db5604] kernel_thread_helper+0x4/0x10
[  108.502649]  [82dac9b4] ? retint_restore_args+0x13/0x13
[  108.502649]  [843e4bdc] ? start_kernel+0x412/0x412
[  108.502649]  [82db5600] ? gs_change+0x13/0x13
[  108.502649] Code: 5c 41 5d 41 5e 5d c3 55 48 89 e5 41 57 41 56 41 55 41 54 
53 48 83 ec 18 66 66 66 66 90 48 8b 47 08 48 89 fb 48 83 78 68 00 75 02 0f 0b 
48 83 78 30 00 74 07 48 83 7f 30 00 75 1c 48 83 78 38 00 
[  108.502649] RIP  [8197f395] driver_register+0x24/0x116
[  108.502649]  RSP 8800162c5e60
[  110.913751] ---[ end trace 184c66c6768bd651 ]---
[  110.967270] swapper/0 used greatest stack depth: 3688 bytes left
[  111.021415] Kernel panic - not syncing: Attempted to kill init!
[  111.075053] Pid: 1, comm: swapper/0 Tainted: G  D  3.2.0-rt13+ #1
[  111.130699] Call Trace:
[  111.185972]  [82d5d34d] panic+0xa0/0x1b3
[  111.241642]  [82dabdda] ? _raw_write_unlock_irq+0x2e/0x47
[  111.294939]  [810a55f8] do_exit+0x9b/0x7b7
[  111.349523]  [810a31cd] ? kmsg_dump+0x82/0x135
[  111.402315]  [82dad653] oops_end+0xaf/0xb8
[  111.454034]  [8104beb4] die+0x5a/0x66
[  111.505217]  [82dad181] do_trap+0x11a/0x129
[  111.555117]  [81049b4a] do_invalid_op+0x98/0xa1
[  111.603546]  [8197f395] ? driver_register+0x24/0x116
[  111.651247]  [810d2423] ? trace_hardirqs_off_caller+0x3f/0x9e
[  111.700511]  [816a457d] ? trace_hardirqs_off_thunk+0x3a/0x3c
[  111.748561]  [82dac9e4] ? restore_args+0x30/0x30
[  111.796413]  [82db547b] invalid_op+0x1b/0x20
[  111.844369]  [82dac59f] ? _raw_spin_unlock_irqrestore+0x3e/0x61
[  111.893537]  [8197f395] ? driver_register+0x24/0x116
[  111.943061]  [827e3b18] __vmbus_driver_register+0x4a/0x5c
[  111.993386]  [8445fcc4] ? rtsx_init+0x29/0x29
[  112.043646]  [8445fcf9] storvsc_drv_init+0x35/0x3f
[  

Re: KVM entry failed, hardware error

2012-06-04 Thread Johannes Bauer
On 04.06.2012 10:53, Gleb Natapov wrote:
 On Sun, Jun 03, 2012 at 06:25:33PM +0200, Johannes Bauer wrote:
 Therefore, I've uploaded the compressed trace.dat file, so you can maybe
 have a look why the report tool barfs and interpret it correctly. I
 can't figure it out. The trace is here:

 http://spornkuller.de/trace.dat.bz2

 I can read this trace.

Hm, weird. But good that it works on your side. I get a lot of:

trace-cmd: No such file or directory
  bad op token {
  failed to read event print fmt for kvm_emulate_insn
version = 6
CPU 0 is empty
cpus=4
 qemu-system-x86-10775 [001]  2512.220779: kvm_fpu:  load
 qemu-system-x86-10775 [001]  2512.220782: kvm_entry:vcpu 0
 qemu-system-x86-10775 [001]  2512.220785: kvm_exit: reason
EXCEPTION_NMI rip 0xfff0 info 0 8b0e
 qemu-system-x86-10775 [001]  2512.220787: kvm_page_fault:   address
0 error_code 14
 qemu-system-x86-10775 [001]  2512.220796: kvm_entry:vcpu 0
 qemu-system-x86-10775 [001]  2512.220798: kvm_exit: reason
EXCEPTION_NMI rip 0xc81e info 0 8b0d
 qemu-system-x86-10775 [001]  2512.220803: kvm_emulate_insn: [FAILED
TO PARSE] rip=51230 csbase=983040 len=3 insn= �f%���f flags=0 failed=0
 qemu-system-x86-10775 [001]  2512.220806: kvm_entry:vcpu 0
 qemu-system-x86-10775 [001]  2512.220807: kvm_exit: reason
EXCEPTION_NMI rip 0xc827 info 0 8b0d
 qemu-system-x86-10775 [001]  2512.220808: kvm_emulate_insn: [FAILED
TO PARSE] rip=51239 csbase=983040 len=3 insn=���f�flags=0 failed=0
[...]

 Can you do info pci in qemu's monitor
 after failure? 

(qemu) info pci
  Bus  0, device   0, function 0:
Host bridge: PCI device 8086:1237
  id 
  Bus  0, device   1, function 0:
ISA bridge: PCI device 8086:7000
  id 
  Bus  0, device   1, function 1:
IDE controller: PCI device 8086:7010
  BAR4: I/O at 0xc000 [0xc00f].
  id 
  Bus  0, device   1, function 3:
Bridge: PCI device 8086:7113
  IRQ 9.
  id 
  Bus  0, device   2, function 0:
VGA controller: PCI device 1013:00b8
  BAR0: 32 bit prefetchable memory at 0xf000 [0xf1ff].
  BAR1: 32 bit memory at 0xf200 [0xf2000fff].
  BAR6: 32 bit memory at 0x [0xfffe].
  id 
  Bus  0, device   3, function 0:
Ethernet controller: PCI device 8086:100e
  IRQ 11.
  BAR0: 32 bit memory at 0xf202 [0xf203].
  BAR1: I/O at 0xc040 [0xc07f].
  BAR6: 32 bit memory at 0x [0x0001fffe].
  id 
  Bus  0, device   4, function 0:
SCSI controller: PCI device 1af4:1001
  IRQ 11.
  BAR0: I/O at 0xc080 [0xc0bf].
  BAR1: 32 bit memory at 0xf206 [0xf2060fff].
  id 


What is your command line?

bin/qemu-system-x86_64 -cpu host -enable-kvm -net nic -net
user,smb=Share,restrict=on -drive
media=disk,file=Windows7_x32.qcow2,if=virtio -m 2048 -smp 1 -nographic

(added -nographic to be able to enter the console)

Also, as per Avi's request:

(qemu) x/256b 0x2b
002b: 0xeb 0x26 0x27 0x00 0x00 0x00 0x2b 0x00
002b0008: 0xff 0xff 0x00 0x00 0x00 0x9a 0xcf 0x00
002b0010: 0xff 0xff 0x00 0x00 0x00 0x92 0xcf 0x00
002b0018: 0xff 0xff 0x00 0x00 0x2b 0x9f 0x00 0x00
002b0020: 0xff 0xff 0x00 0x02 0x00 0x93 0x00 0x00
002b0028: 0x8a 0x15 0x68 0xbc 0x00 0x00 0xa1 0xbf
002b0030: 0x00 0x2b 0x00 0x85 0xc0 0x74 0x06 0x8b
002b0038: 0x1d 0xbb 0x00 0x2b 0x00 0xa1 0xc7 0x00
002b0040: 0x2b 0x00 0x85 0xc0 0x74 0x06 0x8b 0x15
002b0048: 0xc3 0x00 0x2b 0x00 0xbe 0x00 0x00 0x20
002b0050: 0x00 0x31 0xc0 0x31 0xff 0x66 0x8b 0x3d
002b0058: 0xb5 0x00 0x2b 0x00 0xc1 0xe7 0x04 0x66
002b0060: 0xa1 0xb3 0x00 0x2b 0x00 0x01 0xc7 0x8b
002b0068: 0x0d 0xb7 0x00 0x2b 0x00 0xfc 0xf3 0xa4
002b0070: 0x0f 0x01 0x15 0x02 0x00 0x2b 0x00 0x66
002b0078: 0xb8 0x20 0x00 0x8e 0xd8 0x8e 0xc0 0x8e
002b0080: 0xe0 0x8e 0xe8 0x8e 0xd0 0xbc 0x00 0x02
002b0088: 0x00 0x00 0xea 0x91 0x00 0x00 0x00 0x18
002b0090: 0x00 0x0f 0x20 0xc0 0x66 0x83 0xe0 0xfe
002b0098: 0x0f 0x22 0xc0 0x66 0x31 0xc0 0x8e 0xd8
002b00a0: 0x8e 0xc0 0x8e 0xd0 0x66 0xbc 0x00 0x04
002b00a8: 0x00 0x00 0x8e 0xe0 0x8e 0xe8 0xea 0x00
002b00b0: 0x00 0x00 0x20 0x00 0x00 0x00 0x20 0x4a
002b00b8: 0xda 0x05 0x00 0x80 0x00 0x00 0x00 0x01
002b00c0: 0x00 0x00 0x00 0x80 0x00 0x00 0x00 0x01
002b00c8: 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x90
002b00d0: 0x90 0x90 0x55 0x89 0xe5 0x57 0x8b 0x7d
002b00d8: 0x0c 0x56 0x53 0x8b 0x5d 0x08 0x6a 0x23
002b00e0: 0x6a 0x00 0x68 0x80 0x05 0x00 0x00 0xe8
002b00e8: 0xf3 0xcf 0x00 0x00 0x83 0xc4 0x0c 0x83
002b00f0: 0x3d 0xe0 0x04 0x03 0x00 0x00 0xc6 0x05
002b00f8: 0x80 0x05 0x00 0x00 0x13 0x74 0x0d 0x53

Best regards,
Joe
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: KVM entry failed, hardware error

2012-06-04 Thread Johannes Bauer
On 04.06.2012 20:28, Johannes Bauer wrote:
 What is your command line?
 
 bin/qemu-system-x86_64 -cpu host -enable-kvm -net nic -net
 user,smb=Share,restrict=on -drive
 media=disk,file=Windows7_x32.qcow2,if=virtio -m 2048 -smp 1 -nographic

Just noticed that the output I just provided was for the 32 Bit version
of Windows 7. Did the same (info pci and the memdump) for the 64 Bit
version again and diffed them with meld -- they're *perfectly*
identical. Just so there isn't any confusion.

Best regards,
Joe
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Process virtio blk requests in separate thread

2012-06-04 Thread Asias He
On Mon, Jun 4, 2012 at 11:48 PM, Cyrill Gorcunov gorcu...@openvz.org wrote:
 On Mon, Jun 04, 2012 at 11:40:53PM +0800, Asias He wrote:

 +static void *virtio_blk_thread(void *dev)
 +{
 +     struct blk_dev *bdev = dev;
 +     u64 data;
 +
 +     while (1) {
 +             read(bdev->io_efd, &data, sizeof(u64));
 +             virtio_blk_do_io(bdev->kvm, &bdev->vqs[0], bdev);
 +     }
 +
 +     pthread_exit(NULL);
 +     return NULL;
 +}

 I must admit I don't understand this code ;) The data get read into
 stack variable forever?

The data we read itself is not interesting at all. virtio_blk_thread()
sleeps on the eventfd bdev->io_efd until notify_vq() writes something
to wake it up.
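
For anyone following along, a tiny standalone sketch of that wakeup pattern,
i.e. a worker blocking in read() on an eventfd until another thread write()s
to it (illustrative only, not tools/kvm code; build with -lpthread):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

static int efd;

static void *worker(void *arg)
{
	uint64_t data;

	/* Blocks until the notifier writes; the value read only matters
	 * as a wakeup, not as data. */
	read(efd, &data, sizeof(data));
	printf("worker woken up, counter=%llu\n", (unsigned long long)data);
	return NULL;
}

int main(void)
{
	pthread_t t;
	uint64_t one = 1;

	efd = eventfd(0, 0);
	pthread_create(&t, NULL, worker, NULL);
	write(efd, &one, sizeof(one));	/* this is the notify_vq() side */
	pthread_join(t, NULL);
	close(efd);
	return 0;
}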

-- 
Asias He
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PULL 00/20 1.2] kvm updates

2012-06-04 Thread Andreas Färber
Am 04.06.2012 07:46, schrieb Anthony Liguori:
 On 05/22/2012 12:37 AM, Avi Kivity wrote:
 Please pull from:

git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master
 
 Pulled.  Thanks.

This broke the ppc build. Guys, why wasn't this tested? There are only
three KVM targets to test compared to the 14 I'm struggling with...

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Process virtio blk requests in separate thread

2012-06-04 Thread Asias He
On Tue, Jun 5, 2012 at 12:07 AM, Sasha Levin levinsasha...@gmail.com wrote:
 On Mon, 2012-06-04 at 23:40 +0800, Asias He wrote:
 All blk requests are processed in notify_vq(), which runs in the context of
 the ioeventfd thread: ioeventfd__thread(). The processing in notify_vq() may
 take a long time to complete, and all devices share the single ioeventfd
 thread, so this might block other devices' notify_vq() from being called and
 starve other devices.

 We're using native vectored AIO for for processing blk requests, so I'm
 not certain if theres any point in giving the blk device it's own thread
 for handling that.

We discussed this last year. Search for the same subject. Pekka suggested
improving the thread pool API to support dedicated threads. I'd
prefer to merge this patch for now since that support is still not
there.

Recently, I added some debug code to see how many loops
virtio_blk_do_io() will do until it finishes the processing.
I am seeing something like this:

   Info: virtio_blk_do_io max_nr_loop=8427

The processing takes a very long time.

-- 
Asias He
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 26/74] target-i386: Pass X86CPU to do_cpu_{init,sipi}()

2012-06-04 Thread Andreas Färber
Allows to use cpu_reset() in place of cpu_state_reset().

Signed-off-by: Andreas Färber afaer...@suse.de
Reviewed-by: Igor Mammedov imamm...@redhat.com
---
 cpu-exec.c   |4 ++--
 target-i386/cpu.h|4 ++--
 target-i386/helper.c |   13 -
 target-i386/kvm.c|6 --
 4 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 0344cd5..fbb39cb 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -287,11 +287,11 @@ int cpu_exec(CPUArchState *env)
 #if defined(TARGET_I386)
 if (interrupt_request & CPU_INTERRUPT_INIT) {
 svm_check_intercept(env, SVM_EXIT_INIT);
-do_cpu_init(env);
+do_cpu_init(x86_env_get_cpu(env));
 env->exception_index = EXCP_HALTED;
 cpu_loop_exit(env);
 } else if (interrupt_request & CPU_INTERRUPT_SIPI) {
-do_cpu_sipi(env);
+do_cpu_sipi(x86_env_get_cpu(env));
 } else if (env->hflags2 & HF2_GIF_MASK) {
 if ((interrupt_request & CPU_INTERRUPT_SMI) &&
 !(env->hflags & HF_SMM_MASK)) {
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 2460f63..aeff20b 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -1053,8 +1053,8 @@ static inline void cpu_get_tb_cpu_state(CPUX86State *env, 
target_ulong *pc,
 (env->eflags & (IOPL_MASK | TF_MASK | RF_MASK | VM_MASK));
 }
 
-void do_cpu_init(CPUX86State *env);
-void do_cpu_sipi(CPUX86State *env);
+void do_cpu_init(X86CPU *cpu);
+void do_cpu_sipi(X86CPU *cpu);
 
 #define MCE_INJECT_BROADCAST1
 #define MCE_INJECT_UNCOND_AO2
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 3421be2..e182025 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1187,27 +1187,30 @@ CPUX86State *cpu_x86_init(const char *cpu_model)
 }
 
 #if !defined(CONFIG_USER_ONLY)
-void do_cpu_init(CPUX86State *env)
+void do_cpu_init(X86CPU *cpu)
 {
+CPUX86State *env = &cpu->env;
 int sipi = env->interrupt_request & CPU_INTERRUPT_SIPI;
 uint64_t pat = env->pat;

-cpu_state_reset(env);
+cpu_reset(CPU(cpu));
 env->interrupt_request = sipi;
 env->pat = pat;
 apic_init_reset(env->apic_state);
 env->halted = !cpu_is_bsp(env);
 }
 
-void do_cpu_sipi(CPUX86State *env)
+void do_cpu_sipi(X86CPU *cpu)
 {
+CPUX86State *env = &cpu->env;
+
 apic_sipi(env->apic_state);
 }
 #else
-void do_cpu_init(CPUX86State *env)
+void do_cpu_init(X86CPU *cpu)
 {
 }
-void do_cpu_sipi(CPUX86State *env)
+void do_cpu_sipi(X86CPU *cpu)
 {
 }
 #endif
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index e74a9e4..0d0d8f6 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1698,6 +1698,8 @@ void kvm_arch_post_run(CPUX86State *env, struct kvm_run 
*run)
 
 int kvm_arch_process_async_events(CPUX86State *env)
 {
+X86CPU *cpu = x86_env_get_cpu(env);
+
 if (env->interrupt_request & CPU_INTERRUPT_MCE) {
 /* We must not raise CPU_INTERRUPT_MCE if it's not supported. */
 assert(env->mcg_cap);
@@ -1732,11 +1734,11 @@ int kvm_arch_process_async_events(CPUX86State *env)
 }
 if (env->interrupt_request & CPU_INTERRUPT_INIT) {
 kvm_cpu_synchronize_state(env);
-do_cpu_init(env);
+do_cpu_init(cpu);
 }
 if (env->interrupt_request & CPU_INTERRUPT_SIPI) {
 kvm_cpu_synchronize_state(env);
-do_cpu_sipi(env);
+do_cpu_sipi(cpu);
 }
 if (env->interrupt_request & CPU_INTERRUPT_TPR) {
 env->interrupt_request &= ~CPU_INTERRUPT_TPR;
-- 
1.7.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] PCI: Introduce INTx check mask API

2012-06-04 Thread Benjamin Herrenschmidt
On Thu, 2012-05-24 at 09:02 -0300, Jan Kiszka wrote:
 
 Since PCI 2.3, this bit is mandatory, and it should be independent of
 the masking bit. The question is, if your device is supposed to support
 2.3, thus is just buggy, 

It's a PCI Express device :-)

 or if our detection algorithm is unreliable. It
 basically builds on the assumption that, if we can flip the mask bit,
 the feature should be present. I guess that is the best we can do. Maybe
 we can augment this with a blacklist of devices that support flipping
 without actually providing the feature.

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] PCI: Introduce INTx check mask API

2012-06-04 Thread Benjamin Herrenschmidt

 Yep, that's what I'd suggest as well, add a blacklist to
 pci_intx_mask_supported() so this device returns false and we require an
 exclusive interrupt for it.  Thanks,

BTW, we should consider supporting an MSI-only option for guests as
well:

LSIs are a problem for virtualization, especially when we start
having things like expander racks with slots behind bridges etc, and
in some cases it's better to support an MSI-only setup rather than
not support virtualizing the devices at all (or at least in
different partitions).

However, to do that, we need to ensure the device can't be
coerced by SW to still assert the LSI and cause trouble. This can be
dealt with in two ways I have in mind:

 - If we don't use any of those 4 interrupts lines at all (ie, we use no
LSI on the host bridge and they aren't shared with another bridge
etc...), we can just mask them out in the main PIC. On Power there's no
sharing between interrupt sources from different host bridges so that
would work for us

 - If the intermediary P2P bridge has a feature to block incoming LSIs
from children (I've heard that exists, is that standard ? I haven't
looked in the latest specs)

There's a third one:

 - If you trust the device own mask bit

 ... But this is fishy since many devices -will- have some kind of
backdoor via MMIO to bypass (or alter) the config space setting. In some
cases the driver can even completely replace the firmware inside the
device and do pretty much whatever it wants :-)

The main thing is how we represent both in terms of the interface to
qemu and the qemu -> kernel interface wanting such a no-LSI setup... not
sure about that one.

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] PCI: Introduce INTx check mask API

2012-06-04 Thread Alexander Graf

On 05.06.2012, at 03:39, Benjamin Herrenschmidt wrote:

 
 Yep, that's what I'd suggest as well, add a blacklist to
 pci_intx_mask_supported() so this device returns false and we require an
 exclusive interrupt for it.  Thanks,
 
 BTW, we should consider supporting an MSI-only option for guests as
 well:
 
 LSIs are a problem for virtualization, especially when we start
 having things like expander racks with slots behind bridges etc, and
 in some case it's better to support an MSI only setup rather than
 not support the virtualizing the devices at all (or at least in
 different partitions).
 
 However, to do that, we either need to ensure the device can't be
 coerced by SW to still assert the LSI and cause trouble. This can be
 dealt with two ways I have in mind:
 
 - If we don't use any of those 4 interrupts lines at all (ie, we use no
 LSI on the host bridge and they aren't shared with another bridge
 etc...), we can just mask them out in the main PIC. On Power there's no
 sharing between interrupt sources from different host bridges so that
 would work for us
 
 - If the intermediary P2P bridge has a feature to block incoming LSIs
 from children (I've heard that exists, is that standard ? I haven't
 looked in the latest specs)
 
 There's a third one:
 
 - If you trust the device own mask bit
 
 ... But this is fishy since many devices -will- have some kind of
 backdoor via MMIO to bypass (or alter) the config space setting. In some
 cases the driver can even completely replace the firmware inside the
 device and do pretty much whatever it wants :-)
 
 The main thing is how do we represent both in term of interface to
 qemu and qemu - kernel interface wanting such a no LSI setup... not
 sure about that one.

Wouldn't the no LSI setting be a flag to the vfio configuration? So when you 
set up the device group, you say this group can only do MSI. That way the 
interface would be sysfs and QEMU wouldn't have anything to do with it :)


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PULL 00/20 1.2] kvm updates

2012-06-04 Thread Anthony Liguori

On 06/05/2012 08:52 AM, Andreas Färber wrote:

Am 04.06.2012 07:46, schrieb Anthony Liguori:

On 05/22/2012 12:37 AM, Avi Kivity wrote:

Please pull from:

git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master


Pulled.  Thanks.


This broke the ppc build. Guys, why wasn't this tested? There are only
three KVM targets to test compared to the 14 I'm struggling with...


Is build bot running against uq/master?  If it's not, maybe we should add it to 
build bot to catch this sort of thing.


Regards,

Anthony Liguori


Andreas



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PULL 00/20 1.2] kvm updates

2012-06-04 Thread Andreas Färber
Am 05.06.2012 03:58, schrieb Anthony Liguori:
 Is build bot running against uq/master?  If it's not, maybe we should
 add it to build bot to catch this sort of thing.

That's a question for Stefan and Daniel to answer.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] PCI: Introduce INTx check mask API

2012-06-04 Thread Benjamin Herrenschmidt
On Tue, 2012-06-05 at 03:44 +0200, Alexander Graf wrote:
 Wouldn't the no LSI setting be a flag to the vfio configuration? So
 when you set up the device group, you say this group can only do
 MSI. That way the interface would be sysfs and QEMU wouldn't have
 anything to do with it :)

Sure whatever ;-)

There needs to be some validity checking in the kernel to see if we can
safely mask those interrupts too, etc...

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: AMD KVM Pci Passthrough reports device busy

2012-06-04 Thread Alex Williamson
On Mon, 2012-06-04 at 16:11 -0500, Chris Sanders wrote:
 Hello, I've been working for several days trying to get PCI
 passthrough to work.  So far the #kvm IRC channel has helped me with a
 few suggestions, though that hasn't yet solved the problem.  I'm
 running CentOS 6.2 and it was suggested I try compiling a 3.2.18 kernel
 from kernel.org.  This has changed a few of the messages but the guest
 still fails to start.
 
 Grepping for AMD-VI produces:
 # dmesg | grep AMD-Vi
 AMD-Vi: Enabling IOMMU at :00:00.2 cap 0x40
 AMD-Vi: Lazy IO/TLB flushing enabled
 
 After boot I'm running the following script
 echo unbind pci-pci bridge
 echo 1002 4383 > /sys/bus/pci/drivers/pci-stub/new_id
 echo unbind pci device
 echo :03:07.0 > /sys/bus/pci/drivers/ivtv/unbind
 echo  0803 > /sys/bus/pci/drivers/pci-stub/new_id
 echo :03:07.0 > /sys/bus/pci/devices/\:03\:07.0/driver/unbind
 echo :03:07.0 > /sys/bus/pci/drivers/pci-stub/bind
 
 This is lspci -n showing my device behind the Pci-Pci bridge
 -[:00]-+-00.0
+-00.2
+-02.0-[01]--+-00.0
|\-00.1
+-09.0-[02]00.0
+-11.0
+-12.0
+-12.2
+-13.0
+-13.2
+-14.0
+-14.1
+-14.3
+-14.4-[03]07.0
+-14.5
+-15.0-[04]--
+-16.0
+-16.2
+-18.0
+-18.1
+-18.2
+-18.3
+-18.4
\-18.5
 
 My kvm command and error are:
 # /usr/libexec/qemu-kvm -m 3048 -net none -device
 virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive
 file=/dev/vg_hdd/lv_sagetv,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
 -device pci-assign,host=03:07.0
 Failed to assign device (null) : Device or resource busy
 qemu-kvm: -device pci-assign,host=03:07.0: Device 'pci-assign' could
 not be initialized

I have a setup with an AMD 990FX system and a spare PVR-350 card that I
installed to reproduce.  The sad answer is that it's nearly impossible
to assign PCI devices on these systems due to the aliasing of devices
below the PCIe-to-PCI bridge (PCIe devices are much, much easier to
assign).  If you boot with amd_iommu_dump, you'll see some output like
this:

AMD-Vi:   DEV_ALIAS_RANGE devid: 05:00.0 flags: 00 devid_to: 
00:14.4
AMD-Vi:   DEV_RANGE_END  devid: 05:1f.7

This says the devices on bus 5 (my bus 5 is equivalent to your bus 3)
are all aliased to device 14.4:

00:14.4 PCI bridge: Advanced Micro Devices [AMD] nee ATI SBx00 PCI to PCI 
Bridge (rev 40) (prog-if 01 [Subtractive decode])

What that means is that the IOMMU can't distinguish devices behind the
PCI-to-PCI bridge so all devices are grouped as an alias to device 14.4.
You would hopefully not care about this, you don't have any other
devices anyway.  Unfortunately amd_iommu pre-allocates IOMMU domains for
every device, so it's already allocated a domain for device 14.4 and
adds device 03:07.0 into it.  Unbinding 03:07.0 from the ivtv driver
detaches that devices from the domain, but when we go to assign it to a
guest we create a new domain.  Assigning 03:07.0 into that new domain
fails because the device is an alias for 00:14.4, which still has a
different domain.  One way to get around this would be to also assign
the bridge to the guest, but we don't support and actually reject
assigning bridges :(

This works a bit better on Intel VT-d systems because domains are
dynamically allocated.  Thus for streaming DMA, the domain is only
created when the driver attempts to setup a DMA transaction.  When the
driver is unbound, the domain is destroyed thus allowing us to setup a
new domain for device assignment.

If you don't mind running non-upstream code, VFIO is a re-write of
device assignment for Qemu that is aware of such alias problems and
actually works in this case.  The downside is that VFIO is strict about
multifunction devices supporting ACS to prevent peer-to-peer between
domains, so will require all of the 14.x devices to be bound to pci-stub
as well.  On my system, this includes an smbus controller, audio device,
lpc controller, and usb device.  If AMD could confirm this device
doesn't allow peer-to-peer between functions, we could relax this
requirement a bit.  VFIO kernel and qemu can be found here:

git://github.com/awilliam/linux-vfio.git (iommu-group-vfio-next-20120529)
git://github.com/awilliam/qemu-vfio.git (iommu-group-vfio)

See Documentation/vfio.txt for description.  The major difference is the
setup of drivers.  At a minimum, the device you want to assign to the
guest needs to be bound to the vfio-pci driver, much in the same way you
bind devices to pci-stub.  All other devices in the group should be
bound to pci-stub.  You can find the other devices in the group by
following the iommu_group link in sysfs, ex:

/sys/bus/pci/devices/:03:07.0/iommu_group/devices
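
For illustration, a small program (not from the VFIO documentation) that lists
the other members of a device's IOMMU group by walking that sysfs directory;
the PCI address below is just an example:

#include <dirent.h>
#include <stdio.h>

int main(void)
{
	const char *path =
		"/sys/bus/pci/devices/0000:03:07.0/iommu_group/devices";
	DIR *dir = opendir(path);
	struct dirent *d;

	if (!dir) {
		perror("opendir");
		return 1;
	}
	while ((d = readdir(dir)) != NULL) {
		if (d->d_name[0] == '.')
			continue;
		/* Each entry is a PCI address that should be bound to
		 * vfio-pci or pci-stub before the group can be used. */
		printf("%s\n", d->d_name);
	}
	closedir(dir);
	return 0;
}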

Once you have that, simply 

Re: Has any work 3.3 kvm-kmod for rhel 6.2 kernel successfully?

2012-06-04 Thread ya su
Jan:

    sorry for the late response to your suggestion.

    I have found the patch which produces this problem; it comes from
this one: 7850ac5420803996e2960d15b924021f28e0dffc.

I changed it as follows, and it works fine.

diff -ur -i kvm-kmod-3.4/x86/kvm_main.c kvm-kmod-3.4-fix/x86/kvm_main.c
--- kvm-kmod-3.4/x86/kvm_main.c 2012-05-21 23:43:02.0 +0800
+++ kvm-kmod-3.4-fix/x86/kvm_main.c 2012-06-05 12:19:37.780136969 +0800
@@ -1525,8 +1525,8 @@
if (memslot && memslot->dirty_bitmap) {
unsigned long rel_gfn = gfn - memslot->base_gfn;

-   if (!test_and_set_bit_le(rel_gfn, memslot->dirty_bitmap))
-   memslot->nr_dirty_pages++;
+   __set_bit_le(rel_gfn, memslot->dirty_bitmap);
+   memslot->nr_dirty_pages++;
}
 }

~

    I think the root cause may be that the action of clearing dirty_bitmap
doesn't sync with that of setting nr_dirty_pages = 0.

    But I don't understand why it works fine with the new kernel.
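
For what it's worth, a small standalone sketch (not kvm code) of the semantic
difference between the two variants in the hunk above: the upstream version
bumps the counter only when a bit flips from 0 to 1, while the workaround bumps
it on every mark, so the two can diverge once a page is dirtied more than once
between clears:

#include <stdio.h>

static unsigned long bitmap;	/* stands in for memslot->dirty_bitmap */

/* Minimal stand-in for test_and_set_bit_le(): returns the old bit value. */
static int test_and_set(int nr, unsigned long *map)
{
	int old = (*map >> nr) & 1;

	*map |= 1UL << nr;
	return old;
}

int main(void)
{
	int dirty_log[] = { 3, 7, 3, 3, 9 };	/* page 3 dirtied three times */
	int counted = 0, marked = 0;
	unsigned int i;

	for (i = 0; i < sizeof(dirty_log) / sizeof(dirty_log[0]); i++) {
		if (!test_and_set(dirty_log[i], &bitmap))
			counted++;	/* upstream: count 0 -> 1 transitions */
		marked++;		/* workaround: count every mark */
	}

	printf("distinct dirty pages: %d, unconditional count: %d\n",
	       counted, marked);
	return 0;
}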

Regards.

Suya.


2012/4/16 Jan Kiszka jan.kis...@siemens.com:
 On 2012-04-16 16:34, ya su wrote:
 I first notice 3.3 release notes, it says it can compile against
 2.6.32-40, so I think it can work with 2.6.32,  then I change it with
 rhel 2.6.32 kernel.

 The problem is that the RHEL 2.6.32 kernel has nothing to do with a
 standard 2.6.32 as too many features were ported back. So the version
 number based feature checks fail as you noticed.

 We could adapt kvm-kmod to detect that it is a RHEL kernel (there is
 surely some define), but it requires going through all the relevant
 features carefully.


 I just re-changed the original kvm-kmod 3.3 for the rhel 2.6.32, only fixing
 compile redefinition errors, but the problem remains the same. The
 patch is attached.

 I didn't go through the git commits, as there are so many changes from 2.6.32
 to 3.3 in the kernel.

 I think the problem may come from  memory change notification.

 The approach to resolve this could be to identify backported features
 based on the build breakage or runtime anomalies, then analyze the
 kvm-kmod history for changes that wrapped those features, and finally
 adjust all affected code blocks. I'm open for patches and willing to
 support you on questions, but I can't work on this myself.

 Jan

 --
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PULL 00/20 1.2] kvm updates

2012-06-04 Thread Andreas Färber
Am 04.06.2012 07:46, schrieb Anthony Liguori:
 On 05/22/2012 12:37 AM, Avi Kivity wrote:
 Please pull from:

git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master
 
 Pulled.  Thanks.

This broke the ppc build. Guys, why wasn't this tested? There are only
three KVM targets to test compared to the 14 I'm struggling with...

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PULL 00/20 1.2] kvm updates

2012-06-04 Thread Andreas Färber
Am 05.06.2012 03:58, schrieb Anthony Liguori:
 Is build bot running against uq/master?  If it's not, maybe we should
 add it to build bot to catch this sort of thing.

That's a question for Stefan and Daniel to answer.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html