[Bug 47451] need to re-load driver in guest to make a hot-plug VF work

2012-09-28 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=47451





--- Comment #4 from Jay Ren <yongjie@intel.com>  2012-09-28 06:07:50 ---
(In reply to comment #3)
 (In reply to comment #2)
  (In reply to comment #1)
  Can we narrow down the kvm.git commit range at all?  The
   one provided is over 12k commits covering v3.4-rc3 to v3.5-rc6.  Thanks
  I did more testing.
  Do you remember bug #43328 (VT-d/SR-IOV totally doesn't work in guest)?
  With just your fix commit for that bug applied, I hit this hot-plug issue.
  Is there a chance your patch fixed one bug but introduced another one? :)
  
  commit a76beb14123a69ca080f5a5425e28b786d62318d
  Author: Alex Williamson alex.william...@redhat.com
  Date: Mon Jul 9 10:53:22 2012 -0600
  
  KVM: Fix device assignment threaded irq handler
 
 Thanks for narrowing it down.  It looks like perhaps that patch was
 ineffective at keeping us from using IRQF_ONESHOT, because
 irq_setup_forced_threading() re-enables it.  Does the problem go away if you
 change the two calls to request_threaded_irq() in that commit to use
 IRQF_NO_THREAD for the flags value in place of 0?

No, replacing the flag value with IRQF_NO_THREAD does not make PCIe NIC
hot-plug work.
Can you try with your commit a76beb14123a6?
BTW, this bug does not reproduce every time. Starting a RHEL6.x guest with
the '-m 512 -smp 2' qemu-kvm command-line options makes it very easy to
reproduce.
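Alex's suggestion hinges on how forced IRQ threading rewrites the flags passed to request_threaded_irq(). The following is a userspace model, not kernel code, assuming irq_setup_forced_threading() behaves as in kernels of this period: when interrupts are force-threaded (the "threadirqs" boot option), any action not already marked no-thread, per-CPU or oneshot gets ONESHOT added. The MODEL_* names and their bit values are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the roles of IRQF_PERCPU, IRQF_ONESHOT and
 * IRQF_NO_THREAD; the bit values are illustrative, not the kernel's. */
enum {
	MODEL_IRQF_PERCPU    = 1u << 0,
	MODEL_IRQF_ONESHOT   = 1u << 1,
	MODEL_IRQF_NO_THREAD = 1u << 2,
};

/* Flags an action ends up with after the forced-threading step.
 * force_irqthreads models booting with "threadirqs". */
static unsigned int model_forced_threading(unsigned int flags,
					   bool force_irqthreads)
{
	if (!force_irqthreads)
		return flags;
	/* Opt-outs: explicitly non-threaded, per-CPU, or already oneshot. */
	if (flags & (MODEL_IRQF_NO_THREAD | MODEL_IRQF_PERCPU |
		     MODEL_IRQF_ONESHOT))
		return flags;
	/* A handler registered with flags == 0 is silently made ONESHOT. */
	return flags | MODEL_IRQF_ONESHOT;
}
```

In this model, passing 0 cannot keep ONESHOT off under forced threading, while the no-thread flag leaves the flags untouched, which is the effect Alex hoped the change would have.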

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-28 Thread Raghavendra K T

On 09/28/2012 11:15 AM, H. Peter Anvin wrote:

On 09/27/2012 10:38 PM, Raghavendra K T wrote:

+
+bool kvm_overcommitted()
+{


This better not be C...


I think you meant I should have written kvm_overcommitted(void) (and
perhaps used a different function name).

Or is it the body of the function?




Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-09-28 Thread Raghavendra K T

On 09/28/2012 02:37 AM, Jiannan Ouyang wrote:



On Thu, Sep 27, 2012 at 4:50 AM, Avi Kivity <a...@redhat.com> wrote:

On 09/25/2012 04:43 PM, Jiannan Ouyang wrote:
  I've actually implemented this preempted_bitmap idea.

Interesting, please share the code if you can.

  However, I'm doing this to expose this information to the guest, so the
  guest is able to know whether the lock holder is preempted or not before
  spinning. Right now, I'm running experiments to show that this idea
  works.
 
  I'm wondering what you guys think of the relationship between the
  pv_ticketlock approach and the PLE handler approach. Are we going to
  adopt PLE instead of the pv ticketlock, and why?

Right now we're searching for the best solution.  The tradeoffs are more
or less:

PLE:
- works for unmodified / non-Linux guests
- works for all types of spins (e.g. smp_call_function*())
- utilizes an existing hardware interface (PAUSE instruction) so likely
more robust compared to a software interface

PV:
- has more information, so it can perform better

Given these tradeoffs, if we can get PLE to work for moderate amounts of
overcommit then I'll prefer it (even if it slightly underperforms PV).
If we are unable to make it work well, then we'll have to add PV.

--
error compiling committee.c: too many arguments to function


FYI. The preempted_bitmap patch.

I deleted some unrelated code from the generated patch file, which seems
to have broken the patch format... I hope someone can suggest a fix.
However, it's pretty straightforward, four things: declaration,
initialization, set and clear. I think you guys can figure it out easily!

As Avi suggested, you could check for task state TASK_RUNNING in sched_out.

Signed-off-by: Jiannan Ouyang <ouy...@cs.pitt.edu>

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 8613cbb..4fcb648 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -73,6 +73,16 @@ struct pv_info {
 const char *name;
  };


I suppose we need this in a common place, since s390 should also have it
if we are using this information in vcpu_on_spin().



+struct pv_sched_info {
+   unsigned long   sched_bitmap;


I wonder whether we need something similar to cpumask here; the only
difference is that we are representing a guest (v)cpumask.


+} __attribute__((__packed__));
+
  struct pv_init_ops {
 /*
  * Patch may replace one of the defined code sequences with
diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c
index 676b8c7..2242d22 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c

+struct pv_sched_info pv_sched_info = {
+.sched_bitmap = (unsigned long)-1,
+};
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 44ee712..3eb277e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -494,6 +494,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
 mutex_init(&kvm->slots_lock);
 atomic_set(&kvm->users_count, 1);

+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+kvm->pv_sched_info.sched_bitmap = (unsigned long)-1;
+#endif
+
 r = kvm_init_mmu_notifier(kvm);
 if (r)
 goto out_err;
@@ -2697,7 +2702,13 @@ struct kvm_vcpu *preempt_notifier_to_vcpu(struct preempt_notifier *pn)
  static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
  {
 struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

+   set_bit(vcpu->vcpu_id, &vcpu->kvm->pv_sched_info.sched_bitmap);
 kvm_arch_vcpu_load(vcpu, cpu);
  }

@@ -2705,7 +2716,13 @@ static void kvm_sched_out(struct preempt_notifier *pn,
   struct task_struct *next)
  {
 struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

+   clear_bit(vcpu->vcpu_id, &vcpu->kvm->pv_sched_info.sched_bitmap);
 kvm_arch_vcpu_put(vcpu);
  }
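The four pieces Jiannan lists (declaration, initialization, set and clear) fit together roughly as follows. This is a userspace model, not the patch: C11 atomics stand in for set_bit()/clear_bit(), and model_holder_preempted() is a hypothetical illustration of the guest-side check he describes, since that part was not posted. The single-word bitmap is also why Raghavendra asks about a cpumask-like type.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One bit per vCPU in a single word, as in the patch; this caps the VM
 * at sizeof(unsigned long) * CHAR_BIT vCPUs. */
#define MODEL_MAX_VCPUS (sizeof(unsigned long) * CHAR_BIT)

struct model_vm {
	atomic_ulong sched_bitmap;   /* bit set = vCPU thread is scheduled in */
};

static void model_vm_init(struct model_vm *vm)
{
	/* All ones, like the (unsigned long)-1 initializer in the patch. */
	atomic_init(&vm->sched_bitmap, ~0UL);
}

/* Analogue of the set_bit() added to kvm_sched_in(). */
static void model_sched_in(struct model_vm *vm, unsigned int vcpu_id)
{
	assert(vcpu_id < MODEL_MAX_VCPUS);
	atomic_fetch_or(&vm->sched_bitmap, 1UL << vcpu_id);
}

/* Analogue of the clear_bit() added to kvm_sched_out(). */
static void model_sched_out(struct model_vm *vm, unsigned int vcpu_id)
{
	assert(vcpu_id < MODEL_MAX_VCPUS);
	atomic_fetch_and(&vm->sched_bitmap, ~(1UL << vcpu_id));
}

/* What a guest spinner could ask before spinning on a lock held by
 * lock_holder: a clear bit means the holder is currently preempted. */
static bool model_holder_preempted(struct model_vm *vm, unsigned int lock_holder)
{
	assert(lock_holder < MODEL_MAX_VCPUS);
	return !(atomic_load(&vm->sched_bitmap) & (1UL << lock_holder));
}
```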




Re: [PATCH] virtio-blk: Disable callback in virtblk_done()

2012-09-28 Thread Rusty Russell
Asias He as...@redhat.com writes:
 I forgot about the cool hack which MST put in to defer event updates
 using disable_cb/enable_cb.

 Hmm, are you talking about virtqueue_enable_cb_delayed()?

Just the fact that virtqueue_disable_cb() prevents updates of
used_index, and then we do the update in virtqueue_enable_cb().

Cheers,
Rusty.
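Rusty's point can be seen in a toy model of the event-index mechanism, assuming VIRTIO_RING_F_EVENT_IDX has been negotiated. need_event() uses the same wrap-around comparison as vring_need_event() in the virtio ring header; everything else (struct model_ring and the driver_* and device_* helpers) is hypothetical and much simplified. The device interrupts only when its used index crosses the driver's published used_event. While callbacks are disabled, the driver stops refreshing used_event as it consumes buffers, so the stale value keeps the device quiet without any extra write; enabling callbacks republishes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same wrap-around test as vring_need_event(): true if event_idx lies in
 * the window of used entries just published, i.e. [old, new_idx). */
static bool need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

struct model_ring {
	uint16_t used_idx;      /* device: buffers completed so far */
	uint16_t used_event;    /* driver: "interrupt me after this index" */
	uint16_t last_used_idx; /* driver: buffers consumed so far */
	bool cb_disabled;
	unsigned int interrupts;
};

/* Device side: complete one buffer, interrupt only if the event is crossed. */
static void device_complete_one(struct model_ring *r)
{
	uint16_t old = r->used_idx;

	r->used_idx++;
	if (need_event(r->used_event, r->used_idx, old))
		r->interrupts++;
}

/* Driver side: consume one used buffer. used_event is only refreshed
 * while callbacks are enabled, so disabling lets it go stale. */
static void driver_get_buf(struct model_ring *r)
{
	assert(r->last_used_idx != r->used_idx);
	r->last_used_idx++;
	if (!r->cb_disabled)
		r->used_event = r->last_used_idx;
}

static void driver_disable_cb(struct model_ring *r)
{
	r->cb_disabled = true;   /* no write to used_event needed */
}

static void driver_enable_cb(struct model_ring *r)
{
	r->cb_disabled = false;
	r->used_event = r->last_used_idx;  /* re-arm at the current position */
}
```

The real virtqueue_enable_cb() also reports whether more buffers arrived in the meantime; the model leaves that race out.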


Re: [PATCH] virtio-blk: Disable callback in virtblk_done()

2012-09-28 Thread Asias He
On 09/28/2012 02:08 PM, Rusty Russell wrote:
 Asias He as...@redhat.com writes:
 I forgot about the cool hack which MST put in to defer event updates
 using disable_cb/enable_cb.

 Hmm, are you talking about virtqueue_enable_cb_delayed()?
 
 Just the fact that virtqueue_disable_cb() prevents updates of
 used_index, and then we do the update in virtqueue_enable_cb().

Okay.

-- 
Asias


Re: vga passthrough // questions about pci passthrough

2012-09-28 Thread Jan Kiszka
On 2012-09-27 21:18, Alex Williamson wrote:
 On Thu, 2012-09-27 at 20:43 +0200, Martin Wolf wrote:
 Thank you for the information.

 I will try what you mentioned...
 Do you have any additional information about rebooting a VM with a
 passed-through video card (AMD/ATI 7870)?
 
 I don't.  Is the BSOD on reboot only or does it also happen on shutdown?
 There's a slim chance it could be traced by enabling debug in the
 pci-assign driver and analyzing what the guest driver is trying to do.
 I'm hoping that q35 chipset support might resolve some issues with vga
 assignment as it exposes a topology that looks a bit more like one that
 a driver would expect on physical hardware.  Thanks,

From our attempts to get more working than what NVIDIA Quadro cards
support officially, my own experiments with q35 in this context and our
discussions with NVIDIA, I'm pretty skeptical that this chipset will
make a difference here. Most problems are due to those non-standard side
channels to configure the hardware, memory mappings etc. And getting
this working requires either cooperation of the vendor or *a lot* of
reverse engineering.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: Re: [RFC v2 PATCH 04/21] x86: Avoid RCU warnings on slave CPUs

2012-09-28 Thread Tomoki Sekiyama
Hi Paul,

Thank you for your comments, and sorry for my late reply.

On 2012/09/21 2:34, Paul E. McKenney wrote:

 On Thu, Sep 06, 2012 at 08:27:40PM +0900, Tomoki Sekiyama wrote:
 Initialize RCU-related variables to avoid warnings about RCU usage while
 slave CPUs are running specified functions. Also notify the RCU subsystem
 before the slave CPU enters the idle state.
 
 Hello, Tomoki,
 
 A few questions and comments interspersed below.
 snip
 diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
 index e8cfe377..45dfc1d 100644
 --- a/arch/x86/kernel/smpboot.c
 +++ b/arch/x86/kernel/smpboot.c
 @@ -382,6 +382,8 @@ notrace static void __cpuinit start_slave_cpu(void *unused)
  f = per_cpu(slave_cpu_func, cpu);
  per_cpu(slave_cpu_func, cpu).func = NULL;

 +rcu_note_context_switch(cpu);
 +
 
 Why not use rcu_idle_enter() and rcu_idle_exit()?  These would tell
 RCU to ignore the slave CPU for the duration of its idle period.
 The way you have it, if a slave CPU stayed idle for too long, you
 would get RCU CPU stall warnings, and possibly system hangs as well. 

That's true, rcu_idle_enter() and rcu_idle_exit() should be used when
the slave cpu is idle. Thanks.

 Or is this being called from some task that is not the idle task?
 If so, you instead want the new rcu_user_enter() and rcu_user_exit()
 that are hopefully on their way into 3.7.  Or maybe better, use a real
 idle task, so that idle_task(smp_processor_id()) returns true and RCU
 stops complaining.  ;-)

 Note that CPUs that RCU believes to be idle are not permitted to contain
 RCU read-side critical sections, which in turn means no entering the
 scheduler, no sleeping, and so on.  There is an RCU_NONIDLE() macro
 to tell RCU to pay attention to the CPU only for the duration of the
 statement passed to RCU_NONIDLE, and there is also an _rcuidle variant
 of the tracing statements to allow tracing from idle.

This was for the case where KVM is called as `func', which contains RCU
read-side critical sections and calls rcu_virt_note_context_switch() (that
is, rcu_note_context_switch(cpu)) before entering the guest.
Maybe it should be replaced by rcu_user_enter() and rcu_user_exit() in the
future.
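Paul's stall-warning concern can be illustrated with a toy version of the per-CPU "dynticks" counter that RCU uses to recognise idle CPUs, assuming it behaves as in rcutree of this era: odd while the CPU is non-idle, even while it is in an extended quiescent state. All names below are hypothetical. A slave CPU that idles without the rcu_idle_enter() step keeps an odd, unchanging counter, so model_cpu_quiescent() stays false and the grace period cannot complete, which is what eventually produces stall warnings. Quiescent-state reports from context switches (rcu_note_context_switch()) are a separate path not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU counter: odd = non-idle (RCU must wait for this CPU),
 * even = extended quiescent state (RCU may ignore it). */
struct model_cpu {
	unsigned int dynticks;
};

/* Analogue of rcu_idle_enter(): leave the odd (busy) state. */
static void model_idle_enter(struct model_cpu *c)
{
	assert(c->dynticks & 1u);
	c->dynticks++;
}

/* Analogue of rcu_idle_exit(): back to odd, RCU watches this CPU again. */
static void model_idle_exit(struct model_cpu *c)
{
	assert(!(c->dynticks & 1u));
	c->dynticks++;
}

/* Grace-period side: snap was sampled when the grace period started.
 * The CPU cannot be holding the grace period up if it was idle then,
 * is idle now, or has passed through idle since (counter moved by >= 2). */
static bool model_cpu_quiescent(unsigned int snap, unsigned int curr)
{
	return !(snap & 1u) || !(curr & 1u) || curr - snap >= 2u;
}
```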

 --- a/kernel/rcutree.c
 +++ b/kernel/rcutree.c
 @@ -2589,6 +2589,9 @@ static int __cpuinit rcu_cpu_notify(struct notifier_block *self,
  switch (action) {
  case CPU_UP_PREPARE:
  case CPU_UP_PREPARE_FROZEN:
 +#ifdef CONFIG_SLAVE_CPU
 +case CPU_SLAVE_UP_PREPARE:
 +#endif
 
 Why do you need #ifdef here?  Why not define CPU_SLAVE_UP_PREPARE
 unconditionally?  Then if CONFIG_SLAVE_CPU=n, rcu_cpu_notify() would
 never be invoked with CPU_SLAVE_UP_PREPARE, so no problems. 

Agreed. That will make the code simpler.
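The simplification Paul suggests amounts to defining the new action code unconditionally and dropping the #ifdef around its case label. A hypothetical userspace sketch (the enum values and function are illustrative, not the kernel's): when the slave-CPU feature is configured out, nothing ever passes MODEL_CPU_SLAVE_UP_PREPARE, so the extra case label is simply never reached and needs no guard.

```c
#include <assert.h>

/* Hypothetical notifier actions. The slave-CPU code is defined
 * unconditionally; with the feature configured out it is never sent. */
enum model_cpu_action {
	MODEL_CPU_UP_PREPARE = 1,
	MODEL_CPU_UP_PREPARE_FROZEN,
	MODEL_CPU_SLAVE_UP_PREPARE,
	MODEL_CPU_DEAD,
};

/* Returns 1 when the action would trigger per-CPU RCU preparation. */
static int model_rcu_cpu_notify(enum model_cpu_action action)
{
	switch (action) {
	case MODEL_CPU_UP_PREPARE:
	case MODEL_CPU_UP_PREPARE_FROZEN:
	case MODEL_CPU_SLAVE_UP_PREPARE:   /* no #ifdef needed */
		return 1;
	default:
		return 0;
	}
}
```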

Thank you again,
-- 
Tomoki Sekiyama tomoki.sekiyama...@hitachi.com
Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory



Re: [PATCH] virtio-blk: Disable callback in virtblk_done()

2012-09-28 Thread Michael S. Tsirkin
On Thu, Sep 27, 2012 at 09:40:03AM +0930, Rusty Russell wrote:
 I forgot about the cool hack which MST put in to defer event updates
 using disable_cb/enable_cb.

I considered sticking some invalid value
in the event index on disable, but in my testing it did not seem to
give any gain, and knowing the actual index of the other side
is better for debugging.

-- 
MST


Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-28 Thread Peter Zijlstra
On Fri, 2012-09-28 at 11:08 +0530, Raghavendra K T wrote:
 
 Peter, Can I post your patch with your from/sob.. in V2?
 Please let me know.. 

Yeah I guess ;-)


RE: [PATCH v4] kvm/fpu: Enable fully eager restore kvm FPU

2012-09-28 Thread Hao, Xudong
 -Original Message-
 From: Avi Kivity [mailto:a...@redhat.com]
 Sent: Thursday, September 27, 2012 6:12 PM
 To: Hao, Xudong
 Cc: kvm@vger.kernel.org; Zhang, Xiantao
 Subject: Re: [PATCH v4] kvm/fpu: Enable fully eager restore kvm FPU
 
 On 09/26/2012 07:54 AM, Hao, Xudong wrote:
  -Original Message-
  From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
  Behalf Of Avi Kivity
  Sent: Tuesday, September 25, 2012 4:16 PM
  To: Hao, Xudong
  Cc: kvm@vger.kernel.org; Zhang, Xiantao
  Subject: Re: [PATCH v4] kvm/fpu: Enable fully eager restore kvm FPU
 
  On 09/25/2012 04:32 AM, Hao, Xudong wrote:
   
btw, it is clear that long term the fpu will always be eagerly loaded,
as hosts and guests (and hardware) are updated.  At that time it will
make sense to remove the lazy fpu code entirely.  But maybe that time is
here already, since exits are rare and so the guest has a good chance
to use the fpu, so eager fpu saves the #NM vmexit.
   
Can you check a kernel compile on a westmere system?  If eager fpu is
faster there than lazy fpu, we can just make the fpu always eager and
remove quite a bit of code.
   
   I remember Westmere does not support XSAVE; do you want performance
   numbers for fxsave/fxrstor?
 
  Yes.   If a westmere is fast enough then we can probably justify it.  If
  you can run tests on Sandy/Ivy Bridge, even better.
 
  I ran a kernel compile on Westmere: eager fpu is about 0.4% faster.
  Eager does not seem to benefit it much, so should we keep lazy fpu for
  lazy_allowed fpu state?
 
 Why not make it eager all the time then?  It will simplify the code
 quite a bit, no?
 
The code will be simpler if we make it eager; I'll remove the lazy logic.



[PATCH 1/3] virtio: add API to query ring capacity

2012-09-28 Thread Michael S. Tsirkin
It's sometimes necessary to query ring capacity after dequeueing a
buffer. Add an API for this.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/virtio/virtio_ring.c | 19 +++
 include/linux/virtio.h   |  2 ++
 2 files changed, 21 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5aa43c3..ee3d80b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -715,4 +715,23 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
 
+/**
+ * virtqueue_get_capacity - query available ring capacity
+ * @vq: the struct virtqueue we're talking about.
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted), otherwise result is unreliable.
+ *
+ * Returns remaining capacity of queue.
+ * Note that it only really makes sense to treat all
+ * return values as available: indirect buffers mean that
+ * we can put an entire sg[] array inside a single queue entry.
+ */
+unsigned int virtqueue_get_capacity(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   return vq->num_free;
+}
+EXPORT_SYMBOL_GPL(virtqueue_get_capacity);
+
 MODULE_LICENSE("GPL");
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index a1ba8bb..fab61e8 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -50,6 +50,8 @@ void *virtqueue_detach_unused_buf(struct virtqueue *vq);
 
 unsigned int virtqueue_get_vring_size(struct virtqueue *vq);
 
+unsigned int virtqueue_get_capacity(struct virtqueue *vq);
+
 /**
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
-- 
MST
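The kernel-doc note about indirect buffers is why a per-packet s/g count is a poor proxy for ring space. The toy model below assumes that, with indirect descriptors available, a multi-entry buffer occupies a single ring slot while its sg[] lives in a separate table; the real virtio_ring.c also falls back to direct descriptors when the indirect allocation fails, which is ignored here. struct model_vq and its helpers are hypothetical; model_get_capacity() plays the role of virtqueue_get_capacity() by returning num_free.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy ring that tracks free descriptors the way num_free does. */
struct model_vq {
	unsigned int num_free;
	bool indirect;           /* indirect descriptors negotiated */
};

/* Ring slots a buffer of num_sg entries consumes: one if it can be
 * described indirectly, otherwise one per entry. */
static unsigned int model_slots_needed(const struct model_vq *vq,
				       unsigned int num_sg)
{
	return (vq->indirect && num_sg > 1) ? 1 : num_sg;
}

static bool model_add_buf(struct model_vq *vq, unsigned int num_sg)
{
	unsigned int need = model_slots_needed(vq, num_sg);

	if (vq->num_free < need)
		return false;    /* ring full */
	vq->num_free -= need;
	return true;
}

/* Reclaiming a completed buffer gives back exactly what it took. */
static void model_get_buf(struct model_vq *vq, unsigned int num_sg)
{
	vq->num_free += model_slots_needed(vq, num_sg);
}

static unsigned int model_get_capacity(const struct model_vq *vq)
{
	return vq->num_free;
}
```

With indirect descriptors, reclaiming a five-entry packet frees a single slot, so adding up num_sg would overstate the freed space by four; asking the ring avoids that.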



[PATCH 2/3] virtio-net: correct capacity math on ring full

2012-09-28 Thread Michael S. Tsirkin
Capacity math on ring full is wrong: we are
looking at num_sg but that might be optimistic
because of indirect buffer use.

The implementation also penalizes fast path
with extra memory accesses for the benefit of
ring full condition handling which is slow path.

It's easy to query ring capacity so let's do just that.

This change also makes it easier to move vnet header
for tx around as follow-up patch does.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/net/virtio_net.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 83d2b0c..316f1be 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -95,7 +95,6 @@ struct skb_vnet_hdr {
struct virtio_net_hdr hdr;
struct virtio_net_hdr_mrg_rxbuf mhdr;
};
-   unsigned int num_sg;
 };
 
 struct padded_vnet_hdr {
@@ -557,10 +556,10 @@ again:
return received;
 }
 
-static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
+static void free_old_xmit_skbs(struct virtnet_info *vi)
 {
struct sk_buff *skb;
-   unsigned int len, tot_sgs = 0;
+   unsigned int len;
struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
 
while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
@@ -571,16 +570,15 @@ static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
stats->tx_packets++;
u64_stats_update_end(&stats->tx_syncp);
 
-   tot_sgs += skb_vnet_hdr(skb)->num_sg;
dev_kfree_skb_any(skb);
}
-   return tot_sgs;
 }
 
 static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
 {
struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
+   unsigned num_sg;
 
pr_debug("%s: xmit %p %pM\n", vi->dev->name, skb, dest);
 
@@ -619,8 +617,8 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
else
sg_set_buf(vi->tx_sg, &hdr->hdr, sizeof hdr->hdr);
 
-   hdr->num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
-   return virtqueue_add_buf(vi->svq, vi->tx_sg, hdr->num_sg,
+   num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
+   return virtqueue_add_buf(vi->svq, vi->tx_sg, num_sg,
 0, skb, GFP_ATOMIC);
 }
 
@@ -664,7 +662,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
netif_stop_queue(dev);
if (unlikely(!virtqueue_enable_cb_delayed(vi->svq))) {
/* More just got used, free them then recheck. */
-   capacity += free_old_xmit_skbs(vi);
+   free_old_xmit_skbs(vi);
+   capacity = virtqueue_get_capacity(vi->svq);
if (capacity >= 2+MAX_SKB_FRAGS) {
netif_start_queue(dev);
virtqueue_disable_cb(vi->svq);
-- 
MST
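The ring-full path after this patch can be sketched as a simplified userspace model. The virtqueue_enable_cb_delayed() race check is collapsed into an unconditional reclaim, MODEL_MAX_SKB_FRAGS assumes the 4 KiB-page value of 17 (65536 / PAGE_SIZE + 1), and the 2 + MAX_SKB_FRAGS threshold is read, presumably, as one slot for the virtio header, one for the linear part and one per page fragment. All names are hypothetical. The key line is the second capacity test, which asks the ring itself instead of summing per-skb s/g counts.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 65536 / 4096 + 1, i.e. MAX_SKB_FRAGS on 4 KiB pages. */
#define MODEL_MAX_SKB_FRAGS 17u

struct model_txq {
	unsigned int num_free;   /* what virtqueue_get_capacity() would report */
	unsigned int completed;  /* slots the host is done with, not yet reclaimed */
	bool stopped;            /* netif queue state */
};

/* Stand-in for free_old_xmit_skbs(): reclaims slots, returns nothing. */
static void model_free_old(struct model_txq *q)
{
	q->num_free += q->completed;
	q->completed = 0;
}

/* Ring-full handling after queueing a packet, simplified from the
 * patched start_xmit(). */
static void model_after_xmit(struct model_txq *q)
{
	if (q->num_free >= 2 + MODEL_MAX_SKB_FRAGS)
		return;                 /* room for a worst-case packet */
	q->stopped = true;          /* netif_stop_queue() */
	model_free_old(q);
	/* Re-query the ring rather than summing per-skb s/g counts. */
	if (q->num_free >= 2 + MODEL_MAX_SKB_FRAGS)
		q->stopped = false;     /* netif_start_queue() */
}
```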



[PATCH 3/3] virtio-net: put virtio net header inline with data

2012-09-28 Thread Michael S. Tsirkin
For small packets we can simplify xmit processing
by linearizing buffers with the header:
most packets seem to have enough headroom
that we can use for this purpose.
Since existing hypervisors require the header
to be the first s/g element, we need a feature bit
for this.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/net/virtio_net.c   | 44 +++-
 include/linux/virtio_net.h |  5 -
 2 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 316f1be..6e6e53e 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -67,6 +67,9 @@ struct virtnet_info {
/* Host will merge rx buffers for big packets (shake it! shake it!) */
bool mergeable_rx_bufs;
 
+   /* Host can handle any s/g split between our header and packet data */
+   bool any_header_sg;
+
/* enable config space updates */
bool config_enable;
 
@@ -576,11 +579,28 @@ static void free_old_xmit_skbs(struct virtnet_info *vi)
 
 static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
 {
-   struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
+   struct skb_vnet_hdr *hdr;
const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
unsigned num_sg;
+   unsigned hdr_len;
+   bool can_push;
+
 
pr_debug("%s: xmit %p %pM\n", vi->dev->name, skb, dest);
+   if (vi->mergeable_rx_bufs)
+   hdr_len = sizeof hdr->mhdr;
+   else
+   hdr_len = sizeof hdr->hdr;
+
+   can_push = vi->any_header_sg &&
+   !((unsigned long)skb->data & (__alignof__(*hdr) - 1)) &&
+   !skb_header_cloned(skb) && skb_headroom(skb) >= hdr_len;
+   /* Even if we can, don't push here yet as this would skew
+* csum_start offset below. */
+   if (can_push)
+   hdr = (struct skb_vnet_hdr *)(skb->data - hdr_len);
+   else
+   hdr = skb_vnet_hdr(skb);
 
if (skb-ip_summed == CHECKSUM_PARTIAL) {
hdr->hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
@@ -609,15 +629,18 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
hdr->hdr.gso_size = hdr->hdr.hdr_len = 0;
}
 
-   hdr->mhdr.num_buffers = 0;
-
-   /* Encode metadata header at front. */
if (vi->mergeable_rx_bufs)
-   sg_set_buf(vi->tx_sg, &hdr->mhdr, sizeof hdr->mhdr);
-   else
-   sg_set_buf(vi->tx_sg, &hdr->hdr, sizeof hdr->hdr);
+   hdr->mhdr.num_buffers = 0;
 
-   num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
+   if (can_push) {
+   __skb_push(skb, hdr_len);
+   num_sg = skb_to_sgvec(skb, vi->tx_sg, 0, skb->len);
+   /* Pull header back to avoid skew in tx bytes calculations. */
+   __skb_pull(skb, hdr_len);
+   } else {
+   sg_set_buf(vi->tx_sg, hdr, hdr_len);
+   num_sg = skb_to_sgvec(skb, vi->tx_sg + 1, 0, skb->len) + 1;
+   }
return virtqueue_add_buf(vi->svq, vi->tx_sg, num_sg,
 0, skb, GFP_ATOMIC);
 }
@@ -1128,6 +1151,9 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
vi->mergeable_rx_bufs = true;
 
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_ANY_HEADER_SG))
+   vi->any_header_sg = true;
+
err = init_vqs(vi);
if (err)
goto free_stats;
@@ -1286,7 +1312,7 @@ static unsigned int features[] = {
VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
-   VIRTIO_NET_F_GUEST_ANNOUNCE,
+   VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_ANY_HEADER_SG
 };
 
 static struct virtio_driver virtio_net_driver = {
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 2470f54..16a577b 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -51,6 +51,7 @@
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20  /* Extra RX mode control support */
 #define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* Guest can announce device on the
 * network */
+#define VIRTIO_NET_F_ANY_HEADER_SG 22  /* Host can handle any header s/g */
 
 #define VIRTIO_NET_S_LINK_UP   1   /* Link is up */
 #define VIRTIO_NET_S_ANNOUNCE  2   /* Announcement is needed */
@@ -62,7 +63,9 @@ struct virtio_net_config {
__u16 status;
 } __attribute__((packed));
 
-/* This is the first element of the scatter-gather list.  If you don't
+/* This header comes first in the scatter-gather list.
+ * If VIRTIO_NET_F_ANY_HEADER_SG is not negotiated, it must
+ * be the first element of the scatter-gather list.  If you don't
  * specify GSO or CSUM features, you can simply ignore the header. */
 struct virtio_net_hdr {
 #define 
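The can_push test in xmit_skb() decides between the two layouts. The sketch below restates it in userspace with hypothetical types: struct model_skb carries only the fields the test inspects, and the header length and alignment are passed in rather than derived from struct skb_vnet_hdr. When the header can be pushed into the headroom, header and data are described by the same s/g entries, which saves one ring entry per small packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Only the fields the can_push test looks at. */
struct model_skb {
	uintptr_t data;          /* address of packet data */
	size_t headroom;         /* bytes free in front of data */
	bool header_cloned;      /* header shared with a clone */
	unsigned int data_sgs;   /* s/g entries needed for the data itself */
};

/* Mirrors the patch's condition: feature negotiated, data aligned for the
 * header struct, header not cloned, and enough headroom for hdr_len. */
static bool model_can_push(const struct model_skb *skb, bool any_header_sg,
			   size_t hdr_len, size_t hdr_align)
{
	assert(hdr_align != 0 && (hdr_align & (hdr_align - 1)) == 0);
	return any_header_sg &&
	       !(skb->data & (hdr_align - 1)) &&
	       !skb->header_cloned &&
	       skb->headroom >= hdr_len;
}

/* Ring s/g entries used: pushing the header saves its separate entry. */
static unsigned int model_tx_sgs(const struct model_skb *skb, bool can_push)
{
	return can_push ? skb->data_sgs : skb->data_sgs + 1;
}
```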

[PATCH 0/3] virtio-net: inline header support

2012-09-28 Thread Michael S. Tsirkin
Thinking about Sasha's patches, we can reduce ring usage
for virtio net small packets dramatically if we put
the virtio net header inline with the data.
This can be done for free when the guest net stack has allocated
extra headroom for the packet, and I don't see
why this would have any downsides.

Even though with my recent patches qemu
no longer requires the header to be the first s/g element,
we need a new feature bit to detect this.
A trivial qemu patch will be sent separately.

We could get rid of an extra s/g for big packets too,
but since in practice everyone enables mergeable buffers,
I don't see much of a point.

Rusty, if you decide to pick this up I'll send a
(rather trivial) spec patch shortly afterwards, but holidays
are beginning here. Considering how simple
the guest patch is, I hope it can make it into 3.7?

Also note that patches 1 and 2 are IMO a good
idea even without patch 3. If you decide to defer patch 3,
please consider 1/2 separately.

Before:
[root@virtlab203 qemu]# ssh robin ./netperf/bin/netperf -t TCP_RR -H 11.0.0.4
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    2992.88
16384  87380

After:
[root@virtlab203 qemu]# ssh robin ./netperf/bin/netperf -t TCP_RR -H 11.0.0.4
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.0.0.4 (11.0.0.4) port 0 AF_INET : demo
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    3195.57
16384  87380
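For scale, the relative change between the two transaction rates above can be computed directly. This is only arithmetic on the figures as posted, one 10-second run each, so it says nothing about run-to-run variance.

```c
#include <assert.h>

/* Percentage change from before to after. */
static double pct_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

pct_change(2992.88, 3195.57) comes out at roughly 6.8% more transactions per second for this TCP_RR test.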

Michael S. Tsirkin (3):
  virtio: add API to query ring capacity
  virtio-net: correct capacity math on ring full
  virtio-net: put virtio net header inline with data

 drivers/net/virtio_net.c | 57 +++-
 drivers/virtio/virtio_ring.c | 19 +++
 include/linux/virtio.h   |  2 ++
 include/linux/virtio_net.h   |  5 +++-
 4 files changed, 66 insertions(+), 17 deletions(-)

-- 
MST


[PATCH qemu] virtio-net: add feature bit for any header s/g

2012-09-28 Thread Michael S. Tsirkin
Old qemu versions required the first s/g entry to be the header.

My recent patchset titled "virtio-net: iovec handling cleanup"
removed this limitation, but a feature
bit is needed so guests know it's safe to lay out
the header differently.

This patch applies on top and adds such a feature bit.
Putting the virtio net header inline with the data is beneficial
for latency and small packet bandwidth.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/virtio-net.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/virtio-net.h b/hw/virtio-net.h
index 36aa463..e7187e4 100644
--- a/hw/virtio-net.h
+++ b/hw/virtio-net.h
@@ -44,6 +44,7 @@
 #define VIRTIO_NET_F_CTRL_RX18  /* Control channel RX mode support */
 #define VIRTIO_NET_F_CTRL_VLAN  19  /* Control channel VLAN filtering */
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20   /* Extra RX mode control support */
+#define VIRTIO_NET_F_ANY_HEADER_SG 22   /* Host can handle any header s/g */
 
 #define VIRTIO_NET_S_LINK_UP1   /* Link is up */
 
@@ -186,5 +187,6 @@ struct virtio_net_ctrl_mac {
 DEFINE_PROP_BIT("ctrl_vq", _state, _field, VIRTIO_NET_F_CTRL_VQ, true), \
 DEFINE_PROP_BIT("ctrl_rx", _state, _field, VIRTIO_NET_F_CTRL_RX, true), \
 DEFINE_PROP_BIT("ctrl_vlan", _state, _field, VIRTIO_NET_F_CTRL_VLAN, true), \
-DEFINE_PROP_BIT("ctrl_rx_extra", _state, _field, VIRTIO_NET_F_CTRL_RX_EXTRA, true)
+DEFINE_PROP_BIT("ctrl_rx_extra", _state, _field, VIRTIO_NET_F_CTRL_RX_EXTRA, true), \
+DEFINE_PROP_BIT("any_header_sg", _state, _field, VIRTIO_NET_F_ANY_HEADER_SG, true)
 #endif
-- 
MST
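Feature negotiation is what makes the new bit safe with older hosts. A minimal sketch, assuming the usual virtio rule that only bits both offered by the device and listed by the driver take effect: an older qemu never offers bit 22, so a new guest keeps the header in its own s/g entry. VIRTIO_NET_F_MRG_RXBUF is bit 15 in virtio_net.h; the MODEL_* names and helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_VIRTIO_NET_F_MRG_RXBUF      15u  /* as in virtio_net.h */
#define MODEL_VIRTIO_NET_F_ANY_HEADER_SG  22u  /* as proposed in this series */

#define MODEL_BIT(n) (UINT32_C(1) << (n))

/* Only features both offered by the device and known to the driver. */
static uint32_t model_negotiate(uint32_t device_features,
				uint32_t driver_features)
{
	return device_features & driver_features;
}

static bool model_has_feature(uint32_t negotiated, unsigned int bit)
{
	assert(bit < 32u);
	return (negotiated >> bit) & 1u;
}
```

Setting qemu's any_header_sg property to off has the same effect as an old qemu: the device side omits bit 22 and the guest falls back to the separate header entry.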


Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-28 Thread Andrew Theurer
On Fri, 2012-09-28 at 11:08 +0530, Raghavendra K T wrote:
 On 09/27/2012 05:33 PM, Avi Kivity wrote:
  On 09/27/2012 01:23 PM, Raghavendra K T wrote:
 
  This gives us a good case for tracking preemption on a per-vm basis.  As
  long as we aren't preempted, we can keep the PLE window high, and also
  return immediately from the handler without looking for candidates.
 
  1) So do you think the deferred-preemption patch (which Vatsa mentioned
  long back) is also worth trying, so that we reduce the chance
  of LHP (lock-holder preemption)?
 
  Yes, we have to keep it in mind.  It will be useful for fine grained
  locks, not so much for coarse locks or IPIs.
 
 
 Agree.
 
  I would still of course prefer a PLE solution, but if we can't get it to
  work we can consider preemption deferral.
 
 
 Okay.
 
 
  IIRC, with deferred preemption:
  we will have a hook in the spin lock/unlock path to measure the depth of
  locks held, shared with the host scheduler (maybe via MSRs now).
  The host scheduler 'prefers' not to preempt a lock-holding vcpu (or rather,
  gives it, say, one chance).
 
  A downside is that we have to do that even when undercommitted.

Hopefully vcpu preemption is very rare when undercommitted, so it should
not happen much at all.
 
  Also there may be a lot of false positives (deferred preemptions even
  when there is no contention).

It will be interesting to see how this behaves with a very high lock
activity in a guest.  Once the scheduler defers preemption, is it for a
fixed amount of time, or does it know to cut the deferral short as soon
as the lock depth is reduced [by x]?
 
 Yes. That is a worry.
 
 
 
  2) Looking at the results (comparing A & C), I do feel we have
  significant overhead in iterating over vcpus (even compared to the vmexit),
  so we would still need the undercommit fix suggested by PeterZ (improving
  by 140%)?
 
  Looking only at the current runqueue?  My worry is that it misses a lot
  of cases.  Maybe try the current runqueue first and then others.
 
  Or were you referring to something else?
 
 No. I was referring to the same thing.
 
 However, I had also tried the following (which works well to detect the
 undercommitted scenario), but I am thinking of using it only for yielding in
 the overcommit case (yield in overcommit was suggested by Rik) and keeping
 the undercommit patch as suggested by PeterZ.
 
 [ patch is not in proper diff I suppose ].
 
 Will test them.
 
 Peter, Can I post your patch with your from/sob.. in V2?
 Please let me know..
 
 ---
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 28f00bc..9ed3759 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -1620,6 +1620,21 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
   return eligible;
   }
   #endif
 +
 +bool kvm_overcommitted()
 +{
 + unsigned long load;
 +
 + load = avenrun[0] + FIXED_1/200;
 + load = load >> FSHIFT;
 + load = (load << 7) / num_online_cpus();
 +
 + if (load > 128)
 + return true;
 +
 + return false;
 +}
 +
   void kvm_vcpu_on_spin(struct kvm_vcpu *me)
   {
   struct kvm *kvm = me->kvm;
 @@ -1629,6 +1644,9 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
   int pass;
   int i;
 
 + if (!kvm_overcommitted())
 + return;
 +
   kvm_vcpu_set_in_spin_loop(me, true);
   /*
* We boost the priority of a VCPU that is runnable but not
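
For anyone puzzling over the fixed-point arithmetic in kvm_overcommitted(), here is a userspace model of the same check. It is a sketch, not part of the patch: FSHIFT and FIXED_1 mirror the kernel's load-average format (11 fractional bits), while model_overcommitted(), avenrun0 and ncpus are hypothetical stand-ins for the function, avenrun[0] and num_online_cpus().

```c
#include <assert.h>
#include <stdbool.h>

/* The kernel's load averages are fixed point with 11 fractional bits. */
#define FSHIFT  11
#define FIXED_1 (1UL << FSHIFT)

/*
 * Hypothetical userspace model of the quoted kvm_overcommitted():
 * avenrun0 stands in for avenrun[0] (1-minute load, FSHIFT fixed point)
 * and ncpus stands in for num_online_cpus().
 */
static bool model_overcommitted(unsigned long avenrun0, unsigned int ncpus)
{
	unsigned long load;

	assert(ncpus > 0);

	load = avenrun0 + FIXED_1 / 200; /* small rounding offset, about 0.005 */
	load = load >> FSHIFT;           /* drop the fraction: integer load */
	load = (load << 7) / ncpus;      /* per-cpu load scaled so 128 == 1.0 */

	return load > 128;               /* per-cpu load above 1.0 */
}
```

With 4 cpus, a load average of 8.0 scales to 256 and counts as overcommitted, while exactly 4.0 scales to 128 and does not. Because the fraction is dropped before the division, 4.99 on 4 cpus does not count either; in this 4-cpu example the check only fires once the integer load reaches 5.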




Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM

2012-09-28 Thread Marcelo Tosatti
On Fri, Sep 28, 2012 at 02:07:26AM +, Auld, Will wrote:
 Marcelo,
 
 I tagged my comments below with [auld] to make it easier to read. 
 
 Thanks,
 
 Will
 
 -Original Message-
 From: Marcelo Tosatti [mailto:mtosa...@redhat.com] 
 Sent: Thursday, September 27, 2012 4:49 AM
 To: Auld, Will
 Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao; Liu, Jinsong
 Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
 
 On Thu, Sep 27, 2012 at 08:31:22AM -0300, Marcelo Tosatti wrote:
  On Thu, Sep 27, 2012 at 12:50:16AM +, Auld, Will wrote:
   Marcelo,
   
   I think I am missing something. There should be no changes needed to the
   algorithms that exist today. Does it seem that I have broken
   Zachary's implementation somehow?
  
  Yes. The compute_guest_tsc() function must take ia32_tsc_adjust into
  account. So must guest_read_tsc (and the SVM equivalent).
 
 [auld] I don't see how that function is broken. 

compute_guest_tsc() should return the TSC value according to what is
emulated via vcpu->arch.virtual_tsc_mult, but this can be fixed later.

 You must also take VMX-SVM migration into account. In that case, you should
 export IA32_TSC_ADJUST along with the IA32_TSC MSR.
 
 [auld] I'll give this more thought. There are two different ways to go: allow
 this to only work on host processors with this feature, or enable this for all
 VMs independent of the underlying host processor capability. In the former
 case, migrating across architectures might be disallowed. In the latter case,
 sending only IA32_TSC on migration should be enough, as the delta would be
 accounted for in the tsc_offset of the control structure.

That is fine, yes: if you want to migrate across, don't expose the
feature.

 
 Which brings us back to the initial question: if there are other means to
 provide a stable TSC, why use this MSR? For example, VMware guests have no need
 to use this MSR (because the hypervisor provides TSC guarantees).
 
 [auld] Using this MSR simplifies the process of synchronizing the TSC for
 each logical processor because its value does not change with the clock. How
 do you write the same value to all the IA32_TIME_STAMP_COUNTER MSRs? Well,
 figure out what you want to write there, get all the processors to rendezvous
 at the same time, and have all logical processors complete their writes in a
 very small amount of time. This is in contrast to deciding the offset to write
 and then having all the logical processors write the offset, with no worries
 about rendezvous, synchronization of the writes in time and such.
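
Will's point about offsets versus rendezvous follows from how Intel's SDM defines IA32_TSC_ADJUST: a WRMSR that changes the TSC by some delta also adds that delta to IA32_TSC_ADJUST, and a WRMSR that changes IA32_TSC_ADJUST by some delta also adds it to the TSC. The sketch below models one logical processor under those two rules; struct cpu_tsc and its helpers are hypothetical, and a plain counter stands in for the hardware clock.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of one logical processor. All arithmetic is modulo
 * 2^64, so tsc_adjust holds a two's-complement (signed) delta.
 */
struct cpu_tsc {
	uint64_t ticks;      /* cycles since reset, which software cannot change */
	uint64_t tsc;        /* IA32_TIME_STAMP_COUNTER */
	uint64_t tsc_adjust; /* IA32_TSC_ADJUST */
};

/* Invariant implied by the two rules: TSC = cycles since reset + TSC_ADJUST. */
static void check(const struct cpu_tsc *c)
{
	assert(c->tsc == c->ticks + c->tsc_adjust);
}

/* Time passes: the TSC advances, TSC_ADJUST does not. */
static void tick(struct cpu_tsc *c, uint64_t cycles)
{
	c->ticks += cycles;
	c->tsc += cycles;
	check(c);
}

/* WRMSR IA32_TIME_STAMP_COUNTER: the delta is mirrored into TSC_ADJUST. */
static void wrmsr_tsc(struct cpu_tsc *c, uint64_t value)
{
	c->tsc_adjust += value - c->tsc;
	c->tsc = value;
	check(c);
}

/* WRMSR IA32_TSC_ADJUST: the delta is mirrored into the TSC. */
static void wrmsr_tsc_adjust(struct cpu_tsc *c, uint64_t value)
{
	c->tsc += value - c->tsc_adjust;
	c->tsc_adjust = value;
	check(c);
}
```

For example, if firmware wrote processor b's TSC back by 500 cycles, b.tsc_adjust holds -500; writing 0 to it later, at any moment, puts b back in step with processor a. Writing IA32_TIME_STAMP_COUNTER directly would only align the two if both writes landed at the same instant.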
 
 Then we come back to the two questions: 
 
 - Is there anyone from Intel working on the Linux host side, where it
   makes sense to use this?
 
 [auld] I am not aware of anyone working on this for Linux.
 
 - Are you sure it's worthwhile to expose this to KVM guests?
 
 [auld] At least one OS that is commonly used as a guest is moving to
 implement this.

OK thanks.

 
   
   Thanks,
   
   Will
   
   -Original Message-
   From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
   Sent: Wednesday, September 26, 2012 5:29 PM
   To: Auld, Will
   Cc: kvm@vger.kernel.org; Avi Kivity; Zhang, Xiantao; Liu, Jinsong
   Subject: Re: [PATCH] Enabling IA32_TSC_ADJUST for guest VM
   
   On Wed, Sep 26, 2012 at 10:58:46PM +, Auld, Will wrote:
Avi, still working on your suggestions.

Marcelo,

The purpose is to be able to run guests that implement this change and
not require that they revert to the older method of adjusting the TSC. I am
making no assumption about whether the guest checks to see if the times
are good enough or just runs an algorithm every time, but in any case
this would allow the simpler, cleaner and less expensive algorithm to
run if it exists.
   
   Will, you can choose to not expose the feature. Correct?
   
   Because this conflicts with the model that has been envisioned and
   developed by Zachary: for that model to continue to be functional,
   you'll have to make sure the TSC emulation is adjusted accordingly to
   consider IA32_TSC_ADJUST (for example, when trapping TSC).
   
   From that point of view, the patch below is incomplete.
   
   ... or KVM can choose to never expose the feature via CPUID and handle
   TSC consistency itself (I understand your perspective of getting a task
   complete, but unfortunately from my POV it's not so simple).
   
Thanks,

Will

The purpose of the IA32_TSC_ADJUST control is to make it easier for 
the operating system (host) to decrease the delta between cores to an 
acceptable value, so that applications can make use of direct RDTSC, 
correct?

Why is it necessary for the guests to make use of such an interface, if
the hypervisor could provide a proper TSC?

(not against exposing it to the guests, just thinking out loud).

That is, if the purpose of the IA32_TSC_ADJUST is to provide proper 
synchronized TSC across cores, and newer 

Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-28 Thread Peter Zijlstra
On Fri, 2012-09-28 at 06:40 -0500, Andrew Theurer wrote:
 It will be interesting to see how this behaves with very high lock
 activity in a guest.  Once the scheduler defers preemption, is it for a
 fixed amount of time, or does it know to cut the deferral short as soon
 as the lock depth is reduced [by x]?

Since the locks live in a guest/userspace, we don't even know they're
held at all, let alone when state changes.

Also, afaik PLE simply exits the guest whenever you do a busy-wait;
there's no guarantee it's due to a lock at all. We could be waiting for a
'virtual' hardware resource or whatnot.




Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

2012-09-28 Thread Raghavendra K T

On 09/28/2012 05:10 PM, Andrew Theurer wrote:

On Fri, 2012-09-28 at 11:08 +0530, Raghavendra K T wrote:

On 09/27/2012 05:33 PM, Avi Kivity wrote:

On 09/27/2012 01:23 PM, Raghavendra K T wrote:



[...]


Also there may be a lot of false positives (deferred preemptions even
when there is no contention).


It will be interesting to see how this behaves with very high lock
activity in a guest.  Once the scheduler defers preemption, is it for a
fixed amount of time, or does it know to cut the deferral short as soon
as the lock depth is reduced [by x]?


The design/protocol that Vatsa had in mind was something like this:

- The scheduler does not let a vcpu holding a lock run forever; it may give
it one chance, lasting only a few ticks. In addition to giving the chance,
the scheduler also sets some indication that the vcpu has been given a chance.

- Once the vcpu releases (all) the lock(s), if it had been given a chance,
it should clear that indication (ACK) and relinquish the cpu.
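
The two steps above can be sketched as a small shared-state protocol. Everything in it is hypothetical: struct vcpu_shared, its fields and the three functions are illustrative names, a shared memory area stands in for the MSR-based channel mentioned earlier in the thread, and C11 atomics stand in for whatever ordering a real guest/host interface would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-vcpu area shared between the guest and the host scheduler. */
struct vcpu_shared {
	atomic_int  lock_depth;   /* spinlocks currently held (written by guest) */
	atomic_bool chance_given; /* set by host when it defers a preemption */
};

/* Guest: hook in the spin_lock path. */
static void guest_lock_acquired(struct vcpu_shared *s)
{
	atomic_fetch_add(&s->lock_depth, 1);
}

/* Guest: hook in the spin_unlock path; true means "yield the cpu now". */
static bool guest_lock_released(struct vcpu_shared *s)
{
	int prev = atomic_fetch_sub(&s->lock_depth, 1);

	assert(prev > 0);
	if (prev != 1)
		return false;                            /* other locks still held */
	return atomic_exchange(&s->chance_given, false); /* ACK a pending chance */
}

/* Host: the scheduler wants to preempt this vcpu; true means "go ahead". */
static bool host_should_preempt(struct vcpu_shared *s)
{
	if (atomic_load(&s->lock_depth) == 0)
		return true;                          /* no lock held */
	if (atomic_load(&s->chance_given))
		return true;                          /* chance not ACKed in time */
	atomic_store(&s->chance_given, true);     /* grant one chance (a few ticks) */
	return false;
}
```

The host grants at most one deferral per request: if the vcpu has not cleared chance_given by the time the scheduler asks again, it is preempted. Releasing the last lock with a pending chance returns true, which is where the guest would give up the cpu.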






Re: vga passthrough // questions about pci passthrough

2012-09-28 Thread Martin Wolf
Well, my first tests with the VGA ROM were useless because of AppArmor
rules, I guess.
Now I placed the vga.rom in /usr/share/qemu ... well, the error is gone
now, but nothing else changed ;)

So I added the bar parameter, but it also made no difference :(

Are you interested in the Windows memory dump from the BSOD?

Another thing: after a fresh reboot of Win7 I ran some benchmarks, and then I
wanted to measure some values of the 7870, so I started GPU-Z
( http://www.techpowerup.com/gpuz/ ). Almost immediately the VM froze.
I found one log entry in one of the libvirt log files:

kvm: /build/buildd/qemu-kvm-1.2.0+noroms/exec.c:2255: register_subpage: Assertion `existing->mr->subpage || existing->mr == io_mem_unassigned' failed.

Maybe you know what this is about.

Thanks again for your patience and help ;)

On 28.09.2012 10:12, Jan Kiszka wrote:

On 2012-09-27 21:18, Alex Williamson wrote:

On Thu, 2012-09-27 at 20:43 +0200, Martin Wolf wrote:

Thank you for the information.

I will try what you mentioned...
Do you have any additional information about rebooting a VM with a
passed-through video card?
(AMD / ATI 7870)

I don't.  Is the bsod on reboot only or does it also happen on shutdown?
There's a slim chance it could be traced by enabling debug in the
pci-assign driver and analyzing what the guest driver is trying to do.
I'm hoping that q35 chipset support might resolve some issues with vga
assignment as it exposes a topology that looks a bit more like one that
a driver would expect on physical hardware.  Thanks,

 From our attempts to get more working than what NVIDIA Quadro cards
support officially, my own experiments with q35 in this context and our
discussions with NVIDIA, I'm pretty skeptical that this chipset will
make a difference here. Most problems are due to those non-standard side
channels to configure the hardware, memory mappings etc. And getting
this working requires either cooperation of the vendor or *a lot* of
reverse engineering.

Jan




--
Adiumentum GmbH
Managing Director: Martin Wolf
Banderbacherstraße 76
90513 Zirndorf

0911 / 9601470
mw...@adiumentum.com



Re: vga passthrough // questions about pci passthrough

2012-09-28 Thread Alex Williamson
On Fri, 2012-09-28 at 10:12 +0200, Jan Kiszka wrote:
 On 2012-09-27 21:18, Alex Williamson wrote:
  On Thu, 2012-09-27 at 20:43 +0200, Martin Wolf wrote:
  Thank you for the information.
 
  I will try what you mentioned...
  Do you have any additional information about rebooting a VM with a
  passed-through video card?
  (AMD / ATI 7870)
  
  I don't.  Is the bsod on reboot only or does it also happen on shutdown?
  There's a slim chance it could be traced by enabling debug in the
  pci-assign driver and analyzing what the guest driver is trying to do.
  I'm hoping that q35 chipset support might resolve some issues with vga
  assignment as it exposes a topology that looks a bit more like one that
  a driver would expect on physical hardware.  Thanks,
 
 From our attempts to get more working than what NVIDIA Quadro cards
 support officially, my own experiments with q35 in this context and our
 discussions with NVIDIA, I'm pretty skeptical that this chipset will
 make a difference here. Most problems are due to those non-standard side
 channels to configure the hardware, memory mappings etc. And getting
 this working requires either cooperation of the vendor or *a lot* of
 reverse engineering.

I heard from an nvidia guy that the driver behaves differently depending
on whether it finds an upstream express port, so we're probably causing
ourselves more problems if it's trying to run in AGP mode.  There was
also a lot of FUD in Xen (maybe justified) around how the BIOS
determines the memory ranges and whether it bypasses the PCI BARs and
gets them directly.  That means some cards may require identity mapping
to work.  It seems like the very high-end cards are possibly fixing
this, but they're far more expensive than I can justify.  Thanks,

Alex



resize raw images

2012-09-28 Thread Lentes, Bernd
Hi,

I'm not very experienced with KVM. I installed two VMs in raw images. I'm
impressed by the speed of the VMs, that's nice :-).
I have a lot of VMs running on VMware Server 1.09, which is very old. I'd like
to migrate them to KVM.
I'd like to migrate them to raw images, because I'm able to mount a raw image
from the host like a partition if the VM is having problems.
I also have to create some new VMs. What happens when disk space runs out?
My idea is to create the new VMs in raw images. Inside the VM, filesystems
will reside in logical volumes. When disk space runs out, I resize the
raw image using:

- qemu-img create -f raw additional.raw size
- cat additional.raw >> vm.raw
- inside the VM, resize the filesystems easily with LVM tools and resize2fs.
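
Appending a zero-filled additional.raw is the same as extending vm.raw with zero bytes, which can also be done in place. A minimal sketch, assuming the VM is shut down while the image is changed: grow_raw_image() is a hypothetical helper built on ftruncate(), which extends the file with zeros (usually as a hole on Linux filesystems), so no temporary file is needed. qemu-img also has a resize subcommand for this job.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: append extra_bytes of zeros to a raw image, the same
 * result as "cat additional.raw >> vm.raw" with a zero-filled additional.raw.
 * Only call it while the VM using the image is shut down.
 * Returns 0 on success, -1 on error.
 */
static int grow_raw_image(const char *path, off_t extra_bytes)
{
	struct stat st;
	int fd, ret;

	assert(extra_bytes >= 0);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	ret = fstat(fd, &st);
	if (ret == 0)
		ret = ftruncate(fd, st.st_size + extra_bytes); /* new bytes read as zero */

	if (close(fd) != 0)
		ret = -1;
	return ret == 0 ? 0 : -1;
}
```

The steps inside the guest stay the same afterwards (LVM tools, then resize2fs).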

What do you think about this idea? Are there easier solutions?

Thanks in advance.


Bernd

--
Bernd Lentes

System Administration
Institut für Entwicklungsgenetik
Building 35.34 - Room 208
HelmholtzZentrum münchen
bernd.len...@helmholtz-muenchen.de
phone: +49 89 3187 1241
fax:   +49 89 3187 2294
http://www.helmholtz-muenchen.de/idg

We should not fear death, but rather
a bad life

Helmholtz Zentrum München
Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)
Ingolstädter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Chair of the Supervisory Board: MinDir´in Bärbel Brumme-Bothe
Managing Directors: Prof. Dr. Günther Wess and Dr. Nikolaus Blum
Register court: Amtsgericht München HRB 6466
VAT ID: DE 129521671


Re: vga passthrough // questions about pci passthrough

2012-09-28 Thread Jan Kiszka
On 2012-09-28 17:50, Alex Williamson wrote:
 On Fri, 2012-09-28 at 10:12 +0200, Jan Kiszka wrote:
 On 2012-09-27 21:18, Alex Williamson wrote:
 On Thu, 2012-09-27 at 20:43 +0200, Martin Wolf wrote:
 Thank you for the information.

 I will try what you mentioned...
 Do you have any additional information about rebooting a VM with a
 passed-through video card?
 (AMD / ATI 7870)

 I don't.  Is the bsod on reboot only or does it also happen on shutdown?
 There's a slim chance it could be traced by enabling debug in the
 pci-assign driver and analyzing what the guest driver is trying to do.
 I'm hoping that q35 chipset support might resolve some issues with vga
 assignment as it exposes a topology that looks a bit more like one that
 a driver would expect on physical hardware.  Thanks,

 From our attempts to get more working than what NVIDIA Quadro cards
 support officially, my own experiments with q35 in this context and our
 discussions with NVIDIA, I'm pretty skeptical that this chipset will
 make a difference here. Most problems are due to those non-standard side
 channels to configure the hardware, memory mappings etc. And getting
 this working requires either cooperation of the vendor or *a lot* of
 reverse engineering.
 
 I heard from an nvidia guy that the driver behaves differently depending
 on whether it finds an upstream express port, so we're probably causing
 ourselves more problems if it's trying to run in AGP mode.

That may be a point for the low- to mid-range cards. It does not apply to the
virtualization-ready Quadro series, according to our information back then.

  There was
 also a lot of FUD in Xen (maybe justified) around how the BIOS
 determines the memory ranges and whether it bypasses the PCI BARs and
 gets them directly.  That means some cards may require identity mapping
 to work.  It seems like the very high-end cards are possibly fixing
 this, but they're far more expensive than I can justify.  Thanks,

Yes, that is what makes them virtualization-ready. But they also come
with limitations. So far, you can't pass through a primary card or use
it for early boot messages of the guest, as the BIOS is not ready for
that, at least not without identity mapping or even more.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: [PATCH RFC 1/2] kvm: Handle undercommitted guest case in PLE handler

2012-09-28 Thread Konrad Rzeszutek Wilk
  PLE:
  - works for unmodified / non-Linux guests
  - works for all types of spins (e.g. smp_call_function*())
  - utilizes an existing hardware interface (PAUSE instruction) so likely
  more robust compared to a software interface
 
  PV:
  - has more information, so it can perform better
  
  Should we also consider that we always have an edge here for non-PLE
  machines?
 
 True.  The deployment share for these is decreasing rapidly though.  I
 hate optimizing for obsolete hardware.

Keep in mind that the patchset that Jeremy provided also cleans up (removes)
parts of the pv spinlock code. It removes the various spin_lock,
spin_unlock, etc. that touch paravirt code. Instead, the pv code is only
in the slowpath. And if you don't compile with CONFIG_PARAVIRT_SPINLOCKS,
the end code is the same as it is now.
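
The shape described above, an unchanged fast path with paravirt code confined to the slow path, can be illustrated with a ticket lock that spins a bounded number of times before calling a slow-path hook. This is not Jeremy's patchset: SPIN_THRESHOLD, lock_spinning_fn and the default hook are hypothetical, and a real paravirt hook would block the vcpu and need a matching kick from the unlocker rather than just calling sched_yield().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

#define SPIN_THRESHOLD (1 << 11) /* hypothetical spin budget before the slow path */

struct ticketlock {
	atomic_uint head; /* ticket now being served */
	atomic_uint tail; /* next ticket to hand out */
};

/* Hypothetical slow-path hook; a PV guest would block here until kicked. */
typedef void (*lock_spinning_fn)(struct ticketlock *lock, unsigned int ticket);

static void default_lock_spinning(struct ticketlock *lock, unsigned int ticket)
{
	(void)lock;
	(void)ticket;
	sched_yield(); /* placeholder: give the cpu away instead of halting */
}

static lock_spinning_fn lock_spinning = default_lock_spinning;

static void ticket_lock(struct ticketlock *lock)
{
	unsigned int me = atomic_fetch_add(&lock->tail, 1);

	for (;;) {
		/* Fast path: an ordinary ticket-lock spin. */
		for (int i = 0; i < SPIN_THRESHOLD; i++)
			if (atomic_load_explicit(&lock->head, memory_order_acquire) == me)
				return;
		/* Spun too long: only now call into the (paravirt) slow path. */
		lock_spinning(lock, me);
	}
}

static void ticket_unlock(struct ticketlock *lock)
{
	assert(atomic_load(&lock->head) != atomic_load(&lock->tail)); /* held */
	atomic_fetch_add_explicit(&lock->head, 1, memory_order_release);
}
```

Lock holders never touch the hook, and waiters only reach it after exceeding the spin budget, which is why such a design can leave the non-paravirt code path unchanged.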

On a different subject: I am curious whether Haswell's new locking
instructions (the transactional ones?) can be put to use for the slow
case.


Re: [libvirt] TSC scaling interface to management

2012-09-28 Thread Marcelo Tosatti
On Tue, Sep 25, 2012 at 11:08:58AM +0100, Daniel P. Berrange wrote:
 On Wed, Sep 12, 2012 at 12:39:39PM -0300, Marcelo Tosatti wrote:
  
  
  HW TSC scaling is a feature of AMD processors that allows a
  multiplier to be specified to the TSC frequency exposed to the guest.
  
  KVM also contains provisions to trap the TSC (KVM: Infrastructure for
  software and hardware based TSC rate scaling cc578287e3224d0da)
  or to advance the TSC frequency.
  
  This is useful when migrating to a host with a different frequency while
  the guest is possibly using direct RDTSC instructions for purposes
  other than measuring cycles (that is, it previously calculated
  cycles-per-second and uses that information, which is stale after
  migration).
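
The multiplier can be illustrated with a little fixed-point arithmetic. The sketch assumes AMD's TSC ratio format of 8 integer bits and 32 fractional bits (so 1.0 is 1 << 32) and uses hypothetical kHz values; tsc_ratio() and scale_tsc() are illustrative names, not KVM's code, and the wide multiply relies on the GCC/Clang unsigned __int128 extension.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ratio format: 8 integer bits, 32 fractional bits (1.0 == 1 << 32). */
#define TSC_RATIO_FRAC_BITS 32

/* Ratio of the frequency the guest should see to the host's frequency. */
static uint64_t tsc_ratio(uint32_t guest_khz, uint32_t host_khz)
{
	assert(host_khz != 0);
	return ((uint64_t)guest_khz << TSC_RATIO_FRAC_BITS) / host_khz;
}

/* Scale a host TSC reading; the 128-bit product (GCC/Clang) avoids overflow. */
static uint64_t scale_tsc(uint64_t host_tsc, uint64_t ratio)
{
	return (uint64_t)(((unsigned __int128)host_tsc * ratio) >> TSC_RATIO_FRAC_BITS);
}
```

For example, a guest that calibrated on a 2,000,000 kHz host and migrated to a 3,000,000 kHz host would get a ratio of about 2/3, so one second of host cycles (3,000,000,000) reads as roughly 2,000,000,000 guest cycles, and its calibrated cycles-per-second value stays valid.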
  
  qemu-x86: Set tsc_khz in kvm when supported (e7429073ed1a76518)
  added support for tsc_khz= option in QEMU.
  
  I am proposing the following changes so that management applications
  can work with this:
  
  1) New option for tsc_khz, which is tsc_khz=host (QEMU command line
  option). Host means that QEMU is responsible for retrieving the
  TSC frequency of the host processor and using that.
  The management application does not have to deal with that burden.
 
 FYI, libvirt already has support for expressing a number of different
 TSC-related config options, for support of Xen and VMware's capabilities
 in this area. What we currently allow for is
 
   <timer name='tsc' frequency='NNN' mode='auto|native|emulate|smpsafe'/>
 
 In this context the frequency attribute provides the Hz value to
 expose to the guest.
 
   - auto == Emulate if TSC is unstable, else allow native TSC access
   - native == Always allow native TSC access
   - emulate == Always emulate TSC
   - smpsafe == Always emulate TSC, and interlock SMP
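
Read as a decision table, the four modes reduce to two questions: whether RDTSC is emulated, and whether emulated values are interlocked across vcpus. A minimal sketch of that reading, with hypothetical names (this is not libvirt's or KVM's API, and the interlocking itself is only reported as a flag):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the mode='...' values of libvirt's tsc timer. */
enum tsc_mode { TSC_AUTO, TSC_NATIVE, TSC_EMULATE, TSC_SMPSAFE };

struct tsc_policy {
	bool emulate;       /* trap and emulate RDTSC instead of native access */
	bool smp_interlock; /* the "interlock SMP" part of smpsafe */
};

static struct tsc_policy tsc_policy_for(enum tsc_mode mode, bool host_tsc_stable)
{
	struct tsc_policy p = { false, false };

	switch (mode) {
	case TSC_AUTO:
		p.emulate = !host_tsc_stable; /* emulate only if the TSC is unstable */
		break;
	case TSC_NATIVE:
		p.emulate = false;
		break;
	case TSC_EMULATE:
		p.emulate = true;
		break;
	case TSC_SMPSAFE:
		p.emulate = true;
		p.smp_interlock = true;
		break;
	default:
		assert(0 && "unknown tsc mode");
	}
	return p;
}
```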

These options can be mapped into KVM if necessary (they can map to
tsc_khz=XXX or to the module options (unfortunately not per-guest ATM)).

  Therefore it appears that this tsc_khz=auto option can be specified
  only if the user specifies so (it can be a per-guest flag hidden
  in the management configuration/manual).
  
  Sending this email to gather suggestions (or objections)
  to this interface.
 
 
 Daniel
 -- 
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org  -o- http://virt-manager.org :|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Karen had the suggestion to remove the burden of choice from the user,
which we can achieve by knowing whether or not the guest is using
a paravirtual clock.

The problem is that it opens a can of races: did migration happen before or
after the guest boot process enabled the paravirtual clock, etc.?

I suppose leaving the option to the user is fine: if you run an obscure
operating system which does not support a paravirtual clock, then it
must be dealt with specially (it's in the manual, no big deal).
