date:20111025

Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-25 Thread Jan Kiszka

On 2011-10-24 19:23, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
 On 2011-10-24 18:05, Michael S. Tsirkin wrote:
 This is what I have in mind:
  - devices set PBA bit if MSI message cannot be sent due to mask (*)
  - core checksclears PBA bit on unmask, injects message if bit was set
  - devices clear PBA bit if message reason is resolved before unmask (*)

 OK, but practically, when exactly does the device clear PBA?

 Consider a network adapter that signals messages in a RX ring: If the
 corresponding vector is masked while the guest empties the ring, I
 strongly assume that the device is supposed to take back the pending bit
 in that case so that there is no interrupt inject on a later vector
 unmask operation.

 Jan

 Do you mean virtio here?

Maybe, but I'm also thinking of fully emulated devices.

 Do you expect this optimization to give
 a significant performance gain?

Hard to asses in general. But I have a silly guest here that obviously
masks MSI vectors for each event. This currently not only kicks us into
a heavy-weight exit, it also enforces serialization on qemu_global_mutex
(while we have the rest already isolated).

 
 It would also be challenging to implement this in
 a race free manner. Clearing on interrupt status read
 seems straight-forward.

With an in-kernel MSI-X MMIO handler, this race will be naturally
unavoidable as there is no more global lock shared between table/PBA
accesses and the device model. But, when using atomic bit ops, I don't
think that will cause headache.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-25 Thread Avi Kivity

On 10/24/2011 02:06 PM, Jan Kiszka wrote:
 On 2011-10-24 13:09, Avi Kivity wrote:
  On 10/24/2011 12:19 PM, Jan Kiszka wrote:
 
  With the new feature it may be worthwhile, but I'd like to see the whole
  thing, with numbers attached.
 
  It's not a performance issue, it's a resource limitation issue: With the
  new API we can stop worrying about user space device models consuming
  limited IRQ routes of the KVM subsystem.
 
  
  Only if those devices are in the same process (or have access to the
  vmfd).  Interrupt routing together with irqfd allows you to disaggregate
  the device model.  Instead of providing a competing implementation with
  new limitations, we need to remove the limitations of the old
  implementation.

 That depends on where we do the cut. Currently we let the IRQ source
 signal an abstract edge on a pre-allocated pseudo IRQ line. But we
 cannot build correct MSI-X on top of the current irqfd model as we lack
 the level information (for PBA emulation). *) So we either need to
 extend the existing model anyway -- or push per-vector masking back to
 the IRQ source. In the latter case, it would be a very good chance to
 give up on limited pseudo GSIs with static routes and do MSI messaging
 from external IRQ sources to KVM directly.

Good point.


 But all those considerations affect different APIs than what I'm
 proposing here. We will always need a way to inject MSIs in the context
 of the VM as there will always be scenarios where devices are better run
 in that very same context, for performance or simplicity or whatever
 reasons. E.g., I could imagine that one would like to execute an
 emulated IRQ remapper rather in the hypervisor context than
 over-microkernelized in a separate process.

Right.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[BUG] qemu-kvm: memory_region_add_subregion_common: Assertion `!subregion-parent' failed.

2011-10-25 Thread Lai Jiangshan

This bug is triggered for my Windows XP guest, but not for my linux guests.

The gdb result shows that a vga.vram memoryregion is added twice.

libvirt log ---

2011-10-25 16:18:58.117: starting up
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none 
/home/laijs/bin/qemu.sh -S -M pc-0.13 -enable-kvm -m 256 -smp 
1,sockets=1,cores=1,threads=1 -name Windows -uuid 
3b0f4f29-1ca1-c199-a080-3ccac8f745a9 -nodefconfig -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/Windows.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime 
-no-shutdown -drive 
file=/home/laijs/Windows/windows.img,if=none,id=drive-ide0-0-0,format=qcow2,cache=none
 -device 
ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive 
file=/home/laijs/OFFICE2003_STD_EN.ISO,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw
 -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev 
tap,fd=17,id=hostnet0 -device 
pcnet,netdev=hostnet0,id=net0,mac=52:54:00:a5:d2:ea,bus=pci.0,multifunction=on,addr=0x3.0x0
 -usb -vnc 127.0.0.1:0 -vga std -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0
x5.0x0
Domain id=45 is tainted: high-privileges
qemu-system-x86_64: /home/laijs/work/qemu-kvm/memory.c:1083: 
memory_region_add_subregion_common: Assertion `!subregion-parent' failed.
2011-10-25 16:18:58.380: shutting down



-git bisect result





b195043003d90ea4027ea01cc7a6c974ac915108 is the first bad commit
commit b195043003d90ea4027ea01cc7a6c974ac915108
Author: Avi Kivity a...@redhat.com
Date:   Mon Aug 8 16:08:57 2011 +0300

vga: convert vga and its derivatives to the memory API

Convert all vga memory to the memory API.  Note we need to fall back to
get_system_memory(), since the various buses don't pass the vga window
as a memory region.

We no longer need to sync the dirty bitmap of the cirrus mapped memory
banks, since the memory API takes care of that for us.

[jan: fix vga-pci logging]

Reviewed-by: Richard Henderson r...@twiddle.net
Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Anthony Liguori aligu...@us.ibm.com

:04 04 e8faf8e539a4ec814ef212ce97040046363a67f3 
917701ddf6a42f49204a5b2adfa9654c03c0d6f6 M  hw

---gdb result(b19504300)


(gdb) bt
#0  0x0042995a in memory_region_add_subregion_common (mr=0x2a799a0, 
offset=4026531840, 
subregion=0x2d17250) at /home/laijs/work/qemu-kvm/memory.c:1083
#1  0x00590408 in pci_update_mappings (d=0x2d16f40)
at /home/laijs/work/qemu-kvm/hw/pci.c:1123
#2  0x00590846 in pci_default_write_config (d=0x2d16f40, addr=4, 
val=value optimized out, l=value optimized out) at 
/home/laijs/work/qemu-kvm/hw/pci.c:1173
#3  0x00427616 in kvm_handle_io (env=0x2bdf500) at 
/home/laijs/work/qemu-kvm/kvm-all.c:837
#4  kvm_cpu_exec (env=0x2bdf500) at /home/laijs/work/qemu-kvm/kvm-all.c:976
#5  0x0040c308 in cpu_exec_all () at 
/home/laijs/work/qemu-kvm/cpus.c:1102
#6  0x00587281 in main_loop (argc=value optimized out, argv=value 
optimized out, 
envp=value optimized out) at /home/laijs/work/qemu-kvm/vl.c:1392
#7  main (argc=value optimized out, argv=value optimized out, envp=value 
optimized out)
at /home/laijs/work/qemu-kvm/vl.c:3356
(gdb) p mr-name
$1 = 0x2a79a70 system
(gdb) p subregion-name
$2 = 0x2c02960 vga.vram
(gdb) p *subregion
$3 = {ops = 0x0, opaque = 0x0, parent = 0x2a799a0, size = 8388608, addr = 
3758096384, offset = 0, 
  backend_registered = true, ram_addr = 268697600, iorange = {ops = 0x0, base = 
0, len = 0}, 
  terminates = true, alias = 0x0, alias_offset = 0, priority = 1, may_overlap = 
true, subregions = 
{tqh_first = 0x0, tqh_last = 0x2d172c8}, subregions_link = {tqe_next = 
0x2b7d610, tqe_prev = 
0x2b7d788}, coalesced = {tqh_first = 0x0, tqh_last = 0x2d172e8}, name = 
0x2c02960 vga.vram, 
  dirty_log_mask = 1 '\001', ioeventfd_nb = 0, ioeventfds = 0x0}
(gdb) p subregion-parent
$4 = (MemoryRegion *) 0x2a799a0
(gdb) p *subregion-parent
$5 = {ops = 0x0, opaque = 0x0, parent = 0x0, size = 9223372036854775807, addr = 
0, offset = 0, 
  backend_registered = false, ram_addr = 0, iorange = {ops = 0x0, base = 0, len 
= 0}, terminates = 
false, alias = 0x0, alias_offset = 0, priority = 0, may_overlap = false, 
subregions = {
tqh_first = 0x2acc120, tqh_last = 0x2b4d168}, subregions_link = {tqe_next = 
0x0, tqe_prev = 
0x0}, coalesced = {tqh_first = 0x0, tqh_last = 0x2a79a38}, name = 0x2a79a70 
system, 
  dirty_log_mask = 0 '\000', ioeventfd_nb = 0, ioeventfds = 0x0}

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/1 V6 resent ] qemu-kvm: fix improper nmi emulation

2011-10-25 Thread Lai Jiangshan

Previous discussions:
 
  Which approach you prefer to?
  I need to know the result before wasting too much time to respin
  the approach.
  
  Yes, sorry about the slow and sometimes conflicting feedback.
  
  1) Fix KVM_NMI emulation approach  (which is v3 patchset)
- It directly fixes the problem and matches the
  real hard ware more, but it changes KVM_NMI bahavior.
- Require both kernel-site and userspace-site fix.
 
  2) Get the LAPIC state from kernel irqchip, and inject NMI if it is 
  allowed
 (which is v4 patchset)
- Simple, don't changes any kernel behavior.
- Only need the userspace-site fix
 
  3) Add KVM_SET_LINT1 approach (which is v5 patchset)
- don't changes the kernel's KVM_NMI behavior.
- much complex
- Require both kernel-site and userspace-site fix.
- userspace-site should also handle the !KVM_SET_LINT1
  condition, it uses all the 2) approach' code. it means
  this approach equals the 2) approach + KVM_SET_LINT1 ioctl.
 
  This is an urgent bug of us, we need to settle it down soo
  
  While (1) is simple, it overloads a single ioctl with two meanings,
  that's not so good.
  
  Whether we do (1) or (3), we need (2) as well, for older kernels.
  
  So I recommend first focusing on (2) and merging it, then doing (3).
  
  (note an additional issue with 3 is whether to make it a vm or vcpu
  ioctl - we've been assuming vcpu ioctl but it's not necessarily the best
  choice).
  
It is the 2) approach.
It only changes the user space site, the kernel site is not touched.
It is changed from previous v4 patch, fixed problems found by Jan.
end previous discussions


From: Lai Jiangshan la...@cn.fujitsu.com


Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
button event happens. This doesn't properly emulate real hardware on
which NMI button event triggers LINT1. Because of this, NMI is sent to
the processor even when LINT1 is maskied in LVT. For example, this
causes the problem that kdump initiated by NMI sometimes doesn't work
on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.

With this patch, inject-nmi request is handled as follows.

- When in-kernel irqchip is disabled, deliver LINT1 instead of NMI
  interrupt.
- When in-kernel irqchip is enabled, get the in-kernel LAPIC states
  and test the APIC_LVT_MASKED, if LINT1 is unmasked, and then
  delivering the NMI directly. (Suggested by Jan Kiszka)

Changed from old version:
  re-implement it by the Jan's suggestion.
  fix the race found by Jan.

Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
Reported-by: Kenji Kaneshige kaneshige.ke...@jp.fujitsu.com
Acked-by: Avi Kivity a...@redhat.com
Acked-by: Jan Kiszka jan.kis...@web.de
---
 hw/apic.c |   33 +
 hw/apic.h |1 +
 monitor.c |6 +-
 3 files changed, 39 insertions(+), 1 deletions(-)
diff --git a/hw/apic.c b/hw/apic.c
index 69d6ac5..922796a 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -205,6 +205,39 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
 }
 }
 
+static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
+
+static void kvm_irqchip_deliver_nmi(void *p)
+{
+APICState *s = p;
+struct kvm_lapic_state klapic;
+uint32_t lvt;
+
+kvm_get_lapic(s-cpu_env, klapic);
+lvt = kapic_reg(klapic, 0x32 + APIC_LVT_LINT1);
+
+if (lvt  APIC_LVT_MASKED) {
+return;
+}
+
+if (((lvt  8)  7) != APIC_DM_NMI) {
+return;
+}
+
+kvm_vcpu_ioctl(s-cpu_env, KVM_NMI);
+}
+
+void apic_deliver_nmi(DeviceState *d)
+{
+APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
+
+if (kvm_irqchip_in_kernel()) {
+run_on_cpu(s-cpu_env, kvm_irqchip_deliver_nmi, s);
+} else {
+apic_local_deliver(s, APIC_LVT_LINT1);
+}
+}
+
 #define foreach_apic(apic, deliver_bitmask, code) \
 {\
 int __i, __j, __mask;\
diff --git a/hw/apic.h b/hw/apic.h
index c857d52..3a4be0a 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -10,6 +10,7 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
  uint8_t trigger_mode);
 int apic_accept_pic_intr(DeviceState *s);
 void apic_deliver_pic_intr(DeviceState *s, int level);
+void apic_deliver_nmi(DeviceState *d);
 int apic_get_interrupt(DeviceState *s);
 void apic_reset_irq_delivered(void);
 int apic_get_irq_delivered(void);
diff --git a/monitor.c b/monitor.c
index cb485bf..0b81f17 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2616,7 +2616,11 @@ static int do_inject_nmi(Monitor *mon, const QDict 
*qdict, QObject **ret_data)
 CPUState *env;
 
 for (env = first_cpu; env != NULL; env = env-next_cpu) {
-cpu_interrupt(env, CPU_INTERRUPT_NMI);
+if (!env-apic_state) {
+cpu_interrupt(env, CPU_INTERRUPT_NMI);
+} else {
+apic_deliver_nmi(env-apic_state);
+}
 }
 
 return 0;
--
To unsubscribe from this list: send the line unsubscribe kvm in
the

Re: [BUG] qemu-kvm: memory_region_add_subregion_common: Assertion `!subregion-parent' failed.

2011-10-25 Thread Jan Kiszka

On 2011-10-25 11:42, Lai Jiangshan wrote:
 This bug is triggered for my Windows XP guest, but not for my linux guests.
 
 The gdb result shows that a vga.vram memoryregion is added twice.
 
 libvirt log ---
 
 2011-10-25 16:18:58.117: starting up
 LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none 
 /home/laijs/bin/qemu.sh -S -M pc-0.13 -enable-kvm -m 256 -smp 
 1,sockets=1,cores=1,threads=1 -name Windows -uuid 
 3b0f4f29-1ca1-c199-a080-3ccac8f745a9 -nodefconfig -nodefaults -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/Windows.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime 
 -no-shutdown -drive 
 file=/home/laijs/Windows/windows.img,if=none,id=drive-ide0-0-0,format=qcow2,cache=none
  -device 
 ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 
 -drive 
 file=/home/laijs/OFFICE2003_STD_EN.ISO,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw
  -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev 
 tap,fd=17,id=hostnet0 -device pcnet
  
 ,netdev=hostnet0,id=net0,mac=52:54:00:a5:d2:ea,bus=pci.0,multifunction=on,addr=0x3.0x0
  -usb -vnc 127.0.0.1:0 -vga std -device 
 virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0
 x5.0x0
 Domain id=45 is tainted: high-privileges
 qemu-system-x86_64: /home/laijs/work/qemu-kvm/memory.c:1083: 
 memory_region_add_subregion_common: Assertion `!subregion-parent' failed.
 2011-10-25 16:18:58.380: shutting down
 
 
 
 -git bisect result
 
 
 
 
 
 b195043003d90ea4027ea01cc7a6c974ac915108 is the first bad commit
 commit b195043003d90ea4027ea01cc7a6c974ac915108
 Author: Avi Kivity a...@redhat.com
 Date:   Mon Aug 8 16:08:57 2011 +0300
 
 vga: convert vga and its derivatives to the memory API
 
 Convert all vga memory to the memory API.  Note we need to fall back to
 get_system_memory(), since the various buses don't pass the vga window
 as a memory region.
 
 We no longer need to sync the dirty bitmap of the cirrus mapped memory
 banks, since the memory API takes care of that for us.
 
 [jan: fix vga-pci logging]
 
 Reviewed-by: Richard Henderson r...@twiddle.net
 Signed-off-by: Avi Kivity a...@redhat.com
 Signed-off-by: Anthony Liguori aligu...@us.ibm.com
 
 :04 04 e8faf8e539a4ec814ef212ce97040046363a67f3 
 917701ddf6a42f49204a5b2adfa9654c03c0d6f6 Mhw
 
 ---gdb result(b19504300)
 
 
 (gdb) bt
 #0  0x0042995a in memory_region_add_subregion_common (mr=0x2a799a0, 
 offset=4026531840, 
 subregion=0x2d17250) at /home/laijs/work/qemu-kvm/memory.c:1083
 #1  0x00590408 in pci_update_mappings (d=0x2d16f40)
 at /home/laijs/work/qemu-kvm/hw/pci.c:1123
 #2  0x00590846 in pci_default_write_config (d=0x2d16f40, addr=4, 
 val=value optimized out, l=value optimized out) at 
 /home/laijs/work/qemu-kvm/hw/pci.c:1173
 #3  0x00427616 in kvm_handle_io (env=0x2bdf500) at 
 /home/laijs/work/qemu-kvm/kvm-all.c:837
 #4  kvm_cpu_exec (env=0x2bdf500) at /home/laijs/work/qemu-kvm/kvm-all.c:976
 #5  0x0040c308 in cpu_exec_all () at 
 /home/laijs/work/qemu-kvm/cpus.c:1102
 #6  0x00587281 in main_loop (argc=value optimized out, argv=value 
 optimized out, 
 envp=value optimized out) at /home/laijs/work/qemu-kvm/vl.c:1392
 #7  main (argc=value optimized out, argv=value optimized out, envp=value 
 optimized out)
 at /home/laijs/work/qemu-kvm/vl.c:3356
 (gdb) p mr-name
 $1 = 0x2a79a70 system
 (gdb) p subregion-name
 $2 = 0x2c02960 vga.vram
 (gdb) p *subregion
 $3 = {ops = 0x0, opaque = 0x0, parent = 0x2a799a0, size = 8388608, addr = 
 3758096384, offset = 0, 
   backend_registered = true, ram_addr = 268697600, iorange = {ops = 0x0, base 
 = 0, len = 0}, 
   terminates = true, alias = 0x0, alias_offset = 0, priority = 1, may_overlap 
 = true, subregions = 
 {tqh_first = 0x0, tqh_last = 0x2d172c8}, subregions_link = {tqe_next = 
 0x2b7d610, tqe_prev = 
 0x2b7d788}, coalesced = {tqh_first = 0x0, tqh_last = 0x2d172e8}, name = 
 0x2c02960 vga.vram, 
   dirty_log_mask = 1 '\001', ioeventfd_nb = 0, ioeventfds = 0x0}
 (gdb) p subregion-parent
 $4 = (MemoryRegion *) 0x2a799a0
 (gdb) p *subregion-parent
 $5 = {ops = 0x0, opaque = 0x0, parent = 0x0, size = 9223372036854775807, addr 
 = 0, offset = 0, 
   backend_registered = false, ram_addr = 0, iorange = {ops = 0x0, base = 0, 
 len = 0}, terminates = 
 false, alias = 0x0, alias_offset = 0, priority = 0, may_overlap = false, 
 subregions = {
 tqh_first = 0x2acc120, tqh_last = 0x2b4d168}, subregions_link = {tqe_next 
 = 0x0, tqe_prev = 
 0x0}, coalesced = {tqh_first = 0x0, tqh_last = 0x2a79a38}, name = 
 0x2a79a70 system, 
   dirty_log_mask = 0 '\000', ioeventfd_nb = 0, ioeventfds = 0x0}
 

That's likely vga_init_vbe vs. pci_register_bar with vga.vram. Either
the PCI bar or the VBE mapping should be wrapped by an alias region

[PATCH] kvm tools: Add method to stop ipc thread

2011-10-25 Thread Sasha Levin

Stop the ipc thread when shutting down the hypervisor.

This solves a bug where the .sock files weren't removed upon shutdown.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/include/kvm/kvm-ipc.h |1 +
 tools/kvm/kvm-ipc.c |   33 +
 tools/kvm/kvm.c |1 +
 3 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/tools/kvm/include/kvm/kvm-ipc.h b/tools/kvm/include/kvm/kvm-ipc.h
index c932052..731767f 100644
--- a/tools/kvm/include/kvm/kvm-ipc.h
+++ b/tools/kvm/include/kvm/kvm-ipc.h
@@ -22,5 +22,6 @@ enum {
 int kvm_ipc__register_handler(u32 type, void (*cb)(int fd, u32 type, u32 len, 
u8 *msg));
 int kvm_ipc__handle(int fd, struct kvm_ipc_msg *msg);
 int kvm_ipc__start(int sock);
+int kvm_ipc__stop(void);
 
 #endif
diff --git a/tools/kvm/kvm-ipc.c b/tools/kvm/kvm-ipc.c
index f05e926..65209e1 100644
--- a/tools/kvm/kvm-ipc.c
+++ b/tools/kvm/kvm-ipc.c
@@ -7,12 +7,14 @@
 #include sys/un.h
 #include sys/types.h
 #include sys/socket.h
+#include sys/eventfd.h
 
 #define KVM_IPC_MAX_MSGS 16
 
 static void (*msgs[KVM_IPC_MAX_MSGS])(int fd, u32 type, u32 len, u8 *msg);
 static DECLARE_RWSEM(msgs_rwlock);
-static int epoll_fd, server_fd;
+static int epoll_fd, server_fd, stop_fd;
+static pthread_t thread;
 
 int kvm_ipc__register_handler(u32 type, void (*cb)(int fd, u32 type, u32 len, 
u8 *msg))
 {
@@ -109,8 +111,11 @@ static void *kvm_ipc__thread(void *param)
nfds = epoll_wait(epoll_fd, event, 1, -1);
if (nfds  0) {
int fd = event.data.fd;
-
-   if (fd == server_fd) {
+   printf(bleh\n);
+   if (fd == stop_fd) {
+   printf(Got stop signal\n);
+   break;
+   } else if (fd == server_fd) {
int client;
 
client = kvm_ipc__new_conn(fd);
@@ -128,7 +133,6 @@ static void *kvm_ipc__thread(void *param)
 
 int kvm_ipc__start(int sock)
 {
-   pthread_t thread;
struct epoll_event ev;
 
server_fd = sock;
@@ -140,8 +144,29 @@ int kvm_ipc__start(int sock)
if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, sock, ev)  0)
die(Failed starting IPC thread);
 
+   stop_fd = eventfd(0, 0);
+   ev.events = EPOLLIN | EPOLLOUT | EPOLLRDHUP | EPOLLET;
+   ev.data.fd = stop_fd;
+   if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, stop_fd, ev)  0)
+   die(Failed adding stop event to epoll);
+
if (pthread_create(thread, NULL, kvm_ipc__thread, NULL) != 0)
die(Failed starting IPC thread);
 
return 0;
 }
+
+int kvm_ipc__stop(void)
+{
+   u64 val = 1;
+   int ret;
+
+   ret = write(stop_fd, val, sizeof(val));
+   if (ret  0)
+   return ret;
+
+   close(server_fd);
+   close(epoll_fd);
+
+   return ret;
+}
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 40ae6a5..8d6b5e1 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -237,6 +237,7 @@ void kvm__delete(struct kvm *kvm)
kvm__stop_timer(kvm);
 
munmap(kvm-ram_start, kvm-ram_size);
+   kvm_ipc__stop();
kvm__remove_socket(kvm-name);
free(kvm);
 }
-- 
1.7.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm tools: Add method to stop ipc thread

2011-10-25 Thread Pekka Enberg

On Tue, Oct 25, 2011 at 1:40 PM, Sasha Levin levinsasha...@gmail.com wrote:
 Stop the ipc thread when shutting down the hypervisor.

 This solves a bug where the .sock files weren't removed upon shutdown.

 Signed-off-by: Sasha Levin levinsasha...@gmail.com
 ---
  tools/kvm/include/kvm/kvm-ipc.h |    1 +
  tools/kvm/kvm-ipc.c             |   33 +
  tools/kvm/kvm.c                 |    1 +
  3 files changed, 31 insertions(+), 4 deletions(-)

 diff --git a/tools/kvm/include/kvm/kvm-ipc.h b/tools/kvm/include/kvm/kvm-ipc.h
 index c932052..731767f 100644
 --- a/tools/kvm/include/kvm/kvm-ipc.h
 +++ b/tools/kvm/include/kvm/kvm-ipc.h
 @@ -22,5 +22,6 @@ enum {
  int kvm_ipc__register_handler(u32 type, void (*cb)(int fd, u32 type, u32 
 len, u8 *msg));
  int kvm_ipc__handle(int fd, struct kvm_ipc_msg *msg);
  int kvm_ipc__start(int sock);
 +int kvm_ipc__stop(void);

  #endif
 diff --git a/tools/kvm/kvm-ipc.c b/tools/kvm/kvm-ipc.c
 index f05e926..65209e1 100644
 --- a/tools/kvm/kvm-ipc.c
 +++ b/tools/kvm/kvm-ipc.c
 @@ -7,12 +7,14 @@
  #include sys/un.h
  #include sys/types.h
  #include sys/socket.h
 +#include sys/eventfd.h

  #define KVM_IPC_MAX_MSGS 16

  static void (*msgs[KVM_IPC_MAX_MSGS])(int fd, u32 type, u32 len, u8 *msg);
  static DECLARE_RWSEM(msgs_rwlock);
 -static int epoll_fd, server_fd;
 +static int epoll_fd, server_fd, stop_fd;
 +static pthread_t thread;

  int kvm_ipc__register_handler(u32 type, void (*cb)(int fd, u32 type, u32 
 len, u8 *msg))
  {
 @@ -109,8 +111,11 @@ static void *kvm_ipc__thread(void *param)
                nfds = epoll_wait(epoll_fd, event, 1, -1);
                if (nfds  0) {
                        int fd = event.data.fd;
 -
 -                       if (fd == server_fd) {
 +                       printf(bleh\n);
 +                       if (fd == stop_fd) {
 +                               printf(Got stop signal\n);
 +                               break;

LOL. Can I have a version of the patch without those printf calls? :-)

 +                       } else if (fd == server_fd) {
                                int client;

                                client = kvm_ipc__new_conn(fd);
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[ANNOUNCE] kvm-kmod-3.1

2011-10-25 Thread Jan Kiszka

This is the kvm-kmod compat wrapper release based on KVM modules of
Linux 3.1 (see [1] for background information). The package is available
for download from

http://sourceforge.net/projects/kvm/files/kvm-kmod/3.1/kvm-kmod-3.1.tar.bz2/download

KVM changes since kvm-kmod-3.0b:
 - initial version of VMX nesting
 - steal time support (inform guest about the time it was scheduled out)
 - faster MMIO
 - additional CPU features exposed to the guest, most importantly SMEP

kvm-kmod changes:
 - fix for cross-building kvm-kmod
 - Drop unused PROCESSOR variable from config.mak


As kernel.org is still out of reach for me, I'm hosting the kvm-kmod git
tree now on my server:

git://git.kiszka.org/kvm-kmod.git
http://git.kiszka.org/kvm-kmod.git

This will likely remain the primary repository because it's also not
suffering from the excessively long mirroring latencies kernel.org had
to cope with. Those latencies made buildbot runs that require public git
access really unhandy.

Jan

[1] http://www.linux-kvm.org/page/Getting_the_kvm_kernel_modules
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm tools: Add method to stop ipc thread

2011-10-25 Thread Osier Yang

于 2011年10月25日 18:40, Sasha Levin 写道:
 Stop the ipc thread when shutting down the hypervisor.

 This solves a bug where the .sock files weren't removed upon shutdown.

I tested the patch, the socket is still there. (I'm testing with
--sdl, and
shutdown the vm window). And the guest starting succeeded even if
a socket with the name same as the specified guest name exists, which
means I can start two guests with same socket, which is not right.


 Signed-off-by: Sasha Levin levinsasha...@gmail.com
 ---
  tools/kvm/include/kvm/kvm-ipc.h |1 +
  tools/kvm/kvm-ipc.c |   33 +
  tools/kvm/kvm.c |1 +
  3 files changed, 31 insertions(+), 4 deletions(-)

 diff --git a/tools/kvm/include/kvm/kvm-ipc.h b/tools/kvm/include/kvm/kvm-ipc.h
 index c932052..731767f 100644
 --- a/tools/kvm/include/kvm/kvm-ipc.h
 +++ b/tools/kvm/include/kvm/kvm-ipc.h
 @@ -22,5 +22,6 @@ enum {
  int kvm_ipc__register_handler(u32 type, void (*cb)(int fd, u32 type, u32 
 len, u8 *msg));
  int kvm_ipc__handle(int fd, struct kvm_ipc_msg *msg);
  int kvm_ipc__start(int sock);
 +int kvm_ipc__stop(void);
  
  #endif
 diff --git a/tools/kvm/kvm-ipc.c b/tools/kvm/kvm-ipc.c
 index f05e926..65209e1 100644
 --- a/tools/kvm/kvm-ipc.c
 +++ b/tools/kvm/kvm-ipc.c
 @@ -7,12 +7,14 @@
  #include sys/un.h
  #include sys/types.h
  #include sys/socket.h
 +#include sys/eventfd.h
  
  #define KVM_IPC_MAX_MSGS 16
  
  static void (*msgs[KVM_IPC_MAX_MSGS])(int fd, u32 type, u32 len, u8 *msg);
  static DECLARE_RWSEM(msgs_rwlock);
 -static int epoll_fd, server_fd;
 +static int epoll_fd, server_fd, stop_fd;
 +static pthread_t thread;
  
  int kvm_ipc__register_handler(u32 type, void (*cb)(int fd, u32 type, u32 
 len, u8 *msg))
  {
 @@ -109,8 +111,11 @@ static void *kvm_ipc__thread(void *param)
   nfds = epoll_wait(epoll_fd, event, 1, -1);
   if (nfds  0) {
   int fd = event.data.fd;
 -
 - if (fd == server_fd) {
 + printf(bleh\n);
 + if (fd == stop_fd) {
 + printf(Got stop signal\n);
 + break;
 + } else if (fd == server_fd) {
   int client;
  
   client = kvm_ipc__new_conn(fd);
 @@ -128,7 +133,6 @@ static void *kvm_ipc__thread(void *param)
  
  int kvm_ipc__start(int sock)
  {
 - pthread_t thread;
   struct epoll_event ev;
  
   server_fd = sock;
 @@ -140,8 +144,29 @@ int kvm_ipc__start(int sock)
   if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, sock, ev)  0)
   die(Failed starting IPC thread);
  
 + stop_fd = eventfd(0, 0);
 + ev.events = EPOLLIN | EPOLLOUT | EPOLLRDHUP | EPOLLET;
 + ev.data.fd = stop_fd;
 + if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, stop_fd, ev)  0)
 + die(Failed adding stop event to epoll);
 +
   if (pthread_create(thread, NULL, kvm_ipc__thread, NULL) != 0)
   die(Failed starting IPC thread);
  
   return 0;
  }
 +
 +int kvm_ipc__stop(void)
 +{
 + u64 val = 1;
 + int ret;
 +
 + ret = write(stop_fd, val, sizeof(val));
 + if (ret  0)
 + return ret;
 +
 + close(server_fd);
 + close(epoll_fd);
 +
 + return ret;
 +}
 diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
 index 40ae6a5..8d6b5e1 100644
 --- a/tools/kvm/kvm.c
 +++ b/tools/kvm/kvm.c
 @@ -237,6 +237,7 @@ void kvm__delete(struct kvm *kvm)
   kvm__stop_timer(kvm);
  
   munmap(kvm-ram_start, kvm-ram_size);
 + kvm_ipc__stop();
   kvm__remove_socket(kvm-name);
   free(kvm);
  }

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-25 Thread Michael S. Tsirkin

On Tue, Oct 25, 2011 at 09:24:17AM +0200, Jan Kiszka wrote:
 On 2011-10-24 19:23, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
  On 2011-10-24 18:05, Michael S. Tsirkin wrote:
  This is what I have in mind:
   - devices set PBA bit if MSI message cannot be sent due to mask (*)
   - core checksclears PBA bit on unmask, injects message if bit was set
   - devices clear PBA bit if message reason is resolved before unmask (*)
 
  OK, but practically, when exactly does the device clear PBA?
 
  Consider a network adapter that signals messages in a RX ring: If the
  corresponding vector is masked while the guest empties the ring, I
  strongly assume that the device is supposed to take back the pending bit
  in that case so that there is no interrupt inject on a later vector
  unmask operation.
 
  Jan
 
  Do you mean virtio here?
 
 Maybe, but I'm also thinking of fully emulated devices.

One thing seems certain: actual, assigned devices don't
have this fake msi-x level so they don't notify host
when that changes.

  Do you expect this optimization to give
  a significant performance gain?
 
 Hard to asses in general. But I have a silly guest here that obviously
 masks MSI vectors for each event. This currently not only kicks us into
 a heavy-weight exit, it also enforces serialization on qemu_global_mutex
 (while we have the rest already isolated).

It easy to see how MSIX mask support in kernel would help.
Not sure whether it's worth it to also add special APIs to
reduce the number of spurious interrupts for such silly guests.

  
  It would also be challenging to implement this in
  a race free manner. Clearing on interrupt status read
  seems straight-forward.
 
 With an in-kernel MSI-X MMIO handler, this race will be naturally
 unavoidable as there is no more global lock shared between table/PBA
 accesses and the device model. But, when using atomic bit ops, I don't
 think that will cause headache.
 
 Jan

This is not the race I meant.  The challenge is for the device to
determine that it can clear the PBA.  atomic accesses on PBA won't help
here I think.


 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm tools: Add method to stop ipc thread

2011-10-25 Thread Sasha Levin

On Tue, 2011-10-25 at 13:42 +0300, Pekka Enberg wrote:
 On Tue, Oct 25, 2011 at 1:40 PM, Sasha Levin levinsasha...@gmail.com wrote:
  Stop the ipc thread when shutting down the hypervisor.
 
  This solves a bug where the .sock files weren't removed upon shutdown.
 
  Signed-off-by: Sasha Levin levinsasha...@gmail.com
  ---
   tools/kvm/include/kvm/kvm-ipc.h |1 +
   tools/kvm/kvm-ipc.c |   33 +
   tools/kvm/kvm.c |1 +
   3 files changed, 31 insertions(+), 4 deletions(-)
 
  diff --git a/tools/kvm/include/kvm/kvm-ipc.h 
  b/tools/kvm/include/kvm/kvm-ipc.h
  index c932052..731767f 100644
  --- a/tools/kvm/include/kvm/kvm-ipc.h
  +++ b/tools/kvm/include/kvm/kvm-ipc.h
  @@ -22,5 +22,6 @@ enum {
   int kvm_ipc__register_handler(u32 type, void (*cb)(int fd, u32 type, u32 
  len, u8 *msg));
   int kvm_ipc__handle(int fd, struct kvm_ipc_msg *msg);
   int kvm_ipc__start(int sock);
  +int kvm_ipc__stop(void);
 
   #endif
  diff --git a/tools/kvm/kvm-ipc.c b/tools/kvm/kvm-ipc.c
  index f05e926..65209e1 100644
  --- a/tools/kvm/kvm-ipc.c
  +++ b/tools/kvm/kvm-ipc.c
  @@ -7,12 +7,14 @@
   #include sys/un.h
   #include sys/types.h
   #include sys/socket.h
  +#include sys/eventfd.h
 
   #define KVM_IPC_MAX_MSGS 16
 
   static void (*msgs[KVM_IPC_MAX_MSGS])(int fd, u32 type, u32 len, u8 *msg);
   static DECLARE_RWSEM(msgs_rwlock);
  -static int epoll_fd, server_fd;
  +static int epoll_fd, server_fd, stop_fd;
  +static pthread_t thread;
 
   int kvm_ipc__register_handler(u32 type, void (*cb)(int fd, u32 type, u32 
  len, u8 *msg))
   {
  @@ -109,8 +111,11 @@ static void *kvm_ipc__thread(void *param)
 nfds = epoll_wait(epoll_fd, event, 1, -1);
 if (nfds  0) {
 int fd = event.data.fd;
  -
  -   if (fd == server_fd) {
  +   printf(bleh\n);
  +   if (fd == stop_fd) {
  +   printf(Got stop signal\n);
  +   break;
 
 LOL. Can I have a version of the patch without those printf calls? :-)

Woopsie :)

-- 

Sasha.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] kvm tools: Add method to stop ipc thread

2011-10-25 Thread Sasha Levin

Stop the ipc thread when shutting down the hypervisor.

This solves a bug where the .sock files weren't removed upon shutdown.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/include/kvm/kvm-ipc.h |1 +
 tools/kvm/kvm-ipc.c |   30 +++---
 tools/kvm/kvm.c |1 +
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/tools/kvm/include/kvm/kvm-ipc.h b/tools/kvm/include/kvm/kvm-ipc.h
index c932052..731767f 100644
--- a/tools/kvm/include/kvm/kvm-ipc.h
+++ b/tools/kvm/include/kvm/kvm-ipc.h
@@ -22,5 +22,6 @@ enum {
 int kvm_ipc__register_handler(u32 type, void (*cb)(int fd, u32 type, u32 len, 
u8 *msg));
 int kvm_ipc__handle(int fd, struct kvm_ipc_msg *msg);
 int kvm_ipc__start(int sock);
+int kvm_ipc__stop(void);
 
 #endif
diff --git a/tools/kvm/kvm-ipc.c b/tools/kvm/kvm-ipc.c
index f05e926..3131fe8 100644
--- a/tools/kvm/kvm-ipc.c
+++ b/tools/kvm/kvm-ipc.c
@@ -7,12 +7,14 @@
 #include sys/un.h
 #include sys/types.h
 #include sys/socket.h
+#include sys/eventfd.h
 
 #define KVM_IPC_MAX_MSGS 16
 
 static void (*msgs[KVM_IPC_MAX_MSGS])(int fd, u32 type, u32 len, u8 *msg);
 static DECLARE_RWSEM(msgs_rwlock);
-static int epoll_fd, server_fd;
+static int epoll_fd, server_fd, stop_fd;
+static pthread_t thread;
 
 int kvm_ipc__register_handler(u32 type, void (*cb)(int fd, u32 type, u32 len, 
u8 *msg))
 {
@@ -110,7 +112,9 @@ static void *kvm_ipc__thread(void *param)
if (nfds  0) {
int fd = event.data.fd;
 
-   if (fd == server_fd) {
+   if (fd == stop_fd) {
+   break;
+   } else if (fd == server_fd) {
int client;
 
client = kvm_ipc__new_conn(fd);
@@ -128,7 +132,6 @@ static void *kvm_ipc__thread(void *param)
 
 int kvm_ipc__start(int sock)
 {
-   pthread_t thread;
struct epoll_event ev;
 
server_fd = sock;
@@ -140,8 +143,29 @@ int kvm_ipc__start(int sock)
if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, sock, ev)  0)
die(Failed starting IPC thread);
 
+   stop_fd = eventfd(0, 0);
+   ev.events = EPOLLIN | EPOLLOUT | EPOLLRDHUP | EPOLLET;
+   ev.data.fd = stop_fd;
+   if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, stop_fd, ev)  0)
+   die(Failed adding stop event to epoll);
+
if (pthread_create(thread, NULL, kvm_ipc__thread, NULL) != 0)
die(Failed starting IPC thread);
 
return 0;
 }
+
+int kvm_ipc__stop(void)
+{
+   u64 val = 1;
+   int ret;
+
+   ret = write(stop_fd, val, sizeof(val));
+   if (ret  0)
+   return ret;
+
+   close(server_fd);
+   close(epoll_fd);
+
+   return ret;
+}
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 40ae6a5..8d6b5e1 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -237,6 +237,7 @@ void kvm__delete(struct kvm *kvm)
kvm__stop_timer(kvm);
 
munmap(kvm-ram_start, kvm-ram_size);
+   kvm_ipc__stop();
kvm__remove_socket(kvm-name);
free(kvm);
 }
-- 
1.7.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm tools: Add method to stop ipc thread

2011-10-25 Thread Sasha Levin

On Tue, 2011-10-25 at 19:08 +0800, Osier Yang wrote:
 于 2011年10月25日 18:40, Sasha Levin 写道:
  Stop the ipc thread when shutting down the hypervisor.
 
  This solves a bug where the .sock files weren't removed upon shutdown.
 
 I tested the patch, the socket is still there. (I'm testing with
 --sdl, and
 shutdown the vm window). And the guest starting succeeded even if
 a socket with the name same as the specified guest name exists, which
 means I can start two guests with same socket, which is not right.

I've just tested it with --sdl, looks like closing the sdl window uses
the old method of IPC which I forgot to convert.

I'll fix that in a separate patch.

If you test it without --sdl it should work right.

-- 

Sasha.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-25 Thread Jan Kiszka

On 2011-10-25 13:20, Michael S. Tsirkin wrote:
 On Tue, Oct 25, 2011 at 09:24:17AM +0200, Jan Kiszka wrote:
 On 2011-10-24 19:23, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
 On 2011-10-24 18:05, Michael S. Tsirkin wrote:
 This is what I have in mind:
  - devices set PBA bit if MSI message cannot be sent due to mask (*)
  - core checksclears PBA bit on unmask, injects message if bit was set
  - devices clear PBA bit if message reason is resolved before unmask (*)

 OK, but practically, when exactly does the device clear PBA?

 Consider a network adapter that signals messages in a RX ring: If the
 corresponding vector is masked while the guest empties the ring, I
 strongly assume that the device is supposed to take back the pending bit
 in that case so that there is no interrupt inject on a later vector
 unmask operation.

 Jan

 Do you mean virtio here?

 Maybe, but I'm also thinking of fully emulated devices.
 
 One thing seems certain: actual, assigned devices don't
 have this fake msi-x level so they don't notify host
 when that changes.

But they have real PBA. We just need to replicate the emulated vector
mask state into real hw. Doesn't this happen anyway when we disable the
IRQ on the host?

If not, that may require a bit more work, maybe a special masking mode
that can be requested by the managing backend of an assigned device from
the MSI-X in-kernel service.

 
 Do you expect this optimization to give
 a significant performance gain?

 Hard to asses in general. But I have a silly guest here that obviously
 masks MSI vectors for each event. This currently not only kicks us into
 a heavy-weight exit, it also enforces serialization on qemu_global_mutex
 (while we have the rest already isolated).
 
 It easy to see how MSIX mask support in kernel would help.
 Not sure whether it's worth it to also add special APIs to
 reduce the number of spurious interrupts for such silly guests.

I do not get the latter point. What could be simplified (without making
it incorrect) when ignoring excessive mask accesses? Also, if sane
guests do not access the mask that frequently, why was in-kernel MSI-X
MMIO proposed at all?

 

 It would also be challenging to implement this in
 a race free manner. Clearing on interrupt status read
 seems straight-forward.

 With an in-kernel MSI-X MMIO handler, this race will be naturally
 unavoidable as there is no more global lock shared between table/PBA
 accesses and the device model. But, when using atomic bit ops, I don't
 think that will cause headache.

 Jan
 
 This is not the race I meant.  The challenge is for the device to
 determine that it can clear the PBA.  atomic accesses on PBA won't help
 here I think.

The device knows best if the interrupt reason persists. It can
synchronize MSI assertion and PBA bit clearance. If it clears too
late, than this reflects what may happen on real hw as well when host
and device race for changing vector mask vs. device state. It's not
stated that those changes need to be serialized inside the device, is it?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kvm tools: Fix SDL exit on window close

2011-10-25 Thread Sasha Levin

We've changed IPC to use sockets instead of signals, but the process of
closing the SDL window was still trigerring an exit signal causing
an ugly message and not cleaning up after itself.

This patch switches that to use the proper method of cleaning up.

Reported-by: Osier Yang jy...@redhat.com
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/ui/sdl.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/ui/sdl.c b/tools/kvm/ui/sdl.c
index b84cd66..5bf11fa 100644
--- a/tools/kvm/ui/sdl.c
+++ b/tools/kvm/ui/sdl.c
@@ -4,6 +4,7 @@
 #include kvm/i8042.h
 #include kvm/util.h
 #include kvm/kvm.h
+#include kvm/kvm-cpu.h
 
 #include SDL/SDL.h
 #include pthread.h
@@ -254,7 +255,7 @@ static void *sdl__thread(void *p)
SDL_Delay(1000 / FRAME_RATE);
}
 exit:
-   kill(0, SIGKVMSTOP);
+   kvm_cpu__reboot();
 
return NULL;
 }
-- 
1.7.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-25 Thread Michael S. Tsirkin

On Tue, Oct 25, 2011 at 01:41:39PM +0200, Jan Kiszka wrote:
 On 2011-10-25 13:20, Michael S. Tsirkin wrote:
  On Tue, Oct 25, 2011 at 09:24:17AM +0200, Jan Kiszka wrote:
  On 2011-10-24 19:23, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
  On 2011-10-24 18:05, Michael S. Tsirkin wrote:
  This is what I have in mind:
   - devices set PBA bit if MSI message cannot be sent due to mask (*)
   - core checksclears PBA bit on unmask, injects message if bit was 
  set
   - devices clear PBA bit if message reason is resolved before unmask 
  (*)
 
  OK, but practically, when exactly does the device clear PBA?
 
  Consider a network adapter that signals messages in a RX ring: If the
  corresponding vector is masked while the guest empties the ring, I
  strongly assume that the device is supposed to take back the pending bit
  in that case so that there is no interrupt inject on a later vector
  unmask operation.
 
  Jan
 
  Do you mean virtio here?
 
  Maybe, but I'm also thinking of fully emulated devices.
  
  One thing seems certain: actual, assigned devices don't
  have this fake msi-x level so they don't notify host
  when that changes.
 
 But they have real PBA. We just need to replicate the emulated vector
 mask state into real hw. Doesn't this happen anyway when we disable the
 IRQ on the host?

Not immediately I think.

 If not, that may require a bit more work, maybe a special masking mode
 that can be requested by the managing backend of an assigned device from
 the MSI-X in-kernel service.

True. OTOH this might have cost (extra mmio) for the
doubtful benefit of making PBA values exact.

  
  Do you expect this optimization to give
  a significant performance gain?
 
  Hard to asses in general. But I have a silly guest here that obviously
  masks MSI vectors for each event. This currently not only kicks us into
  a heavy-weight exit, it also enforces serialization on qemu_global_mutex
  (while we have the rest already isolated).
  
  It easy to see how MSIX mask support in kernel would help.
  Not sure whether it's worth it to also add special APIs to
  reduce the number of spurious interrupts for such silly guests.
 
 I do not get the latter point. What could be simplified (without making
 it incorrect) when ignoring excessive mask accesses?

Clearing PBA when we detect an empty ring in host is not required,
IMO. It's an optimization.

 Also, if sane
 guests do not access the mask that frequently, why was in-kernel MSI-X
 MMIO proposed at all?

Apparently whether mask accesses happen a lot depends on the workload.

  
 
  It would also be challenging to implement this in
  a race free manner. Clearing on interrupt status read
  seems straight-forward.
 
  With an in-kernel MSI-X MMIO handler, this race will be naturally
  unavoidable as there is no more global lock shared between table/PBA
  accesses and the device model. But, when using atomic bit ops, I don't
  think that will cause headache.
 
  Jan
  
  This is not the race I meant.  The challenge is for the device to
  determine that it can clear the PBA.  atomic accesses on PBA won't help
  here I think.
 
 The device knows best if the interrupt reason persists.

It might not know this unless notified by driver.
E.g. virtio drivers currently don't do interrupt status
reads.

 It can
 synchronize MSI assertion and PBA bit clearance. If it clears too
 late, than this reflects what may happen on real hw as well when host
 and device race for changing vector mask vs. device state. It's not
 stated that those changes need to be serialized inside the device, is it?
 
 Jan

Talking about emulated devices?  It's not sure that real
hardware clears PBA. Considering that no guests I know of use PBA ATM,
I would not be surprised if many devices had broken PBA support.


 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC/PATCH] kvm tools, bios: Don't include glibc headers

2011-10-25 Thread Pekka Enberg

We should not include glibc headers from BIOS code which can't rely on glibc at
runtime.

This patch fixes the following build breakage on 64-bit:

CC   bios/e820.o
  In file included from /usr/include/features.h:387:0,
   from /usr/include/unistd.h:26,
   from include/kvm/util.h:12,
   from bios/e820.c:5:
  /usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or 
directory compilation terminated.
  make: *** [bios/bios.bin.elf] Error 1

when built without the 'glibc-devel.i686' package on Fedora, for example.

Cc: Asias He asias.he...@gmail.com
Cc: Cyrill Gorcunov gorcu...@gmail.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Sasha Levin levinsasha...@gmail.com
Signed-off-by: Pekka Enberg penb...@kernel.org
---
 tools/kvm/README   |2 +-
 tools/kvm/bios/e820.c  |1 -
 tools/kvm/bios/int10.c |3 ---
 3 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/tools/kvm/README b/tools/kvm/README
index 7cbf530..c262995 100644
--- a/tools/kvm/README
+++ b/tools/kvm/README
@@ -102,7 +102,7 @@ For rpm based systems:
 32-bit:
 yum install glibc-devel
 64-bit:
-yum install glibc-devel glibc-devel.i686 glibc-static
+yum install glibc-devel glibc-static
 
 On 64-bit Arch Linux make sure the multilib repository is enabled in your
 /etc/pacman.conf and run
diff --git a/tools/kvm/bios/e820.c b/tools/kvm/bios/e820.c
index 1eafb5b..a9bca29 100644
--- a/tools/kvm/bios/e820.c
+++ b/tools/kvm/bios/e820.c
@@ -2,7 +2,6 @@
 
 #include kvm/segment.h
 #include kvm/bios.h
-#include kvm/util.h
 
 #include asm/processor-flags.h
 #include asm/e820.h
diff --git a/tools/kvm/bios/int10.c b/tools/kvm/bios/int10.c
index 6387ca8..32f7c92 100644
--- a/tools/kvm/bios/int10.c
+++ b/tools/kvm/bios/int10.c
@@ -1,12 +1,9 @@
 #include kvm/segment.h
 #include kvm/bios.h
-#include kvm/util.h
 #include kvm/vesa.h
 
 #include bios/memcpy.h
 
-#include stdint.h
-
 #define VESA_MAGIC ('V' + ('E'  8) + ('S'  16) + ('A'  24))
 
 /* VESA General Information table */
-- 
1.7.6.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] kvm tools: Kill double slash from kvm__get_dir()

2011-10-25 Thread Pekka Enberg

In preparation for showing kvm__get_dir() path to the user, kill the
unnecessary double slash to make the output more readable.

Cc: Asias He asias.he...@gmail.com
Cc: Cyrill Gorcunov gorcu...@gmail.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Sasha Levin levinsasha...@gmail.com
Signed-off-by: Pekka Enberg penb...@kernel.org
---
 tools/kvm/include/kvm/kvm.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/include/kvm/kvm.h b/tools/kvm/include/kvm/kvm.h
index 2b3024a..38253da 100644
--- a/tools/kvm/include/kvm/kvm.h
+++ b/tools/kvm/include/kvm/kvm.h
@@ -22,7 +22,7 @@
 #define SIGKVMSTOP (SIGRTMIN + 4)
 #define SIGKVMRESUME   (SIGRTMIN + 5)
 
-#define KVM_PID_FILE_PATH  /.kvm-tools/
+#define KVM_PID_FILE_PATH  .kvm-tools/
 #define HOME_DIR   getenv(HOME)
 
 struct kvm {
-- 
1.7.6.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] kvm tools: Improve kvm setup info and error messages

2011-10-25 Thread Pekka Enberg

This patch improves the output of 'kvm setup' as follows:

  [penberg@tux kvm]$ ./kvm setup

  kvm setup creates a new rootfs under /home/penberg//.kvm-tools/.
  This can be used later by the '-d' parameter of 'kvm run'.

   usage: kvm setup [name]

  [penberg@tux kvm]$ ./kvm setup test-box
  A new rootfs 'test-box' has been created in 
'/home/penberg//.kvm-tools/test-box'.

  You can now start it by running the following command:

kvm run -d test-box

  [penberg@tux kvm]$ ./kvm setup test-box
  Unable to create rootfs in /home/penberg//.kvm-tools/test-box: File exists

Cc: Asias He asias.he...@gmail.com
Cc: Cyrill Gorcunov gorcu...@gmail.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Sasha Levin levinsasha...@gmail.com
Signed-off-by: Pekka Enberg penb...@kernel.org
---
 tools/kvm/builtin-setup.c |   15 +--
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/tools/kvm/builtin-setup.c b/tools/kvm/builtin-setup.c
index dce1748..7deb0ed 100644
--- a/tools/kvm/builtin-setup.c
+++ b/tools/kvm/builtin-setup.c
@@ -46,8 +46,9 @@ static void parse_setup_options(int argc, const char **argv)
 
 void kvm_setup_help(void)
 {
-   printf(\nkvm setup creates a new rootfs and stores it under 
~/.kvm-tools/ .\n
-   This can be used later by the '-d' parameter of 'kvm run'.\n);
+   printf(\nkvm setup creates a new rootfs under %s.\n
+   This can be used later by the '-d' parameter of 'kvm run'.\n,
+   kvm__get_dir());
usage_with_options(setup_usage, setup_options);
 }
 
@@ -215,11 +216,13 @@ int kvm_cmd_setup(int argc, const char **argv, const char 
*prefix)
 
r = do_setup(instance_name);
if (r == 0)
-   pr_info(Your new rootfs named %s has been created.\n
-   You can now start it by running 'kvm run -d %s'\n,
-   instance_name, instance_name);
+   printf(A new rootfs '%s' has been created in '%s%s'.\n\n
+   You can now start it by running the following 
command:\n\n
+ kvm run -d %s\n,
+   instance_name, kvm__get_dir(), instance_name, 
instance_name);
else
-   perror(Error creating rootfs);
+   printf(Unable to create rootfs in %s%s: %s\n,
+   kvm__get_dir(), instance_name, strerror(errno));
 
return r;
 }
-- 
1.7.6.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-25 Thread Jan Kiszka

On 2011-10-25 14:05, Michael S. Tsirkin wrote:
 On Tue, Oct 25, 2011 at 01:41:39PM +0200, Jan Kiszka wrote:
 On 2011-10-25 13:20, Michael S. Tsirkin wrote:
 On Tue, Oct 25, 2011 at 09:24:17AM +0200, Jan Kiszka wrote:
 On 2011-10-24 19:23, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
 On 2011-10-24 18:05, Michael S. Tsirkin wrote:
 This is what I have in mind:
  - devices set PBA bit if MSI message cannot be sent due to mask (*)
  - core checksclears PBA bit on unmask, injects message if bit was 
 set
  - devices clear PBA bit if message reason is resolved before unmask 
 (*)

 OK, but practically, when exactly does the device clear PBA?

 Consider a network adapter that signals messages in a RX ring: If the
 corresponding vector is masked while the guest empties the ring, I
 strongly assume that the device is supposed to take back the pending bit
 in that case so that there is no interrupt inject on a later vector
 unmask operation.

 Jan

 Do you mean virtio here?

 Maybe, but I'm also thinking of fully emulated devices.

 One thing seems certain: actual, assigned devices don't
 have this fake msi-x level so they don't notify host
 when that changes.

 But they have real PBA. We just need to replicate the emulated vector
 mask state into real hw. Doesn't this happen anyway when we disable the
 IRQ on the host?
 
 Not immediately I think.
 
 If not, that may require a bit more work, maybe a special masking mode
 that can be requested by the managing backend of an assigned device from
 the MSI-X in-kernel service.
 
 True. OTOH this might have cost (extra mmio) for the
 doubtful benefit of making PBA values exact.

I think correctness come before performance unless the latter hurts
significantly.

 

 Do you expect this optimization to give
 a significant performance gain?

 Hard to asses in general. But I have a silly guest here that obviously
 masks MSI vectors for each event. This currently not only kicks us into
 a heavy-weight exit, it also enforces serialization on qemu_global_mutex
 (while we have the rest already isolated).

 It easy to see how MSIX mask support in kernel would help.
 Not sure whether it's worth it to also add special APIs to
 reduce the number of spurious interrupts for such silly guests.

 I do not get the latter point. What could be simplified (without making
 it incorrect) when ignoring excessive mask accesses?
 
 Clearing PBA when we detect an empty ring in host is not required,
 IMO. It's an optimization.

For virtio that might be true - as we are free to define the device
behaviour to our benefit. What emulated real devices do is another thing.

 
 Also, if sane
 guests do not access the mask that frequently, why was in-kernel MSI-X
 MMIO proposed at all?
 
 Apparently whether mask accesses happen a lot depends on the workload.
 


 It would also be challenging to implement this in
 a race free manner. Clearing on interrupt status read
 seems straight-forward.

 With an in-kernel MSI-X MMIO handler, this race will be naturally
 unavoidable as there is no more global lock shared between table/PBA
 accesses and the device model. But, when using atomic bit ops, I don't
 think that will cause headache.

 Jan

 This is not the race I meant.  The challenge is for the device to
 determine that it can clear the PBA.  atomic accesses on PBA won't help
 here I think.

 The device knows best if the interrupt reason persists.
 
 It might not know this unless notified by driver.
 E.g. virtio drivers currently don't do interrupt status
 reads.

Talking about real devices, they obviously know as they maintain the
hardware state.

 
 It can
 synchronize MSI assertion and PBA bit clearance. If it clears too
 late, than this reflects what may happen on real hw as well when host
 and device race for changing vector mask vs. device state. It's not
 stated that those changes need to be serialized inside the device, is it?

 Jan
 
 Talking about emulated devices?  It's not sure that real
 hardware clears PBA. Considering that no guests I know of use PBA ATM,
 I would not be surprised if many devices had broken PBA support.

OK, if there are no conforming MSI-X devices out there, then we can
forget about all the PBA maintenance beyond set if message hit mask,
cleared again on unmask. But I doubt that this is generally true.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kvm tools: Improve kvm list default output

2011-10-25 Thread Pekka Enberg

This patch changes 'kvm list' default output to look as follows:

  [penberg@tux kvm]$ ./kvm list
PID GUEST
   8156 guest-8156
default (not running)
test-box (not running)

Cc: Asias He asias.he...@gmail.com
Cc: Cyrill Gorcunov gorcu...@gmail.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Sasha Levin levinsasha...@gmail.com
Signed-off-by: Pekka Enberg penb...@kernel.org
---
 tools/kvm/builtin-list.c |   10 --
 1 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/tools/kvm/builtin-list.c b/tools/kvm/builtin-list.c
index 12e2537..2ff3472 100644
--- a/tools/kvm/builtin-list.c
+++ b/tools/kvm/builtin-list.c
@@ -91,8 +91,6 @@ cleanup:
 
 static int kvm_list_running_instances(void)
 {
-   printf(  PID GUEST\n);
-
return kvm__enumerate_instances(print_guest);
 }
 
@@ -107,13 +105,11 @@ static int kvm_list_rootfs(void)
if (dir == NULL)
return -1;
 
-   printf(  ROOTFS\n);
-
while ((dirent = readdir(dir))) {
if (dirent-d_type == DT_DIR 
strcmp(dirent-d_name, .) 
strcmp(dirent-d_name, ..))
-   printf(%s\n, dirent-d_name);
+   printf(  %s (not running)\n, dirent-d_name);
}
 
return 0;
@@ -136,7 +132,9 @@ int kvm_cmd_list(int argc, const char **argv, const char 
*prefix)
parse_setup_options(argc, argv);
 
if (!run  !rootfs)
-   kvm_list_help();
+   run = rootfs = true;
+
+   printf(  PID GUEST\n);
 
if (run) {
r = kvm_list_running_instances();
-- 
1.7.6.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] KVM call agenda for October 25

2011-10-25 Thread Kevin Wolf

Am 24.10.2011 13:35, schrieb Paolo Bonzini:
 On 10/24/2011 01:04 PM, Juan Quintela wrote:

 Hi

 Please send in any agenda items you are interested in covering.
 
 - What's left to merge for 1.0.

I would still like to cache the default cache mode (probably to
cache=writeback). We don't allow guests to toggle WCE yet which Anthony
would have liked to see before doing the change. Is it a strict requirement?

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] PCI: Rework config space locking, add INTx masking services

2011-10-25 Thread Jan Kiszka

On 2011-10-14 16:48, Jan Kiszka wrote:
 On 2011-10-06 17:48, Jesse Barnes wrote:
 On Mon, 12 Sep 2011 18:54:01 +0200
 Jan Kiszka jan.kis...@siemens.com wrote:

 This series tries to heal the currently broken locking scheme around PCI
 config space accesses.

 We have an interface lock out access via sysfs, but that service wrongly
 assumes it is only called by one instance at a time for some device. So
 two loops doing

 echo 1  /sys/bus/pci/devices/some-device/reset

 in parallel will trigger a kernel BUG at the moment.

 Besides synchronizing with user space, we also need to manage config
 space access of generic PCI drivers. They need to mask legacy interrupt
 lines while the specific driver runs in user space or a guest OS.

 The approach taken here is provide mutex-like locking for general
 access - which still requires a special mechanism due to requirements of
 the IBM Power RAID SCSI driver. Furthermore, INTx masking is now
 available via the PCI core and synchronized via the internal pci_lock.

 Not sure who may want to take this, so I'm CC'ing broadly.

 ISTR a bunch of discussion about this (just back from lots of work
 travel and vacation, sorry I missed most of it).

 Is this the agreed upon way of handling it?  If so, can I get some
 Reviewed/Acked-bys from people?
 
 I hope this is acceptable. These changes are required for further
 improvements of the KVM device assignment support (INTx sharing). So I
 would appreciate any ack or whatever feedback as well.

At least two acks arrived by now. But I guess this series is not yet
lined up for the 3.2 window, is it? Can we get it merged for 3.3? Just
let me know if any rebasing is required and against which tree.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [BUG] qemu-kvm: memory_region_add_subregion_common: Assertion `!subregion-parent' failed.

2011-10-25 Thread Gerd Hoffmann

  Hi,

 -M pc-0.13

 That's likely vga_init_vbe vs. pci_register_bar with vga.vram. Either
 the PCI bar or the VBE mapping should be wrapped by an alias region
 pointing to the original vram. And vmware_vga seems to be affected as well.

Will there be a migration flag day before 1.0?
If yes we can just zap the bochs vbe mapping for pci vga cards.

It is not used any more since 0.14 and it is only there for migration
compatibility with 0.13 and older ...

Easiest way around this is to use a more recent machine type.

cheers,
  Gerd
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] KVM call agenda for October 25

2011-10-25 Thread Anthony Liguori


On 10/25/2011 07:35 AM, Kevin Wolf wrote:

Am 24.10.2011 13:35, schrieb Paolo Bonzini:

On 10/24/2011 01:04 PM, Juan Quintela wrote:


Hi

Please send in any agenda items you are interested in covering.


- What's left to merge for 1.0.


I would still like to cache the default cache mode (probably to
cache=writeback). We don't allow guests to toggle WCE yet which Anthony
would have liked to see before doing the change. Is it a strict requirement?


I don't see a way around it.  If the default mode is cache=writeback, then we're 
open to data corruption in any guest where barrier=0.  With guest togglable WCE, 
it ends up being a guest configuration issue so we can more or less defer 
responsibility.


Do you think it's a good idea to change the default mode w/o guest WCE toggle 
support?  What's your view about older guests if we change the default mode? 
What's your main motivation for wanting to change the default mode?


I'd be much more open to changing the default mode to cache=none FWIW since the 
risk of data loss there is much, much lower.


Regards,

Anthony Liguori



Kevin



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] kvm tools: Kill double slash from kvm__get_dir()

2011-10-25 Thread Pekka Enberg

In preparation for showing kvm__get_dir() path to the user, kill the
unnecessary double slash to make the output more readable.

Cc: Asias He asias.he...@gmail.com
Cc: Cyrill Gorcunov gorcu...@gmail.com
Cc: Ingo Molnar mi...@elte.hu
Cc: Sasha Levin levinsasha...@gmail.com
Signed-off-by: Pekka Enberg penb...@kernel.org
---
 tools/kvm/kvm.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 8d6b5e1..8605fc8 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -86,7 +86,11 @@ static char kvm_dir[PATH_MAX];
 
 static void set_dir(const char *fmt, va_list args)
 {
-   vsnprintf(kvm_dir, sizeof(kvm_dir), fmt, args);
+   char tmp[PATH_MAX];
+
+   vsnprintf(tmp, sizeof(tmp), fmt, args);
+
+   realpath(tmp, kvm_dir);
 }
 
 void kvm__set_dir(const char *fmt, ...)
-- 
1.7.6.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] KVM call agenda for October 25

2011-10-25 Thread Dor Laor


On 10/25/2011 03:05 PM, Anthony Liguori wrote:

On 10/25/2011 07:35 AM, Kevin Wolf wrote:

Am 24.10.2011 13:35, schrieb Paolo Bonzini:

On 10/24/2011 01:04 PM, Juan Quintela wrote:


Hi

Please send in any agenda items you are interested in covering.


- What's left to merge for 1.0.


I would still like to cache the default cache mode (probably to
cache=writeback). We don't allow guests to toggle WCE yet which Anthony
would have liked to see before doing the change. Is it a strict
requirement?


I don't see a way around it. If the default mode is cache=writeback,
then we're open to data corruption in any guest where barrier=0. With
guest togglable WCE, it ends up being a guest configuration issue so we
can more or less defer responsibility.

Do you think it's a good idea to change the default mode w/o guest WCE
toggle support? What's your view about older guests if we change the
default mode? What's your main motivation for wanting to change the
default mode?

I'd be much more open to changing the default mode to cache=none FWIW
since the risk of data loss there is much, much lower.


A bit related to this, it would be nice to mark a VM un-migratable if 
cache!=none. Juan reports that currently it such VMs are exposed to data 
integrity issues so we need to fail migrating them automatically.




Regards,

Anthony Liguori



Kevin






--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kvm tools: Avoid overwriting socket files with same name

2011-10-25 Thread Sasha Levin

This patch prevents overwriting socket files by running two instances with
the same name.

Reported-by: Osier Yang jy...@redhat.com
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/kvm.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 8d6b5e1..f700d78 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -152,6 +152,9 @@ static int kvm__create_socket(struct kvm *kvm)
sprintf(full_name, %s, kvm__get_dir());
mkdir(full_name, 0777);
sprintf(full_name, %s/%s.sock, kvm__get_dir(), kvm-name);
+   if (access(full_name, F_OK) == 0)
+   die(Socket file %s already exist, full_name);
+
s = socket(AF_UNIX, SOCK_STREAM, 0);
if (s  0)
return s;
-- 
1.7.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC][PATCH] KVM: Introduce direct MSI message injection for in-kernel irqchips

2011-10-25 Thread Michael S. Tsirkin

On Tue, Oct 25, 2011 at 02:21:01PM +0200, Jan Kiszka wrote:
 On 2011-10-25 14:05, Michael S. Tsirkin wrote:
  On Tue, Oct 25, 2011 at 01:41:39PM +0200, Jan Kiszka wrote:
  On 2011-10-25 13:20, Michael S. Tsirkin wrote:
  On Tue, Oct 25, 2011 at 09:24:17AM +0200, Jan Kiszka wrote:
  On 2011-10-24 19:23, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 07:05:08PM +0200, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 06:10:28PM +0200, Jan Kiszka wrote:
  On 2011-10-24 18:05, Michael S. Tsirkin wrote:
  This is what I have in mind:
   - devices set PBA bit if MSI message cannot be sent due to mask (*)
   - core checksclears PBA bit on unmask, injects message if bit was 
  set
   - devices clear PBA bit if message reason is resolved before 
  unmask (*)
 
  OK, but practically, when exactly does the device clear PBA?
 
  Consider a network adapter that signals messages in a RX ring: If the
  corresponding vector is masked while the guest empties the ring, I
  strongly assume that the device is supposed to take back the pending 
  bit
  in that case so that there is no interrupt inject on a later vector
  unmask operation.
 
  Jan
 
  Do you mean virtio here?
 
  Maybe, but I'm also thinking of fully emulated devices.
 
  One thing seems certain: actual, assigned devices don't
  have this fake msi-x level so they don't notify host
  when that changes.
 
  But they have real PBA. We just need to replicate the emulated vector
  mask state into real hw. Doesn't this happen anyway when we disable the
  IRQ on the host?
  
  Not immediately I think.
  
  If not, that may require a bit more work, maybe a special masking mode
  that can be requested by the managing backend of an assigned device from
  the MSI-X in-kernel service.
  
  True. OTOH this might have cost (extra mmio) for the
  doubtful benefit of making PBA values exact.
 
 I think correctness come before performance unless the latter hurts
 significantly.
 
  
 
  Do you expect this optimization to give
  a significant performance gain?
 
  Hard to asses in general. But I have a silly guest here that obviously
  masks MSI vectors for each event. This currently not only kicks us into
  a heavy-weight exit, it also enforces serialization on qemu_global_mutex
  (while we have the rest already isolated).
 
  It easy to see how MSIX mask support in kernel would help.
  Not sure whether it's worth it to also add special APIs to
  reduce the number of spurious interrupts for such silly guests.
 
  I do not get the latter point. What could be simplified (without making
  it incorrect) when ignoring excessive mask accesses?
  
  Clearing PBA when we detect an empty ring in host is not required,
  IMO. It's an optimization.
 
 For virtio that might be true - as we are free to define the device
 behaviour to our benefit. What emulated real devices do is another thing.

Anything specific in mind?

  
  Also, if sane
  guests do not access the mask that frequently, why was in-kernel MSI-X
  MMIO proposed at all?
  
  Apparently whether mask accesses happen a lot depends on the workload.
  
 
 
  It would also be challenging to implement this in
  a race free manner. Clearing on interrupt status read
  seems straight-forward.
 
  With an in-kernel MSI-X MMIO handler, this race will be naturally
  unavoidable as there is no more global lock shared between table/PBA
  accesses and the device model. But, when using atomic bit ops, I don't
  think that will cause headache.
 
  Jan
 
  This is not the race I meant.  The challenge is for the device to
  determine that it can clear the PBA.  atomic accesses on PBA won't help
  here I think.
 
  The device knows best if the interrupt reason persists.
  
  It might not know this unless notified by driver.
  E.g. virtio drivers currently don't do interrupt status
  reads.
 
 Talking about real devices, they obviously know as they maintain the
 hardware state.

Not necessarily. It's quite common to keep the ring in coherent memory
allocated by driver, not within the device, the state is then
maintained by driver and device together.

  
  It can
  synchronize MSI assertion and PBA bit clearance. If it clears too
  late, than this reflects what may happen on real hw as well when host
  and device race for changing vector mask vs. device state. It's not
  stated that those changes need to be serialized inside the device, is it?
 
  Jan
  
  Talking about emulated devices?  It's not sure that real
  hardware clears PBA. Considering that no guests I know of use PBA ATM,
  I would not be surprised if many devices had broken PBA support.
 
 OK, if there are no conforming MSI-X devices out there,

Oh, I'm guessing some devices are conforming :)

 then we can
 forget about all the PBA maintenance beyond set if message hit mask,
 cleared again on unmask. But I doubt that this is generally true.
 
 Jan

We seem to get by basically with what you describe but I'm not
saying it's perfect, just that it's hard to make it perfect.

 --

Re: [Qemu-devel] KVM call agenda for October 25

2011-10-25 Thread Anthony Liguori


On 10/25/2011 08:18 AM, Dor Laor wrote:

On 10/25/2011 03:05 PM, Anthony Liguori wrote:

On 10/25/2011 07:35 AM, Kevin Wolf wrote:

Am 24.10.2011 13:35, schrieb Paolo Bonzini:

On 10/24/2011 01:04 PM, Juan Quintela wrote:


Hi

Please send in any agenda items you are interested in covering.


- What's left to merge for 1.0.


I would still like to cache the default cache mode (probably to
cache=writeback). We don't allow guests to toggle WCE yet which Anthony
would have liked to see before doing the change. Is it a strict
requirement?


I don't see a way around it. If the default mode is cache=writeback,
then we're open to data corruption in any guest where barrier=0. With
guest togglable WCE, it ends up being a guest configuration issue so we
can more or less defer responsibility.

Do you think it's a good idea to change the default mode w/o guest WCE
toggle support? What's your view about older guests if we change the
default mode? What's your main motivation for wanting to change the
default mode?

I'd be much more open to changing the default mode to cache=none FWIW
since the risk of data loss there is much, much lower.


A bit related to this, it would be nice to mark a VM un-migratable if
cache!=none. Juan reports that currently it such VMs are exposed to data
integrity issues so we need to fail migrating them automatically.


That's not correct.  cache!=none is perfectly safe *if* you have coherent shared 
storage.


Regards,

Anthony Liguori





Regards,

Anthony Liguori



Kevin









--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] KVM call agenda for October 25

2011-10-25 Thread Andreas Färber

Am 25.10.2011 15:18, schrieb Dor Laor:
 [...] it would be nice to mark a VM un-migratable [snip]

Speaking of which, I'm working on the missing migration support for AHCI
but fear I won't quite make it for the Nov 1 deadline.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] KVM call agenda for October 25

2011-10-25 Thread Kevin Wolf

Am 25.10.2011 15:05, schrieb Anthony Liguori:
 On 10/25/2011 07:35 AM, Kevin Wolf wrote:
 Am 24.10.2011 13:35, schrieb Paolo Bonzini:
 On 10/24/2011 01:04 PM, Juan Quintela wrote:

 Hi

 Please send in any agenda items you are interested in covering.

 - What's left to merge for 1.0.

 I would still like to cache the default cache mode (probably to
 cache=writeback). We don't allow guests to toggle WCE yet which Anthony
 would have liked to see before doing the change. Is it a strict requirement?
 
 I don't see a way around it.  If the default mode is cache=writeback, then 
 we're 
 open to data corruption in any guest where barrier=0.  With guest togglable 
 WCE, 
 it ends up being a guest configuration issue so we can more or less defer 
 responsibility.

So do you think that offering a WCE inside the guest would be a real
solution or just a way to have an excuse?

Christoph said that OSes don't usually change this by themselves, it
would need an administrator manually changing the setting. But if we
require that, we can just as well require that the administrator set
cache=writethrough on the qemu command line.

 Do you think it's a good idea to change the default mode w/o guest WCE toggle 
 support?  What's your view about older guests if we change the default mode? 
 What's your main motivation for wanting to change the default mode?

Because people are constantly complaining about the awful
(cache=writethrough) performance they get before they are told they
should use a different cache option. And they are right. The
out-of-the-box experience with qemu's block performance really sucks.

 I'd be much more open to changing the default mode to cache=none FWIW since 
 the 
 risk of data loss there is much, much lower.

I think people said that they'd rather not have cache=none as default
because O_DIRECT doesn't work everywhere.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] KVM call agenda for October 25

2011-10-25 Thread Anthony Liguori


On 10/25/2011 08:56 AM, Kevin Wolf wrote:

Am 25.10.2011 15:05, schrieb Anthony Liguori:

On 10/25/2011 07:35 AM, Kevin Wolf wrote:

Am 24.10.2011 13:35, schrieb Paolo Bonzini:

On 10/24/2011 01:04 PM, Juan Quintela wrote:


Hi

Please send in any agenda items you are interested in covering.


- What's left to merge for 1.0.


I would still like to cache the default cache mode (probably to
cache=writeback). We don't allow guests to toggle WCE yet which Anthony
would have liked to see before doing the change. Is it a strict requirement?


I don't see a way around it.  If the default mode is cache=writeback, then we're
open to data corruption in any guest where barrier=0.  With guest togglable WCE,
it ends up being a guest configuration issue so we can more or less defer
responsibility.


So do you think that offering a WCE inside the guest would be a real
solution or just a way to have an excuse?


No, it offers a mechanism to fix mistakes at run-time verses at start up time. 
 It also means that you can make template images that understand that they 
don't support barriers and change the WCE setting appropriately.



Christoph said that OSes don't usually change this by themselves, it
would need an administrator manually changing the setting. But if we
require that, we can just as well require that the administrator set
cache=writethrough on the qemu command line.


The administrator of the guest != the administrator of the host.


Do you think it's a good idea to change the default mode w/o guest WCE toggle
support?  What's your view about older guests if we change the default mode?
What's your main motivation for wanting to change the default mode?


Because people are constantly complaining about the awful
(cache=writethrough) performance they get before they are told they
should use a different cache option. And they are right. The
out-of-the-box experience with qemu's block performance really sucks.


With qcow2 you mean, right?


I'd be much more open to changing the default mode to cache=none FWIW since the
risk of data loss there is much, much lower.


I think people said that they'd rather not have cache=none as default
because O_DIRECT doesn't work everywhere.


Where doesn't it work these days?  I know it doesn't work on tmpfs.  I know it 
works on ext[234], btrfs, nfs.


Regards,

Anthony Liguori


Kevin



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: qemu-kvm 0.15 usb problem

2011-10-25 Thread Thomas Beinicke

Hi Gerd,

I tried with the branch you suggested but it didn't work either.

The whole WinXP VM crashed once I virtually plugged in the USB ISDN card.

I also tried the qemu-kvm-15.1 release but that also didn't work.

Do you have any further suggestions? How can I assist in debugging that problem?

Cheers

Thomas


On Monday 19 September 2011 08:30:11 Gerd Hoffmann wrote:
 On 09/16/11 18:22, Thomas Beinicke wrote:
  I am experiencing the same problem with version 0.15 while 0.14 works 
  flawless.
  We're using it for an USB ISDN card which freezes the windows host with 
  qemu-kvm 0.15 when plugged in while the system is already running or it 
  completely prevents the start of the VM.
  We experience the same problem with two completely different USB ISDN 
  adapters.
 
  I can further debug this problem if necessary.
 
 Try http://www.kraxel.org/cgit/qemu/log/?h=usb.25.stable
 
 cheers,
Gerd
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] KVM call agenda for October 25

2011-10-25 Thread Kevin Wolf

Am 25.10.2011 16:06, schrieb Anthony Liguori:
 On 10/25/2011 08:56 AM, Kevin Wolf wrote:
 Am 25.10.2011 15:05, schrieb Anthony Liguori:
 On 10/25/2011 07:35 AM, Kevin Wolf wrote:
 Am 24.10.2011 13:35, schrieb Paolo Bonzini:
 On 10/24/2011 01:04 PM, Juan Quintela wrote:

 Hi

 Please send in any agenda items you are interested in covering.

 - What's left to merge for 1.0.

 I would still like to cache the default cache mode (probably to
 cache=writeback). We don't allow guests to toggle WCE yet which Anthony
 would have liked to see before doing the change. Is it a strict 
 requirement?

 I don't see a way around it.  If the default mode is cache=writeback, then 
 we're
 open to data corruption in any guest where barrier=0.  With guest togglable 
 WCE,
 it ends up being a guest configuration issue so we can more or less defer
 responsibility.

 So do you think that offering a WCE inside the guest would be a real
 solution or just a way to have an excuse?
 
 No, it offers a mechanism to fix mistakes at run-time verses at start up 
 time. 

This is true (in both directions). But I think it's independent from the
right default.

   It also means that you can make template images that understand that they 
 don't support barriers and change the WCE setting appropriately.

Isn't that really a job for management tools?

 Christoph said that OSes don't usually change this by themselves, it
 would need an administrator manually changing the setting. But if we
 require that, we can just as well require that the administrator set
 cache=writethrough on the qemu command line.
 
 The administrator of the guest != the administrator of the host.

But the administrator of the guest == the owner of the qemu instance,
no? He should be the one to use the management tools and configure his VMs.

 Do you think it's a good idea to change the default mode w/o guest WCE 
 toggle
 support?  What's your view about older guests if we change the default mode?
 What's your main motivation for wanting to change the default mode?

 Because people are constantly complaining about the awful
 (cache=writethrough) performance they get before they are told they
 should use a different cache option. And they are right. The
 out-of-the-box experience with qemu's block performance really sucks.
 
 With qcow2 you mean, right?

No, with any format, including raw. Which isn't surprising at all,
O_SYNC makes writes very expensive.

 I'd be much more open to changing the default mode to cache=none FWIW since 
 the
 risk of data loss there is much, much lower.

 I think people said that they'd rather not have cache=none as default
 because O_DIRECT doesn't work everywhere.
 
 Where doesn't it work these days?  I know it doesn't work on tmpfs.  I know 
 it 
 works on ext[234], btrfs, nfs.

I think tmpfs was named (and failing to start with default settings on
tmpfs would be nasty enough), but iirc Alex had another one.

Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [RFC v2 PATCH 5/4 PATCH] virtio-net: send gratuitous packet when needed

2011-10-25 Thread Michael S. Tsirkin

On Tue, Oct 25, 2011 at 10:50:41AM +0800, Jason Wang wrote:
 On 10/24/2011 01:25 PM, Michael S. Tsirkin wrote:
  On Mon, Oct 24, 2011 at 02:54:59PM +1030, Rusty Russell wrote:
  On Sat, 22 Oct 2011 13:43:11 +0800, Jason Wang jasow...@redhat.com wrote:
  This make let virtio-net driver can send gratituous packet by a new
  config bit - VIRTIO_NET_S_ANNOUNCE in each config update
  interrupt. When this bit is set by backend, the driver would schedule
  a workqueue to send gratituous packet through NETDEV_NOTIFY_PEERS.
 
  This feature is negotiated through bit VIRTIO_NET_F_GUEST_ANNOUNCE.
 
  Signed-off-by: Jason Wang jasow...@redhat.com
 
  This seems like a huge layering violation.  Imagine this in real
  hardware, for example.
  
  commits 06c4648d46d1b757d6b9591a86810be79818b60c
  and 99606477a5888b0ead0284fecb13417b1da8e3af
  document the need for this:
  
  NETDEV_NOTIFY_PEERS notifier indicates that a device moved to a 
  different physical link.
  and
  In real hardware such notifications are only
  generated when the device comes up or the address changes.
  
  So hypervisor could get the same behaviour by sending link up/down
  events, this is just an optimization so guest won't do
  unecessary stuff like try to reconfigure an IP address.
  
  
  Maybe LOCATION_CHANGE would be a better name?
  
 
 ANNOUNCE_SELF?

It would be nice to formulate what kind of event
are we notifying the guest about.
The announce part of it is really up to the guest, isn't it?

  
  There may be a good reason why virtual devices might want this kind of
  reconfiguration cheat, which is unnecessary for normal machines,
  
  I think yes, the difference with real hardware is guest can change
  location without link getting dropped.
  FWIW, Xen seems to use this capability too.
 
 So does ms netvsc.
 
  
  but
  it'd have to be spelled out clearly in the spec to justify it...
 
  Cheers,
  Rusty.
  
  Agree, and I'd like to see the spec too. The interface seems
  to involve the guest clearing the status bit when it detects
  an event?
 
 I would describe this in spec. The interface need guest to clear the
 status bit, this would let the back-end know it has finished the work as
 we may need to send the gratuitous packets many times.
 
  
  Also - how does it interact with the link up event?
  We probably don't want to schedule this when we detect
  a link status change or during initialization, as
  this patch seems to do? What if link goes down
  while the work is running? Is that OK?
  
 
 Looks like there's are duplications if guest enable arp_notify vm is
 started,

How hard would it be to avoid these duplicates?

 but we need to handle the situation that resuming a stopped
 virtual machine.
 
 For the link down race, I don't see any real issue, either dropping or
 queued.

For example, you do
unregister_netdev(vi-dev);
cancel_work_sync(vi-announce);

which looks scary as announce seems to use the netdev.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [net-next-2.6 PATCH 0/8 RFC v2] macvlan: MAC Address filtering support for passthru mode

2011-10-25 Thread Michael S. Tsirkin

On Mon, Oct 24, 2011 at 11:15:05AM -0700, Roopa Prabhu wrote:
  ... And also I dont know of any hw
  that provides an interface to set hw filters on a per queue basis.
  
  VMDq hardware would support this, no?
  
 Am not really sure. This patch uses netdev to pass filters to hw. And I
 don't see any netdev infrastructure that would support per queue filters.

Sure. I was only saying that as far as I understand,
VMDq hardware does support this functionality.
Right, Greg?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [net-next-2.6 PATCH 0/8 RFC v2] macvlan: MAC Address filtering support for passthru mode

2011-10-25 Thread Rose, Gregory V

 -Original Message-
 From: Michael S. Tsirkin [mailto:m...@redhat.com]
 Sent: Tuesday, October 25, 2011 8:46 AM
 To: Roopa Prabhu
 Cc: net...@vger.kernel.org; s...@us.ibm.com; dragos.tatu...@gmail.com;
 a...@arndb.de; kvm@vger.kernel.org; da...@davemloft.net;
 mc...@broadcom.com; dwa...@cisco.com; shemmin...@vyatta.com;
 eric.duma...@gmail.com; ka...@trash.net; be...@cisco.com; Rose, Gregory V
 Subject: Re: [net-next-2.6 PATCH 0/8 RFC v2] macvlan: MAC Address
 filtering support for passthru mode

 On Mon, Oct 24, 2011 at 11:15:05AM -0700, Roopa Prabhu wrote:
   ... And also I dont know of any hw
   that provides an interface to set hw filters on a per queue basis.

   VMDq hardware would support this, no?

  Am not really sure. This patch uses netdev to pass filters to hw. And I
  don't see any netdev infrastructure that would support per queue
 filters.

 Sure. I was only saying that as far as I understand,
 VMDq hardware does support this functionality.
 Right, Greg?

In the case of Intel HW yes.  I'll refrain from speaking for other HW vendors 
although I'm guessing it would be true in their cases also.  YMMV, caveat 
emptor, etc. etc.

- Greg

 --
 MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC V2 4/5] kvm guest : Added configuration support to enable debug information for KVM Guests

2011-10-25 Thread Raghavendra K T


On 10/24/2011 03:45 PM, Avi Kivity wrote:

On 10/23/2011 09:07 PM, Raghavendra K T wrote:

Added configuration support to enable debug information
for KVM Guests in debugfs

Signed-off-by: Srivatsa Vaddagiriva...@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulosesuz...@in.ibm.com
Signed-off-by: Raghavendra K Traghavendra...@linux.vnet.ibm.com
---
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1f03f82..ed34269 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -562,6 +562,15 @@ config KVM_GUEST
  This option enables various optimizations for running under the KVM
  hypervisor.

+config KVM_DEBUG_FS
+   bool Enable debug information for KVM Guests in debugfs
+   depends on KVM_GUEST
+   default n
+   ---help---
+ This option enables collection of various statistics for KVM guest.
+ Statistics are displayed in debugfs filesystem. Enabling this option
+ may incur significant overhead.
+
  source arch/x86/lguest/Kconfig




This might be better implemented through tracepoints, which an be
enabled dynamically.


Interesting option. I 'll explore this one. Thanks for pointing.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC V2 3/5] kvm hypervisor : Add two hypercalls to support pv-ticketlock

2011-10-25 Thread Raghavendra K T


On 10/24/2011 07:20 PM, Srivatsa Vaddagiri wrote:

* Avi Kivitya...@redhat.com  [2011-10-24 15:09:25]:


I guess with that change, we can also dropthe need for other hypercall
introduced in this patch (kvm_pv_kick_cpu_op()). Essentially a vcpu sleeping
because of HLT instruction can be woken up by a IPI issued by vcpu releasing a
lock.


Not if interrupts are disabled.


Hmm yes ..so we need a kick hypercall then.



So then do also you foresee the need for directed yield at some point, 
to address LHP? provided we have good improvements to prove.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC V2 4/5] kvm guest : Added configuration support to enable debug information for KVM Guests

2011-10-25 Thread Jeremy Fitzhardinge

On 10/24/2011 03:15 AM, Avi Kivity wrote:
 On 10/23/2011 09:07 PM, Raghavendra K T wrote:
 Added configuration support to enable debug information
 for KVM Guests in debugfs
 
 Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
 Signed-off-by: Suzuki Poulose suz...@in.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
 diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
 index 1f03f82..ed34269 100644
 --- a/arch/x86/Kconfig
 +++ b/arch/x86/Kconfig
 @@ -562,6 +562,15 @@ config KVM_GUEST
This option enables various optimizations for running under the KVM
hypervisor.
  
 +config KVM_DEBUG_FS
 +bool Enable debug information for KVM Guests in debugfs
 +depends on KVM_GUEST
 +default n
 +---help---
 +  This option enables collection of various statistics for KVM guest.
 +  Statistics are displayed in debugfs filesystem. Enabling this option
 +  may incur significant overhead.
 +
  source arch/x86/lguest/Kconfig
  

 This might be better implemented through tracepoints, which an be
 enabled dynamically.

Tracepoints use spinlocks, so that could get awkward.

J

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC V2 5/5] kvm guest : pv-ticketlocks support for linux guests running on KVM hypervisor

2011-10-25 Thread Jeremy Fitzhardinge

On 10/23/2011 12:07 PM, Raghavendra K T wrote:
 This patch extends Linux guests running on KVM hypervisor to support
 pv-ticketlocks. Very early during bootup, paravirtualied KVM guest detects if 
 the hypervisor has required feature (KVM_FEATURE_WAIT_FOR_KICK) to support 
 pv-ticketlocks. If so, support for pv-ticketlocks is registered via 
 pv_lock_ops.

 Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
 Signed-off-by: Suzuki Poulose suz...@in.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
 diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
 index 2874c19..c7f34b7 100644
 --- a/arch/x86/include/asm/kvm_para.h
 +++ b/arch/x86/include/asm/kvm_para.h
 @@ -195,10 +195,18 @@ void kvm_async_pf_task_wait(u32 token);
  void kvm_async_pf_task_wake(u32 token);
  u32 kvm_read_and_reset_pf_reason(void);
  extern void kvm_disable_steal_time(void);
 -#else
 +
 +#ifdef CONFIG_PARAVIRT_SPINLOCKS
 +void __init kvm_guest_early_init(void);
 +#else /* CONFIG_PARAVIRT_SPINLOCKS */
 +#define kvm_guest_early_init() do { } while (0)
 +#endif /* CONFIG_PARAVIRT_SPINLOCKS */
 +
 +#else /* CONFIG_KVM_GUEST */
  #define kvm_guest_init() do { } while (0)
  #define kvm_async_pf_task_wait(T) do {} while(0)
  #define kvm_async_pf_task_wake(T) do {} while(0)
 +#define kvm_guest_early_init() do { } while (0)
  static inline u32 kvm_read_and_reset_pf_reason(void)
  {
   return 0;
 diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
 index 3bb0850..fb25bca 100644
 --- a/arch/x86/kernel/head32.c
 +++ b/arch/x86/kernel/head32.c
 @@ -9,6 +9,7 @@
  #include linux/start_kernel.h
  #include linux/mm.h
  #include linux/memblock.h
 +#include linux/kvm_para.h
  
  #include asm/setup.h
  #include asm/sections.h
 @@ -59,6 +60,8 @@ void __init i386_start_kernel(void)
   break;
   }
  
 +  kvm_guest_early_init();
 +
   /*
* At this point everything still needed from the boot loader
* or BIOS or kernel text should be early reserved or marked not
 diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
 index 5655c22..cabf8ec 100644
 --- a/arch/x86/kernel/head64.c
 +++ b/arch/x86/kernel/head64.c
 @@ -13,6 +13,7 @@
  #include linux/start_kernel.h
  #include linux/io.h
  #include linux/memblock.h
 +#include linux/kvm_para.h
  
  #include asm/processor.h
  #include asm/proto.h
 @@ -115,6 +116,8 @@ void __init x86_64_start_reservations(char 
 *real_mode_data)
  
   reserve_ebda_region();
  
 + kvm_guest_early_init();
 +
   /*
* At this point everything still needed from the boot loader
* or BIOS or kernel text should be early reserved or marked not
 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
 index a9c2116..f4f341f 100644
 --- a/arch/x86/kernel/kvm.c
 +++ b/arch/x86/kernel/kvm.c
 @@ -39,6 +39,16 @@
  #include asm/desc.h
  #include asm/tlbflush.h
  
 +#ifdef CONFIG_PARAVIRT_SPINLOCKS
 +
 +#ifdef CONFIG_KVM_DEBUG_FS
 +
 +#include linux/debugfs.h
 +
 +#endif /* CONFIG_KVM_DEBUG_FS */
 +
 +#endif /* CONFIG_PARAVIRT_SPINLOCKS */
 +
  #define MMU_QUEUE_SIZE 1024
  
  static int kvmapf = 1;
 @@ -627,3 +637,240 @@ static __init int activate_jump_labels(void)
   return 0;
  }
  arch_initcall(activate_jump_labels);
 +
 +#ifdef CONFIG_PARAVIRT_SPINLOCKS
 +
 +#ifdef CONFIG_KVM_DEBUG_FS
 +
 +static struct kvm_spinlock_stats
 +{
 + u32 taken_slow;
 + u32 taken_slow_pickup;
 +
 + u32 released_slow;
 + u32 released_slow_kicked;
 +
 +#define HISTO_BUCKETS30
 + u32 histo_spin_blocked[HISTO_BUCKETS+1];
 +
 + u64 time_blocked;
 +} spinlock_stats;
 +
 +static u8 zero_stats;
 +
 +static inline void check_zero(void)
 +{
 + if (unlikely(zero_stats)) {
 + memset(spinlock_stats, 0, sizeof(spinlock_stats));
 + zero_stats = 0;
 + }
 +}
 +
 +#define ADD_STATS(elem, val) \
 + do { check_zero(); spinlock_stats.elem += (val); } while (0)
 +
 +static inline u64 spin_time_start(void)
 +{
 + return sched_clock();
 +}
 +
 +static void __spin_time_accum(u64 delta, u32 *array)
 +{
 + unsigned index = ilog2(delta);
 +
 + check_zero();
 +
 + if (index  HISTO_BUCKETS)
 + array[index]++;
 + else
 + array[HISTO_BUCKETS]++;
 +}
 +
 +static inline void spin_time_accum_blocked(u64 start)
 +{
 + u32 delta = sched_clock() - start;
 +
 + __spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
 + spinlock_stats.time_blocked += delta;
 +}
 +
 +static struct dentry *d_spin_debug;
 +static struct dentry *d_kvm_debug;
 +
 +struct dentry *kvm_init_debugfs(void)
 +{
 + d_kvm_debug = debugfs_create_dir(kvm, NULL);
 + if (!d_kvm_debug)
 + printk(KERN_WARNING Could not create 'kvm' debugfs 
 directory\n);
 +
 + return d_kvm_debug;
 +}
 +
 +static int __init kvm_spinlock_debugfs(void)
 +{
 + struct dentry *d_kvm = kvm_init_debugfs();
 +
 + if (d_kvm == NULL)
 +

[PATCH 0/6] Avoid soft lockup message when KVM is stopped by host

2011-10-25 Thread Eric B Munson

When a guest kernel is stopped by the host hypervisor it can look like a soft
lockup to the guest kernel.  This false warning can mask later soft lockup
warnings which may be real.  This patch series adds a method for a host
hypervisor to communicate to a guest kernel that it is being stopped.  The
final patch in the series has the watchdog check this flag when it goes to
issue a soft lockup warning and skip the warning if the guest knows it was
stopped.

It was attempted to solve this in Qemu, but the side effects of saving and
restoring the clock and tsc for each vcpu put the wall clock of the guest behind
by the amount of time of the pause.  This forces a guest to have ntp running
in order to keep the wall clock accurate.

Eric B Munson (6):
  Add flag to indicate that a vm was stopped by the host
  Add functions to check if the host has stopped the vm
  Add ioctl for KVM_GUEST_STOPPED
  Add generic stubs for kvm stop check functions
  Add check for suspended vm in softlockup detector
  Add age out of guest paused flag

 arch/x86/include/asm/pvclock-abi.h |1 +
 arch/x86/include/asm/pvclock.h |7 
 arch/x86/kernel/kvmclock.c |   55 
 arch/x86/kvm/x86.c |   14 +
 include/asm-generic/pvclock.h  |   19 
 include/linux/kvm.h|4 ++
 include/linux/kvm_host.h   |2 +
 kernel/watchdog.c  |9 ++
 8 files changed, 111 insertions(+), 0 deletions(-)
 create mode 100644 include/asm-generic/pvclock.h

-- 
1.7.5.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/6] Add flag to indicate that a vm was stopped by the host

2011-10-25 Thread Eric B Munson

This flag will be used to check if the vm was stopped by the host when a soft
lockup was detected.

Signed-off-by: Eric B Munson emun...@mgebm.net
---
 arch/x86/include/asm/pvclock-abi.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/pvclock-abi.h 
b/arch/x86/include/asm/pvclock-abi.h
index 35f2d19..6167fd7 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -40,5 +40,6 @@ struct pvclock_wall_clock {
 } __attribute__((__packed__));
 
 #define PVCLOCK_TSC_STABLE_BIT (1  0)
+#define PVCLOCK_GUEST_STOPPED  (1  1)
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PVCLOCK_ABI_H */
-- 
1.7.5.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/6] Add ioctl for KVM_GUEST_STOPPED

2011-10-25 Thread Eric B Munson

Now that we have a flag that will tell the guest it was suspended, create an
interface for that communication using a KVM ioctl.

Signed-off-by: Eric B Munson emun...@mgebm.net
---
 arch/x86/include/asm/pvclock.h |3 +++
 arch/x86/kernel/kvmclock.c |   12 
 arch/x86/kvm/x86.c |5 +
 include/linux/kvm.h|2 ++
 4 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 7d3ba41..9312814 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -3,6 +3,7 @@
 
 #include linux/clocksource.h
 #include asm/pvclock-abi.h
+#include linux/kvm_host.h
 
 /* some helper functions for xen and kvm pv clock sources */
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
@@ -13,6 +14,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
struct timespec *ts);
 void pvclock_resume(void);
 
+void kvm_set_host_stopped(struct kvm_vcpu *vcpu);
+
 bool kvm_check_and_clear_host_stopped(int cpu);
 
 /*
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 8ddcfaf..f4fff3d 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -22,6 +22,7 @@
 #include asm/msr.h
 #include asm/apic.h
 #include linux/percpu.h
+#include linux/kvm_host.h
 
 #include asm/x86_init.h
 #include asm/reboot.h
@@ -113,6 +114,17 @@ static void kvm_get_preset_lpj(void)
preset_lpj = lpj;
 }
 
+/*
+ * kvm_set_host_stopped() indicates to the guest kernel that it has been
+ * stopped by the hypervisor.  This function will be called from the host only.
+ */
+void kvm_set_host_stopped(struct kvm_vcpu *vcpu)
+{
+   struct pvclock_vcpu_time_info *src = vcpu-arch.hv_clock;
+   src-flags |= PVCLOCK_GUEST_STOPPED;
+}
+EXPORT_SYMBOL_GPL(kvm_set_host_stopped);
+
 bool kvm_check_and_clear_host_stopped(int cpu)
 {
bool ret = false;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 84a28ea..16e029e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3264,6 +3264,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
goto out;
}
+   case KVM_PAUSE_GUEST: {
+   r = 0;
+   kvm_set_host_stopped(vcpu);
+   break;
+   }
default:
r = -EINVAL;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index aace6b8..a2697be 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -759,6 +759,8 @@ struct kvm_clock_data {
 #define KVM_CREATE_SPAPR_TCE _IOW(KVMIO,  0xa8, struct 
kvm_create_spapr_tce)
 /* Available with KVM_CAP_RMA */
 #define KVM_ALLOCATE_RMA _IOR(KVMIO,  0xa9, struct kvm_allocate_rma)
+/* VM is being stopped by host */
+#define KVM_PAUSE_GUEST  _IO(KVMIO,   0xaa)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1  0)
 
-- 
1.7.5.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/6] Add generic stubs for kvm stop check functions

2011-10-25 Thread Eric B Munson

Signed-off-by: Eric B Munson emun...@mgebm.net
---
 include/asm-generic/pvclock.h |   19 +++
 1 files changed, 19 insertions(+), 0 deletions(-)
 create mode 100644 include/asm-generic/pvclock.h

diff --git a/include/asm-generic/pvclock.h b/include/asm-generic/pvclock.h
new file mode 100644
index 000..b42a8eb
--- /dev/null
+++ b/include/asm-generic/pvclock.h
@@ -0,0 +1,19 @@
+#ifndef _ASM_GENERIC_PVCLOCK_H
+#define _ASM_GENERIC_PVCLOCK_H
+
+
+/*
+ * These functions are used by architectures that support kvm to avoid issuing
+ * false soft lockup messages.
+ */
+static inline bool kvm_check_and_clear_host_stopped(int cpu)
+{
+   return false;
+}
+
+static inline void kvm_clear_guest_paused(struct kvm_vcpu *vcpu)
+{
+   return;
+}
+
+#endif
-- 
1.7.5.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/6] Add functions to check if the host has stopped the vm

2011-10-25 Thread Eric B Munson

When a host stops or suspends a VM it will set a flag to show this.  The
watchdog will use these functions to determine if a softlockup is real, or the
result of a suspended VM.

Signed-off-by: Eric B Munson emun...@mgebm.net
---
 arch/x86/include/asm/pvclock.h |2 ++
 arch/x86/kernel/kvmclock.c |   19 +++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index c59cc97..7d3ba41 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -13,6 +13,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
struct timespec *ts);
 void pvclock_resume(void);
 
+bool kvm_check_and_clear_host_stopped(int cpu);
+
 /*
  * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
  * yielding a 64-bit result.
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index c1a0188..8ddcfaf 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -113,6 +113,25 @@ static void kvm_get_preset_lpj(void)
preset_lpj = lpj;
 }
 
+bool kvm_check_and_clear_host_stopped(int cpu)
+{
+   bool ret = false;
+   struct pvclock_vcpu_time_info *src;
+
+   /*
+* per_cpu() is safe here because this function is only called from
+* timer functions where preemption is already disabled.
+*/
+   WARN_ON(!in_atomic());
+   src = per_cpu(hv_clock, cpu);
+   if ((src-flags  PVCLOCK_GUEST_STOPPED) != 0) {
+   src-flags = src-flags  (~PVCLOCK_GUEST_STOPPED);
+   ret = true;
+   }
+
+   return ret;
+}
+
 static struct clocksource kvm_clock = {
.name = kvm-clock,
.read = kvm_clock_get_cycles,
-- 
1.7.5.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 6/6] Add age out of guest paused flag

2011-10-25 Thread Eric B Munson

The KVM_GUEST_PAUSED flag will prevent a guest from compaining about a soft
lockup but it can mask real soft lockups if the flag isn't cleared when it is
no longer relevant.  This patch adds a kvm ioctl that the hypervisor will use
when it resumes a guest to start a timer for aging out the flag.  The time out
will be specified by the hypervisor in the ioctl call.

Signed-off-by: Eric B Munson emun...@mgebm.net
---
 arch/x86/include/asm/pvclock.h |2 ++
 arch/x86/kernel/kvmclock.c |   24 
 arch/x86/kvm/x86.c |9 +
 include/linux/kvm.h|2 ++
 include/linux/kvm_host.h   |2 ++
 5 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 9312814..e8460b9 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -18,6 +18,8 @@ void kvm_set_host_stopped(struct kvm_vcpu *vcpu);
 
 bool kvm_check_and_clear_host_stopped(int cpu);
 
+void kvm_clear_guest_paused(struct kvm_vcpu *vcpu, unsigned int length);
+
 /*
  * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
  * yielding a 64-bit result.
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index f4fff3d..f3f3935 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -23,6 +23,8 @@
 #include asm/apic.h
 #include linux/percpu.h
 #include linux/kvm_host.h
+#include linux/kobject.h
+#include linux/sysfs.h
 
 #include asm/x86_init.h
 #include asm/reboot.h
@@ -144,6 +146,28 @@ bool kvm_check_and_clear_host_stopped(int cpu)
return ret;
 }
 
+static void kvm_timer_clear_guest_paused(unsigned long vcpu_addr)
+{
+   struct kvm_vcpu *vcpu = (struct kvm_vcpu *)vcpu_addr;
+   struct pvclock_vcpu_time_info *src = vcpu-arch.hv_clock;
+   src-flags = src-flags  (!PVCLOCK_GUEST_STOPPED);
+}
+
+/*
+ * Host has resumed the guest, we need to clear the guest paused flag so we
+ * don't mask any real soft lockups.
+ */
+void kvm_clear_guest_paused(struct kvm_vcpu *vcpu, unsigned int length)
+{
+   if (!timer_pending(vcpu-flag_timer))
+   setup_timer(vcpu-flag_timer,
+   kvm_timer_clear_guest_paused,
+   (unsigned long)vcpu);
+   mod_timer(vcpu-flag_timer,
+ jiffies + (length * HZ));
+}
+EXPORT_SYMBOL_GPL(kvm_clear_guest_paused);
+
 static struct clocksource kvm_clock = {
.name = kvm-clock,
.read = kvm_clock_get_cycles,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 16e029e..b907b5a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3269,6 +3269,15 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
kvm_set_host_stopped(vcpu);
break;
}
+   case KVM_CLEAR_GUEST_PAUSED: {
+   unsigned int length;
+   r = -EFAULT;
+   if (copy_from_user(length, argp, sizeof length))
+   goto out;
+   r = 0;
+   kvm_clear_guest_paused(vcpu, length);
+   break;
+   }
default:
r = -EINVAL;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index a2697be..f333274 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -761,6 +761,8 @@ struct kvm_clock_data {
 #define KVM_ALLOCATE_RMA _IOR(KVMIO,  0xa9, struct kvm_allocate_rma)
 /* VM is being stopped by host */
 #define KVM_PAUSE_GUEST  _IO(KVMIO,   0xaa)
+/* Start the timer to clear the paused flag */
+#define KVM_CLEAR_GUEST_PAUSED   _IO(KVMIO,   0xab)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1  0)
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index eabb21a..e21be4b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -18,6 +18,7 @@
 #include linux/msi.h
 #include linux/slab.h
 #include linux/rcupdate.h
+#include linux/timer.h
 #include asm/signal.h
 
 #include linux/kvm.h
@@ -152,6 +153,7 @@ struct kvm_vcpu {
 #endif
 
struct kvm_vcpu_arch arch;
+   struct timer_list flag_timer;
 };
 
 static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
-- 
1.7.5.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 5/6] Add check for suspended vm in softlockup detector

2011-10-25 Thread Eric B Munson

A suspended VM can cause spurious soft lockup warnings.  To avoid these, the
watchdog now checks if the kernel knows it was stopped by the host and skips
the warning if so.

Signed-off-by: Eric B Munson emun...@mgebm.net
---
 kernel/watchdog.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 36491cd..8107ba7 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -25,6 +25,7 @@
 #include linux/sysctl.h
 
 #include asm/irq_regs.h
+#include asm/pvclock.h
 #include linux/perf_event.h
 
 int watchdog_enabled = 1;
@@ -292,6 +293,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
hrtimer *hrtimer)
 */
duration = is_softlockup(touch_ts);
if (unlikely(duration)) {
+   /*
+* If a virtual machine is stopped by the host it can look to
+* the watchdog like a soft lockup, check to see if the host
+* stopped the vm before we issue the warning
+*/
+   if (kvm_check_and_clear_host_stopped(smp_processor_id()))
+   return HRTIMER_RESTART;
+
/* only warn once */
if (__this_cpu_read(soft_watchdog_warn) == true)
return HRTIMER_RESTART;
-- 
1.7.5.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] KVM call agenda for October 25

2011-10-25 Thread Alexander Graf





On 25.10.2011, at 17:32, Kevin Wolf kw...@redhat.com wrote:

 Am 25.10.2011 16:06, schrieb Anthony Liguori:
 On 10/25/2011 08:56 AM, Kevin Wolf wrote:
 Am 25.10.2011 15:05, schrieb Anthony Liguori:
 On 10/25/2011 07:35 AM, Kevin Wolf wrote:
 Am 24.10.2011 13:35, schrieb Paolo Bonzini:
 On 10/24/2011 01:04 PM, Juan Quintela wrote:
 
 Hi
 
 Please send in any agenda items you are interested in covering.
 
 - What's left to merge for 1.0.
 
 I would still like to cache the default cache mode (probably to
 cache=writeback). We don't allow guests to toggle WCE yet which Anthony
 would have liked to see before doing the change. Is it a strict 
 requirement?
 
 I don't see a way around it.  If the default mode is cache=writeback, then 
 we're
 open to data corruption in any guest where barrier=0.  With guest 
 togglable WCE,
 it ends up being a guest configuration issue so we can more or less defer
 responsibility.
 
 So do you think that offering a WCE inside the guest would be a real
 solution or just a way to have an excuse?
 
 No, it offers a mechanism to fix mistakes at run-time verses at start up 
 time. 
 
 This is true (in both directions). But I think it's independent from the
 right default.
 
  It also means that you can make template images that understand that they 
 don't support barriers and change the WCE setting appropriately.
 
 Isn't that really a job for management tools?
 
 Christoph said that OSes don't usually change this by themselves, it
 would need an administrator manually changing the setting. But if we
 require that, we can just as well require that the administrator set
 cache=writethrough on the qemu command line.
 
 The administrator of the guest != the administrator of the host.
 
 But the administrator of the guest == the owner of the qemu instance,
 no? He should be the one to use the management tools and configure his VMs.
 
 Do you think it's a good idea to change the default mode w/o guest WCE 
 toggle
 support?  What's your view about older guests if we change the default 
 mode?
 What's your main motivation for wanting to change the default mode?
 
 Because people are constantly complaining about the awful
 (cache=writethrough) performance they get before they are told they
 should use a different cache option. And they are right. The
 out-of-the-box experience with qemu's block performance really sucks.
 
 With qcow2 you mean, right?
 
 No, with any format, including raw. Which isn't surprising at all,
 O_SYNC makes writes very expensive.
 
 I'd be much more open to changing the default mode to cache=none FWIW 
 since the
 risk of data loss there is much, much lower.
 
 I think people said that they'd rather not have cache=none as default
 because O_DIRECT doesn't work everywhere.
 
 Where doesn't it work these days?  I know it doesn't work on tmpfs.  I know 
 it 
 works on ext[234], btrfs, nfs.
 
 I think tmpfs was named (and failing to start with default settings on
 tmpfs would be nasty enough), but iirc Alex had another one.

Yeah, IIRC NFS also failed on me :)

Alex

 
 Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC V2 5/5] kvm guest : pv-ticketlocks support for linux guests running on KVM hypervisor

2011-10-25 Thread Jeremy Fitzhardinge

On 10/23/2011 12:07 PM, Raghavendra K T wrote:
 This patch extends Linux guests running on KVM hypervisor to support
 pv-ticketlocks. Very early during bootup, paravirtualied KVM guest detects if 
 the hypervisor has required feature (KVM_FEATURE_WAIT_FOR_KICK) to support 
 pv-ticketlocks. If so, support for pv-ticketlocks is registered via 
 pv_lock_ops.

 Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
 Signed-off-by: Suzuki Poulose suz...@in.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
 diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
 index 2874c19..c7f34b7 100644
 --- a/arch/x86/include/asm/kvm_para.h
 +++ b/arch/x86/include/asm/kvm_para.h
 @@ -195,10 +195,18 @@ void kvm_async_pf_task_wait(u32 token);
  void kvm_async_pf_task_wake(u32 token);
  u32 kvm_read_and_reset_pf_reason(void);
  extern void kvm_disable_steal_time(void);
 -#else
 +
 +#ifdef CONFIG_PARAVIRT_SPINLOCKS
 +void __init kvm_guest_early_init(void);
 +#else /* CONFIG_PARAVIRT_SPINLOCKS */
 +#define kvm_guest_early_init() do { } while (0)
 +#endif /* CONFIG_PARAVIRT_SPINLOCKS */
 +
 +#else /* CONFIG_KVM_GUEST */
  #define kvm_guest_init() do { } while (0)
  #define kvm_async_pf_task_wait(T) do {} while(0)
  #define kvm_async_pf_task_wake(T) do {} while(0)
 +#define kvm_guest_early_init() do { } while (0)
  static inline u32 kvm_read_and_reset_pf_reason(void)
  {
   return 0;
 diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
 index 3bb0850..fb25bca 100644
 --- a/arch/x86/kernel/head32.c
 +++ b/arch/x86/kernel/head32.c
 @@ -9,6 +9,7 @@
  #include linux/start_kernel.h
  #include linux/mm.h
  #include linux/memblock.h
 +#include linux/kvm_para.h
  
  #include asm/setup.h
  #include asm/sections.h
 @@ -59,6 +60,8 @@ void __init i386_start_kernel(void)
   break;
   }
  
 +  kvm_guest_early_init();
 +
   /*
* At this point everything still needed from the boot loader
* or BIOS or kernel text should be early reserved or marked not
 diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
 index 5655c22..cabf8ec 100644
 --- a/arch/x86/kernel/head64.c
 +++ b/arch/x86/kernel/head64.c
 @@ -13,6 +13,7 @@
  #include linux/start_kernel.h
  #include linux/io.h
  #include linux/memblock.h
 +#include linux/kvm_para.h
  
  #include asm/processor.h
  #include asm/proto.h
 @@ -115,6 +116,8 @@ void __init x86_64_start_reservations(char 
 *real_mode_data)
  
   reserve_ebda_region();
  
 + kvm_guest_early_init();
 +
   /*
* At this point everything still needed from the boot loader
* or BIOS or kernel text should be early reserved or marked not
 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
 index a9c2116..f4f341f 100644
 --- a/arch/x86/kernel/kvm.c
 +++ b/arch/x86/kernel/kvm.c
 @@ -39,6 +39,16 @@
  #include asm/desc.h
  #include asm/tlbflush.h
  
 +#ifdef CONFIG_PARAVIRT_SPINLOCKS
 +
 +#ifdef CONFIG_KVM_DEBUG_FS
 +
 +#include linux/debugfs.h
 +
 +#endif /* CONFIG_KVM_DEBUG_FS */
 +
 +#endif /* CONFIG_PARAVIRT_SPINLOCKS */

This is a big mess.  Is there any problem with including linux/debugfs.h
unconditionally?  Or at least using #if
defined(CONFIG_PARAVIRT_SPINLOCKS)  defined(CONFIG_KVM_DEBUG_FS)?

J
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm tools: Fix SDL exit on window close

2011-10-25 Thread Osier Yang

于 2011年10月25日 19:46, Sasha Levin 写道:
 We've changed IPC to use sockets instead of signals, but the process of
 closing the SDL window was still trigerring an exit signal causing
 an ugly message and not cleaning up after itself.

 This patch switches that to use the proper method of cleaning up.

The patch works, thanks.

Osier
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

about NPIV with qemu-kvm.

2011-10-25 Thread ya su

hi, hannes:

I want to use NPIV with qemu-kvm, I issued the following command:

echo ':' 
/sys/class/fc_host/host0/vport_create

and it will produce a new host6 and one vport succesfully, but it
does not create any virtual hba pci device. so I don't know how to
assign the virtual host to qemu-kvm.

from your this mail, does array will first need to assign a lun to
this vport? and through this new created disk, like device /dev/sdf,
then I add qemu-kvm with -drive file=/dev/sdf,if=virtio... arguments?


Regards.

Suya.

2011/6/29, Hannes Reinecke h...@suse.de:
 On 06/29/2011 12:07 PM, Christoph Hellwig wrote:
 On Wed, Jun 29, 2011 at 10:39:42AM +0100, Stefan Hajnoczi wrote:
 I think we're missing a level of addressing.  We need the ability to
 talk to multiple target ports in order for list target ports to make
 sense.  Right now there is one implicit target that handles all
 commands.  That means there is one fixed I_T Nexus.

 If we introduce list target ports we also need a way to say This
 CDB is destined for target port #0.  Then it is possible to enumerate
 target ports and address targets independently of the LUN field in the
 CDB.

 I'm pretty sure this is also how SAS and other transports work.  In
 their framing they include the target port.

 Yes, exactly.  Hierachial LUNs are a nasty fringe feature that we should
 avoid as much as possible, that is for everything but IBM vSCSI which is
 braindead enough to force them.

 Yep.

 The question is whether we really need to support multiple targets on
 a virtio-scsi adapter or not.  If you are selectively mapping LUNs
 that the guest may access, then multiple targets are not necessary.
 If we want to do pass-through of the entire SCSI bus then we need
 multiple targets but I'm not sure if there are other challenges like
 dependencies on the transport (Fibre Channel, SAS, etc) which make it
 impossible to pass through bus-level access?

 I don't think bus-level pass through is either easily possible nor
 desirable.  What multiple targets are useful for is allowing more
 virtual disks than we have virtual PCI slots.  We could do this by
 supporting multiple LUNs, but given that many SCSI ressources are
 target-based doing multiple targets most likely is the more scabale
 and more logical variant.  E.g. we could much more easily have one
 virtqueue per target than per LUN.

 The general idea here is that we can support NPIV.
 With NPIV we'll have several scsi_hosts, each of which is assigned a
 different set of LUNs by the array.
 With virtio we need to able to react on LUN remapping on the array
 side, ie we need to be able to issue a 'REPORT LUNS' command and
 add/remove LUNs on the fly. This means we have to expose the
 scsi_host in some way via virtio.

 This is impossible with a one-to-one mapping between targets and
 LUNs. The actual bus-level pass-through will be just on the SCSI
 layer, ie 'REPORT LUNS' should be possible. If and how we do a LUN
 remapping internally on the host is a totally different matter.
 Same goes for the transport details; I doubt we will expose all the
 dingy details of the various transports, but rather restrict
 ourselves to an abstract transport.

 Cheers,

 Hannes
 --
 Dr. Hannes Reinecke zSeries  Storage
 h...@suse.de+49 911 74053 688
 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
 GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [RFC v2 PATCH 5/4 PATCH] virtio-net: send gratuitous packet when needed

2011-10-25 Thread Jason Wang

On 10/25/2011 11:41 PM, Michael S. Tsirkin wrote:
 On Tue, Oct 25, 2011 at 10:50:41AM +0800, Jason Wang wrote:
 On 10/24/2011 01:25 PM, Michael S. Tsirkin wrote:
 On Mon, Oct 24, 2011 at 02:54:59PM +1030, Rusty Russell wrote:
 On Sat, 22 Oct 2011 13:43:11 +0800, Jason Wang jasow...@redhat.com wrote:
 This make let virtio-net driver can send gratituous packet by a new
 config bit - VIRTIO_NET_S_ANNOUNCE in each config update
 interrupt. When this bit is set by backend, the driver would schedule
 a workqueue to send gratituous packet through NETDEV_NOTIFY_PEERS.

 This feature is negotiated through bit VIRTIO_NET_F_GUEST_ANNOUNCE.

 Signed-off-by: Jason Wang jasow...@redhat.com

 This seems like a huge layering violation.  Imagine this in real
 hardware, for example.

 commits 06c4648d46d1b757d6b9591a86810be79818b60c
 and 99606477a5888b0ead0284fecb13417b1da8e3af
 document the need for this:

 NETDEV_NOTIFY_PEERS notifier indicates that a device moved to a 
 different physical link.
 and
 In real hardware such notifications are only
 generated when the device comes up or the address changes.

 So hypervisor could get the same behaviour by sending link up/down
 events, this is just an optimization so guest won't do
 unecessary stuff like try to reconfigure an IP address.


 Maybe LOCATION_CHANGE would be a better name?


 ANNOUNCE_SELF?
 
 It would be nice to formulate what kind of event
 are we notifying the guest about.
 The announce part of it is really up to the guest, isn't it?
 

Right.


 There may be a good reason why virtual devices might want this kind of
 reconfiguration cheat, which is unnecessary for normal machines,

 I think yes, the difference with real hardware is guest can change
 location without link getting dropped.
 FWIW, Xen seems to use this capability too.

 So does ms netvsc.


 but
 it'd have to be spelled out clearly in the spec to justify it...

 Cheers,
 Rusty.

 Agree, and I'd like to see the spec too. The interface seems
 to involve the guest clearing the status bit when it detects
 an event?

 I would describe this in spec. The interface need guest to clear the
 status bit, this would let the back-end know it has finished the work as
 we may need to send the gratuitous packets many times.


 Also - how does it interact with the link up event?
 We probably don't want to schedule this when we detect
 a link status change or during initialization, as
 this patch seems to do? What if link goes down
 while the work is running? Is that OK?


 Looks like there's are duplications if guest enable arp_notify vm is
 started,
 
 How hard would it be to avoid these duplicates?

Not hard, it could be done in backend by distinguishing the reason :
fresh start or cont after migration or stop.

 
 but we need to handle the situation that resuming a stopped
 virtual machine.

 For the link down race, I don't see any real issue, either dropping or
 queued.
 
 For example, you do
 unregister_netdev(vi-dev);
 cancel_work_sync(vi-announce);
 
 which looks scary as announce seems to use the netdev.
 

oops, it's a bug, I would fix it. Thanks
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: about NPIV with qemu-kvm.

2011-10-25 Thread Hannes Reinecke

On 10/26/2011 06:40 AM, ya su wrote:
 hi, hannes:
 
 I want to use NPIV with qemu-kvm, I issued the following command:
 
 echo ':' 
 /sys/class/fc_host/host0/vport_create
 
 and it will produce a new host6 and one vport succesfully, but it
 does not create any virtual hba pci device. so I don't know how to
 assign the virtual host to qemu-kvm.
 
Well, you can't. There is no mechanism for. When using NPIV you need
to pass in the individual LUNs via eg virtio-blk.

 from your this mail, does array will first need to assign a lun to
 this vport? and through this new created disk, like device /dev/sdf,
 then I add qemu-kvm with -drive file=/dev/sdf,if=virtio... arguments?
 
Yes. That's what you need to do.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke  zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

55 matches

Mail list logo