Re: cpu hotplug
On Mon, Sep 20, 2010 at 09:07:16PM -0400, Kevin O'Connor wrote: On Mon, Sep 20, 2010 at 08:50:17AM +0200, Gleb Natapov wrote: On Sun, Sep 19, 2010 at 06:03:31PM -0400, Kevin O'Connor wrote: I was wrong. The cpu_set x offline command does send an event to the guest OS. SeaBIOS even forwards the event along - as far as I can tell a Notify(CPxx, 3) event is generated by SeaBIOS. My Windows 7 Ultimate beta seems to receive the event okay (it pops up a dialog box which says you can't unplug cpus). It may react to the Eject() method. The eject method is called by the OS to notify the host. Right now SeaBIOS's eject method doesn't do anything. Yes. What I meant is that it may react to the presence of the Eject() method. In my experience Windows considers all devices with an Eject() method to be hot-pluggable. But actually IIRC Windows 7 gave me this dialog box with the Bochs BIOS too, and there we didn't have an Eject() method. Unfortunately, my test Linux guest OS (FC13) doesn't seem to do anything with the unplug Notify event. I've tried with the original FC13 and with a fully updated version - no luck. So, I'm guessing this has something to do with the guest OS. Can you verify that _STA() returns zero after cpu unplug? I've verified that. I've also verified that Linux doesn't call the _STA method after Notify(CPxx, 3). It does call _STA on startup and after a Notify(CPxx, 1) event. So, the Linux kernel in my FC13 guest just seems to be ignoring Notify(3) events. (According to the ACPI spec, the guest should shut down the cpu and then call the eject method.) In older kernels _STA was called on Notify(3), but recently cpu hot-plug in Linux was changed. Can you check what happens if you call Notify(1) on unplug? The spec says that the value is: Device Check. Used to notify OSPM that the device either appeared or disappeared. So maybe it should be called on both hot-plug and hot-unplug. -- Gleb.
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] vhost-net: add dhclient work-around from userspace
Newer versions of dhclient should also be OK: they detect that the checksum is missing in the packet. Try it e.g. with a recent Fedora guest as a client. I don't have Fedora, but with the latest release (4.1.1-P1) from isc.org it still behaves the same (see output at the bottom). To solve the problem for old clients, recent kernels and iptables have support for the CHECKSUM target. You can use this target to compute and fill in the checksum in a packet that lacks a checksum. Typical expected use: iptables -A POSTROUTING -t mangle -p udp --dport bootpc \ -j CHECKSUM --checksum-fill Nice trick :D libvirt will program these for you if it sets up the server; maybe there needs to be a flag to tell it that the server is local. I don't use libvirt. My point is, there doesn't seem to be a working client: the only working client is a very old one (pump), and newer clients do not work, contrary to what you explained. To repeat myself, here is the situation:
- DHCP server with vhost_net: all clients w/o vhost_net work, clients with vhost_net do not work except pump
- DHCP server w/o vhost_net: all clients work
- physical DHCP server: clients with vhost *do* work.
--- output of the latest DHCP client --- Internet Systems Consortium DHCP Client 4.1.1-P1 Copyright 2004-2010 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ Listening on LPF/eth0/00:16:3e:00:07:01 Sending on LPF/eth0/00:16:3e:00:07:01 Sending on Socket/fallback DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 6 DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 13 DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 14 DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 10 DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 18 5 bad udp checksums in 5 packets No DHCPOFFERS received. No working leases in persistent database - sleeping.
Re: KVM call agenda for Sept 21
On 09/21/2010 05:37 AM, Nakajima, Jun wrote: Avi Kivity wrote on Mon, 20 Sep 2010 at 09:50:55: On 09/20/2010 06:44 PM, Chris Wright wrote: Please send in any agenda items you are interested in covering. nested vmx: the resurrection. Nice to see it progressing again, but there's still a lot of ground to cover. Perhaps we can involve Intel to speed things up? Hi, Avi What are you looking for? Help in getting the patchset in. Reviewing is always appreciated (while it tends to increase the time, the result is usually better). If we can find a way to share the work, even better. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [PATCH 3/4] KVM: VMX: Emulated real mode interrupt injection
On 09/20/2010 07:30 PM, Marcelo Tosatti wrote: static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, u32 idt_vectoring_info, int instr_len_field, @@ -3864,9 +3814,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, int type; bool idtv_info_valid; - if (vmx->rmode.irq.pending) - fixup_rmode_irq(vmx, idt_vectoring_info); - Don't you have to undo kvm_inject_realmode_interrupt if injection fails? Injection cannot fail (at least, in the same sense as the vmx injections). It's actually not about failures, it's about guest entry being cancelled due to a signal or some KVM_REQ that needs attention. For vmx style injections, we need to undo the injection to keep things in a consistent state. For realmode emulated injection, everything is in a consistent state already, so there is no need to undo anything (it's also impossible, since we overwrote memory on the stack). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
[PATCH] support piix PAM registers in KVM
Without this, the BIOS fails to remap the 0xf0000 region from ROM to RAM, so writes to the F-segment modify ROM content instead of the memory copy. Since QEMU does not reload ROMs during reset, on the next boot the modified copy of the BIOS is used. Signed-off-by: Gleb Natapov g...@redhat.com diff --git a/hw/piix_pci.c b/hw/piix_pci.c index 933ad86..0bf435d 100644 --- a/hw/piix_pci.c +++ b/hw/piix_pci.c @@ -99,10 +99,6 @@ static void i440fx_update_memory_mappings(PCII440FXState *d) int i, r; uint32_t smram, addr; -if (kvm_enabled()) { -/* FIXME: Support remappings and protection changes. */ -return; -} update_pam(d, 0xf0000, 0x100000, (d->dev.config[I440FX_PAM] >> 4) & 3); for(i = 0; i < 12; i++) { r = (d->dev.config[(i >> 1) + (I440FX_PAM + 1)] >> ((i & 1) * 4)) & 3; -- Gleb.
Re: Tracing KVM with Systemtap
On Mon, 2010-09-20 at 14:36 +0100, Stefan Hajnoczi wrote: Right now there are few pre-defined probes (trace events in QEMU tracing speak). As I develop I try to be mindful of new ones I create and whether they would be generally useful. I intend to contribute more probes and hope others will too! I am still looking at/hacking the QEMU code. I have looked at the following places in the code that I think can be useful to have statistics gathered: net.c qemu_deliver_packet(), etc - network statistics CPU Arch/op_helper.c global_cpu_lock(), tlb_fill() - lock/unlock and TLB refill statistics balloon.c, hw/virtio-balloon.c - ballooning information. Besides the ballooning part, which I know what it is but don't fully understand how it works, the other parts can be implemented as SystemTap tapsets (~ DTrace scripts) in the initial stage. I will see what other probes are useful for the end users. Also, is there developer documentation for KVM? (I googled but found a lot of presentations about KVM but not a lot of info about the internals.) Rayson Prerna is also looking at adding useful probes. Stefan
Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
On Tue, Sep 21, 2010 at 09:39:31AM +0800, Xin, Xiaohui wrote: From: Michael S. Tsirkin [mailto:m...@redhat.com] Sent: Monday, September 20, 2010 7:37 PM To: Xin, Xiaohui Cc: net...@vger.kernel.org; kvm@vger.kernel.org; linux-ker...@vger.kernel.org; mi...@elte.hu; da...@davemloft.net; herb...@gondor.hengli.com.au; jd...@linux.intel.com Subject: Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device. On Mon, Sep 20, 2010 at 04:08:48PM +0800, xiaohui@intel.com wrote: From: Xin Xiaohui xiaohui@intel.com --- Michael, I have moved the ioctl to configure the locked memory to vhost. It's ok to move this to vhost, but vhost does not know how much memory is needed by the backend. I think the backend here you mean is the mp device. Actually, the memory needed is related to vq->num to run zero-copy smoothly. That means the mp device does not know it, but vhost does. Well, this might be so if you insist on locking all posted buffers immediately. However, let's assume I have a very large ring and prepost a ton of RX buffers: there's no need to lock all of them directly: if we have buffers A and B, we can lock A, pass it to hardware, and when A is consumed unlock A, lock B and pass it to hardware. It's not really critical. But note we can always have userspace tell the MP device all it wants to know, after all. And the rlimit stuff is per process; we use the current pointer to set and check the rlimit, so the operations should be in the same process. Well no, the ring is handled from the kernel thread: we switch the mm to point to the owner task so copy from/to user and friends work, but you can't access the rlimit etc. Now the check operations are in the vhost process, as mp_recvmsg() or mp_sendmsg() are called by vhost. Hmm, what do you mean by the check operations? send/recv are data path operations, they shouldn't do any checks, should they? So set operations should be in the vhost process too, it's natural. So I think we'll need another ioctl in the backend to tell userspace how much memory is needed?
Except vhost tells it to mp device, mp did not know how much memory is needed to run zero-copy smoothly. Is userspace interested about the memory mp is needed? Couldn't parse this last question. I think userspace generally does want control over how much memory we'll lock. We should not just lock as much as we can. -- MST
Re: Tracing KVM with Systemtap
On Tue, Sep 21, 2010 at 1:58 PM, Rayson Ho r...@redhat.com wrote: On Mon, 2010-09-20 at 14:36 +0100, Stefan Hajnoczi wrote: Right now there are few pre-defined probes (trace events in QEMU tracing speak). As I develop I try to be mindful of new ones I create and whether they would be generally useful. I intend to contribute more probes and hope others will too! I am still looking at/hacking the QEMU code. I have looked at the following places in the code that I think can be useful to have statistics gathered: net.c qemu_deliver_packet(), etc - network statistics Yes. CPU Arch/op_helper.c global_cpu_lock(), tlb_fill() - lock/unlock and TLB refill statistics These are not relevant to KVM, they are only used when running with KVM disabled (TCG mode). balloon.c, hw/virtio-balloon.c - ballooning information. Prerna added a balloon event which is in qemu.git trace-events. Does that one do what you need? I will see what other probes are useful for the end users. Also, is there developer documentation for KVM? (I googled but found a lot of presentations about KVM but not a lot of info about the internals.) Not really. I suggest grabbing the source and following vl.c:main() to the main KVM execution code. Stefan
how CPU hot-plug is supposed to work on Linux?
Hello, We are trying to add CPU hot-plug/unplug capability to KVM. We want to be able to initiate hot-plug/unplug from the host. Our current scheme works like this: We have a Processor object in the DSDT for each potentially available CPU. Each Processor object has _MAT, _STA, _EJ0. _MAT of a present CPU returns an enabled LAPIC structure, and its _STA returns 0xf. _MAT of a non-present CPU returns a disabled LAPIC, and its _STA returns 0x0. _EJ0 does nothing.
When a CPU is hot plugged:
1. A bit is set in the sts register of the GPE
2. An ACPI interrupt is sent
3. Linux ACPI evaluates the corresponding GPE's _L() method
4. The _L() method determines which CPU's status has changed
5. For each CPU that changed status from not present to present, call Notify(1) on the corresponding Processor() object.
When a CPU is hot unplugged:
1. A bit is set in the sts register of the GPE
2. An ACPI interrupt is sent
3. Linux ACPI evaluates the corresponding GPE's _L() method
4. The _L() method determines which CPU's status has changed
5. For each CPU that changed status from present to not present, call Notify(3) on the corresponding Processor() object.
Now, CPU hot plug appears to be working, but CPU hot unplug does nothing. I expect that Linux will offline the CPU and eject it after evaluating Notify(3) and seeing that _STA of the ejected CPU now returns 0x0. Any ideas how it is supposed to work? -- Gleb.
Re: [PATCH] device-assignment: register a reset function
Am 17.09.2010 18:16, schrieb ext Alex Williamson: On Fri, 2010-09-17 at 17:27 +0200, Bernhard Kohl wrote: This is necessary because during reboot of a VM the assigned devices continue DMA transfers which causes memory corruption. Signed-off-by: Thomas Ostler thomas.ost...@nsn.com Signed-off-by: Bernhard Kohl bernhard.k...@nsn.com --- hw/device-assignment.c | 14 ++ 1 files changed, 14 insertions(+), 0 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 87f7418..fb47813 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -1450,6 +1450,17 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev) dev->msix_table_page = NULL; } +static void reset_assigned_device(void *opaque) +{ +PCIDevice *d = (PCIDevice *)opaque; +uint32_t conf; + +/* reset the bus master bit to avoid further DMA transfers */ +conf = assigned_dev_pci_read_config(d, PCI_COMMAND, 2); +conf &= ~PCI_COMMAND_MASTER; +assigned_dev_pci_write_config(d, PCI_COMMAND, conf, 2); +} + static int assigned_initfn(struct PCIDevice *pci_dev) { AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev); @@ -1499,6 +1510,9 @@ static int assigned_initfn(struct PCIDevice *pci_dev) if (r < 0) goto assigned_out; +/* register reset function for the device */ +qemu_register_reset(reset_assigned_device, pci_dev); + /* intercept MSI-X entry page in the MMIO */ if (dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) if (assigned_dev_register_msix_mmio(dev)) Hmm, at a minimum, we need a qemu_unregister_reset() in the exitfn, but upon further inspection, we should probably just do it the qdev way. That would mean simply setting qdev.reset to reset_assigned_device() in assign_info, then we can leave the registration/de-registration to qdev. Does that work? Sorry I missed that the first time. Thanks, Alex OK, we will rework the patch for qdev. This might take 2 weeks because of vacation.
Thanks Bernhard
Re: [PATCH] KVM: x86: mmu: fix counting of rmap entries in rmap_add()
On Sat, Sep 18, 2010 at 08:41:02AM +0800, Hillf Danton wrote: It seems that rmap entries are undercounted. Signed-off-by: Hillf Danton dhi...@gmail.com --- Applied, thanks.
Re: [PATCH 3/4] KVM: VMX: Emulated real mode interrupt injection
On Tue, Sep 21, 2010 at 01:56:50PM +0200, Avi Kivity wrote: On 09/20/2010 07:30 PM, Marcelo Tosatti wrote: static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, u32 idt_vectoring_info, int instr_len_field, @@ -3864,9 +3814,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, int type; bool idtv_info_valid; - if (vmx->rmode.irq.pending) - fixup_rmode_irq(vmx, idt_vectoring_info); - Don't you have to undo kvm_inject_realmode_interrupt if injection fails? Injection cannot fail (at least, in the same sense as the vmx injections). It's actually not about failures, it's about guest entry being cancelled due to a signal or some KVM_REQ that needs attention. For vmx style injections, we need to undo the injection to keep things in a consistent state. For realmode emulated injection, everything is in a consistent state already, so there is no need to undo anything (it's also impossible, since we overwrote memory on the stack). Aren't you going to push EFLAGS,CS,EIP on the stack twice if that occurs? Yes, can't undo it...
Re: [PATCH 3/4] KVM: VMX: Emulated real mode interrupt injection
On 09/21/2010 05:36 PM, Marcelo Tosatti wrote: On Tue, Sep 21, 2010 at 01:56:50PM +0200, Avi Kivity wrote: On 09/20/2010 07:30 PM, Marcelo Tosatti wrote: static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, u32 idt_vectoring_info, int instr_len_field, @@ -3864,9 +3814,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx *vmx, int type; bool idtv_info_valid; - if (vmx->rmode.irq.pending) - fixup_rmode_irq(vmx, idt_vectoring_info); - Don't you have to undo kvm_inject_realmode_interrupt if injection fails? Injection cannot fail (at least, in the same sense as the vmx injections). It's actually not about failures, it's about guest entry being cancelled due to a signal or some KVM_REQ that needs attention. For vmx style injections, we need to undo the injection to keep things in a consistent state. For realmode emulated injection, everything is in a consistent state already, so there is no need to undo anything (it's also impossible, since we overwrote memory on the stack). Aren't you going to push EFLAGS,CS,EIP on the stack twice if that occurs? No, since we clear the pending flag (we do that even for vmx-injected interrupts; then cancel or injection failure re-sets the flag). -- error compiling committee.c: too many arguments to function
Re: [PATCH 4/9] msix: move definitions from msix.c to msix.h
On 09/20/2010 06:56 PM, Michael S. Tsirkin wrote: On Mon, Sep 20, 2010 at 05:06:45PM +0200, Avi Kivity wrote: This allows us to reuse them from the kvm support code. Signed-off-by: Avi Kivity a...@redhat.com I would rather all dealings with the MSI-X table stayed in one place. All we need is just the entry, so let's add APIs to retrieve the MSI-X address and data: uint64_t msix_get_address(dev, vector) uint32_t msix_get_data(dev, vector) and that will be enough for KVM. Ok, will do. -- error compiling committee.c: too many arguments to function
Re: [PATCH 8/9] Protect qemu-kvm.h declarations with NEED_CPU_H
On 09/20/2010 07:05 PM, Michael S. Tsirkin wrote: On Mon, Sep 20, 2010 at 05:06:49PM +0200, Avi Kivity wrote: Target-specific definitions need to be qualified with NEED_CPU_H so kvm.h can be included from non-target-specific files. Signed-off-by: Avi Kivity a...@redhat.com Long term, would be cleaner to split this into two files ... Yes, this is a pain to deal with. --- kvm-stub.c |1 + qemu-kvm.h | 21 - 2 files changed, 21 insertions(+), 1 deletions(-) diff --git a/kvm-stub.c b/kvm-stub.c index 37d2b7a..2e4bf00 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -169,3 +169,4 @@ bool kvm_msix_notify(PCIDevice *dev, unsigned vector) { return false; } + intentional? Nope. -- error compiling committee.c: too many arguments to function
Re: [PATCH 0/9] msix/kvm integration cleanups
On 09/20/2010 07:02 PM, Michael S. Tsirkin wrote: On Mon, Sep 20, 2010 at 05:06:41PM +0200, Avi Kivity wrote: This cleans up msix/kvm integration a bit. The really important patch is the last one, which allows msix.o to be part of the non-target-specific build. I actually thought this latter move should be done in a different way: - add all functions msix uses to kvm-stub.c Isn't that what I did? - kvm_irq_routing_entry should also have a stub I sent some minor comments in case you have a reason to prefer this way. My motivation is really the last patch. If you explain what you'd like to see I'll try to do it. -- error compiling committee.c: too many arguments to function
kvm networking todo wiki
I've put up a wiki page with a kvm networking todo list, mainly to avoid effort duplication, but also in the hope of drawing attention to what I think we should try addressing in KVM: http://www.linux-kvm.org/page/NetworkingTodo This page could cover all networking related activity in KVM; currently most info is related to virtio-net. Note: if there's no developer listed for an item, this just means I don't know of anyone actively working on it at the moment, not that no one intends to. I would appreciate it if others working on one of the items on this list would add their names so we can communicate better. If others like this wiki page, please go ahead and add stuff you are working on, if any. It would be especially nice to add autotest projects: there is just a short test matrix and a catch-all 'Cover test matrix with autotest' currently. Currently there are some links to Red Hat bugzilla entries; feel free to add links to other bugzillas. Thanks! -- MST
Re: [PATCH 0/9] msix/kvm integration cleanups
On Tue, Sep 21, 2010 at 06:05:10PM +0200, Avi Kivity wrote: On 09/20/2010 07:02 PM, Michael S. Tsirkin wrote: On Mon, Sep 20, 2010 at 05:06:41PM +0200, Avi Kivity wrote: This cleans up msix/kvm integration a bit. The really important patch is the last one, which allows msix.o to be part of the non-target-specific build. I actually thought this latter move should be done in a different way: - add all functions msix uses to kvm-stub.c Isn't that what I did? - kvm_irq_routing_entry should also have a stub I sent some minor comments in case you have a reason to prefer this way. My motivation is really the last patch. If you explain what you'd like to see I'll try to do it. Basically my idea was to avoid all ifdefs in msix.c *without changing it*, by stubbing out kvm APIs and structures we use there. -- MST
[PATCH 2/2] KVM: cpu_relax() during spin waiting for reboot
It doesn't really matter, but if we spin, we should spin in a more relaxed manner. This way, if something goes wrong at least it won't contribute to global warming. Signed-off-by: Avi Kivity a...@redhat.com --- virt/kvm/kvm_main.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index c7a57b4..b8499f5 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2022,7 +2022,7 @@ asmlinkage void kvm_handle_fault_on_reboot(void) /* spin while reset goes on */ local_irq_enable(); while (true) - ; + cpu_relax(); } /* Fault while not rebooting. We want the trace. */ BUG(); -- 1.7.2.3
[PATCH 0/2] Fix reboot on Intel hosts
For a while (how long?) reboots with active guests have been broken on Intel hosts. This patch set fixes the problem. Avi Kivity (2): KVM: Fix reboot on Intel hosts KVM: cpu_relax() during spin waiting for reboot virt/kvm/kvm_main.c |6 -- 1 files changed, 4 insertions(+), 2 deletions(-) -- 1.7.2.3
[PATCH 1/2] KVM: Fix reboot on Intel hosts
When we reboot, we disable vmx extensions or otherwise INIT gets blocked. If a task on another cpu hits a vmx instruction, it will fault if vmx is disabled. We trap that to avoid a nasty oops and spin until the reboot completes. Problem is, we sleep with interrupts disabled. This blocks smp_send_stop() from running, and the reboot process halts. Fix by enabling interrupts before spinning. KVM-Stable-Tag. Signed-off-by: Avi Kivity a...@redhat.com --- virt/kvm/kvm_main.c |4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 9a73b98..c7a57b4 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2018,10 +2018,12 @@ static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val, asmlinkage void kvm_handle_fault_on_reboot(void) { - if (kvm_rebooting) + if (kvm_rebooting) { /* spin while reset goes on */ + local_irq_enable(); while (true) ; + } /* Fault while not rebooting. We want the trace. */ BUG(); } -- 1.7.2.3
KVM call minutes for Sept 21
Nested VMX
- looking for forward progress and better collaboration between the Intel and IBM teams
- needs more review (not a new issue)
- use cases
- work todo
  - merge baseline patch - looks pretty good - review is finding mostly small things at this point
  - need some correctness verification (both review from Intel and testing)
  - need a test suite - test suite harness will help here - a few dozen nested SVM tests are there, can follow for nested VMX
  - nested EPT
  - optimize (reduce vmreads and vmwrites)
  - has long-term maintenance implications
Hotplug
- command...guest may or may not respond
- guest can't be trusted to be direct part of request/response loop
- solve at QMP level
- human monitor issues (multiple successive commands to complete a single unplug)
- should be a GUI interface design decision, human monitor is not a good design point
- digression into GUI interface
Drive caching
- need to formalize the meanings in terms of data integrity guarantees
- guest write cache (does it directly reflect the host write cache?)
- live migration, underlying block dev changes, so need to decouple the two
- O_DIRECT + O_DSYNC
  - O_DSYNC needed based on whether disk cache is available
  - also issues with sparse files (e.g. O_DIRECT to unallocated extent)
  - how to manage w/out needing to flush every write, slow
  - perhaps start with O_DIRECT on raw, non-sparse files only?
- backend needs to open backing store matching the guest's disk cache state
- O_DIRECT itself has inconsistent integrity guarantees
  - works well with fully allocated file, dependent on disk cache disable (or fs specific flushing)
  - filesystem specific warnings (ext4 w/ barriers on, btrfs)
- need to be able to open w/ O_DSYNC depending on guest's write cache mode
- make write cache visible to guest (need a knob for this)
- qemu default is cache=writethrough, do we need to revisit that?
- just present user with option whether or not to use host page cache
- allow guest OS to choose disk write cache setting - set up host backend accordingly
- would be nice to preserve write cache settings over boot (outgrowing cmos storage)
- maybe some host fs-level optimization possible - e.g. O_DSYNC to allocated O_DIRECT extent becomes no-op
- conclusion - one direct user tunable, use host page cache or not - one guest OS tunable, enable disk cache
Re: [KVM timekeeping fixes 4/4] TSC catchup mode
On Mon, Sep 20, 2010 at 03:11:30PM -1000, Zachary Amsden wrote: On 09/20/2010 05:38 AM, Marcelo Tosatti wrote: On Sat, Sep 18, 2010 at 02:38:15PM -1000, Zachary Amsden wrote: Negate the effects of AN TYM spell while kvm thread is preempted by tracking conversion factor to the highest TSC rate and catching the TSC up when it has fallen behind the kernel view of time. Note that once triggered, we don't turn off catchup mode. A slightly more clever version of this is possible, which only does catchup when TSC rate drops, and which specifically targets only CPUs with broken TSC, but since these all are considered unstable_tsc(), this patch covers all necessary cases. Signed-off-by: Zachary Amsden zams...@redhat.com --- arch/x86/include/asm/kvm_host.h |6 +++ arch/x86/kvm/x86.c | 87 +- 2 files changed, 72 insertions(+), 21 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 8c5779d..e209078 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -384,6 +384,9 @@ struct kvm_vcpu_arch { u64 last_host_tsc; u64 last_guest_tsc; u64 last_kernel_ns; + u64 last_tsc_nsec; + u64 last_tsc_write; + bool tsc_catchup; bool nmi_pending; bool nmi_injected; @@ -444,6 +447,9 @@ struct kvm_arch { u64 last_tsc_nsec; u64 last_tsc_offset; u64 last_tsc_write; + u32 virtual_tsc_khz; + u32 virtual_tsc_mult; + s8 virtual_tsc_shift; struct kvm_xen_hvm_config xen_hvm_config; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 09f468a..9152156 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -962,6 +962,7 @@ static inline u64 get_kernel_ns(void) } static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz); +unsigned long max_tsc_khz; static inline int kvm_tsc_changes_freq(void) { @@ -985,6 +986,24 @@ static inline u64 nsec_to_cycles(u64 nsec) return ret; } +static void kvm_arch_set_tsc_khz(struct kvm *kvm, u32 this_tsc_khz) +{ + /* Compute a scale to convert nanoseconds in TSC cycles */ +
kvm_get_time_scale(this_tsc_khz, NSEC_PER_SEC / 1000, + kvm->arch.virtual_tsc_shift, + kvm->arch.virtual_tsc_mult); + kvm->arch.virtual_tsc_khz = this_tsc_khz; +} + +static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns) +{ + u64 tsc = pvclock_scale_delta(kernel_ns - vcpu->arch.last_tsc_nsec, + vcpu->kvm->arch.virtual_tsc_mult, + vcpu->kvm->arch.virtual_tsc_shift); + tsc += vcpu->arch.last_tsc_write; + return tsc; +} + void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data) { struct kvm *kvm = vcpu->kvm; @@ -1029,6 +1048,8 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data) /* Reset of TSC must disable overshoot protection below */ vcpu->arch.hv_clock.tsc_timestamp = 0; + vcpu->arch.last_tsc_write = data; + vcpu->arch.last_tsc_nsec = ns; } EXPORT_SYMBOL_GPL(kvm_write_tsc); @@ -1041,22 +1062,42 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) s64 kernel_ns, max_kernel_ns; u64 tsc_timestamp; - if ((!vcpu->time_page)) - return 0; - /* Keep irq disabled to prevent changes to the clock */ local_irq_save(flags); kvm_get_msr(v, MSR_IA32_TSC, &tsc_timestamp); kernel_ns = get_kernel_ns(); this_tsc_khz = __get_cpu_var(cpu_tsc_khz); - local_irq_restore(flags); if (unlikely(this_tsc_khz == 0)) { + local_irq_restore(flags); kvm_make_request(KVM_REQ_CLOCK_UPDATE, v); return 1; } /* +* We may have to catch up the TSC to match elapsed wall clock +* time for two reasons, even if kvmclock is used. +* 1) CPU could have been running below the maximum TSC rate kvmclock handles frequency changes? +* 2) Broken TSC compensation resets the base at each VCPU +* entry to avoid unknown leaps of TSC even when running +* again on the same CPU. This may cause apparent elapsed +* time to disappear, and the guest to stand still or run +* very slowly. I don't get this. Please explain. This compensation in arch_vcpu_load, for the unstable TSC case, causes time while preempted to disappear from the TSC by adjusting the TSC back to match the last observed TSC.
	if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
		/* Make sure TSC doesn't go backwards */
		s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
				native_read_tsc() - vcpu->arch.last_host_tsc;
		if (tsc_delta < 0)
			mark_tsc_unstable("KVM discovered backwards TSC");
		if (check_tsc_unstable())
			kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);

Note that this is the
Re: KVM call minutes for Sept 21
On 09/21/2010 01:05 PM, Chris Wright wrote:

Nested VMX
- looking for forward progress and better collaboration between the Intel and IBM teams
- needs more review (not a new issue)
- use cases
- work todo
  - merge baseline patch
    - looks pretty good
    - review is finding mostly small things at this point
    - need some correctness verification (both review from Intel and testing)
  - need a test suite
    - test suite harness will help here
    - a few dozen nested SVM tests are there, can follow for nested VMX
  - nested EPT
    - optimize (reduce vmreads and vmwrites)
    - has long term maintan

Hotplug
- command...guest may or may not respond
- guest can't be trusted to be a direct part of the request/response loop
- solve at QMP level
- human monitor issues (multiple successive commands to complete a single unplug)
  - should be a GUI interface design decision, human monitor is not a good design point
  - digression into GUI interface

The way this works IRL is:

1) Administrator presses a physical button. This sends an ACPI notification to the guest.
2) The guest makes a decision about how to handle the ACPI notification.
3) To initiate unplug, the guest disables the device and performs an operation to indicate to the PCI bus that the device is unloaded.
4) Step (3) causes an LED (usually near the button in 1) to change colors.
5) Administrator then physically removes the device.

So we need at least a QMP command to perform step (1). Since (3) can occur independently of (1), it should be an async notification. device_del should only perform step (5). A management tool needs to:

  pci_unplug_request slot
  /* wait for PCI_UNPLUGGED event */
  device_del slot
  netdev_del backend

Drive caching
- need to formalize the meanings in terms of data integrity guarantees
- guest write cache (does it directly reflect the host write cache?)
- live migration: underlying block dev changes, so need to decouple the two
- O_DIRECT + O_DSYNC
  - O_DSYNC needed based on whether disk cache is available
  - also issues with sparse files (e.g. O_DIRECT to an unallocated extent)
  - how to manage w/out needing to flush every write, slow
  - perhaps start with O_DIRECT on raw, non-sparse files only?
- backend needs to open backing store matching the guest's disk cache state
- O_DIRECT itself has inconsistent integrity guarantees
  - works well with a fully allocated file, dependent on disk cache disable (or fs-specific flushing)
  - filesystem-specific warnings (ext4 w/ barriers on, btrfs)
- need to be able to open w/ O_DSYNC depending on the guest's write cache mode
- make write cache visible to guest (need a knob for this)
- qemu default is cache=writethrough, do we need to revisit that?
- just present user with the option whether or not to use host page cache
- allow guest OS to choose disk write cache setting
  - set up host backend accordingly
- would be nice to preserve write cache settings across boots (outgrowing cmos storage)
- maybe some host fs-level optimization possible
  - e.g. O_DSYNC to an allocated O_DIRECT extent becomes a no-op
- conclusion
  - one direct user tunable: use host page cache or not
  - one guest OS tunable: enable disk cache

IOW, a qdev 'write-cache=on|off' property and a blockdev 'direct=on|off' property. For completeness, a blockdev 'unsafe=on|off' property. Open flags are:

  write-cache=on,  direct=on     O_DIRECT
  write-cache=off, direct=on     O_DIRECT | O_DSYNC
  write-cache=on,  direct=off    0
  write-cache=off, direct=off    O_DSYNC

It's still unclear what our default mode will be. The problem is, O_DSYNC has terrible performance on ext4 when barrier=1. write-cache=on,direct=off is a bad default because if you do a simple performance test, you'll get better than native and that upsets people. write-cache=off,direct=off is a bad default because ext4's default config sucks with this.
Likewise, write-cache=off,direct=on is a bad default for the same reason.

Regards,

Anthony Liguori

-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-blk: put request that was created to retrieve the device id
On Fri, Sep 17, 2010 at 09:58:48AM -0500, Ryan Harper wrote: Since __bio_map_kern() sets up bio->bi_end_io = bio_map_kern_endio (which does a bio_put(bio)), doesn't that ensure we don't leak?

Indeed, that should take care of it.
USB Host Passthrough BSOD on Windows XP
Hi. I installed a KVM virtual machine with Windows XP SP3 on it, with all updates from Windows Update. I set up a USB device from the host machine to be used in the virtual machine with the command:

  qm set 107 -hostusb 2040:7070

The USB device is a Hauppauge WinTV Nova-T Stick DVB-T USB adapter. Windows recognises the hardware and correctly installs its drivers, but when I try to use it (for example, tuning some channels) I get the following Blue Screen Of Death:

  DRIVER_IRQL_NOT_LESS_OR_EQUAL
  *** STOP: 0x00D1 (0x048C4C04, 0x0002, 0x0001, 0xBA392FD3)
  *** usbuhci.sys - Address BA392FD3 base at BA39, DateStamp 480254ce

Windows' Minidump files say the problem is in the usbuhci.sys driver. I'm using Proxmox VE 1.6 (the latest version) with kernel version 2.6.32-2-pve. Do you have any hints? Thank you very much for your help! Bye.
Re: KVM call minutes for Sept 21
Hi, thanks for the summary. I also listened in on the call. I'm glad these issues are being discussed.

On Tue, Sep 21, 2010, Chris Wright wrote about "KVM call minutes for Sept 21":

> Nested VMX
> - looking for forward progress and better collaboration between the Intel and IBM teams

I'll be very happy if anyone, be it from Intel or somewhere else, would like to help me work on nested VMX. Somebody (I don't recognize your voices yet, sorry...) mentioned on the call that there might not be much point in cooperation before I finish getting nested VMX merged into KVM. I agree, but my conclusion is different from what I think the speaker implied: my conclusion is that it is important that we merge the nested VMX code into KVM as soon as possible, because if nested VMX is part of KVM (and not a set of patches which becomes stale the moment after I release it), this will make it much easier for people to test it, use it, and cooperate in developing it.

> - needs more review (not a new issue)

I think the reviews that nested VMX has received over the past year (thanks to Avi Kivity, Gleb Natapov, Eddie Dong and sometimes others) have been fantastic. You guys have shown deep understanding of the code, and found numerous bugs, oversights, missing features, and also a fair share of ugly code, and we (first Orit and Abel, and then I) have done our best to fix all of these issues. I've personally learned a lot from the latest round of reviews, and from the discussions with you. So I don't think there has been any lack of reviews.

I don't think that getting more reviews is the most important task ahead of us. Surely, if more people review the code, more potential bugs will be spotted. But this is always the case, with any software. I think the question now is: what would it take to finally declare the code good enough to be merged, with the understanding that even after being merged it will still be considered an experimental feature, disabled by default and documented as experimental?
Nested SVM was also merged before it was perfect, and also KVM itself was released before being perfect :-)

> - use cases

I don't kid myself that as soon as nested VMX is available in KVM, millions of users worldwide will flock to use it. Definitely, many KVM users will never find a need for nested virtualization. But I do believe that there are many use cases. We outlined some of them in our paper (to be presented in a couple of weeks at OSDI):

1. Hosting one of the new breed of operating systems which have a hypervisor as part of them. Windows 7 with XP mode is one example. Linux with KVM is another.
2. Platforms with embedded hypervisors in firmware need nested virt to run any workload - which can itself be a hypervisor with guests.
3. Cloud users could put a hypervisor with sub-guests in their virtual machine, and run multiple virtual machines on the one virtual machine which they get.
4. Enabling live migration of entire hypervisors with their guests - for load balancing, disaster recovery, and so on.
5. Honeypots and protection against hypervisor-level rootkits.
6. Making it easier to test, demonstrate, benchmark and debug hypervisors, and also entire virtualization setups. An entire virtualization setup (hypervisor and all its guests) could be run as one virtual machine, allowing testing of many such setups on one physical machine.

By the way, I find the question of why we need nested VMX a bit odd, seeing that KVM already supports nested virtualization (for SVM). Is it the case that nested virtualization was found useful on AMD processors, but for Intel processors it isn't?
Of course not :-) I think KVM should support nested virtualization on neither architecture, or on both - and of course I think it should be on both :-)

> - work todo
>   - merge baseline patch
>     - looks pretty good
>     - review is finding mostly small things at this point
>     - need some correctness verification (both review from Intel and testing)
>   - need a test suite
>     - test suite harness will help here
>     - a few dozen nested SVM tests are there, can follow for nested VMX
>   - nested EPT

I've been keeping track of the issues remaining from the last review, and indeed only a few remain. Only 8 of the 24 patches have any outstanding issue, and I'm working on those that remain, as you can see on the mailing list from the last couple of weeks. If there's interest, I can even summarize these remaining issues.

But since I'm working on these patches alone, I think we need to define our priorities. Most of the outstanding review comments, while absolutely correct (and I was amazed by the quality of the reviewers' comments), deal with rewriting code that already works (to improve its style) or with fixing relatively rare cases. It is not clear that these issues are more important than the other things listed in the summary above (test suite, nested EPT), but as long as I continue to rewrite
buildbot failure in qemu-kvm on default_i386_debian_5_0
The Buildbot has detected a new failure of default_i386_debian_5_0 on qemu-kvm.
Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_debian_5_0/builds/573
Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/
Buildslave for this Build: b1_qemu_kvm_2
Build Reason: The Nightly scheduler named 'nightly_default' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist:
BUILD FAILED: failed compile
sincerely,
-The Buildbot
buildbot failure in qemu-kvm on default_i386_out_of_tree
The Buildbot has detected a new failure of default_i386_out_of_tree on qemu-kvm.
Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_out_of_tree/builds/510
Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/
Buildslave for this Build: b1_qemu_kvm_2
Build Reason: The Nightly scheduler named 'nightly_default' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist:
BUILD FAILED: failed compile
sincerely,
-The Buildbot
Re: KVM call minutes for Sept 21
* Nadav Har'El (n...@math.technion.ac.il) wrote:
> On Tue, Sep 21, 2010, Chris Wright wrote about "KVM call minutes for Sept 21":
> > Nested VMX
> > - looking for forward progress and better collaboration between the Intel and IBM teams
>
> I'll be very happy if anyone, be it from Intel or somewhere else, would like to help me work on nested VMX. Somebody (I don't recognize your voices yet, sorry...) mentioned on the call that there might not be much point in cooperation before I finish getting nested VMX merged into KVM.

My recollection... it was Avi.

> I agree, but my conclusion is different from what I think the speaker implied: my conclusion is that it is important that we merge the nested VMX code into KVM as soon as possible, because if nested VMX is part of KVM (and not a set of patches which becomes stale the moment after I release it), this will make it much easier for people to test it, use it, and cooperate in developing it.

Yup. And especially for follow-on work (like nested EPT). It makes sense to merge and build from the merged base rather than have an out-of-tree patchset continue to grow and grow.

> > - needs more review (not a new issue)
>
> I think the reviews that nested VMX has received over the past year (thanks to Avi Kivity, Gleb Natapov, Eddie Dong and sometimes others) have been fantastic. You guys have shown deep understanding of the code, and found numerous bugs, oversights, missing features, and also a fair share of ugly code, and we (first Orit and Abel, and then I) have done our best to fix all of these issues. I've personally learned a lot from the latest round of reviews, and the discussions with you. So I don't think there has been any lack of reviews. I don't think that getting more reviews is the most important task ahead of us.

At earlier points of review there were issues considered fundamental that needed to be fixed before merging (SMP and proper VMPTRLD emulation spring to mind). Now it seems it's down to smaller, more targeted issues.
Some hesitancy is based on the complexity of the patches. So more review helps... a test harness does too. Anything to build Avi's confidence for merging the code ;)

> Surely, if more people review the code, more potential bugs will be spotted. But this is always the case, with any software. I think the question now is, what would it take to finally declare the code as good enough to be merged, with the understanding that even after being merged it will still be considered an experimental feature, disabled by default and documented as experimental. Nested SVM was also merged before it was perfect, and also KVM itself was released before being perfect :-)

;)

> - use cases
>
> I don't kid myself that as soon as nested VMX is available in KVM, millions of users worldwide will flock to use it. Definitely, many KVM users will never find a need for nested virtualization. But I do believe that there are many use cases. We outlined some of them in our paper (to be presented in a couple of weeks at OSDI):
>
> 1. Hosting one of the new breed of operating systems which have a hypervisor as part of them. Windows 7 with XP mode is one example. Linux with KVM is another.
> 2. Platforms with embedded hypervisors in firmware need nested virt to run any workload - which can itself be a hypervisor with guests.
> 3. Cloud users could put a hypervisor with sub-guests in their virtual machine, and run multiple virtual machines on the one virtual machine which they get.
> 4. Enable live migration of entire hypervisors with their guests - for load balancing, disaster recovery, and so on.
> 5. Honeypots and protection against hypervisor-level rootkits.
> 6. Make it easier to test, demonstrate, benchmark and debug hypervisors, and also entire virtualization setups. An entire virtualization setup (hypervisor and all its guests) could be run as one virtual machine, allowing testing many such setups on one physical machine.
> By the way, I find the question of why we need nested VMX a bit odd, seeing that KVM already supports nested virtualization (for SVM). Is it the case that nested virtualization was found useful on AMD processors, but for Intel processors it isn't? Of course not :-) I think KVM should support nested virtualization on neither architecture, or on both - and of course I think it should be on both :-)

People keep looking for reasons to justify the cost of the effort; dunno why "because it's cool" isn't good enough ;) At any rate, that was mainly a question of how it might be useful for production kinds of environments.

> - work todo
>   - merge baseline patch
>     - looks pretty good
>     - review is finding mostly small things at this point
>     - need some correctness verification (both review from Intel and testing)
>   - need a test suite
>     - test suite harness will help here
>     - a few dozen nested SVM tests are there, can