Re: [PATCH RFC 1/5] KVM: introduce a set_bit function for bitmaps in user space

2010-04-11 Thread Avi Kivity
On 04/09/2010 12:30 PM, Takuya Yoshikawa wrote: This work is initially suggested by Avi Kivity for moving the dirty bitmaps used by KVM to user space: This makes it possible to manipulate the bitmaps from qemu without copying from KVM. Note: We are now brushing up this code before sending

Re: [PATCH RFC 2/5] KVM: use a rapper function to calculate the sizes of dirty bitmaps

2010-04-11 Thread Avi Kivity
On 04/09/2010 12:32 PM, Takuya Yoshikawa wrote: We will use this later in other parts. s/rapper/wrapper/... +static inline int kvm_dirty_bitmap_bytes(struct kvm_memory_slot *memslot) +{ + return ALIGN(memslot->npages, BITS_PER_LONG) / 8; +} + 'int' may overflow. struct
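To illustrate the overflow remark, here is a minimal user-space sketch, not the kernel code, that computes the bitmap size in an unsigned long rather than an int. The names dirty_bitmap_bytes, BITS_PER_ULONG and ALIGN_UP are hypothetical stand-ins for the patch's kvm_dirty_bitmap_bytes() and the kernel's BITS_PER_LONG and ALIGN().

```c
#include <limits.h>

/* Bits in one unsigned long (hypothetical stand-in for BITS_PER_LONG). */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Round x up to a multiple of a (hypothetical stand-in for ALIGN()). */
#define ALIGN_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

/*
 * Bytes needed for a dirty bitmap holding one bit per page, rounded up
 * to a whole number of longs. The result is kept in an unsigned long so
 * that a large page count is not truncated on the way out.
 */
static unsigned long dirty_bitmap_bytes(unsigned long npages)
{
    return ALIGN_UP(npages, BITS_PER_ULONG) / CHAR_BIT;
}
```

A wider return type is only one possible answer; limiting the slot size and rejecting larger slots is another.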

Re: [PATCH RFC 3/5] KVM: Use rapper functions to create and destroy dirty bitmaps

2010-04-11 Thread Avi Kivity
On 04/09/2010 12:34 PM, Takuya Yoshikawa wrote: For x86, we will change the allocation and free parts to do_mmap() and do_munmap(). This patch makes it cleaner. Should be done for all architectures. I don't want different ways of creating dirty bitmaps for different architectures. --

Re: [PATCH RFC 4/5] KVM: add new members to the memory slot for double buffering of bitmaps

2010-04-11 Thread Avi Kivity
On 04/09/2010 12:35 PM, Takuya Yoshikawa wrote: Currently, x86 vmalloc()s a dirty bitmap every time when we switch to the next dirty bitmap. To avoid this, we use the double buffering technique: we also move the bitmaps to userspace, so that extra bitmaps will not use the precious kernel

Re: [PATCH RFC 5/5] KVM: This is the main part of the moving dirty bitmaps to user space

2010-04-11 Thread Avi Kivity
On 04/09/2010 12:38 PM, Takuya Yoshikawa wrote: By this patch, bitmap allocation is replaced with do_mmap() and bitmap manipulation is replaced with *_user() functions. Note that this does not change the APIs between kernel and user space. To get more advantage from this hack, we need to add a

Re: VM performance issue in KVM guests.

2010-04-12 Thread Avi Kivity
On 04/12/2010 05:04 AM, Zhang, Xiantao wrote: What was the performance hit? What was your I/O setup (image format, using aio?) The issue only happens when vcpu number is over-committed (e.g. vcpu/pcpu>2) and physical cpus are saturated. For example, when run webbench in windows OS in

Re: [PATCH v2 2/6] Introduce bit-based phys_ram_dirty for VGA, CODE, MIGRATION and MASTER.

2010-04-12 Thread Avi Kivity
On 04/06/2010 03:51 AM, Yoshiaki Tamura wrote: Replaces byte-based phys_ram_dirty bitmap with three bit-based phys_ram_dirty bitmap. On allocation, it sets all bits in the bitmap. index c74b0a4..9733892 100644 --- a/exec.c +++ b/exec.c @@ -110,7 +110,7 @@ uint8_t *code_gen_ptr; #if

Re: [PATCH v2 3/6] Modifies wrapper functions for byte-based phys_ram_dirty bitmap to bit-based phys_ram_dirty bitmap.

2010-04-12 Thread Avi Kivity
On 04/06/2010 03:51 AM, Yoshiaki Tamura wrote: Signed-off-by: Yoshiaki Tamura <tamura.yoshi...@lab.ntt.co.jp> Signed-off-by: OHMURA Kei <ohmura@lab.ntt.co.jp> --- static inline int cpu_physical_memory_get_dirty_flags(ram_addr_t addr) { -return phys_ram_dirty[addr >> TARGET_PAGE_BITS]; +

Re: [PATCH 2/6] KVM MMU: fix kvm_mmu_zap_page() and its calling path

2010-04-12 Thread Avi Kivity
On 04/12/2010 11:01 AM, Xiao Guangrong wrote: - calculate zapped page number properly in mmu_zap_unsync_children() - calculate freed page number properly kvm_mmu_change_mmu_pages() - restart list walking if have children page zapped Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com> ---

Re: [PATCH 3/6] KVM MMU: optimize/cleanup for marking parent unsync

2010-04-12 Thread Avi Kivity
On 04/12/2010 11:02 AM, Xiao Guangrong wrote: - 'vcpu' is not used while mark parent unsync, so remove it - if it has already marked unsync, no need to walk its parent Please separate these two changes. The optimization looks good. Perhaps it can be done even nicer using mutually

Re: [PATCH 4/6] KVM MMU: optimize for writing cr4

2010-04-12 Thread Avi Kivity
On 04/12/2010 11:03 AM, Xiao Guangrong wrote: Usually, OS changes CR4.PGE bit to flush all global page, under this case, no need reset mmu and just flush tlb Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com> --- arch/x86/kvm/x86.c |9 + 1 files changed, 9 insertions(+), 0

Re: [PATCH 5/6] KVM MMU: reduce kvm_mmu_page size

2010-04-12 Thread Avi Kivity
On 04/12/2010 11:05 AM, Xiao Guangrong wrote: 'multimapped' and 'unsync' in 'struct kvm_mmu_page' are just indication field, we can use flag bits instead of them @@ -202,9 +202,10 @@ struct kvm_mmu_page { * in this shadow page. */ DECLARE_BITMAP(slot_bitmap,

Re: [PATCH 6/6] KVM MMU: optimize synchronization shadow pages

2010-04-12 Thread Avi Kivity
On 04/12/2010 11:06 AM, Xiao Guangrong wrote: - chain all unsync shadow pages then we can fetch them quickly - flush local/remote tlb after all shadow page synced Signed-off-by: Xiao Guangrong <xiaoguangr...@cn.fujitsu.com> --- arch/x86/include/asm/kvm_host.h |1 + arch/x86/kvm/mmu.c

Re: [PATCH 2/6] KVM MMU: fix kvm_mmu_zap_page() and its calling path

2010-04-12 Thread Avi Kivity
On 04/12/2010 11:53 AM, Xiao Guangrong wrote: kvm->arch.n_free_mmu_pages = 0; @@ -1589,7 +1589,8 @@ static void mmu_unshadow(struct kvm *kvm, gfn_t gfn) !sp->role.invalid) { pgprintk("%s: zap %lx %x\n", __func__, gfn, sp->role.word); -

Re: [PATCH RFC 1/5] KVM: introduce a set_bit function for bitmaps in user space

2010-04-12 Thread Avi Kivity
On 04/12/2010 04:29 AM, Takuya Yoshikawa wrote: Should be called __set_bit_user() since it is non-atomic. Actually I first named it like that and then noticed that in the uaccess' convention, __ prefix means it is with less checking version. On the other hand, for the bitops family, __

Re: [PATCH RFC 3/5] KVM: Use rapper functions to create and destroy dirty bitmaps

2010-04-12 Thread Avi Kivity
On 04/12/2010 05:07 AM, Takuya Yoshikawa wrote: (2010/04/12 2:13), Avi Kivity wrote: On 04/09/2010 12:34 PM, Takuya Yoshikawa wrote: For x86, we will change the allocation and free parts to do_mmap() and do_munmap(). This patch makes it cleaner. Should be done for all architectures. I don't

Re: [PATCH RFC 4/5] KVM: add new members to the memory slot for double buffering of bitmaps

2010-04-12 Thread Avi Kivity
On 04/12/2010 05:15 AM, Takuya Yoshikawa wrote: OK, but we have one problem: ia64. I checked all architectures' dirty bitmap implementations and thought generalizing this work is not so hard except for ia64. It's already too different from other parts. #ifdef CONFIG_IA64 unsigned long

Re: [PATCH RFC 5/5] KVM: This is the main part of the moving dirty bitmaps to user space

2010-04-12 Thread Avi Kivity
On 04/12/2010 05:29 AM, Takuya Yoshikawa wrote: TODO: 1. We want to use copy_in_user() for 32bit case too. Definitely. Why doesn't it work now? Sadly we don't have that for 32bit. We have to implement by ourselves. I tested two temporary implementations for 32bit: 1. This version

Re: [PATCH v2 2/6] Introduce bit-based phys_ram_dirty for VGA, CODE, MIGRATION and MASTER.

2010-04-12 Thread Avi Kivity
On 04/12/2010 12:39 PM, Yoshiaki Tamura wrote: Please put in some header file, maybe qemu-common.h. OK. BTW, is qemu-kvm.h planned to go upstream? No. Use kvm.h for kvm specific symbols (qemu-kvm.h includes it). Should be nicer as a loop calling a helper to allocate each bitmap. This

Re: [PATCH] svm: implement NEXTRIPsave SVM feature

2010-04-12 Thread Avi Kivity
On 04/12/2010 12:07 AM, Andre Przywara wrote: On SVM we set the instruction length of skipped instructions to hard-coded, well known values, which could be wrong when (bogus, but valid) prefixes (REX, segment override) are used. Newer AMD processors (Fam10h 45nm and better, aka. PhenomII or

Re: [PATCH 2/6] KVM MMU: fix kvm_mmu_zap_page() and its calling path

2010-04-12 Thread Avi Kivity
On 04/12/2010 12:22 PM, Xiao Guangrong wrote: Hi Avi, Avi Kivity wrote: hlist_for_each_entry_safe() is supposed to be safe against removal of the element that is pointed to by the iteration cursor. If we destroyed the next point, hlist_for_each_entry_safe() is unsafe. List

Re: [PATCH] KVM: Enhance the coalesced_mmio_write() parameter to avoid stack buffer overflow

2010-04-12 Thread Avi Kivity
On 04/12/2010 04:57 AM, wzt@gmail.com wrote: coalesced_mmio_write() does not check the len value, if len is negative, memcpy(ring->coalesced_mmio[ring->last].data, val, len); will cause stack buffer overflow. How can len be negative? It can only be between 1 and 8. -- I have a truly

Re: [PATCH] svm: implement NEXTRIPsave SVM feature

2010-04-12 Thread Avi Kivity
On 04/12/2010 01:29 PM, Alexander Graf wrote: On 12.04.2010, at 12:20, Avi Kivity wrote: On 04/12/2010 12:07 AM, Andre Przywara wrote: On SVM we set the instruction length of skipped instructions to hard-coded, well known values, which could be wrong when (bogus, but valid) prefixes

Re: [PATCH v2 3/6] Modifies wrapper functions for byte-based phys_ram_dirty bitmap to bit-based phys_ram_dirty bitmap.

2010-04-12 Thread Avi Kivity
On 04/12/2010 01:58 PM, Yoshiaki Tamura wrote: Is it necessary to update migration and vga bitmaps? We can simply update the master bitmap, and update the migration and vga bitmaps only when they need it. That can be done in a different patch. Let me explain the role of the master bitmap

Re: [PATCH 4/6] KVM MMU: optimize for writing cr4

2010-04-12 Thread Avi Kivity
On 04/12/2010 01:42 PM, Xiao Guangrong wrote: Hi Avi, Thanks for your comments. Avi Kivity wrote: Later we have: kvm_x86_ops->set_cr4(vcpu, cr4); vcpu->arch.cr4 = cr4; vcpu->arch.mmu.base_role.cr4_pge = (cr4 & X86_CR4_PGE) && !tdp_enabled; All

Re: [PATCH 2/6] KVM MMU: fix kvm_mmu_zap_page() and its calling path

2010-04-12 Thread Avi Kivity
On 04/12/2010 03:22 PM, Xiao Guangrong wrote: But kvm_mmu_zap_page() will only destroy sp == tpos == pos; n points at pos->next already, so it's safe. kvm_mmu_zap_page(sp) not only zaps sp but also zaps all sp's unsync children pages, if n is just sp's unsync child, just at the same

Re: [PATCH] KVM: move DR register access handling into generic code.

2010-04-12 Thread Avi Kivity
On 04/12/2010 03:27 PM, Gleb Natapov wrote: Currently both SVM and VMX have their own DR handling code. Move it to x86.c. The standard process is to make them identical first and finally merge identical code, but I guess we can skip it in this case (Jan?) -- I have a truly marvellous

Re: [PATCH] KVM: move DR register access handling into generic code.

2010-04-12 Thread Avi Kivity
On 04/12/2010 07:52 PM, Gleb Natapov wrote: On Mon, Apr 12, 2010 at 06:09:50PM +0200, Jan Kiszka wrote: Avi Kivity wrote: On 04/12/2010 03:27 PM, Gleb Natapov wrote: Currently both SVM and VMX have their own DR handling code. Move it to x86.c. The standard

Re: [PATCH v4 1/3] Device specification for shared memory PCI device

2010-04-12 Thread Avi Kivity
On 04/08/2010 01:51 AM, Cam Macdonell wrote: (sorry about the late review) + +Regular Interrupts +-- + +If regular interrupts are used (due to either a guest not supporting MSI or the +user specifying not to use them on startup) then the value written to the lower +16-bits of

Re: [PATCH v4 2/3] Support adding a file to qemu's ram allocation

2010-04-12 Thread Avi Kivity
On 04/08/2010 01:51 AM, Cam Macdonell wrote: This avoids the need of using qemu_ram_alloc and mmap with MAP_FIXED to map a host file into guest RAM. This function mmaps the opened file anywhere and adds the memory to the ram blocks. Usage is qemu_ram_mmap(fd, size, MAP_SHARED, offset); ---

Re: [PATCH v4 3/3] Inter-VM shared memory PCI device

2010-04-12 Thread Avi Kivity
On 04/08/2010 01:52 AM, Cam Macdonell wrote: Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository.

Re: [PATCH v4] Shared memory uio_pci driver

2010-04-12 Thread Avi Kivity
On 04/08/2010 02:00 AM, Cam Macdonell wrote: This patch adds a driver for my shared memory PCI device using the uio_pci interface. The driver has three memory regions. The first memory region is for device registers for sending interrupts. The second BAR is for receiving MSI-X interrupts and

Re: [PATCH RFC 5/5] KVM: This is the main part of the moving dirty bitmaps to user space

2010-04-12 Thread Avi Kivity
On 04/12/2010 11:55 PM, Fernando Luis Vazquez Cao wrote: Sadly we don't have that for 32bit. We have to implement by ourselves. I tested two temporary implementations for 32bit: 1. This version using copy_from_user() and copy_to_user() with not nice vmalloc(). 2. Loop with __get_user()

Re: [PATCH 4/6] KVM MMU: optimize for writing cr4

2010-04-13 Thread Avi Kivity
On 04/13/2010 06:07 AM, Xiao Guangrong wrote: And i found the commit 87778d60ee: |KVM: MMU: Segregate mmu pages created with different cr4.pge settings | |Don't allow a vcpu with cr4.pge cleared to use a shadow page created with |cr4.pge set; this might cause a cr3 switch not to

Re: VM performance issue in KVM guests.

2010-04-13 Thread Avi Kivity
On 04/13/2010 03:50 AM, Zhang, Xiantao wrote: Avi Kivity wrote: On 04/12/2010 05:04 AM, Zhang, Xiantao wrote: What was the performance hit? What was your I/O setup (image format, using aio?) The issue only happens when vcpu number is over-committed(e.g. vcpu

Re: [PATCH] KVM: fix the handling of dirty bitmaps to avoid overflows

2010-04-13 Thread Avi Kivity
On 04/13/2010 10:03 AM, Takuya Yoshikawa wrote: It's better to limit memory slots to something that can be handled by everything, then. 2^31 pages is plenty. Return -EINVAL if the slot is too large. I agree with that, so we make this patch pending to fix like that? -- or should make a new

Re: [PATCH] get rid of mmu_only parameter in emulator_write_emulated()

2010-04-13 Thread Avi Kivity
On 04/13/2010 10:21 AM, Gleb Natapov wrote: Maybe I am missing something here, but it seems we can call kvm_mmu_pte_write() directly from emulator_cmpxchg_emulated() instead of passing mmu_only down to emulator_write_emulated_onepage() and call it there. @@ -3460,7 +3444,9 @@ static int

Re: [PATCH] get rid of mmu_only parameter in emulator_write_emulated()

2010-04-13 Thread Avi Kivity
On 04/13/2010 10:26 AM, Gleb Natapov wrote: On Tue, Apr 13, 2010 at 10:24:40AM +0300, Avi Kivity wrote: On 04/13/2010 10:21 AM, Gleb Natapov wrote: Maybe I am missing something here, but it seems we can call kvm_mmu_pte_write() directly from emulator_cmpxchg_emulated() instead

Re: [PATCH v2 3/6] Modifies wrapper functions for byte-based phys_ram_dirty bitmap to bit-based phys_ram_dirty bitmap.

2010-04-13 Thread Avi Kivity
On 04/13/2010 11:01 AM, Yoshiaki Tamura wrote: Avi Kivity wrote: On 04/12/2010 01:58 PM, Yoshiaki Tamura wrote: Is it necessary to update migration and vga bitmaps? We can simply update the master bitmap, and update the migration and vga bitmaps only when they need it. That can be done

Re: VM performance issue in KVM guests.

2010-04-14 Thread Avi Kivity
On 04/14/2010 06:24 AM, Zhang, Xiantao wrote: Spin loops need to be addressed first, they are known to kill performance in overcommit situations. Even in overcommit case, if vcpu threads of one qemu are not scheduled or pulled to the same logical processor, the performance drop is

Re: [PATCH V3] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-14 Thread Avi Kivity
On 04/14/2030 12:05 PM, Zhang, Yanmin wrote: Here is the new patch of V3 against tip/master of April 13th if anyone wants to try it. Thanks for persisting despite the flames. Can you please separate arch/x86/kvm part of the patch? That will make for easier reviewing, and will need to

Re: [PATCH V3] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-14 Thread Avi Kivity
On 04/14/2010 12:43 PM, Sheng Yang wrote: On Wednesday 14 April 2010 17:20:15 Avi Kivity wrote: On 04/14/2030 12:05 PM, Zhang, Yanmin wrote: Here is the new patch of V3 against tip/master of April 13th if anyone wants to try it. Thanks for persisting despite the flames. Can

Re: [PATCH V3] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-14 Thread Avi Kivity
On 04/14/2010 01:14 PM, Sheng Yang wrote: I wouldn't like to depend on model specific behaviour. One option is to read all the information synchronously and store it in a per-cpu area with atomic instructions, then queue the NMI. Another option is to have another callback which tells us that

Re: [PATCH V3] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-14 Thread Avi Kivity
On 04/14/2010 01:43 PM, Ingo Molnar wrote: Thanks for persisting despite the flames. Can you please separate arch/x86/kvm part of the patch? That will make for easier reviewing, and will need to go through separate trees. Once it gets into a state that it can be applied could you

Re: KVM: x86: Push potential exception error code on task switches

2010-04-14 Thread Avi Kivity
On 04/14/2010 03:11 PM, Jan Kiszka wrote: When a fault triggers a task switch, the error code, if it exists, has to be pushed on the new task's stack. Implement the missing bits. @@ -2416,12 +2417,23 @@ static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,

Re: KVM: x86: Push potential exception error code on task switches

2010-04-14 Thread Avi Kivity
On 04/14/2010 03:58 PM, Jan Kiszka wrote: The TSS descriptor (gate doesn't have a size). But isn't it possible to have a 32-bit TSS with a 16-bit CS/SS? Might be possible, but will cause troubles as the spec says: The error code is pushed on the stack as a doubleword or word

Re: KVM: x86: Push potential exception error code on task switches

2010-04-14 Thread Avi Kivity
On 04/14/2010 04:07 PM, Avi Kivity wrote: On 04/14/2010 03:58 PM, Jan Kiszka wrote: The TSS descriptor (gate doesn't have a size). But isn't it possible to have a 32-bit TSS with a 16-bit CS/SS? Might be possible, but will cause troubles as the spec says: The error code is pushed

[PATCH] KVM: MMU: Replace role.glevels with role.cr4_pae

2010-04-14 Thread Avi Kivity
tables between pae and longmode guest page tables at the same guest page. Signed-off-by: Avi Kivity <a...@redhat.com> --- arch/x86/include/asm/kvm_host.h |2 +- arch/x86/kvm/mmu.c | 12 ++-- arch/x86/kvm/mmutrace.h |5 +++-- 3 files changed, 10 insertions

Re: [PATCH] KVM: MMU: Replace role.glevels with role.cr4_pae

2010-04-14 Thread Avi Kivity
On 04/14/2010 07:20 PM, Avi Kivity wrote: There is no real distinction between glevels=3 and glevels=4; both have exactly the same format and the code is treated exactly the same way. Drop role.glevels and replace it with role.cr4_pae (which is meaningful). This simplifies the code a bit

Re: [PATCH V3] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-15 Thread Avi Kivity
On 04/15/2030 04:04 AM, Zhang, Yanmin wrote: An even more accurate way to determine this is to check whether the interrupt frame points back at the 'int $2' instruction. However we plan to switch to a self-IPI method to inject the NMI, and I'm not sure whether APIC NMIs are accepted on an

Re: VM performance issue in KVM guests.

2010-04-15 Thread Avi Kivity
On 04/15/2010 07:58 AM, Srivatsa Vaddagiri wrote: On Sun, Apr 11, 2010 at 11:40 PM, Avi Kivity <a...@redhat.com> wrote: The current handling of PLE is very suboptimal. With proper directed yield we should be much better there. Hi Avi, By directed

Re: [PATCH v4 3/3] Inter-VM shared memory PCI device

2010-04-15 Thread Avi Kivity
On 04/15/2010 02:30 AM, Cam Macdonell wrote: Sample programs, init scripts and the shared memory server are available in a git repo here: www.gitorious.org/nahanni Please consider qemu.git/contrib. Should the compilation be tied into Qemu's regular build with a switch

Re: KVM: x86: Push potential exception error code on task switches

2010-04-15 Thread Avi Kivity
On 04/14/2010 04:19 PM, Jan Kiszka wrote: Avi Kivity wrote: On 04/14/2010 03:58 PM, Jan Kiszka wrote: The TSS descriptor (gate doesn't have a size). But isn't it possible to have a 32-bit TSS with a 16-bit CS/SS? Might be possible, but will cause troubles as the spec

Re: [PATCH] KVM: MMU: Replace role.glevels with role.cr4_pae

2010-04-15 Thread Avi Kivity
On 04/14/2010 09:29 PM, Marcelo Tosatti wrote: On Wed, Apr 14, 2010 at 07:32:12PM +0300, Avi Kivity wrote: On 04/14/2010 07:20 PM, Avi Kivity wrote: There is no real distinction between glevels=3 and glevels=4; both have exactly the same format and the code is treated exactly

Re: [PATCH V3] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-15 Thread Avi Kivity
On 04/15/2010 12:04 PM, Joerg Roedel wrote: On Mon, Apr 15, 2030 at 04:57:38PM +0800, Zhang, Yanmin wrote: I checked svm.c and it seems svm.c doesn't trigger a NMI to host if the NMI happens in guest os. In addition, svm_complete_interrupts is called after interrupt is enabled. Yes.

Re: [PATCH] KVM: prevent spurious exit to userspace during task switch emulation.

2010-04-15 Thread Avi Kivity
On 04/15/2010 12:28 PM, Gleb Natapov wrote: kvm_task_switch() never requires userspace exit, so no matter what the function returns we should not exit to userspace. Signed-off-by: Gleb Natapov <g...@redhat.com> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index c773a46..1bd434b 100644 ---

Re: [PATCH V3] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-15 Thread Avi Kivity
On 04/15/2010 12:44 PM, Joerg Roedel wrote: So, we'd need something like the following: if (exit == NMI) __get_cpu_var(nmi_vcpu) = vcpu; stgi(); if (exit == NMI) { while (!nmi_handled()) cpu_relax(); __get_cpu_var(nmi_vcpu) = NULL; }

Re: [PATCHv2] KVM: prevent spurious exit to userspace during task switch emulation.

2010-04-15 Thread Avi Kivity
On 04/15/2010 01:09 PM, Gleb Natapov wrote: If kvm_task_switch() fails code exits to userspace without specifying exit reason, so the previous exit reason is reused by userspace. Fix this by specifying exit reason correctly. --- Changelog: v1->v2: - report emulation error to userspace

Re: [PATCH V3] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-15 Thread Avi Kivity
On 04/15/2010 01:40 PM, Joerg Roedel wrote: That means an NMI that happens outside guest code (for example, in the mmu, or during the exit itself) would be counted as if in guest code. Hmm, true. The same is true for an NMI that happens between VMSAVE and STGI but that window is

Re: [PATCH V3] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-17 Thread Avi Kivity
On 04/15/2010 05:08 PM, Sheng Yang wrote: On Thursday 15 April 2010 18:44:15 Avi Kivity wrote: On 04/15/2010 01:40 PM, Joerg Roedel wrote: That means an NMI that happens outside guest code (for example, in the mmu, or during the exit itself) would be counted as if in guest code

Re: [PATCH V4 1/2] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-17 Thread Avi Kivity
On 04/16/2010 10:34 AM, Zhang, Yanmin wrote: Below is the kernel patch to enable perf to collect guest os statistics. Joerg, Would you like to add support on svm? I don't know the exact point to trigger NMI to host with svm. See below code with vmx: +

Re: [PATCH V2] drivers/uio/uio.c: DMA mapping, interrupt extensions, etc.

2010-04-17 Thread Avi Kivity
On 04/15/2010 11:55 PM, Tom Lyon wrote: This is the second of 2 related, but independent, patches. This is for uio.c, the previous is for uio_pci_generic.c. The 2 patches were previously one large patch. Changes for this version: - uio_pci_generic.c just gets extensions so that a single fd can

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-17 Thread Avi Kivity
On 04/17/2010 09:48 PM, Avi Kivity wrote: +static u64 last_value = 0; Needs to be atomic64_t. + cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src) { struct pvclock_shadow_time shadow; unsigned version; cycle_t ret, offset; +u64 last; +do

Re: [PATCH 2/5] change msr numbers for kvmclock

2010-04-17 Thread Avi Kivity
On 04/15/2010 09:37 PM, Glauber Costa wrote: Avi pointed out a while ago that those MSRs fall into the pentium PMU range. So the idea here is to add new ones, and after a while, deprecate the old ones. Signed-off-by: Glauber Costa <glom...@redhat.com> --- arch/x86/include/asm/kvm_para.h |8

Re: [PATCH 3/5] Try using new kvm clock msrs

2010-04-17 Thread Avi Kivity
On 04/15/2010 09:37 PM, Glauber Costa wrote: We now added a new set of clock-related msrs in replacement of the old ones. In theory, we could just try to use them and get a return value indicating they do not exist, due to our use of kvm_write_msr_save. However, kvm clock registration happens

Re: [PATCH 4/5] export new cpuid KVM_CAP

2010-04-17 Thread Avi Kivity
On 04/15/2010 09:37 PM, Glauber Costa wrote: Since we're changing the msrs kvmclock uses, we have to communicate that to the guest, through cpuid. We can add a new KVM_CAP to the hypervisor, and then patch userspace to recognize it. And if we ever add a new cpuid bit in the future, we have to

Re: VM performance issue in KVM guests.

2010-04-17 Thread Avi Kivity
On 04/16/2010 05:27 AM, Zhang, Xiantao wrote: When vcpus are pinned to pcpus, there is a 50% chance that a guest's vcpus will be co-scheduled and spinlocks will perform well. When vcpus are not pinned, but affine wakeups are disabled, there is a 33% chance that vcpus will be co-scheduled.

Re: VM performance issue in KVM guests.

2010-04-17 Thread Avi Kivity
On 04/15/2010 04:33 PM, Peter Zijlstra wrote: On Thu, 2010-04-15 at 11:18 +0300, Avi Kivity wrote: Certainly that has even greater potential for Linux guests. Note that we spin on mutexes now, so we need to prevent preemption while the lock owner is running. either that, or disable

Re: [PATCH] VGA Bios allow 1920x1080

2010-04-17 Thread Avi Kivity
On 04/13/2010 07:07 AM, Øyvind Sæther wrote: The patch lets me run 1920x1080 resolution, some displays are only that and not 1920x1200 these days. Gentoos ebuild doesn't seem to make the vgabios and only uses /pc-bios/vgabios.bin, making kvm/vgabios and replace vgabios.bin with the

Re: [PATCH V3] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-19 Thread Avi Kivity
On 04/17/2010 09:12 PM, Avi Kivity wrote: I think you were right the first time around. Re-reading again (esp. the part about treatment of indirect NMI vmexits), I think this was wrong, and that the code is correct. I am now thoroughly confused. -- error compiling committee.c: too many

Re: [PATCH V5 1/3] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-19 Thread Avi Kivity
On 04/19/2010 08:32 AM, Zhang, Yanmin wrote: Below patch introduces perf_guest_info_callbacks and related register/unregister functions. Add more PERF_RECORD_MISC_XXX bits meaning guest kernel and guest user space. This doesn't apply against upstream. What branch was this generated

Re: [PATCH 5/8] KVM: PPC: Be more informative on BUG

2010-04-19 Thread Avi Kivity
On 04/19/2010 04:26 AM, Alexander Graf wrote: Very true. In fact, I certainly remember me putting a return and a WARN_ON(true) because WARN() gave me a warning here. I wonder where that code went ... hrm ... Either way, thanks for looking over this patch! Ugh - I messed up my patch

Re: [PATCH V5 1/3] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-19 Thread Avi Kivity
On 04/19/2010 11:55 AM, Zhang, Yanmin wrote: On Mon, 2010-04-19 at 11:37 +0300, Avi Kivity wrote: On 04/19/2010 08:32 AM, Zhang, Yanmin wrote: Below patch introduces perf_guest_info_callbacks and related register/unregister functions. Add more PERF_RECORD_MISC_XXX bits meaning guest

Re: [PATCH V5 1/3] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-19 Thread Avi Kivity
On 04/19/2010 11:59 AM, Avi Kivity wrote: What branch was this generated against? It's against the latest tip/master. I checked out to 19b26586090 as the latest tip/master has some updates on perf. I don't want to merge tip/master... does tip/perf/core contain the needed updates

Re: [PATCH 0/1] trace all instructions whose emulation failed

2010-04-19 Thread Avi Kivity
On 04/18/2010 09:33 AM, Manish Regmi wrote: Hi, The following patch makes sure all code path of failed emulation runs trace_kvm_emulate_insn_failed(). Please let me know if there is anything missing or wrong. Thank you. Signed-off-by: Manish Regmi <regmi.man...@gmail.com> diff --git

Re: [PATCH 1/1] correctly handle VM Entry Exit reasons and also show them in trace.

2010-04-19 Thread Avi Kivity
On 04/18/2010 09:35 AM, Manish Regmi wrote: Hi, When the vm exit reason is VM Entry failures it has leftmost bit set. This patch - clears the leftmost bit when copying to vmx->exit_reason. This will make the checks like if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY) valid in
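The masking described in this message can be sketched as follows. This is a stand-alone illustration, not the patch: struct exit_info, decode_exit_reason and FAILED_VMENTRY_BIT are hypothetical names, and the value 41 for EXIT_REASON_MCE_DURING_VMENTRY is taken from the Intel SDM's table of basic exit reasons.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of the 32-bit exit-reason field flags a failed VM entry. */
#define FAILED_VMENTRY_BIT UINT32_C(0x80000000)

/* Basic exit reason for a machine check during VM entry. */
#define EXIT_REASON_MCE_DURING_VMENTRY 41

/* Hypothetical container for the decoded field. */
struct exit_info {
    uint32_t reason;      /* exit reason with the failure bit cleared */
    bool failed_vmentry;  /* true if the failure bit was set */
};

/* Split the raw field so that equality checks on 'reason' still match. */
static struct exit_info decode_exit_reason(uint32_t raw)
{
    struct exit_info info;

    info.failed_vmentry = (raw & FAILED_VMENTRY_BIT) != 0;
    info.reason = raw & ~FAILED_VMENTRY_BIT;
    return info;
}
```

Keeping the failure flag in a separate field means the information is not lost when the bit is cleared for comparison.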

Re: [PATCH] kvm: use the correct RCU API

2010-04-19 Thread Avi Kivity
On 04/19/2010 12:41 PM, Lai Jiangshan wrote: The RCU/SRCU API have already changed for proving RCU usage. I got the following dmesg when PROVE_RCU=y because we used incorrect API. This patch converts rcu_dereference() to srcu_dereference() or family API.

Re: [PATCH V5 1/3] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-19 Thread Avi Kivity
On 04/19/2010 12:22 PM, Zhang, Yanmin wrote: I don't want to merge tip/master... does tip/perf/core contain the needed updates? I think so. A moment ago, I checked out to b5a80b7e9 of tip/perf/core. All 3 patches could be applied cleanly and compilation is ok. A quick testing shows

Re: [BUG] kvm: dereference srcu-protected pointer without srcu_read_lock() held

2010-04-19 Thread Avi Kivity
On 04/19/2010 12:58 PM, Lai Jiangshan wrote: Applied the patch I just sent and let CONFIG_PROVE_RCU=y, we can got the following dmesg. And we found that it is because some codes in KVM dereferences srcu-protected pointer without srcu_read_lock() held or update-side lock held. It is not hard to

Re: [PATCH] KVM: PPC: Make Performance Counters work

2010-04-19 Thread Avi Kivity
On 04/17/2010 01:22 AM, Alexander Graf wrote: When we get a performance counter interrupt we need to route it on to the Linux handler after we got out of the guest context. We also need to tell our handling code that this particular interrupt doesn't need treatment. So let's add those two bits

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity
On 04/19/2010 01:43 PM, Peter Zijlstra wrote: + cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src) { struct pvclock_shadow_time shadow; unsigned version; cycle_t ret, offset; +u64 last; +do { +last = last_value;

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity
On 04/19/2010 01:46 PM, Peter Zijlstra wrote: On Sat, 2010-04-17 at 21:48 +0300, Avi Kivity wrote: After this patch is applied, I don't see a single warp in time during 5 days of execution, in any of the machines I saw them before. Please define a cpuid bit that makes

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity
On 04/19/2010 01:39 PM, Peter Zijlstra wrote: On Fri, 2010-04-16 at 13:36 -0700, Jeremy Fitzhardinge wrote: + do { + last = last_value; Does this need a barrier() to prevent the compiler from re-reading last_value for the subsequent lines? Otherwise (ret < last)

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity
On 04/19/2010 01:49 PM, Peter Zijlstra wrote: Right, so on x86 we have: X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency independent, not that it doesn't stop in C states and similar fun stuff. X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant and

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity
On 04/19/2010 01:51 PM, Peter Zijlstra wrote: Right, so on x86 we have: X86_FEATURE_CONSTANT_TSC, which only states that TSC is frequency independent, not that it doesn't stop in C states and similar fun stuff. X86_FEATURE_TSC_RELIABLE, which IIRC should indicate the TSC is constant and

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity
On 04/19/2010 02:05 PM, Peter Zijlstra wrote: ACCESS_ONCE() is your friend. I think it's implied with atomic64_read(). Yes it would be. I was merely trying to point out that last = ACCESS_ONCE(last_value); Is a narrower way of writing: last = last_value; barrier();
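
A minimal user-space sketch of the point made above, assuming GCC or Clang: ACCESS_ONCE() forces a single load of one variable, while barrier() stops the compiler from caching any memory value across it. The macro bodies mirror the kernel definitions of that era; clamp_to_last() and the plain unsigned long last_value are hypothetical names used only for illustration, not code from the patch.

```c
/* Force exactly one load of x; the compiler may not re-read it later. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Compiler-only barrier: forbids caching memory values across this point. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for the shared variable discussed in the thread. */
static unsigned long last_value;

/* Return ret, but never a value smaller than the last one observed. */
unsigned long clamp_to_last(unsigned long ret)
{
    /* Narrow form: only this one access is constrained. */
    unsigned long last = ACCESS_ONCE(last_value);

    /* The wider equivalent would be: last = last_value; barrier(); */
    return ret < last ? last : ret;
}
```

Neither form makes a 64-bit read atomic on a 32-bit machine; they only keep the compiler from re-reading the variable.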

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity
On 04/19/2010 01:56 PM, Peter Zijlstra wrote: Right, do bear in mind that the x86 implementation of atomic64_read() is terrifyingly expensive, it is better to not do that read and simply use the result of the cmpxchg. atomic64_read() _is_ cmpxchg64b. Are you thinking of some

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity
On 04/19/2010 01:59 PM, Peter Zijlstra wrote: So what do we need? test for both TSC_RELIABLE and NONSTOP_TSC? IMO TSC_RELIABLE should imply NONSTOP_TSC. Yeah, I think RELIABLE does imply NONSTOP and CONSTANT, but NONSTOP && CONSTANT does not make RELIABLE. The manual says:

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity
On 04/19/2010 02:19 PM, Peter Zijlstra wrote: Still have two cmpxchgs in the common case. The first iteration will fail, fetching last_value, the second will work. It will be better when we have contention, though, so it's worthwhile. Right, another option is to put the initial read

[PATCH] KVM: MMU: Drop cr4.pge from shadow page role

2010-04-19 Thread Avi Kivity
Since commit bf47a760f66ad, we no longer handle ptes with the global bit set specially, so there is no reason to distinguish between shadow pages created with cr4.pge set and clear. Such tracking is expensive when the guest toggles cr4.pge, so drop it. Signed-off-by: Avi Kivity a...@redhat.com

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity
On 04/19/2010 05:21 PM, Glauber Costa wrote: Oh yes, just trying to avoid a patch with both atomic64_read() and ACCESS_ONCE(). you're mixing the private version of the patch you saw with this one. there isn't any atomic reads in here. I'll use a barrier then This patch writes

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-19 Thread Avi Kivity
On 04/19/2010 05:32 PM, Glauber Costa wrote: Right, another option is to put the initial read outside of the loop, that way you'll have the best of all cases, a single LOCK'ed op in the loop, and only a single LOCK'ed op for the fast path on sensible architectures ;-) last =
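
The loop shape suggested above (initial read outside the loop, a single compare-and-exchange inside it) can be sketched in portable C11 as follows. This is an illustration under stated assumptions, not the patch itself: it uses <stdatomic.h> rather than the kernel's atomic64_* or cmpxchg helpers, and monotonic_clamp() is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the global last_value discussed in the thread. */
static _Atomic uint64_t last_value;

/* Return a timestamp that never goes backwards across callers. */
uint64_t monotonic_clamp(uint64_t ret)
{
    /* Initial read sits outside the loop, so the fast path is one load. */
    uint64_t last = atomic_load(&last_value);

    while (ret > last) {
        /*
         * Try to publish ret as the new maximum. On failure the call
         * stores the current value into last, so no extra read is needed
         * before retrying.
         */
        if (atomic_compare_exchange_weak(&last_value, &last, ret))
            return ret;
    }

    /* Someone already published a value at least as large as ret. */
    return last;
}
```

A caller would pass the freshly computed clock value, for example now = monotonic_clamp(raw_reading); the result is either that value or the larger one already published.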

Re: [PATCH 4/5] export new cpuid KVM_CAP

2010-04-20 Thread Avi Kivity
On 04/19/2010 05:50 PM, Glauber Costa wrote: On Sat, Apr 17, 2010 at 09:58:26PM +0300, Avi Kivity wrote: On 04/15/2010 09:37 PM, Glauber Costa wrote: Since we're changing the msrs kvmclock uses, we have to communicate that to the guest, through cpuid. We can add a new KVM_CAP

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-20 Thread Avi Kivity
On 04/19/2010 07:18 PM, Jeremy Fitzhardinge wrote: On 04/19/2010 07:46 AM, Peter Zijlstra wrote: What avi says! :-) On a 32bit machine a 64bit read is two 32bit reads, so last = last_value; becomes: last.high = last_value.high; last.low = last_value.low; (or the reverse of

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-20 Thread Avi Kivity
On 04/20/2010 04:57 AM, Marcelo Tosatti wrote: Marcelo can probably confirm it, but he has a nehalem with an apparently very good tsc source. Even this machine warps. It stops warping if we only write pvclock data structure once and forget it, (which only updated tsc_timestamp once),

Re: [PATCH V3] perf kvm: Enhance perf to collect KVM guest os statistics from host side

2010-04-20 Thread Avi Kivity
On 04/20/2010 06:32 AM, Sheng Yang wrote: On Monday 19 April 2010 16:25:17 Avi Kivity wrote: On 04/17/2010 09:12 PM, Avi Kivity wrote: I think you were right the first time around. Re-reading again (esp. the part about treatment of indirect NMI vmexits), I think

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-20 Thread Avi Kivity
On 04/19/2010 09:35 PM, Zachary Amsden wrote: Sockets and boards too? (IOW, how reliable is TSC_RELIABLE)? Not sure, IIRC we clear that when the TSC sync test fails, eg when we mark the tsc clocksource unusable. Worrying. By the time we detect this the guest may already have gotten

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-20 Thread Avi Kivity
On 04/20/2010 03:59 PM, Glauber Costa wrote: Might be due to NMIs or SMIs interrupting the rdtsc(); ktime_get() operation which establishes the timeline. We could limit it by having a loop doing rdtsc(); ktime_get(); rdtsc(); and checking for some bound, but it isn't worthwhile (and will

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-20 Thread Avi Kivity
On 04/20/2010 09:23 PM, Jeremy Fitzhardinge wrote: On 04/20/2010 02:31 AM, Avi Kivity wrote: btw, do you want this code in pvclock.c, or shall we keep it kvmclock specific? I think its a pvclock-level fix. I'd been hoping to avoid having something like this, but I think its

Re: [PATCH 1/5] Add a global synchronization point for pvclock

2010-04-21 Thread Avi Kivity
On 04/21/2010 03:01 AM, Zachary Amsden wrote: on this machine Glauber mentioned, or even on a multi-core Core 2 Duo), but the delta calculation is very hard (if not impossible) to get right. The timewarps i've seen were in the 0-200ns range, and very rare (once every 10 minutes or so).
