[PATCH 1/2] kvm: free resources after canceling async_pf

2013-09-04 Thread Radim Krčmář
When we cancel 'async_pf_execute()', we should behave as if the work was never scheduled in 'kvm_setup_async_pf()'. Fixes a bug when we can't unload module because the vm wasn't destroyed. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- virt/kvm/async_pf.c | 5 - 1 file changed, 4

[PATCH 0/2] kvm: fix a bug and remove a redundancy in async_pf

2013-09-04 Thread Radim Krčmář
I did not reproduce the bug fixed in [1/2], but there are not that many reasons why we could not unload a module, so the spot is quite obvious. Radim Krčmář (2): kvm: free resources after canceling async_pf kvm: remove .done from struct kvm_async_pf include/linux/kvm_host.h | 1 - virt/kvm

[PATCH 2/2] kvm: remove .done from struct kvm_async_pf

2013-09-04 Thread Radim Krčmář
'.done' is used to mark the completion of 'async_pf_execute()', but 'cancel_work_sync()' returns true when the work was canceled, so we use it instead. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- include/linux/kvm_host.h | 1 - virt/kvm/async_pf.c | 5 + 2 files changed, 1
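A minimal sketch of the idea in patches 1/2 and 2/2, assuming only what the snippets state: the cancel call reports whether the work was canceled before it ran, so its return value can stand in for a separate completion flag. Everything named mock_* is a hypothetical stand-in written for this illustration; it is neither the kernel's workqueue API nor the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a queued work item. */
struct mock_work {
	bool pending;   /* queued, but the work function has not run yet */
	int  resources; /* references taken when the work was scheduled */
};

/*
 * Mock of the contract described above: returns true only if the work was
 * still pending, i.e. it was canceled before it could run.
 */
bool mock_cancel_work_sync(struct mock_work *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/* The work function releases what was taken at schedule time. */
void mock_work_execute(struct mock_work *w)
{
	w->pending = false;
	w->resources = 0;
}

/*
 * Teardown: a successful cancel means the work never ran, so the resources
 * are released here, as if the work had never been scheduled. No separate
 * 'done' flag is consulted.
 */
void mock_teardown(struct mock_work *w)
{
	if (mock_cancel_work_sync(w))
		w->resources = 0;
	assert(!w->pending);
}
```

In this mock, skipping the release in the canceled branch would leave `resources` nonzero forever, which is the kind of leak that would keep a module from unloading.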

[PATCH 4/7] static_key: use static_key_slow_inc_deferred()

2013-10-17 Thread Radim Krčmář
Simple replacement where possible. Saves us problematic access to the structure and allows optimizations and bug fixes to take place. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/lapic.c | 7 --- kernel/events/core.c | 6 +++--- 2 files changed, 7 insertions(+), 6

Re: [PATCH 4/7] static_key: use static_key_slow_inc_deferred()

2013-10-18 Thread Radim Krčmář
2013-10-17 12:39+0200, Paolo Bonzini: Il 17/10/2013 12:10, Radim Krčmář ha scritto: Simple replacement where possible. Saves us problematic access to the structure and allows optimizations and bug fixes to take place. I think you should introduce this first as a simple wrapper around

[PATCH v2 5/5] static_key: improve deferred inc behavior

2013-12-06 Thread Radim Krčmář
We can cancel deferred static_key_slow_dec() instead of increasing .enabled.counter. Timer now won't fire before 'timeout' since the last increase, so this patch further stabilizes the case of frequent switching. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- kernel/jump_label.c | 3 ++- 1

[PATCH v2 0/5] static_key: deferred key fixes and improvements

2013-12-06 Thread Radim Krčmář
this on Tuesday and then moved to higher priority work, but returned with enough courage to post a different first part. The first part was tested on amd64, s390x and ppc64, the rest also on armv7. Applies to next-20131206 and v3.13-rc3. Radim Krčmář (5): static_key: add a section for deferred keys

[PATCH v2 1/5] static_key: add a section for deferred keys

2013-12-06 Thread Radim Krčmář
-by: Radim Krčmář rkrc...@redhat.com --- More general solution would use compile-time magic to generate an array of pointers to deferred structures, but I am not sure if it is acceptable and possible. Worse approach added an unload_callback_list to the struct module. Callbacks of type void (*)(void

[PATCH v2 2/5] static_key: cancel rate limit timer on rmmod

2013-12-06 Thread Radim Krčmář
-monitor stdio sleep 0.5 until modprobe -rv kvm_intel 2>/dev/null; do :; done Signed-off-by: Radim Krčmář rkrc...@redhat.com --- I decided not to post a patch that uses __deferred_key in kernel/module init, so these three functions might seem like an overkill. kernel/jump_label.c | 22

[PATCH v2 3/5] static_key: add static_key_slow_inc_deferred()

2013-12-06 Thread Radim Krčmář
Complement the static_key_slow_dec_deferred(). This avoids asymmetrical API, and prepares us for future optimizations and bug fixes. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/lapic.c | 7 --- include/linux/jump_label_ratelimit.h | 5 + kernel/events

[PATCH v2 4/5] static_key: keep deferred enabled counter debt

2013-12-06 Thread Radim Krčmář
and we skip an equal amount of static_key_slow_inc_deferred(). Signed-off-by: Radim Krčmář rkrc...@redhat.com --- include/linux/jump_label_ratelimit.h | 1 + kernel/jump_label.c | 17 +++-- 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/include/linux

Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-13 Thread Radim Krčmář
2013-12-12 21:36+0100, Paolo Bonzini: From: Gleb Natapov g...@redhat.com A guest can cause a BUG_ON() leading to a host kernel crash. When the guest writes to the ICR to request an IPI, while in x2apic mode the following things happen, the destination is read from ICR2, which is a register

Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-13 Thread Radim Krčmář
2013-12-13 18:25+0100, Paolo Bonzini: Il 13/12/2013 17:07, Radim Krčmář ha scritto: This bug can only be hit when the destination cpu is 256, so the request itself is buggy -- we don't support that many in kvm and it would crash when initializing the vcpus if we did

Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Radim Krčmář
2013-12-14 11:46+0200, Gleb Natapov: On Fri, Dec 13, 2013 at 05:07:54PM +0100, Radim Krčmář wrote: 2013-12-12 21:36+0100, Paolo Bonzini: From: Gleb Natapov g...@redhat.com A guest can cause a BUG_ON() leading to a host kernel crash. When the guest writes to the ICR to request

Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Radim Krčmář
2013-12-16 14:16+0200, Gleb Natapov: On Mon, Dec 16, 2013 at 01:01:10PM +0100, Radim Krčmář wrote: - Where does the 'only one supported cluster' come from? only one supported cluster comes from 8 bit cpuid limitation of KVM's x2apic implementation. With 8 bit cpuid you can

Re: [PATCH] KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376)

2013-12-16 Thread Radim Krčmář
2013-12-16 13:55+0100, Radim Krčmář: 2013-12-16 14:16+0200, Gleb Natapov: On Mon, Dec 16, 2013 at 01:01:10PM +0100, Radim Krčmář wrote: - Where does the 'only one supported cluster' come from? only one supported cluster comes from 8 bit cpuid limitation of KVM's x2apic

[PATCH] KVM: SVM: fix NMI window after iret

2014-01-17 Thread Radim Krčmář
remains closed until the next exit. Fix this by making KVM_REQ_EVENT request in the iret handler. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- (btw. kvm-unit-tests weren't executed on SVM since Nov 2010, at least) arch/x86/kvm/svm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch

Re: [PATCH] KVM: SVM: fix NMI window after iret

2014-01-17 Thread Radim Krčmář
2014-01-17 12:18-0800, Greg KH: On Fri, Jan 17, 2014 at 08:52:42PM +0100, Radim Krčmář wrote: We should open NMI window right after an iret, but SVM exits before it. We wanted to single step using the trap flag and then open it. (or we could emulate the iret instead) We don't do it since

[PATCH] kvm: print suberror on all internal errors

2014-01-21 Thread Radim Krčmář
KVM_CAP_INTERNAL_ERROR_DATA condition on printf. (partially improved by bb44e0d12df70 and ba4047cf848a3 in the past) Signed-off-by: Radim Krčmář rkrc...@redhat.com --- kvm-all.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 0bfb060..0a91d8e 100644

Re: host_fx_image, guest_fx_image ; what are they ?

2014-03-05 Thread Radim Krčmář
2014-03-04 23:35+0530, ratheesh kannoth: Could you please help me understand below variables ? i found it in struct kvm_vcpu char *host_fx_image; char *guest_fx_image; Commit b114b0804df7131cb6764b948c1c530c834fa3c0 explains them. (How does the existence depend on

Re: [PATCH 4/7] KVM: vmx: Allow the guest to run with dirty debug registers

2014-03-09 Thread Radim Krčmář
2014-03-07 12:42+0100, Paolo Bonzini: When not running in guest-debug mode (i.e. the guest controls the debug registers), having to take an exit for each DR access is a waste of time. If the guest gets into a state where each context switch causes DR to be saved and restored, this can take away

Re: [PATCH 3/7] KVM: x86: Allow the guest to run with dirty debug registers

2014-03-09 Thread Radim Krčmář
2014-03-07 12:42+0100, Paolo Bonzini: When not running in guest-debug mode, the guest controls the debug registers and having to take an exit for each DR access is a waste of time. If the guest gets into a state where each context switch causes DR to be saved and restored, this can take away

Re: [PATCH 3/7] KVM: x86: Allow the guest to run with dirty debug registers

2014-03-10 Thread Radim Krčmář
2014-03-09 21:07+0100, Paolo Bonzini: Il 09/03/2014 19:28, Radim Krčmář ha scritto: /* + * Do this here before restoring debug registers on the host. And + * since we do this before handling the vmexit, a DR access vmexit + * can (a) read the correct value of the debug registers, (b

Re: [PATCH 4/7] KVM: vmx: Allow the guest to run with dirty debug registers

2014-03-10 Thread Radim Krčmář
2014-03-09 21:12+0100, Paolo Bonzini: Il 09/03/2014 19:26, Radim Krčmář ha scritto: + + /* + * No more DR vmexits; force a reload of the debug registers + * and reenter on this instruction. The next vmexit will + * retrieve the full state

Re: [PATCH 0/7] KVM: x86: Let the guest write to multiple debug registers with one vmexit

2014-03-10 Thread Radim Krčmář
KVM: svm: set/clear all DR intercepts in one swoop KVM: svm: Allow the guest to run with dirty debug registers All patches, Reviewed-by: Radim Krčmář rkrc...@redhat.com This series is good even without vmcs_{set,clr}_bits(). (There is enough of them already to warrant a cleanup patch

[PATCH] KVM: SVM: fix cr8 intercept window

2014-03-11 Thread Radim Krčmář
when needed. This fixes BSOD in Windows 2008. Cc: sta...@vger.kernel.org Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/svm.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 64d9bb9..f676c18 100644 --- a/arch

Re: [PATCH] KVM: SVM: fix cr8 intercept window

2014-03-12 Thread Radim Krčmář
2014-03-11 22:05-0300, Marcelo Tosatti: On Tue, Mar 11, 2014 at 07:11:18PM +0100, Radim Krčmář wrote: We always disable cr8 intercept in its handler, but only re-enable it if handling KVM_REQ_EVENT, so there can be a window where we do not intercept cr8 writes, which allows an interrupt

Re: [PATCH] KVM: SVM: fix cr8 intercept window

2014-03-13 Thread Radim Krčmář
2014-03-13 15:52+0200, Gleb Natapov: On Wed, Mar 12, 2014 at 06:20:01PM +0100, Paolo Bonzini wrote: Il 12/03/2014 11:40, Radim Krčmář ha scritto: 2014-03-11 22:05-0300, Marcelo Tosatti: On Tue, Mar 11, 2014 at 07:11:18PM +0100, Radim Krčmář wrote: We always disable cr8 intercept in its

Re: [PATCH v2 3/4] KVM: ioapic: extract body of kvm_ioapic_set_irq

2014-03-21 Thread Radim Krčmář
2014-03-21 10:28+0100, Paolo Bonzini: We will reuse it to process a nonzero IRR that is passed to KVM_SET_IRQCHIP. Reviewed-by: Alex Williamson alex.william...@redhat.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- v1-v2: more comments change argument name from

Re: [PATCH v2 3/4] KVM: ioapic: extract body of kvm_ioapic_set_irq

2014-03-24 Thread Radim Krčmář
2014-03-23 09:44+0100, Paolo Bonzini: Il 21/03/2014 19:58, Radim Krčmář ha scritto: + /* + * Return 0 for coalesced interrupts; for edge-triggered interrupts, + * this only happens if a previous edge has not been delivered due + * to masking. For level interrupts, the remote_irr field

Re: [PATCH v2 4/4] KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP

2014-03-24 Thread Radim Krčmář
2014-03-21 10:28+0100, Paolo Bonzini: After the previous patches, an interrupt whose bit is set in the IRR register will never be in the LAPIC's IRR and has never been injected on the migration source. So inject it on the destination. This fixes migration of Windows guests without HPET

Re: [PATCH v2 0/4] KVM: cleanup ioapic and fix KVM_SET_IRQCHIP with irr != 0

2014-03-24 Thread Radim Krčmář
argument name in patch 3 from level to irq_level use IOAPIC_NUM_PINS in patch 4 as a limit to for_each_set_bit remove debug printk in patch 4 Nice solution to a tricky problem, Reviewed-by: Radim Krčmář rkrc...@redhat.com Paolo Bonzini (4): KVM: ioapic: merge ioapic_deliver

Re: [PATCH v2 4/4] KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP

2014-03-24 Thread Radim Krčmář
2014-03-24 19:14+0100, Paolo Bonzini: Il 24/03/2014 18:58, Radim Krčmář ha scritto: I'd prefer 'ioapic->irr = 0' here ...) The point is that ioapic->irr = 0 is overriding the previous memcpy, because state->irr is used as argument to kvm_ioapic_inject_all instead. So I think ioapic->irr = 0

Re: [PATCH v10 03/19] qspinlock: Add pending bit

2014-05-12 Thread Radim Krčmář
2014-05-07 11:01-0400, Waiman Long: From: Peter Zijlstra pet...@infradead.org Because the qspinlock needs to touch a second cacheline; add a pending bit and allow a single in-word spinner before we punt to the second cacheline. I think there is an unwanted scenario on virtual machines: 1)

Re: [PATCH v10 10/19] qspinlock, x86: Allow unfair spinlock in a virtual guest

2014-05-12 Thread Radim Krčmář
(tl;dr: paravirtualization could be better than unfair qspinlock) 2014-05-07 11:01-0400, Waiman Long: Locking is always an issue in a virtualized environment because of 2 different types of problems: 1) Lock holder preemption 2) Lock waiter preemption Paravirtualized ticketlocks have a

Re: [PATCH v10 03/19] qspinlock: Add pending bit

2014-05-14 Thread Radim Krčmář
2014-05-13 15:47-0400, Waiman Long: On 05/12/2014 11:22 AM, Radim Krčmář wrote: I think there is an unwanted scenario on virtual machines: 1) VCPU sets the pending bit and starts spinning. 2) Pending VCPU gets descheduled. - we have PLE and lock holder isn't running [1

Re: [PATCH v10 03/19] qspinlock: Add pending bit

2014-05-14 Thread Radim Krčmář
2014-05-14 19:00+0200, Peter Zijlstra: On Wed, May 14, 2014 at 06:51:24PM +0200, Radim Krčmář wrote: Ok. I've seen merit in pvqspinlock even with slightly slower first-waiter, so I would have happily sacrificed those horrible branches. (I prefer elegant to optimized code, but I can see

Re: [PATCH 1/1] kvm: increase max vcpu count

2014-05-15 Thread Radim Krčmář
2014-05-15 17:50+0800, Li, Zhen-Hua: This patch is trying to increase the maximum supported vcpu number. There have been big systems supporting more than 256 logical CPUs, and vmware can also support guest systems with more than 256 logical CPUs. So kvm should also increase the maximum

Re: [RFC 08/07] qspinlock: integrate pending bit into queue

2014-05-21 Thread Radim Krčmář
2014-05-21 18:49+0200, Radim Krčmář: 2014-05-19 16:17-0400, Waiman Long: As for now, I will focus on just having one pending bit. I'll throw some ideas at it, One of the ideas follows; it seems sound, but I haven't benchmarked it thoroughly. (Wasted a lot of time by writing/playing

Re: [PATCH v7 08/14] Introduce chr-testdev

2014-07-22 Thread Radim Krčmář
2014-07-16 11:33+0200, Andrew Jones: On Wed, Jul 16, 2014 at 05:31:33AM -0400, Levente Kurusa wrote: - Original Message - [...] +void chr_testdev_exit(int code) +{ + char buf[8]; + int len; + + snprintf(buf, sizeof(buf), "%dq", code); + len = strlen(buf);

[PATCH 1/9] KVM: add kvm_arch_sched_in

2014-08-19 Thread Radim Krčmář
Introduce preempt notifiers for architecture specific code. Advantage over creating a new notifier in every arch is slightly simpler code and guaranteed call order with respect to kvm_sched_in. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/arm/kvm/arm.c | 4 arch/mips/kvm

[PATCH 5/9] KVM: VMX: clamp PLE window

2014-08-19 Thread Radim Krčmář
Modifications could get unwanted values of PLE window. (low or negative) Use ple_window and the maximal value that cannot overflow as bounds. ple_window_max defaults to a very high value, but it would make sense to set it to some fraction of the scheduler tick. Signed-off-by: Radim Krčmář rkrc
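A minimal sketch of the clamping described above, assuming a multiplicative grow factor: the lower bound is the base ple_window, and the upper bound is picked so that one more growth step cannot overflow an int. The function names, and the choice of `INT_MAX / factor` as the bound, are assumptions made for this illustration and are not taken from the patch.

```c
#include <assert.h>
#include <limits.h>

/*
 * Largest value that can still be multiplied by 'factor' once without
 * overflowing an int. Hypothetical helper for this sketch.
 */
int overflow_safe_max(int factor)
{
	assert(factor >= 1);
	return INT_MAX / factor;
}

/* Keep 'val' inside [lo, hi]; low or negative values snap back to 'lo'. */
int clamp_ple_window(int val, int lo, int hi)
{
	assert(lo <= hi);
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}
```

Clamping before the next multiplication, rather than after it, is what keeps the intermediate product representable.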

[PATCH 6/9] KVM: trace kvm_ple_window grow/shrink

2014-08-19 Thread Radim Krčmář
Tracepoint for dynamic PLE window, fired on every potential change. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/trace.h | 29 + arch/x86/kvm/vmx.c | 4 arch/x86/kvm/x86.c | 1 + 3 files changed, 34 insertions(+) diff --git a/arch/x86

[PATCH 9/9] KVM: VMX: automatic PLE window maximum

2014-08-19 Thread Radim Krčmář
Every increase of ple_window_grow creates potential overflows. They are not serious, because we clamp ple_window and userspace is expected to fix ple_window_max within a second. --- arch/x86/kvm/vmx.c | 34 +- 1 file changed, 33 insertions(+), 1 deletion(-) diff

[PATCH 7/9] KVM: VMX: abstract ple_window modifiers

2014-08-19 Thread Radim Krčmář
They were almost identical and thus merged with a loathable macro. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- This solution is hopefully more acceptable than function pointers. arch/x86/kvm/vmx.c | 53 +++-- 1 file changed, 19 insertions

[PATCH 8/9] KVM: VMX: runtime knobs for dynamic PLE window

2014-08-19 Thread Radim Krčmář
ple_window is updated on every vmentry, so there is no reason to have it read-only anymore. ple_window_* weren't writable to prevent runtime overflow races; they are mitigated by clamping the value of ple_window. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- If we decide to ignore insane

[PATCH 4/9] KVM: VMX: dynamise PLE window

2014-08-19 Thread Radim Krčmář
 --------------+--------------------+----------------------
  < 1          |  = ple_window      |  = ple_window
  < ple_window | *= ple_window_grow | /= ple_window_shrink
  otherwise    | += ple_window_grow | -= ple_window_shrink
Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/vmx.c
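A minimal sketch of the three-row modifier table in this snippet, assuming the comparison signs the archive dropped from its first column are `<` (a modifier below 1 resets the window, a modifier below ple_window is multiplicative, anything else is additive). The function and parameter names are hypothetical, and bounds handling is left out because the snippet does not show it.

```c
#include <assert.h>

/*
 * Grow step: 'base' plays the role of ple_window, 'modifier' the role of
 * ple_window_grow.
 */
int grow_window(int val, int base, int modifier)
{
	assert(base > 0);
	if (modifier < 1)
		return base;           /*  = ple_window      */
	if (modifier < base)
		return val * modifier; /* *= ple_window_grow */
	return val + modifier;         /* += ple_window_grow */
}

/* Shrink step: 'modifier' plays the role of ple_window_shrink. */
int shrink_window(int val, int base, int modifier)
{
	assert(base > 0);
	if (modifier < 1)
		return base;           /*  = ple_window        */
	if (modifier < base)
		return val / modifier; /* /= ple_window_shrink */
	return val - modifier;         /* -= ple_window_shrink */
}
```

With a base of 4096, a modifier of 2 doubles or halves the window, a modifier of 5000 adds or subtracts 5000, and a modifier of 0 resets the window to the base.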

[PATCH 2/9] KVM: x86: introduce sched_in to kvm_x86_ops

2014-08-19 Thread Radim Krčmář
sched_in preempt notifier is available for x86, allow its use in specific virtualization technologies as well. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/svm.c | 6 ++ arch/x86/kvm/vmx.c | 6 ++ arch

[PATCH 3/9] KVM: VMX: make PLE window per-vcpu

2014-08-19 Thread Radim Krčmář
Change PLE window into per-vcpu variable, seeded from module parameter, to allow greater flexibility. Brings in a small overhead on every vmentry. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- I've been thinking about a general hierarchical per-vcpu variable model, but it's hard to have

[PATCH 0/9] Dynamic Pause Loop Exiting window.

2014-08-19 Thread Radim Krčmář
overflow checks: they are impossible to hit with small increases and I don't think that anyone wants large ones. Also, I'd argue against the last patch: it should be done in userspace, but I'm not sure about Linux's policy. Radim Krčmář (9): KVM: add kvm_arch_sched_in KVM: x86: introduce sched_in

Re: [PATCH 7/9] KVM: VMX: abstract ple_window modifiers

2014-08-20 Thread Radim Krčmář
2014-08-20 09:02+0200, Paolo Bonzini: Il 19/08/2014 22:35, Radim Krčmář ha scritto: They were almost identical and thus merged with a loathable macro. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- This solution is hopefully more acceptable than function pointers. I think

Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum

2014-08-20 Thread Radim Krčmář
2014-08-20 09:16+0200, Paolo Bonzini: Il 19/08/2014 22:35, Radim Krčmář ha scritto: Every increase of ple_window_grow creates potential overflows. They are not serious, because we clamp ple_window and userspace is expected to fix ple_window_max within a second. --- I think avoiding

Re: [PATCH 5/9] KVM: VMX: clamp PLE window

2014-08-20 Thread Radim Krčmář
2014-08-20 09:18+0200, Paolo Bonzini: Il 19/08/2014 22:35, Radim Krčmář ha scritto: Modifications could get unwanted values of PLE window. (low or negative) Use ple_window and the maximal value that cannot overflow as bounds. ple_window_max defaults to a very high value, but it would

Re: [PATCH 3/9] KVM: VMX: make PLE window per-vcpu

2014-08-20 Thread Radim Krčmář
2014-08-20 09:13+0200, Paolo Bonzini: Il 19/08/2014 22:35, Radim Krčmář ha scritto: enum segment_cache_field { @@ -4403,6 +4406,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) if (ple_gap) { vmcs_write32(PLE_GAP, ple_gap); vmcs_write32(PLE_WINDOW

Re: [PATCH 1/9] KVM: add kvm_arch_sched_in

2014-08-20 Thread Radim Krčmář
2014-08-20 09:47+0200, Christian Borntraeger: On 19/08/14 22:35, Radim Krčmář wrote: --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -3123,6 +3123,8 @@ static void kvm_sched_in(struct preempt_notifier *pn, int cpu) if (vcpu->preempted) vcpu->preempted = false

Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum

2014-08-20 Thread Radim Krčmář
2014-08-20 15:15+0200, Paolo Bonzini: Il 20/08/2014 14:41, Radim Krčmář ha scritto: if (ple_window_grow 1 || ple_window_actual_max ple_window) new = ple_window; else if (ple_window_grow ple_window) new = max(ple_window_actual_max, old) * ple_window_grow

Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum

2014-08-20 Thread Radim Krčmář
2014-08-20 17:34+0200, Paolo Bonzini: Il 20/08/2014 17:31, Radim Krčmář ha scritto: Btw. without extra code, we are still going to overflow on races when changing PW_grow, should they be covered as well? You mean because there is no spinlock or similar protecting the changes? I guess you

Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum

2014-08-20 Thread Radim Krčmář
2014-08-20 18:03+0200, Paolo Bonzini: Il 20/08/2014 18:01, Radim Krčmář ha scritto: 2014-08-20 17:34+0200, Paolo Bonzini: Il 20/08/2014 17:31, Radim Krčmář ha scritto: Btw. without extra code, we are still going to overflow on races when changing PW_grow, should they be covered as well

[PATCH v2 3/6] KVM: VMX: make PLE window per-VCPU

2014-08-20 Thread Radim Krčmář
Change PLE window into per-VCPU variable, seeded from module parameter, to allow greater flexibility. Brings in a small overhead on every vmentry. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/vmx.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git

[PATCH v2 6/6] KVM: VMX: runtime knobs for dynamic PLE window

2014-08-20 Thread Radim Krčmář
ple_window is updated on every vmentry, so there is no reason to have it read-only anymore. ple_window* weren't writable to prevent runtime overflow races; they are prevented by a seqlock. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/vmx.c | 48

[PATCH v2 4/6] KVM: VMX: dynamise PLE window

2014-08-20 Thread Radim Krčmář
. A minimum equals to ple_window. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/vmx.c | 80 -- 1 file changed, 78 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 18e0e52..e63d7ac 100644

[PATCH v2 5/6] KVM: trace kvm_ple_window

2014-08-20 Thread Radim Krčmář
Tracepoint for dynamic PLE window, fired on every potential change. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/trace.h | 25 + arch/x86/kvm/vmx.c | 8 +--- arch/x86/kvm/x86.c | 1 + 3 files changed, 31 insertions(+), 3 deletions(-) diff

[PATCH v2 2/6] KVM: x86: introduce sched_in to kvm_x86_ops

2014-08-20 Thread Radim Krčmář
sched_in preempt notifier is available for x86, allow its use in specific virtualization technologies as well. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/svm.c | 6 ++ arch/x86/kvm/vmx.c | 6 ++ arch

[PATCH v2 0/6] Dynamic Pause Loop Exiting window.

2014-08-20 Thread Radim Krčmář
spirals out of control. By increasing the window after every PLE exit, we can limit the amount of useless ones, so we don't reach the state where CPUs spend 99% of the time waiting for a lock. HP confirmed that this series prevents soft lockups and TSC sync errors on large guests. Radim Krčmář (6

[PATCH v2 1/6] KVM: add kvm_arch_sched_in

2014-08-20 Thread Radim Krčmář
Introduce preempt notifiers for architecture specific code. Advantage over creating a new notifier in every arch is slightly simpler code and guaranteed call order with respect to kvm_sched_in. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/arm/kvm/arm.c | 4 arch/mips/kvm

Re: [PATCH v2 1/6] KVM: add kvm_arch_sched_in

2014-08-21 Thread Radim Krčmář
2014-08-21 10:29+0200, Paolo Bonzini: Il 20/08/2014 22:53, Radim Krčmář ha scritto: Introduce preempt notifiers for architecture specific code. Advantage over creating a new notifier in every arch is slightly simpler code and guaranteed call order with respect to kvm_sched_in. Signed

Re: [PATCH v2 3/6] KVM: VMX: make PLE window per-VCPU

2014-08-21 Thread Radim Krčmář
2014-08-21 10:25+0200, Paolo Bonzini: Il 20/08/2014 22:53, Radim Krčmář ha scritto: + if (ple_gap) + vmcs_write32(PLE_WINDOW, vmx->ple_window); Maybe we can add a ple_window_dirty field? It can be tested instead of ple_gap to avoid the vmwrite in the common case? Sure, v3

Re: [PATCH v2 4/6] KVM: VMX: dynamise PLE window

2014-08-21 Thread Radim Krčmář
2014-08-21 10:24+0200, Paolo Bonzini: Il 20/08/2014 22:53, Radim Krčmář ha scritto: +static void update_ple_window_actual_max(void) +{ + ple_window_actual_max = + __shrink_ple_window(max(ple_window_max, ple_window), Why the max() here? To have ple_window act

Re: [PATCH v2 4/6] KVM: VMX: dynamise PLE window

2014-08-21 Thread Radim Krčmář
2014-08-21 10:26+0200, Paolo Bonzini: Il 20/08/2014 22:53, Radim Krčmář ha scritto: +static int __shrink_ple_window(int val, int shrinker, int minimum) s/shrinker/factor/ or s/shrinker/param/ (shrinker has another meaning in the kernel). True, thanks. +{ + if (shrinker < 1

Re: [PATCH v2 5/6] KVM: trace kvm_ple_window

2014-08-21 Thread Radim Krčmář
2014-08-21 10:29+0200, Paolo Bonzini: Il 20/08/2014 22:53, Radim Krčmář ha scritto: Tracepoint for dynamic PLE window, fired on every potential change. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/trace.h | 25 + arch/x86/kvm/vmx.c | 8

Re: [PATCH v2 4/6] KVM: VMX: dynamise PLE window

2014-08-21 Thread Radim Krčmář
2014-08-21 14:29+0200, Paolo Bonzini: Il 21/08/2014 13:54, Radim Krčmář ha scritto: Guilty as charged. Using 0/1 or true/false in this context directly would be pretty bad ... Is enum fine? (SHRINK_PLE_WINDOW, GROW_PLE_WINDOW?) I prefer good old Ctrl-C Ctrl-V (adjusted for your favorite

Re: [PATCH v2 1/6] KVM: add kvm_arch_sched_in

2014-08-21 Thread Radim Krčmář
2014-08-21 14:27+0200, Paolo Bonzini: Il 21/08/2014 13:38, Radim Krčmář ha scritto: 2014-08-21 10:29+0200, Paolo Bonzini: Il 20/08/2014 22:53, Radim Krčmář ha scritto: Introduce preempt notifiers for architecture specific code. Advantage over creating a new notifier in every arch

Re: [PATCH v2 4/6] KVM: VMX: dynamise PLE window

2014-08-21 Thread Radim Krčmář
2014-08-21 15:18+0200, Paolo Bonzini: Il 21/08/2014 14:42, Radim Krčmář ha scritto: 2014-08-21 14:29+0200, Paolo Bonzini: Il 21/08/2014 13:54, Radim Krčmář ha scritto: Guilty as charged. Using 0/1 or true/false in this context directly would be pretty bad ... Is enum fine

Re: [PATCH v2 5/6] KVM: trace kvm_ple_window

2014-08-21 Thread Radim Krčmář
2014-08-21 15:22+0200, Paolo Bonzini: Il 21/08/2014 13:56, Radim Krčmář ha scritto: 2014-08-21 10:29+0200, Paolo Bonzini: Il 20/08/2014 22:53, Radim Krčmář ha scritto: + TP_PROTO(int grow, unsigned int vcpu_id, int new, int old), s/int grow/bool grow/ (and similarly in TP_STRUCT__entry

[PATCH v3 0/7] Dynamic Pause Loop Exiting window.

2014-08-21 Thread Radim Krčmář
waiting for a lock. HP confirmed that this series prevents soft lockups and TSC sync errors on large guests. Radim Krčmář (7): KVM: add kvm_arch_sched_in KVM: x86: introduce sched_in to kvm_x86_ops KVM: VMX: make PLE window per-VCPU KVM: VMX: dynamise PLE window KVM: trace kvm_ple_window

[PATCH v3 5/7] KVM: trace kvm_ple_window grow/shrink

2014-08-21 Thread Radim Krčmář
Tracepoint for dynamic PLE window, fired on every potential change. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/trace.h | 30 ++ arch/x86/kvm/vmx.c | 10 -- arch/x86/kvm/x86.c | 1 + 3 files changed, 39 insertions(+), 2 deletions

[PATCH v3 4/7] KVM: VMX: dynamise PLE window

2014-08-21 Thread Radim Krčmář
; it is internally rounded down to the closest multiple of ple_window_grow. VCPU's PLE window is never allowed below ple_window. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/vmx.c | 87 -- 1 file changed, 85 insertions(+), 2

[PATCH v3 6/7] KVM: VMX: runtime knobs for dynamic PLE window

2014-08-21 Thread Radim Krčmář
ple_window is updated on every vmentry, so there is no reason to have it read-only anymore. ple_window* weren't writable to prevent runtime overflow races; they are prevented by a seqlock. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/vmx.c | 46

[PATCH v3 7/7] KVM: VMX: optimize ple_window updates to VMCS

2014-08-21 Thread Radim Krčmář
ple_window is preserved in VMCS, so we can write it only after a change. Do this by keeping a dirty bit. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/vmx.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
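A minimal sketch of the dirty-bit idea described above, assuming a plain struct in place of the real VCPU and VMCS: the stored value is written out on entry only when it has changed since the last write. All names here (mock_vcpu, set_ple_window, mock_vmentry) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-VCPU state and the VMCS field. */
struct mock_vcpu {
	int  ple_window;       /* value the VCPU wants */
	bool ple_window_dirty; /* set when ple_window changed since last write */
	int  vmcs_ple_window;  /* mock of the value currently held in the VMCS */
	int  vmcs_writes;      /* counts the (expensive) writes */
};

/* Update the window; mark it dirty only on an actual change. */
void set_ple_window(struct mock_vcpu *v, int new_window)
{
	if (new_window != v->ple_window) {
		v->ple_window = new_window;
		v->ple_window_dirty = true;
	}
}

/* Entry path: the write is skipped while the dirty bit is clear. */
void mock_vmentry(struct mock_vcpu *v)
{
	if (v->ple_window_dirty) {
		v->ple_window_dirty = false;
		v->vmcs_ple_window = v->ple_window;
		v->vmcs_writes++;
	}
}
```

The state has to start out dirty (or with the field already written) so that the first entry performs the initial write.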

[PATCH v3 3/7] KVM: VMX: make PLE window per-VCPU

2014-08-21 Thread Radim Krčmář
Change PLE window into per-VCPU variable, seeded from module parameter, to allow greater flexibility. Brings in a small overhead on every vmentry. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/vmx.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git

[PATCH v3 1/7] KVM: add kvm_arch_sched_in

2014-08-21 Thread Radim Krčmář
Introduce preempt notifiers for architecture specific code. Advantage over creating a new notifier in every arch is slightly simpler code and guaranteed call order with respect to kvm_sched_in. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/arm/kvm/arm.c | 4 arch/mips/kvm

[PATCH v3 2/7] KVM: x86: introduce sched_in to kvm_x86_ops

2014-08-21 Thread Radim Krčmář
sched_in preempt notifier is available for x86, allow its use in specific virtualization technologies as well. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/svm.c | 6 ++ arch/x86/kvm/vmx.c | 6 ++ arch

Re: [PATCH v3 0/7] Dynamic Pause Loop Exiting window.

2014-08-21 Thread Radim Krčmář
2014-08-21 18:30+0200, Paolo Bonzini: Il 21/08/2014 18:08, Radim Krčmář ha scritto: I'm not sure of the usefulness of patch 6, so I'm going to drop it. I'll keep it in my local junkyard branch in case it's going to be useful in some scenario I didn't think of. I've been using it to benchmark

Re: [PATCH v3 1/7] KVM: add kvm_arch_sched_in

2014-08-21 Thread Radim Krčmář
2014-08-22 00:19+0530, Raghavendra K T: On 08/21/2014 09:38 PM, Radim Krčmář wrote: Introduce preempt notifiers for architecture specific code. Advantage over creating a new notifier in every arch is slightly simpler code and guaranteed call order with respect to kvm_sched_in. Signed-off

Re: [PATCH v3 4/7] KVM: VMX: dynamise PLE window

2014-08-21 Thread Radim Krčmář
2014-08-22 00:40+0530, Raghavendra K T: On 08/21/2014 09:38 PM, Radim Krčmář wrote: Thanks for the nice patch. default of grow = 2 and shrink = 0 is very good, which aids fast clamping in overcommit and less ple_exits in undercommit. with a small concern over modifier (shrinker) value

Re: [PATCH v3 6/7] KVM: VMX: runtime knobs for dynamic PLE window

2014-08-21 Thread Radim Krčmář
2014-08-22 00:47+0530, Raghavendra K T: Positive thing about being able to change default grow/shrink value is the flexibility of tuning ple window to different workloads and different number of cpus. But is it that a value of shrink = 1 and grow 1 is problematic ? (running an undercommit

Re: [PATCH v3 5/7] KVM: trace kvm_ple_window grow/shrink

2014-08-25 Thread Radim Krčmář
2014-08-25 15:53+0200, Sabrina Dubroca: Hello, 2014-08-21, 18:08:09 +0200, Radim Krčmář wrote: Tracepoint for dynamic PLE window, fired on every potential change. +#define trace_kvm_ple_window_grow(vcpu_id, new, old) \ + trace_kvm_ple_window(true, vcpu_id, new, old) +#define

Re: [PATCH v3 0/7] Dynamic Pause Loop Exiting window.

2014-08-25 Thread Radim Krčmář
2014-08-22 12:45+0800, Wanpeng Li: Hi Radim, On Thu, Aug 21, 2014 at 06:50:03PM +0200, Radim Krčmář wrote: 2014-08-21 18:30+0200, Paolo Bonzini: Il 21/08/2014 18:08, Radim Krčmář ha scritto: I'm not sure of the usefulness of patch 6, so I'm going to drop it. I'll keep it in my local

[PATCH 1/2] KVM: static inline empty kvm_arch functions

2014-08-28 Thread Radim Krčmář
kvm_arch_flush_shadow and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c. e790d9ef6 KVM: add kvm_arch_sched_in Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/arm/include/asm/kvm_host.h | 6 ++ arch/arm/kvm/arm.c | 19 arch/arm64/include/asm

[PATCH 0/2] KVM: minor cleanup and optimizations

2014-08-28 Thread Radim Krčmář
The first patch answers a demand for inline arch functions. (There are a lot of constant functions that could be inlined as well.) Second patch digs a bit into the history of KVM and removes a useless argument that seemed suspicious when preparing the first patch. Radim Krčmář (2): KVM: static

[PATCH 2/2] KVM: remove garbage arg to *hardware_{en,dis}able

2014-08-28 Thread Radim Krčmář
In the beginning was on_each_cpu(), which required an unused argument to kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten. Remove unnecessary arguments that stem from this. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/arm/include/asm/kvm_host.h | 2 +- arch/arm

Re: [PATCH] KVM: forward declare structs in kvm_types.h

2014-08-29 Thread Radim Krčmář
. Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- Reviewed-by: Radim Krčmář rkrc...@redhat.com (Inclusion of linux/types.h seems to be piggybacking, but is a nice thing to do :) And I'd apply these changes: (Definitely the first hunk.) diff --git a/include/linux/kvm_host.h b/include/linux

Re: [PATCH 2/2] virtio-rng: fix stuck in catting hwrng attributes

2014-09-15 Thread Radim Krčmář
2014-09-14 10:25+0800, Amos Kong: On Sun, Sep 14, 2014 at 09:12:08AM +0800, Amos Kong wrote: ... diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index c591d7e..b5d1b6f 100644 --- a/drivers/char/hw_random/core.c +++

Re: [PATCH] Using the tlb flush util function where applicable

2014-09-15 Thread Radim Krčmář
the counting to kvm_check_request(). Signed-off-by: Radim Krčmář rkrc...@redhat.com --- (And what about a possible followup patch that replaces kvm_mmu_flush_tlb() with kvm_make_request() again? It would free the namespace a bit and we could call something similarly named from vcpu_enter_guest

Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-16 Thread Radim Krčmář
think it's best to merge it through the KVM tree with someone's Acked-by. I would prefer to have the last hunk in a separate patch, but still, Acked-by: Radim Krčmář rkrc...@redhat.com

Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-16 Thread Radim Krčmář
[Emergency posting to fix the tag and couldn't find unmangled Cc list, so some recipients were dropped, sorry. (I guess you are glad though).] 2014-09-16 14:01-0700, Andres Lagar-Cavilla: On Tue, Sep 16, 2014 at 1:51 PM, Radim Krčmář rkrc...@redhat.com wrote: 2014-09-15 13:11-0700, Andres

Re: [PATCH] Using the tlb flush util function where applicable

2014-09-17 Thread Radim Krčmář
2014-09-17 08:15+0800, Wanpeng Li: Hi Radim, On Mon, Sep 15, 2014 at 09:33:52PM +0200, Radim Krčmář wrote: Do you prefer the current behavior? --- 8 --- KVM: x86: count actual tlb flushes - we count KVM_REQ_TLB_FLUSH requests, not actual flushes So there may be multiple requests

Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-17 Thread Radim Krčmář
2014-09-17 13:26+0300, Gleb Natapov: For async_pf_execute() you do not need to even retry. Next guest's page fault will retry it for you. Wouldn't that be a waste of vmentries? The guest might be able to handle interrupts while we are waiting, so if we used async-io-done notifier, this could

Re: [PATCH] kvm: Faults which trigger IO release the mmap_sem

2014-09-17 Thread Radim Krčmář
[Repost for lists, the last mail was eaten by a security troll.] 2014-09-16 14:01-0700, Andres Lagar-Cavilla: On Tue, Sep 16, 2014 at 1:51 PM, Radim Krčmář rkrc...@redhat.com wrote: 2014-09-15 13:11-0700, Andres Lagar-Cavilla: +int kvm_get_user_page_retry(struct task_struct *tsk, struct

Re: [RESEND PATCH 1/3] x86: Adding structs to reflect cpuid fields

2014-09-17 Thread Radim Krčmář
2014-09-17 16:06+0200, Borislav Petkov: On Wed, Sep 17, 2014 at 04:53:39PM +0300, Nadav Amit wrote: AFAIK backward compatibility is usually maintained in x86. I did not see in Intel SDM anything that says this CPUID field means something for CPU X and something else for CPU Y. Anyhow, it is
