[PATCH kvm-unit-tests 1/3] x86-run: correct a typo 'qemsystem' -> 'qemusystem'
x86-run: correct a typo 'qemsystem' -> 'qemusystem'

Before this fix, running the 'x86-run' script always failed with the
following error:

  QEMU binary has no support for test device. Exiting.

Signed-off-by: Yongjie Ren <yongjie@intel.com>
---
 x86-run |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86-run b/x86-run
index 2cf1f38..9526a0b 100755
--- a/x86-run
+++ b/x86-run
@@ -8,7 +8,7 @@ then
 	qemu=${qemukvm}
 else
 	if
-		${qemsystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
+		${qemusystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
 	then
 		qemu=${qemusystem}
 	else
--
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[PATCH kvm-unit-tests 3/3] x86-run: keep consistent coding style for the 'if' statement
x86-run: keep consistent coding style for the 'if' statement

Signed-off-by: Yongjie Ren <yongjie@intel.com>
---
 x86-run |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/x86-run b/x86-run
index daefd4a..6093a72 100755
--- a/x86-run
+++ b/x86-run
@@ -17,7 +17,8 @@ else
 	fi
 fi
 
-if ${qemu} -device '?' 2>&1 | fgrep pci-testdev > /dev/null;
+if
+	${qemu} -device '?' 2>&1 | fgrep pci-testdev > /dev/null;
 then
 	pci_testdev="-device pci-testdev"
 else
--
1.7.9.5
[PATCH kvm-unit-tests 2/3] x86-run: use /bin/bash instead of /usr/bin/bash
'bash' should always be located at /bin/bash rather than /usr/bin/bash.
Other bash scripts in kvm-unit-tests also use '/bin/bash' as the interpreter.

Signed-off-by: Yongjie Ren <yongjie@intel.com>
---
 x86-run |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86-run b/x86-run
index 9526a0b..daefd4a 100755
--- a/x86-run
+++ b/x86-run
@@ -1,4 +1,4 @@
-#!/usr/bin/bash
+#!/bin/bash
 
 qemukvm=${QEMU:-qemu-kvm}
 qemusystem=${QEMU:-qemu-system-x86_64}
--
1.7.9.5
Re: [PATCH net] vhost-net: fix use-after-free in vhost_net_flush
From: "Michael S. Tsirkin" <m...@redhat.com>
Date: Thu, 20 Jun 2013 14:48:13 +0300

> vhost_net_ubuf_put_and_wait has a confusing name:
> it will actually also free it's argument.
> Thus since commit 1280c27f8e29acf4af2da914e80ec27c3dbd5c01

Never reference commits only by SHA1 ID, it is never sufficient.

Always provide, after the SHA1 ID, in parentheses, the header line
from the commit message.

To be honest, I'm kind of tired of telling people they need to do this
over and over again.

Maybe people keep forgetting because the reason why this is an issue
hasn't really sunk in.

If the patch you reference got backported into another tree, it will
not have the same SHA1 ID, and therefore someone reading the fix won't
be able to find the fault-causing change without going through a lot
of trouble.

By providing the commit header line you remove that problem altogether;
no ambiguity is possible.
Re: [PATCH kvm-unit-tests 1/3] x86-run: correct a typo 'qemsystem' -> 'qemusystem'
On Mon, Jun 24, 2013 at 06:10:59AM +0000, Ren, Yongjie wrote:
> x86-run: correct a typo 'qemsystem' -> 'qemusystem'
>
> Before this fix, you should always get error info as below when running
> 'x86-run' script.
> QEMU binary has no support for test device. Exiting.
>
Patch is whitespace damaged.

> Signed-off-by: Yongjie Ren <yongjie@intel.com>
> ---
>  x86-run |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/x86-run b/x86-run
> index 2cf1f38..9526a0b 100755
> --- a/x86-run
> +++ b/x86-run
> @@ -8,7 +8,7 @@ then
>  	qemu=${qemukvm}
>  else
>  	if
> -		${qemsystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
> +		${qemusystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
While you are at it, let's replace the fgrep invocation, which is
deprecated, with grep -F.

>  	then
>  		qemu=${qemusystem}
>  	else
> --
> 1.7.9.5

--
			Gleb.
Re: [PATCH 2/2] armv7 initial device passthrough support
On 6/15/2013 5:47 PM, Paolo Bonzini wrote:
> On 13/06/2013 11:19, Mario Smarduch wrote:
>> Updated Device Passthrough Patch.
>> - optimized IRQ->CPU->vCPU binding, irq is installed once
>> - added dynamic IRQ affinity on schedule in
>> - added documentation and few other coding recommendations.
>>
>> Per earlier discussion VFIO is our target but we like
>> something earlier to work with to tackle performance
>> latency issue (some ARM related) for device passthrough
>> while we migrate towards VFIO.
>
> I don't think this is acceptable upstream, unfortunately.  KVM device
> assignment is deprecated and we should not add more users.

That's fine; we'll work our way towards dev-tree VFIO, reusing what we
can and working with the community.

At this point we're more concerned with numbers and best practices than
with mechanism; this part will be time consuming. VFIO will be more of
a background task for us.

> What are the latency issues you have?

Our focus now is on IRQ latency and throughput. Right now it appears the
lowest latency is 2x + exit/enter + IRQ injection overhead. We can't
tolerate additional IPIs or deferred IRQ injection approaches. We're
looking for numbers closer to what IBM's ELI managed. Also high-res
timers, which the ARM Virtualization Extensions support very well, and
exitless interrupts, which ARM handles very well too. There are some
upcoming ARM hardware interrupt enhancements which may help a lot as
well.

There are many other latency/performance requirements for NFV related
to RT; essentially, the guest must run near native. In the end it may
turn out this needs to live outside the main tree; we'll see.

- Mario

> Paolo
>
>> - Mario
RE: [PATCH kvm-unit-tests 1/3] x86-run: correct a typo 'qemsystem' -> 'qemusystem'
> -----Original Message-----
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]
> On Behalf Of Gleb Natapov
> Sent: Monday, June 24, 2013 4:03 PM
> To: Ren, Yongjie
> Cc: kvm@vger.kernel.org; pbonz...@redhat.com
> Subject: Re: [PATCH kvm-unit-tests 1/3] x86-run: correct a typo 'qemsystem' -> 'qemusystem'
>
> On Mon, Jun 24, 2013 at 06:10:59AM +0000, Ren, Yongjie wrote:
> > x86-run: correct a typo 'qemsystem' -> 'qemusystem'
> >
> > Before this fix, you should always get error info as below when running
> > 'x86-run' script.
> > QEMU binary has no support for test device. Exiting.
> >
> Patch is whitespace damaged.
>
Sorry, I'll correct it and resend my patches.

> > Signed-off-by: Yongjie Ren <yongjie@intel.com>
> > ---
> >  x86-run |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/x86-run b/x86-run
> > index 2cf1f38..9526a0b 100755
> > --- a/x86-run
> > +++ b/x86-run
> > @@ -8,7 +8,7 @@ then
> >  	qemu=${qemukvm}
> >  else
> >  	if
> > -		${qemsystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
> > +		${qemusystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
> While you are at is lets replace fgrep invocation, which is deprecated,
> with grep -F.
>
Yeah, I also found this issue and want to use 'grep -F' instead.
I'll send another patch to replace 'fgrep' with 'grep -F'.

> >  	then
> >  		qemu=${qemusystem}
> >  	else
> > --
> > 1.7.9.5
>
> --
> 			Gleb.
[PATCH kvm-unit-tests 1/4] x86-run: correct a typo 'qemsystem' -> 'qemusystem'
x86-run: correct a typo 'qemsystem' -> 'qemusystem'

Before this fix, running the 'x86-run' script always failed with the
following error:

  QEMU binary has no support for test device. Exiting.

Signed-off-by: Yongjie Ren <yongjie@intel.com>
---
 x86-run |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86-run b/x86-run
index 2cf1f38..9526a0b 100755
--- a/x86-run
+++ b/x86-run
@@ -8,7 +8,7 @@ then
 	qemu=${qemukvm}
 else
 	if
-		${qemsystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
+		${qemusystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
 	then
 		qemu=${qemusystem}
 	else
--
1.7.9.5
[PATCH kvm-unit-tests 2/4] x86-run: use /bin/bash instead of /usr/bin/bash
'bash' should always be located at /bin/bash rather than /usr/bin/bash.
Other bash scripts in kvm-unit-tests also use '/bin/bash' as the interpreter.

Signed-off-by: Yongjie Ren <yongjie@intel.com>
---
 x86-run |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86-run b/x86-run
index 9526a0b..daefd4a 100755
--- a/x86-run
+++ b/x86-run
@@ -1,4 +1,4 @@
-#!/usr/bin/bash
+#!/bin/bash
 
 qemukvm=${QEMU:-qemu-kvm}
 qemusystem=${QEMU:-qemu-system-x86_64}
--
1.7.9.5
[PATCH kvm-unit-tests 3/4] x86-run: keep consistent coding style for the 'if' statement
x86-run: keep consistent coding style for the 'if' statement

Signed-off-by: Yongjie Ren <yongjie@intel.com>
---
 x86-run |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/x86-run b/x86-run
index daefd4a..6093a72 100755
--- a/x86-run
+++ b/x86-run
@@ -17,7 +17,8 @@ else
 	fi
 fi
 
-if ${qemu} -device '?' 2>&1 | fgrep pci-testdev > /dev/null;
+if
+	${qemu} -device '?' 2>&1 | fgrep pci-testdev > /dev/null;
 then
 	pci_testdev="-device pci-testdev"
 else
--
1.7.9.5
[PATCH kvm-unit-tests 4/4] x86-run: replace the deprecated 'fgrep' with 'grep -F'
x86-run: replace the deprecated 'fgrep' with 'grep -F'.

Signed-off-by: Yongjie Ren <yongjie@intel.com>
---
 x86-run |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/x86-run b/x86-run
index 6093a72..14ff331 100755
--- a/x86-run
+++ b/x86-run
@@ -3,12 +3,12 @@
 qemukvm=${QEMU:-qemu-kvm}
 qemusystem=${QEMU:-qemu-system-x86_64}
 if
-	${qemukvm} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
+	${qemukvm} -device '?' 2>&1 | grep -F -e \"testdev\" -e \"pc-testdev\" > /dev/null;
 then
 	qemu=${qemukvm}
 else
 	if
-		${qemusystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
+		${qemusystem} -device '?' 2>&1 | grep -F -e \"testdev\" -e \"pc-testdev\" > /dev/null;
 	then
 		qemu=${qemusystem}
 	else
@@ -18,7 +18,7 @@ else
 fi
 
 if
-	${qemu} -device '?' 2>&1 | fgrep pci-testdev > /dev/null;
+	${qemu} -device '?' 2>&1 | grep -F pci-testdev > /dev/null;
 then
 	pci_testdev="-device pci-testdev"
 else
@@ -26,7 +26,7 @@ else
 fi
 
 if
-	${qemu} -device '?' 2>&1 | fgrep pc-testdev > /dev/null;
+	${qemu} -device '?' 2>&1 | grep -F pc-testdev > /dev/null;
 then
 	pc_testdev="-device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4"
 else
--
1.7.9.5
[PATCH 1/6 v5] powerpc: remove unnecessary line continuations
Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
v5:
 - no change

 arch/powerpc/kernel/process.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ceb4e7b..639a8de 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -325,7 +325,7 @@ static void set_debug_reg_defaults(struct thread_struct *thread)
 	/*
 	 * Force User/Supervisor bits to b11 (user-only MSR[PR]=1)
 	 */
-	thread->dbcr1 = DBCR1_IAC1US | DBCR1_IAC2US |	\
+	thread->dbcr1 = DBCR1_IAC1US | DBCR1_IAC2US |
 			DBCR1_IAC3US | DBCR1_IAC4US;
 	/*
 	 * Force Data Address Compare User/Supervisor bits to be User-only
--
1.7.0.4
[PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv instruction
The ehpriv instruction is used by user space to set software breakpoints.
This patch adds support for exiting to user space with run->debug
holding the relevant information. As this is the first place we use
run->debug, this patch also defines the run->debug structure.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
 arch/powerpc/include/asm/disassemble.h |    4 ++++
 arch/powerpc/include/uapi/asm/kvm.h    |   21 +++++++++++++++++----
 arch/powerpc/kvm/e500_emulate.c        |   27 +++++++++++++++++++++++++++
 3 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/disassemble.h b/arch/powerpc/include/asm/disassemble.h
index 9b198d1..856f8de 100644
--- a/arch/powerpc/include/asm/disassemble.h
+++ b/arch/powerpc/include/asm/disassemble.h
@@ -77,4 +77,8 @@ static inline unsigned int get_d(u32 inst)
 	return inst & 0xffff;
 }
 
+static inline unsigned int get_oc(u32 inst)
+{
+	return (inst >> 11) & 0x7fff;
+}
 #endif /* __ASM_PPC_DISASSEMBLE_H__ */
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 0fb1a6e..ded0607 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -269,7 +269,24 @@ struct kvm_fpu {
 	__u64 fpr[32];
 };
 
+/*
+ * Defines for h/w breakpoint, watchpoint (read, write or both) and
+ * software breakpoint.
+ * These are used as type in KVM_SET_GUEST_DEBUG ioctl and status
+ * for KVM_DEBUG_EXIT.
+ */
+#define KVMPPC_DEBUG_NONE		0x0
+#define KVMPPC_DEBUG_BREAKPOINT		(1UL << 1)
+#define KVMPPC_DEBUG_WATCH_WRITE	(1UL << 2)
+#define KVMPPC_DEBUG_WATCH_READ		(1UL << 3)
 struct kvm_debug_exit_arch {
+	__u64 address;
+	/*
+	 * exiting to userspace because of h/w breakpoint, watchpoint
+	 * (read, write or both) and software breakpoint.
+	 */
+	__u32 status;
+	__u32 reserved;
 };
 
 /* for KVM_SET_GUEST_DEBUG */
@@ -281,10 +298,6 @@ struct kvm_guest_debug_arch {
 		 * Type denotes h/w breakpoint, read watchpoint, write
 		 * watchpoint or watchpoint (both read and write).
 		 */
-#define KVMPPC_DEBUG_NONE		0x0
-#define KVMPPC_DEBUG_BREAKPOINT		(1UL << 1)
-#define KVMPPC_DEBUG_WATCH_WRITE	(1UL << 2)
-#define KVMPPC_DEBUG_WATCH_READ		(1UL << 3)
 		__u32 type;
 		__u32 reserved;
 	} bp[16];
diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c
index b10a012..dab9d07 100644
--- a/arch/powerpc/kvm/e500_emulate.c
+++ b/arch/powerpc/kvm/e500_emulate.c
@@ -26,6 +26,8 @@
 #define XOP_TLBRE   946
 #define XOP_TLBWE   978
 #define XOP_TLBILX  18
+#define XOP_EHPRIV  270
+#define EHPRIV_OC_DEBUG 0
 
 #ifdef CONFIG_KVM_E500MC
 static int dbell2prio(ulong param)
@@ -82,6 +84,26 @@ static int kvmppc_e500_emul_msgsnd(struct kvm_vcpu *vcpu, int rb)
 }
 #endif
 
+static int kvmppc_e500_emul_ehpriv(struct kvm_run *run, struct kvm_vcpu *vcpu,
+				   unsigned int inst, int *advance)
+{
+	int emulated = EMULATE_DONE;
+
+	switch (get_oc(inst)) {
+	case EHPRIV_OC_DEBUG:
+		run->exit_reason = KVM_EXIT_DEBUG;
+		run->debug.arch.address = vcpu->arch.pc;
+		run->debug.arch.status = 0;
+		kvmppc_account_exit(vcpu, DEBUG_EXITS);
+		emulated = EMULATE_EXIT_USER;
+		*advance = 0;
+		break;
+	default:
+		emulated = EMULATE_FAIL;
+	}
+	return emulated;
+}
+
 int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
                            unsigned int inst, int *advance)
 {
@@ -130,6 +152,11 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
 			emulated = kvmppc_e500_emul_tlbivax(vcpu, ea);
 			break;
 
+		case XOP_EHPRIV:
+			emulated = kvmppc_e500_emul_ehpriv(run, vcpu, inst,
+							   advance);
+			break;
+
 		default:
 			emulated = EMULATE_FAIL;
 		}
--
1.7.0.4
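The emulation path above keys off two fields of the instruction word: the extended opcode (270 for ehpriv) and the OC field that get_oc() extracts, with OC 0 meaning "debug". As a hedged, standalone sketch: get_oc() matches the patch, get_op()/get_xop() are written in the style of asm/disassemble.h, and make_ehpriv()/is_ehpriv_debug() are hypothetical helpers for illustration only. Bit positions use IBM numbering (bit 0 is the most significant bit).

```c
#include <assert.h>
#include <stdint.h>

/* Field extractors; IBM bit numbering, bit 0 = MSB of the 32-bit word. */
static inline unsigned int get_op(uint32_t inst)  { return inst >> 26; }            /* bits 0-5   */
static inline unsigned int get_xop(uint32_t inst) { return (inst >> 1) & 0x3ff; }   /* bits 21-30 */
static inline unsigned int get_oc(uint32_t inst)  { return (inst >> 11) & 0x7fff; } /* bits 6-20  */

#define OP_31           31u   /* primary opcode shared by X-form instructions */
#define XOP_EHPRIV      270u  /* extended opcode, as in e500_emulate.c */
#define EHPRIV_OC_DEBUG 0u    /* OC value the patch treats as a debug request */

/* Hypothetical helper: assemble an ehpriv word carrying operation code 'oc'. */
static uint32_t make_ehpriv(unsigned int oc)
{
	assert(oc <= 0x7fff);  /* OC is a 15-bit field */
	return (OP_31 << 26) | ((uint32_t)oc << 11) | (XOP_EHPRIV << 1);
}

/* Hypothetical helper: would this word take the KVM_EXIT_DEBUG path? */
static int is_ehpriv_debug(uint32_t inst)
{
	return get_op(inst) == OP_31 && get_xop(inst) == XOP_EHPRIV &&
	       get_oc(inst) == EHPRIV_OC_DEBUG;
}
```

With OC 0 the assembled word is 0x7c00021c, which is the value a debugger would patch into guest memory as a software breakpoint under these assumptions.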
[PATCH 0/6 v5] KVM: PPC: Userspace Debug support
From: Bharat Bhushan <bharat.bhus...@freescale.com>

This patchset adds userspace debug support for booke/bookehv.
It has been tested on powerpc e500v2/e500mc devices.

We now assume that the debug resources will not be used by the kernel
for its own debugging; they will be used only for debugging kernel
user processes. So the kernel debug load interface during context_to
is used to load the debug context for the selected process.

v4->v5
 - Some comments reworded and other cleanup (like change of function name etc)

v3->v4
 - 4 out of 7 patches of the initial patchset were applied.
   This patchset is on top of those 4 patches
 - KVM local "struct kvmppc_booke_debug_reg" is replaced by
   powerpc global "struct debug_reg"
 - use switch_booke_debug_regs() for debug register context switch.
 - Save DBSR before kernel pre-emption is enabled.
 - Some more cleanup

v2->v3
 - We are now assuming that the debug resources will not be used by
   the kernel for its own debugging; they will be used only for kernel
   user-process debugging. So the kernel debug load interface during
   context_to is used to load the debug context for the selected process.

v1->v2
 - Debug registers are saved/restored in vcpu_put/vcpu_get.
   Earlier the debug registers were saved/restored in guest entry/exit.

Bharat Bhushan (6):
  powerpc: remove unnecessary line continuations
  powerpc: move debug registers in a structure
  powerpc: export debug register save function for KVM
  KVM: PPC: exit to user space on ehpriv instruction
  KVM: PPC: Using struct debug_reg
  KVM: PPC: Add userspace debug stub support

 arch/powerpc/include/asm/disassemble.h |    4 +
 arch/powerpc/include/asm/kvm_host.h    |   16 +--
 arch/powerpc/include/asm/processor.h   |   38 +++--
 arch/powerpc/include/asm/reg_booke.h   |    8 +-
 arch/powerpc/include/asm/switch_to.h   |    4 +
 arch/powerpc/include/uapi/asm/kvm.h    |   22 ++-
 arch/powerpc/kernel/asm-offsets.c      |    2 +-
 arch/powerpc/kernel/process.c          |   45 +++---
 arch/powerpc/kernel/ptrace.c           |  154 +-
 arch/powerpc/kernel/signal_32.c        |    6 +-
 arch/powerpc/kernel/traps.c            |   35 ++--
 arch/powerpc/kvm/booke.c               |  267
 arch/powerpc/kvm/booke.h               |    5 +
 arch/powerpc/kvm/e500_emulate.c        |   27
 14 files changed, 449 insertions(+), 184 deletions(-)
[PATCH 5/6 v5] KVM: PPC: Using struct debug_reg
For KVM, also use the struct debug_reg defined in asm/processor.h.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
 arch/powerpc/include/asm/kvm_host.h |   13 +------------
 arch/powerpc/kvm/booke.c            |   34 ++++++++++++++++++++++++----------
 2 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index af326cd..838a577 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -381,17 +381,6 @@ struct kvmppc_slb {
 #define KVMPPC_EPR_USER	1 /* exit to userspace to fill EPR */
 #define KVMPPC_EPR_KERNEL	2 /* in-kernel irqchip */
 
-struct kvmppc_booke_debug_reg {
-	u32 dbcr0;
-	u32 dbcr1;
-	u32 dbcr2;
-#ifdef CONFIG_KVM_E500MC
-	u32 dbcr4;
-#endif
-	u64 iac[KVMPPC_BOOKE_MAX_IAC];
-	u64 dac[KVMPPC_BOOKE_MAX_DAC];
-};
-
 #define KVMPPC_IRQ_DEFAULT	0
 #define KVMPPC_IRQ_MPIC		1
 #define KVMPPC_IRQ_XICS		2
@@ -535,7 +524,7 @@ struct kvm_vcpu_arch {
 	u32 eptcfg;
 	u32 epr;
 	u32 crit_save;
-	struct kvmppc_booke_debug_reg dbg_reg;
+	struct debug_reg dbg_reg;
 #endif
 	gpa_t paddr_accessed;
 	gva_t vaddr_accessed;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 62d4ece..3e9fc1d 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1424,7 +1424,6 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 	int r = 0;
 	union kvmppc_one_reg val;
 	int size;
-	long int i;
 
 	size = one_reg_size(reg->id);
 	if (size > sizeof(val))
@@ -1432,16 +1431,24 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 
 	switch (reg->id) {
 	case KVM_REG_PPC_IAC1:
+		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac1);
+		break;
 	case KVM_REG_PPC_IAC2:
+		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac2);
+		break;
+#if CONFIG_PPC_ADV_DEBUG_IACS > 2
 	case KVM_REG_PPC_IAC3:
+		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac3);
+		break;
 	case KVM_REG_PPC_IAC4:
-		i = reg->id - KVM_REG_PPC_IAC1;
-		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac[i]);
+		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac4);
 		break;
+#endif
 	case KVM_REG_PPC_DAC1:
+		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.dac1);
+		break;
 	case KVM_REG_PPC_DAC2:
-		i = reg->id - KVM_REG_PPC_DAC1;
-		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.dac[i]);
+		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.dac2);
 		break;
 	case KVM_REG_PPC_EPR: {
 		u32 epr = get_guest_epr(vcpu);
@@ -1481,7 +1488,6 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 	int r = 0;
 	union kvmppc_one_reg val;
 	int size;
-	long int i;
 
 	size = one_reg_size(reg->id);
 	if (size > sizeof(val))
@@ -1492,16 +1498,24 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 
 	switch (reg->id) {
 	case KVM_REG_PPC_IAC1:
+		vcpu->arch.dbg_reg.iac1 = set_reg_val(reg->id, val);
+		break;
 	case KVM_REG_PPC_IAC2:
+		vcpu->arch.dbg_reg.iac2 = set_reg_val(reg->id, val);
+		break;
+#if CONFIG_PPC_ADV_DEBUG_IACS > 2
 	case KVM_REG_PPC_IAC3:
+		vcpu->arch.dbg_reg.iac3 = set_reg_val(reg->id, val);
+		break;
 	case KVM_REG_PPC_IAC4:
-		i = reg->id - KVM_REG_PPC_IAC1;
-		vcpu->arch.dbg_reg.iac[i] = set_reg_val(reg->id, val);
+		vcpu->arch.dbg_reg.iac4 = set_reg_val(reg->id, val);
 		break;
+#endif
 	case KVM_REG_PPC_DAC1:
+		vcpu->arch.dbg_reg.dac1 = set_reg_val(reg->id, val);
+		break;
 	case KVM_REG_PPC_DAC2:
-		i = reg->id - KVM_REG_PPC_DAC1;
-		vcpu->arch.dbg_reg.dac[i] = set_reg_val(reg->id, val);
+		vcpu->arch.dbg_reg.dac2 = set_reg_val(reg->id, val);
 		break;
 	case KVM_REG_PPC_EPR: {
 		u32 new_epr = set_reg_val(reg->id, val);
--
1.7.0.4
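The patch replaces index arithmetic into iac[]/dac[] arrays with named fields, so IAC3/IAC4 can be compiled out when CONFIG_PPC_ADV_DEBUG_IACS is 2 or less. A minimal standalone sketch of that mapping, assuming hypothetical register IDs (REG_IAC1 and friends stand in for the real KVM_REG_PPC_* values) and a stand-in ADV_DEBUG_IACS macro. Unlike the patch, which has a separate switch in each ioctl, this sketch funnels both directions through one field-lookup helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IDs; the real KVM_REG_PPC_* encodings live in uapi/asm/kvm.h. */
enum reg_id { REG_IAC1, REG_IAC2, REG_IAC3, REG_IAC4, REG_DAC1, REG_DAC2 };

/* Stand-in for CONFIG_PPC_ADV_DEBUG_IACS (number of instruction address compares). */
#define ADV_DEBUG_IACS 4

/* Named fields, in the style of struct debug_reg, instead of iac[]/dac[] arrays. */
struct dbg_regs {
	uint64_t iac1, iac2;
#if ADV_DEBUG_IACS > 2
	uint64_t iac3, iac4;
#endif
	uint64_t dac1, dac2;
};

/* Map an ID to its backing field; NULL if this configuration has no such register. */
static uint64_t *dbg_reg_field(struct dbg_regs *r, enum reg_id id)
{
	switch (id) {
	case REG_IAC1: return &r->iac1;
	case REG_IAC2: return &r->iac2;
#if ADV_DEBUG_IACS > 2
	case REG_IAC3: return &r->iac3;
	case REG_IAC4: return &r->iac4;
#endif
	case REG_DAC1: return &r->dac1;
	case REG_DAC2: return &r->dac2;
	default:       return NULL;
	}
}

/* Both accessors return 0 on success and -1 for an unsupported ID. */
static int set_one_reg(struct dbg_regs *r, enum reg_id id, uint64_t val)
{
	uint64_t *p = dbg_reg_field(r, id);

	if (!p)
		return -1;
	*p = val;
	return 0;
}

static int get_one_reg(struct dbg_regs *r, enum reg_id id, uint64_t *val)
{
	uint64_t *p = dbg_reg_field(r, id);

	if (!p)
		return -1;
	*val = *p;
	return 0;
}
```

Setting ADV_DEBUG_IACS to 2 removes the iac3/iac4 fields, and requests for those IDs then fail instead of indexing past the end of an array.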
[PATCH 2/6 v5] powerpc: move debug registers in a structure
This way we can use the same data type (struct) with KVM, and it also
helps in using other debug-related functions.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
 arch/powerpc/include/asm/processor.h |   38 +
 arch/powerpc/include/asm/reg_booke.h |    8 +-
 arch/powerpc/kernel/asm-offsets.c    |    2 +-
 arch/powerpc/kernel/process.c        |   42 +-
 arch/powerpc/kernel/ptrace.c         |  154 +-
 arch/powerpc/kernel/signal_32.c      |    6 +-
 arch/powerpc/kernel/traps.c          |   35
 7 files changed, 146 insertions(+), 139 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index d7e67ca..5b8a7f1 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -147,22 +147,7 @@ typedef struct {
 #define TS_FPR(i) fpr[i][TS_FPROFFSET]
 #define TS_TRANS_FPR(i) transact_fpr[i][TS_FPROFFSET]
 
-struct thread_struct {
-	unsigned long	ksp;		/* Kernel stack pointer */
-	unsigned long	ksp_limit;	/* if ksp <= ksp_limit stack overflow */
-
-#ifdef CONFIG_PPC64
-	unsigned long	ksp_vsid;
-#endif
-	struct pt_regs	*regs;		/* Pointer to saved register state */
-	mm_segment_t	fs;		/* for get_fs() validation */
-#ifdef CONFIG_BOOKE
-	/* BookE base exception scratch space; align on cacheline */
-	unsigned long	normsave[8] ____cacheline_aligned;
-#endif
-#ifdef CONFIG_PPC32
-	void		*pgdir;		/* root of page-table tree */
-#endif
+struct debug_reg {
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
 	/*
 	 * The following help to manage the use of Debug Control Registers
@@ -199,6 +184,27 @@ struct thread_struct {
 	unsigned long	dvc2;
 #endif
 #endif
+};
+
+struct thread_struct {
+	unsigned long	ksp;		/* Kernel stack pointer */
+	unsigned long	ksp_limit;	/* if ksp <= ksp_limit stack overflow */
+
+#ifdef CONFIG_PPC64
+	unsigned long	ksp_vsid;
+#endif
+	struct pt_regs	*regs;		/* Pointer to saved register state */
+	mm_segment_t	fs;		/* for get_fs() validation */
+#ifdef CONFIG_BOOKE
+	/* BookE base exception scratch space; align on cacheline */
+	unsigned long	normsave[8] ____cacheline_aligned;
+#endif
+#ifdef CONFIG_PPC32
+	void		*pgdir;		/* root of page-table tree */
+#endif
+	/* Debug Registers */
+	struct debug_reg debug;
+
 	/* FP and VSX 0-31 register set */
 	double		fpr[32][TS_FPRWIDTH];
 	struct {
diff --git a/arch/powerpc/include/asm/reg_booke.h b/arch/powerpc/include/asm/reg_booke.h
index b417de3..455dc89 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -381,7 +381,7 @@
 #define DBCR0_IA34T	0x4000	/* Instr Addr 3-4 range Toggle */
 #define DBCR0_FT	0x0001	/* Freeze Timers on debug event */
 
-#define dbcr_iac_range(task)	((task)->thread.dbcr0)
+#define dbcr_iac_range(task)	((task)->thread.debug.dbcr0)
 #define DBCR_IAC12I	DBCR0_IA12			/* Range Inclusive */
 #define DBCR_IAC12X	(DBCR0_IA12 | DBCR0_IA12X)	/* Range Exclusive */
 #define DBCR_IAC12MODE	(DBCR0_IA12 | DBCR0_IA12X)	/* IAC 1-2 Mode Bits */
@@ -395,7 +395,7 @@
 #define DBCR1_DAC1W	0x2000	/* DAC1 Write Debug Event */
 #define DBCR1_DAC2W	0x1000	/* DAC2 Write Debug Event */
 
-#define dbcr_dac(task)	((task)->thread.dbcr1)
+#define dbcr_dac(task)	((task)->thread.debug.dbcr1)
 #define DBCR_DAC1R	DBCR1_DAC1R
 #define DBCR_DAC1W	DBCR1_DAC1W
 #define DBCR_DAC2R	DBCR1_DAC2R
@@ -441,7 +441,7 @@
 #define DBCR0_CRET	0x0020	/* Critical Return Debug Event */
 #define DBCR0_FT	0x0001	/* Freeze Timers on debug event */
 
-#define dbcr_dac(task)	((task)->thread.dbcr0)
+#define dbcr_dac(task)	((task)->thread.debug.dbcr0)
 #define DBCR_DAC1R	DBCR0_DAC1R
 #define DBCR_DAC1W	DBCR0_DAC1W
 #define DBCR_DAC2R	DBCR0_DAC2R
@@ -475,7 +475,7 @@
 #define DBCR1_IAC34MX	0x00C0	/* Instr Addr 3-4 range eXclusive */
 #define DBCR1_IAC34AT	0x0001	/* Instr Addr 3-4 range Toggle */
 
-#define dbcr_iac_range(task)	((task)->thread.dbcr1)
+#define dbcr_iac_range(task)	((task)->thread.debug.dbcr1)
 #define DBCR_IAC12I	DBCR1_IAC12M	/* Range Inclusive */
 #define DBCR_IAC12X	DBCR1_IAC12MX	/* Range Exclusive */
 #define DBCR_IAC12MODE	DBCR1_IAC12MX	/* IAC 1-2 Mode Bits */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index b51a97c..c241c60 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -106,7 +106,7 @@ int main(void)
 #else /* CONFIG_PPC64 */
 	DEFINE(PGDIR, offsetof(struct thread_struct, pgdir));
 #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
-	DEFINE(THREAD_DBCR0, offsetof(struct
[PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support
This patch adds debug stub support on booke/bookehv.
The QEMU debug stub can now use hardware breakpoints, watchpoints and
software breakpoints to debug the guest.

This is how we save/restore the debug register context when switching
between guest, userspace and kernel user-process:

When QEMU is running
 - thread->debug_reg == QEMU debug register context.
 - The kernel handles switching the debug registers on context switch.
 - No vcpu_load() is called.

QEMU makes ioctls (except RUN)
 - This calls vcpu_load(),
 - which should not change the context.
 - Some ioctls can change the vcpu debug registers; that context is
   saved in vcpu->debug_regs.

QEMU makes the RUN ioctl
 - Save thread->debug_reg on the stack
 - Set thread->debug_reg = vcpu->debug_reg
 - Load thread->debug_reg
 - Run the VCPU (so the thread points to the vcpu context)

A context switch happens while the VCPU is running
 - vcpu_load() is called but should not load any context
 - The kernel loads the vcpu context, as thread->debug_regs points to
   the vcpu context.

On heavyweight_exit
 - Load the context saved on the stack into thread->debug_reg

Currently we do not support debug resource emulation for the guest.
On a debug exception we always exit to user space, irrespective of
whether user space is expecting the debug exception or not. If this is
an unexpected exception (a breakpoint/watchpoint event not set by
userspace), we leave the action to user space. This is similar to the
previous behaviour; the only difference is that proper exit state is
now available to user space.
Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
 arch/powerpc/include/asm/kvm_host.h |    3 +
 arch/powerpc/include/uapi/asm/kvm.h |    1 +
 arch/powerpc/kvm/booke.c            |  233 ---
 arch/powerpc/kvm/booke.h            |    5 +
 4 files changed, 224 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 838a577..aeb490d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -524,7 +524,10 @@ struct kvm_vcpu_arch {
 	u32 eptcfg;
 	u32 epr;
 	u32 crit_save;
+	/* guest debug registers*/
 	struct debug_reg dbg_reg;
+	/* hardware visible debug registers when in guest state */
+	struct debug_reg shadow_dbg_reg;
 #endif
 	gpa_t paddr_accessed;
 	gva_t vaddr_accessed;
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index ded0607..f5077c2 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -27,6 +27,7 @@
 #define __KVM_HAVE_PPC_SMT
 #define __KVM_HAVE_IRQCHIP
 #define __KVM_HAVE_IRQ_LINE
+#define __KVM_HAVE_GUEST_DEBUG
 
 struct kvm_regs {
 	__u64 pc;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 3e9fc1d..8be3502 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -133,6 +133,29 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
 #endif
 }
 
+static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu)
+{
+	/* Synchronize guest's desire to get debug interrupts into shadow MSR */
+#ifndef CONFIG_KVM_BOOKE_HV
+	vcpu->arch.shadow_msr &= ~MSR_DE;
+	vcpu->arch.shadow_msr |= vcpu->arch.shared->msr & MSR_DE;
+#endif
+
+	/* Force enable debug interrupts when user space wants to debug */
+	if (vcpu->guest_debug) {
+#ifdef CONFIG_KVM_BOOKE_HV
+		/*
+		 * Since there is no shadow MSR, sync MSR_DE into the guest
+		 * visible MSR.
+		 */
+		vcpu->arch.shared->msr |= MSR_DE;
+#else
+		vcpu->arch.shadow_msr |= MSR_DE;
+		vcpu->arch.shared->msr &= ~MSR_DE;
+#endif
+	}
+}
+
 /*
  * Helper function for full MSR writes.  No need to call this if only
  * EE/CE/ME/DE/RI are changing.
@@ -150,6 +173,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
 	kvmppc_mmu_msr_notify(vcpu, old_msr);
 	kvmppc_vcpu_sync_spe(vcpu);
 	kvmppc_vcpu_sync_fpu(vcpu);
+	kvmppc_vcpu_sync_debug(vcpu);
 }
 
 static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
@@ -655,6 +679,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
 int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
 	int ret, s;
+	struct thread_struct thread;
 #ifdef CONFIG_PPC_FPU
 	unsigned int fpscr;
 	int fpexc_mode;
@@ -698,12 +723,21 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	kvmppc_load_guest_fp(vcpu);
 #endif
 
+	/* Switch to guest debug context */
+	thread.debug = vcpu->arch.shadow_dbg_reg;
+	switch_booke_debug_regs(&thread);
+	thread.debug = current->thread.debug;
+	current->thread.debug = vcpu->arch.shadow_dbg_reg;
+
 	ret = __kvmppc_vcpu_run(kvm_run, vcpu);
 
 	/* No need for kvm_guest_exit. It's done in handle_exit.
 	   We also get here with interrupts enabled. */
 
+	/* Switch back to user space debug context */
+
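The RUN-ioctl sequence described in the commit message (save the user-space context on the stack, point the thread at the vcpu context, run, restore on exit) can be modelled in isolation. This is a simplified, hypothetical sketch: the structs, current_thread, hw, load_debug_regs() and run_guest() are stand-ins rather than kernel APIs, load_debug_regs() reloads unconditionally (the real switch_booke_debug_regs() only reprimes when DBCR0_IDM is set), and the restore step follows the "On heavyweight_exit" description because the patch text above is cut off there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for struct debug_reg, the thread and the vcpu. */
struct debug_reg { uint32_t dbcr0; uint64_t iac1; };
struct thread    { struct debug_reg debug; };
struct vcpu      { struct debug_reg shadow_dbg_reg; };

static struct thread current_thread;   /* plays the role of current->thread */
static struct debug_reg hw;            /* what the debug registers currently hold */
static struct debug_reg seen_by_guest; /* snapshot taken while the "guest" runs */

/* Stand-in for switch_booke_debug_regs(): load a thread's context into hardware. */
static void load_debug_regs(const struct thread *t)
{
	hw = t->debug;
}

/* Stand-in for __kvmppc_vcpu_run(): record what the guest would see. */
static void run_guest(void)
{
	seen_by_guest = hw;
}

static void vcpu_run(struct vcpu *v)
{
	struct thread saved;

	/* Switch to the guest debug context in hardware. */
	saved.debug = v->shadow_dbg_reg;
	load_debug_regs(&saved);
	/* Keep the user-space (QEMU) context on the stack ... */
	saved.debug = current_thread.debug;
	/* ... and point the thread at the vcpu context, so a context switch reloads it. */
	current_thread.debug = v->shadow_dbg_reg;

	run_guest();

	/* Heavyweight exit: restore and reload the saved user-space context. */
	current_thread.debug = saved.debug;
	load_debug_regs(&current_thread);
}
```

While the guest runs, current_thread.debug holds the vcpu context, which is why a context switch in that window reloads the guest's registers rather than QEMU's.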
[PATCH 3/6 v5] powerpc: export debug register save function for KVM
KVM needs this function when switching from the vcpu to the user-space
thread. My subsequent patch will use this function.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
 arch/powerpc/include/asm/switch_to.h |    4 ++++
 arch/powerpc/kernel/process.c        |    3 ++-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 200d763..50b357f 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -30,6 +30,10 @@ extern void enable_kernel_spe(void);
 extern void giveup_spe(struct task_struct *);
 extern void load_up_spe(struct task_struct *);
 
+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+extern void switch_booke_debug_regs(struct thread_struct *new_thread);
+#endif
+
 #ifndef CONFIG_SMP
 extern void discard_lazy_cpu_state(void);
 #else
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 01ff496..3375cb7 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -362,12 +362,13 @@ static void prime_debug_regs(struct thread_struct *thread)
  * debug registers, set the debug registers from the values
  * stored in the new thread.
  */
-static void switch_booke_debug_regs(struct thread_struct *new_thread)
+void switch_booke_debug_regs(struct thread_struct *new_thread)
 {
 	if ((current->thread.debug.dbcr0 & DBCR0_IDM)
 		|| (new_thread->debug.dbcr0 & DBCR0_IDM))
 			prime_debug_regs(new_thread);
 }
+EXPORT_SYMBOL(switch_booke_debug_regs);
 #else	/* !CONFIG_PPC_ADV_DEBUG_REGS */
 #ifndef CONFIG_HAVE_HW_BREAKPOINT
 static void set_debug_reg_defaults(struct thread_struct *thread)
--
1.7.0.4
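The exported function only touches the debug registers when the outgoing or the incoming thread has internal debug mode enabled in DBCR0, so switches between threads that do not debug stay cheap. A minimal standalone model of that condition, assuming DBCR0_IDM is 0x40000000 (the BookE Internal Debug Mode bit) and using hypothetical stand-ins for current->thread and prime_debug_regs(), which here only counts how often it would reload the registers:

```c
#include <assert.h>
#include <stdint.h>

#define DBCR0_IDM 0x40000000u  /* assumed value of the Internal Debug Mode bit */

struct debug_reg     { uint32_t dbcr0; };
struct thread_struct { struct debug_reg debug; };

static struct thread_struct current_thread;  /* plays the role of current->thread */
static int prime_count;                      /* how many times registers were reloaded */

/* Stand-in: the real prime_debug_regs() writes the debug SPRs from t->debug. */
static void prime_debug_regs(const struct thread_struct *t)
{
	(void)t;
	prime_count++;
}

/* Same condition as switch_booke_debug_regs(): reload only if either side debugs. */
static void switch_debug_regs(const struct thread_struct *new_thread)
{
	if ((current_thread.debug.dbcr0 & DBCR0_IDM) ||
	    (new_thread->debug.dbcr0 & DBCR0_IDM))
		prime_debug_regs(new_thread);
}
```

Checking the outgoing thread as well matters: switching away from a debugged thread must clear its breakpoints even if the next thread does not use debug at all.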
Re: [PATCH] kvm tools: fix boot of guests with more than 4gb of ram
On Mon, Jun 24, 2013 at 02:23:31AM +0100, Sasha Levin wrote:
> Commit "kvm tools: virtio: remove hardcoded assumptions about guest page
> size" has introduced a bug that prevented guests with more than 4gb of
> ram from booting.

4GB of memory?!?! ;)

> The issue is that 'pfn' is a 32bit integer, so multiplying it by the
> page size to get the actual page address will overflow if the pfn
> refers to a memory area above 4gb.
>
> Signed-off-by: Sasha Levin <sasha.le...@oracle.com>

Acked-by: Will Deacon <will.dea...@arm.com>

Will
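The overflow is easy to reproduce in isolation. A minimal sketch, assuming a 32-bit pfn, a 32-bit page size and a platform with 32-bit `int` (the function names are hypothetical, not kvmtool's):

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: both operands are 32-bit, so the product wraps modulo 2^32
 * before it is widened to the 64-bit return type. */
static uint64_t sketch_pfn_to_addr_buggy(uint32_t pfn, uint32_t page_size)
{
	return pfn * page_size;
}

/* Fixed: widen one operand first, as the patch does with (u64)queue->pfn. */
static uint64_t sketch_pfn_to_addr_fixed(uint32_t pfn, uint32_t page_size)
{
	return (uint64_t)pfn * page_size;
}
```

With 4KB pages, pfn 0x100000 is the first page at 4GB: the buggy version yields 0 (the start of guest RAM), while the fixed one yields 0x100000000.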
Re: Would a DOS on dovecot running under a VM cause host to crash?
On Fri, Jun 21, 2013 at 10:27:07AM +1200, Hugh Davenport wrote:
> The attack lasted around 4 minutes, in which there were 1161 lines in
> the log for a single attacker ip, and no other similar logs previously.
> Would this be enough to kill not only the VM running dovecot, but the
> underlying host machine?

Have you checked logs on the host?  Specifically /var/log/messages for
seg fault messages or Out-of-Memory Killer messages.

It's also worth checking /var/log/libvirt/qemu/<domain>.log if you are
using libvirt.  That file contains the QEMU stderr output.

Stefan
Re: [PATCH 3/6 v5] powerpc: export debug register save function for KVM
On 24.06.2013, at 11:08, Bharat Bhushan wrote:

> KVM needs this function when switching from a vcpu to a user-space
> thread. My subsequent patch will use this function.
>
> Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
> ---
>  arch/powerpc/include/asm/switch_to.h |    4 ++++
>  arch/powerpc/kernel/process.c        |    3 ++-
>  2 files changed, 6 insertions(+), 1 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
> index 200d763..50b357f 100644
> --- a/arch/powerpc/include/asm/switch_to.h
> +++ b/arch/powerpc/include/asm/switch_to.h
> @@ -30,6 +30,10 @@ extern void enable_kernel_spe(void);
>  extern void giveup_spe(struct task_struct *);
>  extern void load_up_spe(struct task_struct *);
>
> +#ifdef CONFIG_PPC_ADV_DEBUG_REGS
> +extern void switch_booke_debug_regs(struct thread_struct *new_thread);
> +#endif
> +
>  #ifndef CONFIG_SMP
>  extern void discard_lazy_cpu_state(void);
>  #else
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 01ff496..3375cb7 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -362,12 +362,13 @@ static void prime_debug_regs(struct thread_struct *thread)
>   * debug registers, set the debug registers from the values
>   * stored in the new thread.
>   */
> -static void switch_booke_debug_regs(struct thread_struct *new_thread)
> +void switch_booke_debug_regs(struct thread_struct *new_thread)
>  {
>  	if ((current->thread.debug.dbcr0 & DBCR0_IDM)
>  		|| (new_thread->debug.dbcr0 & DBCR0_IDM))
>  			prime_debug_regs(new_thread);
>  }
> +EXPORT_SYMBOL(switch_booke_debug_regs);

EXPORT_SYMBOL_GPL?

Alex

>  #else	/* !CONFIG_PPC_ADV_DEBUG_REGS */
>  #ifndef CONFIG_HAVE_HW_BREAKPOINT
>  static void set_debug_reg_defaults(struct thread_struct *thread)
> -- 
> 1.7.0.4
Re: [PATCH] kvm tools: fix boot of guests with more than 4gb of ram
24.06.2013 05:23, Sasha Levin wrote:
>  	queue		= p9dev->vqs[vq];
>  	queue->pfn	= pfn;
> -	p		= guest_flat_to_host(kvm, queue->pfn * page_size);
> +	p		= guest_flat_to_host(kvm, (u64)queue->pfn * page_size);

Maybe it's worth using a common function for this, something like
guest_queue_to_host(kvm, queue)?

Thanks,

/mjt
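One way the suggested helper could look, as a self-contained sketch. The stub types `sketch_kvm` and `sketch_virt_queue`, the bounds check in the stand-in for `guest_flat_to_host()`, and the extra `page_size` parameter are all assumptions made for illustration, not kvmtool's actual definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kvmtool's types (only what the sketch needs). */
struct sketch_kvm {
	uint8_t *ram_start;  /* host mapping of guest physical address 0 */
	uint64_t ram_size;
};

struct sketch_virt_queue {
	uint32_t pfn;        /* 32-bit, as in the bug being fixed */
};

/* Stand-in for guest_flat_to_host(); the bounds check is an assumption. */
static void *sketch_guest_flat_to_host(struct sketch_kvm *kvm, uint64_t offset)
{
	if (offset >= kvm->ram_size)
		return NULL;
	return kvm->ram_start + offset;
}

/*
 * The helper suggested in the reply. The name comes from the mail; the
 * page_size parameter is an assumption. Widening pfn in one place means
 * no caller can forget the cast.
 */
static void *guest_queue_to_host(struct sketch_kvm *kvm,
				 const struct sketch_virt_queue *queue,
				 uint32_t page_size)
{
	return sketch_guest_flat_to_host(kvm, (uint64_t)queue->pfn * page_size);
}
```

With the widening inside the helper, a pfn beyond the (tiny) fake RAM is rejected instead of silently wrapping to offset 0.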
RE: [PATCH 3/6 v5] powerpc: export debug register save function for KVM
> -----Original Message-----
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, June 24, 2013 3:03 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421;
> tiejun.c...@windriver.com; Bhushan Bharat-R65777
> Subject: Re: [PATCH 3/6 v5] powerpc: export debug register save function for KVM
>
>
> On 24.06.2013, at 11:08, Bharat Bhushan wrote:
>
> > KVM needs this function when switching from a vcpu to a user-space
> > thread. My subsequent patch will use this function.
> >
> > Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
> > ---
> >  arch/powerpc/include/asm/switch_to.h |    4 ++++
> >  arch/powerpc/kernel/process.c        |    3 ++-
> >  2 files changed, 6 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
> > index 200d763..50b357f 100644
> > --- a/arch/powerpc/include/asm/switch_to.h
> > +++ b/arch/powerpc/include/asm/switch_to.h
> > @@ -30,6 +30,10 @@ extern void enable_kernel_spe(void);
> >  extern void giveup_spe(struct task_struct *);
> >  extern void load_up_spe(struct task_struct *);
> >
> > +#ifdef CONFIG_PPC_ADV_DEBUG_REGS
> > +extern void switch_booke_debug_regs(struct thread_struct *new_thread);
> > +#endif
> > +
> >  #ifndef CONFIG_SMP
> >  extern void discard_lazy_cpu_state(void);
> >  #else
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index 01ff496..3375cb7 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -362,12 +362,13 @@ static void prime_debug_regs(struct thread_struct *thread)
> >   * debug registers, set the debug registers from the values
> >   * stored in the new thread.
> >   */
> > -static void switch_booke_debug_regs(struct thread_struct *new_thread)
> > +void switch_booke_debug_regs(struct thread_struct *new_thread)
> >  {
> >  	if ((current->thread.debug.dbcr0 & DBCR0_IDM)
> >  		|| (new_thread->debug.dbcr0 & DBCR0_IDM))
> >  			prime_debug_regs(new_thread);
> >  }
> > +EXPORT_SYMBOL(switch_booke_debug_regs);
>
> EXPORT_SYMBOL_GPL?

Oops, I missed this comment. Will correct in the next version.
-Bharat

>
>
> Alex
>
> >  #else	/* !CONFIG_PPC_ADV_DEBUG_REGS */
> >  #ifndef CONFIG_HAVE_HW_BREAKPOINT
> >  static void set_debug_reg_defaults(struct thread_struct *thread)
> > -- 
> > 1.7.0.4
Re: [PATCH kvm-unit-tests 1/4] x86-run: correct a typo 'qemsystem' -> 'qemusystem'
On Mon, Jun 24, 2013 at 08:47:36AM +, Ren, Yongjie wrote:
> x86-run: correct a typo 'qemsystem' -> 'qemusystem'
>
> Before this fix, you should always get error info as below when running
> 'x86-run' script.
> QEMU binary has no support for test device. Exiting.
>
> Signed-off-by: Yongjie Ren <yongjie@intel.com>
Applied all four. Thanks.

> ---
>  x86-run |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/x86-run b/x86-run
> index 2cf1f38..9526a0b 100755
> --- a/x86-run
> +++ b/x86-run
> @@ -8,7 +8,7 @@ then
>  	qemu="${qemukvm}"
>  else
>  	if
> -		${qemsystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
> +		${qemusystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
>  	then
>  		qemu="${qemusystem}"
>  	else
> -- 
> 1.7.9.5

--
			Gleb.
Re: Would a DOS on dovecot running under a VM cause host to crash?
Checked the main logs. No go. Didn't check qemu logs. Will do that.

I'm starting to think it was the power: when I turned off the UPS as a
test, the server shut down as well... Will get that fixed.

Cheers,

Hugh

Stefan Hajnoczi <stefa...@gmail.com> wrote:
>On Fri, Jun 21, 2013 at 10:27:07AM +1200, Hugh Davenport wrote:
>> The attack lasted around 4 minutes, in which there were 1161 lines in
>> the log for a single attacker ip, and no other similar logs previously.
>> Would this be enough to kill not only the VM running dovecot, but the
>> underlying host machine?
>
>Have you checked logs on the host?  Specifically /var/log/messages for
>seg fault messages or Out-of-Memory Killer messages.
>
>It's also worth checking /var/log/libvirt/qemu/<domain>.log if you are
>using libvirt.  That file contains the QEMU stderr output.
>
>Stefan

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Re: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support
On 24.06.2013, at 11:08, Bharat Bhushan wrote:

> This patch adds the debug stub support on booke/bookehv.
> Now QEMU debug stub can use hw breakpoint, watchpoint and
> software breakpoint to debug guest.
>
> This is how we save/restore debug register context when switching
> between guest, userspace and kernel user-process:
>
> When QEMU is running
>  - thread->debug_reg == QEMU debug register context.
>  - Kernel will handle switching the debug register on context switch.
>  - no vcpu_load() called
>
> QEMU makes ioctls (except RUN)
>  - This will call vcpu_load()
>  - should not change context.
>  - Some ioctls can change vcpu debug register, context saved in vcpu->debug_regs
>
> QEMU Makes RUN ioctl
>  - Save thread->debug_reg on STACK
>  - Store thread->debug_reg == vcpu->debug_reg
>  - load thread->debug_reg
>  - RUN VCPU ( So thread points to vcpu context )
>
> Context switch happens When VCPU running
>  - makes vcpu_load() should not load any context
>  - kernel loads the vcpu context as thread->debug_regs points to vcpu context.
>
> On heavyweight_exit
>  - Load the context saved on stack in thread->debug_reg
>
> Currently we do not support debug resource emulation to guest,
> On debug exception, always exit to user space irrespective of
> user space is expecting the debug exception or not. If this is
> unexpected exception (breakpoint/watchpoint event not set by
> userspace) then let us leave the action on user space. This
> is similar to what it was before, only thing is that now we
> have proper exit state available to user space.
>
> Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
> ---
>  arch/powerpc/include/asm/kvm_host.h |    3 +
>  arch/powerpc/include/uapi/asm/kvm.h |    1 +
>  arch/powerpc/kvm/booke.c            |  233 ---
>  arch/powerpc/kvm/booke.h            |    5 +
>  4 files changed, 224 insertions(+), 18 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 838a577..aeb490d 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -524,7 +524,10 @@ struct kvm_vcpu_arch {
>  	u32 eptcfg;
>  	u32 epr;
>  	u32 crit_save;
> +	/* guest debug registers*/
>  	struct debug_reg dbg_reg;
> +	/* hardware visible debug registers when in guest state */
> +	struct debug_reg shadow_dbg_reg;
>  #endif
>  	gpa_t paddr_accessed;
>  	gva_t vaddr_accessed;
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> index ded0607..f5077c2 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -27,6 +27,7 @@
>  #define __KVM_HAVE_PPC_SMT
>  #define __KVM_HAVE_IRQCHIP
>  #define __KVM_HAVE_IRQ_LINE
> +#define __KVM_HAVE_GUEST_DEBUG
>
>  struct kvm_regs {
>  	__u64 pc;
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 3e9fc1d..8be3502 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -133,6 +133,29 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
>  #endif
>  }
>
> +static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu)
> +{
> +	/* Synchronize guest's desire to get debug interrupts into shadow MSR */
> +#ifndef CONFIG_KVM_BOOKE_HV
> +	vcpu->arch.shadow_msr &= ~MSR_DE;
> +	vcpu->arch.shadow_msr |= vcpu->arch.shared->msr & MSR_DE;
> +#endif
> +
> +	/* Force enable debug interrupts when user space wants to debug */
> +	if (vcpu->guest_debug) {
> +#ifdef CONFIG_KVM_BOOKE_HV
> +		/*
> +		 * Since there is no shadow MSR, sync MSR_DE into the guest
> +		 * visible MSR.
> +		 */
> +		vcpu->arch.shared->msr |= MSR_DE;
> +#else
> +		vcpu->arch.shadow_msr |= MSR_DE;
> +		vcpu->arch.shared->msr &= ~MSR_DE;
> +#endif
> +	}
> +}
> +
>  /*
>   * Helper function for full MSR writes.  No need to call this if only
>   * EE/CE/ME/DE/RI are changing.
> @@ -150,6 +173,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
>  	kvmppc_mmu_msr_notify(vcpu, old_msr);
>  	kvmppc_vcpu_sync_spe(vcpu);
>  	kvmppc_vcpu_sync_fpu(vcpu);
> +	kvmppc_vcpu_sync_debug(vcpu);
>  }
>
>  static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
> @@ -655,6 +679,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
>  int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
>  {
>  	int ret, s;
> +	struct thread_struct thread;
>  #ifdef CONFIG_PPC_FPU
>  	unsigned int fpscr;
>  	int fpexc_mode;
> @@ -698,12 +723,21 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
>  	kvmppc_load_guest_fp(vcpu);
>  #endif
>
> +	/* Switch to guest debug context */
> +	thread.debug = vcpu->arch.shadow_dbg_reg;
> +	switch_booke_debug_regs(&thread);
> +	thread.debug = current->thread.debug;
> +	current->thread.debug = vcpu->arch.shadow_dbg_reg;
>  	ret = __kvmppc_vcpu_run(kvm_run, vcpu);
>
>  	/* No need for kvm_guest_exit. It's done in handle_exit.
>  	   We also get here with interrupts enabled. */
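The non-HV branch of `kvmppc_vcpu_sync_debug()` in the quoted diff can be modelled as a pure function on two MSR words. This is a userspace sketch with hypothetical types; the `MSR_DE` value of 0x200 is an assumption for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of MSR[DE] (debug interrupt enable) on BookE. */
#define SKETCH_MSR_DE 0x200u

/* Hypothetical reduction of the vCPU state the function touches. */
struct sketch_vcpu {
	uint32_t shadow_msr;  /* MSR actually used while the guest runs */
	uint32_t shared_msr;  /* MSR as the guest sees it */
	bool guest_debug;     /* userspace (e.g. QEMU's debug stub) is debugging */
};

/* Non-HV (#ifndef CONFIG_KVM_BOOKE_HV) branch of kvmppc_vcpu_sync_debug(). */
static void sketch_sync_debug(struct sketch_vcpu *vcpu)
{
	/* Copy the guest's own DE setting into the shadow MSR. */
	vcpu->shadow_msr &= ~SKETCH_MSR_DE;
	vcpu->shadow_msr |= vcpu->shared_msr & SKETCH_MSR_DE;

	/* Userspace debugging overrides it: force DE on in hardware and
	 * clear it in the guest-visible copy, as the quoted diff does. */
	if (vcpu->guest_debug) {
		vcpu->shadow_msr |= SKETCH_MSR_DE;
		vcpu->shared_msr &= ~SKETCH_MSR_DE;
	}
}
```

Without `guest_debug` the shadow MSR simply follows the guest; with it, debug interrupts always reach the host regardless of what the guest wrote.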
RE: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support
> -----Original Message-----
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, June 24, 2013 4:13 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421;
> tiejun.c...@windriver.com; Bhushan Bharat-R65777
> Subject: Re: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support
>
>
> On 24.06.2013, at 11:08, Bharat Bhushan wrote:
>
> > This patch adds the debug stub support on booke/bookehv.
> > Now QEMU debug stub can use hw breakpoint, watchpoint and
> > software breakpoint to debug guest.
> >
> > This is how we save/restore debug register context when switching
> > between guest, userspace and kernel user-process:
> >
> > When QEMU is running
> >  - thread->debug_reg == QEMU debug register context.
> >  - Kernel will handle switching the debug register on context switch.
> >  - no vcpu_load() called
> >
> > QEMU makes ioctls (except RUN)
> >  - This will call vcpu_load()
> >  - should not change context.
> >  - Some ioctls can change vcpu debug register, context saved in vcpu->debug_regs
> >
> > QEMU Makes RUN ioctl
> >  - Save thread->debug_reg on STACK
> >  - Store thread->debug_reg == vcpu->debug_reg
> >  - load thread->debug_reg
> >  - RUN VCPU ( So thread points to vcpu context )
> >
> > Context switch happens When VCPU running
> >  - makes vcpu_load() should not load any context
> >  - kernel loads the vcpu context as thread->debug_regs points to vcpu context.
> >
> > On heavyweight_exit
> >  - Load the context saved on stack in thread->debug_reg
> >
> > Currently we do not support debug resource emulation to guest,
> > On debug exception, always exit to user space irrespective of
> > user space is expecting the debug exception or not. If this is
> > unexpected exception (breakpoint/watchpoint event not set by
> > userspace) then let us leave the action on user space. This
> > is similar to what it was before, only thing is that now we
> > have proper exit state available to user space.
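The save/restore sequence in the patch description can be traced with a small model. Everything here is hypothetical: the reduced register set, the `sketch_hw` variable standing in for the debug SPRs, and the unconditional reload (the real `switch_booke_debug_regs()` only reprograms hardware when DBCR0[IDM] is set):

```c
#include <assert.h>

/* Hypothetical, heavily reduced stand-ins for struct debug_reg and
 * struct thread_struct. */
struct sketch_debug_reg {
	unsigned int dbcr0;
	unsigned long iac1;  /* one instruction address compare register */
};

struct sketch_thread {
	struct sketch_debug_reg debug;
};

/* Stands in for the debug SPRs currently programmed into the CPU. */
static struct sketch_debug_reg sketch_hw;

static void sketch_load_hw(const struct sketch_debug_reg *regs)
{
	/* Simplification: the real code only reloads when DBCR0[IDM] is set. */
	sketch_hw = *regs;
}

/* A fake guest that reports the IAC1 value it saw in hardware. */
static long sketch_fake_guest(void)
{
	return (long)sketch_hw.iac1;
}

/* Same ordering as the debug-context code added to kvmppc_vcpu_run(). */
static long sketch_vcpu_run(struct sketch_thread *task,
			    const struct sketch_debug_reg *vcpu_shadow,
			    long (*run_guest)(void))
{
	struct sketch_debug_reg saved;
	long ret;

	sketch_load_hw(vcpu_shadow);  /* program the vCPU's registers */
	saved = task->debug;          /* save QEMU's context on the stack */
	task->debug = *vcpu_shadow;   /* a host context switch now restores vCPU state */

	ret = run_guest();

	sketch_load_hw(&saved);       /* heavyweight exit: back to QEMU's registers */
	task->debug = saved;
	return ret;
}
```

Pointing the thread's debug context at the vCPU values while the guest runs is what lets an ordinary host context switch restore the right registers without any extra hook in `vcpu_load()`.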
Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021
On 23.06.2013 19:36, Gleb Natapov wrote:
> On Sun, Jun 23, 2013 at 06:51:30PM +0200, Stefan Pietsch wrote:
>> On 23.06.2013 09:51, Gleb Natapov wrote:
>>> On Thu, Jun 20, 2013 at 07:01:49PM +0200, Stefan Pietsch wrote:
>>>>> Can you provide the output of 25391454e73e3156202264eb3c473825afe4bc94
>>>>> and emulate_invalid_guest_state=0. Also run x/20i $pc-20 in qemu
>>>>> monitor after the hang.
>>>>
>>>> 25391454e73e3156202264eb3c473825afe4bc94
>>>>
>>>> emulate_invalid_guest_state=0
>>>
>>> Very interesting. Looks like somewhere during TPR access FS register
>>> gets corrupted. Can you remove /usr/share/kvm/kvmvapic.bin and try
>>> again? This will disable some code paths during TPR access and will
>>> narrow down the issue.
>>
>> Doing this, qemu complains "Could not open option rom 'kvmvapic.bin': No
>> such file or directory", but the virtual machine boots successfully with
>> emulate_invalid_guest_state=0 and emulate_invalid_guest_state=1.
>
> Hmm, I think we are close. Can you try with upstream qemu?

kvmvapic.bin comes with Debian package seabios 1.7.2-3.
I already tried this with the Debian package qemu-kvm 1.5.0+dfsg-4.
Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021
On Mon, Jun 24, 2013 at 01:43:26PM +0200, Stefan Pietsch wrote:
> On 23.06.2013 19:36, Gleb Natapov wrote:
>> On Sun, Jun 23, 2013 at 06:51:30PM +0200, Stefan Pietsch wrote:
>>> On 23.06.2013 09:51, Gleb Natapov wrote:
>>>> On Thu, Jun 20, 2013 at 07:01:49PM +0200, Stefan Pietsch wrote:
>>>>>> Can you provide the output of 25391454e73e3156202264eb3c473825afe4bc94
>>>>>> and emulate_invalid_guest_state=0. Also run x/20i $pc-20 in qemu
>>>>>> monitor after the hang.
>>>>>
>>>>> 25391454e73e3156202264eb3c473825afe4bc94
>>>>>
>>>>> emulate_invalid_guest_state=0
>>>>
>>>> Very interesting. Looks like somewhere during TPR access FS register
>>>> gets corrupted. Can you remove /usr/share/kvm/kvmvapic.bin and try
>>>> again? This will disable some code paths during TPR access and will
>>>> narrow down the issue.
>>>
>>> Doing this, qemu complains "Could not open option rom 'kvmvapic.bin': No
>>> such file or directory", but the virtual machine boots successfully with
>>> emulate_invalid_guest_state=0 and emulate_invalid_guest_state=1.
>>
>> Hmm, I think we are close. Can you try with upstream qemu?
>
> kvmvapic.bin comes with Debian package seabios 1.7.2-3.
> I already tried this with the Debian package qemu-kvm 1.5.0+dfsg-4.
>
And it didn't work?

Mind trying some debug kernel patches? I suspect your CPU does something
no CPU I have does, so I want to verify it.

--
			Gleb.
Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021
On 24.06.2013 13:47, Gleb Natapov wrote:
> On Mon, Jun 24, 2013 at 01:43:26PM +0200, Stefan Pietsch wrote:
>> On 23.06.2013 19:36, Gleb Natapov wrote:
>>> On Sun, Jun 23, 2013 at 06:51:30PM +0200, Stefan Pietsch wrote:
>>>> On 23.06.2013 09:51, Gleb Natapov wrote:
>>>>> On Thu, Jun 20, 2013 at 07:01:49PM +0200, Stefan Pietsch wrote:
>>>>>>> Can you provide the output of 25391454e73e3156202264eb3c473825afe4bc94
>>>>>>> and emulate_invalid_guest_state=0. Also run x/20i $pc-20 in qemu
>>>>>>> monitor after the hang.
>>>>>>
>>>>>> 25391454e73e3156202264eb3c473825afe4bc94
>>>>>>
>>>>>> emulate_invalid_guest_state=0
>>>>>
>>>>> Very interesting. Looks like somewhere during TPR access FS register
>>>>> gets corrupted. Can you remove /usr/share/kvm/kvmvapic.bin and try
>>>>> again? This will disable some code paths during TPR access and will
>>>>> narrow down the issue.
>>>>
>>>> Doing this, qemu complains "Could not open option rom 'kvmvapic.bin': No
>>>> such file or directory", but the virtual machine boots successfully with
>>>> emulate_invalid_guest_state=0 and emulate_invalid_guest_state=1.
>>>
>>> Hmm, I think we are close. Can you try with upstream qemu?
>>
>> kvmvapic.bin comes with Debian package seabios 1.7.2-3.
>> I already tried this with the Debian package qemu-kvm 1.5.0+dfsg-4.
>>
> And it didn't work?
>
> Mind trying some debug kernel patches? I suspect your CPU does something
> no CPU I have does, so I want to verify it.

As soon as I remove kvmvapic.bin the virtual machine boots with
qemu-kvm 1.5.0. I just verified this with Linux kernel 3.10.0-rc5.
emulate_invalid_guest_state=0 or emulate_invalid_guest_state=1 make
no difference.

Please send your patches.
Re: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support
On 24.06.2013, at 13:22, Bhushan Bharat-R65777 wrote:

>> -----Original Message-----
>> From: Alexander Graf [mailto:ag...@suse.de]
>> Sent: Monday, June 24, 2013 4:13 PM
>> To: Bhushan Bharat-R65777
>> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421;
>> tiejun.c...@windriver.com; Bhushan Bharat-R65777
>> Subject: Re: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support
>>
>>
>> On 24.06.2013, at 11:08, Bharat Bhushan wrote:
>>
>>> This patch adds the debug stub support on booke/bookehv.
>>> Now QEMU debug stub can use hw breakpoint, watchpoint and
>>> software breakpoint to debug guest.
>>>
>>> This is how we save/restore debug register context when switching
>>> between guest, userspace and kernel user-process:
>>>
>>> When QEMU is running
>>>  - thread->debug_reg == QEMU debug register context.
>>>  - Kernel will handle switching the debug register on context switch.
>>>  - no vcpu_load() called
>>>
>>> QEMU makes ioctls (except RUN)
>>>  - This will call vcpu_load()
>>>  - should not change context.
>>>  - Some ioctls can change vcpu debug register, context saved in vcpu->debug_regs
>>>
>>> QEMU Makes RUN ioctl
>>>  - Save thread->debug_reg on STACK
>>>  - Store thread->debug_reg == vcpu->debug_reg
>>>  - load thread->debug_reg
>>>  - RUN VCPU ( So thread points to vcpu context )
>>>
>>> Context switch happens When VCPU running
>>>  - makes vcpu_load() should not load any context
>>>  - kernel loads the vcpu context as thread->debug_regs points to vcpu context.
>>>
>>> On heavyweight_exit
>>>  - Load the context saved on stack in thread->debug_reg
>>>
>>> Currently we do not support debug resource emulation to guest,
>>> On debug exception, always exit to user space irrespective of
>>> user space is expecting the debug exception or not. If this is
>>> unexpected exception (breakpoint/watchpoint event not set by
>>> userspace) then let us leave the action on user space. This
>>> is similar to what it was before, only thing is that now we
>>> have proper exit state available to user space.
[PATCH] KVM: Fix RTC interrupt coalescing tracking
This reverts most of commit f1ed0450a5fac7067590317cbf027f566b6ccbca.
After that commit, kvm_apic_set_irq() no longer returns accurate
information about interrupt injection status if injection is done into
a disabled APIC. RTC interrupt coalescing tracking relies on this
information being accurate and cannot recover if it is not.

Signed-off-by: Gleb Natapov <g...@redhat.com>
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9d75193..9f4bea8 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -405,17 +405,17 @@ int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
 	return highest_irr;
 }
 
-static void __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
-			      int vector, int level, int trig_mode,
-			      unsigned long *dest_map);
+static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
+			     int vector, int level, int trig_mode,
+			     unsigned long *dest_map);
 
-void kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
-		      unsigned long *dest_map)
+int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
+		     unsigned long *dest_map)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
-	__apic_accept_irq(apic, irq->delivery_mode, irq->vector,
-			  irq->level, irq->trig_mode, dest_map);
+	return __apic_accept_irq(apic, irq->delivery_mode, irq->vector,
+			irq->level, irq->trig_mode, dest_map);
 }
 
 static int pv_eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
@@ -608,8 +608,7 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 	*r = -1;
 
 	if (irq->shorthand == APIC_DEST_SELF) {
-		kvm_apic_set_irq(src->vcpu, irq, dest_map);
-		*r = 1;
+		*r = kvm_apic_set_irq(src->vcpu, irq, dest_map);
 		return true;
 	}
 
@@ -654,8 +653,7 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 			continue;
 		if (*r < 0)
 			*r = 0;
-		kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
-		*r += 1;
+		*r += kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
 	}
 
 	ret = true;
@@ -664,11 +662,15 @@ out:
 	return ret;
 }
 
-/* Set an IRQ pending in the lapic. */
-static void __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
-			      int vector, int level, int trig_mode,
-			      unsigned long *dest_map)
+/*
+ * Add a pending IRQ into lapic.
+ * Return 1 if successfully added and 0 if discarded.
+ */
+static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
+			     int vector, int level, int trig_mode,
+			     unsigned long *dest_map)
 {
+	int result = 0;
 	struct kvm_vcpu *vcpu = apic->vcpu;
 
 	switch (delivery_mode) {
@@ -682,10 +684,13 @@ static void __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 		if (dest_map)
 			__set_bit(vcpu->vcpu_id, dest_map);
 
-		if (kvm_x86_ops->deliver_posted_interrupt)
+		if (kvm_x86_ops->deliver_posted_interrupt) {
+			result = 1;
 			kvm_x86_ops->deliver_posted_interrupt(vcpu, vector);
-		else {
-			if (apic_test_and_set_irr(vector, apic)) {
+		} else {
+			result = !apic_test_and_set_irr(vector, apic);
+
+			if (!result) {
 				if (trig_mode)
 					apic_debug("level trig mode repeatedly for "
 						"vector %d", vector);
@@ -697,7 +702,7 @@ static void __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 		}
 out:
 		trace_kvm_apic_accept_irq(vcpu->vcpu_id, delivery_mode,
-				trig_mode, vector, false);
+				trig_mode, vector, !result);
 		break;
 
 	case APIC_DM_REMRD:
@@ -709,12 +714,14 @@ out:
 		break;
 
 	case APIC_DM_NMI:
+		result = 1;
 		kvm_inject_nmi(vcpu);
 		kvm_vcpu_kick(vcpu);
 		break;
 
 	case APIC_DM_INIT:
 		if (!trig_mode || level) {
+			result = 1;
 			/* assumes that there are only KVM_APIC_INIT/SIPI */
 			apic->pending_events = (1UL << KVM_APIC_INIT);
 			/* make sure pending_events is visible before sending
@@ -731,6 +738,7 @@ out:
 	case APIC_DM_STARTUP:
 		apic_debug("SIPI to vcpu %d vector 0x%02x\n",
 			   vcpu->vcpu_id, vector);
+		result = 1;
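The return-value contract restored by the patch ("1 if successfully added and 0 if discarded") and the way the delivery loop sums it can be modelled in a few lines. A sketch with hypothetical names, covering only the fixed-mode, non-posted-interrupt path where a vector already pending in the IRR coalesces:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a local APIC's 256-bit IRR. */
struct sketch_lapic {
	uint64_t irr[4];
};

/* Like apic_test_and_set_irr(): set the bit, report whether it was set. */
static bool sketch_test_and_set_irr(struct sketch_lapic *apic, int vector)
{
	uint64_t mask = UINT64_C(1) << (vector % 64);
	bool was_set = (apic->irr[vector / 64] & mask) != 0;

	apic->irr[vector / 64] |= mask;
	return was_set;
}

/* Fixed-mode, non-posted path of the patched __apic_accept_irq():
 * 1 if newly queued, 0 if it coalesced with an already pending one. */
static int sketch_accept_irq(struct sketch_lapic *apic, int vector)
{
	assert(vector >= 0 && vector < 256);
	return !sketch_test_and_set_irr(apic, vector);
}

/* Shape of the destination loop in kvm_irq_delivery_to_apic_fast():
 * -1 means no destination, otherwise the number of APICs that accepted. */
static int sketch_deliver(struct sketch_lapic *dst, int ndst, int vector)
{
	int r = -1;

	for (int i = 0; i < ndst; i++) {
		if (r < 0)
			r = 0;
		r += sketch_accept_irq(&dst[i], vector);
	}
	return r;
}
```

Before the fix the loop added 1 unconditionally, so a second delivery of the same still-pending vector reported 2 instead of 0, and coalescing tracking built on that count could not see the dropped tick.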
Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021
On Mon, Jun 24, 2013 at 01:59:34PM +0200, Stefan Pietsch wrote:
> On 24.06.2013 13:47, Gleb Natapov wrote:
>> On Mon, Jun 24, 2013 at 01:43:26PM +0200, Stefan Pietsch wrote:
>>> On 23.06.2013 19:36, Gleb Natapov wrote:
>>>> On Sun, Jun 23, 2013 at 06:51:30PM +0200, Stefan Pietsch wrote:
>>>>> On 23.06.2013 09:51, Gleb Natapov wrote:
>>>>>> On Thu, Jun 20, 2013 at 07:01:49PM +0200, Stefan Pietsch wrote:
>>>>>>>> Can you provide the output of 25391454e73e3156202264eb3c473825afe4bc94
>>>>>>>> and emulate_invalid_guest_state=0. Also run x/20i $pc-20 in qemu
>>>>>>>> monitor after the hang.
>>>>>>>
>>>>>>> 25391454e73e3156202264eb3c473825afe4bc94
>>>>>>>
>>>>>>> emulate_invalid_guest_state=0
>>>>>>
>>>>>> Very interesting. Looks like somewhere during TPR access FS register
>>>>>> gets corrupted. Can you remove /usr/share/kvm/kvmvapic.bin and try
>>>>>> again? This will disable some code paths during TPR access and will
>>>>>> narrow down the issue.
>>>>>
>>>>> Doing this, qemu complains "Could not open option rom 'kvmvapic.bin': No
>>>>> such file or directory", but the virtual machine boots successfully with
>>>>> emulate_invalid_guest_state=0 and emulate_invalid_guest_state=1.
>>>>
>>>> Hmm, I think we are close. Can you try with upstream qemu?
>>>
>>> kvmvapic.bin comes with Debian package seabios 1.7.2-3.
>>> I already tried this with the Debian package qemu-kvm 1.5.0+dfsg-4.
>>>
>> And it didn't work?
>>
>> Mind trying some debug kernel patches? I suspect your CPU does something
>> no CPU I have does, so I want to verify it.
>
> As soon as I remove kvmvapic.bin the virtual machine boots with
> qemu-kvm 1.5.0. I just verified this with Linux kernel 3.10.0-rc5.
> emulate_invalid_guest_state=0 or emulate_invalid_guest_state=1 make
> no difference.
>
> Please send your patches.
>
Here it is; run with it and with kvmvapic.bin present. See what is
printed in dmesg after the failure.
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f4a5b3f..65488a4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3385,6 +3385,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 ar;
+	unsigned long rip;
 	if (vmx->rmode.vm86_active && seg != VCPU_SREG_LDTR) {
 		*var = vmx->rmode.segs[seg];
@@ -3408,6 +3409,9 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
 	var->db = (ar >> 14) & 1;
 	var->g = (ar >> 15) & 1;
 	var->unusable = (ar >> 16) & 1;
+	rip = kvm_rip_read(vcpu);
+	if ((rip == 0xc101611c || rip == 0xc101611a) && seg == VCPU_SREG_FS)
+		printk("base=%p limit=%p selector=%x ar=%x\n", var->base, var->limit, var->selector, ar);
 }
 static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
--
			Gleb.
[PATCH RFC V10 0/18] Paravirtualized ticket spinlocks
This series replaces the existing paravirtualized spinlock mechanism with a paravirtualized ticketlock mechanism. The series provides implementations for both Xen and KVM.

Changes in V10:
Addressed Konrad's review comments:
- Added break in patch 5 since we now know the exact CPU to wake up
- Dropped patch 12; Konrad needs to revert two patches (70dd4998, f10cd522c) to enable Xen on HVM
- Removed TIMEOUT and corrected spacing in patch 15
- Fixed spelling and corrected spacing in patches 17 and 18

Changes in V9:
- Changed spin_threshold to 32k to avoid excess halt exits that were causing undercommit degradation (after the PLE handler improvement).
- Added kvm_irq_delivery_to_apic (suggested by Gleb)
- Optimized the halt exit path to use the PLE handler

V8 of PV spinlock was posted last year. Following Avi's suggestion to look at PLE handler improvements, various optimizations in PLE handling have been tried. With this series we see a little further improvement on top of those.

Ticket locks have an inherent problem in a virtualized case, because the vCPUs are scheduled rather than running concurrently (ignoring gang-scheduled vCPUs). This can result in catastrophic performance collapses when the vCPU scheduler doesn't schedule the correct next vCPU, and ends up scheduling a vCPU which burns its entire timeslice spinning. (Note that this is not the same problem as lock-holder preemption, which this series also addresses; that's also a problem, but not catastrophic.)

(See Thomas Friebel's talk "Prevent Guests from Spinning Around" http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

Currently we deal with this by having PV spinlocks, which add a layer of indirection in front of all the spinlock functions, and defining a completely new implementation for Xen (and for other pvops users, but there are none at present).
PV ticketlocks keep the existing ticketlock implementation (fastpath) as-is, but add a couple of pvops for the slow paths:

- If a CPU has been waiting for a spinlock for SPIN_THRESHOLD iterations, then call out to the __ticket_lock_spinning() pvop, which allows a backend to block the vCPU rather than spinning. This pvop can set the lock into slowpath state.

- When releasing a lock, if it is in slowpath state, then call __ticket_unlock_kick() to kick the next vCPU in line awake. If the lock is no longer in contention, it also clears the slowpath flag.

The slowpath state is stored in the LSB of the lock's tail ticket. This halves the maximum number of CPUs (so a small ticket can deal with 128 CPUs, and a large ticket with 32768).

For KVM, one hypercall is introduced in the hypervisor that allows a vCPU to kick another vCPU out of halt state. The blocking of a vCPU is done using halt() in the (lock_spinning) slowpath.

Overall, it results in a large reduction in code, it makes the native and virtualized cases closer, and it removes a layer of indirection around all the spinlock functions.

The fast path (taking an uncontended lock which isn't in slowpath state) is optimal, identical to the non-paravirtualized case.
The inner part of ticket lock code becomes:

	inc = xadd(&lock->tickets, inc);
	inc.tail &= ~TICKET_SLOWPATH_FLAG;

	if (likely(inc.head == inc.tail))
		goto out;
	for (;;) {
		unsigned count = SPIN_THRESHOLD;
		do {
			if (ACCESS_ONCE(lock->tickets.head) == inc.tail)
				goto out;
			cpu_relax();
		} while (--count);
		__ticket_lock_spinning(lock, inc.tail);
	}
out:	barrier();

which results in:

	push   %rbp
	mov    %rsp,%rbp

	mov    $0x200,%eax
	lock xadd %ax,(%rdi)
	movzbl %ah,%edx
	cmp    %al,%dl
	jne    1f	# Slowpath if lock in contention

	pop    %rbp
	retq

	### SLOWPATH START
1:	and    $-2,%edx
	movzbl %dl,%esi

2:	mov    $0x800,%eax
	jmp    4f

3:	pause
	sub    $0x1,%eax
	je     5f

4:	movzbl (%rdi),%ecx
	cmp    %cl,%dl
	jne    3b

	pop    %rbp
	retq

5:	callq  *__ticket_lock_spinning
	jmp    2b
	### SLOWPATH END

With CONFIG_PARAVIRT_SPINLOCKS=n, the code has changed slightly: the fastpath case is straight through (taking the lock without contention), and the spin loop is out of line:

	push   %rbp
	mov    %rsp,%rbp

	mov    $0x100,%eax
	lock xadd %ax,(%rdi)
	movzbl %ah,%edx
	cmp    %al,%dl
	jne    1f

	pop    %rbp
	retq

	### SLOWPATH START
1:	pause
	movzbl (%rdi),%eax
	cmp    %dl,%al
	jne    1b

	pop    %rbp
	retq
	### SLOWPATH END

The unlock code is complicated by the need to both add to the lock's head and fetch the slowpath flag from the tail.
[PATCH RFC V10 5/18] xen/pvticketlock: Xen implementation for PV ticket locks
xen/pvticketlock: Xen implementation for PV ticket locks

From: Jeremy Fitzhardinge <jer...@goop.org>

Replace the old Xen implementation of PV spinlocks with an implementation of xen_lock_spinning and xen_unlock_kick.

xen_lock_spinning simply registers the cpu in its entry in lock_waiting, adds itself to the waiting_cpus set, and blocks on an event channel until the channel becomes pending.

xen_unlock_kick searches the cpus in waiting_cpus for the one waiting on this lock with the next ticket, if any. If found, it kicks it by making its event channel pending, which wakes it up.

We need to make sure interrupts are disabled while we're relying on the contents of the per-cpu lock_waiting values, otherwise an interrupt handler could come in, try to take some other lock, block, and overwrite our values.

Signed-off-by: Jeremy Fitzhardinge <jer...@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
[ Raghavendra: use function + enum instead of macro, cmpxchg for zero status reset.
Reintroduce break since we know the exact vCPU to send IPI as suggested by Konrad.]
Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
 arch/x86/xen/spinlock.c | 348 +++
 1 file changed, 79 insertions(+), 269 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index d6481a9..d471c76 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -16,45 +16,44 @@
 #include "xen-ops.h"
 #include "debugfs.h"
-#ifdef CONFIG_XEN_DEBUG_FS
-static struct xen_spinlock_stats
-{
-	u64 taken;
-	u32 taken_slow;
-	u32 taken_slow_nested;
-	u32 taken_slow_pickup;
-	u32 taken_slow_spurious;
-	u32 taken_slow_irqenable;
+enum xen_contention_stat {
+	TAKEN_SLOW,
+	TAKEN_SLOW_PICKUP,
+	TAKEN_SLOW_SPURIOUS,
+	RELEASED_SLOW,
+	RELEASED_SLOW_KICKED,
+	NR_CONTENTION_STATS
+};
-	u64 released;
-	u32 released_slow;
-	u32 released_slow_kicked;
+#ifdef CONFIG_XEN_DEBUG_FS
 #define HISTO_BUCKETS	30
-	u32 histo_spin_total[HISTO_BUCKETS+1];
-	u32 histo_spin_spinning[HISTO_BUCKETS+1];
+static struct xen_spinlock_stats
+{
+	u32 contention_stats[NR_CONTENTION_STATS];
 	u32 histo_spin_blocked[HISTO_BUCKETS+1];
-
-	u64 time_total;
-	u64 time_spinning;
 	u64 time_blocked;
 } spinlock_stats;
 static u8 zero_stats;
-static unsigned lock_timeout = 1 << 10;
-#define TIMEOUT lock_timeout
-
 static inline void check_zero(void)
 {
-	if (unlikely(zero_stats)) {
-		memset(&spinlock_stats, 0, sizeof(spinlock_stats));
-		zero_stats = 0;
+	u8 ret;
+	u8 old = ACCESS_ONCE(zero_stats);
+	if (unlikely(old)) {
+		ret = cmpxchg(&zero_stats, old, 0);
+		/* This ensures only one fellow resets the stat */
+		if (ret == old)
+			memset(&spinlock_stats, 0, sizeof(spinlock_stats));
 	}
 }
-#define ADD_STATS(elem, val)			\
-	do { check_zero(); spinlock_stats.elem += (val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+	check_zero();
+	spinlock_stats.contention_stats[var] += val;
+}
 static inline u64 spin_time_start(void)
 {
@@ -73,22 +72,6 @@ static void __spin_time_accum(u64 delta, u32 *array)
 		array[HISTO_BUCKETS]++;
 }
-static inline void spin_time_accum_spinning(u64 start)
-{
-	u32 delta = xen_clocksource_read() - start;
-
-	__spin_time_accum(delta, spinlock_stats.histo_spin_spinning);
-	spinlock_stats.time_spinning += delta;
-}
-
-static inline void spin_time_accum_total(u64 start)
-{
-	u32 delta = xen_clocksource_read() - start;
-
-	__spin_time_accum(delta, spinlock_stats.histo_spin_total);
-	spinlock_stats.time_total += delta;
-}
-
 static inline void spin_time_accum_blocked(u64 start)
 {
 	u32 delta = xen_clocksource_read() - start;
@@ -98,19 +81,15 @@ static inline void spin_time_accum_blocked(u64 start)
 }
 #else  /* !CONFIG_XEN_DEBUG_FS */
 #define TIMEOUT			(1 << 10)
-#define ADD_STATS(elem, val)	do { (void)(val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+}
 static inline u64 spin_time_start(void)
 {
 	return 0;
 }
-static inline void spin_time_accum_total(u64 start)
-{
-}
-static inline void spin_time_accum_spinning(u64 start)
-{
-}
 static inline void spin_time_accum_blocked(u64 start)
 {
 }
@@ -133,229 +112,83 @@ typedef u16 xen_spinners_t;
 	asm(LOCK_PREFIX " decw %0" : "+m" ((xl)->spinners) : : "memory");
 #endif
-struct xen_spinlock {
-	unsigned char lock;		/* 0 -> free; 1 -> locked */
-	xen_spinners_t spinners;	/* count of waiting cpus */
+struct xen_lock_waiting {
+	struct arch_spinlock *lock;
+	__ticket_t want;
 };
 static DEFINE_PER_CPU(int,
[PATCH RFC V10 7/18] x86/pvticketlock: Use callee-save for lock_spinning
x86/pvticketlock: Use callee-save for lock_spinning

From: Jeremy Fitzhardinge <jer...@goop.org>

Although the lock_spinning calls in the spinlock code are on the uncommon path, their presence can cause the compiler to generate many more register save/restores in the function pre/postamble, which is in the fast path. To avoid this, convert it to using the pvops callee-save calling convention, which defers all the save/restores until the actual function is called, keeping the fastpath clean.

Signed-off-by: Jeremy Fitzhardinge <jer...@goop.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Tested-by: Attilio Rao <attilio@citrix.com>
Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
 arch/x86/include/asm/paravirt.h       |    2 +-
 arch/x86/include/asm/paravirt_types.h |    2 +-
 arch/x86/kernel/paravirt-spinlocks.c  |    2 +-
 arch/x86/xen/spinlock.c               |    3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 040e72d..7131e12c 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -715,7 +715,7 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
 static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
 							__ticket_t ticket)
 {
-	PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
+	PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index d5deb6d..350d017 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -330,7 +330,7 @@ struct arch_spinlock;
 #include <asm/spinlock_types.h>
 struct pv_lock_ops {
-	void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket);
+	struct paravirt_callee_save lock_spinning;
 	void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
 };
diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c
index c2e010e..4251c1d 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -9,7 +9,7 @@
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
-	.lock_spinning = paravirt_nop,
+	.lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
 	.unlock_kick = paravirt_nop,
 #endif
 };
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 870e49f..ac8f592 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -171,6 +171,7 @@ out:
 	local_irq_restore(flags);
 	spin_time_accum_blocked(start);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
 static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
 {
@@ -255,7 +256,7 @@ void __init xen_init_spinlocks(void)
 		return;
 	}
-	pv_lock_ops.lock_spinning = xen_lock_spinning;
+	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
 	pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
[PATCH RFC V10 12/18] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

From: Srivatsa Vaddagiri <va...@linux.vnet.ibm.com>

kvm_hc_kick_cpu allows the calling vcpu to kick another vcpu out of halt state. The presence of these hypercalls is indicated to the guest via kvm_feature_pv_unhalt.

Signed-off-by: Srivatsa Vaddagiri <va...@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suz...@in.ibm.com>
[Raghu: Apic related changes, folding pvunhalted into vcpu_runnable]
Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
 arch/x86/include/asm/kvm_host.h      |    5 +
 arch/x86/include/uapi/asm/kvm_para.h |    1 +
 arch/x86/kvm/cpuid.c                 |    3 ++-
 arch/x86/kvm/x86.c                   |   37 ++
 include/uapi/linux/kvm_para.h        |    1 +
 5 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3741c65..95702de 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -503,6 +503,11 @@ struct kvm_vcpu_arch {
 	 * instruction.
 	 */
 	bool write_fault_to_shadow_pgtable;
+
+	/* pv related host specific info */
+	struct {
+		bool pv_unhalted;
+	} pv;
 };
 struct kvm_lpage_info {
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 06fdbd9..94dc8ca 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -23,6 +23,7 @@
 #define KVM_FEATURE_ASYNC_PF		4
 #define KVM_FEATURE_STEAL_TIME		5
 #define KVM_FEATURE_PV_EOI		6
+#define KVM_FEATURE_PV_UNHALT		7
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a20ecb5..b110fe6 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -413,7 +413,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
 			     (1 << KVM_FEATURE_ASYNC_PF) |
 			     (1 << KVM_FEATURE_PV_EOI) |
-			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+			     (1 << KVM_FEATURE_PV_UNHALT);
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 094b5d9..f8bea30 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5449,6 +5449,36 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 	return 1;
 }
+/*
+ * kvm_pv_kick_cpu_op:  Kick a vcpu.
+ *
+ * @apicid - apicid of vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
+{
+	struct kvm_vcpu *vcpu = NULL;
+	int i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_apic_present(vcpu))
+			continue;
+
+		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
+			break;
+	}
+	if (vcpu) {
+		/*
+		 * Setting unhalt flag here can result in spurious runnable
+		 * state when unhalt reset does not happen in vcpu_block.
+		 * But that is harmless since that should soon result in halt.
+		 */
+		vcpu->arch.pv.pv_unhalted = true;
+		/* We need everybody see unhalt before vcpu unblocks */
+		smp_wmb();
+		kvm_vcpu_kick(vcpu);
+	}
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
@@ -5482,6 +5512,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	case KVM_HC_VAPIC_POLL_IRQ:
 		ret = 0;
 		break;
+	case KVM_HC_KICK_CPU:
+		kvm_pv_kick_cpu_op(vcpu->kvm, a0);
+		ret = 0;
+		break;
 	default:
 		ret = -KVM_ENOSYS;
 		break;
@@ -5909,6 +5943,7 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 			kvm_apic_accept_events(vcpu);
 			switch(vcpu->arch.mp_state) {
 			case KVM_MP_STATE_HALTED:
+				vcpu->arch.pv.pv_unhalted = false;
 				vcpu->arch.mp_state =
 					KVM_MP_STATE_RUNNABLE;
 			case KVM_MP_STATE_RUNNABLE:
@@ -6729,6 +6764,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 	BUG_ON(vcpu->kvm == NULL);
 	kvm = vcpu->kvm;
+	vcpu->arch.pv.pv_unhalted = false;
 	vcpu->arch.emulate_ctxt.ops = &emulate_ops;
 	if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu))
[PATCH RFC V10 9/18] jump_label: Split out rate limiting from jump_label.h
jump_label: Split jumplabel ratelimit

From: Andrew Jones <drjo...@redhat.com>

Commit b202952075f62603bea9bfb6ebc6b0420db11949 ("perf, core: Rate limit perf_sched_events jump_label patching") introduced rate limiting for jump label disabling. The changes were made in the jump label code in order to be more widely available and to keep things tidier. This is all fine, except that jump_label.h now includes linux/workqueue.h, which makes it impossible to include jump_label.h from anything that workqueue.h needs. For example, it's now impossible to include jump_label.h from asm/spinlock.h, which is done in the proposed pv-ticketlock patches. This patch splits out the rate-limiting related changes from jump_label.h into a new file, jump_label_ratelimit.h, to resolve the issue.

Signed-off-by: Andrew Jones <drjo...@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
 include/linux/jump_label.h           |   26 +-
 include/linux/jump_label_ratelimit.h |   34 ++
 include/linux/perf_event.h           |    1 +
 kernel/jump_label.c                  |    1 +
 4 files changed, 37 insertions(+), 25 deletions(-)
 create mode 100644 include/linux/jump_label_ratelimit.h

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 0976fc4..53cdf89 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -48,7 +48,6 @@
 #include <linux/types.h>
 #include <linux/compiler.h>
-#include <linux/workqueue.h>
 #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
@@ -61,12 +60,6 @@ struct static_key {
 #endif
 };
-struct static_key_deferred {
-	struct static_key key;
-	unsigned long timeout;
-	struct delayed_work work;
-};
-
 # include <asm/jump_label.h>
 # define HAVE_JUMP_LABEL
 #endif	/* CC_HAVE_ASM_GOTO && CONFIG_JUMP_LABEL */
@@ -119,10 +112,7 @@ extern void arch_jump_label_transform_static(struct jump_entry *entry,
 extern int jump_label_text_reserved(void *start, void *end);
 extern void static_key_slow_inc(struct static_key *key);
 extern void static_key_slow_dec(struct static_key *key);
-extern void static_key_slow_dec_deferred(struct static_key_deferred *key);
 extern void jump_label_apply_nops(struct module *mod);
-extern void
-jump_label_rate_limit(struct static_key_deferred *key, unsigned long rl);
 #define STATIC_KEY_INIT_TRUE ((struct static_key) \
 	{ .enabled = ATOMIC_INIT(1), .entries = (void *)1 })
@@ -141,10 +131,6 @@ static __always_inline void jump_label_init(void)
 {
 }
-struct static_key_deferred {
-	struct static_key  key;
-};
-
 static __always_inline bool static_key_false(struct static_key *key)
 {
 	if (unlikely(atomic_read(&key->enabled)) > 0)
@@ -169,11 +155,6 @@ static inline void static_key_slow_dec(struct static_key *key)
 	atomic_dec(&key->enabled);
 }
-static inline void static_key_slow_dec_deferred(struct static_key_deferred *key)
-{
-	static_key_slow_dec(&key->key);
-}
-
 static inline int jump_label_text_reserved(void *start, void *end)
 {
 	return 0;
@@ -187,12 +168,6 @@ static inline int jump_label_apply_nops(struct module *mod)
 	return 0;
 }
-static inline void
-jump_label_rate_limit(struct static_key_deferred *key,
-		unsigned long rl)
-{
-}
-
 #define STATIC_KEY_INIT_TRUE ((struct static_key) \
 		{ .enabled = ATOMIC_INIT(1) })
 #define STATIC_KEY_INIT_FALSE ((struct static_key) \
@@ -203,6 +178,7 @@ jump_label_rate_limit(struct static_key_deferred *key,
 #define STATIC_KEY_INIT STATIC_KEY_INIT_FALSE
 #define jump_label_enabled static_key_enabled
+static inline int atomic_read(const atomic_t *v);
 static inline bool static_key_enabled(struct static_key *key)
 {
 	return (atomic_read(&key->enabled) > 0);
diff --git a/include/linux/jump_label_ratelimit.h b/include/linux/jump_label_ratelimit.h
new file mode 100644
index 000..1137883
--- /dev/null
+++ b/include/linux/jump_label_ratelimit.h
@@ -0,0 +1,34 @@
+#ifndef _LINUX_JUMP_LABEL_RATELIMIT_H
+#define _LINUX_JUMP_LABEL_RATELIMIT_H
+
+#include <linux/jump_label.h>
+#include <linux/workqueue.h>
+
+#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
+struct static_key_deferred {
+	struct static_key key;
+	unsigned long timeout;
+	struct delayed_work work;
+};
+#endif
+
+#ifdef HAVE_JUMP_LABEL
+extern void static_key_slow_dec_deferred(struct static_key_deferred *key);
+extern void
+jump_label_rate_limit(struct static_key_deferred *key, unsigned long rl);
+
+#else	/* !HAVE_JUMP_LABEL */
+struct static_key_deferred {
+	struct static_key  key;
+};
+static inline void static_key_slow_dec_deferred(struct static_key_deferred *key)
+{
+	static_key_slow_dec(&key->key);
+}
+static inline void
+jump_label_rate_limit(struct static_key_deferred *key,
+		unsigned long rl)
+{
+}
+#endif	/* HAVE_JUMP_LABEL */
+#endif	/* _LINUX_JUMP_LABEL_RATELIMIT_H */
[PATCH RFC V10 13/18] kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration
kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration

From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>

During migration, any vcpu that got kicked but did not become runnable (still in halted state) should be runnable after migration.

Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
 arch/x86/kvm/x86.c |    7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f8bea30..92a9932 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6243,7 +6243,12 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
 	kvm_apic_accept_events(vcpu);
-	mp_state->mp_state = vcpu->arch.mp_state;
+	if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED &&
+					vcpu->arch.pv.pv_unhalted)
+		mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
+	else
+		mp_state->mp_state = vcpu->arch.mp_state;
+
 	return 0;
 }
[PATCH RFC V10 18/18] kvm hypervisor: Add directed yield in vcpu block path
kvm hypervisor: Add directed yield in vcpu block path

From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>

We use the improved PLE handler logic in the vcpu block path for scheduling rather than a plain schedule(), so that we can make intelligent decisions.

Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
 arch/ia64/include/asm/kvm_host.h    |    5 +
 arch/powerpc/include/asm/kvm_host.h |    5 +
 arch/s390/include/asm/kvm_host.h    |    5 +
 arch/x86/include/asm/kvm_host.h     |    2 +-
 arch/x86/kvm/x86.c                  |    8
 include/linux/kvm_host.h            |    2 +-
 virt/kvm/kvm_main.c                 |    6 --
 7 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 989dd3f..999ab15 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -595,6 +595,11 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int kvm_pal_emul(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
 void kvm_sal_emul(struct kvm_vcpu *vcpu);
+static inline void kvm_do_schedule(struct kvm_vcpu *vcpu)
+{
+	schedule();
+}
+
 #define __KVM_HAVE_ARCH_VM_ALLOC 1
 struct kvm *kvm_arch_alloc_vm(void);
 void kvm_arch_free_vm(struct kvm *kvm);
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index af326cd..1aeecc0 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -628,4 +628,9 @@ struct kvm_vcpu_arch {
 #define __KVM_HAVE_ARCH_WQP
 #define __KVM_HAVE_CREATE_DEVICE
+static inline void kvm_do_schedule(struct kvm_vcpu *vcpu)
+{
+	schedule();
+}
+
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 16bd5d1..db09a56 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -266,4 +266,9 @@ struct kvm_arch{
 };
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
+static inline void kvm_do_schedule(struct kvm_vcpu *vcpu)
+{
+	schedule();
+}
+
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 95702de..72ff791 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1042,5 +1042,5 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
 int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
 void kvm_handle_pmu_event(struct kvm_vcpu *vcpu);
 void kvm_deliver_pmi(struct kvm_vcpu *vcpu);
-
+void kvm_do_schedule(struct kvm_vcpu *vcpu);
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b963c86..84a4eb2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7281,6 +7281,14 @@ bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
 			kvm_x86_ops->interrupt_allowed(vcpu);
 }
+void kvm_do_schedule(struct kvm_vcpu *vcpu)
+{
+	/* We try to yield to a kicked vcpu else do a schedule */
+	if (kvm_vcpu_on_spin(vcpu) <= 0)
+		schedule();
+}
+EXPORT_SYMBOL_GPL(kvm_do_schedule);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f0eea07..39efc18 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -565,7 +565,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 bool kvm_vcpu_yield_to(struct kvm_vcpu *target);
-void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
+bool kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
 void kvm_resched(struct kvm_vcpu *vcpu);
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 302681c..8387247 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1685,7 +1685,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		if (signal_pending(current))
 			break;
-		schedule();
+		kvm_do_schedule(vcpu);
 	}
 	finish_wait(&vcpu->wq, &wait);
@@ -1786,7 +1786,7 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
 }
 #endif
-void kvm_vcpu_on_spin(struct kvm_vcpu *me)
+bool kvm_vcpu_on_spin(struct kvm_vcpu *me)
 {
 	struct kvm *kvm = me->kvm;
 	struct kvm_vcpu *vcpu;
@@ -1835,6 +1835,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 	/* Ensure vcpu is not eligible during next spinloop */
 	kvm_vcpu_set_dy_eligible(me, false);
+
+	return yielded;
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
[PATCH RFC V10 17/18] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>

KVM_HC_KICK_CPU hypercall added to wake up a halted vcpu in a paravirtual spinlock enabled guest.

KVM_FEATURE_PV_UNHALT enables the guest to check whether pv spinlock can be enabled in the guest.

Thanks Vatsa for rewriting KVM_HC_KICK_CPU

Signed-off-by: Srivatsa Vaddagiri <va...@linux.vnet.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
 Documentation/virtual/kvm/cpuid.txt      |    4
 Documentation/virtual/kvm/hypercalls.txt |   13 +
 2 files changed, 17 insertions(+)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 83afe65..654f43c 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -43,6 +43,10 @@ KVM_FEATURE_CLOCKSOURCE2           ||     3 || kvmclock available at msrs
 KVM_FEATURE_ASYNC_PF               ||     4 || async pf can be enabled by
                                    ||       || writing to msr 0x4b564d02
 ------------------------------------------------------------------------------
+KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
+                                   ||       || before enabling paravirtualized
+                                   ||       || spinlock support.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
index ea113b5..0facb7e 100644
--- a/Documentation/virtual/kvm/hypercalls.txt
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -64,3 +64,16 @@ Purpose: To enable communication between the hypervisor and guest there is a
 shared page that contains parts of supervisor visible register state.
 The guest can map this shared page to access its supervisor register through
 memory using this hypercall.
+
+5. KVM_HC_KICK_CPU
+Architecture: x86
+Status: active
+Purpose: Hypercall used to wake up a vcpu from HLT state
+Usage example : A vcpu of a paravirtualized guest that is busywaiting in guest
+kernel mode for an event to occur (ex: a spinlock to become available) can
+execute HLT instruction once it has busy-waited for more than a threshold
+time-interval. Execution of HLT instruction would cause the hypervisor to put
+the vcpu to sleep until occurrence of an appropriate event. Another vcpu of the
+same guest can wake up the sleeping vcpu by issuing KVM_HC_KICK_CPU hypercall,
+specifying APIC ID of the vcpu to be woken up.
[PATCH RFC V10 16/18] kvm hypervisor : Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic
Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic

From: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>

Note that we are using APIC_DM_REMRD, which has reserved usage. If APIC_DM_REMRD usage is standardized in the future, we should find some other way or go back to the old method.

Suggested-by: Gleb Natapov <g...@redhat.com>
Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
 arch/x86/kvm/lapic.c |    5 -
 arch/x86/kvm/x86.c   |   25 ++---
 2 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e1adbb4..3f5f82e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -706,7 +706,10 @@ out:
 		break;
 	case APIC_DM_REMRD:
-		apic_debug("Ignoring delivery mode 3\n");
+		result = 1;
+		vcpu->arch.pv.pv_unhalted = 1;
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		kvm_vcpu_kick(vcpu);
 		break;
 	case APIC_DM_SMI:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 92a9932..b963c86 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5456,27 +5456,14 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
  */
 static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
 {
-	struct kvm_vcpu *vcpu = NULL;
-	int i;
+	struct kvm_lapic_irq lapic_irq;
-	kvm_for_each_vcpu(i, vcpu, kvm) {
-		if (!kvm_apic_present(vcpu))
-			continue;
+	lapic_irq.shorthand = 0;
+	lapic_irq.dest_mode = 0;
+	lapic_irq.dest_id = apicid;
-		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
-			break;
-	}
-	if (vcpu) {
-		/*
-		 * Setting unhalt flag here can result in spurious runnable
-		 * state when unhalt reset does not happen in vcpu_block.
-		 * But that is harmless since that should soon result in halt.
-		 */
-		vcpu->arch.pv.pv_unhalted = true;
-		/* We need everybody see unhalt before vcpu unblocks */
-		smp_wmb();
-		kvm_vcpu_kick(vcpu);
-	}
+	lapic_irq.delivery_mode = APIC_DM_REMRD;
+	kvm_irq_delivery_to_apic(kvm, 0, &lapic_irq, NULL);
 }
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
[PATCH RFC V10 14/18] kvm guest : Add configuration support to enable debug information for KVM Guests
kvm guest : Add configuration support to enable debug information for KVM Guests

From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose suz...@in.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/Kconfig |    9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 80fcc4b..f8ff42d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -646,6 +646,15 @@ config KVM_GUEST
 	  underlying device model, the host provides the guest with
 	  timing infrastructure such as time of day, and system time
 
+config KVM_DEBUG_FS
+	bool "Enable debug information for KVM Guests in debugfs"
+	depends on KVM_GUEST && DEBUG_FS
+	default n
+	---help---
+	  This option enables collection of various statistics for KVM guest.
+	  Statistics are displayed in debugfs filesystem. Enabling this option
+	  may incur significant overhead.
+
 source "arch/x86/lguest/Kconfig"
 
 config PARAVIRT_TIME_ACCOUNTING
[PATCH RFC V10 11/18] xen/pvticketlock: Allow interrupts to be enabled while blocking
xen/pvticketlock: Allow interrupts to be enabled while blocking

From: Jeremy Fitzhardinge jer...@goop.org

If interrupts were enabled when taking the spinlock, we can leave them enabled while blocking to get the lock.

If we can enable interrupts while waiting for the lock to become available, and we take an interrupt before entering the poll, and the handler takes a spinlock which ends up going into the slow state (invalidating the per-cpu lock and want values), then when the interrupt handler returns the event channel will remain pending so the poll will return immediately, causing it to return out to the main spinlock loop.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/xen/spinlock.c |   46 --
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 3ebabde..2b012a5 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -140,7 +140,20 @@ static void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 	 * partially setup state.
 	 */
 	local_irq_save(flags);
-
+	/*
+	 * We don't really care if we're overwriting some other
+	 * (lock,want) pair, as that would mean that we're currently
+	 * in an interrupt context, and the outer context had
+	 * interrupts enabled.  That has already kicked the VCPU out
+	 * of xen_poll_irq(), so it will just return spuriously and
+	 * retry with newly setup (lock,want).
+	 *
+	 * The ordering protocol on this is that the lock pointer
+	 * may only be set non-NULL if the want ticket is correct.
+	 * If we're updating want, we must first clear lock.
+	 */
+	w->lock = NULL;
+	smp_wmb();
 	w->want = want;
 	smp_wmb();
 	w->lock = lock;
@@ -155,24 +168,43 @@ static void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 	/* Only check lock once pending cleared */
 	barrier();
 
-	/* Mark entry to slowpath before doing the pickup test to make
-	   sure we don't deadlock with an unlocker. */
+	/*
+	 * Mark entry to slowpath before doing the pickup test to make
+	 * sure we don't deadlock with an unlocker.
+	 */
 	__ticket_enter_slowpath(lock);
 
-	/* check again make sure it didn't become free while
-	   we weren't looking  */
+	/*
+	 * check again make sure it didn't become free while
+	 * we weren't looking
+	 */
 	if (ACCESS_ONCE(lock->tickets.head) == want) {
 		add_stats(TAKEN_SLOW_PICKUP, 1);
 		goto out;
 	}
+
+	/* Allow interrupts while blocked */
+	local_irq_restore(flags);
+
+	/*
+	 * If an interrupt happens here, it will leave the wakeup irq
+	 * pending, which will cause xen_poll_irq() to return
+	 * immediately.
+	 */
+
 	/* Block until irq becomes pending (or perhaps a spurious wakeup) */
 	xen_poll_irq(irq);
 	add_stats(TAKEN_SLOW_SPURIOUS, !xen_test_irq_pending(irq));
+
+	local_irq_save(flags);
+
 	kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 out:
 	cpumask_clear_cpu(cpu, &waiting_cpus);
 	w->lock = NULL;
+
 	local_irq_restore(flags);
+
 	spin_time_accum_blocked(start);
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
@@ -186,7 +218,9 @@ static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
 	for_each_cpu(cpu, &waiting_cpus) {
 		const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);
 
-		if (w->lock == lock && w->want == next) {
+		/* Make sure we read lock before want */
+		if (ACCESS_ONCE(w->lock) == lock &&
+		    ACCESS_ONCE(w->want) == next) {
 			add_stats(RELEASED_SLOW_KICKED, 1);
 			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
 			break;
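The ordering protocol in the new comment has two rules. The lock pointer may be non-NULL only while want is correct, and the unlocker reads lock before want. Both rules can be modelled roughly with C11 atomics. This is an assumption-laden analogue, not the Xen code: struct model_waiting, publish_wait() and should_kick() are illustrative names, and C11 fences and loads stand in for smp_wmb() and ACCESS_ONCE(). It runs single-threaded, so it checks the bookkeeping but does not prove the concurrency argument.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned short ticket_t;	/* assumption: 16-bit tickets */

/* Hypothetical per-CPU record, modelled loosely on struct xen_lock_waiting. */
struct model_waiting {
	_Atomic(void *) lock;
	_Atomic ticket_t want;
};

/* Waiter: lock may be non-NULL only while want is correct, so clear
 * lock, update want, then publish lock. */
static void publish_wait(struct model_waiting *w, void *lock, ticket_t want)
{
	atomic_store_explicit(&w->lock, NULL, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* roughly: smp_wmb() */
	atomic_store_explicit(&w->want, want, memory_order_relaxed);
	/* Release store: want is ordered before the new lock value. */
	atomic_store_explicit(&w->lock, lock, memory_order_release);
}

/* Unlocker: read lock before want, as the patch's ACCESS_ONCE() comment asks. */
static bool should_kick(struct model_waiting *w, void *lock, ticket_t next)
{
	void *l = atomic_load_explicit(&w->lock, memory_order_acquire);
	ticket_t t = atomic_load_explicit(&w->want, memory_order_relaxed);

	return l == lock && t == next;
}
```

A stale match is harmless here for the reason the patch comment gives: the worst case is a spurious wakeup, after which the waiter retries with a freshly published (lock,want) pair.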
[PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com

During smp_boot_cpus, a paravirtualized KVM guest detects whether the hypervisor has the required feature (KVM_FEATURE_PV_UNHALT) to support pv-ticketlocks. If so, support for pv-ticketlocks is registered via pv_lock_ops.

Use the KVM_HC_KICK_CPU hypercall to wake up a waiting/halted vcpu.

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose suz...@in.ibm.com
[Raghu: check_zero race fix, enum for kvm_contention_stat, jumplabel related changes]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/kvm_para.h |   14 ++
 arch/x86/kernel/kvm.c           |  255 +++
 2 files changed, 267 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 695399f..427afcb 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -118,10 +118,20 @@ void kvm_async_pf_task_wait(u32 token);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
-#else
-#define kvm_guest_init() do { } while (0)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_spinlock_init(void);
+#else /* !CONFIG_PARAVIRT_SPINLOCKS */
+static inline void kvm_spinlock_init(void)
+{
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#else /* CONFIG_KVM_GUEST */
+#define kvm_guest_init() do {} while (0)
 #define kvm_async_pf_task_wait(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
+
 static inline u32 kvm_read_and_reset_pf_reason(void)
 {
 	return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index cd6d9a5..97ade5a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -34,6 +34,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/kprobes.h>
+#include <linux/debugfs.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -419,6 +420,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 	WARN_ON(kvm_register_clock("primary cpu clock"));
 	kvm_guest_cpu_init();
 	native_smp_prepare_boot_cpu();
+	kvm_spinlock_init();
 }
 
 static void __cpuinit kvm_guest_cpu_online(void *dummy)
@@ -523,3 +525,256 @@ static __init int activate_jump_labels(void)
 	return 0;
 }
 arch_initcall(activate_jump_labels);
+
+/* Kick a cpu by its apicid. Used to wake up a halted vcpu */
+void kvm_kick_cpu(int cpu)
+{
+	int apicid;
+
+	apicid = per_cpu(x86_cpu_to_apicid, cpu);
+	kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
+}
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+enum kvm_contention_stat {
+	TAKEN_SLOW,
+	TAKEN_SLOW_PICKUP,
+	RELEASED_SLOW,
+	RELEASED_SLOW_KICKED,
+	NR_CONTENTION_STATS
+};
+
+#ifdef CONFIG_KVM_DEBUG_FS
+#define HISTO_BUCKETS	30
+
+static struct kvm_spinlock_stats
+{
+	u32 contention_stats[NR_CONTENTION_STATS];
+	u32 histo_spin_blocked[HISTO_BUCKETS+1];
+	u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+	u8 ret;
+	u8 old;
+
+	old = ACCESS_ONCE(zero_stats);
+	if (unlikely(old)) {
+		ret = cmpxchg(&zero_stats, old, 0);
+		/* This ensures only one fellow resets the stat */
+		if (ret == old)
+			memset(&spinlock_stats, 0, sizeof(spinlock_stats));
+	}
+}
+
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+	check_zero();
+	spinlock_stats.contention_stats[var] += val;
+}
+
+
+static inline u64 spin_time_start(void)
+{
+	return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+	unsigned index;
+
+	index = ilog2(delta);
+	check_zero();
+
+	if (index < HISTO_BUCKETS)
+		array[index]++;
+	else
+		array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+	u32 delta;
+
+	delta = sched_clock() - start;
+	__spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+	spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+	d_kvm_debug = debugfs_create_dir("kvm", NULL);
+	if (!d_kvm_debug)
+		printk(KERN_WARNING "Could not create 'kvm' debugfs directory\n");
+
+	return d_kvm_debug;
+}
+
+static int __init kvm_spinlock_debugfs(void)
+{
+	struct dentry *d_kvm;
+
+	d_kvm = kvm_init_debugfs();
+	if (d_kvm == NULL)
+		return -ENOMEM;
+
+	d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
+
+	debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
+
+	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
+
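The statistics code in this patch buckets each blocked time by ilog2(delta). It resets every counter when zero_stats is written through debugfs, using cmpxchg() so that only one CPU performs the clear. Below is a portable userspace sketch of that logic. It assumes a loop-based stand-in for ilog2() and uses C11 atomic_compare_exchange_strong() in place of cmpxchg(). The function names mirror the patch, but this is not kernel code. Also, unlike the kernel's ilog2(), this stand-in returns 0 for an input of 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define HISTO_BUCKETS 30	/* same bucket count as the patch */

static uint32_t histo_spin_blocked[HISTO_BUCKETS + 1];
static atomic_uchar zero_stats;	/* write non-zero to request a reset */

/* floor(log2(v)); stand-in for the kernel's ilog2(). Returns 0 for v == 0. */
static unsigned ilog2_u64(uint64_t v)
{
	unsigned r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Only the caller whose compare-and-swap succeeds clears the stats. */
static void check_zero(void)
{
	unsigned char old = atomic_load(&zero_stats);

	if (old && atomic_compare_exchange_strong(&zero_stats, &old, 0))
		memset(histo_spin_blocked, 0, sizeof(histo_spin_blocked));
}

/* Bucket i counts waits with 2^i <= delta < 2^(i+1); the final slot
 * collects everything from 2^HISTO_BUCKETS upwards. */
static void spin_time_accum(uint64_t delta, uint32_t *array)
{
	unsigned index = ilog2_u64(delta);

	check_zero();
	if (index < HISTO_BUCKETS)
		array[index]++;
	else
		array[HISTO_BUCKETS]++;
}
```

Log-scale buckets keep the histogram to 31 counters while still separating a wait of a few hundred nanoseconds from a wait of a second.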
[PATCH RFC V10 4/18] xen: Defer spinlock setup until boot CPU setup
xen: Defer spinlock setup until boot CPU setup

From: Jeremy Fitzhardinge jer...@goop.org

There's no need to do it at very early init, and doing it there makes it impossible to use the jump_label machinery.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/xen/smp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 8ff3799..dcdc91c 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -246,6 +246,7 @@ static void __init xen_smp_prepare_boot_cpu(void)
 
 	xen_filter_cpu_maps();
 	xen_setup_vcpu_info_placement();
+	xen_init_spinlocks();
 }
 
 static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
@@ -647,7 +648,6 @@ void __init xen_smp_init(void)
 {
 	smp_ops = xen_smp_ops;
 	xen_fill_possible_map();
-	xen_init_spinlocks();
 }
 
 static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus)
[PATCH RFC V10 1/18] x86/spinlock: Replace pv spinlocks with pv ticketlocks
x86/spinlock: Replace pv spinlocks with pv ticketlocks

From: Jeremy Fitzhardinge jer...@goop.org

Rather than outright replacing the entire spinlock implementation in order to paravirtualize it, keep the ticket lock implementation but add a couple of pvops hooks on the slow path (long spin on lock, unlocking a contended lock).

Ticket locks have a number of nice properties, but they also have some surprising behaviours in virtual environments. They enforce a strict FIFO ordering on cpus trying to take a lock; however, if the hypervisor scheduler does not schedule the cpus in the correct order, the system can waste a huge amount of time spinning until the next cpu can take the lock.

(See Thomas Friebel's talk "Prevent Guests from Spinning Around" http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

To address this, we add two hooks:
 - __ticket_spin_lock, which is called after the cpu has been spinning on the lock for a significant number of iterations but has failed to take the lock (presumably because the cpu holding the lock has been descheduled). The lock_spinning pvop is expected to block the cpu until it has been kicked by the current lock holder.
 - __ticket_spin_unlock, which, on releasing a contended lock (there are more cpus with tail tickets), looks to see if the next cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub functions causes all the extra code to go away.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
[ Raghavendra: Changed SPIN_THRESHOLD ]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/paravirt.h       |   32 
 arch/x86/include/asm/paravirt_types.h |   10 ++
 arch/x86/include/asm/spinlock.h       |   53 +++--
 arch/x86/include/asm/spinlock_types.h |    4 --
 arch/x86/kernel/paravirt-spinlocks.c  |   15 +
 arch/x86/xen/spinlock.c               |    8 -
 6 files changed, 61 insertions(+), 61 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index cfdc9ee..040e72d 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -712,36 +712,16 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
 
-static inline int arch_spin_is_locked(struct arch_spinlock *lock)
+static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
+							__ticket_t ticket)
 {
-	return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
+	PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static inline int arch_spin_is_contended(struct arch_spinlock *lock)
+static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+							__ticket_t ticket)
 {
-	return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
-}
-#define arch_spin_is_contended	arch_spin_is_contended
-
-static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
-{
-	PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
-}
-
-static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
-						  unsigned long flags)
-{
-	PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
-}
-
-static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
-{
-	return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
-}
-
-static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
-{
-	PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
+	PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
 #endif
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 0db1fca..d5deb6d 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -327,13 +327,11 @@ struct pv_mmu_ops {
 };
 
 struct arch_spinlock;
+#include <asm/spinlock_types.h>
+
 struct pv_lock_ops {
-	int (*spin_is_locked)(struct arch_spinlock *lock);
-	int (*spin_is_contended)(struct arch_spinlock *lock);
-	void (*spin_lock)(struct arch_spinlock *lock);
-	void (*spin_lock_flags)(struct arch_spinlock *lock, unsigned long flags);
-	int (*spin_trylock)(struct arch_spinlock *lock);
-	void (*spin_unlock)(struct arch_spinlock *lock);
+	void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket);
+	void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
 };
 
 /* This contains all the paravirt structures: we get a convenient
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 33692ea..4d54244 100644
---
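The two hooks described in the commit message can be sketched as a plain ticket lock. The lock spins up to SPIN_THRESHOLD times and then calls a lock_spinning hook, and the unlock passes the next ticket to an unlock_kick hook. This is a hypothetical single-file model, not the pv_lock_ops implementation. struct model_ticketlock, struct model_lock_ops and the demo_* hooks are invented for illustration. The demo hook simply fakes the holder's unlock, so that a single thread can reach the slow path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SPIN_THRESHOLD (1 << 15)	/* value used in the series */

struct model_ticketlock {
	_Atomic uint16_t head;	/* ticket currently being served */
	_Atomic uint16_t tail;	/* next ticket to hand out */
};

/* Hypothetical pvops-style hooks; a NULL hook means "just keep spinning". */
struct model_lock_ops {
	void (*lock_spinning)(struct model_ticketlock *lock, uint16_t ticket);
	void (*unlock_kick)(struct model_ticketlock *lock, uint16_t ticket);
};

static void model_lock(struct model_ticketlock *lock,
		       const struct model_lock_ops *ops)
{
	uint16_t me = atomic_fetch_add(&lock->tail, 1);	/* like xadd */

	for (;;) {
		for (unsigned count = SPIN_THRESHOLD; count; count--)
			if (atomic_load(&lock->head) == me)
				return;			/* our turn */
		/* Spun too long: give the hook a chance to block us. */
		if (ops && ops->lock_spinning)
			ops->lock_spinning(lock, me);
	}
}

static void model_unlock(struct model_ticketlock *lock,
			 const struct model_lock_ops *ops)
{
	uint16_t next = (uint16_t)(atomic_fetch_add(&lock->head, 1) + 1);

	if (ops && ops->unlock_kick)
		ops->unlock_kick(lock, next);	/* wake the holder of 'next' */
}

/* Demo hooks for illustration only: they count calls instead of blocking.
 * demo_lock_spinning() fakes the holder's unlock so one thread can
 * exercise the slow path. */
static unsigned demo_blocks, demo_kicks;
static uint16_t demo_last_kick;

static void demo_lock_spinning(struct model_ticketlock *lock, uint16_t ticket)
{
	demo_blocks++;
	atomic_store(&lock->head, ticket);
}

static void demo_unlock_kick(struct model_ticketlock *lock, uint16_t ticket)
{
	(void)lock;
	demo_kicks++;
	demo_last_kick = ticket;
}
```

Keeping FIFO tickets and hooking only the two slow points means the uncontended path stays a single atomic add on lock and a single add on unlock.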
[PATCH RFC V10 6/18] xen/pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks
xen/pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks

From: Jeremy Fitzhardinge jer...@goop.org

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/xen/spinlock.c |   14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index d471c76..870e49f 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -239,6 +239,8 @@ void xen_uninit_lock_cpu(int cpu)
 	per_cpu(lock_kicker_irq, cpu) = -1;
 }
 
+static bool xen_pvspin __initdata = true;
+
 void __init xen_init_spinlocks(void)
 {
 	/*
@@ -248,10 +250,22 @@ void __init xen_init_spinlocks(void)
 	if (xen_hvm_domain())
 		return;
 
+	if (!xen_pvspin) {
+		printk(KERN_DEBUG "xen: PV spinlocks disabled\n");
+		return;
+	}
+
 	pv_lock_ops.lock_spinning = xen_lock_spinning;
 	pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
+static __init int xen_parse_nopvspin(char *arg)
+{
+	xen_pvspin = false;
+	return 0;
+}
+early_param("xen_nopvspin", xen_parse_nopvspin);
+
 #ifdef CONFIG_XEN_DEBUG_FS
 
 static struct dentry *d_spin_debug;
[PATCH RFC V10 10/18] x86/ticketlock: Add slowpath logic
x86/ticketlock: Add slowpath logic

From: Jeremy Fitzhardinge jer...@goop.org

Maintain a flag in the LSB of the ticket lock tail which indicates whether anyone is in the lock slowpath and may need kicking when the current holder unlocks. The flags are set when the first locker enters the slowpath, and cleared when unlocking to an empty queue (ie, no contention).

In the specific implementation of lock_spinning(), make sure to set the slowpath flags on the lock just before blocking. We must do this before the last-chance pickup test to prevent a deadlock with the unlocker:

    Unlocker                    Locker
                                test for lock pickup
                                        -> fail
    unlock
    test slowpath
            -> false
                                set slowpath flags
                                block

Whereas this works in any ordering:

    Unlocker                    Locker
                                set slowpath flags
                                test for lock pickup
                                        -> fail
                                block
    unlock
    test slowpath
            -> true, kick

If the unlocker finds that the lock has the slowpath flag set but it is actually uncontended (ie, head == tail, so nobody is waiting), then it clears the slowpath flag.

The unlock code uses a locked add to update the head counter. This also acts as a full memory barrier, so it's safe to subsequently read back the slowflag state, knowing that the updated lock is visible to the other CPUs. If it were an unlocked add, then the flag read may just be forwarded from the store buffer before it was visible to the other CPUs, which could result in a deadlock.

Unfortunately this means we need to do a locked instruction when unlocking with PV ticketlocks. However, if PV ticketlocks are not enabled, then the old non-locked add is the only unlocking code.

Note: this code relies on gcc making sure that unlikely() code is out of line of the fastpath, which only happens when OPTIMIZE_SIZE=n. If it doesn't, the generated code isn't too bad, but it's definitely suboptimal.

Thanks to Srivatsa Vaddagiri for providing a bugfix to the original version of this change, which has been folded in.
Thanks to Stephan Diestelhorst for commenting on some code which relied on an inaccurate reading of the x86 memory ordering rules.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Stephan Diestelhorst stephan.diestelho...@amd.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/paravirt.h       |    2 -
 arch/x86/include/asm/spinlock.h       |   86 -
 arch/x86/include/asm/spinlock_types.h |    2 +
 arch/x86/kernel/paravirt-spinlocks.c  |    3 +
 arch/x86/xen/spinlock.c               |    6 ++
 5 files changed, 74 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 7131e12c..401f350 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -718,7 +718,7 @@ static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
 	PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
 							__ticket_t ticket)
 {
 	PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 04a5cd5..d68883d 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -1,11 +1,14 @@
 #ifndef _ASM_X86_SPINLOCK_H
 #define _ASM_X86_SPINLOCK_H
 
+#include <linux/jump_label.h>
 #include <linux/atomic.h>
 #include <asm/page.h>
 #include <asm/processor.h>
 #include <linux/compiler.h>
 #include <asm/paravirt.h>
+#include <asm/bitops.h>
+
 /*
  * Your basic SMP spinlocks, allowing only a single CPU anywhere
  *
@@ -37,32 +40,28 @@
 /* How long a lock should spin before we consider blocking */
 #define SPIN_THRESHOLD	(1 << 15)
 
-#ifndef CONFIG_PARAVIRT_SPINLOCKS
+extern struct static_key paravirt_ticketlocks_enabled;
+static __always_inline bool static_key_false(struct static_key *key);
 
-static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
-							__ticket_t ticket)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
+	set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
-							__ticket_t ticket)
+#else /* !CONFIG_PARAVIRT_SPINLOCKS */
+static
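Below is a minimal C11 sketch of the slowpath flag protocol. It assumes u8 tickets that advance by 2, so bit 0 of tail is free. The locker sets the flag before its last-chance pickup test. After its full-barrier add on head, the unlocker either kicks a waiter or, if head has caught up with tail, clears the stale flag. All names are hypothetical. atomic_fetch_or() stands in for set_bit(), and a seq_cst atomic_fetch_add() stands in for the locked add.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TICKET_LOCK_INC      2	/* tickets advance by 2 ... */
#define TICKET_SLOWPATH_FLAG 1	/* ... so bit 0 of tail is free */

struct model_pvlock {
	_Atomic uint8_t head;
	_Atomic uint8_t tail;
};

/* Take a ticket; the flag bit is not part of the ticket number. */
static uint8_t model_take_ticket(struct model_pvlock *lock)
{
	uint8_t t = atomic_fetch_add(&lock->tail, TICKET_LOCK_INC);

	return (uint8_t)(t & ~TICKET_SLOWPATH_FLAG);
}

/* Locker: set the flag before the last-chance pickup test and blocking. */
static void model_enter_slowpath(struct model_pvlock *lock)
{
	atomic_fetch_or(&lock->tail, TICKET_SLOWPATH_FLAG);
}

/*
 * Unlocker: returns true if a waiter must be kicked. The seq_cst
 * fetch_add stands in for the locked add: the tail load below cannot
 * be reordered before it.
 */
static bool model_unlock(struct model_pvlock *lock)
{
	uint8_t head = (uint8_t)(atomic_fetch_add(&lock->head, TICKET_LOCK_INC) +
				 TICKET_LOCK_INC);
	uint8_t tail = atomic_load(&lock->tail);

	if (!(tail & TICKET_SLOWPATH_FLAG))
		return false;			/* nobody in the slowpath */
	if (head == (uint8_t)(tail & ~TICKET_SLOWPATH_FLAG)) {
		/* Uncontended after all: clear the flag instead of kicking. */
		uint8_t expected = tail;

		atomic_compare_exchange_strong(&lock->tail, &expected,
					       (uint8_t)(tail & ~TICKET_SLOWPATH_FLAG));
		return false;
	}
	return true;				/* a waiter may be blocked: kick */
}
```

Adding 2 never carries out of bit 0, so a set flag survives later ticket allocations, including wraparound.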
[PATCH RFC V10 3/18] x86/ticketlock: Collapse a layer of functions
x86/ticketlock: Collapse a layer of functions

From: Jeremy Fitzhardinge jer...@goop.org

Now that the paravirtualization layer doesn't exist at the spinlock level any more, we can collapse the __ticket_ functions into the arch_ functions.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/spinlock.h |   35 +--
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 4d54244..7442410 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -76,7 +76,7 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
  * in the high part, because a wide xadd increment of the low part would carry
  * up and contaminate the high part.
  */
-static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
 	register struct __raw_tickets inc = { .tail = 1 };
 
@@ -96,7 +96,7 @@ static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
 out:	barrier();	/* make sure nothing creeps before the lock is taken */
 }
 
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
+static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
 	arch_spinlock_t old, new;
 
@@ -110,7 +110,7 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
 }
 
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
+static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
 	__ticket_t next = lock->tickets.head + 1;
 
@@ -118,46 +118,21 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 	__ticket_unlock_kick(lock, next);
 }
 
-static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
 	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
 	return tmp.tail != tmp.head;
 }
 
-static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
+static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
 	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
 	return (__ticket_t)(tmp.tail - tmp.head) > 1;
 }
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-	return __ticket_spin_is_locked(lock);
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-	return __ticket_spin_is_contended(lock);
-}
 #define arch_spin_is_contended	arch_spin_is_contended
 
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-	__ticket_spin_lock(lock);
-}
-
-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-	return __ticket_spin_trylock(lock);
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-	__ticket_spin_unlock(lock);
-}
-
 static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
 						  unsigned long flags)
 {
[PATCH RFC V10 8/18] x86/pvticketlock: When paravirtualizing ticket locks, increment by 2
x86/pvticketlock: When paravirtualizing ticket locks, increment by 2

From: Jeremy Fitzhardinge jer...@goop.org

Increment ticket head/tails by 2 rather than 1 to leave the LSB free to store an "is in slowpath state" bit. This halves the number of possible CPUs for a given ticket size, but this shouldn't matter in practice - kernels built for 32k+ CPU systems are probably specially built for the hardware rather than a generic distro kernel.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/spinlock.h       |   10 +-
 arch/x86/include/asm/spinlock_types.h |   10 +-
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 7442410..04a5cd5 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -78,7 +78,7 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
  */
 static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
-	register struct __raw_tickets inc = { .tail = 1 };
+	register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
 
 	inc = xadd(&lock->tickets, inc);
 
@@ -104,7 +104,7 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 	if (old.tickets.head != old.tickets.tail)
 		return 0;
 
-	new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+	new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
 
 	/* cmpxchg is a full barrier, so nothing can move before it */
 	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
@@ -112,9 +112,9 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-	__ticket_t next = lock->tickets.head + 1;
+	__ticket_t next = lock->tickets.head + TICKET_LOCK_INC;
 
-	__add(&lock->tickets.head, 1, UNLOCK_LOCK_PREFIX);
+	__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
 	__ticket_unlock_kick(lock, next);
 }
 
@@ -129,7 +129,7 @@ static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
 	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-	return (__ticket_t)(tmp.tail - tmp.head) > 1;
+	return (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC;
 }
 #define arch_spin_is_contended	arch_spin_is_contended
 
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index 83fd3c7..e96fcbd 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -3,7 +3,13 @@
 
 #include <linux/types.h>
 
-#if (CONFIG_NR_CPUS < 256)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define __TICKET_LOCK_INC	2
+#else
+#define __TICKET_LOCK_INC	1
+#endif
+
+#if (CONFIG_NR_CPUS < (256 / __TICKET_LOCK_INC))
 typedef u8  __ticket_t;
 typedef u16 __ticketpair_t;
 #else
@@ -11,6 +17,8 @@ typedef u16 __ticket_t;
 typedef u32 __ticketpair_t;
 #endif
 
+#define TICKET_LOCK_INC	((__ticket_t)__TICKET_LOCK_INC)
+
 #define TICKET_SHIFT	(sizeof(__ticket_t) * 8)
 
 typedef struct arch_spinlock {
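The arithmetic behind this patch can be checked in isolation. With an increment of 2, u8 tickets cover fewer than 128 CPUs instead of fewer than 256. The contention test also changes: it compares the head/tail distance against TICKET_LOCK_INC rather than 1. The small sketch below assumes u8 tickets, and its helper names are illustrative, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t ticket_t;	/* assumption: a small-NR_CPUS build with u8 tickets */

/* Mirrors "#if (CONFIG_NR_CPUS < (256 / __TICKET_LOCK_INC))". */
static bool fits_u8_tickets(unsigned nr_cpus, unsigned inc)
{
	return nr_cpus < 256 / inc;
}

static bool is_locked(ticket_t head, ticket_t tail)
{
	return tail != head;
}

/* Contended when more than one holder's worth of tickets is outstanding.
 * Casting the difference back to ticket_t keeps it right across wraparound. */
static bool is_contended(ticket_t head, ticket_t tail, ticket_t inc)
{
	return (ticket_t)(tail - head) > inc;
}
```

For example, with head = 252 and tail = 4 after a wrap, (ticket_t)(4 - 252) is 8, which correctly reports four outstanding tickets of size 2.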
[PATCH RFC V10 2/18] x86/ticketlock: Don't inline _spin_unlock when using paravirt spinlocks
x86/ticketlock: Don't inline _spin_unlock when using paravirt spinlocks

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

The code size expands somewhat, and it's better to just call a function rather than inline it.

Thanks Jeremy for the original version of the ARCH_NOINLINE_SPIN_UNLOCK config patch, which has been simplified here.

Suggested-by: Linus Torvalds torva...@linux-foundation.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/Kconfig |    1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 685692c..80fcc4b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -621,6 +621,7 @@ config PARAVIRT_DEBUG
 config PARAVIRT_SPINLOCKS
 	bool "Paravirtualization layer for spinlocks"
 	depends on PARAVIRT && SMP
+	select UNINLINE_SPIN_UNLOCK
 	---help---
 	  Paravirtualized spinlocks allow a pvops backend to replace the
 	  spinlock implementation with something virtualization-friendly
Re: [PATCH RFC V10 0/18] Paravirtualized ticket spinlocks
On Mon, Jun 24, 2013 at 06:10:14PM +0530, Raghavendra K T wrote:
> Results:
> ===
> base = 3.10-rc2 kernel
> patched = base + this series
>
> The test was on 32 core (model: Intel(R) Xeon(R) CPU X7560) HT disabled
> with 32 KVM guest vcpu 8GB RAM.

Have you ever tried to get results with HT enabled?

> +-----------+-----------+-----------+------------+-----------+
>               ebizzy (records/sec) higher is better
> +-----------+-----------+-----------+------------+-----------+
>       base        stdev       patched     stdev       %improvement
> +-----------+-----------+-----------+------------+-----------+
> 1x    5574.9000   237.4997    5618.       94.0366     0.77311
> 2x    2741.5000   561.3090    3332.       102.4738    21.53930
> 3x    2146.2500   216.7718    2302.       76.3870     7.27237
> 4x    1663.       141.9235    1753.7500   83.5220     5.45701
> +-----------+-----------+-----------+------------+-----------+

This looks good. Are your ebizzy results consistent run to run though?

> +-----------+-----------+-----------+------------+-----------+
>               dbench (Throughput) higher is better
> +-----------+-----------+-----------+------------+-----------+
>       base        stdev       patched     stdev       %improvement
> +-----------+-----------+-----------+------------+-----------+
> 1x    14111.5600  754.4525    14645.9900  114.3087    3.78718
> 2x    2481.6270   71.2665     2667.1280   73.8193     7.47498
> 3x    1510.2483   31.8634     1503.8792   36.0777     -0.42173
> 4x    1029.4875   16.9166     1039.7069   43.8840     0.99267
> +-----------+-----------+-----------+------------+-----------+

Hmm, I wonder what 2.5x looks like. Also, the 3% improvement with no overcommit is interesting. What's happening there? It makes me wonder what < 1x looks like.

thanks,
drew
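The %improvement column appears to be the relative change (patched - base) / base * 100. That formula is inferred from the numbers; the thread does not state it. A quick check in C against three of the dbench rows:

```c
#include <assert.h>

/* Relative change in percent; both benchmarks are "higher is better",
 * so a positive value means the patched kernel did better. */
static double pct_improvement(double base, double patched)
{
	return (patched - base) / base * 100.0;
}
```

Under this reading, the 1x dbench row (14111.56 to 14645.99) works out to about 3.787%. That is the "3% improvement with no overcommit" queried above.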
Re: [PATCH RFC V10 0/18] Paravirtualized ticket spinlocks
On 06/24/2013 06:47 PM, Andrew Jones wrote:
> On Mon, Jun 24, 2013 at 06:10:14PM +0530, Raghavendra K T wrote:
>> Results:
>> ===
>> base = 3.10-rc2 kernel
>> patched = base + this series
>>
>> The test was on 32 core (model: Intel(R) Xeon(R) CPU X7560) HT disabled
>> with 32 KVM guest vcpu 8GB RAM.
>
> Have you ever tried to get results with HT enabled?

I have not done it yet with the latest. I will get that result.

>> +-----------+-----------+-----------+------------+-----------+
>>               ebizzy (records/sec) higher is better
>> +-----------+-----------+-----------+------------+-----------+
>>       base        stdev       patched     stdev       %improvement
>> +-----------+-----------+-----------+------------+-----------+
>> 1x    5574.9000   237.4997    5618.       94.0366     0.77311
>> 2x    2741.5000   561.3090    3332.       102.4738    21.53930
>> 3x    2146.2500   216.7718    2302.       76.3870     7.27237
>> 4x    1663.       141.9235    1753.7500   83.5220     5.45701
>> +-----------+-----------+-----------+------------+-----------+
>
> This looks good. Are your ebizzy results consistent run to run though?

Yes, ebizzy looked more consistent.

>> +-----------+-----------+-----------+------------+-----------+
>>               dbench (Throughput) higher is better
>> +-----------+-----------+-----------+------------+-----------+
>>       base        stdev       patched     stdev       %improvement
>> +-----------+-----------+-----------+------------+-----------+
>> 1x    14111.5600  754.4525    14645.9900  114.3087    3.78718
>> 2x    2481.6270   71.2665     2667.1280   73.8193     7.47498
>> 3x    1510.2483   31.8634     1503.8792   36.0777     -0.42173
>> 4x    1029.4875   16.9166     1039.7069   43.8840     0.99267
>> +-----------+-----------+-----------+------------+-----------+
>
> Hmm, I wonder what 2.5x looks like. Also, the 3% improvement with no
> overcommit is interesting. What's happening there? It makes me wonder
> what < 1x looks like.

I'll try to get 0.5x and 2.5x runs for dbench.

> thanks,
> drew
Re: [PATCH] pci: Enable overrides for missing ACS capabilities
On Wed, Jun 19, 2013 at 6:43 AM, Don Dutile <ddut...@redhat.com> wrote:
> On 06/18/2013 10:52 PM, Bjorn Helgaas wrote:
>> On Tue, Jun 18, 2013 at 5:03 PM, Don Dutile <ddut...@redhat.com> wrote:
>>> On 06/18/2013 06:22 PM, Alex Williamson wrote:
>>>> On Tue, 2013-06-18 at 15:31 -0600, Bjorn Helgaas wrote:
>>>>> On Tue, Jun 18, 2013 at 12:20 PM, Alex Williamson
>>>>> <alex.william...@redhat.com> wrote:
>>>>>> On Tue, 2013-06-18 at 11:28 -0600, Bjorn Helgaas wrote:
>>>>>>> On Thu, May 30, 2013 at 12:40:19PM -0600, Alex Williamson wrote:
>>>>>>>> ...
>>>>>>>
>>>>>>> Who do you expect to decide whether to use this option? I think it
>>>>>>> requires intimate knowledge of how the device works.
>>>>>>>
>>>>>>> I think the benefit of using the option is that it makes assignment
>>>>>>> of devices to guests more flexible, which will make it attractive
>>>>>>> to users. But most users have no way of knowing whether it's
>>>>>>> actually *safe* to use this. So I worry that you're adding an easy
>>>>>>> way to pretend isolation exists when there's no good way of being
>>>>>>> confident that it actually does.
>>>>>> ...
>>>>>
>>>>> I wonder if we should taint the kernel if this option is used (but
>>>>> not for specific devices added to pci_dev_acs_enabled[]). It would
>>>>> also be nice if pci_dev_specific_acs_enabled() gave some indication
>>>>> in dmesg for the specific devices you're hoping to add to
>>>>> pci_dev_acs_enabled[]. It's not an enumeration-time quirk right now,
>>>>> so I'm not sure how we'd limit it to one message per device.
>>>>
>>>> Right, setup vs use and getting single prints is a lot of extra code.
>>>> Tainting is troublesome for support; Don had some objections when I
>>>> suggested the same to him.
>>>>
>>> For RH GSS (Global Support Services), a 'taint' in the kernel printk
>>> means RH doesn't support that system. The 'non-support' due to 'taint'
>>> being printed out in this case may be incorrect -- RH may support that
>>> use, at least until a more sufficient patched kernel is provided.
>>> Thus my dissension that 'taint' be output. WARN is ok. 'driver
>>> beware', 'unleashed dog afoot' sure...
>>
>> So ... that's really a RH-specific support issue, and easily worked
>> around by RH adding a patch that turns off tainting.
>>
> sure. what's another patch to the thousands... :-/
>
>> It still sounds like a good idea to me for upstream, where use of this
>> option can very possibly lead to corruption or information leakage
>> between devices the user claimed were isolated, but in fact were not.
>>
> Did I miss something? This patch provides a user-level/chosen override;
> like all other overrides (pci=realloc, etc.), it can lead to a failing
> system. IMO, this patch is no different. If you want to tag this patch
> with taint, then let's audit all the (PCI) overrides and taint them
> appropriately. Taint should be reserved for changes to the kernel that
> were done outside the development of the kernel, or with the explicit
> intent to circumvent the normal operation of the kernel. This patch
> provides a way to enable ACS checking to succeed when the devices have
> not provided sufficiently complete ACS information, i.e., it's a growth
> path for PCIe-ACS and its need for proper support.

We're telling the kernel to assume something (the hardware provides
protection) that may not be true. If that assumption turns out to be
false, the result is that a VM can be crashed or compromised by another
VM.

One difference I see is that this override can lead to a crash that
looks like random memory corruption and has no apparent connection to
the actual cause. Most other overrides won't cause run-time crashes (I
think they're more likely to cause boot or device configuration
failures), and the dmesg log will probably have good clues as to the
reason.

But the possibility of compromise is probably even more serious,
because there would be no crash at all, and we'd have no indication
that VM A read or corrupted data in VM B. I'm very concerned about
that, enough so that it's not clear to me that an override belongs in
the upstream kernel at all.

Yes, that would mean some hardware is not suitable for device
assignment. That just sounds like if hardware manufacturers do their
homework and support ACS properly, their hardware is more useful for
virtualization than other hardware. I don't see the problem with that.

Bjorn
kvm_intel: Could not allocate 42 bytes percpu data
Hello,

Lots (~700+) of the following messages are showing up in the dmesg of a
3.10-rc1 based kernel (Host OS is running on a large socket count box
with HT-on).

[ 82.270682] PERCPU: allocation failed, size=42 align=16, alloc from reserved chunk failed
[ 82.272633] kvm_intel: Could not allocate 42 bytes percpu data

... also call traces like the following...

[ 101.852136] c901ad5aa090 88084675dd08 81633743 88084675ddc8
[ 101.860889] 81145053 81f3fa78 88084809dd40 8907d1cfd2e8
[ 101.869466] 8907d1cfd280 88087fffdb08 88084675c010 88084675dfd8
[ 101.878190] Call Trace:
[ 101.880953] [81633743] dump_stack+0x19/0x1e
[ 101.886679] [81145053] pcpu_alloc+0x9a3/0xa40
[ 101.892754] [81145103] __alloc_reserved_percpu+0x13/0x20
[ 101.899733] [810b2d7f] load_module+0x35f/0x1a70
[ 101.905835] [8163ad6e] ? do_page_fault+0xe/0x10
[ 101.911953] [810b467b] SyS_init_module+0xfb/0x140
[ 101.918287] [8163f542] system_call_fastpath+0x16/0x1b
[ 101.924981] kvm_intel: Could not allocate 42 bytes percpu data

Wondering if anyone else has seen this with the recent [3.10] based
kernels, esp. on larger boxes?

There was a similar issue that was reported earlier (where modules were
being loaded per cpu without checking if an instance was already
loaded/being-loaded). That issue seems to have been addressed in the
recent past (e.g. https://lkml.org/lkml/2013/1/24/659 along with a
couple of follow-on cleanups). Is the above yet another variant of the
original issue, or perhaps some race condition that got exposed when
there are a lot more threads?

Vinod
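A toy user-space model can show why many racing loads of one module might fail even though each needs only 42 bytes. This is not the kernel's percpu allocator. It assumes:

* a fixed reserved region of 8 KiB (an assumed size) handed out by a bump pointer;
* allocations held while the duplicate loads race, with no frees;
* the 42-byte size and 16-byte alignment taken from the log.

The "dedup" flag models the earlier fix of skipping a load when the module is already loaded or being loaded. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed size of the reserved region for this toy model only. */
#define RESERVED_BYTES 8192u

struct reserved_chunk {
    size_t used; /* bytes consumed so far, including alignment padding */
};

/* Round x up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t x, size_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Bump allocation; returns the offset, or -1 when the region is exhausted
 * (the toy analogue of "alloc from reserved chunk failed"). */
static long reserved_alloc(struct reserved_chunk *c, size_t size, size_t align)
{
    size_t off = align_up(c->used, align);
    if (off + size > RESERVED_BYTES)
        return -1;
    c->used = off + size;
    return (long)off;
}

/* Simulate n concurrent loads of the same module. Each load holds its
 * allocation (nothing is freed). With dedup, only the first load allocates. */
static int count_failures(int n, size_t size, size_t align, int dedup)
{
    struct reserved_chunk c = { 0 };
    int failures = 0;

    for (int i = 0; i < n; i++) {
        if (dedup && i > 0)
            continue; /* already loaded or being loaded: skip */
        if (reserved_alloc(&c, size, align) < 0)
            failures++;
    }
    return failures;
}
```

With these assumptions, each 42-byte request consumes 48 bytes after alignment, so 200 racing loads leave 30 failures. With dedup, there are none. The real numbers depend on the actual reserve size and on what else lives in it.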
Re: [PATCH 2/2] armv7 initial device passthrough support
On Mon, Jun 24, 2013 at 10:08:08AM +0200, Mario Smarduch wrote:
> On 6/15/2013 5:47 PM, Paolo Bonzini wrote:
>> On 13/06/2013 11:19, Mario Smarduch wrote:
>>> Updated Device Passthrough Patch.
>>> - optimized IRQ-CPU-vCPU binding, irq is installed once
>>> - added dynamic IRQ affinity on schedule in
>>> - added documentation and few other coding recommendations.
>>>
>>> Per earlier discussion VFIO is our target, but we'd like something
>>> earlier to work with to tackle performance latency issues (some ARM
>>> related) for device passthrough while we migrate towards VFIO.
>>
>> I don't think this is acceptable upstream, unfortunately. KVM device
>> assignment is deprecated and we should not add more users.
>
> That's fine; we'll work our way towards dev-tree VFIO, reusing what we
> can and working with the community. At this point we're more concerned
> with numbers and best practices as opposed to mechanism; this part will
> be time consuming. VFIO will be more background for us.
>
>> What are the latency issues you have?
>
> Our focus now is on IRQ latency and throughput. Right now it appears
> lowest latency is 2x + exit/enter + IRQ injection overhead. We can't
> tolerate additional IPIs or deferred IRQ injection approaches. We're
> looking for numbers closer to what IBM's ELI managed. Also high res
> timers, which ARM Virt. Ext supports very well. Exitless interrupts,
> which ARM handles very well too. There are some future hw ARM
> interrupt enhancements coming up which may help a lot as well.
>
> There are many other latency/perf. reqs for NFV related to RT;
> essentially the Guest must run near native. In the end it may turn out
> this needs to be outside of the main tree; we'll see.

It doesn't sound like this will be the end result. Everything that you
try to do in your patch set can be accomplished using VFIO and a more
generic infrastructure for virtual IRQ integration with KVM and user
space.

We should avoid creating an environment with important functionality
outside of the main tree, if at all possible.

-Christoffer
Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021
On 24.06.2013 14:30, Gleb Natapov wrote:
> On Mon, Jun 24, 2013 at 01:59:34PM +0200, Stefan Pietsch wrote:
>> As soon as I remove kvmvapic.bin the virtual machine boots with
>> qemu-kvm 1.5.0. I just verified this with Linux kernel 3.10.0-rc5.
>> emulate_invalid_guest_state=0 or emulate_invalid_guest_state=1 make
>> no difference.
>>
>> Please send your patches.
>
> Here it is, run with it and kvmvapic.bin present. See what is printed in
> dmesg after the failure.
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f4a5b3f..65488a4 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3385,6 +3385,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	u32 ar;
> +	unsigned long rip;
>  
>  	if (vmx->rmode.vm86_active && seg != VCPU_SREG_LDTR) {
>  		*var = vmx->rmode.segs[seg];
> @@ -3408,6 +3409,9 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
>  	var->db = (ar >> 14) & 1;
>  	var->g = (ar >> 15) & 1;
>  	var->unusable = (ar >> 16) & 1;
> +	rip = kvm_rip_read(vcpu);
> +	if ((rip == 0xc101611c || rip == 0xc101611a) && seg == VCPU_SREG_FS)
> +		printk("base=%p limit=%p selector=%x ar=%x\n", var->base, var->limit, var->selector, ar);
>  }
>  
>  static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)

Booting kernel Linux 3.10-rc5 with your patch applied produces these
messages in dmesg when starting a virtual machine:

emulate_invalid_guest_state=0
[ 118.732151] base= limit= (null) selector=ffff ar=0
[ 118.732341] base= limit= (null) selector=ffff ar=0

emulate_invalid_guest_state=1
[ 196.481653] base= limit= (null) selector=ffff ar=0
[ 196.481700] base= limit= (null) selector=ffff ar=0
[ 196.481706] base= limit= (null) selector=ffff ar=0
[ 196.481711] base= limit= (null) selector=ffff ar=0
[ 196.481716] base= limit= (null) selector=ffff ar=0
[ 196.481720] base= limit= (null) selector=ffff ar=0
[ 196.481725] base= limit= (null) selector=ffff ar=0
[ 196.481730] base= limit= (null) selector=ffff ar=0
[ 196.481735] base= limit= (null) selector=ffff ar=0
[ 196.481739] base= limit= (null) selector=ffff ar=0
[ 196.481777] base= limit= (null) selector=ffff ar=0
[ 196.482068] base= limit= (null) selector=ffff ar=0
[ 196.482073] base= limit= (null) selector=ffff ar=0
[ 196.482079] base= limit= (null) selector=ffff ar=0
[ 196.482084] base= limit= (null) selector=ffff ar=0
[ 196.482131] base= limit= (null) selector=ffff ar=0
[ 196.482136] base= limit= (null) selector=ffff ar=0
[ 196.482142] base= limit= (null) selector=ffff ar=0
[ 196.482146] base= limit= (null) selector=ffff ar=0
[ 196.482193] base= limit= (null) selector=ffff ar=0
[ 196.482198] base= limit= (null) selector=ffff ar=0
[ 196.482203] base= limit= (null) selector=ffff ar=0
[ 196.482208] base= limit= (null) selector=ffff ar=0
[ 196.482255] base= limit= (null) selector=ffff ar=0
[ 196.482259] base= limit= (null) selector=ffff ar=0
[ 196.482265] base= limit= (null) selector=ffff ar=0
[ 196.482269] base= limit= (null) selector=ffff ar=0
[ 196.482316] base= limit= (null) selector=ffff ar=0
[ 196.482321] base= limit= (null) selector=ffff ar=0
[ 196.482326] base= limit= (null) selector=ffff ar=0
[ 196.482331] base= limit= (null) selector=ffff ar=0
[ 196.482378] base= limit= (null) selector=ffff ar=0
[ 196.482382] base= limit= (null) selector=ffff ar=0
[ 196.482388] base= limit= (null) selector=ffff ar=0
[ 196.482392] base= limit= (null) selector=ffff ar=0
[ 196.482439] base= limit= (null) selector=ffff ar=0
[ 196.482444] base= limit= (null) selector=ffff ar=0
[ 196.482449] base= limit= (null) selector=ffff ar=0
[ 196.482454] base= limit= (null) selector=ffff ar=0
[ 196.482501] base= limit= (null) selector=ffff ar=0
[ 196.482505] base= limit= (null) selector=ffff ar=0
[ 196.482511] base= limit= (null) selector=ffff ar=0
[ 196.482516] base= limit= (null) selector=ffff ar=0
[ 196.482562] base= limit= (null) selector=ffff ar=0
[ 196.482567] base= limit= (null) selector=ffff ar=0
[ 196.482573] base= limit= (null) selector=ffff ar=0
[ 196.482577] base= limit= (null) selector=ffff ar=0
[ 196.483137] base= limit= (null) selector=ffff ar=0
[ 196.483142] base= limit= (null) selector=ffff ar=0
[ 196.483147] base= limit= (null) selector=ffff ar=0
[ 196.483152]
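The debug patch prints only the raw `ar` word, so a small decoder helps when reading `ar=0`. The sketch below is a hypothetical user-space helper. The db, g and unusable shifts are the ones visible in `vmx_get_segment()` in the patch. The other bit positions follow the VMX segment access-rights layout in the Intel SDM; they are not taken from this thread.

With `ar=0` every field decodes to zero, so FS looks "not present" (P=0) while also not being marked unusable. Our reading of the SDM's VM-entry guest-state checks is that a usable FS must have P=1. That would make this combination a plausible trigger for the 0x80000021 entry failure, but this is an interpretation, not a conclusion reached in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded VMX segment access-rights fields (hypothetical helper type). */
struct seg_ar {
    unsigned type, s, dpl, present, avl, l, db, g, unusable;
};

static struct seg_ar decode_ar(uint32_t ar)
{
    struct seg_ar f;

    f.type     = ar & 15;        /* bits 3:0  segment type */
    f.s        = (ar >> 4) & 1;  /* bit 4     descriptor type (0 = system) */
    f.dpl      = (ar >> 5) & 3;  /* bits 6:5  descriptor privilege level */
    f.present  = (ar >> 7) & 1;  /* bit 7     present */
    f.avl      = (ar >> 12) & 1; /* bit 12    available for system software */
    f.l        = (ar >> 13) & 1; /* bit 13    64-bit mode (CS only) */
    f.db       = (ar >> 14) & 1; /* bit 14    default operation size, as in the patch */
    f.g        = (ar >> 15) & 1; /* bit 15    granularity, as in the patch */
    f.unusable = (ar >> 16) & 1; /* bit 16    segment unusable, as in the patch */
    return f;
}
```

As a usage example, `decode_ar(0xc093)` yields a present, 32-bit, page-granular read/write data segment (type 3). `decode_ar(0)` yields present=0 with unusable=0, which is the state logged above.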
Re: [PATCH 2/2] armv7 initial device passthrough support
On Mon, Jun 24, 2013 at 3:01 PM, Christoffer Dall
<christoffer.d...@linaro.org> wrote:
> On Mon, Jun 24, 2013 at 10:08:08AM +0200, Mario Smarduch wrote:
>> [...]
>
> It doesn't sound like this will be the end result. Everything that you
> try to do in your patch set can be accomplished using VFIO and a more
> generic infrastructure for virtual IRQ integration with KVM and user
> space.
>
> We should avoid creating an environment with important functionality
> outside of the main tree, if at all possible.

Also, as we architect that generic infrastructure we need to keep in
mind that there are important use cases for doing I/O in user space
that are not KVM guests -- just normal applications that need direct
device access.

Stuart
Re: kvm_intel: Could not allocate 42 bytes percpu data
On 06/24/2013 03:01 PM, Chegu Vinod wrote:
> Hello,
>
> Lots (~700+) of the following messages are showing up in the dmesg of a
> 3.10-rc1 based kernel (Host OS is running on a large socket count box
> with HT-on).
>
> [ 82.270682] PERCPU: allocation failed, size=42 align=16, alloc from reserved chunk failed
> [ 82.272633] kvm_intel: Could not allocate 42 bytes percpu data

On 3.10? Geez. I thought we had fixed this. I'll grab a big machine and
see if I can debug.

Rusty -- any ideas off the top of your head?

> ... also call traces like the following...
>
> [call trace snipped]
>
> Wondering if anyone else has seen this with the recent [3.10] based
> kernels esp. on larger boxes?
>
> There was a similar issue that was reported earlier (where modules were
> being loaded per cpu without checking if an instance was already
> loaded/being-loaded). That issue seems to have been addressed in the
> recent past (e.g. https://lkml.org/lkml/2013/1/24/659 along with a
> couple of follow on cleanups) Is the above yet another variant of the
> original issue or perhaps some race condition that got exposed when
> there are lot more threads ?

Hmm ... not sure but yeah, that's the likely culprit.

P.
Re: [PATCH] kvm tools: fix boot of guests with more than 4gb of ram
On Sun, 2013-06-23 at 21:23 -0400, Sasha Levin wrote:
> Commit "kvm tools: virtio: remove hardcoded assumptions about guest page
> size" has introduced a bug that prevented guests with more than 4gb of
> ram from booting.
>
> The issue is that 'pfn' is a 32bit integer, so multiplying it by the
> page size to get the actual page address overflows if the pfn refers
> to a memory area above 4gb.

Couldn't we just make pfn 64 bit?

cheers
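The overflow Sasha describes is the classic 32-bit multiply trap. When both operands are 32-bit unsigned, the product is computed in 32 bits and wraps before it is widened to the 64-bit result. The sketch below uses hypothetical helper names, not the kvm tools code. It shows that widening one operand before the multiply avoids the wrap, as does making `pfn` itself 64-bit as suggested above.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: assuming a 32-bit unsigned int (as on the usual Linux targets),
 * pfn * page_size is evaluated in 32 bits and wraps modulo 2^32; only the
 * already-truncated result is converted to uint64_t. */
static uint64_t pfn_to_addr_buggy(uint32_t pfn, uint32_t page_size)
{
    return pfn * page_size;
}

/* Fixed: widen one operand first so the multiply happens in 64 bits. */
static uint64_t pfn_to_addr_fixed(uint32_t pfn, uint32_t page_size)
{
    return (uint64_t)pfn * page_size;
}
```

With 4 KiB pages, pfn 0x100000 is exactly the 4 GiB boundary. The buggy version returns 0 there, while the fixed version returns 0x100000000. Below the boundary both versions agree.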
[PATCH 5/6 v5] KVM: PPC: Using struct debug_reg
For KVM also use the struct debug_reg defined in asm/processor.h

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
 arch/powerpc/include/asm/kvm_host.h |   13 +
 arch/powerpc/kvm/booke.c            |   34 --
 2 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index af326cd..838a577 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -381,17 +381,6 @@ struct kvmppc_slb {
 #define KVMPPC_EPR_USER	1 /* exit to userspace to fill EPR */
 #define KVMPPC_EPR_KERNEL	2 /* in-kernel irqchip */
 
-struct kvmppc_booke_debug_reg {
-	u32 dbcr0;
-	u32 dbcr1;
-	u32 dbcr2;
-#ifdef CONFIG_KVM_E500MC
-	u32 dbcr4;
-#endif
-	u64 iac[KVMPPC_BOOKE_MAX_IAC];
-	u64 dac[KVMPPC_BOOKE_MAX_DAC];
-};
-
 #define KVMPPC_IRQ_DEFAULT	0
 #define KVMPPC_IRQ_MPIC	1
 #define KVMPPC_IRQ_XICS	2
@@ -535,7 +524,7 @@ struct kvm_vcpu_arch {
 	u32 eptcfg;
 	u32 epr;
 	u32 crit_save;
-	struct kvmppc_booke_debug_reg dbg_reg;
+	struct debug_reg dbg_reg;
 #endif
 	gpa_t paddr_accessed;
 	gva_t vaddr_accessed;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 62d4ece..3e9fc1d 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1424,7 +1424,6 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 	int r = 0;
 	union kvmppc_one_reg val;
 	int size;
-	long int i;
 
 	size = one_reg_size(reg->id);
 	if (size > sizeof(val))
@@ -1432,16 +1431,24 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 
 	switch (reg->id) {
 	case KVM_REG_PPC_IAC1:
+		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac1);
+		break;
 	case KVM_REG_PPC_IAC2:
+		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac2);
+		break;
+#if CONFIG_PPC_ADV_DEBUG_IACS > 2
 	case KVM_REG_PPC_IAC3:
+		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac3);
+		break;
 	case KVM_REG_PPC_IAC4:
-		i = reg->id - KVM_REG_PPC_IAC1;
-		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac[i]);
+		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac4);
 		break;
+#endif
 	case KVM_REG_PPC_DAC1:
+		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.dac1);
+		break;
 	case KVM_REG_PPC_DAC2:
-		i = reg->id - KVM_REG_PPC_DAC1;
-		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.dac[i]);
+		val = get_reg_val(reg->id, vcpu->arch.dbg_reg.dac2);
 		break;
 	case KVM_REG_PPC_EPR: {
 		u32 epr = get_guest_epr(vcpu);
@@ -1481,7 +1488,6 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 	int r = 0;
 	union kvmppc_one_reg val;
 	int size;
-	long int i;
 
 	size = one_reg_size(reg->id);
 	if (size > sizeof(val))
@@ -1492,16 +1498,24 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
 
 	switch (reg->id) {
 	case KVM_REG_PPC_IAC1:
+		vcpu->arch.dbg_reg.iac1 = set_reg_val(reg->id, val);
+		break;
 	case KVM_REG_PPC_IAC2:
+		vcpu->arch.dbg_reg.iac2 = set_reg_val(reg->id, val);
+		break;
+#if CONFIG_PPC_ADV_DEBUG_IACS > 2
 	case KVM_REG_PPC_IAC3:
+		vcpu->arch.dbg_reg.iac3 = set_reg_val(reg->id, val);
+		break;
 	case KVM_REG_PPC_IAC4:
-		i = reg->id - KVM_REG_PPC_IAC1;
-		vcpu->arch.dbg_reg.iac[i] = set_reg_val(reg->id, val);
+		vcpu->arch.dbg_reg.iac4 = set_reg_val(reg->id, val);
 		break;
+#endif
 	case KVM_REG_PPC_DAC1:
+		vcpu->arch.dbg_reg.dac1 = set_reg_val(reg->id, val);
+		break;
 	case KVM_REG_PPC_DAC2:
-		i = reg->id - KVM_REG_PPC_DAC1;
-		vcpu->arch.dbg_reg.dac[i] = set_reg_val(reg->id, val);
+		vcpu->arch.dbg_reg.dac2 = set_reg_val(reg->id, val);
 		break;
 	case KVM_REG_PPC_EPR: {
 		u32 new_epr = set_reg_val(reg->id, val);
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
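The global struct debug_reg uses named fields (iac1..iac4, dac1, dac2) instead of arrays. As a result, the old `i = reg->id - KVM_REG_PPC_IAC1` indexing no longer applies, which is why the patch spells out one case per register and guards IAC3/IAC4 behind `CONFIG_PPC_ADV_DEBUG_IACS > 2`. The user-space model below shows that mapping. The IDs, struct and config macro are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CONFIG_PPC_ADV_DEBUG_IACS. */
#define MODEL_ADV_DEBUG_IACS 4

/* Hypothetical register IDs; the real KVM_REG_PPC_* values differ. */
enum model_reg_id {
	MODEL_REG_IAC1 = 1, MODEL_REG_IAC2, MODEL_REG_IAC3, MODEL_REG_IAC4,
	MODEL_REG_DAC1, MODEL_REG_DAC2,
};

/* Trimmed-down model of a debug register set with named fields. */
struct model_debug_reg {
	uint64_t iac1, iac2, iac3, iac4;
	uint64_t dac1, dac2;
};

/* Named fields cannot be reached by "id - IAC1" indexing, so each ID is
 * mapped to its field explicitly. Returns NULL for unhandled IDs. */
static uint64_t *model_debug_field(struct model_debug_reg *d, int id)
{
	switch (id) {
	case MODEL_REG_IAC1: return &d->iac1;
	case MODEL_REG_IAC2: return &d->iac2;
#if MODEL_ADV_DEBUG_IACS > 2
	case MODEL_REG_IAC3: return &d->iac3;
	case MODEL_REG_IAC4: return &d->iac4;
#endif
	case MODEL_REG_DAC1: return &d->dac1;
	case MODEL_REG_DAC2: return &d->dac2;
	default:             return NULL;
	}
}

/* Setter built on the same mapping; -1 means "ID not handled here". */
static int model_set_one_reg(struct model_debug_reg *d, int id, uint64_t val)
{
	uint64_t *f = model_debug_field(d, id);

	if (f == NULL)
		return -1;
	*f = val;
	return 0;
}
```

Sharing one mapping function between get and set is a design choice of this sketch. The patch itself writes the cases out twice, once in each ioctl handler.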
[PATCH 0/6 v5] KVM: PPC: Userspace Debug support
From: Bharat Bhushan <bharat.bhus...@freescale.com>

This patchset adds the userspace debug support for booke/bookehv.
This is tested on powerpc e500v2/e500mc devices.

We are now assuming that debug resource will not be used by kernel for
its own debugging. It will be used for only kernel user process
debugging. So the kernel debug load interface during context_to is used
to load debug context for that selected process.

v4->v5
 - Some comments reworded and other cleanup (like change of function name etc)

v3->v4
 - 4 out of 7 patches of initial patchset were applied.
   This patchset is on and above those 4 patches
 - KVM local struct kvmppc_booke_debug_reg is replaced by
   powerpc global struct debug_reg
 - use switch_booke_debug_regs() for debug register context switch.
 - Save DBSR before kernel pre-emption is enabled.
 - Some more cleanup

v2->v3
 - We are now assuming that debug resource will not be used by
   kernel for its own debugging.
   It will be used for only kernel user process debugging.
   So the kernel debug load interface during context_to is
   used to load debug context for that selected process.

v1->v2
 - Debug registers are saved/restored in vcpu_put/vcpu_get.
   Earlier the debug registers were saved/restored in guest entry/exit

Bharat Bhushan (6):
  powerpc: remove unnecessary line continuations
  powerpc: move debug registers in a structure
  powerpc: export debug register save function for KVM
  KVM: PPC: exit to user space on ehpriv instruction
  KVM: PPC: Using struct debug_reg
  KVM: PPC: Add userspace debug stub support

 arch/powerpc/include/asm/disassemble.h |    4 +
 arch/powerpc/include/asm/kvm_host.h    |   16 +--
 arch/powerpc/include/asm/processor.h   |   38 +++--
 arch/powerpc/include/asm/reg_booke.h   |    8 +-
 arch/powerpc/include/asm/switch_to.h   |    4 +
 arch/powerpc/include/uapi/asm/kvm.h    |   22 ++-
 arch/powerpc/kernel/asm-offsets.c      |    2 +-
 arch/powerpc/kernel/process.c          |   45 +++---
 arch/powerpc/kernel/ptrace.c           |  154 +-
 arch/powerpc/kernel/signal_32.c        |    6 +-
 arch/powerpc/kernel/traps.c            |   35 ++--
 arch/powerpc/kvm/booke.c               |  267
 arch/powerpc/kvm/booke.h               |    5 +
 arch/powerpc/kvm/e500_emulate.c        |   27
 14 files changed, 449 insertions(+), 184 deletions(-)
[PATCH 2/6 v5] powerpc: move debug registers in a structure
This way we can use the same data type struct with KVM, and it also
helps in using other debug-related functions.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
 arch/powerpc/include/asm/processor.h |   38 +
 arch/powerpc/include/asm/reg_booke.h |    8 +-
 arch/powerpc/kernel/asm-offsets.c    |    2 +-
 arch/powerpc/kernel/process.c        |   42 +-
 arch/powerpc/kernel/ptrace.c         |  154 +-
 arch/powerpc/kernel/signal_32.c      |    6 +-
 arch/powerpc/kernel/traps.c          |   35
 7 files changed, 146 insertions(+), 139 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index d7e67ca..5b8a7f1 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -147,22 +147,7 @@ typedef struct {
 #define TS_FPR(i) fpr[i][TS_FPROFFSET]
 #define TS_TRANS_FPR(i) transact_fpr[i][TS_FPROFFSET]
 
-struct thread_struct {
-	unsigned long	ksp;		/* Kernel stack pointer */
-	unsigned long	ksp_limit;	/* if ksp <= ksp_limit stack overflow */
-
-#ifdef CONFIG_PPC64
-	unsigned long	ksp_vsid;
-#endif
-	struct pt_regs	*regs;		/* Pointer to saved register state */
-	mm_segment_t	fs;		/* for get_fs() validation */
-#ifdef CONFIG_BOOKE
-	/* BookE base exception scratch space; align on cacheline */
-	unsigned long	normsave[8] cacheline_aligned;
-#endif
-#ifdef CONFIG_PPC32
-	void		*pgdir;		/* root of page-table tree */
-#endif
+struct debug_reg {
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
 	/*
 	 * The following help to manage the use of Debug Control Registers
@@ -199,6 +184,27 @@ struct thread_struct {
 	unsigned long	dvc2;
 #endif
 #endif
+};
+
+struct thread_struct {
+	unsigned long	ksp;		/* Kernel stack pointer */
+	unsigned long	ksp_limit;	/* if ksp <= ksp_limit stack overflow */
+
+#ifdef CONFIG_PPC64
+	unsigned long	ksp_vsid;
+#endif
+	struct pt_regs	*regs;		/* Pointer to saved register state */
+	mm_segment_t	fs;		/* for get_fs() validation */
+#ifdef CONFIG_BOOKE
+	/* BookE base exception scratch space; align on cacheline */
+	unsigned long	normsave[8] cacheline_aligned;
+#endif
+#ifdef CONFIG_PPC32
+	void		*pgdir;		/* root of page-table tree */
+#endif
+	/* Debug Registers */
+	struct debug_reg debug;
+
 	/* FP and VSX 0-31 register set */
 	double		fpr[32][TS_FPRWIDTH];
 	struct {
diff --git a/arch/powerpc/include/asm/reg_booke.h b/arch/powerpc/include/asm/reg_booke.h
index b417de3..455dc89 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -381,7 +381,7 @@
 #define DBCR0_IA34T	0x4000	/* Instr Addr 3-4 range Toggle */
 #define DBCR0_FT	0x0001	/* Freeze Timers on debug event */
 
-#define dbcr_iac_range(task)	((task)->thread.dbcr0)
+#define dbcr_iac_range(task)	((task)->thread.debug.dbcr0)
 #define DBCR_IAC12I	DBCR0_IA12			/* Range Inclusive */
 #define DBCR_IAC12X	(DBCR0_IA12 | DBCR0_IA12X)	/* Range Exclusive */
 #define DBCR_IAC12MODE	(DBCR0_IA12 | DBCR0_IA12X)	/* IAC 1-2 Mode Bits */
@@ -395,7 +395,7 @@
 #define DBCR1_DAC1W	0x2000	/* DAC1 Write Debug Event */
 #define DBCR1_DAC2W	0x1000	/* DAC2 Write Debug Event */
 
-#define dbcr_dac(task)	((task)->thread.dbcr1)
+#define dbcr_dac(task)	((task)->thread.debug.dbcr1)
 #define DBCR_DAC1R	DBCR1_DAC1R
 #define DBCR_DAC1W	DBCR1_DAC1W
 #define DBCR_DAC2R	DBCR1_DAC2R
@@ -441,7 +441,7 @@
 #define DBCR0_CRET	0x0020	/* Critical Return Debug Event */
 #define DBCR0_FT	0x0001	/* Freeze Timers on debug event */
 
-#define dbcr_dac(task)	((task)->thread.dbcr0)
+#define dbcr_dac(task)	((task)->thread.debug.dbcr0)
 #define DBCR_DAC1R	DBCR0_DAC1R
 #define DBCR_DAC1W	DBCR0_DAC1W
 #define DBCR_DAC2R	DBCR0_DAC2R
@@ -475,7 +475,7 @@
 #define DBCR1_IAC34MX	0x00C0	/* Instr Addr 3-4 range eXclusive */
 #define DBCR1_IAC34AT	0x0001	/* Instr Addr 3-4 range Toggle */
 
-#define dbcr_iac_range(task)	((task)->thread.dbcr1)
+#define dbcr_iac_range(task)	((task)->thread.debug.dbcr1)
 #define DBCR_IAC12I	DBCR1_IAC12M	/* Range Inclusive */
 #define DBCR_IAC12X	DBCR1_IAC12MX	/* Range Exclusive */
 #define DBCR_IAC12MODE	DBCR1_IAC12MX	/* IAC 1-2 Mode Bits */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index b51a97c..c241c60 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -106,7 +106,7 @@ int main(void)
 #else /* CONFIG_PPC64 */
 	DEFINE(PGDIR, offsetof(struct thread_struct, pgdir));
 #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
-	DEFINE(THREAD_DBCR0, offsetof(struct
[PATCH 1/6 v5] powerpc: remove unnecessary line continuations
Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
v5:
 - no change

 arch/powerpc/kernel/process.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ceb4e7b..639a8de 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -325,7 +325,7 @@ static void set_debug_reg_defaults(struct thread_struct *thread)
 	/*
 	 * Force User/Supervisor bits to b11 (user-only MSR[PR]=1)
 	 */
-	thread->dbcr1 = DBCR1_IAC1US | DBCR1_IAC2US | \
+	thread->dbcr1 = DBCR1_IAC1US | DBCR1_IAC2US |
 			DBCR1_IAC3US | DBCR1_IAC4US;
 	/*
 	 * Force Data Address Compare User/Supervisor bits to be User-only
-- 
1.7.0.4
[PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support
This patch adds the debug stub support on booke/bookehv. Now QEMU debug stub can use hw breakpoint, watchpoint and software breakpoint to debug guest. This is how we save/restore debug register context when switching between guest, userspace and kernel user-process: When QEMU is running - thread-debug_reg == QEMU debug register context. - Kernel will handle switching the debug register on context switch. - no vcpu_load() called QEMU makes ioctls (except RUN) - This will call vcpu_load() - should not change context. - Some ioctls can change vcpu debug register, context saved in vcpu-debug_regs QEMU Makes RUN ioctl - Save thread-debug_reg on STACK - Store thread-debug_reg == vcpu-debug_reg - load thread-debug_reg - RUN VCPU ( So thread points to vcpu context ) Context switch happens When VCPU running - makes vcpu_load() should not load any context - kernel loads the vcpu context as thread-debug_regs points to vcpu context. On heavyweight_exit - Load the context saved on stack in thread-debug_reg Currently we do not support debug resource emulation to guest, On debug exception, always exit to user space irrespective of user space is expecting the debug exception or not. If this is unexpected exception (breakpoint/watchpoint event not set by userspace) then let us leave the action on user space. This is similar to what it was before, only thing is that now we have proper exit state available to user space. 
Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
 arch/powerpc/include/asm/kvm_host.h |    3 +
 arch/powerpc/include/uapi/asm/kvm.h |    1 +
 arch/powerpc/kvm/booke.c            |  233 ---
 arch/powerpc/kvm/booke.h            |    5 +
 4 files changed, 224 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 838a577..aeb490d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -524,7 +524,10 @@ struct kvm_vcpu_arch {
 	u32 eptcfg;
 	u32 epr;
 	u32 crit_save;
+	/* guest debug registers*/
 	struct debug_reg dbg_reg;
+	/* hardware visible debug registers when in guest state */
+	struct debug_reg shadow_dbg_reg;
 #endif
 	gpa_t paddr_accessed;
 	gva_t vaddr_accessed;
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index ded0607..f5077c2 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -27,6 +27,7 @@
 #define __KVM_HAVE_PPC_SMT
 #define __KVM_HAVE_IRQCHIP
 #define __KVM_HAVE_IRQ_LINE
+#define __KVM_HAVE_GUEST_DEBUG

 struct kvm_regs {
 	__u64 pc;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 3e9fc1d..8be3502 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -133,6 +133,29 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
 #endif
 }

+static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu)
+{
+	/* Synchronize guest's desire to get debug interrupts into shadow MSR */
+#ifndef CONFIG_KVM_BOOKE_HV
+	vcpu->arch.shadow_msr &= ~MSR_DE;
+	vcpu->arch.shadow_msr |= vcpu->arch.shared->msr & MSR_DE;
+#endif
+
+	/* Force enable debug interrupts when user space wants to debug */
+	if (vcpu->guest_debug) {
+#ifdef CONFIG_KVM_BOOKE_HV
+		/*
+		 * Since there is no shadow MSR, sync MSR_DE into the guest
+		 * visible MSR.
+		 */
+		vcpu->arch.shared->msr |= MSR_DE;
+#else
+		vcpu->arch.shadow_msr |= MSR_DE;
+		vcpu->arch.shared->msr &= ~MSR_DE;
+#endif
+	}
+}
+
 /*
  * Helper function for full MSR writes.  No need to call this if only
  * EE/CE/ME/DE/RI are changing.
@@ -150,6 +173,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
 	kvmppc_mmu_msr_notify(vcpu, old_msr);
 	kvmppc_vcpu_sync_spe(vcpu);
 	kvmppc_vcpu_sync_fpu(vcpu);
+	kvmppc_vcpu_sync_debug(vcpu);
 }

 static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
@@ -655,6 +679,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
 int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
 	int ret, s;
+	struct thread_struct thread;
 #ifdef CONFIG_PPC_FPU
 	unsigned int fpscr;
 	int fpexc_mode;
@@ -698,12 +723,21 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	kvmppc_load_guest_fp(vcpu);
 #endif

+	/* Switch to guest debug context */
+	thread.debug = vcpu->arch.shadow_dbg_reg;
+	switch_booke_debug_regs(&thread);
+	thread.debug = current->thread.debug;
+	current->thread.debug = vcpu->arch.shadow_dbg_reg;
 	ret = __kvmppc_vcpu_run(kvm_run, vcpu);

 	/* No need for kvm_guest_exit. It's done in handle_exit.
 	   We also get here with interrupts enabled. */

+	/* Switch back to user space debug context */
+
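The save/switch/restore sequence described in this commit message can be modelled outside the kernel. What follows is a minimal user-space sketch, not the patch's code: `model_debug_reg`, `model_thread`, `model_vcpu`, `hw_debug`, `model_enter_guest()` and `model_exit_guest()` are hypothetical names, only DBCR0 is modelled, and the exit half follows the "On heavyweight_exit" step of the description, because the patch's own switch-back code is cut off in this archive.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct debug_reg, struct
 * thread_struct and the vcpu. Only DBCR0 is modelled. */
struct model_debug_reg {
	uint32_t dbcr0;
};

struct model_thread {
	struct model_debug_reg debug;          /* plays the role of current->thread.debug */
};

struct model_vcpu {
	struct model_debug_reg shadow_dbg_reg; /* plays the role of vcpu->arch.shadow_dbg_reg */
};

/* Stands in for the hardware debug registers. */
static struct model_debug_reg hw_debug;

/* Unconditional load; the real switch_booke_debug_regs() only reloads
 * when DBCR0_IDM is set in the old or the new context. */
static void model_load_debug_regs(const struct model_debug_reg *regs)
{
	hw_debug = *regs;
}

/* RUN ioctl entry: load the guest context into the (modelled) hardware,
 * save the thread's own context (the real code keeps it on the stack)
 * and copy the vcpu context into the thread, so that a context switch
 * while the vcpu runs reloads the guest values. Returns the saved copy. */
static struct model_debug_reg model_enter_guest(struct model_thread *t,
						const struct model_vcpu *v)
{
	struct model_debug_reg saved = t->debug;

	model_load_debug_regs(&v->shadow_dbg_reg);
	t->debug = v->shadow_dbg_reg;
	return saved;
}

/* Heavyweight exit: put the saved user space context back into the
 * thread and into the (modelled) hardware. */
static void model_exit_guest(struct model_thread *t,
			     const struct model_debug_reg *saved)
{
	t->debug = *saved;
	model_load_debug_regs(saved);
}
```

Calling `model_enter_guest()` and then `model_exit_guest()` should leave both the thread and the modelled hardware holding the original user space value, which is the property the description asks for.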
[PATCH 3/6 v5] powerpc: export debug register save function for KVM
KVM needs this function when switching from vcpu to user-space
thread. My subsequent patch will use this function.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
 arch/powerpc/include/asm/switch_to.h |    4 ++++
 arch/powerpc/kernel/process.c        |    3 ++-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 200d763..50b357f 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -30,6 +30,10 @@ extern void enable_kernel_spe(void);
 extern void giveup_spe(struct task_struct *);
 extern void load_up_spe(struct task_struct *);

+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+extern void switch_booke_debug_regs(struct thread_struct *new_thread);
+#endif
+
 #ifndef CONFIG_SMP
 extern void discard_lazy_cpu_state(void);
 #else
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 01ff496..3375cb7 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -362,12 +362,13 @@ static void prime_debug_regs(struct thread_struct *thread)
  * debug registers, set the debug registers from the values
  * stored in the new thread.
  */
-static void switch_booke_debug_regs(struct thread_struct *new_thread)
+void switch_booke_debug_regs(struct thread_struct *new_thread)
 {
 	if ((current->thread.debug.dbcr0 & DBCR0_IDM)
 		|| (new_thread->debug.dbcr0 & DBCR0_IDM))
 			prime_debug_regs(new_thread);
 }
+EXPORT_SYMBOL(switch_booke_debug_regs);
 #else	/* !CONFIG_PPC_ADV_DEBUG_REGS */
 #ifndef CONFIG_HAVE_HW_BREAKPOINT
 static void set_debug_reg_defaults(struct thread_struct *thread)
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
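The condition in `switch_booke_debug_regs()` can be illustrated with a small user-space model. This is a sketch under stated assumptions: the `model_*` names, `hw_dbcr0` and `prime_count` are hypothetical, and `MODEL_DBCR0_IDM` uses 0x40000000 only as a stand-in for `DBCR0_IDM` (the kernel's own definition should be checked). It shows that the debug registers are reloaded only when the outgoing or the incoming context has internal debug mode enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for DBCR0_IDM (internal debug mode); value assumed. */
#define MODEL_DBCR0_IDM 0x40000000u

/* Hypothetical reduction of struct debug_reg to its DBCR0 field. */
struct model_debug_reg {
	uint32_t dbcr0;
};

static uint32_t hw_dbcr0; /* stands in for the DBCR0 register */
static int prime_count;   /* how many times the registers were reloaded */

/* Analogue of prime_debug_regs(): load the new context into "hardware". */
static void model_prime_debug_regs(const struct model_debug_reg *next)
{
	hw_dbcr0 = next->dbcr0;
	prime_count++;
}

/* Same gate as switch_booke_debug_regs(): reload only if either the
 * current or the new context uses internal debug mode. */
static void model_switch_booke_debug_regs(const struct model_debug_reg *cur,
					  const struct model_debug_reg *next)
{
	if ((cur->dbcr0 & MODEL_DBCR0_IDM) || (next->dbcr0 & MODEL_DBCR0_IDM))
		model_prime_debug_regs(next);
}
```

Switching between two contexts that both leave IDM clear does not touch the modelled registers, so ordinary context switches skip the reload; switching into or out of an IDM context always reloads from the incoming side.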
Re: [PATCH 3/6 v5] powerpc: export debug register save function for KVM
On 24.06.2013, at 11:08, Bharat Bhushan wrote:

> KVM needs this function when switching from vcpu to user-space
> thread. My subsequent patch will use this function.
>
> Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
> ---
>  arch/powerpc/include/asm/switch_to.h |    4 ++++
>  arch/powerpc/kernel/process.c        |    3 ++-
>  2 files changed, 6 insertions(+), 1 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
> index 200d763..50b357f 100644
> --- a/arch/powerpc/include/asm/switch_to.h
> +++ b/arch/powerpc/include/asm/switch_to.h
> @@ -30,6 +30,10 @@ extern void enable_kernel_spe(void);
>  extern void giveup_spe(struct task_struct *);
>  extern void load_up_spe(struct task_struct *);
>
> +#ifdef CONFIG_PPC_ADV_DEBUG_REGS
> +extern void switch_booke_debug_regs(struct thread_struct *new_thread);
> +#endif
> +
>  #ifndef CONFIG_SMP
>  extern void discard_lazy_cpu_state(void);
>  #else
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 01ff496..3375cb7 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -362,12 +362,13 @@ static void prime_debug_regs(struct thread_struct *thread)
>   * debug registers, set the debug registers from the values
>   * stored in the new thread.
>   */
> -static void switch_booke_debug_regs(struct thread_struct *new_thread)
> +void switch_booke_debug_regs(struct thread_struct *new_thread)
>  {
>  	if ((current->thread.debug.dbcr0 & DBCR0_IDM)
>  		|| (new_thread->debug.dbcr0 & DBCR0_IDM))
>  			prime_debug_regs(new_thread);
>  }
> +EXPORT_SYMBOL(switch_booke_debug_regs);

EXPORT_SYMBOL_GPL?


Alex

>  #else	/* !CONFIG_PPC_ADV_DEBUG_REGS */
>  #ifndef CONFIG_HAVE_HW_BREAKPOINT
>  static void set_debug_reg_defaults(struct thread_struct *thread)
> -- 
> 1.7.0.4
RE: [PATCH 3/6 v5] powerpc: export debug register save function for KVM
> -----Original Message-----
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, June 24, 2013 3:03 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Wood Scott-B07421; tiejun.c...@windriver.com; Bhushan Bharat-R65777
> Subject: Re: [PATCH 3/6 v5] powerpc: export debug register save function for KVM
>
>
> On 24.06.2013, at 11:08, Bharat Bhushan wrote:
>
> > KVM needs this function when switching from vcpu to user-space
> > thread. My subsequent patch will use this function.
> >
> > Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
> > ---
> >  arch/powerpc/include/asm/switch_to.h |    4 ++++
> >  arch/powerpc/kernel/process.c        |    3 ++-
> >  2 files changed, 6 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
> > index 200d763..50b357f 100644
> > --- a/arch/powerpc/include/asm/switch_to.h
> > +++ b/arch/powerpc/include/asm/switch_to.h
> > @@ -30,6 +30,10 @@ extern void enable_kernel_spe(void);
> >  extern void giveup_spe(struct task_struct *);
> >  extern void load_up_spe(struct task_struct *);
> >
> > +#ifdef CONFIG_PPC_ADV_DEBUG_REGS
> > +extern void switch_booke_debug_regs(struct thread_struct *new_thread);
> > +#endif
> > +
> >  #ifndef CONFIG_SMP
> >  extern void discard_lazy_cpu_state(void);
> >  #else
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index 01ff496..3375cb7 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -362,12 +362,13 @@ static void prime_debug_regs(struct thread_struct *thread)
> >   * debug registers, set the debug registers from the values
> >   * stored in the new thread.
> >   */
> > -static void switch_booke_debug_regs(struct thread_struct *new_thread)
> > +void switch_booke_debug_regs(struct thread_struct *new_thread)
> >  {
> >  	if ((current->thread.debug.dbcr0 & DBCR0_IDM)
> >  		|| (new_thread->debug.dbcr0 & DBCR0_IDM))
> >  			prime_debug_regs(new_thread);
> >  }
> > +EXPORT_SYMBOL(switch_booke_debug_regs);
>
> EXPORT_SYMBOL_GPL?

Oops, I missed this comment. Will correct in the next version.

-Bharat

>
>
> Alex
>
> >  #else	/* !CONFIG_PPC_ADV_DEBUG_REGS */
> >  #ifndef CONFIG_HAVE_HW_BREAKPOINT
> >  static void set_debug_reg_defaults(struct thread_struct *thread)
> > -- 
> > 1.7.0.4
Re: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support
On 24.06.2013, at 11:08, Bharat Bhushan wrote:

> This patch adds the debug stub support on booke/bookehv.
> Now the QEMU debug stub can use hardware breakpoints, watchpoints
> and software breakpoints to debug the guest.
>
> This is how we save/restore the debug register context when switching
> between guest, userspace and kernel user-process:
>
> When QEMU is running
>  - thread->debug_reg == QEMU debug register context.
>  - Kernel will handle switching the debug register on context switch.
>  - no vcpu_load() called
>
> QEMU makes ioctls (except RUN)
>  - This will call vcpu_load()
>  - should not change context.
>  - Some ioctls can change vcpu debug register, context saved in vcpu->debug_regs
>
> QEMU makes RUN ioctl
>  - Save thread->debug_reg on STACK
>  - Set thread->debug_reg = vcpu->debug_reg
>  - Load thread->debug_reg
>  - RUN VCPU (so thread points to vcpu context)
>
> Context switch happens when VCPU is running
>  - vcpu_load() should not load any context
>  - kernel loads the vcpu context as thread->debug_regs points to vcpu context.
>
> On heavyweight_exit
>  - Load the context saved on stack in thread->debug_reg
>
> Currently we do not support debug resource emulation for the guest.
> On a debug exception, always exit to user space, irrespective of
> whether user space is expecting the debug exception or not. If this
> is an unexpected exception (breakpoint/watchpoint event not set by
> userspace), then we leave the action to user space. This is similar
> to what it was before; the only difference is that user space now
> has a proper exit state available.
>
> Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
> ---
>  arch/powerpc/include/asm/kvm_host.h |    3 +
>  arch/powerpc/include/uapi/asm/kvm.h |    1 +
>  arch/powerpc/kvm/booke.c            |  233 ---
>  arch/powerpc/kvm/booke.h            |    5 +
>  4 files changed, 224 insertions(+), 18 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 838a577..aeb490d 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -524,7 +524,10 @@ struct kvm_vcpu_arch {
>  	u32 eptcfg;
>  	u32 epr;
>  	u32 crit_save;
> +	/* guest debug registers*/
>  	struct debug_reg dbg_reg;
> +	/* hardware visible debug registers when in guest state */
> +	struct debug_reg shadow_dbg_reg;
>  #endif
>  	gpa_t paddr_accessed;
>  	gva_t vaddr_accessed;
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> index ded0607..f5077c2 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -27,6 +27,7 @@
>  #define __KVM_HAVE_PPC_SMT
>  #define __KVM_HAVE_IRQCHIP
>  #define __KVM_HAVE_IRQ_LINE
> +#define __KVM_HAVE_GUEST_DEBUG
>
>  struct kvm_regs {
>  	__u64 pc;
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 3e9fc1d..8be3502 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -133,6 +133,29 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
>  #endif
>  }
>
> +static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu)
> +{
> +	/* Synchronize guest's desire to get debug interrupts into shadow MSR */
> +#ifndef CONFIG_KVM_BOOKE_HV
> +	vcpu->arch.shadow_msr &= ~MSR_DE;
> +	vcpu->arch.shadow_msr |= vcpu->arch.shared->msr & MSR_DE;
> +#endif
> +
> +	/* Force enable debug interrupts when user space wants to debug */
> +	if (vcpu->guest_debug) {
> +#ifdef CONFIG_KVM_BOOKE_HV
> +		/*
> +		 * Since there is no shadow MSR, sync MSR_DE into the guest
> +		 * visible MSR.
> +		 */
> +		vcpu->arch.shared->msr |= MSR_DE;
> +#else
> +		vcpu->arch.shadow_msr |= MSR_DE;
> +		vcpu->arch.shared->msr &= ~MSR_DE;
> +#endif
> +	}
> +}
> +
>  /*
>   * Helper function for full MSR writes.  No need to call this if only
>   * EE/CE/ME/DE/RI are changing.
> @@ -150,6 +173,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
>  	kvmppc_mmu_msr_notify(vcpu, old_msr);
>  	kvmppc_vcpu_sync_spe(vcpu);
>  	kvmppc_vcpu_sync_fpu(vcpu);
> +	kvmppc_vcpu_sync_debug(vcpu);
>  }
>
>  static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
> @@ -655,6 +679,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
>  int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
>  {
>  	int ret, s;
> +	struct thread_struct thread;
>  #ifdef CONFIG_PPC_FPU
>  	unsigned int fpscr;
>  	int fpexc_mode;
> @@ -698,12 +723,21 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
>  	kvmppc_load_guest_fp(vcpu);
>  #endif
>
> +	/* Switch to guest debug context */
> +	thread.debug = vcpu->arch.shadow_dbg_reg;
> +	switch_booke_debug_regs(&thread);
> +	thread.debug = current->thread.debug;
> +	current->thread.debug = vcpu->arch.shadow_dbg_reg;
>  	ret = __kvmppc_vcpu_run(kvm_run, vcpu);
>
>  	/* No need for kvm_guest_exit. It's done in handle_exit.
>  	   We also get here with interrupts enabled. */
RE: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support
> -----Original Message-----
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Monday, June 24, 2013 4:13 PM
> To: Bhushan Bharat-R65777
> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Wood Scott-B07421; tiejun.c...@windriver.com; Bhushan Bharat-R65777
> Subject: Re: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support
>
>
> On 24.06.2013, at 11:08, Bharat Bhushan wrote:
>
> > This patch adds the debug stub support on booke/bookehv.
> > Now the QEMU debug stub can use hardware breakpoints, watchpoints
> > and software breakpoints to debug the guest.
> >
> > This is how we save/restore the debug register context when switching
> > between guest, userspace and kernel user-process:
> >
> > When QEMU is running
> >  - thread->debug_reg == QEMU debug register context.
> >  - Kernel will handle switching the debug register on context switch.
> >  - no vcpu_load() called
> >
> > QEMU makes ioctls (except RUN)
> >  - This will call vcpu_load()
> >  - should not change context.
> >  - Some ioctls can change vcpu debug register, context saved in vcpu->debug_regs
> >
> > QEMU makes RUN ioctl
> >  - Save thread->debug_reg on STACK
> >  - Set thread->debug_reg = vcpu->debug_reg
> >  - Load thread->debug_reg
> >  - RUN VCPU (so thread points to vcpu context)
> >
> > Context switch happens when VCPU is running
> >  - vcpu_load() should not load any context
> >  - kernel loads the vcpu context as thread->debug_regs points to vcpu context.
> >
> > On heavyweight_exit
> >  - Load the context saved on stack in thread->debug_reg
> >
> > Currently we do not support debug resource emulation for the guest.
> > On a debug exception, always exit to user space, irrespective of
> > whether user space is expecting the debug exception or not. If this
> > is an unexpected exception (breakpoint/watchpoint event not set by
> > userspace), then we leave the action to user space. This is similar
> > to what it was before; the only difference is that user space now
> > has a proper exit state available.
> >
> > Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
> > ---
> >  arch/powerpc/include/asm/kvm_host.h |    3 +
> >  arch/powerpc/include/uapi/asm/kvm.h |    1 +
> >  arch/powerpc/kvm/booke.c            |  233 ---
> >  arch/powerpc/kvm/booke.h            |    5 +
> >  4 files changed, 224 insertions(+), 18 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> > index 838a577..aeb490d 100644
> > --- a/arch/powerpc/include/asm/kvm_host.h
> > +++ b/arch/powerpc/include/asm/kvm_host.h
> > @@ -524,7 +524,10 @@ struct kvm_vcpu_arch {
> >  	u32 eptcfg;
> >  	u32 epr;
> >  	u32 crit_save;
> > +	/* guest debug registers*/
> >  	struct debug_reg dbg_reg;
> > +	/* hardware visible debug registers when in guest state */
> > +	struct debug_reg shadow_dbg_reg;
> >  #endif
> >  	gpa_t paddr_accessed;
> >  	gva_t vaddr_accessed;
> > diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> > index ded0607..f5077c2 100644
> > --- a/arch/powerpc/include/uapi/asm/kvm.h
> > +++ b/arch/powerpc/include/uapi/asm/kvm.h
> > @@ -27,6 +27,7 @@
> >  #define __KVM_HAVE_PPC_SMT
> >  #define __KVM_HAVE_IRQCHIP
> >  #define __KVM_HAVE_IRQ_LINE
> > +#define __KVM_HAVE_GUEST_DEBUG
> >
> >  struct kvm_regs {
> >  	__u64 pc;
> > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> > index 3e9fc1d..8be3502 100644
> > --- a/arch/powerpc/kvm/booke.c
> > +++ b/arch/powerpc/kvm/booke.c
> > @@ -133,6 +133,29 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
> >  #endif
> >  }
> >
> > +static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu)
> > +{
> > +	/* Synchronize guest's desire to get debug interrupts into shadow MSR */
> > +#ifndef CONFIG_KVM_BOOKE_HV
> > +	vcpu->arch.shadow_msr &= ~MSR_DE;
> > +	vcpu->arch.shadow_msr |= vcpu->arch.shared->msr & MSR_DE;
> > +#endif
> > +
> > +	/* Force enable debug interrupts when user space wants to debug */
> > +	if (vcpu->guest_debug) {
> > +#ifdef CONFIG_KVM_BOOKE_HV
> > +		/*
> > +		 * Since there is no shadow MSR, sync MSR_DE into the guest
> > +		 * visible MSR.
> > +		 */
> > +		vcpu->arch.shared->msr |= MSR_DE;
> > +#else
> > +		vcpu->arch.shadow_msr |= MSR_DE;
> > +		vcpu->arch.shared->msr &= ~MSR_DE;
> > +#endif
> > +	}
> > +}
> > +
> >  /*
> >   * Helper function for full MSR writes.  No need to call this if only
> >   * EE/CE/ME/DE/RI are changing.
> > @@ -150,6 +173,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
> >  	kvmppc_mmu_msr_notify(vcpu, old_msr);
> >  	kvmppc_vcpu_sync_spe(vcpu);
> >  	kvmppc_vcpu_sync_fpu(vcpu);
> > +	kvmppc_vcpu_sync_debug(vcpu);
> >  }
> >
> >  static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
> > @@ -655,6 +679,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
> >  int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
> >  {
> >  	int ret, s;
> > +	struct thread_struct thread;
> >  #ifdef CONFIG_PPC_FPU
> >  	unsigned int fpscr;
> >  	int fpexc_mode;
> > @@ -698,12 +723,21 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
> >  	kvmppc_load_guest_fp(vcpu);
> >  #endif
> >
> > +	/* Switch to
Re: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support
On 24.06.2013, at 13:22, Bhushan Bharat-R65777 wrote:

>> -----Original Message-----
>> From: Alexander Graf [mailto:ag...@suse.de]
>> Sent: Monday, June 24, 2013 4:13 PM
>> To: Bhushan Bharat-R65777
>> Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Wood Scott-B07421; tiejun.c...@windriver.com; Bhushan Bharat-R65777
>> Subject: Re: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support
>>
>>
>> On 24.06.2013, at 11:08, Bharat Bhushan wrote:
>>
>>> This patch adds the debug stub support on booke/bookehv.
>>> Now the QEMU debug stub can use hardware breakpoints, watchpoints
>>> and software breakpoints to debug the guest.
>>>
>>> This is how we save/restore the debug register context when switching
>>> between guest, userspace and kernel user-process:
>>>
>>> When QEMU is running
>>>  - thread->debug_reg == QEMU debug register context.
>>>  - Kernel will handle switching the debug register on context switch.
>>>  - no vcpu_load() called
>>>
>>> QEMU makes ioctls (except RUN)
>>>  - This will call vcpu_load()
>>>  - should not change context.
>>>  - Some ioctls can change vcpu debug register, context saved in vcpu->debug_regs
>>>
>>> QEMU makes RUN ioctl
>>>  - Save thread->debug_reg on STACK
>>>  - Set thread->debug_reg = vcpu->debug_reg
>>>  - Load thread->debug_reg
>>>  - RUN VCPU (so thread points to vcpu context)
>>>
>>> Context switch happens when VCPU is running
>>>  - vcpu_load() should not load any context
>>>  - kernel loads the vcpu context as thread->debug_regs points to vcpu context.
>>>
>>> On heavyweight_exit
>>>  - Load the context saved on stack in thread->debug_reg
>>>
>>> Currently we do not support debug resource emulation for the guest.
>>> On a debug exception, always exit to user space, irrespective of
>>> whether user space is expecting the debug exception or not. If this
>>> is an unexpected exception (breakpoint/watchpoint event not set by
>>> userspace), then we leave the action to user space. This is similar
>>> to what it was before; the only difference is that user space now
>>> has a proper exit state available.
>>>
>>> Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
>>> ---
>>>  arch/powerpc/include/asm/kvm_host.h |    3 +
>>>  arch/powerpc/include/uapi/asm/kvm.h |    1 +
>>>  arch/powerpc/kvm/booke.c            |  233 ---
>>>  arch/powerpc/kvm/booke.h            |    5 +
>>>  4 files changed, 224 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>>> index 838a577..aeb490d 100644
>>> --- a/arch/powerpc/include/asm/kvm_host.h
>>> +++ b/arch/powerpc/include/asm/kvm_host.h
>>> @@ -524,7 +524,10 @@ struct kvm_vcpu_arch {
>>>  	u32 eptcfg;
>>>  	u32 epr;
>>>  	u32 crit_save;
>>> +	/* guest debug registers*/
>>>  	struct debug_reg dbg_reg;
>>> +	/* hardware visible debug registers when in guest state */
>>> +	struct debug_reg shadow_dbg_reg;
>>>  #endif
>>>  	gpa_t paddr_accessed;
>>>  	gva_t vaddr_accessed;
>>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
>>> index ded0607..f5077c2 100644
>>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>>> @@ -27,6 +27,7 @@
>>>  #define __KVM_HAVE_PPC_SMT
>>>  #define __KVM_HAVE_IRQCHIP
>>>  #define __KVM_HAVE_IRQ_LINE
>>> +#define __KVM_HAVE_GUEST_DEBUG
>>>
>>>  struct kvm_regs {
>>>  	__u64 pc;
>>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>>> index 3e9fc1d..8be3502 100644
>>> --- a/arch/powerpc/kvm/booke.c
>>> +++ b/arch/powerpc/kvm/booke.c
>>> @@ -133,6 +133,29 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
>>>  #endif
>>>  }
>>>
>>> +static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu)
>>> +{
>>> +	/* Synchronize guest's desire to get debug interrupts into shadow MSR */
>>> +#ifndef CONFIG_KVM_BOOKE_HV
>>> +	vcpu->arch.shadow_msr &= ~MSR_DE;
>>> +	vcpu->arch.shadow_msr |= vcpu->arch.shared->msr & MSR_DE;
>>> +#endif
>>> +
>>> +	/* Force enable debug interrupts when user space wants to debug */
>>> +	if (vcpu->guest_debug) {
>>> +#ifdef CONFIG_KVM_BOOKE_HV
>>> +		/*
>>> +		 * Since there is no shadow MSR, sync MSR_DE into the guest
>>> +		 * visible MSR.
>>> +		 */
>>> +		vcpu->arch.shared->msr |= MSR_DE;
>>> +#else
>>> +		vcpu->arch.shadow_msr |= MSR_DE;
>>> +		vcpu->arch.shared->msr &= ~MSR_DE;
>>> +#endif
>>> +	}
>>> +}
>>> +
>>>  /*
>>>   * Helper function for full MSR writes.  No need to call this if only
>>>   * EE/CE/ME/DE/RI are changing.
>>> @@ -150,6 +173,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
>>>  	kvmppc_mmu_msr_notify(vcpu, old_msr);
>>>  	kvmppc_vcpu_sync_spe(vcpu);
>>>  	kvmppc_vcpu_sync_fpu(vcpu);
>>> +	kvmppc_vcpu_sync_debug(vcpu);
>>>  }
>>>
>>>  static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
>>> @@ -655,6 +679,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
>>>  int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
>>>  {
>>>  	int ret, s;
>>> +	struct thread_struct thread;
>>>  #ifdef CONFIG_PPC_FPU
>>>  	unsigned int fpscr;
>>>  	int fpexc_mode;
>>> @@ -698,12 +723,21 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
>>>  	kvmppc_load_guest_fp(vcpu);
>>>  #endif
>>>
>>> +	/* Switch to guest debug context */
>>> +	thread.debug =