[PATCH kvm-unit-tests 1/3] x86-run: correct a typo 'qemsystem' -> 'qemusystem'

2013-06-24 Thread Ren, Yongjie
x86-run: correct a typo 'qemsystem' -> 'qemusystem'
Before this fix, you would always get the error below when running the
'x86-run' script:
QEMU binary has no support for test device. Exiting.

Signed-off-by: Yongjie Ren yongjie@intel.com
---
 x86-run |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86-run b/x86-run
index 2cf1f38..9526a0b 100755
--- a/x86-run
+++ b/x86-run
@@ -8,7 +8,7 @@ then
qemu=${qemukvm}
 else
if
-   ${qemsystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
+   ${qemusystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
then
qemu=${qemusystem}
else
--
1.7.9.5
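For reference, a hedged, self-contained sketch of the probe logic after the typo fix. A `fake_qemu` shell function stands in for a real QEMU binary (none is assumed installed), and `grep -F` is used in place of `fgrep`, as suggested later in this thread:

```shell
# Stand-in for a QEMU binary: print a device list on stderr, as QEMU does,
# so the 2>&1 redirection is actually exercised.
fake_qemu() { echo 'name "pc-testdev", bus ISA' >&2; }

probe_testdev() {
    # Merge stderr into stdout, then match fixed strings.
    "$1" -device '?' 2>&1 | grep -F -e testdev -e pc-testdev > /dev/null
}

if probe_testdev fake_qemu; then
    qemu=fake_qemu
else
    echo "QEMU binary has no support for test device. Exiting."
    exit 2
fi
echo "selected: $qemu"
```

With the typo'd variable name, the probe command expanded to an empty string and always failed, which is why the error message above was printed unconditionally.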

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests 3/3] x86-run: keep consistent coding style for the 'if' statement

2013-06-24 Thread Ren, Yongjie
x86-run: keep consistent coding style for the 'if' statement

Signed-off-by: Yongjie Ren yongjie@intel.com
---
 x86-run |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/x86-run b/x86-run
index daefd4a..6093a72 100755
--- a/x86-run
+++ b/x86-run
@@ -17,7 +17,8 @@ else
fi
 fi

-if ${qemu} -device '?' 2>&1 | fgrep pci-testdev > /dev/null;
+if
+   ${qemu} -device '?' 2>&1 | fgrep pci-testdev > /dev/null;
 then
pci_testdev="-device pci-testdev"
 else
--
1.7.9.5



[PATCH kvm-unit-tests 2/3] x86-run: use /bin/bash instead of /usr/bin/bash

2013-06-24 Thread Ren, Yongjie
'bash' should always be located at /bin/bash instead of /usr/bin/bash.
Other bash scripts in kvm-unit-tests also use '/bin/bash' as the interpreter.

Signed-off-by: Yongjie Ren yongjie@intel.com
---
 x86-run |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86-run b/x86-run
index 9526a0b..daefd4a 100755
--- a/x86-run
+++ b/x86-run
@@ -1,4 +1,4 @@
-#!/usr/bin/bash
+#!/bin/bash

 qemukvm=${QEMU:-qemu-kvm}
 qemusystem=${QEMU:-qemu-system-x86_64}
--
1.7.9.5



Re: [PATCH net] vhost-net: fix use-after-free in vhost_net_flush

2013-06-24 Thread David Miller
From: Michael S. Tsirkin m...@redhat.com
Date: Thu, 20 Jun 2013 14:48:13 +0300

 vhost_net_ubuf_put_and_wait has a confusing name:
 it will actually also free its argument.
 Thus since commit 1280c27f8e29acf4af2da914e80ec27c3dbd5c01

Never reference commits only by SHA1 ID, it is never sufficient.

Always provide, after the SHA1 ID, in parenthesis, the header line
from the commit message.

To be honest, I'm kind of tired of telling people they need to do
this over and over again.

Maybe people keep forgetting because the reason why this is an issue
hasn't really sunk in.

If the patch you reference got backported into another tree, it will
not have the same SHA1 ID, and therefore someone reading the fix won't
be able to find the fault causing change without going through a lot
of trouble.  By providing the commit header line you remove that
problem altogether, no ambiguity is possible.
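As a concrete illustration (mine, not part of the original mail), the requested reference format can be generated directly by git; the throwaway repository and commit below exist only for the demonstration:

```shell
# Build a throwaway repository so the example is self-contained.
dir=$(mktemp -d) && cd "$dir"
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "vhost-net: fix use-after-free in vhost_net_flush"

# Print the reference as: SHA1 followed by the commit header line in parentheses.
git show -s --format='%h ("%s")' HEAD
```

This prints something like `abc1234 ("vhost-net: fix use-after-free in vhost_net_flush")`; the subject line stays meaningful even in trees where the SHA1 differs.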


Re: [PATCH kvm-unit-tests 1/3] x86-run: correct a typo 'qemsystem' -> 'qemusystem'

2013-06-24 Thread Gleb Natapov
On Mon, Jun 24, 2013 at 06:10:59AM +, Ren, Yongjie wrote:
 x86-run: correct a typo 'qemsystem' -> 'qemusystem'
 Before this fix, you would always get the error below when running the
 'x86-run' script:
 QEMU binary has no support for test device. Exiting.
 
Patch is whitespace damaged.

 Signed-off-by: Yongjie Ren yongjie@intel.com
 ---
  x86-run |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/x86-run b/x86-run
 index 2cf1f38..9526a0b 100755
 --- a/x86-run
 +++ b/x86-run
 @@ -8,7 +8,7 @@ then
 qemu=${qemukvm}
  else
 if
 -   ${qemsystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
 +   ${qemusystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
While you are at it, let's replace the fgrep invocation, which is deprecated,
with grep -F.

 then
 qemu=${qemusystem}
 else
 --
 1.7.9.5

--
Gleb.


Re: [PATCH 2/2] armv7 initial device passthrough support

2013-06-24 Thread Mario Smarduch


On 6/15/2013 5:47 PM, Paolo Bonzini wrote:
 On 13/06/2013 11:19, Mario Smarduch wrote:
 Updated Device Passthrough Patch.
 - optimized IRQ-CPU-vCPU binding, irq is installed once
 - added dynamic IRQ affinity on schedule in
 - added documentation and few other coding recommendations.

 Per earlier discussion VFIO is our target, but we'd like
 something earlier to work with to tackle performance and
 latency issues (some ARM related) for device passthrough
 while we migrate towards VFIO.
 
 I don't think this is acceptable upstream, unfortunately.  KVM device
 assignment is deprecated and we should not add more users.
That's fine, we'll work our way towards dev-tree VFIO, reusing what we can
and working with the community.

At this point we're more concerned with numbers and best practices as
opposed to mechanism; this part will be time consuming.
VFIO will be more background work for us.

 
 What are the latency issues you have?

Our focus now is on IRQ latency and throughput. Right now it appears the
lowest latency is 2x + exit/enter + IRQ injection overhead. We can't
tolerate additional IPIs or deferred IRQ injection approaches. We're
looking for numbers closer to what IBM's ELI managed. Also high-res
timers, which ARM Virt. Ext. supports very well, and exitless interrupts,
which ARM handles very well too. There are some future ARM hardware
interrupt enhancements coming up which may help a lot as well.

There are many other latency/perf requirements for NFV related to RT;
essentially the guest must run near native. In the end it may turn out
this needs to live outside of the main tree; we'll see.

- Mario
 
 Paolo
 
 - Mario




RE: [PATCH kvm-unit-tests 1/3] x86-run: correct a typo 'qemsystem' -> 'qemusystem'

2013-06-24 Thread Ren, Yongjie
 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org]
 On Behalf Of Gleb Natapov
 Sent: Monday, June 24, 2013 4:03 PM
 To: Ren, Yongjie
 Cc: kvm@vger.kernel.org; pbonz...@redhat.com
 Subject: Re: [PATCH kvm-unit-tests 1/3] x86-run: correct a typo
 'qemsystem' -> 'qemusystem'
 
 On Mon, Jun 24, 2013 at 06:10:59AM +, Ren, Yongjie wrote:
  x86-run: correct a typo 'qemsystem' -> 'qemusystem'
  Before this fix, you would always get the error below when running the
  'x86-run' script:
  QEMU binary has no support for test device. Exiting.
 
 Patch is whitespace damaged.
 
Sorry, I'll correct it and resend my patches.

  Signed-off-by: Yongjie Ren yongjie@intel.com
  ---
   x86-run |2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
 
  diff --git a/x86-run b/x86-run
  index 2cf1f38..9526a0b 100755
  --- a/x86-run
  +++ b/x86-run
  @@ -8,7 +8,7 @@ then
  qemu=${qemukvm}
   else
  if
  -   ${qemsystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
  +   ${qemusystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
 While you are at it, let's replace the fgrep invocation, which is deprecated,
 with grep -F.
 
Yeah, I also found this issue and wanted to use 'grep -F' instead.
I'll send another patch replacing 'fgrep' with 'grep -F'.

  then
  qemu=${qemusystem}
  else
  --
  1.7.9.5
 
 --
   Gleb.


[PATCH kvm-unit-tests 1/4] x86-run: correct a typo 'qemsystem' -> 'qemusystem'

2013-06-24 Thread Ren, Yongjie
x86-run: correct a typo 'qemsystem' -> 'qemusystem'
Before this fix, you would always get the error below when running the
'x86-run' script:
QEMU binary has no support for test device. Exiting.

Signed-off-by: Yongjie Ren yongjie@intel.com
---
 x86-run |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86-run b/x86-run
index 2cf1f38..9526a0b 100755
--- a/x86-run
+++ b/x86-run
@@ -8,7 +8,7 @@ then
qemu=${qemukvm}
 else
if
-   ${qemsystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
+   ${qemusystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
then
qemu=${qemusystem}
else
-- 
1.7.9.5



[PATCH kvm-unit-tests 2/4] x86-run: use /bin/bash instead of /usr/bin/bash

2013-06-24 Thread Ren, Yongjie
'bash' should always be located at /bin/bash instead of /usr/bin/bash.
Other bash scripts in kvm-unit-tests also use '/bin/bash' as the interpreter.

Signed-off-by: Yongjie Ren yongjie@intel.com
---
 x86-run |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/x86-run b/x86-run
index 9526a0b..daefd4a 100755
--- a/x86-run
+++ b/x86-run
@@ -1,4 +1,4 @@
-#!/usr/bin/bash
+#!/bin/bash
 
 qemukvm=${QEMU:-qemu-kvm}
 qemusystem=${QEMU:-qemu-system-x86_64}
-- 
1.7.9.5



[PATCH kvm-unit-tests 3/4] x86-run: keep consistent coding style for the 'if' statement

2013-06-24 Thread Ren, Yongjie
x86-run: keep consistent coding style for the 'if' statement

Signed-off-by: Yongjie Ren yongjie@intel.com
---
 x86-run |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/x86-run b/x86-run
index daefd4a..6093a72 100755
--- a/x86-run
+++ b/x86-run
@@ -17,7 +17,8 @@ else
fi
 fi
 
-if ${qemu} -device '?' 2>&1 | fgrep pci-testdev > /dev/null;
+if
+   ${qemu} -device '?' 2>&1 | fgrep pci-testdev > /dev/null;
 then
pci_testdev="-device pci-testdev"
 else
-- 
1.7.9.5



[PATCH kvm-unit-tests 4/4] x86-run: replace the deprecated 'fgrep' with 'grep -F'

2013-06-24 Thread Ren, Yongjie
x86-run: replace the deprecated 'fgrep' with 'grep -F'.

Signed-off-by: Yongjie Ren yongjie@intel.com
---
 x86-run |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/x86-run b/x86-run
index 6093a72..14ff331 100755
--- a/x86-run
+++ b/x86-run
@@ -3,12 +3,12 @@
 qemukvm=${QEMU:-qemu-kvm}
 qemusystem=${QEMU:-qemu-system-x86_64}
 if
-   ${qemukvm} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
+   ${qemukvm} -device '?' 2>&1 | grep -F -e \"testdev\" -e \"pc-testdev\" > /dev/null;
 then
qemu=${qemukvm}
 else
if
-   ${qemusystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
+   ${qemusystem} -device '?' 2>&1 | grep -F -e \"testdev\" -e \"pc-testdev\" > /dev/null;
then
qemu=${qemusystem}
else
@@ -18,7 +18,7 @@ else
 fi
 
 if
-   ${qemu} -device '?' 2>&1 | fgrep pci-testdev > /dev/null;
+   ${qemu} -device '?' 2>&1 | grep -F pci-testdev > /dev/null;
 then
pci_testdev="-device pci-testdev"
 else
@@ -26,7 +26,7 @@ else
 fi
 
 if
-   ${qemu} -device '?' 2>&1 | fgrep pc-testdev > /dev/null;
+   ${qemu} -device '?' 2>&1 | grep -F pc-testdev > /dev/null;
 then
pc_testdev="-device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4"
 else
-- 
1.7.9.5
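The swap above is behavior-preserving: `grep -F` does exactly the fixed-string matching `fgrep` did, without interpreting the pattern as a regular expression. A small sketch (the device list is fabricated for illustration):

```shell
# Fabricated device list standing in for QEMU's "-device '?'" output.
printf 'name "pc-testdev", bus ISA\nname "virtio-net-pci", bus PCI\n' > devices.txt

# Fixed-string match: the pattern is taken literally, no regex metacharacters.
matches=$(grep -F -c pc-testdev devices.txt)
echo "pc-testdev lines: $matches"
```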



[PATCH 1/6 v5] powerpc: remove unnecessary line continuations

2013-06-24 Thread Bharat Bhushan
Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
v5:
 - no change

 arch/powerpc/kernel/process.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ceb4e7b..639a8de 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -325,7 +325,7 @@ static void set_debug_reg_defaults(struct thread_struct 
*thread)
/*
 * Force User/Supervisor bits to b11 (user-only MSR[PR]=1)
 */
-   thread->dbcr1 = DBCR1_IAC1US | DBCR1_IAC2US |   \
+   thread->dbcr1 = DBCR1_IAC1US | DBCR1_IAC2US |
DBCR1_IAC3US | DBCR1_IAC4US;
/*
 * Force Data Address Compare User/Supervisor bits to be User-only
-- 
1.7.0.4




[PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv instruction

2013-06-24 Thread Bharat Bhushan
The ehpriv instruction is used by user space for setting software
breakpoints. This patch adds support to exit to user space with
run->debug holding the relevant information.

As this is the first place we use run->debug, the run->debug
structure is also defined.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/include/asm/disassemble.h |4 
 arch/powerpc/include/uapi/asm/kvm.h|   21 +
 arch/powerpc/kvm/e500_emulate.c|   27 +++
 3 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/disassemble.h 
b/arch/powerpc/include/asm/disassemble.h
index 9b198d1..856f8de 100644
--- a/arch/powerpc/include/asm/disassemble.h
+++ b/arch/powerpc/include/asm/disassemble.h
@@ -77,4 +77,8 @@ static inline unsigned int get_d(u32 inst)
 return inst & 0xffff;
 }
 
+static inline unsigned int get_oc(u32 inst)
+{
+   return (inst >> 11) & 0x7fff;
+}
 #endif /* __ASM_PPC_DISASSEMBLE_H__ */
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index 0fb1a6e..ded0607 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -269,7 +269,24 @@ struct kvm_fpu {
__u64 fpr[32];
 };
 
+/*
+ * Defines for h/w breakpoint, watchpoint (read, write or both) and
+ * software breakpoint.
+ * These are used as type in KVM_SET_GUEST_DEBUG ioctl and status
+ * for KVM_DEBUG_EXIT.
+ */
+#define KVMPPC_DEBUG_NONE  0x0
+#define KVMPPC_DEBUG_BREAKPOINT (1UL << 1)
+#define KVMPPC_DEBUG_WATCH_WRITE   (1UL << 2)
+#define KVMPPC_DEBUG_WATCH_READ (1UL << 3)
 struct kvm_debug_exit_arch {
+   __u64 address;
+   /*
+* exiting to userspace because of h/w breakpoint, watchpoint
+* (read, write or both) and software breakpoint.
+*/
+   __u32 status;
+   __u32 reserved;
 };
 
 /* for KVM_SET_GUEST_DEBUG */
@@ -281,10 +298,6 @@ struct kvm_guest_debug_arch {
 * Type denotes h/w breakpoint, read watchpoint, write
 * watchpoint or watchpoint (both read and write).
 */
-#define KVMPPC_DEBUG_NONE  0x0
-#define KVMPPC_DEBUG_BREAKPOINT (1UL << 1)
-#define KVMPPC_DEBUG_WATCH_WRITE   (1UL << 2)
-#define KVMPPC_DEBUG_WATCH_READ (1UL << 3)
__u32 type;
__u32 reserved;
} bp[16];
diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c
index b10a012..dab9d07 100644
--- a/arch/powerpc/kvm/e500_emulate.c
+++ b/arch/powerpc/kvm/e500_emulate.c
@@ -26,6 +26,8 @@
 #define XOP_TLBRE   946
 #define XOP_TLBWE   978
 #define XOP_TLBILX  18
+#define XOP_EHPRIV  270
+#define EHPRIV_OC_DEBUG 0
 
 #ifdef CONFIG_KVM_E500MC
 static int dbell2prio(ulong param)
@@ -82,6 +84,26 @@ static int kvmppc_e500_emul_msgsnd(struct kvm_vcpu *vcpu, 
int rb)
 }
 #endif
 
+static int kvmppc_e500_emul_ehpriv(struct kvm_run *run, struct kvm_vcpu *vcpu,
+  unsigned int inst, int *advance)
+{
+   int emulated = EMULATE_DONE;
+
+   switch (get_oc(inst)) {
+   case EHPRIV_OC_DEBUG:
+   run->exit_reason = KVM_EXIT_DEBUG;
+   run->debug.arch.address = vcpu->arch.pc;
+   run->debug.arch.status = 0;
+   kvmppc_account_exit(vcpu, DEBUG_EXITS);
+   emulated = EMULATE_EXIT_USER;
+   *advance = 0;
+   break;
+   default:
+   emulated = EMULATE_FAIL;
+   }
+   return emulated;
+}
+
 int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
unsigned int inst, int *advance)
 {
@@ -130,6 +152,11 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
emulated = kvmppc_e500_emul_tlbivax(vcpu, ea);
break;
 
+   case XOP_EHPRIV:
+   emulated = kvmppc_e500_emul_ehpriv(run, vcpu, inst,
+  advance);
+   break;
+
default:
emulated = EMULATE_FAIL;
}
-- 
1.7.0.4
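The bit extraction done by the get_oc()/get_d() helpers above can be sanity-checked with plain shell arithmetic; the instruction word below is an arbitrary illustrative value, not a real ehpriv encoding:

```shell
# Arbitrary 32-bit instruction word built so both fields are easy to predict.
inst=$(( (5 << 11) | 42 ))

oc=$(( (inst >> 11) & 0x7fff ))   # mirrors get_oc(): shift by 11, 15-bit mask
d=$(( inst & 0xffff ))            # mirrors get_d(): low 16 bits
echo "oc=$oc d=$d"
```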




[PATCH 0/6 v5] KVM :PPC: Userspace Debug support

2013-06-24 Thread Bharat Bhushan
From: Bharat Bhushan bharat.bhus...@freescale.com

This patchset adds the userspace debug support for booke/bookehv.
this is tested on powerpc e500v2/e500mc devices.

We now assume that the debug resources will not be used by the kernel for
its own debugging; they will be used only for debugging kernel user
processes. So the kernel debug-load interface during context switch is
used to load the debug context for the selected process.

v4->v5
 - Some comments reworded and other cleanup (like change of function name etc)

v3->v4
 - 4 out of 7 patches of the initial patchset were applied.
   This patchset is on top of those 4 patches.
 - KVM local struct kvmppc_booke_debug_reg is replaced by
   powerpc global struct debug_reg
 - use switch_booke_debug_regs() for debug register context switch.
 - Save DBSR before kernel pre-emption is enabled.
 - Some more cleanup

v2->v3
 - We now assume that the debug resources will not be used by
   the kernel for its own debugging.
   They will be used only for debugging kernel user processes.
   So the kernel debug-load interface during context switch is
   used to load the debug context for the selected process.

v1->v2
 - Debug registers are saved/restored in vcpu_put/vcpu_get.
   Earlier the debug registers were saved/restored in guest entry/exit.

Bharat Bhushan (6):
  powerpc: remove unnecessary line continuations
  powerpc: move debug registers in a structure
  powerpc: export debug register save function for KVM
  KVM: PPC: exit to user space on ehpriv instruction
  KVM: PPC: Using struct debug_reg
  KVM: PPC: Add userspace debug stub support

 arch/powerpc/include/asm/disassemble.h |4 +
 arch/powerpc/include/asm/kvm_host.h|   16 +--
 arch/powerpc/include/asm/processor.h   |   38 +++--
 arch/powerpc/include/asm/reg_booke.h   |8 +-
 arch/powerpc/include/asm/switch_to.h   |4 +
 arch/powerpc/include/uapi/asm/kvm.h|   22 ++-
 arch/powerpc/kernel/asm-offsets.c  |2 +-
 arch/powerpc/kernel/process.c  |   45 +++---
 arch/powerpc/kernel/ptrace.c   |  154 +-
 arch/powerpc/kernel/signal_32.c|6 +-
 arch/powerpc/kernel/traps.c|   35 ++--
 arch/powerpc/kvm/booke.c   |  267 
 arch/powerpc/kvm/booke.h   |5 +
 arch/powerpc/kvm/e500_emulate.c|   27 
 14 files changed, 449 insertions(+), 184 deletions(-)




[PATCH 5/6 v5] KVM: PPC: Using struct debug_reg

2013-06-24 Thread Bharat Bhushan
For KVM, also use the struct debug_reg defined in asm/processor.h.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/include/asm/kvm_host.h |   13 +
 arch/powerpc/kvm/booke.c|   34 --
 2 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index af326cd..838a577 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -381,17 +381,6 @@ struct kvmppc_slb {
 #define KVMPPC_EPR_USER1 /* exit to userspace to fill EPR */
 #define KVMPPC_EPR_KERNEL  2 /* in-kernel irqchip */
 
-struct kvmppc_booke_debug_reg {
-   u32 dbcr0;
-   u32 dbcr1;
-   u32 dbcr2;
-#ifdef CONFIG_KVM_E500MC
-   u32 dbcr4;
-#endif
-   u64 iac[KVMPPC_BOOKE_MAX_IAC];
-   u64 dac[KVMPPC_BOOKE_MAX_DAC];
-};
-
 #define KVMPPC_IRQ_DEFAULT 0
 #define KVMPPC_IRQ_MPIC1
 #define KVMPPC_IRQ_XICS2
@@ -535,7 +524,7 @@ struct kvm_vcpu_arch {
u32 eptcfg;
u32 epr;
u32 crit_save;
-   struct kvmppc_booke_debug_reg dbg_reg;
+   struct debug_reg dbg_reg;
 #endif
gpa_t paddr_accessed;
gva_t vaddr_accessed;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 62d4ece..3e9fc1d 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1424,7 +1424,6 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, 
struct kvm_one_reg *reg)
int r = 0;
union kvmppc_one_reg val;
int size;
-   long int i;
 
size = one_reg_size(reg->id);
if (size > sizeof(val))
@@ -1432,16 +1431,24 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, 
struct kvm_one_reg *reg)
 
switch (reg->id) {
case KVM_REG_PPC_IAC1:
+   val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac1);
+   break;
case KVM_REG_PPC_IAC2:
+   val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac2);
+   break;
+#if CONFIG_PPC_ADV_DEBUG_IACS > 2
case KVM_REG_PPC_IAC3:
+   val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac3);
+   break;
case KVM_REG_PPC_IAC4:
-   i = reg->id - KVM_REG_PPC_IAC1;
-   val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac[i]);
+   val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac4);
break;
+#endif
case KVM_REG_PPC_DAC1:
+   val = get_reg_val(reg->id, vcpu->arch.dbg_reg.dac1);
+   break;
case KVM_REG_PPC_DAC2:
-   i = reg->id - KVM_REG_PPC_DAC1;
-   val = get_reg_val(reg->id, vcpu->arch.dbg_reg.dac[i]);
+   val = get_reg_val(reg->id, vcpu->arch.dbg_reg.dac2);
break;
case KVM_REG_PPC_EPR: {
u32 epr = get_guest_epr(vcpu);
@@ -1481,7 +1488,6 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, 
struct kvm_one_reg *reg)
int r = 0;
union kvmppc_one_reg val;
int size;
-   long int i;
 
size = one_reg_size(reg->id);
if (size > sizeof(val))
@@ -1492,16 +1498,24 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, 
struct kvm_one_reg *reg)
 
switch (reg->id) {
case KVM_REG_PPC_IAC1:
+   vcpu->arch.dbg_reg.iac1 = set_reg_val(reg->id, val);
+   break;
case KVM_REG_PPC_IAC2:
+   vcpu->arch.dbg_reg.iac2 = set_reg_val(reg->id, val);
+   break;
+#if CONFIG_PPC_ADV_DEBUG_IACS > 2
case KVM_REG_PPC_IAC3:
+   vcpu->arch.dbg_reg.iac3 = set_reg_val(reg->id, val);
+   break;
case KVM_REG_PPC_IAC4:
-   i = reg->id - KVM_REG_PPC_IAC1;
-   vcpu->arch.dbg_reg.iac[i] = set_reg_val(reg->id, val);
+   vcpu->arch.dbg_reg.iac4 = set_reg_val(reg->id, val);
break;
+#endif
case KVM_REG_PPC_DAC1:
+   vcpu->arch.dbg_reg.dac1 = set_reg_val(reg->id, val);
+   break;
case KVM_REG_PPC_DAC2:
-   i = reg->id - KVM_REG_PPC_DAC1;
-   vcpu->arch.dbg_reg.dac[i] = set_reg_val(reg->id, val);
+   vcpu->arch.dbg_reg.dac2 = set_reg_val(reg->id, val);
break;
case KVM_REG_PPC_EPR: {
u32 new_epr = set_reg_val(reg-id, val);
-- 
1.7.0.4




[PATCH 2/6 v5] powerpc: move debug registers in a structure

2013-06-24 Thread Bharat Bhushan
This way we can use the same struct data type with KVM, and it also
helps in using other debug-related functions.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/include/asm/processor.h |   38 +
 arch/powerpc/include/asm/reg_booke.h |8 +-
 arch/powerpc/kernel/asm-offsets.c|2 +-
 arch/powerpc/kernel/process.c|   42 +-
 arch/powerpc/kernel/ptrace.c |  154 +-
 arch/powerpc/kernel/signal_32.c  |6 +-
 arch/powerpc/kernel/traps.c  |   35 
 7 files changed, 146 insertions(+), 139 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index d7e67ca..5b8a7f1 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -147,22 +147,7 @@ typedef struct {
 #define TS_FPR(i) fpr[i][TS_FPROFFSET]
 #define TS_TRANS_FPR(i) transact_fpr[i][TS_FPROFFSET]
 
-struct thread_struct {
-   unsigned long   ksp;/* Kernel stack pointer */
-   unsigned long   ksp_limit;  /* if ksp <= ksp_limit stack overflow */
-
-#ifdef CONFIG_PPC64
-   unsigned long   ksp_vsid;
-#endif
-   struct pt_regs  *regs;  /* Pointer to saved register state */
-   mm_segment_t    fs; /* for get_fs() validation */
-#ifdef CONFIG_BOOKE
-   /* BookE base exception scratch space; align on cacheline */
-   unsigned long   normsave[8] cacheline_aligned;
-#endif
-#ifdef CONFIG_PPC32
-   void*pgdir; /* root of page-table tree */
-#endif
+struct debug_reg {
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
/*
 * The following help to manage the use of Debug Control Registers
@@ -199,6 +184,27 @@ struct thread_struct {
unsigned long   dvc2;
 #endif
 #endif
+};
+
+struct thread_struct {
+   unsigned long   ksp;/* Kernel stack pointer */
+   unsigned long   ksp_limit;  /* if ksp <= ksp_limit stack overflow */
+
+#ifdef CONFIG_PPC64
+   unsigned long   ksp_vsid;
+#endif
+   struct pt_regs  *regs;  /* Pointer to saved register state */
+   mm_segment_t    fs; /* for get_fs() validation */
+#ifdef CONFIG_BOOKE
+   /* BookE base exception scratch space; align on cacheline */
+   unsigned long   normsave[8] cacheline_aligned;
+#endif
+#ifdef CONFIG_PPC32
+   void*pgdir; /* root of page-table tree */
+#endif
+   /* Debug Registers */
+   struct debug_reg debug;
+
/* FP and VSX 0-31 register set */
double  fpr[32][TS_FPRWIDTH];
struct {
diff --git a/arch/powerpc/include/asm/reg_booke.h 
b/arch/powerpc/include/asm/reg_booke.h
index b417de3..455dc89 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -381,7 +381,7 @@
 #define DBCR0_IA34T0x4000  /* Instr Addr 3-4 range Toggle */
 #define DBCR0_FT   0x0001  /* Freeze Timers on debug event */
 
-#define dbcr_iac_range(task)   ((task)->thread.dbcr0)
+#define dbcr_iac_range(task)   ((task)->thread.debug.dbcr0)
 #define DBCR_IAC12IDBCR0_IA12  /* Range Inclusive */
 #define DBCR_IAC12X(DBCR0_IA12 | DBCR0_IA12X)  /* Range Exclusive */
 #define DBCR_IAC12MODE (DBCR0_IA12 | DBCR0_IA12X)  /* IAC 1-2 Mode Bits */
@@ -395,7 +395,7 @@
 #define DBCR1_DAC1W0x2000  /* DAC1 Write Debug Event */
 #define DBCR1_DAC2W0x1000  /* DAC2 Write Debug Event */
 
-#define dbcr_dac(task) ((task)->thread.dbcr1)
+#define dbcr_dac(task) ((task)->thread.debug.dbcr1)
 #define DBCR_DAC1R DBCR1_DAC1R
 #define DBCR_DAC1W DBCR1_DAC1W
 #define DBCR_DAC2R DBCR1_DAC2R
@@ -441,7 +441,7 @@
 #define DBCR0_CRET 0x0020  /* Critical Return Debug Event */
 #define DBCR0_FT   0x0001  /* Freeze Timers on debug event */
 
-#define dbcr_dac(task) ((task)->thread.dbcr0)
+#define dbcr_dac(task) ((task)->thread.debug.dbcr0)
 #define DBCR_DAC1R DBCR0_DAC1R
 #define DBCR_DAC1W DBCR0_DAC1W
 #define DBCR_DAC2R DBCR0_DAC2R
@@ -475,7 +475,7 @@
 #define DBCR1_IAC34MX  0x00C0  /* Instr Addr 3-4 range eXclusive */
 #define DBCR1_IAC34AT  0x0001  /* Instr Addr 3-4 range Toggle */
 
-#define dbcr_iac_range(task)   ((task)->thread.dbcr1)
+#define dbcr_iac_range(task)   ((task)->thread.debug.dbcr1)
 #define DBCR_IAC12IDBCR1_IAC12M/* Range Inclusive */
 #define DBCR_IAC12XDBCR1_IAC12MX   /* Range Exclusive */
 #define DBCR_IAC12MODE DBCR1_IAC12MX   /* IAC 1-2 Mode Bits */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index b51a97c..c241c60 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -106,7 +106,7 @@ int main(void)
 #else /* CONFIG_PPC64 */
DEFINE(PGDIR, offsetof(struct thread_struct, pgdir));
 #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
-   DEFINE(THREAD_DBCR0, offsetof(struct 

[PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support

2013-06-24 Thread Bharat Bhushan
This patch adds the debug stub support on booke/bookehv.
Now QEMU debug stub can use hw breakpoint, watchpoint and
software breakpoint to debug guest.

This is how we save/restore debug register context when switching
between guest, userspace and kernel user-process:

When QEMU is running
 - thread->debug_reg == QEMU debug register context.
 - Kernel will handle switching the debug registers on context switch.
 - no vcpu_load() called

QEMU makes ioctls (except RUN)
 - This will call vcpu_load()
 - should not change context.
 - Some ioctls can change vcpu debug registers; context saved in
   vcpu->debug_regs

QEMU makes the RUN ioctl
 - Save thread->debug_reg on STACK
 - Store thread->debug_reg = vcpu->debug_reg
 - load thread->debug_reg
 - RUN VCPU (so thread points to the vcpu context)

Context switch happens when VCPU is running
 - vcpu_load() should not load any context
 - kernel loads the vcpu context as thread->debug_regs points to the
   vcpu context.

On heavyweight_exit
 - Load the context saved on stack into thread->debug_reg

Currently we do not support debug resource emulation for the guest.
On a debug exception we always exit to user space, irrespective of
whether user space is expecting the debug exception or not. If this is
an unexpected exception (a breakpoint/watchpoint event not set by
userspace) then we leave the action to user space. This is similar to
what it was before; the only difference is that now we have the proper
exit state available to user space.
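The RUN-ioctl shuffle described above can be sketched with plain shell variables standing in for the thread and vcpu debug contexts (illustrative only; the real code copies struct debug_reg contents):

```shell
thread_debug="qemu-context"    # current->thread.debug: QEMU's registers
vcpu_debug="guest-context"     # vcpu->arch.shadow_dbg_reg

# RUN ioctl: save the thread context "on the stack", then point the
# thread at the vcpu context so context switches preserve guest state.
stack_save=$thread_debug
thread_debug=$vcpu_debug
echo "guest runs with: $thread_debug"

# Heavyweight exit: restore the context saved on the stack.
thread_debug=$stack_save
echo "back in userspace with: $thread_debug"
```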

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/include/asm/kvm_host.h |3 +
 arch/powerpc/include/uapi/asm/kvm.h |1 +
 arch/powerpc/kvm/booke.c|  233 ---
 arch/powerpc/kvm/booke.h|5 +
 4 files changed, 224 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 838a577..aeb490d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -524,7 +524,10 @@ struct kvm_vcpu_arch {
u32 eptcfg;
u32 epr;
u32 crit_save;
+   /* guest debug registers */
struct debug_reg dbg_reg;
+   /* hardware visible debug registers when in guest state */
+   struct debug_reg shadow_dbg_reg;
 #endif
gpa_t paddr_accessed;
gva_t vaddr_accessed;
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index ded0607..f5077c2 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -27,6 +27,7 @@
 #define __KVM_HAVE_PPC_SMT
 #define __KVM_HAVE_IRQCHIP
 #define __KVM_HAVE_IRQ_LINE
+#define __KVM_HAVE_GUEST_DEBUG
 
 struct kvm_regs {
__u64 pc;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 3e9fc1d..8be3502 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -133,6 +133,29 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
 #endif
 }
 
+static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu)
+{
+   /* Synchronize guest's desire to get debug interrupts into shadow MSR */
+#ifndef CONFIG_KVM_BOOKE_HV
+   vcpu->arch.shadow_msr &= ~MSR_DE;
+   vcpu->arch.shadow_msr |= vcpu->arch.shared->msr & MSR_DE;
+#endif
+
+   /* Force enable debug interrupts when user space wants to debug */
+   if (vcpu->guest_debug) {
+#ifdef CONFIG_KVM_BOOKE_HV
+   /*
+* Since there is no shadow MSR, sync MSR_DE into the guest
+* visible MSR.
+*/
+   vcpu->arch.shared->msr |= MSR_DE;
+#else
+   vcpu->arch.shadow_msr |= MSR_DE;
+   vcpu->arch.shared->msr &= ~MSR_DE;
+#endif
+   }
+}
+
 /*
  * Helper function for full MSR writes.  No need to call this if only
  * EE/CE/ME/DE/RI are changing.
@@ -150,6 +173,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
kvmppc_mmu_msr_notify(vcpu, old_msr);
kvmppc_vcpu_sync_spe(vcpu);
kvmppc_vcpu_sync_fpu(vcpu);
+   kvmppc_vcpu_sync_debug(vcpu);
 }
 
 static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
@@ -655,6 +679,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
 int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
int ret, s;
+   struct thread_struct thread;
 #ifdef CONFIG_PPC_FPU
unsigned int fpscr;
int fpexc_mode;
@@ -698,12 +723,21 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
 
kvmppc_load_guest_fp(vcpu);
 #endif
+   /* Switch to guest debug context */
+	thread.debug = vcpu->arch.shadow_dbg_reg;
+	switch_booke_debug_regs(&thread);
+	thread.debug = current->thread.debug;
+	current->thread.debug = vcpu->arch.shadow_dbg_reg;
 
ret = __kvmppc_vcpu_run(kvm_run, vcpu);
 
/* No need for kvm_guest_exit. It's done in handle_exit.
   We also get here with interrupts enabled. */
 
+   /* Switch back to user space debug context */
+   

[PATCH 3/6 v5] powerpc: export debug register save function for KVM

2013-06-24 Thread Bharat Bhushan
KVM needs this function when switching from vcpu to user-space
thread. My subsequent patch will use this function.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/include/asm/switch_to.h |4 
 arch/powerpc/kernel/process.c|3 ++-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 200d763..50b357f 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -30,6 +30,10 @@ extern void enable_kernel_spe(void);
 extern void giveup_spe(struct task_struct *);
 extern void load_up_spe(struct task_struct *);
 
+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+extern void switch_booke_debug_regs(struct thread_struct *new_thread);
+#endif
+
 #ifndef CONFIG_SMP
 extern void discard_lazy_cpu_state(void);
 #else
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 01ff496..3375cb7 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -362,12 +362,13 @@ static void prime_debug_regs(struct thread_struct *thread)
  * debug registers, set the debug registers from the values
  * stored in the new thread.
  */
-static void switch_booke_debug_regs(struct thread_struct *new_thread)
+void switch_booke_debug_regs(struct thread_struct *new_thread)
 {
	if ((current->thread.debug.dbcr0 & DBCR0_IDM)
		|| (new_thread->debug.dbcr0 & DBCR0_IDM))
prime_debug_regs(new_thread);
 }
+EXPORT_SYMBOL(switch_booke_debug_regs);
 #else  /* !CONFIG_PPC_ADV_DEBUG_REGS */
 #ifndef CONFIG_HAVE_HW_BREAKPOINT
 static void set_debug_reg_defaults(struct thread_struct *thread)
-- 
1.7.0.4


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: fix boot of guests with more than 4gb of ram

2013-06-24 Thread Will Deacon
On Mon, Jun 24, 2013 at 02:23:31AM +0100, Sasha Levin wrote:
 Commit kvm tools: virtio: remove hardcoded assumptions
 about guest page size has introduced a bug that prevented
 guests with more than 4gb of ram from booting.

4GB of memory?!?! ;)

 The issue is that 'pfn' is a 32bit integer, so when multiplying
 it by page size to get the actual page will cause an overflow if
 the pfn referred to a memory area above 4gb.
 
 Signed-off-by: Sasha Levin sasha.le...@oracle.com

Acked-by: Will Deacon will.dea...@arm.com

Will


Re: Would a DOS on dovecot running under a VM cause host to crash?

2013-06-24 Thread Stefan Hajnoczi
On Fri, Jun 21, 2013 at 10:27:07AM +1200, Hugh Davenport wrote:
 The attack lasted around 4 minutes, in which there was 1161 lines
 in the log for a
 single attacker ip, and no other similar logs previously.
 
 Would this be enough to kill not only the VM running dovecot, but
 the underlying host
 machine?

Have you checked logs on the host?  Specifically /var/log/messages for
seg fault messages or Out-of-Memory Killer messages.

It's also worth checking /var/log/libvirt/qemu/domain.log if you are
using libvirt.  That file contains the QEMU stderr output.
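A quick sketch of those checks (the log paths and patterns here are common defaults and distro-dependent, not something specific to this report):

```shell
# Patterns worth scanning for on the host
pattern='segfault|out of memory|oom-?killer'
grep -iE "$pattern" /var/log/messages /var/log/syslog 2>/dev/null || true

# With libvirt, per-domain QEMU stderr lands under this directory
ls /var/log/libvirt/qemu/ 2>/dev/null || true
```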

Stefan


Re: [PATCH 3/6 v5] powerpc: export debug register save function for KVM

2013-06-24 Thread Alexander Graf

On 24.06.2013, at 11:08, Bharat Bhushan wrote:

 KVM need this function when switching from vcpu to user-space
 thread. My subsequent patch will use this function.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
 arch/powerpc/include/asm/switch_to.h |4 
 arch/powerpc/kernel/process.c|3 ++-
 2 files changed, 6 insertions(+), 1 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/switch_to.h 
 b/arch/powerpc/include/asm/switch_to.h
 index 200d763..50b357f 100644
 --- a/arch/powerpc/include/asm/switch_to.h
 +++ b/arch/powerpc/include/asm/switch_to.h
 @@ -30,6 +30,10 @@ extern void enable_kernel_spe(void);
 extern void giveup_spe(struct task_struct *);
 extern void load_up_spe(struct task_struct *);
 
 +#ifdef CONFIG_PPC_ADV_DEBUG_REGS
 +extern void switch_booke_debug_regs(struct thread_struct *new_thread);
 +#endif
 +
 #ifndef CONFIG_SMP
 extern void discard_lazy_cpu_state(void);
 #else
 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
 index 01ff496..3375cb7 100644
 --- a/arch/powerpc/kernel/process.c
 +++ b/arch/powerpc/kernel/process.c
 @@ -362,12 +362,13 @@ static void prime_debug_regs(struct thread_struct 
 *thread)
  * debug registers, set the debug registers from the values
  * stored in the new thread.
  */
 -static void switch_booke_debug_regs(struct thread_struct *new_thread)
 +void switch_booke_debug_regs(struct thread_struct *new_thread)
 {
   if ((current->thread.debug.dbcr0 & DBCR0_IDM)
   || (new_thread->debug.dbcr0 & DBCR0_IDM))
   prime_debug_regs(new_thread);
 }
 +EXPORT_SYMBOL(switch_booke_debug_regs);

EXPORT_SYMBOL_GPL?


Alex

 #else /* !CONFIG_PPC_ADV_DEBUG_REGS */
 #ifndef CONFIG_HAVE_HW_BREAKPOINT
 static void set_debug_reg_defaults(struct thread_struct *thread)
 -- 
 1.7.0.4
 
 


Re: [PATCH] kvm tools: fix boot of guests with more than 4gb of ram

2013-06-24 Thread Michael Tokarev
24.06.2013 05:23, Sasha Levin wrote:
   queue   = p9dev->vqs[vq];
   queue->pfn  = pfn;
  - p   = guest_flat_to_host(kvm, queue->pfn * page_size);
  + p   = guest_flat_to_host(kvm, (u64)queue->pfn * page_size);

Maybe it's worth to use a common function for this,
something like guest_queue_to_host(kvm, queue) ?

Thanks,

/mjt


RE: [PATCH 3/6 v5] powerpc: export debug register save function for KVM

2013-06-24 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, June 24, 2013 3:03 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421;
 tiejun.c...@windriver.com; Bhushan Bharat-R65777
 Subject: Re: [PATCH 3/6 v5] powerpc: export debug register save function for 
 KVM
 
 
 On 24.06.2013, at 11:08, Bharat Bhushan wrote:
 
  KVM need this function when switching from vcpu to user-space thread.
  My subsequent patch will use this function.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  arch/powerpc/include/asm/switch_to.h |4 
  arch/powerpc/kernel/process.c|3 ++-
  2 files changed, 6 insertions(+), 1 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/switch_to.h
  b/arch/powerpc/include/asm/switch_to.h
  index 200d763..50b357f 100644
  --- a/arch/powerpc/include/asm/switch_to.h
  +++ b/arch/powerpc/include/asm/switch_to.h
  @@ -30,6 +30,10 @@ extern void enable_kernel_spe(void); extern void
  giveup_spe(struct task_struct *); extern void load_up_spe(struct
  task_struct *);
 
  +#ifdef CONFIG_PPC_ADV_DEBUG_REGS
  +extern void switch_booke_debug_regs(struct thread_struct
  +*new_thread); #endif
  +
  #ifndef CONFIG_SMP
  extern void discard_lazy_cpu_state(void); #else diff --git
  a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index
  01ff496..3375cb7 100644
  --- a/arch/powerpc/kernel/process.c
  +++ b/arch/powerpc/kernel/process.c
  @@ -362,12 +362,13 @@ static void prime_debug_regs(struct
  thread_struct *thread)
   * debug registers, set the debug registers from the values
   * stored in the new thread.
   */
  -static void switch_booke_debug_regs(struct thread_struct *new_thread)
  +void switch_booke_debug_regs(struct thread_struct *new_thread)
  {
  if ((current->thread.debug.dbcr0 & DBCR0_IDM)
  || (new_thread->debug.dbcr0 & DBCR0_IDM))
  prime_debug_regs(new_thread);
  }
  +EXPORT_SYMBOL(switch_booke_debug_regs);
 
 EXPORT_SYMBOL_GPL?

Oops, I missed this comment. Will correct in next version. 

-Bharat

 
 
 Alex
 
  #else   /* !CONFIG_PPC_ADV_DEBUG_REGS */
  #ifndef CONFIG_HAVE_HW_BREAKPOINT
  static void set_debug_reg_defaults(struct thread_struct *thread)
  --
  1.7.0.4
 
 


Re: [PATCH kvm-unit-tests 1/4] x86-run: correct a typo 'qemsystem' -> 'qemusystem'

2013-06-24 Thread Gleb Natapov
On Mon, Jun 24, 2013 at 08:47:36AM +, Ren, Yongjie wrote:
 x86-run: correct a typo 'qemsystem' -> 'qemusystem'
 Before this fix, you should always get error info as below when running 
 'x86-run' script.
 QEMU binary has no support for test device. Exiting.
 
 Signed-off-by: Yongjie Ren yongjie@intel.com
Applied all four. Thanks.

 ---
  x86-run |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/x86-run b/x86-run
 index 2cf1f38..9526a0b 100755
 --- a/x86-run
 +++ b/x86-run
 @@ -8,7 +8,7 @@ then
   qemu=${qemukvm}
  else
   if
  -	${qemsystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
  +	${qemusystem} -device '?' 2>&1 | fgrep -e \"testdev\" -e \"pc-testdev\" > /dev/null;
   then
   qemu=${qemusystem}
   else
 -- 
 1.7.9.5

--
Gleb.


Re: Would a DOS on dovecot running under a VM cause host to crash?

2013-06-24 Thread Hugh Davenport
Checked the main logs. No go. Didn't check qemu logs. Will do that.

I'm starting to think it was the power, as when I turned off the UPS as a test,
the server shut down as well... Will get that fixed.

Cheers,

Hugh

Stefan Hajnoczi stefa...@gmail.com wrote:

On Fri, Jun 21, 2013 at 10:27:07AM +1200, Hugh Davenport wrote:
 The attack lasted around 4 minutes, in which there was 1161 lines
 in the log for a
 single attacker ip, and no other similar logs previously.
 
 Would this be enough to kill not only the VM running dovecot, but
 the underlying host
 machine?

Have you checked logs on the host?  Specifically /var/log/messages for
seg fault messages or Out-of-Memory Killer messages.

It's also worth checking /var/log/libvirt/qemu/domain.log if you are
using libvirt.  That file contains the QEMU stderr output.

Stefan

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.


Re: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support

2013-06-24 Thread Alexander Graf

On 24.06.2013, at 11:08, Bharat Bhushan wrote:

 This patch adds the debug stub support on booke/bookehv.
 Now QEMU debug stub can use hw breakpoint, watchpoint and
 software breakpoint to debug guest.
 
 This is how we save/restore debug register context when switching
 between guest, userspace and kernel user-process:
 
 When QEMU is running
 - thread->debug_reg == QEMU debug register context.
 - Kernel will handle switching the debug register on context switch.
 - no vcpu_load() called
 
 QEMU makes ioctls (except RUN)
 - This will call vcpu_load()
 - should not change context.
 - Some ioctls can change vcpu debug register, context saved in 
 vcpu->debug_regs
 
 QEMU Makes RUN ioctl
 - Save thread->debug_reg on STACK
 - Store thread->debug_reg == vcpu->debug_reg
 - load thread->debug_reg
 - RUN VCPU ( So thread points to vcpu context )
 
 Context switch happens When VCPU running
 - makes vcpu_load() should not load any context
 - kernel loads the vcpu context as thread->debug_regs points to vcpu context.
 
 On heavyweight_exit
 - Load the context saved on stack in thread->debug_reg
 
 Currently we do not support debug resource emulation to guest,
 On debug exception, always exit to user space irrespective of
 user space is expecting the debug exception or not. If this is
 unexpected exception (breakpoint/watchpoint event not set by
 userspace) then let us leave the action on user space. This
 is similar to what it was before, only thing is that now we
 have proper exit state available to user space.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
 arch/powerpc/include/asm/kvm_host.h |3 +
 arch/powerpc/include/uapi/asm/kvm.h |1 +
 arch/powerpc/kvm/booke.c|  233 ---
 arch/powerpc/kvm/booke.h|5 +
 4 files changed, 224 insertions(+), 18 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm_host.h 
 b/arch/powerpc/include/asm/kvm_host.h
 index 838a577..aeb490d 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -524,7 +524,10 @@ struct kvm_vcpu_arch {
   u32 eptcfg;
   u32 epr;
   u32 crit_save;
 + /* guest debug registers*/
   struct debug_reg dbg_reg;
 + /* hardware visible debug registers when in guest state */
 + struct debug_reg shadow_dbg_reg;
 #endif
   gpa_t paddr_accessed;
   gva_t vaddr_accessed;
 diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
 b/arch/powerpc/include/uapi/asm/kvm.h
 index ded0607..f5077c2 100644
 --- a/arch/powerpc/include/uapi/asm/kvm.h
 +++ b/arch/powerpc/include/uapi/asm/kvm.h
 @@ -27,6 +27,7 @@
 #define __KVM_HAVE_PPC_SMT
 #define __KVM_HAVE_IRQCHIP
 #define __KVM_HAVE_IRQ_LINE
 +#define __KVM_HAVE_GUEST_DEBUG
 
 struct kvm_regs {
   __u64 pc;
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index 3e9fc1d..8be3502 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -133,6 +133,29 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
 #endif
 }
 
 +static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu)
 +{
 + /* Synchronize guest's desire to get debug interrupts into shadow MSR */
 +#ifndef CONFIG_KVM_BOOKE_HV
 + vcpu->arch.shadow_msr &= ~MSR_DE;
 + vcpu->arch.shadow_msr |= vcpu->arch.shared->msr & MSR_DE;
 +#endif
 +
 + /* Force enable debug interrupts when user space wants to debug */
 + if (vcpu->guest_debug) {
 +#ifdef CONFIG_KVM_BOOKE_HV
 + /*
 +  * Since there is no shadow MSR, sync MSR_DE into the guest
 +  * visible MSR.
 +  */
 + vcpu->arch.shared->msr |= MSR_DE;
 +#else
 + vcpu->arch.shadow_msr |= MSR_DE;
 + vcpu->arch.shared->msr &= ~MSR_DE;
 +#endif
 + }
 +}
 +
 /*
  * Helper function for full MSR writes.  No need to call this if only
  * EE/CE/ME/DE/RI are changing.
 @@ -150,6 +173,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
   kvmppc_mmu_msr_notify(vcpu, old_msr);
   kvmppc_vcpu_sync_spe(vcpu);
   kvmppc_vcpu_sync_fpu(vcpu);
 + kvmppc_vcpu_sync_debug(vcpu);
 }
 
 static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
 @@ -655,6 +679,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
 int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
   int ret, s;
 + struct thread_struct thread;
 #ifdef CONFIG_PPC_FPU
   unsigned int fpscr;
   int fpexc_mode;
 @@ -698,12 +723,21 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
 kvm_vcpu *vcpu)
 
   kvmppc_load_guest_fp(vcpu);
 #endif
 + /* Switch to guest debug context */
 + thread.debug = vcpu->arch.shadow_dbg_reg;
 + switch_booke_debug_regs(&thread);
 + thread.debug = current->thread.debug;
 + current->thread.debug = vcpu->arch.shadow_dbg_reg;
 
   ret = __kvmppc_vcpu_run(kvm_run, vcpu);
 
   /* No need for kvm_guest_exit. It's done in handle_exit.
  We also get here with interrupts enabled. */

RE: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support

2013-06-24 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, June 24, 2013 4:13 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421;
 tiejun.c...@windriver.com; Bhushan Bharat-R65777
 Subject: Re: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support
 
 
 On 24.06.2013, at 11:08, Bharat Bhushan wrote:
 
  This patch adds the debug stub support on booke/bookehv.
  Now QEMU debug stub can use hw breakpoint, watchpoint and software
  breakpoint to debug guest.
 
  This is how we save/restore debug register context when switching
  between guest, userspace and kernel user-process:
 
  When QEMU is running
  - thread->debug_reg == QEMU debug register context.
  - Kernel will handle switching the debug register on context switch.
  - no vcpu_load() called
 
  QEMU makes ioctls (except RUN)
  - This will call vcpu_load()
  - should not change context.
  - Some ioctls can change vcpu debug register, context saved in
  - vcpu->debug_regs
 
  QEMU Makes RUN ioctl
  - Save thread->debug_reg on STACK
  - Store thread->debug_reg == vcpu->debug_reg load thread->debug_reg
  - RUN VCPU ( So thread points to vcpu context )
 
  Context switch happens When VCPU running
  - makes vcpu_load() should not load any context kernel loads the vcpu
  - context as thread->debug_regs points to vcpu context.
 
  On heavyweight_exit
  - Load the context saved on stack in thread->debug_reg
 
  Currently we do not support debug resource emulation to guest, On
  debug exception, always exit to user space irrespective of user space
  is expecting the debug exception or not. If this is unexpected
  exception (breakpoint/watchpoint event not set by
  userspace) then let us leave the action on user space. This is similar
  to what it was before, only thing is that now we have proper exit
  state available to user space.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  arch/powerpc/include/asm/kvm_host.h |3 +
  arch/powerpc/include/uapi/asm/kvm.h |1 +
  arch/powerpc/kvm/booke.c|  233 
  ---
  arch/powerpc/kvm/booke.h|5 +
  4 files changed, 224 insertions(+), 18 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/kvm_host.h
  b/arch/powerpc/include/asm/kvm_host.h
  index 838a577..aeb490d 100644
  --- a/arch/powerpc/include/asm/kvm_host.h
  +++ b/arch/powerpc/include/asm/kvm_host.h
  @@ -524,7 +524,10 @@ struct kvm_vcpu_arch {
  u32 eptcfg;
  u32 epr;
  u32 crit_save;
  +   /* guest debug registers*/
  struct debug_reg dbg_reg;
  +   /* hardware visible debug registers when in guest state */
  +   struct debug_reg shadow_dbg_reg;
  #endif
  gpa_t paddr_accessed;
  gva_t vaddr_accessed;
  diff --git a/arch/powerpc/include/uapi/asm/kvm.h
  b/arch/powerpc/include/uapi/asm/kvm.h
  index ded0607..f5077c2 100644
  --- a/arch/powerpc/include/uapi/asm/kvm.h
  +++ b/arch/powerpc/include/uapi/asm/kvm.h
  @@ -27,6 +27,7 @@
  #define __KVM_HAVE_PPC_SMT
  #define __KVM_HAVE_IRQCHIP
  #define __KVM_HAVE_IRQ_LINE
  +#define __KVM_HAVE_GUEST_DEBUG
 
  struct kvm_regs {
  __u64 pc;
  diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index
  3e9fc1d..8be3502 100644
  --- a/arch/powerpc/kvm/booke.c
  +++ b/arch/powerpc/kvm/booke.c
  @@ -133,6 +133,29 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu
  *vcpu) #endif }
 
  +static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu) {
  +   /* Synchronize guest's desire to get debug interrupts into shadow
  +MSR */ #ifndef CONFIG_KVM_BOOKE_HV
  +   vcpu->arch.shadow_msr &= ~MSR_DE;
  +   vcpu->arch.shadow_msr |= vcpu->arch.shared->msr & MSR_DE; #endif
  +
  +   /* Force enable debug interrupts when user space wants to debug */
  +   if (vcpu->guest_debug) {
  +#ifdef CONFIG_KVM_BOOKE_HV
  +   /*
  +* Since there is no shadow MSR, sync MSR_DE into the guest
  +* visible MSR.
  +*/
  +   vcpu->arch.shared->msr |= MSR_DE;
  +#else
  +   vcpu->arch.shadow_msr |= MSR_DE;
  +   vcpu->arch.shared->msr &= ~MSR_DE;
  +#endif
  +   }
  +}
  +
  /*
   * Helper function for full MSR writes.  No need to call this if
  only
   * EE/CE/ME/DE/RI are changing.
  @@ -150,6 +173,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
  kvmppc_mmu_msr_notify(vcpu, old_msr);
  kvmppc_vcpu_sync_spe(vcpu);
  kvmppc_vcpu_sync_fpu(vcpu);
  +   kvmppc_vcpu_sync_debug(vcpu);
  }
 
  static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu, @@
  -655,6 +679,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
  int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) {
  int ret, s;
  +   struct thread_struct thread;
  #ifdef CONFIG_PPC_FPU
  unsigned int fpscr;
  int fpexc_mode;
  @@ -698,12 +723,21 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run,
  struct kvm_vcpu *vcpu)
 
  kvmppc_load_guest_fp(vcpu);
  #endif
  +   /* Switch to 

Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021

2013-06-24 Thread Stefan Pietsch
On 23.06.2013 19:36, Gleb Natapov wrote:
 On Sun, Jun 23, 2013 at 06:51:30PM +0200, Stefan Pietsch wrote:
 On 23.06.2013 09:51, Gleb Natapov wrote:
 On Thu, Jun 20, 2013 at 07:01:49PM +0200, Stefan Pietsch wrote:
 Can you provide the output of 25391454e73e3156202264eb3c473825afe4bc94
 and emulate_invalid_guest_state=0. Also run x/20i $pc-20 in qemu
 monitor after the hang.


 25391454e73e3156202264eb3c473825afe4bc94
  emulate_invalid_guest_state=0

 Very interesting. Looks like somewhere during TPR access FS
 register gets corrupted. Can you remove /usr/share/kvm/kvmvapic.bin
 and try again? This will disable some code paths during TPR access and
 will narrow down the issue.


 Doing this, qemu complains
 Could not open option rom 'kvmvapic.bin': No such file or directory,
 but the virtual machine boots successful with
 emulate_invalid_guest_state=0 and emulate_invalid_guest_state=1.

 Hmm, I think we ate close. Can you try with upstream qemu?
 
 kvmvapic.bin comes with Debian package seabios 1.7.2-3.

I already tried this with the Debian package qemu-kvm 1.5.0+dfsg-4.


Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021

2013-06-24 Thread Gleb Natapov
On Mon, Jun 24, 2013 at 01:43:26PM +0200, Stefan Pietsch wrote:
 On 23.06.2013 19:36, Gleb Natapov wrote:
  On Sun, Jun 23, 2013 at 06:51:30PM +0200, Stefan Pietsch wrote:
  On 23.06.2013 09:51, Gleb Natapov wrote:
  On Thu, Jun 20, 2013 at 07:01:49PM +0200, Stefan Pietsch wrote:
  Can you provide the output of 25391454e73e3156202264eb3c473825afe4bc94
  and emulate_invalid_guest_state=0. Also run x/20i $pc-20 in qemu
  monitor after the hang.
 
 
  25391454e73e3156202264eb3c473825afe4bc94
   emulate_invalid_guest_state=0
 
  Very interesting. Looks like somewhere during TPR access FS
  register gets corrupted. Can you remove /usr/share/kvm/kvmvapic.bin
  and try again? This will disable some code paths during TPR access and
  will narrow down the issue.
 
 
  Doing this, qemu complains
  Could not open option rom 'kvmvapic.bin': No such file or directory,
  but the virtual machine boots successful with
  emulate_invalid_guest_state=0 and emulate_invalid_guest_state=1.
 
  Hmm, I think we ate close. Can you try with upstream qemu?
  
  kvmvapic.bin comes with Debian package seabios 1.7.2-3.
 
 I already tried this with the Debian package qemu-kvm 1.5.0+dfsg-4.
And it didn't work? Mind trying some debug kernel patches? I suspect
your CPU does something no CPU I have does, so I want to verify it.

--
Gleb.


Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021

2013-06-24 Thread Stefan Pietsch
On 24.06.2013 13:47, Gleb Natapov wrote:
 On Mon, Jun 24, 2013 at 01:43:26PM +0200, Stefan Pietsch wrote:
 On 23.06.2013 19:36, Gleb Natapov wrote:
 On Sun, Jun 23, 2013 at 06:51:30PM +0200, Stefan Pietsch wrote:
 On 23.06.2013 09:51, Gleb Natapov wrote:
 On Thu, Jun 20, 2013 at 07:01:49PM +0200, Stefan Pietsch wrote:
 Can you provide the output of 25391454e73e3156202264eb3c473825afe4bc94
 and emulate_invalid_guest_state=0. Also run x/20i $pc-20 in qemu
 monitor after the hang.


 25391454e73e3156202264eb3c473825afe4bc94
  emulate_invalid_guest_state=0

 Very interesting. Looks like somewhere during TPR access FS
 register gets corrupted. Can you remove /usr/share/kvm/kvmvapic.bin
 and try again? This will disable some code paths during TPR access and
 will narrow down the issue.


 Doing this, qemu complains
 Could not open option rom 'kvmvapic.bin': No such file or directory,
 but the virtual machine boots successful with
 emulate_invalid_guest_state=0 and emulate_invalid_guest_state=1.

 Hmm, I think we ate close. Can you try with upstream qemu?

 kvmvapic.bin comes with Debian package seabios 1.7.2-3.

 I already tried this with the Debian package qemu-kvm 1.5.0+dfsg-4.
 And it didn't work? Mind trying some debug kernel patches? I suspect
 your CPU does something no CPU I have do, so I want to verify it.


As soon as I remove kvmvapic.bin the virtual machine boots with
qemu-kvm 1.5.0. I just verified this with Linux kernel 3.10.0-rc5.
emulate_invalid_guest_state=0 or emulate_invalid_guest_state=1 make
no difference.

Please send your patches.


Re: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support

2013-06-24 Thread Alexander Graf

On 24.06.2013, at 13:22, Bhushan Bharat-R65777 wrote:

 
 
 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, June 24, 2013 4:13 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Wood Scott-B07421;
 tiejun.c...@windriver.com; Bhushan Bharat-R65777
 Subject: Re: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support
 
 
 On 24.06.2013, at 11:08, Bharat Bhushan wrote:
 
 This patch adds the debug stub support on booke/bookehv.
 Now QEMU debug stub can use hw breakpoint, watchpoint and software
 breakpoint to debug guest.
 
 This is how we save/restore debug register context when switching
 between guest, userspace and kernel user-process:
 
 When QEMU is running
 - thread->debug_reg == QEMU debug register context.
 - Kernel will handle switching the debug register on context switch.
 - no vcpu_load() called
 
 QEMU makes ioctls (except RUN)
 - This will call vcpu_load()
 - should not change context.
 - Some ioctls can change vcpu debug register, context saved in
 - vcpu->debug_regs
 
 QEMU Makes RUN ioctl
 - Save thread->debug_reg on STACK
 - Store thread->debug_reg == vcpu->debug_reg load thread->debug_reg
 - RUN VCPU ( So thread points to vcpu context )
 
 Context switch happens When VCPU running
 - makes vcpu_load() should not load any context kernel loads the vcpu
 - context as thread->debug_regs points to vcpu context.
 
 On heavyweight_exit
 - Load the context saved on stack in thread->debug_reg
 
 Currently we do not support debug resource emulation to guest, On
 debug exception, always exit to user space irrespective of user space
 is expecting the debug exception or not. If this is unexpected
 exception (breakpoint/watchpoint event not set by
 userspace) then let us leave the action on user space. This is similar
 to what it was before, only thing is that now we have proper exit
 state available to user space.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
 arch/powerpc/include/asm/kvm_host.h |3 +
 arch/powerpc/include/uapi/asm/kvm.h |1 +
 arch/powerpc/kvm/booke.c|  233 
 ---
 arch/powerpc/kvm/booke.h|5 +
 4 files changed, 224 insertions(+), 18 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm_host.h
 b/arch/powerpc/include/asm/kvm_host.h
 index 838a577..aeb490d 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -524,7 +524,10 @@ struct kvm_vcpu_arch {
 u32 eptcfg;
 u32 epr;
 u32 crit_save;
 +   /* guest debug registers*/
 struct debug_reg dbg_reg;
 +   /* hardware visible debug registers when in guest state */
 +   struct debug_reg shadow_dbg_reg;
 #endif
 gpa_t paddr_accessed;
 gva_t vaddr_accessed;
 diff --git a/arch/powerpc/include/uapi/asm/kvm.h
 b/arch/powerpc/include/uapi/asm/kvm.h
 index ded0607..f5077c2 100644
 --- a/arch/powerpc/include/uapi/asm/kvm.h
 +++ b/arch/powerpc/include/uapi/asm/kvm.h
 @@ -27,6 +27,7 @@
 #define __KVM_HAVE_PPC_SMT
 #define __KVM_HAVE_IRQCHIP
 #define __KVM_HAVE_IRQ_LINE
 +#define __KVM_HAVE_GUEST_DEBUG
 
 struct kvm_regs {
 __u64 pc;
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index
 3e9fc1d..8be3502 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -133,6 +133,29 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu
 *vcpu) #endif }
 
 +static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu) {
 +   /* Synchronize guest's desire to get debug interrupts into shadow
 +MSR */ #ifndef CONFIG_KVM_BOOKE_HV
 +   vcpu->arch.shadow_msr &= ~MSR_DE;
 +   vcpu->arch.shadow_msr |= vcpu->arch.shared->msr & MSR_DE; #endif
 +
 +   /* Force enable debug interrupts when user space wants to debug */
 +   if (vcpu->guest_debug) {
 +#ifdef CONFIG_KVM_BOOKE_HV
 +   /*
 +* Since there is no shadow MSR, sync MSR_DE into the guest
 +* visible MSR.
 +*/
 +   vcpu->arch.shared->msr |= MSR_DE;
 +#else
 +   vcpu->arch.shadow_msr |= MSR_DE;
 +   vcpu->arch.shared->msr &= ~MSR_DE;
 +#endif
 +   }
 +}
 +
 /*
 * Helper function for full MSR writes.  No need to call this if
 only
 * EE/CE/ME/DE/RI are changing.
 @@ -150,6 +173,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
 kvmppc_mmu_msr_notify(vcpu, old_msr);
 kvmppc_vcpu_sync_spe(vcpu);
 kvmppc_vcpu_sync_fpu(vcpu);
 +   kvmppc_vcpu_sync_debug(vcpu);
 }
 
 static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu, @@
 -655,6 +679,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
 int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) {
 int ret, s;
 +   struct thread_struct thread;
 #ifdef CONFIG_PPC_FPU
 unsigned int fpscr;
 int fpexc_mode;
@@ -698,12 +723,21 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 
 kvmppc_load_guest_fp(vcpu);
 #endif
 +   /* Switch to guest debug context */
 +   thread.debug = 

[PATCH] KVM: Fix RTC interrupt coalescing tracking

2013-06-24 Thread Gleb Natapov
This reverts most of commit f1ed0450a5fac7067590317cbf027f566b6ccbca. After
that commit, kvm_apic_set_irq() no longer returns accurate information
about the interrupt injection status if injection is done into a disabled
APIC. RTC interrupt coalescing tracking relies on that information being
accurate and cannot recover if it is not.

Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9d75193..9f4bea8 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -405,17 +405,17 @@ int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
return highest_irr;
 }
 
-static void __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
- int vector, int level, int trig_mode,
- unsigned long *dest_map);
+static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
+int vector, int level, int trig_mode,
+unsigned long *dest_map);
 
-void kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
- unsigned long *dest_map)
+int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
+   unsigned long *dest_map)
 {
	struct kvm_lapic *apic = vcpu->arch.apic;

-	__apic_accept_irq(apic, irq->delivery_mode, irq->vector,
-			irq->level, irq->trig_mode, dest_map);
+	return __apic_accept_irq(apic, irq->delivery_mode, irq->vector,
+			irq->level, irq->trig_mode, dest_map);
 }
 
 static int pv_eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
@@ -608,8 +608,7 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
*r = -1;
 
	if (irq->shorthand == APIC_DEST_SELF) {
-		kvm_apic_set_irq(src->vcpu, irq, dest_map);
-		*r = 1;
+		*r = kvm_apic_set_irq(src->vcpu, irq, dest_map);
return true;
}
 
@@ -654,8 +653,7 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
			continue;
		if (*r < 0)
			*r = 0;
-		kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
-		*r += 1;
+		*r += kvm_apic_set_irq(dst[i]->vcpu, irq, dest_map);
}
 
ret = true;
@@ -664,11 +662,15 @@ out:
return ret;
 }
 
-/* Set an IRQ pending in the lapic. */
-static void __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
- int vector, int level, int trig_mode,
- unsigned long *dest_map)
+/*
+ * Add a pending IRQ into lapic.
+ * Return 1 if successfully added and 0 if discarded.
+ */
+static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
+int vector, int level, int trig_mode,
+unsigned long *dest_map)
 {
+   int result = 0;
	struct kvm_vcpu *vcpu = apic->vcpu;
 
switch (delivery_mode) {
@@ -682,10 +684,13 @@ static void __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
		if (dest_map)
			__set_bit(vcpu->vcpu_id, dest_map);
 
-		if (kvm_x86_ops->deliver_posted_interrupt)
+		if (kvm_x86_ops->deliver_posted_interrupt) {
+			result = 1;
 			kvm_x86_ops->deliver_posted_interrupt(vcpu, vector);
-		else {
-			if (apic_test_and_set_irr(vector, apic)) {
+		} else {
+			result = !apic_test_and_set_irr(vector, apic);
+
+   if (!result) {
if (trig_mode)
				apic_debug("level trig mode repeatedly "
					"for vector %d", vector);
@@ -697,7 +702,7 @@ static void __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
}
 out:
	trace_kvm_apic_accept_irq(vcpu->vcpu_id, delivery_mode,
-				  trig_mode, vector, false);
+				  trig_mode, vector, !result);
break;
 
case APIC_DM_REMRD:
@@ -709,12 +714,14 @@ out:
break;
 
case APIC_DM_NMI:
+   result = 1;
kvm_inject_nmi(vcpu);
kvm_vcpu_kick(vcpu);
break;
 
case APIC_DM_INIT:
if (!trig_mode || level) {
+   result = 1;
/* assumes that there are only KVM_APIC_INIT/SIPI */
			apic->pending_events = (1UL << KVM_APIC_INIT);
/* make sure pending_events is visible before sending
@@ -731,6 +738,7 @@ out:
case APIC_DM_STARTUP:
		apic_debug("SIPI to vcpu %d vector 0x%02x\n",
			   vcpu->vcpu_id, vector);
+   result = 1;

Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021

2013-06-24 Thread Gleb Natapov
On Mon, Jun 24, 2013 at 01:59:34PM +0200, Stefan Pietsch wrote:
 On 24.06.2013 13:47, Gleb Natapov wrote:
  On Mon, Jun 24, 2013 at 01:43:26PM +0200, Stefan Pietsch wrote:
  On 23.06.2013 19:36, Gleb Natapov wrote:
  On Sun, Jun 23, 2013 at 06:51:30PM +0200, Stefan Pietsch wrote:
  On 23.06.2013 09:51, Gleb Natapov wrote:
  On Thu, Jun 20, 2013 at 07:01:49PM +0200, Stefan Pietsch wrote:
  Can you provide the output of 25391454e73e3156202264eb3c473825afe4bc94
  and emulate_invalid_guest_state=0. Also run x/20i $pc-20 in qemu
  monitor after the hang.
 
 
  25391454e73e3156202264eb3c473825afe4bc94
   emulate_invalid_guest_state=0
 
  Very interesting. Looks like somewhere during TPR access FS
  register gets corrupted. Can you remove /usr/share/kvm/kvmvapic.bin
  and try again? This will disable some code paths during TPR access and
  will narrow down the issue.
 
 
  Doing this, qemu complains
  Could not open option rom 'kvmvapic.bin': No such file or directory,
  but the virtual machine boots successful with
  emulate_invalid_guest_state=0 and emulate_invalid_guest_state=1.
 
  Hmm, I think we are close. Can you try with upstream qemu?
 
  kvmvapic.bin comes with Debian package seabios 1.7.2-3.
 
  I already tried this with the Debian package qemu-kvm 1.5.0+dfsg-4.
  And it didn't work? Mind trying some debug kernel patches? I suspect
  your CPU does something no CPU I have do, so I want to verify it.
 
 
 As soon as I remove kvmvapic.bin the virtual machine boots with
 qemu-kvm 1.5.0. I just verified this with Linux kernel 3.10.0-rc5.
 emulate_invalid_guest_state=0 or emulate_invalid_guest_state=1 make
 no difference.
 
 Please send your patches.
Here it is, run with it and kvmvapic.bin present. See what is printed in
dmesg after the failure.


diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f4a5b3f..65488a4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3385,6 +3385,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
u32 ar;
+   unsigned long rip;
 
	if (vmx->rmode.vm86_active && seg != VCPU_SREG_LDTR) {
		*var = vmx->rmode.segs[seg];
@@ -3408,6 +3409,9 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
	var->db = (ar >> 14) & 1;
	var->g = (ar >> 15) & 1;
	var->unusable = (ar >> 16) & 1;
+	rip = kvm_rip_read(vcpu);
+	if ((rip == 0xc101611c || rip == 0xc101611a) && seg == VCPU_SREG_FS)
+		printk("base=%p limit=%p selector=%x ar=%x\n", var->base,
+			var->limit, var->selector, ar);
 }
 
 static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC V10 0/18] Paravirtualized ticket spinlocks

2013-06-24 Thread Raghavendra K T

This series replaces the existing paravirtualized spinlock mechanism
with a paravirtualized ticketlock mechanism. The series provides
implementation for both Xen and KVM.

Changes in V10:
Addressed Konrad's review comments:
- Added break in patch 5 since now we know the exact cpu to wake up
- Dropped patch 12; Konrad needs to revert two patches (70dd4998, f10cd522c)
  to enable Xen on HVM
- Removed TIMEOUT and corrected spacing in patch 15
- Fixed 'kicked' spelling and corrected spacing in patches 17, 18

Changes in V9:
- Changed spin_threshold to 32k to avoid excess halt exits that are
   causing undercommit degradation (after PLE handler improvement).
- Added  kvm_irq_delivery_to_apic (suggested by Gleb)
- Optimized halt exit path to use PLE handler

V8 of PVspinlock was posted last year. After Avi's suggestions to look
at PLE handler's improvements, various optimizations in PLE handling
have been tried.

With this series we see that we could get little more improvements on top
of that. 

Ticket locks have an inherent problem in a virtualized case, because
the vCPUs are scheduled rather than running concurrently (ignoring
gang scheduled vCPUs).  This can result in catastrophic performance
collapses when the vCPU scheduler doesn't schedule the correct next
vCPU, and ends up scheduling a vCPU which burns its entire timeslice
spinning.  (Note that this is not the same problem as lock-holder
preemption, which this series also addresses; that's also a problem,
but not catastrophic).

(See Thomas Friebel's talk Prevent Guests from Spinning Around
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

Currently we deal with this by having PV spinlocks, which adds a layer
of indirection in front of all the spinlock functions, and defining a
completely new implementation for Xen (and for other pvops users, but
there are none at present).

PV ticketlocks keep the existing ticketlock implementation
(fastpath) as-is, but add a couple of pvops for the slow paths:

- If a CPU has been waiting for a spinlock for SPIN_THRESHOLD
  iterations, then call out to the __ticket_lock_spinning() pvop,
  which allows a backend to block the vCPU rather than spinning.  This
  pvop can set the lock into slowpath state.

- When releasing a lock, if it is in slowpath state, the call
  __ticket_unlock_kick() to kick the next vCPU in line awake.  If the
  lock is no longer in contention, it also clears the slowpath flag.

The slowpath state is stored in the LSB of the lock's tail ticket.
This has the effect of halving the max number of CPUs (so a small
ticket can deal with 128 CPUs, and a large ticket with 32768).

For KVM, one hypercall is introduced in the hypervisor that allows a vcpu
to kick another vcpu out of halt state.
The blocking of a vcpu is done using halt() in the (lock_spinning) slowpath.

Overall, it results in a large reduction in code, it makes the native
and virtualized cases closer, and it removes a layer of indirection
around all the spinlock functions.

The fast path (taking an uncontended lock which isn't in slowpath
state) is optimal, identical to the non-paravirtualized case.

The inner part of ticket lock code becomes:
	inc = xadd(&lock->tickets, inc);
	inc.tail &= ~TICKET_SLOWPATH_FLAG;

	if (likely(inc.head == inc.tail))
		goto out;
	for (;;) {
		unsigned count = SPIN_THRESHOLD;
		do {
			if (ACCESS_ONCE(lock->tickets.head) == inc.tail)
				goto out;
			cpu_relax();
		} while (--count);
		__ticket_lock_spinning(lock, inc.tail);
	}
out:	barrier();
which results in:
push   %rbp
mov%rsp,%rbp

mov$0x200,%eax
lock xadd %ax,(%rdi)
movzbl %ah,%edx
cmp%al,%dl
jne1f   # Slowpath if lock in contention

pop%rbp
retq   

### SLOWPATH START
1:  and$-2,%edx
movzbl %dl,%esi

2:  mov$0x800,%eax
jmp4f

3:  pause  
sub$0x1,%eax
je 5f

4:  movzbl (%rdi),%ecx
cmp%cl,%dl
jne3b

pop%rbp
retq   

5:  callq  *__ticket_lock_spinning
jmp2b
### SLOWPATH END

with CONFIG_PARAVIRT_SPINLOCKS=n, the code has changed slightly, where
the fastpath case is straight through (taking the lock without
contention), and the spin loop is out of line:

push   %rbp
mov%rsp,%rbp

mov$0x100,%eax
lock xadd %ax,(%rdi)
movzbl %ah,%edx
cmp%al,%dl
jne1f

pop%rbp
retq   

### SLOWPATH START
1:  pause  
movzbl (%rdi),%eax
cmp%dl,%al
jne1b

pop%rbp
retq   
### SLOWPATH END

The unlock code is complicated by the need to both add to the lock's
head and fetch the slowpath flag from tail. 

[PATCH RFC V10 5/18] xen/pvticketlock: Xen implementation for PV ticket locks

2013-06-24 Thread Raghavendra K T
xen/pvticketlock: Xen implementation for PV ticket locks

From: Jeremy Fitzhardinge jer...@goop.org

Replace the old Xen implementation of PV spinlocks with an implementation
of xen_lock_spinning and xen_unlock_kick.

xen_lock_spinning simply registers the cpu in its entry in lock_waiting,
adds itself to the waiting_cpus set, and blocks on an event channel
until the channel becomes pending.

xen_unlock_kick searches the cpus in waiting_cpus looking for the one
which next wants this lock with the next ticket, if any.  If found,
it kicks it by making its event channel pending, which wakes it up.

We need to make sure interrupts are disabled while we're relying on the
contents of the per-cpu lock_waiting values, otherwise an interrupt
handler could come in, try to take some other lock, block, and overwrite
our values.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 [ Raghavendra:  use function + enum instead of macro, cmpxchg for zero status 
reset
Reintroduce break since we know the exact vCPU to send IPI as suggested by 
Konrad.]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/xen/spinlock.c |  348 +++
 1 file changed, 79 insertions(+), 269 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index d6481a9..d471c76 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -16,45 +16,44 @@
 #include "xen-ops.h"
 #include "debugfs.h"
 
-#ifdef CONFIG_XEN_DEBUG_FS
-static struct xen_spinlock_stats
-{
-   u64 taken;
-   u32 taken_slow;
-   u32 taken_slow_nested;
-   u32 taken_slow_pickup;
-   u32 taken_slow_spurious;
-   u32 taken_slow_irqenable;
+enum xen_contention_stat {
+   TAKEN_SLOW,
+   TAKEN_SLOW_PICKUP,
+   TAKEN_SLOW_SPURIOUS,
+   RELEASED_SLOW,
+   RELEASED_SLOW_KICKED,
+   NR_CONTENTION_STATS
+};
 
-   u64 released;
-   u32 released_slow;
-   u32 released_slow_kicked;
 
+#ifdef CONFIG_XEN_DEBUG_FS
 #define HISTO_BUCKETS  30
-   u32 histo_spin_total[HISTO_BUCKETS+1];
-   u32 histo_spin_spinning[HISTO_BUCKETS+1];
+static struct xen_spinlock_stats
+{
+   u32 contention_stats[NR_CONTENTION_STATS];
u32 histo_spin_blocked[HISTO_BUCKETS+1];
-
-   u64 time_total;
-   u64 time_spinning;
u64 time_blocked;
 } spinlock_stats;
 
 static u8 zero_stats;
 
-static unsigned lock_timeout = 1 << 10;
-#define TIMEOUT lock_timeout
-
 static inline void check_zero(void)
 {
-	if (unlikely(zero_stats)) {
-		memset(&spinlock_stats, 0, sizeof(spinlock_stats));
-		zero_stats = 0;
+	u8 ret;
+	u8 old = ACCESS_ONCE(zero_stats);
+	if (unlikely(old)) {
+		ret = cmpxchg(&zero_stats, old, 0);
+		/* This ensures only one fellow resets the stat */
+		if (ret == old)
+			memset(&spinlock_stats, 0, sizeof(spinlock_stats));
}
 }
 
-#define ADD_STATS(elem, val)   \
-   do { check_zero(); spinlock_stats.elem += (val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+   check_zero();
+   spinlock_stats.contention_stats[var] += val;
+}
 
 static inline u64 spin_time_start(void)
 {
@@ -73,22 +72,6 @@ static void __spin_time_accum(u64 delta, u32 *array)
array[HISTO_BUCKETS]++;
 }
 
-static inline void spin_time_accum_spinning(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_spinning);
-   spinlock_stats.time_spinning += delta;
-}
-
-static inline void spin_time_accum_total(u64 start)
-{
-   u32 delta = xen_clocksource_read() - start;
-
-   __spin_time_accum(delta, spinlock_stats.histo_spin_total);
-   spinlock_stats.time_total += delta;
-}
-
 static inline void spin_time_accum_blocked(u64 start)
 {
u32 delta = xen_clocksource_read() - start;
@@ -98,19 +81,15 @@ static inline void spin_time_accum_blocked(u64 start)
 }
 #else  /* !CONFIG_XEN_DEBUG_FS */
-#define TIMEOUT			(1 << 10)
-#define ADD_STATS(elem, val)   do { (void)(val); } while(0)
+static inline void add_stats(enum xen_contention_stat var, u32 val)
+{
+}
 
 static inline u64 spin_time_start(void)
 {
return 0;
 }
 
-static inline void spin_time_accum_total(u64 start)
-{
-}
-static inline void spin_time_accum_spinning(u64 start)
-{
-}
 static inline void spin_time_accum_blocked(u64 start)
 {
 }
@@ -133,229 +112,83 @@ typedef u16 xen_spinners_t;
	asm(LOCK_PREFIX " decw %0" : "+m" ((xl)->spinners) : : "memory");
 #endif
 
-struct xen_spinlock {
-	unsigned char lock;		/* 0 -> free; 1 -> locked */
-   xen_spinners_t spinners;/* count of waiting cpus */
+struct xen_lock_waiting {
+   struct arch_spinlock *lock;
+   __ticket_t want;
 };
 
 static DEFINE_PER_CPU(int, 

[PATCH RFC V10 7/18] x86/pvticketlock: Use callee-save for lock_spinning

2013-06-24 Thread Raghavendra K T
x86/pvticketlock: Use callee-save for lock_spinning

From: Jeremy Fitzhardinge jer...@goop.org

Although the lock_spinning calls in the spinlock code are on the
uncommon path, their presence can cause the compiler to generate many
more register save/restores in the function pre/postamble, which is in
the fast path.  To avoid this, convert it to using the pvops callee-save
calling convention, which defers all the save/restores until the actual
function is called, keeping the fastpath clean.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/paravirt.h   |2 +-
 arch/x86/include/asm/paravirt_types.h |2 +-
 arch/x86/kernel/paravirt-spinlocks.c  |2 +-
 arch/x86/xen/spinlock.c   |3 ++-
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 040e72d..7131e12c 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -715,7 +715,7 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
 static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
__ticket_t ticket)
 {
-   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
+   PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
 static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index d5deb6d..350d017 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -330,7 +330,7 @@ struct arch_spinlock;
 #include asm/spinlock_types.h
 
 struct pv_lock_ops {
-   void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket);
+   struct paravirt_callee_save lock_spinning;
void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
 };
 
diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index c2e010e..4251c1d 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -9,7 +9,7 @@
 
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
-   .lock_spinning = paravirt_nop,
+   .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
.unlock_kick = paravirt_nop,
 #endif
 };
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 870e49f..ac8f592 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -171,6 +171,7 @@ out:
local_irq_restore(flags);
spin_time_accum_blocked(start);
 }
+PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
 
 static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
 {
@@ -255,7 +256,7 @@ void __init xen_init_spinlocks(void)
return;
}
 
-   pv_lock_ops.lock_spinning = xen_lock_spinning;
+   pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 



[PATCH RFC V10 12/18] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

2013-06-24 Thread Raghavendra K T
kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks

From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com

KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
The presence of this hypercall is indicated to the guest via
KVM_FEATURE_PV_UNHALT.

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose suz...@in.ibm.com
[Raghu: Apic related changes, folding pvunhalted into vcpu_runnable]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/kvm_host.h  |5 +
 arch/x86/include/uapi/asm/kvm_para.h |1 +
 arch/x86/kvm/cpuid.c |3 ++-
 arch/x86/kvm/x86.c   |   37 ++
 include/uapi/linux/kvm_para.h|1 +
 5 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3741c65..95702de 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -503,6 +503,11 @@ struct kvm_vcpu_arch {
 * instruction.
 */
bool write_fault_to_shadow_pgtable;
+
+   /* pv related host specific info */
+   struct {
+   bool pv_unhalted;
+   } pv;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/include/uapi/asm/kvm_para.h 
b/arch/x86/include/uapi/asm/kvm_para.h
index 06fdbd9..94dc8ca 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -23,6 +23,7 @@
 #define KVM_FEATURE_ASYNC_PF   4
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
+#define KVM_FEATURE_PV_UNHALT  7
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a20ecb5..b110fe6 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -413,7 +413,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 
function,
			     (1 << KVM_FEATURE_CLOCKSOURCE2) |
			     (1 << KVM_FEATURE_ASYNC_PF) |
			     (1 << KVM_FEATURE_PV_EOI) |
-			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+			     (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+			     (1 << KVM_FEATURE_PV_UNHALT);
 
	if (sched_info_on())
		entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 094b5d9..f8bea30 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5449,6 +5449,36 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
return 1;
 }
 
+/*
+ * kvm_pv_kick_cpu_op:  Kick a vcpu.
+ *
+ * @apicid - apicid of vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
+{
+   struct kvm_vcpu *vcpu = NULL;
+   int i;
+
+   kvm_for_each_vcpu(i, vcpu, kvm) {
+   if (!kvm_apic_present(vcpu))
+   continue;
+
+   if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
+   break;
+   }
+   if (vcpu) {
+   /*
+* Setting unhalt flag here can result in spurious runnable
+* state when unhalt reset does not happen in vcpu_block.
+* But that is harmless since that should soon result in halt.
+*/
+		vcpu->arch.pv.pv_unhalted = true;
+   /* We need everybody see unhalt before vcpu unblocks */
+   smp_wmb();
+   kvm_vcpu_kick(vcpu);
+   }
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
unsigned long nr, a0, a1, a2, a3, ret;
@@ -5482,6 +5512,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
case KVM_HC_VAPIC_POLL_IRQ:
ret = 0;
break;
+   case KVM_HC_KICK_CPU:
		kvm_pv_kick_cpu_op(vcpu->kvm, a0);
+   ret = 0;
+   break;
default:
ret = -KVM_ENOSYS;
break;
@@ -5909,6 +5943,7 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
kvm_apic_accept_events(vcpu);
		switch(vcpu->arch.mp_state) {
		case KVM_MP_STATE_HALTED:
+			vcpu->arch.pv.pv_unhalted = false;
			vcpu->arch.mp_state =
				KVM_MP_STATE_RUNNABLE;
case KVM_MP_STATE_RUNNABLE:
@@ -6729,6 +6764,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
	BUG_ON(vcpu->kvm == NULL);
	kvm = vcpu->kvm;

+	vcpu->arch.pv.pv_unhalted = false;
	vcpu->arch.emulate_ctxt.ops = &emulate_ops;
if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu))

[PATCH RFC V10 9/18] jump_label: Split out rate limiting from jump_label.h

2013-06-24 Thread Raghavendra K T
jump_label: Split jumplabel ratelimit

From: Andrew Jones drjo...@redhat.com

Commit b202952075f62603bea9bfb6ebc6b0420db11949 ("perf, core: Rate limit
perf_sched_events jump_label patching") introduced rate limiting
for jump label disabling. The changes were made in the jump label code
in order to be more widely available and to keep things tidier. This is
all fine, except now jump_label.h includes linux/workqueue.h, which
makes it impossible to include jump_label.h from anything that
workqueue.h needs. For example, it's now impossible to include
jump_label.h from asm/spinlock.h, which is done in proposed
pv-ticketlock patches. This patch splits out the rate limiting related
changes from jump_label.h into a new file, jump_label_ratelimit.h, to
resolve the issue.

Signed-off-by: Andrew Jones drjo...@redhat.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 include/linux/jump_label.h   |   26 +-
 include/linux/jump_label_ratelimit.h |   34 ++
 include/linux/perf_event.h   |1 +
 kernel/jump_label.c  |1 +
 4 files changed, 37 insertions(+), 25 deletions(-)
 create mode 100644 include/linux/jump_label_ratelimit.h

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 0976fc4..53cdf89 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -48,7 +48,6 @@
 
 #include <linux/types.h>
 #include <linux/compiler.h>
-#include <linux/workqueue.h>
 
 #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
 
@@ -61,12 +60,6 @@ struct static_key {
 #endif
 };
 
-struct static_key_deferred {
-   struct static_key key;
-   unsigned long timeout;
-   struct delayed_work work;
-};
-
 # include <asm/jump_label.h>
 # define HAVE_JUMP_LABEL
 #endif /* CC_HAVE_ASM_GOTO && CONFIG_JUMP_LABEL */
@@ -119,10 +112,7 @@ extern void arch_jump_label_transform_static(struct jump_entry *entry,
 extern int jump_label_text_reserved(void *start, void *end);
 extern void static_key_slow_inc(struct static_key *key);
 extern void static_key_slow_dec(struct static_key *key);
-extern void static_key_slow_dec_deferred(struct static_key_deferred *key);
 extern void jump_label_apply_nops(struct module *mod);
-extern void
-jump_label_rate_limit(struct static_key_deferred *key, unsigned long rl);
 
 #define STATIC_KEY_INIT_TRUE ((struct static_key) \
{ .enabled = ATOMIC_INIT(1), .entries = (void *)1 })
@@ -141,10 +131,6 @@ static __always_inline void jump_label_init(void)
 {
 }
 
-struct static_key_deferred {
-   struct static_key  key;
-};
-
 static __always_inline bool static_key_false(struct static_key *key)
 {
	if (unlikely(atomic_read(&key->enabled)) > 0)
@@ -169,11 +155,6 @@ static inline void static_key_slow_dec(struct static_key *key)
	atomic_dec(&key->enabled);
 }
 
-static inline void static_key_slow_dec_deferred(struct static_key_deferred *key)
-{
-	static_key_slow_dec(&key->key);
-}
-
 static inline int jump_label_text_reserved(void *start, void *end)
 {
return 0;
@@ -187,12 +168,6 @@ static inline int jump_label_apply_nops(struct module *mod)
return 0;
 }
 
-static inline void
-jump_label_rate_limit(struct static_key_deferred *key,
-   unsigned long rl)
-{
-}
-
 #define STATIC_KEY_INIT_TRUE ((struct static_key) \
{ .enabled = ATOMIC_INIT(1) })
 #define STATIC_KEY_INIT_FALSE ((struct static_key) \
@@ -203,6 +178,7 @@ jump_label_rate_limit(struct static_key_deferred *key,
 #define STATIC_KEY_INIT STATIC_KEY_INIT_FALSE
 #define jump_label_enabled static_key_enabled
 
+static inline int atomic_read(const atomic_t *v);
 static inline bool static_key_enabled(struct static_key *key)
 {
	return (atomic_read(&key->enabled) > 0);
diff --git a/include/linux/jump_label_ratelimit.h 
b/include/linux/jump_label_ratelimit.h
new file mode 100644
index 000..1137883
--- /dev/null
+++ b/include/linux/jump_label_ratelimit.h
@@ -0,0 +1,34 @@
+#ifndef _LINUX_JUMP_LABEL_RATELIMIT_H
+#define _LINUX_JUMP_LABEL_RATELIMIT_H
+
+#include <linux/jump_label.h>
+#include <linux/workqueue.h>
+
+#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
+struct static_key_deferred {
+   struct static_key key;
+   unsigned long timeout;
+   struct delayed_work work;
+};
+#endif
+
+#ifdef HAVE_JUMP_LABEL
+extern void static_key_slow_dec_deferred(struct static_key_deferred *key);
+extern void
+jump_label_rate_limit(struct static_key_deferred *key, unsigned long rl);
+
+#else  /* !HAVE_JUMP_LABEL */
+struct static_key_deferred {
+   struct static_key  key;
+};
+static inline void static_key_slow_dec_deferred(struct static_key_deferred *key)
+{
+	static_key_slow_dec(&key->key);
+}
+static inline void
+jump_label_rate_limit(struct static_key_deferred *key,
+   unsigned long rl)
+{
+}
+#endif /* HAVE_JUMP_LABEL */
+#endif /* _LINUX_JUMP_LABEL_RATELIMIT_H */

[PATCH RFC V10 13/18] kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration

2013-06-24 Thread Raghavendra K T
kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

During migration, any vcpu that got kicked but did not become runnable
(still in halted state) should be runnable after migration.

Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/kvm/x86.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f8bea30..92a9932 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6243,7 +6243,12 @@ int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
					struct kvm_mp_state *mp_state)
 {
	kvm_apic_accept_events(vcpu);
-	mp_state->mp_state = vcpu->arch.mp_state;
+	if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED &&
+			vcpu->arch.pv.pv_unhalted)
+		mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
+	else
+		mp_state->mp_state = vcpu->arch.mp_state;
+
return 0;
 }
 



[PATCH RFC V10 18/18] kvm hypervisor: Add directed yield in vcpu block path

2013-06-24 Thread Raghavendra K T
kvm hypervisor: Add directed yield in vcpu block path

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

We use the improved PLE handler logic in the vcpu block path for
scheduling rather than a plain schedule(), so that we can make
intelligent decisions about which vcpu to yield to.

Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/ia64/include/asm/kvm_host.h|5 +
 arch/powerpc/include/asm/kvm_host.h |5 +
 arch/s390/include/asm/kvm_host.h|5 +
 arch/x86/include/asm/kvm_host.h |2 +-
 arch/x86/kvm/x86.c  |8 
 include/linux/kvm_host.h|2 +-
 virt/kvm/kvm_main.c |6 --
 7 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 989dd3f..999ab15 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -595,6 +595,11 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int kvm_pal_emul(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
 void kvm_sal_emul(struct kvm_vcpu *vcpu);
 
+static inline void kvm_do_schedule(struct kvm_vcpu *vcpu)
+{
+   schedule();
+}
+
 #define __KVM_HAVE_ARCH_VM_ALLOC 1
 struct kvm *kvm_arch_alloc_vm(void);
 void kvm_arch_free_vm(struct kvm *kvm);
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index af326cd..1aeecc0 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -628,4 +628,9 @@ struct kvm_vcpu_arch {
 #define __KVM_HAVE_ARCH_WQP
 #define __KVM_HAVE_CREATE_DEVICE
 
+static inline void kvm_do_schedule(struct kvm_vcpu *vcpu)
+{
+   schedule();
+}
+
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 16bd5d1..db09a56 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -266,4 +266,9 @@ struct kvm_arch{
 };
 
 extern int sie64a(struct kvm_s390_sie_block *, u64 *);
+static inline void kvm_do_schedule(struct kvm_vcpu *vcpu)
+{
+   schedule();
+}
+
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 95702de..72ff791 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1042,5 +1042,5 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
 int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
 void kvm_handle_pmu_event(struct kvm_vcpu *vcpu);
 void kvm_deliver_pmi(struct kvm_vcpu *vcpu);
-
+void kvm_do_schedule(struct kvm_vcpu *vcpu);
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b963c86..84a4eb2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7281,6 +7281,14 @@ bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
kvm_x86_ops->interrupt_allowed(vcpu);
 }
 
+void kvm_do_schedule(struct kvm_vcpu *vcpu)
+{
+   /* We try to yield to a kicked vcpu else do a schedule */
+   if (kvm_vcpu_on_spin(vcpu) <= 0)
+   schedule();
+}
+EXPORT_SYMBOL_GPL(kvm_do_schedule);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f0eea07..39efc18 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -565,7 +565,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 bool kvm_vcpu_yield_to(struct kvm_vcpu *target);
-void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
+bool kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
 void kvm_resched(struct kvm_vcpu *vcpu);
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 302681c..8387247 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1685,7 +1685,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
if (signal_pending(current))
break;
 
-   schedule();
+   kvm_do_schedule(vcpu);
}
 
finish_wait(&vcpu->wq, &wait);
@@ -1786,7 +1786,7 @@ bool kvm_vcpu_eligible_for_directed_yield(struct kvm_vcpu *vcpu)
 }
 #endif
 
-void kvm_vcpu_on_spin(struct kvm_vcpu *me)
+bool kvm_vcpu_on_spin(struct kvm_vcpu *me)
 {
struct kvm *kvm = me->kvm;
struct kvm_vcpu *vcpu;
@@ -1835,6 +1835,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 
/* Ensure vcpu is not eligible during next spinloop */
kvm_vcpu_set_dy_eligible(me, false);
+
+   return yielded;
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_on_spin);
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH RFC V10 17/18] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

2013-06-24 Thread Raghavendra K T
Documentation/kvm : Add documentation on Hypercalls and features used for PV 
spinlock

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

KVM_HC_KICK_CPU hypercall added to wake up a halted vcpu in a paravirtual
spinlock enabled guest.

KVM_FEATURE_PV_UNHALT enables guest to check whether pv spinlock can be enabled
in guest.

Thanks Vatsa for rewriting KVM_HC_KICK_CPU

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 Documentation/virtual/kvm/cpuid.txt  |4 
 Documentation/virtual/kvm/hypercalls.txt |   13 +
 2 files changed, 17 insertions(+)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 83afe65..654f43c 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -43,6 +43,10 @@ KVM_FEATURE_CLOCKSOURCE2   || 3 || kvmclock available at msrs
 KVM_FEATURE_ASYNC_PF   || 4 || async pf can be enabled by
||   || writing to msr 0x4b564d02
 --
+KVM_FEATURE_PV_UNHALT  || 6 || guest checks this feature bit
+   ||   || before enabling paravirtualized
+   ||   || spinlock support.
+--
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
||   || per-cpu warps are expected in
||   || kvmclock.
diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
index ea113b5..0facb7e 100644
--- a/Documentation/virtual/kvm/hypercalls.txt
+++ b/Documentation/virtual/kvm/hypercalls.txt
@@ -64,3 +64,16 @@ Purpose: To enable communication between the hypervisor and guest there is a
 shared page that contains parts of supervisor visible register state.
 The guest can map this shared page to access its supervisor register through
 memory using this hypercall.
+
+5. KVM_HC_KICK_CPU
+
+Architecture: x86
+Status: active
+Purpose: Hypercall used to wakeup a vcpu from HLT state
+Usage example : A vcpu of a paravirtualized guest that is busywaiting in guest
+kernel mode for an event to occur (ex: a spinlock to become available) can
+execute HLT instruction once it has busy-waited for more than a threshold
+time-interval. Execution of HLT instruction would cause the hypervisor to put
the vcpu to sleep until occurrence of an appropriate event. Another vcpu of the
+same guest can wakeup the sleeping vcpu by issuing KVM_HC_KICK_CPU hypercall,
+specifying APIC ID of the vcpu to be woken up.
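
Not part of the patch, but the HLT/kick protocol described above can be
sketched as a user-space analogy. This is a rough model only: a
threading.Event per APIC ID stands in for the hypervisor's per-vcpu halt
state, and every name here is illustrative, not the kernel or KVM API.

```python
import threading

# Hypothetical model: one Event per vcpu, keyed by APIC ID, stands in
# for the hypervisor's halt state of that vcpu.
halt_events = {0: threading.Event(), 1: threading.Event()}
log = []

def vcpu_busywait_then_halt(apic_id, spin_threshold=100000):
    # Busy-wait up to a threshold, then "execute HLT": sleep until kicked.
    for _ in range(spin_threshold):
        pass                          # spinning on the lock word
    halt_events[apic_id].wait()       # vcpu sleeps in the hypervisor
    log.append("vcpu%d woken" % apic_id)

def kick_cpu(apic_id):
    # Analogue of KVM_HC_KICK_CPU: wake the halted vcpu with this APIC ID.
    halt_events[apic_id].set()

waiter = threading.Thread(target=vcpu_busywait_then_halt, args=(1,))
waiter.start()
kick_cpu(1)                           # another vcpu releases the lock and kicks
waiter.join()
print(log[0])                         # prints "vcpu1 woken"
```

The point of the analogy: the waiter gives up spinning and blocks, and
only a targeted wakeup keyed by its identity (here a dict key, in the
patch the APIC ID) makes it runnable again.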



[PATCH RFC V10 16/18] kvm hypervisor : Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic

2013-06-24 Thread Raghavendra K T
Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

Note that we are using APIC_DM_REMRD which has reserved usage.
In future if APIC_DM_REMRD usage is standardized, then we should
find some other way or go back to old method.

Suggested-by: Gleb Natapov g...@redhat.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/kvm/lapic.c |5 -
 arch/x86/kvm/x86.c   |   25 ++---
 2 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e1adbb4..3f5f82e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -706,7 +706,10 @@ out:
break;
 
case APIC_DM_REMRD:
-   apic_debug("Ignoring delivery mode 3\n");
+   result = 1;
+   vcpu->arch.pv.pv_unhalted = 1;
+   kvm_make_request(KVM_REQ_EVENT, vcpu);
+   kvm_vcpu_kick(vcpu);
break;
 
case APIC_DM_SMI:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 92a9932..b963c86 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5456,27 +5456,14 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
  */
 static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
 {
-   struct kvm_vcpu *vcpu = NULL;
-   int i;
+   struct kvm_lapic_irq lapic_irq;
 
-   kvm_for_each_vcpu(i, vcpu, kvm) {
-   if (!kvm_apic_present(vcpu))
-   continue;
+   lapic_irq.shorthand = 0;
+   lapic_irq.dest_mode = 0;
+   lapic_irq.dest_id = apicid;
 
-   if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
-   break;
-   }
-   if (vcpu) {
-   /*
-* Setting unhalt flag here can result in spurious runnable
-* state when unhalt reset does not happen in vcpu_block.
-* But that is harmless since that should soon result in halt.
-*/
-   vcpu->arch.pv.pv_unhalted = true;
-   /* We need everybody see unhalt before vcpu unblocks */
-   smp_wmb();
-   kvm_vcpu_kick(vcpu);
-   }
+   lapic_irq.delivery_mode = APIC_DM_REMRD;
+   kvm_irq_delivery_to_apic(kvm, 0, &lapic_irq, NULL);
 }
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)



[PATCH RFC V10 14/18] kvm guest : Add configuration support to enable debug information for KVM Guests

2013-06-24 Thread Raghavendra K T
kvm guest : Add configuration support to enable debug information for KVM Guests

From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose suz...@in.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/Kconfig |9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 80fcc4b..f8ff42d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -646,6 +646,15 @@ config KVM_GUEST
  underlying device model, the host provides the guest with
  timing infrastructure such as time of day, and system time
 
+config KVM_DEBUG_FS
+   bool "Enable debug information for KVM Guests in debugfs"
+   depends on KVM_GUEST && DEBUG_FS
+   default n
+   ---help---
+ This option enables collection of various statistics for KVM guest.
+ Statistics are displayed in debugfs filesystem. Enabling this option
+ may incur significant overhead.
+
 source "arch/x86/lguest/Kconfig"
 
 config PARAVIRT_TIME_ACCOUNTING



[PATCH RFC V10 11/18] xen/pvticketlock: Allow interrupts to be enabled while blocking

2013-06-24 Thread Raghavendra K T
xen/pvticketlock: Allow interrupts to be enabled while blocking

From: Jeremy Fitzhardinge jer...@goop.org

If interrupts were enabled when taking the spinlock, we can leave them
enabled while blocking to get the lock.

If we can enable interrupts while waiting for the lock to become
available, and we take an interrupt before entering the poll,
and the handler takes a spinlock which ends up going into
the slow state (invalidating the per-cpu lock and want values),
then when the interrupt handler returns the event channel will
remain pending so the poll will return immediately, causing it to
return out to the main spinlock loop.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/xen/spinlock.c |   46 --
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 3ebabde..2b012a5 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -140,7 +140,20 @@ static void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 * partially setup state.
 */
local_irq_save(flags);
-
+   /*
+* We don't really care if we're overwriting some other
+* (lock,want) pair, as that would mean that we're currently
+* in an interrupt context, and the outer context had
+* interrupts enabled.  That has already kicked the VCPU out
+* of xen_poll_irq(), so it will just return spuriously and
+* retry with newly setup (lock,want).
+*
+* The ordering protocol on this is that the lock pointer
+* may only be set non-NULL if the want ticket is correct.
+* If we're updating want, we must first clear lock.
+*/
+   w->lock = NULL;
+   smp_wmb();
w->want = want;
smp_wmb();
w->lock = lock;
@@ -155,24 +168,43 @@ static void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
/* Only check lock once pending cleared */
barrier();
 
-   /* Mark entry to slowpath before doing the pickup test to make
-  sure we don't deadlock with an unlocker. */
+   /*
+* Mark entry to slowpath before doing the pickup test to make
+* sure we don't deadlock with an unlocker.
+*/
__ticket_enter_slowpath(lock);
 
-   /* check again make sure it didn't become free while
-  we weren't looking  */
+   /*
+* check again make sure it didn't become free while
+* we weren't looking
+*/
if (ACCESS_ONCE(lock->tickets.head) == want) {
add_stats(TAKEN_SLOW_PICKUP, 1);
goto out;
}
+
+   /* Allow interrupts while blocked */
+   local_irq_restore(flags);
+
+   /*
+* If an interrupt happens here, it will leave the wakeup irq
+* pending, which will cause xen_poll_irq() to return
+* immediately.
+*/
+
/* Block until irq becomes pending (or perhaps a spurious wakeup) */
xen_poll_irq(irq);
add_stats(TAKEN_SLOW_SPURIOUS, !xen_test_irq_pending(irq));
+
+   local_irq_save(flags);
+
kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 out:
cpumask_clear_cpu(cpu, &waiting_cpus);
w->lock = NULL;
+
local_irq_restore(flags);
+
spin_time_accum_blocked(start);
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
@@ -186,7 +218,9 @@ static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
for_each_cpu(cpu, &waiting_cpus) {
const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);
 
-   if (w->lock == lock && w->want == next) {
+   /* Make sure we read lock before want */
+   if (ACCESS_ONCE(w->lock) == lock &&
+   ACCESS_ONCE(w->want) == next) {
add_stats(RELEASED_SLOW_KICKED, 1);
xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
break;



[PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-06-24 Thread Raghavendra K T
kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com

During smp_boot_cpus a paravirtualized KVM guest detects if the hypervisor has
required feature (KVM_FEATURE_PV_UNHALT) to support pv-ticketlocks. If so,
 support for pv-ticketlocks is registered via pv_lock_ops.

Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu.

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose suz...@in.ibm.com
[Raghu: check_zero race fix, enum for kvm_contention_stat
jumplabel related changes ]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/kvm_para.h |   14 ++
 arch/x86/kernel/kvm.c   |  255 +++
 2 files changed, 267 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 695399f..427afcb 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -118,10 +118,20 @@ void kvm_async_pf_task_wait(u32 token);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
-#else
-#define kvm_guest_init() do { } while (0)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_spinlock_init(void);
+#else /* !CONFIG_PARAVIRT_SPINLOCKS */
+static inline void kvm_spinlock_init(void)
+{
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#else /* CONFIG_KVM_GUEST */
+#define kvm_guest_init() do {} while (0)
 #define kvm_async_pf_task_wait(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
+
 static inline u32 kvm_read_and_reset_pf_reason(void)
 {
return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index cd6d9a5..97ade5a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -34,6 +34,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/kprobes.h>
+#include <linux/debugfs.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -419,6 +420,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
WARN_ON(kvm_register_clock("primary cpu clock"));
kvm_guest_cpu_init();
native_smp_prepare_boot_cpu();
+   kvm_spinlock_init();
 }
 
 static void __cpuinit kvm_guest_cpu_online(void *dummy)
@@ -523,3 +525,256 @@ static __init int activate_jump_labels(void)
return 0;
 }
 arch_initcall(activate_jump_labels);
+
+/* Kick a cpu by its apicid. Used to wake up a halted vcpu */
+void kvm_kick_cpu(int cpu)
+{
+   int apicid;
+
+   apicid = per_cpu(x86_cpu_to_apicid, cpu);
+   kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
+}
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+enum kvm_contention_stat {
+   TAKEN_SLOW,
+   TAKEN_SLOW_PICKUP,
+   RELEASED_SLOW,
+   RELEASED_SLOW_KICKED,
+   NR_CONTENTION_STATS
+};
+
+#ifdef CONFIG_KVM_DEBUG_FS
+#define HISTO_BUCKETS  30
+
+static struct kvm_spinlock_stats
+{
+   u32 contention_stats[NR_CONTENTION_STATS];
+   u32 histo_spin_blocked[HISTO_BUCKETS+1];
+   u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+   u8 ret;
+   u8 old;
+
+   old = ACCESS_ONCE(zero_stats);
+   if (unlikely(old)) {
+   ret = cmpxchg(&zero_stats, old, 0);
+   /* This ensures only one fellow resets the stat */
+   if (ret == old)
+   memset(&spinlock_stats, 0, sizeof(spinlock_stats));
+   }
+}
+
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+   check_zero();
+   spinlock_stats.contention_stats[var] += val;
+}
+
+
+static inline u64 spin_time_start(void)
+{
+   return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+   unsigned index;
+
+   index = ilog2(delta);
+   check_zero();
+
+   if (index < HISTO_BUCKETS)
+   array[index]++;
+   else
+   array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+   u32 delta;
+
+   delta = sched_clock() - start;
+   __spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+   spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+   d_kvm_debug = debugfs_create_dir("kvm", NULL);
+   if (!d_kvm_debug)
+   printk(KERN_WARNING "Could not create 'kvm' debugfs directory\n");
+
+   return d_kvm_debug;
+}
+
+static int __init kvm_spinlock_debugfs(void)
+{
+   struct dentry *d_kvm;
+
+   d_kvm = kvm_init_debugfs();
+   if (d_kvm == NULL)
+   return -ENOMEM;
+
+   d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
+
+   debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
+
+   debugfs_create_u32("taken_slow", 0444, d_spin_debug,
+  

[PATCH RFC V10 4/18] xen: Defer spinlock setup until boot CPU setup

2013-06-24 Thread Raghavendra K T
xen: Defer spinlock setup until boot CPU setup

From: Jeremy Fitzhardinge jer...@goop.org

There's no need to do it at very early init, and doing it there
makes it impossible to use the jump_label machinery.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/xen/smp.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 8ff3799..dcdc91c 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -246,6 +246,7 @@ static void __init xen_smp_prepare_boot_cpu(void)
 
xen_filter_cpu_maps();
xen_setup_vcpu_info_placement();
+   xen_init_spinlocks();
 }
 
 static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
@@ -647,7 +648,6 @@ void __init xen_smp_init(void)
 {
smp_ops = xen_smp_ops;
xen_fill_possible_map();
-   xen_init_spinlocks();
 }
 
 static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus)



[PATCH RFC V10 1/18] x86/spinlock: Replace pv spinlocks with pv ticketlocks

2013-06-24 Thread Raghavendra K T
x86/spinlock: Replace pv spinlocks with pv ticketlocks

From: Jeremy Fitzhardinge jer...@goop.org

Rather than outright replacing the entire spinlock implementation in
order to paravirtualize it, keep the ticket lock implementation but add
a couple of pvops hooks on the slow path (long spin on lock, unlocking
a contended lock).

Ticket locks have a number of nice properties, but they also have some
surprising behaviours in virtual environments.  They enforce a strict
FIFO ordering on cpus trying to take a lock; however, if the hypervisor
scheduler does not schedule the cpus in the correct order, the system can
waste a huge amount of time spinning until the next cpu can take the lock.

(See Thomas Friebel's talk "Prevent Guests from Spinning Around",
http://www.xen.org/files/xensummitboston08/LHP.pdf, for more details.)

To address this, we add two hooks:
 - __ticket_spin_lock which is called after the cpu has been
   spinning on the lock for a significant number of iterations but has
   failed to take the lock (presumably because the cpu holding the lock
   has been descheduled).  The lock_spinning pvop is expected to block
   the cpu until it has been kicked by the current lock holder.
 - __ticket_spin_unlock, which on releasing a contended lock
   (there are more cpus with tail tickets), it looks to see if the next
   cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
functions causes all the extra code to go away.
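
For readers following along, the two-hook structure can be sketched as a
single-threaded Python model. The hook names and SPIN_THRESHOLD follow the
patch; the lock is a plain object and the hooks just record their calls, so
this illustrates the control flow only, not the kernel implementation.

```python
SPIN_THRESHOLD = 1 << 15   # iterations before falling into the slow path

class Ticketlock:
    def __init__(self):
        self.head = 0      # ticket currently being served
        self.tail = 0      # next ticket to hand out

pvops_calls = []           # record of slow-path hook invocations

def lock_spinning(lock, ticket):
    # pvop hook: block this cpu until the holder kicks our ticket.
    pvops_calls.append(("lock_spinning", ticket))

def unlock_kick(lock, ticket):
    # pvop hook: wake whichever cpu is blocked on `ticket`.
    pvops_calls.append(("unlock_kick", ticket))

def spin_lock(lock):
    ticket = lock.tail
    lock.tail += 1                       # xadd in the real implementation
    while lock.head != ticket:
        for _ in range(SPIN_THRESHOLD):
            if lock.head == ticket:      # another cpu advanced head
                break
        else:
            lock_spinning(lock, ticket)  # spun too long: block until kicked

def spin_unlock(lock):
    lock.head += 1
    if lock.tail > lock.head:            # contended: kick the next ticket
        unlock_kick(lock, lock.head)

lock = Ticketlock()
spin_lock(lock)        # uncontended: fast path only, no pvops involved
spin_unlock(lock)
assert pvops_calls == []
```

The uncontended case never touches either hook, which is the property the
stub functions rely on when CONFIG_PARAVIRT_SPINLOCKS is disabled.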

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
[ Raghavendra: Changed SPIN_THRESHOLD ]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/paravirt.h   |   32 
 arch/x86/include/asm/paravirt_types.h |   10 ++
 arch/x86/include/asm/spinlock.h   |   53 +++--
 arch/x86/include/asm/spinlock_types.h |4 --
 arch/x86/kernel/paravirt-spinlocks.c  |   15 +
 arch/x86/xen/spinlock.c   |8 -
 6 files changed, 61 insertions(+), 61 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index cfdc9ee..040e72d 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -712,36 +712,16 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
 
-static inline int arch_spin_is_locked(struct arch_spinlock *lock)
+static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
+   __ticket_t ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
+   PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static inline int arch_spin_is_contended(struct arch_spinlock *lock)
+static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+   __ticket_t ticket)
 {
-   return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
-}
-#define arch_spin_is_contended arch_spin_is_contended
-
-static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
-}
-
-static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
- unsigned long flags)
-{
-   PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
-}
-
-static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
-{
-   return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
-}
-
-static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
-{
-   PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
+   PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
 #endif
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 0db1fca..d5deb6d 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -327,13 +327,11 @@ struct pv_mmu_ops {
 };
 
 struct arch_spinlock;
+#include <asm/spinlock_types.h>
+
 struct pv_lock_ops {
-   int (*spin_is_locked)(struct arch_spinlock *lock);
-   int (*spin_is_contended)(struct arch_spinlock *lock);
-   void (*spin_lock)(struct arch_spinlock *lock);
-   void (*spin_lock_flags)(struct arch_spinlock *lock, unsigned long flags);
-   int (*spin_trylock)(struct arch_spinlock *lock);
-   void (*spin_unlock)(struct arch_spinlock *lock);
+   void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket);
+   void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
 };
 
 /* This contains all the paravirt structures: we get a convenient
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 33692ea..4d54244 100644
--- 

[PATCH RFC V10 6/18] xen/pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks

2013-06-24 Thread Raghavendra K T
xen/pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks

From: Jeremy Fitzhardinge jer...@goop.org

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/xen/spinlock.c |   14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index d471c76..870e49f 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -239,6 +239,8 @@ void xen_uninit_lock_cpu(int cpu)
per_cpu(lock_kicker_irq, cpu) = -1;
 }
 
+static bool xen_pvspin __initdata = true;
+
 void __init xen_init_spinlocks(void)
 {
/*
@@ -248,10 +250,22 @@ void __init xen_init_spinlocks(void)
if (xen_hvm_domain())
return;
 
+   if (!xen_pvspin) {
+   printk(KERN_DEBUG "xen: PV spinlocks disabled\n");
+   return;
+   }
+
pv_lock_ops.lock_spinning = xen_lock_spinning;
pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
+static __init int xen_parse_nopvspin(char *arg)
+{
+   xen_pvspin = false;
+   return 0;
+}
+early_param("xen_nopvspin", xen_parse_nopvspin);
+
 #ifdef CONFIG_XEN_DEBUG_FS
 
 static struct dentry *d_spin_debug;



[PATCH RFC V10 10/18] x86/ticketlock: Add slowpath logic

2013-06-24 Thread Raghavendra K T
x86/ticketlock: Add slowpath logic

From: Jeremy Fitzhardinge jer...@goop.org

Maintain a flag in the LSB of the ticket lock tail which indicates
whether anyone is in the lock slowpath and may need kicking when
the current holder unlocks.  The flag is set when the first locker
enters the slowpath, and cleared when unlocking to an empty queue (ie,
no contention).

In the specific implementation of lock_spinning(), make sure to set
the slowpath flags on the lock just before blocking.  We must do
this before the last-chance pickup test to prevent a deadlock
with the unlocker:

Unlocker                Locker
                        test for lock pickup
                            -> fail
unlock
test slowpath
    -> false
                        set slowpath flags
                        block

Whereas this works in any ordering:

Unlocker                Locker
                        set slowpath flags
                        test for lock pickup
                            -> fail
                        block
unlock
test slowpath
    -> true, kick

If the unlocker finds that the lock has the slowpath flag set but it is
actually uncontended (ie, head == tail, so nobody is waiting), then it
clears the slowpath flag.

The unlock code uses a locked add to update the head counter.  This also
acts as a full memory barrier so that it's safe to subsequently
read back the slowflag state, knowing that the updated lock is visible
to the other CPUs.  If it were an unlocked add, then the flag read may
just be forwarded from the store buffer before it was visible to the other
CPUs, which could result in a deadlock.

Unfortunately this means we need to do a locked instruction when
unlocking with PV ticketlocks.  However, if PV ticketlocks are not
enabled, then the old non-locked add is the only unlocking code.

Note: this code relies on gcc making sure that unlikely() code is out of
line of the fastpath, which only happens when OPTIMIZE_SIZE=n.  If it
doesn't, the generated code isn't too bad, but it's definitely suboptimal.

Thanks to Srivatsa Vaddagiri for providing a bugfix to the original
version of this change, which has been folded in.
Thanks to Stephan Diestelhorst for commenting on some code which relied
on an inaccurate reading of the x86 memory ordering rules.
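
The unlock-side rule — locked add first, then test the slowpath flag,
clearing it when the queue is empty — can be modelled in a few lines. The
constants mirror the series (LSB flag, tickets advancing by 2); everything
else is an illustrative single-threaded sketch, not the kernel code.

```python
TICKET_SLOWPATH_FLAG = 1   # LSB of tail marks "someone is in the slow path"
TICKET_LOCK_INC = 2        # tickets advance by 2 to keep the LSB free

class Ticketlock:
    def __init__(self):
        self.head = 0
        self.tail = 0

kicks = []

def enter_slowpath(lock):
    # Set just before blocking, so the unlocker knows a kick may be needed.
    lock.tail |= TICKET_SLOWPATH_FLAG

def unlock(lock):
    lock.head += TICKET_LOCK_INC             # locked add: full barrier in real code
    if lock.tail & TICKET_SLOWPATH_FLAG:     # safe to read after that barrier
        if (lock.tail & ~TICKET_SLOWPATH_FLAG) == lock.head:
            lock.tail &= ~TICKET_SLOWPATH_FLAG   # empty queue: clear the flag
        else:
            kicks.append(lock.head)              # kick the next ticket holder

lock = Ticketlock()
lock.tail = 2 * TICKET_LOCK_INC   # two tickets handed out: 0 (holder) and 2
enter_slowpath(lock)              # the waiter on ticket 2 blocks

unlock(lock)                      # holder of ticket 0 unlocks
assert kicks == [2]               # contended: waiter on ticket 2 gets kicked

unlock(lock)                      # waiter unlocks with nobody behind it
assert lock.tail == 4             # uncontended: slowpath flag cleared
```

Reordering the flag test before the head increment is exactly the deadlock
window in the first diagram above; the model keeps the safe order.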

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: Stephan Diestelhorst stephan.diestelho...@amd.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/paravirt.h   |2 -
 arch/x86/include/asm/spinlock.h   |   86 -
 arch/x86/include/asm/spinlock_types.h |2 +
 arch/x86/kernel/paravirt-spinlocks.c  |3 +
 arch/x86/xen/spinlock.c   |6 ++
 5 files changed, 74 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 7131e12c..401f350 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -718,7 +718,7 @@ static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
__ticket_t ticket)
 {
PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 04a5cd5..d68883d 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -1,11 +1,14 @@
 #ifndef _ASM_X86_SPINLOCK_H
 #define _ASM_X86_SPINLOCK_H
 
+#include <linux/jump_label.h>
 #include <linux/atomic.h>
 #include <asm/page.h>
 #include <asm/processor.h>
 #include <linux/compiler.h>
 #include <asm/paravirt.h>
+#include <asm/bitops.h>
+
 /*
  * Your basic SMP spinlocks, allowing only a single CPU anywhere
  *
@@ -37,32 +40,28 @@
 /* How long a lock should spin before we consider blocking */
 #define SPIN_THRESHOLD (1 << 15)
 
-#ifndef CONFIG_PARAVIRT_SPINLOCKS
+extern struct static_key paravirt_ticketlocks_enabled;
+static __always_inline bool static_key_false(struct static_key *key);
 
-static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
-   __ticket_t ticket)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
+   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
 }
 
-static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
-__ticket_t ticket)
+#else  /* !CONFIG_PARAVIRT_SPINLOCKS */
+static 

[PATCH RFC V10 3/18] x86/ticketlock: Collapse a layer of functions

2013-06-24 Thread Raghavendra K T
x86/ticketlock: Collapse a layer of functions

From: Jeremy Fitzhardinge jer...@goop.org

Now that the paravirtualization layer doesn't exist at the spinlock
level any more, we can collapse the __ticket_ functions into the arch_
functions.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/spinlock.h |   35 +--
 1 file changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 4d54244..7442410 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -76,7 +76,7 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
  * in the high part, because a wide xadd increment of the low part would carry
  * up and contaminate the high part.
  */
-static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
register struct __raw_tickets inc = { .tail = 1 };
 
@@ -96,7 +96,7 @@ static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
 out:   barrier();  /* make sure nothing creeps before the lock is taken */
 }
 
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
+static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
arch_spinlock_t old, new;
 
@@ -110,7 +110,7 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
 }
 
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
+static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
__ticket_t next = lock->tickets.head + 1;
 
@@ -118,46 +118,21 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
__ticket_unlock_kick(lock, next);
 }
 
-static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
return tmp.tail != tmp.head;
 }
 
-static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
+static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
	return (__ticket_t)(tmp.tail - tmp.head) > 1;
 }
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_locked(lock);
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-   return __ticket_spin_is_contended(lock);
-}
 #define arch_spin_is_contended arch_spin_is_contended
 
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-   __ticket_spin_lock(lock);
-}
-
-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-   return __ticket_spin_trylock(lock);
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-   __ticket_spin_unlock(lock);
-}
-
 static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
  unsigned long flags)
 {

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC V10 8/18] x86/pvticketlock: When paravirtualizing ticket locks, increment by 2

2013-06-24 Thread Raghavendra K T
x86/pvticketlock: When paravirtualizing ticket locks, increment by 2

From: Jeremy Fitzhardinge jer...@goop.org

Increment ticket head/tails by 2 rather than 1 to leave the LSB free
to store a "is in slowpath state" bit.  This halves the number
of possible CPUs for a given ticket size, but this shouldn't matter
in practice - kernels built for 32k+ CPU systems are probably
specially built for the hardware rather than a generic distro
kernel.

Signed-off-by: Jeremy Fitzhardinge jer...@goop.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Tested-by: Attilio Rao attilio@citrix.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/spinlock.h   |   10 +-
 arch/x86/include/asm/spinlock_types.h |   10 +-
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 7442410..04a5cd5 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -78,7 +78,7 @@ static __always_inline void __ticket_unlock_kick(struct 
arch_spinlock *lock,
  */
 static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
-   register struct __raw_tickets inc = { .tail = 1 };
+   register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
 
	inc = xadd(&lock->tickets, inc);
 
@@ -104,7 +104,7 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
if (old.tickets.head != old.tickets.tail)
return 0;
 
-   new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
+   new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
 
/* cmpxchg is a full barrier, so nothing can move before it */
	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
@@ -112,9 +112,9 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
 
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
-   __ticket_t next = lock->tickets.head + 1;
+   __ticket_t next = lock->tickets.head + TICKET_LOCK_INC;
 
-   __add(&lock->tickets.head, 1, UNLOCK_LOCK_PREFIX);
+   __add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
__ticket_unlock_kick(lock, next);
 }
 
@@ -129,7 +129,7 @@ static inline int arch_spin_is_contended(arch_spinlock_t 
*lock)
 {
	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
-   return (__ticket_t)(tmp.tail - tmp.head) > 1;
+   return (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC;
 }
 #define arch_spin_is_contended arch_spin_is_contended
 
diff --git a/arch/x86/include/asm/spinlock_types.h 
b/arch/x86/include/asm/spinlock_types.h
index 83fd3c7..e96fcbd 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -3,7 +3,13 @@
 
 #include <linux/types.h>
 
-#if (CONFIG_NR_CPUS < 256)
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define __TICKET_LOCK_INC  2
+#else
+#define __TICKET_LOCK_INC  1
+#endif
+
+#if (CONFIG_NR_CPUS < (256 / __TICKET_LOCK_INC))
 typedef u8  __ticket_t;
 typedef u16 __ticketpair_t;
 #else
@@ -11,6 +17,8 @@ typedef u16 __ticket_t;
 typedef u32 __ticketpair_t;
 #endif
 
+#define TICKET_LOCK_INC	((__ticket_t)__TICKET_LOCK_INC)
+
 #define TICKET_SHIFT   (sizeof(__ticket_t) * 8)
 
 typedef struct arch_spinlock {



[PATCH RFC V10 2/18] x86/ticketlock: Don't inline _spin_unlock when using paravirt spinlocks

2013-06-24 Thread Raghavendra K T
x86/ticketlock: Don't inline _spin_unlock when using paravirt spinlocks

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

The code size expands somewhat, and it's better to just call
a function rather than inline it.

Thanks Jeremy for original version of ARCH_NOINLINE_SPIN_UNLOCK config patch,
which is simplified.

Suggested-by: Linus Torvalds torva...@linux-foundation.org
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/Kconfig |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 685692c..80fcc4b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -621,6 +621,7 @@ config PARAVIRT_DEBUG
 config PARAVIRT_SPINLOCKS
	bool "Paravirtualization layer for spinlocks"
depends on PARAVIRT  SMP
+   select UNINLINE_SPIN_UNLOCK
---help---
  Paravirtualized spinlocks allow a pvops backend to replace the
  spinlock implementation with something virtualization-friendly



Re: [PATCH RFC V10 0/18] Paravirtualized ticket spinlocks

2013-06-24 Thread Andrew Jones
On Mon, Jun 24, 2013 at 06:10:14PM +0530, Raghavendra K T wrote:
 
 Results:
 ===
 base = 3.10-rc2 kernel
 patched = base + this series
 
 The test was on 32 core (model: Intel(R) Xeon(R) CPU X7560) HT disabled
 with 32 KVM guest vcpu 8GB RAM.

Have you ever tried to get results with HT enabled?

 
 +---+---+---++---+
ebizzy (records/sec) higher is better
 +---+---+---++---+
     base         stdev       patched      stdev     %improvement
 +---+---+---++---+
 1x  5574.9000   237.49975618.94.0366 0.77311
 2x  2741.5000   561.30903332.   102.473821.53930
 3x  2146.2500   216.77182302.76.3870 7.27237
 4x  1663.   141.92351753.750083.5220 5.45701
 +---+---+---++---+

This looks good. Are your ebizzy results consistent run to run
though?

 +---+---+---++---+
   dbench  (Throughput) higher is better
 +---+---+---++---+
     base         stdev       patched      stdev     %improvement
 +---+---+---++---+
 1x 14111.5600   754.4525   14645.9900   114.3087 3.78718
 2x  2481.627071.26652667.128073.8193 7.47498
 3x  1510.248331.86341503.879236.0777-0.42173
 4x  1029.487516.91661039.706943.8840 0.99267
 +---+---+---++---+

Hmm, I wonder what 2.5x looks like. Also, the 3% improvement with
no overcommit is interesting. What's happening there? It makes
me wonder what < 1x looks like.

thanks,
drew


Re: [PATCH RFC V10 0/18] Paravirtualized ticket spinlocks

2013-06-24 Thread Raghavendra K T

On 06/24/2013 06:47 PM, Andrew Jones wrote:

On Mon, Jun 24, 2013 at 06:10:14PM +0530, Raghavendra K T wrote:


Results:
===
base = 3.10-rc2 kernel
patched = base + this series

The test was on 32 core (model: Intel(R) Xeon(R) CPU X7560) HT disabled
with 32 KVM guest vcpu 8GB RAM.


Have you ever tried to get results with HT enabled?



I have not done it yet with the latest. I will get that result.



+---+---+---++---+
ebizzy (records/sec) higher is better
+---+---+---++---+
     base         stdev       patched      stdev     %improvement
+---+---+---++---+
1x  5574.9000   237.49975618.94.0366 0.77311
2x  2741.5000   561.30903332.   102.473821.53930
3x  2146.2500   216.77182302.76.3870 7.27237
4x  1663.   141.92351753.750083.5220 5.45701
+---+---+---++---+


This looks good. Are your ebizzy results consistent run to run
though?



yes.. ebizzy looked more consistent.


+---+---+---++---+
   dbench  (Throughput) higher is better
+---+---+---++---+
     base         stdev       patched      stdev     %improvement
+---+---+---++---+
1x 14111.5600   754.4525   14645.9900   114.3087 3.78718
2x  2481.627071.26652667.128073.8193 7.47498
3x  1510.248331.86341503.879236.0777-0.42173
4x  1029.487516.91661039.706943.8840 0.99267
+---+---+---++---+


Hmm, I wonder what 2.5x looks like. Also, the 3% improvement with
no overcommit is interesting. What's happening there? It makes
me wonder what < 1x looks like.



I 'll try to get 0.5x and 2.5x run for dbench.


thanks,
drew







Re: [PATCH] pci: Enable overrides for missing ACS capabilities

2013-06-24 Thread Bjorn Helgaas
On Wed, Jun 19, 2013 at 6:43 AM, Don Dutile ddut...@redhat.com wrote:
 On 06/18/2013 10:52 PM, Bjorn Helgaas wrote:

 On Tue, Jun 18, 2013 at 5:03 PM, Don Dutileddut...@redhat.com  wrote:

 On 06/18/2013 06:22 PM, Alex Williamson wrote:


 On Tue, 2013-06-18 at 15:31 -0600, Bjorn Helgaas wrote:


 On Tue, Jun 18, 2013 at 12:20 PM, Alex Williamson
 alex.william...@redhat.com   wrote:


 On Tue, 2013-06-18 at 11:28 -0600, Bjorn Helgaas wrote:


 On Thu, May 30, 2013 at 12:40:19PM -0600, Alex Williamson wrote:

 ...

 Who do you expect to decide whether to use this option?  I think it
 requires intimate knowledge of how the device works.

 I think the benefit of using the option is that it makes assignment
 of
 devices to guests more flexible, which will make it attractive to
 users.
 But most users have no way of knowing whether it's actually *safe* to
 use this.  So I worry that you're adding an easy way to pretend
 isolation
 exists when there's no good way of being confident that it actually
 does.


 ...


 I wonder if we should taint the kernel if this option is used (but not
 for specific devices added to pci_dev_acs_enabled[]).  It would also
 be nice if pci_dev_specific_acs_enabled() gave some indication in
 dmesg for the specific devices you're hoping to add to
 pci_dev_acs_enabled[].  It's not an enumeration-time quirk right now,
 so I'm not sure how we'd limit it to one message per device.


 Right, setup vs use and getting single prints is a lot of extra code.
 Tainting is troublesome for support, Don had some objections when I
 suggested the same to him.

 For RH GSS (Global Support Services), a 'taint' in the kernel printk
 means
 RH doesn't support that system.  The 'non-support' due to 'taint' being
 printed
 out in this case may be incorrect -- RH may support that use, at least
 until
 a more sufficient patched kernel is provided.
 Thus my dissension that 'taint' be output.  WARN is ok. 'driver beware',
 'unleashed dog afoot' sure...


 So ...  that's really a RH-specific support issue, and easily worked
 around by RH adding a patch that turns off tainting.

 sure. what's another patch to the thousands... :-/

 It still sounds like a good idea to me for upstream, where use of this
 option can very possibly lead to corruption or information leakage
 between devices the user claimed were isolated, but in fact were not.

 Did I miss something?  This patch provides a user-level/chosen override;
 like all other overrides, (pci=realloc, etc.), it can lead to a failing
 system.
 IMO, this patch is no different.  If you want to tag this patch with taint,
 then let's audit all the (PCI) overrides and taint them appropriately.
 Taint should be reserved to changes to the kernel that were done outside
 the development of the kernel, or with the explicit intent to circumvent
 the normal operation of the kernel.  This patch provides a way to enable
 ACS checking to succeed when the devices have not provided sufficiently
 complete
 ACS information.  i.e., it's a growth path for PCIe-ACS and its need for
 proper support.

We're telling the kernel to assume something (the hardware provides
protection) that may not be true.  If that assumption turns out to be
false, the result is that a VM can be crashed or compromised by another
VM.

One difference I see is that this override can lead to a crash that
looks like random memory corruption and has no apparent connection to
the actual cause.  Most other overrides won't cause run-time crashes
(I think they're more likely to cause boot or device configuration
failures), and the dmesg log will probably have good clues as to the
reason.

But the possibility of compromise is probably even more serious,
because there would be no crash at all, and we'd have no indication
that VM A read or corrupted data in VM B.  I'm very concerned about
that, enough so that it's not clear to me that an override belongs in
the upstream kernel at all.

Yes, that would mean some hardware is not suitable for device
assignment.  That just sounds like if hardware manufacturers do their
homework and support ACS properly, their hardware is more useful for
virtualization than other hardware.  I don't see the problem with
that.

Bjorn


kvm_intel: Could not allocate 42 bytes percpu data

2013-06-24 Thread Chegu Vinod


Hello,

Lots (~700+) of the following messages are showing up in the dmesg of a 
3.10-rc1 based kernel (Host OS is running on a large socket count box 
with HT-on).


[   82.270682] PERCPU: allocation failed, size=42 align=16, alloc from 
reserved chunk failed

[   82.272633] kvm_intel: Could not allocate 42 bytes percpu data

... also call traces like the following...

[  101.852136]  c901ad5aa090 88084675dd08 81633743 
88084675ddc8
[  101.860889]  81145053 81f3fa78 88084809dd40 
8907d1cfd2e8
[  101.869466]  8907d1cfd280 88087fffdb08 88084675c010 
88084675dfd8

[  101.878190] Call Trace:
[  101.880953]  [81633743] dump_stack+0x19/0x1e
[  101.886679]  [81145053] pcpu_alloc+0x9a3/0xa40
[  101.892754]  [81145103] __alloc_reserved_percpu+0x13/0x20
[  101.899733]  [810b2d7f] load_module+0x35f/0x1a70
[  101.905835]  [8163ad6e] ? do_page_fault+0xe/0x10
[  101.911953]  [810b467b] SyS_init_module+0xfb/0x140
[  101.918287]  [8163f542] system_call_fastpath+0x16/0x1b
[  101.924981] kvm_intel: Could not allocate 42 bytes percpu data


Wondering if anyone else has seen this with the recent [3.10] based 
kernels esp. on larger boxes?


There was a similar issue that was reported earlier (where modules were 
being loaded per cpu without checking if an instance was already 
loaded/being-loaded). That issue seems to have been addressed in the 
recent past (e.g. https://lkml.org/lkml/2013/1/24/659 along with a 
couple of follow on cleanups)   Is the above yet another variant of the 
original issue or perhaps some race condition that got exposed when 
there are lot more threads ?


Vinod





Re: [PATCH 2/2] armv7 initial device passthrough support

2013-06-24 Thread Christoffer Dall
On Mon, Jun 24, 2013 at 10:08:08AM +0200, Mario Smarduch wrote:
 
 
 On 6/15/2013 5:47 PM, Paolo Bonzini wrote:
 On 13/06/2013 11:19, Mario Smarduch wrote:
  Updated Device Passthrough Patch.
  - optimized IRQ-CPU-vCPU binding, irq is installed once
  - added dynamic IRQ affinity on schedule in
  - added documentation and few other coding recommendations.
 
  Per earlier discussion VFIO is our target but we like
  something earlier to work with to tackle performance
  latency issue (some ARM related) for device passthrough 
  while we migrate towards VFIO.
  
  I don't think this is acceptable upstream, unfortunately.  KVM device
  assignment is deprecated and we should not add more users.
 That's fine we'll work our way towards dev-tree VFIO reusing what we can
 working with the community.
 
 At this point we're more concerned with numbers and best practices as 
 opposed to mechanism this part will be time consuming. 
 VFIO will be more background for us.
 
  
  What are the latency issues you have?
 
 Our focus now is on IRQ latency and throughput. Right now it appears lowest 
 latency
 is 2x + exit/enter + IRQ injection overhead. We can't tolerate additional 
 IPIs or deferred IRQ injection approaches. We're looking for numbers closer
 to what IBMs ELI managed. Also high res timers which ARM Virt. Ext supports 
 very well. Exitless interrupts which ARM handles very well too. There are
 some future hw ARM interrupt enhancements coming up which may help a lot as 
 well.
 
 There are many other latency/perf. reqs for NFV related to RT,
 essentially Guest must run near native. In the end it may turn out this
 may need to be outside of main tree we'll see.
 
It doesn't sound like this will be the end result.  Everything that you
try to do in your patch set can be accomplished using VFIO and a more
generic infrastructure for virtual IRQ integration with KVM and user
space.

We should avoid creating an environment with important functionality
outside of the main tree, if at all possible.

-Christoffer


Re: Bug#707257: linux-image-3.8-1-686-pae: KVM crashes with entry failed, hardware error 0x80000021

2013-06-24 Thread Stefan Pietsch
On 24.06.2013 14:30, Gleb Natapov wrote:
 On Mon, Jun 24, 2013 at 01:59:34PM +0200, Stefan Pietsch wrote:
 As soon as I remove kvmvapic.bin the virtual machine boots with
 qemu-kvm 1.5.0. I just verified this with Linux kernel 3.10.0-rc5.
 emulate_invalid_guest_state=0 or emulate_invalid_guest_state=1 make
 no difference.

 Please send your patches.
 Here it is, run with it and kvmvapic.bin present. See what is printed in
 dmesg after the failure.
 
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index f4a5b3f..65488a4 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -3385,6 +3385,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
  {
   struct vcpu_vmx *vmx = to_vmx(vcpu);
   u32 ar;
 + unsigned long rip;
  
 	if (vmx->rmode.vm86_active && seg != VCPU_SREG_LDTR) {
 		*var = vmx->rmode.segs[seg];
 @@ -3408,6 +3409,9 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
 	var->db = (ar >> 14) & 1;
 	var->g = (ar >> 15) & 1;
 	var->unusable = (ar >> 16) & 1;
 + rip = kvm_rip_read(vcpu);
 + if ((rip == 0xc101611c || rip == 0xc101611a) && seg == VCPU_SREG_FS)
 + printk("base=%p limit=%p selector=%x ar=%x\n", var->base, var->limit, var->selector, ar);
  }
  
  static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)


Booting kernel Linux 3.10-rc5 with your patch applied produces these
messages in dmesg when starting a virtual machine:

emulate_invalid_guest_state=0
[  118.732151] base= limit=  (null) selector=ffff ar=0
[  118.732341] base= limit=  (null) selector=ffff ar=0

emulate_invalid_guest_state=1
[  196.481653] base= limit=  (null) selector=ffff ar=0
[  196.481700] base= limit=  (null) selector=ffff ar=0
[  196.481706] base= limit=  (null) selector=ffff ar=0
[  196.481711] base= limit=  (null) selector=ffff ar=0
[  196.481716] base= limit=  (null) selector=ffff ar=0
[  196.481720] base= limit=  (null) selector=ffff ar=0
[  196.481725] base= limit=  (null) selector=ffff ar=0
[  196.481730] base= limit=  (null) selector=ffff ar=0
[  196.481735] base= limit=  (null) selector=ffff ar=0
[  196.481739] base= limit=  (null) selector=ffff ar=0
[  196.481777] base= limit=  (null) selector=ffff ar=0
[  196.482068] base= limit=  (null) selector=ffff ar=0
[  196.482073] base= limit=  (null) selector=ffff ar=0
[  196.482079] base= limit=  (null) selector=ffff ar=0
[  196.482084] base= limit=  (null) selector=ffff ar=0
[  196.482131] base= limit=  (null) selector=ffff ar=0
[  196.482136] base= limit=  (null) selector=ffff ar=0
[  196.482142] base= limit=  (null) selector=ffff ar=0
[  196.482146] base= limit=  (null) selector=ffff ar=0
[  196.482193] base= limit=  (null) selector=ffff ar=0
[  196.482198] base= limit=  (null) selector=ffff ar=0
[  196.482203] base= limit=  (null) selector=ffff ar=0
[  196.482208] base= limit=  (null) selector=ffff ar=0
[  196.482255] base= limit=  (null) selector=ffff ar=0
[  196.482259] base= limit=  (null) selector=ffff ar=0
[  196.482265] base= limit=  (null) selector=ffff ar=0
[  196.482269] base= limit=  (null) selector=ffff ar=0
[  196.482316] base= limit=  (null) selector=ffff ar=0
[  196.482321] base= limit=  (null) selector=ffff ar=0
[  196.482326] base= limit=  (null) selector=ffff ar=0
[  196.482331] base= limit=  (null) selector=ffff ar=0
[  196.482378] base= limit=  (null) selector=ffff ar=0
[  196.482382] base= limit=  (null) selector=ffff ar=0
[  196.482388] base= limit=  (null) selector=ffff ar=0
[  196.482392] base= limit=  (null) selector=ffff ar=0
[  196.482439] base= limit=  (null) selector=ffff ar=0
[  196.482444] base= limit=  (null) selector=ffff ar=0
[  196.482449] base= limit=  (null) selector=ffff ar=0
[  196.482454] base= limit=  (null) selector=ffff ar=0
[  196.482501] base= limit=  (null) selector=ffff ar=0
[  196.482505] base= limit=  (null) selector=ffff ar=0
[  196.482511] base= limit=  (null) selector=ffff ar=0
[  196.482516] base= limit=  (null) selector=ffff ar=0
[  196.482562] base= limit=  (null) selector=ffff ar=0
[  196.482567] base= limit=  (null) selector=ffff ar=0
[  196.482573] base= limit=  (null) selector=ffff ar=0
[  196.482577] base= limit=  (null) selector=ffff ar=0
[  196.483137] base= limit=  (null) selector=ffff ar=0
[  196.483142] base= limit=  (null) selector=ffff ar=0
[  196.483147] base= limit=  (null) selector=ffff ar=0
[  196.483152] 

Re: [PATCH 2/2] armv7 initial device passthrough support

2013-06-24 Thread Stuart Yoder
On Mon, Jun 24, 2013 at 3:01 PM, Christoffer Dall
christoffer.d...@linaro.org wrote:
 On Mon, Jun 24, 2013 at 10:08:08AM +0200, Mario Smarduch wrote:


 On 6/15/2013 5:47 PM, Paolo Bonzini wrote:
  On 13/06/2013 11:19, Mario Smarduch wrote:
  Updated Device Passthrough Patch.
  - optimized IRQ-CPU-vCPU binding, irq is installed once
  - added dynamic IRQ affinity on schedule in
  - added documentation and few other coding recommendations.
 
  Per earlier discussion VFIO is our target but we like
  something earlier to work with to tackle performance
  latency issue (some ARM related) for device passthrough
  while we migrate towards VFIO.
 
  I don't think this is acceptable upstream, unfortunately.  KVM device
  assignment is deprecated and we should not add more users.
 That's fine we'll work our way towards dev-tree VFIO reusing what we can
 working with the community.

 At this point we're more concerned with numbers and best practices as
 opposed to mechanism this part will be time consuming.
 VFIO will be more background for us.

 
  What are the latency issues you have?

 Our focus now is on IRQ latency and throughput. Right now it appears lowest 
 latency
 is 2x + exit/enter + IRQ injection overhead. We can't tolerate additional
 IPIs or deferred IRQ injection approaches. We're looking for numbers closer
 to what IBMs ELI managed. Also high res timers which ARM Virt. Ext supports
 very well. Exitless interrupts which ARM handles very well too. There are
 some future hw ARM interrupt enhancements coming up which may help a lot as 
 well.

 There are many other latency/perf. reqs for NFV related to RT,
 essentially Guest must run near native. In the end it may turn out this
 may need to be outside of main tree we'll see.

 It doesn't sound like this will be the end result.  Everything that you
 try to do in your patch set can be accomplished using VFIO and a more
 generic infrastructure for virtual IRQ integration with KVM and user
 space.

 We should avoid creating an environment with important functionality
 outside of the main tree, if at all possible.

Also, as we architect that generic infrastructure we need to keep in mind that
there are important use cases for doing I/O in user space that are not
KVM guests-- just normal applications that need direct device
access.

Stuart


Re: kvm_intel: Could not allocate 42 bytes percpu data

2013-06-24 Thread Prarit Bhargava


On 06/24/2013 03:01 PM, Chegu Vinod wrote:
 
 Hello,
 
 Lots (~700+) of the following messages are showing up in the dmesg of a 
 3.10-rc1
 based kernel (Host OS is running on a large socket count box with HT-on).
 
 [   82.270682] PERCPU: allocation failed, size=42 align=16, alloc from 
 reserved
 chunk failed
 [   82.272633] kvm_intel: Could not allocate 42 bytes percpu data

On 3.10?  Geez.  I thought we had fixed this.  I'll grab a big machine and see
if I can debug.

Rusty -- any ideas off the top of your head?
 
 ... also call traces like the following...
 
 [  101.852136]  c901ad5aa090 88084675dd08 81633743 
 88084675ddc8
 [  101.860889]  81145053 81f3fa78 88084809dd40 
 8907d1cfd2e8
 [  101.869466]  8907d1cfd280 88087fffdb08 88084675c010 
 88084675dfd8
 [  101.878190] Call Trace:
 [  101.880953]  [81633743] dump_stack+0x19/0x1e
 [  101.886679]  [81145053] pcpu_alloc+0x9a3/0xa40
 [  101.892754]  [81145103] __alloc_reserved_percpu+0x13/0x20
 [  101.899733]  [810b2d7f] load_module+0x35f/0x1a70
 [  101.905835]  [8163ad6e] ? do_page_fault+0xe/0x10
 [  101.911953]  [810b467b] SyS_init_module+0xfb/0x140
 [  101.918287]  [8163f542] system_call_fastpath+0x16/0x1b
 [  101.924981] kvm_intel: Could not allocate 42 bytes percpu data
 
 
 Wondering if anyone else has seen this with the recent [3.10] based kernels 
 esp.
 on larger boxes?
 
 There was a similar issue that was reported earlier (where modules were being
 loaded per cpu without checking if an instance was already 
 loaded/being-loaded).
 That issue seems to have been addressed in the recent past (e.g.
 https://lkml.org/lkml/2013/1/24/659 along with a couple of follow on 
 cleanups)  
 Is the above yet another variant of the original issue or perhaps some race
 condition that got exposed when there are lot more threads ?

Hmm ... not sure but yeah, that's the likely culprit.

P.


Re: [PATCH] kvm tools: fix boot of guests with more than 4gb of ram

2013-06-24 Thread Michael Ellerman
On Sun, 2013-06-23 at 21:23 -0400, Sasha Levin wrote:
 Commit kvm tools: virtio: remove hardcoded assumptions
 about guest page size has introduced a bug that prevented
 guests with more than 4gb of ram from booting.
 
 The issue is that 'pfn' is a 32bit integer, so when multiplying
 it by page size to get the actual page will cause an overflow if
 the pfn referred to a memory area above 4gb.

Couldn't we just make pfn 64 bit?

cheers



[PATCH 5/6 v5] KVM: PPC: Using struct debug_reg

2013-06-24 Thread Bharat Bhushan
For KVM also use the struct debug_reg defined in asm/processor.h

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/include/asm/kvm_host.h |   13 +
 arch/powerpc/kvm/booke.c|   34 --
 2 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index af326cd..838a577 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -381,17 +381,6 @@ struct kvmppc_slb {
 #define KVMPPC_EPR_USER	1 /* exit to userspace to fill EPR */
 #define KVMPPC_EPR_KERNEL  2 /* in-kernel irqchip */
 
-struct kvmppc_booke_debug_reg {
-   u32 dbcr0;
-   u32 dbcr1;
-   u32 dbcr2;
-#ifdef CONFIG_KVM_E500MC
-   u32 dbcr4;
-#endif
-   u64 iac[KVMPPC_BOOKE_MAX_IAC];
-   u64 dac[KVMPPC_BOOKE_MAX_DAC];
-};
-
 #define KVMPPC_IRQ_DEFAULT 0
 #define KVMPPC_IRQ_MPIC	1
 #define KVMPPC_IRQ_XICS	2
@@ -535,7 +524,7 @@ struct kvm_vcpu_arch {
u32 eptcfg;
u32 epr;
u32 crit_save;
-   struct kvmppc_booke_debug_reg dbg_reg;
+   struct debug_reg dbg_reg;
 #endif
gpa_t paddr_accessed;
gva_t vaddr_accessed;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 62d4ece..3e9fc1d 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1424,7 +1424,6 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, 
struct kvm_one_reg *reg)
int r = 0;
union kvmppc_one_reg val;
int size;
-   long int i;
 
	size = one_reg_size(reg->id);
	if (size > sizeof(val))
@@ -1432,16 +1431,24 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, 
struct kvm_one_reg *reg)
 
	switch (reg->id) {
	case KVM_REG_PPC_IAC1:
+   val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac1);
+   break;
	case KVM_REG_PPC_IAC2:
+   val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac2);
+   break;
+#if CONFIG_PPC_ADV_DEBUG_IACS > 2
	case KVM_REG_PPC_IAC3:
+   val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac3);
+   break;
	case KVM_REG_PPC_IAC4:
-   i = reg->id - KVM_REG_PPC_IAC1;
-   val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac[i]);
+   val = get_reg_val(reg->id, vcpu->arch.dbg_reg.iac4);
	break;
+#endif
	case KVM_REG_PPC_DAC1:
+   val = get_reg_val(reg->id, vcpu->arch.dbg_reg.dac1);
+   break;
	case KVM_REG_PPC_DAC2:
-   i = reg->id - KVM_REG_PPC_DAC1;
-   val = get_reg_val(reg->id, vcpu->arch.dbg_reg.dac[i]);
+   val = get_reg_val(reg->id, vcpu->arch.dbg_reg.dac2);
break;
case KVM_REG_PPC_EPR: {
u32 epr = get_guest_epr(vcpu);
@@ -1481,7 +1488,6 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, 
struct kvm_one_reg *reg)
int r = 0;
union kvmppc_one_reg val;
int size;
-   long int i;
 
	size = one_reg_size(reg->id);
	if (size > sizeof(val))
@@ -1492,16 +1498,24 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, 
struct kvm_one_reg *reg)
 
	switch (reg->id) {
	case KVM_REG_PPC_IAC1:
+   vcpu->arch.dbg_reg.iac1 = set_reg_val(reg->id, val);
+   break;
	case KVM_REG_PPC_IAC2:
+   vcpu->arch.dbg_reg.iac2 = set_reg_val(reg->id, val);
+   break;
+#if CONFIG_PPC_ADV_DEBUG_IACS > 2
	case KVM_REG_PPC_IAC3:
+   vcpu->arch.dbg_reg.iac3 = set_reg_val(reg->id, val);
+   break;
	case KVM_REG_PPC_IAC4:
-   i = reg->id - KVM_REG_PPC_IAC1;
-   vcpu->arch.dbg_reg.iac[i] = set_reg_val(reg->id, val);
+   vcpu->arch.dbg_reg.iac4 = set_reg_val(reg->id, val);
	break;
+#endif
	case KVM_REG_PPC_DAC1:
+   vcpu->arch.dbg_reg.dac1 = set_reg_val(reg->id, val);
+   break;
	case KVM_REG_PPC_DAC2:
-   i = reg->id - KVM_REG_PPC_DAC1;
-   vcpu->arch.dbg_reg.dac[i] = set_reg_val(reg->id, val);
+   vcpu->arch.dbg_reg.dac2 = set_reg_val(reg->id, val);
break;
case KVM_REG_PPC_EPR: {
u32 new_epr = set_reg_val(reg-id, val);
-- 
1.7.0.4


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/6 v5] KVM :PPC: Userspace Debug support

2013-06-24 Thread Bharat Bhushan
From: Bharat Bhushan bharat.bhus...@freescale.com

This patchset adds the userspace debug support for booke/bookehv.
this is tested on powerpc e500v2/e500mc devices.

We are now assuming that the debug resources will not be used by the kernel for its own
debugging. They will be used only for debugging user processes.
So the kernel debug load interface during context_to is used to load the debug
context for the selected process.

v4->v5
 - Some comments reworded and other cleanup (like change of function name etc)

v3->v4
 - 4 out of 7 patches of the initial patchset were applied.
   This patchset is on top of those 4 patches
 - KVM local struct kvmppc_booke_debug_reg is replaced by
   powerpc global struct debug_reg
 - use switch_booke_debug_regs() for debug register context switch.
 - Save DBSR before kernel pre-emption is enabled.
 - Some more cleanup

v2->v3
 - We are now assuming that the debug resources will not be used by
   the kernel for its own debugging.
   They will be used only for debugging kernel user processes.
   So the kernel debug load interface during context switch is
   used to load the debug context for the selected process.

v1->v2
 - Debug registers are saved/restored in vcpu_put/vcpu_get.
   Earlier the debug registers were saved/restored in guest entry/exit

Bharat Bhushan (6):
  powerpc: remove unnecessary line continuations
  powerpc: move debug registers in a structure
  powerpc: export debug register save function for KVM
  KVM: PPC: exit to user space on ehpriv instruction
  KVM: PPC: Using struct debug_reg
  KVM: PPC: Add userspace debug stub support

 arch/powerpc/include/asm/disassemble.h |4 +
 arch/powerpc/include/asm/kvm_host.h|   16 +--
 arch/powerpc/include/asm/processor.h   |   38 +++--
 arch/powerpc/include/asm/reg_booke.h   |8 +-
 arch/powerpc/include/asm/switch_to.h   |4 +
 arch/powerpc/include/uapi/asm/kvm.h|   22 ++-
 arch/powerpc/kernel/asm-offsets.c  |2 +-
 arch/powerpc/kernel/process.c  |   45 +++---
 arch/powerpc/kernel/ptrace.c   |  154 +-
 arch/powerpc/kernel/signal_32.c|6 +-
 arch/powerpc/kernel/traps.c|   35 ++--
 arch/powerpc/kvm/booke.c   |  267 
 arch/powerpc/kvm/booke.h   |5 +
 arch/powerpc/kvm/e500_emulate.c|   27 
 14 files changed, 449 insertions(+), 184 deletions(-)




[PATCH 2/6 v5] powerpc: move debug registers in a structure

2013-06-24 Thread Bharat Bhushan
This way we can use the same struct type with KVM, and it also
helps in using other debug-related functions.
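To illustrate the point of the refactoring, here is a minimal, self-contained sketch. The reduced field set and the `clear_debug_events()` helper are invented for illustration; only the idea from the patch is shown, namely that one shared `struct debug_reg` is embedded both in the thread state and in the KVM vcpu state, so helpers written once against `struct debug_reg` work on either:

```c
/* Stand-ins for the kernel types; fields reduced for illustration. */
struct debug_reg { unsigned long dbcr0, dbcr1, iac1, dac1; };

struct thread_struct { struct debug_reg debug;   /* host process state */ };
struct kvm_vcpu_arch { struct debug_reg dbg_reg; /* guest state */ };

/* Hypothetical helper: written once against struct debug_reg,
 * usable on both embeddings. */
static void clear_debug_events(struct debug_reg *d)
{
	d->dbcr0 = 0;
	d->dbcr1 = 0;
}
```

Before the patch the debug fields lived directly in `thread_struct`, so KVM had to keep its own duplicate structure (`kvmppc_booke_debug_reg`); after it, both sides can share code like the helper above.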

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/include/asm/processor.h |   38 +
 arch/powerpc/include/asm/reg_booke.h |8 +-
 arch/powerpc/kernel/asm-offsets.c|2 +-
 arch/powerpc/kernel/process.c|   42 +-
 arch/powerpc/kernel/ptrace.c |  154 +-
 arch/powerpc/kernel/signal_32.c  |6 +-
 arch/powerpc/kernel/traps.c  |   35 
 7 files changed, 146 insertions(+), 139 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index d7e67ca..5b8a7f1 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -147,22 +147,7 @@ typedef struct {
 #define TS_FPR(i) fpr[i][TS_FPROFFSET]
 #define TS_TRANS_FPR(i) transact_fpr[i][TS_FPROFFSET]
 
-struct thread_struct {
-   unsigned long   ksp;/* Kernel stack pointer */
-	unsigned long	ksp_limit;	/* if ksp <= ksp_limit stack overflow */
-
-#ifdef CONFIG_PPC64
-   unsigned long   ksp_vsid;
-#endif
-   struct pt_regs  *regs;  /* Pointer to saved register state */
-	mm_segment_t	fs;	/* for get_fs() validation */
-#ifdef CONFIG_BOOKE
-   /* BookE base exception scratch space; align on cacheline */
-	unsigned long	normsave[8] ____cacheline_aligned;
-#endif
-#ifdef CONFIG_PPC32
-   void*pgdir; /* root of page-table tree */
-#endif
+struct debug_reg {
 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
/*
 * The following help to manage the use of Debug Control Registers
@@ -199,6 +184,27 @@ struct thread_struct {
unsigned long   dvc2;
 #endif
 #endif
+};
+
+struct thread_struct {
+   unsigned long   ksp;/* Kernel stack pointer */
+	unsigned long	ksp_limit;	/* if ksp <= ksp_limit stack overflow */
+
+#ifdef CONFIG_PPC64
+   unsigned long   ksp_vsid;
+#endif
+   struct pt_regs  *regs;  /* Pointer to saved register state */
+	mm_segment_t	fs;	/* for get_fs() validation */
+#ifdef CONFIG_BOOKE
+   /* BookE base exception scratch space; align on cacheline */
+	unsigned long	normsave[8] ____cacheline_aligned;
+#endif
+#ifdef CONFIG_PPC32
+   void*pgdir; /* root of page-table tree */
+#endif
+   /* Debug Registers */
+   struct debug_reg debug;
+
/* FP and VSX 0-31 register set */
double  fpr[32][TS_FPRWIDTH];
struct {
diff --git a/arch/powerpc/include/asm/reg_booke.h 
b/arch/powerpc/include/asm/reg_booke.h
index b417de3..455dc89 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -381,7 +381,7 @@
 #define DBCR0_IA34T0x4000  /* Instr Addr 3-4 range Toggle */
 #define DBCR0_FT   0x0001  /* Freeze Timers on debug event */
 
-#define dbcr_iac_range(task)	((task)->thread.dbcr0)
+#define dbcr_iac_range(task)	((task)->thread.debug.dbcr0)
 #define DBCR_IAC12IDBCR0_IA12  /* Range Inclusive */
 #define DBCR_IAC12X(DBCR0_IA12 | DBCR0_IA12X)  /* Range Exclusive */
 #define DBCR_IAC12MODE (DBCR0_IA12 | DBCR0_IA12X)  /* IAC 1-2 Mode Bits */
@@ -395,7 +395,7 @@
 #define DBCR1_DAC1W0x2000  /* DAC1 Write Debug Event */
 #define DBCR1_DAC2W0x1000  /* DAC2 Write Debug Event */
 
-#define dbcr_dac(task)	((task)->thread.dbcr1)
+#define dbcr_dac(task)	((task)->thread.debug.dbcr1)
 #define DBCR_DAC1R DBCR1_DAC1R
 #define DBCR_DAC1W DBCR1_DAC1W
 #define DBCR_DAC2R DBCR1_DAC2R
@@ -441,7 +441,7 @@
 #define DBCR0_CRET 0x0020  /* Critical Return Debug Event */
 #define DBCR0_FT   0x0001  /* Freeze Timers on debug event */
 
-#define dbcr_dac(task)	((task)->thread.dbcr0)
+#define dbcr_dac(task)	((task)->thread.debug.dbcr0)
 #define DBCR_DAC1R DBCR0_DAC1R
 #define DBCR_DAC1W DBCR0_DAC1W
 #define DBCR_DAC2R DBCR0_DAC2R
@@ -475,7 +475,7 @@
 #define DBCR1_IAC34MX  0x00C0  /* Instr Addr 3-4 range eXclusive */
 #define DBCR1_IAC34AT  0x0001  /* Instr Addr 3-4 range Toggle */
 
-#define dbcr_iac_range(task)	((task)->thread.dbcr1)
+#define dbcr_iac_range(task)	((task)->thread.debug.dbcr1)
 #define DBCR_IAC12IDBCR1_IAC12M/* Range Inclusive */
 #define DBCR_IAC12XDBCR1_IAC12MX   /* Range Exclusive */
 #define DBCR_IAC12MODE DBCR1_IAC12MX   /* IAC 1-2 Mode Bits */
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index b51a97c..c241c60 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -106,7 +106,7 @@ int main(void)
 #else /* CONFIG_PPC64 */
DEFINE(PGDIR, offsetof(struct thread_struct, pgdir));
 #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
-   DEFINE(THREAD_DBCR0, offsetof(struct 

[PATCH 1/6 v5] powerpc: remove unnecessary line continuations

2013-06-24 Thread Bharat Bhushan
Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
v5:
 - no change

 arch/powerpc/kernel/process.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ceb4e7b..639a8de 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -325,7 +325,7 @@ static void set_debug_reg_defaults(struct thread_struct 
*thread)
/*
 * Force User/Supervisor bits to b11 (user-only MSR[PR]=1)
 */
-	thread->dbcr1 = DBCR1_IAC1US | DBCR1_IAC2US |	\
+	thread->dbcr1 = DBCR1_IAC1US | DBCR1_IAC2US |
DBCR1_IAC3US | DBCR1_IAC4US;
/*
 * Force Data Address Compare User/Supervisor bits to be User-only
-- 
1.7.0.4




[PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support

2013-06-24 Thread Bharat Bhushan
This patch adds debug stub support on booke/bookehv.
Now the QEMU debug stub can use hardware breakpoints, watchpoints and
software breakpoints to debug the guest.

This is how we save/restore debug register context when switching
between guest, userspace and kernel user-process:

When QEMU is running
 - thread->debug_reg == QEMU debug register context.
 - Kernel will handle switching the debug register on context switch.
 - no vcpu_load() called

QEMU makes ioctls (except RUN)
 - This will call vcpu_load()
 - should not change context.
 - Some ioctls can change vcpu debug register, context saved in
   vcpu->debug_regs

QEMU Makes RUN ioctl
 - Save thread->debug_reg on STACK
 - Store thread->debug_reg == vcpu->debug_reg
 - load thread->debug_reg
 - RUN VCPU ( So thread points to vcpu context )

Context switch happens When VCPU running
 - makes vcpu_load() should not load any context
 - kernel loads the vcpu context as thread->debug_regs points to vcpu context.

On heavyweight_exit
 - Load the context saved on stack in thread->debug_reg

Currently we do not support debug resource emulation for the guest.
On a debug exception we always exit to user space, whether or not
user space is expecting the debug exception. If it is an unexpected
exception (a breakpoint/watchpoint event not set by userspace), we
leave the action to user space. This is similar to the previous
behavior; the difference is that proper exit state is now available
to user space.
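The save/restore flow above can be sketched as ordinary C. This is a standalone illustration with stub types: `struct vcpu_arch`, `current_thread`, `hw_regs` and `run_vcpu()` are invented stand-ins for `kvm_vcpu_arch`, `current->thread`, the hardware debug registers and the RUN-ioctl path, and the real `switch_booke_debug_regs()` primes DBCR/IAC/DAC hardware state rather than copying a struct:

```c
/* Minimal stand-ins for the kernel structures (illustration only). */
struct debug_reg { unsigned long dbcr0, iac1, dac1; };
struct thread_struct { struct debug_reg debug; };
struct vcpu_arch { struct debug_reg shadow_dbg_reg; };

/* Stands in for current->thread (QEMU's context while it runs). */
static struct thread_struct current_thread;

/* Stand-in for switch_booke_debug_regs(): the real one primes the
 * hardware debug registers from the given thread state. */
static struct debug_reg hw_regs;
static void switch_booke_debug_regs(struct thread_struct *new_thread)
{
	hw_regs = new_thread->debug;
}

/* Sketch of the RUN-ioctl flow described in the bullets above. */
static void run_vcpu(struct vcpu_arch *arch)
{
	struct thread_struct thread;	/* save area on the stack */

	/* Switch to guest debug context, saving QEMU's on the stack. */
	thread.debug = arch->shadow_dbg_reg;
	switch_booke_debug_regs(&thread);
	thread.debug = current_thread.debug;
	current_thread.debug = arch->shadow_dbg_reg;

	/* ... __kvmppc_vcpu_run() would run the guest here ... */

	/* On heavyweight exit: restore the context saved on the stack. */
	switch_booke_debug_regs(&thread);
	current_thread.debug = thread.debug;
}
```

The key property is that while the vcpu runs, `current_thread.debug` points at the guest context, so an ordinary context switch reloads the right registers; the user-space context survives only in the stack copy.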

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/include/asm/kvm_host.h |3 +
 arch/powerpc/include/uapi/asm/kvm.h |1 +
 arch/powerpc/kvm/booke.c|  233 ---
 arch/powerpc/kvm/booke.h|5 +
 4 files changed, 224 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 838a577..aeb490d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -524,7 +524,10 @@ struct kvm_vcpu_arch {
u32 eptcfg;
u32 epr;
u32 crit_save;
+   /* guest debug registers*/
struct debug_reg dbg_reg;
+   /* hardware visible debug registers when in guest state */
+   struct debug_reg shadow_dbg_reg;
 #endif
gpa_t paddr_accessed;
gva_t vaddr_accessed;
diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
b/arch/powerpc/include/uapi/asm/kvm.h
index ded0607..f5077c2 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -27,6 +27,7 @@
 #define __KVM_HAVE_PPC_SMT
 #define __KVM_HAVE_IRQCHIP
 #define __KVM_HAVE_IRQ_LINE
+#define __KVM_HAVE_GUEST_DEBUG
 
 struct kvm_regs {
__u64 pc;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 3e9fc1d..8be3502 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -133,6 +133,29 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
 #endif
 }
 
+static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu)
+{
+	/* Synchronize guest's desire to get debug interrupts into shadow MSR */
+#ifndef CONFIG_KVM_BOOKE_HV
+	vcpu->arch.shadow_msr &= ~MSR_DE;
+	vcpu->arch.shadow_msr |= vcpu->arch.shared->msr & MSR_DE;
+#endif
+
+	/* Force enable debug interrupts when user space wants to debug */
+	if (vcpu->guest_debug) {
+#ifdef CONFIG_KVM_BOOKE_HV
+		/*
+		 * Since there is no shadow MSR, sync MSR_DE into the guest
+		 * visible MSR.
+		 */
+		vcpu->arch.shared->msr |= MSR_DE;
+#else
+		vcpu->arch.shadow_msr |= MSR_DE;
+		vcpu->arch.shared->msr &= ~MSR_DE;
+#endif
+	}
+}
+
 /*
  * Helper function for full MSR writes.  No need to call this if only
  * EE/CE/ME/DE/RI are changing.
@@ -150,6 +173,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
kvmppc_mmu_msr_notify(vcpu, old_msr);
kvmppc_vcpu_sync_spe(vcpu);
kvmppc_vcpu_sync_fpu(vcpu);
+   kvmppc_vcpu_sync_debug(vcpu);
 }
 
 static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
@@ -655,6 +679,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
 int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
int ret, s;
+   struct thread_struct thread;
 #ifdef CONFIG_PPC_FPU
unsigned int fpscr;
int fpexc_mode;
@@ -698,12 +723,21 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
 
kvmppc_load_guest_fp(vcpu);
 #endif
+	/* Switch to guest debug context */
+	thread.debug = vcpu->arch.shadow_dbg_reg;
+	switch_booke_debug_regs(&thread);
+	thread.debug = current->thread.debug;
+	current->thread.debug = vcpu->arch.shadow_dbg_reg;
 
ret = __kvmppc_vcpu_run(kvm_run, vcpu);
 
/* No need for kvm_guest_exit. It's done in handle_exit.
   We also get here with interrupts enabled. */
 
+   /* Switch back to user space debug context */
+   

[PATCH 3/6 v5] powerpc: export debug register save function for KVM

2013-06-24 Thread Bharat Bhushan
KVM needs this function when switching from the vcpu to a user-space
thread. A subsequent patch will use this function.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/include/asm/switch_to.h |4 
 arch/powerpc/kernel/process.c|3 ++-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index 200d763..50b357f 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -30,6 +30,10 @@ extern void enable_kernel_spe(void);
 extern void giveup_spe(struct task_struct *);
 extern void load_up_spe(struct task_struct *);
 
+#ifdef CONFIG_PPC_ADV_DEBUG_REGS
+extern void switch_booke_debug_regs(struct thread_struct *new_thread);
+#endif
+
 #ifndef CONFIG_SMP
 extern void discard_lazy_cpu_state(void);
 #else
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 01ff496..3375cb7 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -362,12 +362,13 @@ static void prime_debug_regs(struct thread_struct *thread)
  * debug registers, set the debug registers from the values
  * stored in the new thread.
  */
-static void switch_booke_debug_regs(struct thread_struct *new_thread)
+void switch_booke_debug_regs(struct thread_struct *new_thread)
 {
	if ((current->thread.debug.dbcr0 & DBCR0_IDM)
		|| (new_thread->debug.dbcr0 & DBCR0_IDM))
prime_debug_regs(new_thread);
 }
+EXPORT_SYMBOL(switch_booke_debug_regs);
 #else  /* !CONFIG_PPC_ADV_DEBUG_REGS */
 #ifndef CONFIG_HAVE_HW_BREAKPOINT
 static void set_debug_reg_defaults(struct thread_struct *thread)
-- 
1.7.0.4




Re: [PATCH 3/6 v5] powerpc: export debug register save function for KVM

2013-06-24 Thread Alexander Graf

On 24.06.2013, at 11:08, Bharat Bhushan wrote:

 KVM need this function when switching from vcpu to user-space
 thread. My subsequent patch will use this function.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
 arch/powerpc/include/asm/switch_to.h |4 
 arch/powerpc/kernel/process.c|3 ++-
 2 files changed, 6 insertions(+), 1 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/switch_to.h 
 b/arch/powerpc/include/asm/switch_to.h
 index 200d763..50b357f 100644
 --- a/arch/powerpc/include/asm/switch_to.h
 +++ b/arch/powerpc/include/asm/switch_to.h
 @@ -30,6 +30,10 @@ extern void enable_kernel_spe(void);
 extern void giveup_spe(struct task_struct *);
 extern void load_up_spe(struct task_struct *);
 
 +#ifdef CONFIG_PPC_ADV_DEBUG_REGS
 +extern void switch_booke_debug_regs(struct thread_struct *new_thread);
 +#endif
 +
 #ifndef CONFIG_SMP
 extern void discard_lazy_cpu_state(void);
 #else
 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
 index 01ff496..3375cb7 100644
 --- a/arch/powerpc/kernel/process.c
 +++ b/arch/powerpc/kernel/process.c
 @@ -362,12 +362,13 @@ static void prime_debug_regs(struct thread_struct 
 *thread)
  * debug registers, set the debug registers from the values
  * stored in the new thread.
  */
 -static void switch_booke_debug_regs(struct thread_struct *new_thread)
 +void switch_booke_debug_regs(struct thread_struct *new_thread)
 {
   if ((current->thread.debug.dbcr0 & DBCR0_IDM)
   || (new_thread->debug.dbcr0 & DBCR0_IDM))
   prime_debug_regs(new_thread);
 }
 +EXPORT_SYMBOL(switch_booke_debug_regs);

EXPORT_SYMBOL_GPL?


Alex

 #else /* !CONFIG_PPC_ADV_DEBUG_REGS */
 #ifndef CONFIG_HAVE_HW_BREAKPOINT
 static void set_debug_reg_defaults(struct thread_struct *thread)
 -- 
 1.7.0.4
 
 


RE: [PATCH 3/6 v5] powerpc: export debug register save function for KVM

2013-06-24 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, June 24, 2013 3:03 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Wood Scott-B07421;
 tiejun.c...@windriver.com; Bhushan Bharat-R65777
 Subject: Re: [PATCH 3/6 v5] powerpc: export debug register save function for 
 KVM
 
 
 On 24.06.2013, at 11:08, Bharat Bhushan wrote:
 
  KVM need this function when switching from vcpu to user-space thread.
  My subsequent patch will use this function.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  arch/powerpc/include/asm/switch_to.h |4 
  arch/powerpc/kernel/process.c|3 ++-
  2 files changed, 6 insertions(+), 1 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/switch_to.h
  b/arch/powerpc/include/asm/switch_to.h
  index 200d763..50b357f 100644
  --- a/arch/powerpc/include/asm/switch_to.h
  +++ b/arch/powerpc/include/asm/switch_to.h
  @@ -30,6 +30,10 @@ extern void enable_kernel_spe(void); extern void
  giveup_spe(struct task_struct *); extern void load_up_spe(struct
  task_struct *);
 
  +#ifdef CONFIG_PPC_ADV_DEBUG_REGS
  +extern void switch_booke_debug_regs(struct thread_struct
  +*new_thread); #endif
  +
  #ifndef CONFIG_SMP
  extern void discard_lazy_cpu_state(void); #else diff --git
  a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index
  01ff496..3375cb7 100644
  --- a/arch/powerpc/kernel/process.c
  +++ b/arch/powerpc/kernel/process.c
  @@ -362,12 +362,13 @@ static void prime_debug_regs(struct
  thread_struct *thread)
   * debug registers, set the debug registers from the values
   * stored in the new thread.
   */
  -static void switch_booke_debug_regs(struct thread_struct *new_thread)
  +void switch_booke_debug_regs(struct thread_struct *new_thread)
  {
   if ((current->thread.debug.dbcr0 & DBCR0_IDM)
   || (new_thread->debug.dbcr0 & DBCR0_IDM))
  prime_debug_regs(new_thread);
  }
  +EXPORT_SYMBOL(switch_booke_debug_regs);
 
 EXPORT_SYMBOL_GPL?

Oops, I missed this comment. Will correct in next version. 

-Bharat

 
 
 Alex
 
  #else   /* !CONFIG_PPC_ADV_DEBUG_REGS */
  #ifndef CONFIG_HAVE_HW_BREAKPOINT
  static void set_debug_reg_defaults(struct thread_struct *thread)
  --
  1.7.0.4
 
 




Re: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support

2013-06-24 Thread Alexander Graf

On 24.06.2013, at 11:08, Bharat Bhushan wrote:

 This patch adds the debug stub support on booke/bookehv.
 Now QEMU debug stub can use hw breakpoint, watchpoint and
 software breakpoint to debug guest.
 
 This is how we save/restore debug register context when switching
 between guest, userspace and kernel user-process:
 
 When QEMU is running
 - thread->debug_reg == QEMU debug register context.
 - Kernel will handle switching the debug register on context switch.
 - no vcpu_load() called
 
 QEMU makes ioctls (except RUN)
 - This will call vcpu_load()
 - should not change context.
 - Some ioctls can change vcpu debug register, context saved in
   vcpu->debug_regs
 
 QEMU Makes RUN ioctl
 - Save thread->debug_reg on STACK
 - Store thread->debug_reg == vcpu->debug_reg
 - load thread->debug_reg
 - RUN VCPU ( So thread points to vcpu context )
 
 Context switch happens When VCPU running
 - makes vcpu_load() should not load any context
 - kernel loads the vcpu context as thread->debug_regs points to vcpu context.
 
 On heavyweight_exit
 - Load the context saved on stack in thread->debug_reg
 
 Currently we do not support debug resource emulation to guest,
 On debug exception, always exit to user space irrespective of
 user space is expecting the debug exception or not. If this is
 unexpected exception (breakpoint/watchpoint event not set by
 userspace) then let us leave the action on user space. This
 is similar to what it was before, only thing is that now we
 have proper exit state available to user space.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
 arch/powerpc/include/asm/kvm_host.h |3 +
 arch/powerpc/include/uapi/asm/kvm.h |1 +
 arch/powerpc/kvm/booke.c|  233 ---
 arch/powerpc/kvm/booke.h|5 +
 4 files changed, 224 insertions(+), 18 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm_host.h 
 b/arch/powerpc/include/asm/kvm_host.h
 index 838a577..aeb490d 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -524,7 +524,10 @@ struct kvm_vcpu_arch {
   u32 eptcfg;
   u32 epr;
   u32 crit_save;
 + /* guest debug registers*/
   struct debug_reg dbg_reg;
 + /* hardware visible debug registers when in guest state */
 + struct debug_reg shadow_dbg_reg;
 #endif
   gpa_t paddr_accessed;
   gva_t vaddr_accessed;
 diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
 b/arch/powerpc/include/uapi/asm/kvm.h
 index ded0607..f5077c2 100644
 --- a/arch/powerpc/include/uapi/asm/kvm.h
 +++ b/arch/powerpc/include/uapi/asm/kvm.h
 @@ -27,6 +27,7 @@
 #define __KVM_HAVE_PPC_SMT
 #define __KVM_HAVE_IRQCHIP
 #define __KVM_HAVE_IRQ_LINE
 +#define __KVM_HAVE_GUEST_DEBUG
 
 struct kvm_regs {
   __u64 pc;
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index 3e9fc1d..8be3502 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -133,6 +133,29 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
 #endif
 }
 
 +static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu)
 +{
 +	/* Synchronize guest's desire to get debug interrupts into shadow MSR */
 +#ifndef CONFIG_KVM_BOOKE_HV
 +	vcpu->arch.shadow_msr &= ~MSR_DE;
 +	vcpu->arch.shadow_msr |= vcpu->arch.shared->msr & MSR_DE;
 +#endif
 +
 +	/* Force enable debug interrupts when user space wants to debug */
 +	if (vcpu->guest_debug) {
 +#ifdef CONFIG_KVM_BOOKE_HV
 +	/*
 +	 * Since there is no shadow MSR, sync MSR_DE into the guest
 +	 * visible MSR.
 +	 */
 +	vcpu->arch.shared->msr |= MSR_DE;
 +#else
 +	vcpu->arch.shadow_msr |= MSR_DE;
 +	vcpu->arch.shared->msr &= ~MSR_DE;
 +#endif
 +	}
 +}
 +
 /*
  * Helper function for full MSR writes.  No need to call this if only
  * EE/CE/ME/DE/RI are changing.
 @@ -150,6 +173,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
   kvmppc_mmu_msr_notify(vcpu, old_msr);
   kvmppc_vcpu_sync_spe(vcpu);
   kvmppc_vcpu_sync_fpu(vcpu);
 + kvmppc_vcpu_sync_debug(vcpu);
 }
 
 static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
 @@ -655,6 +679,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
 int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
   int ret, s;
 + struct thread_struct thread;
 #ifdef CONFIG_PPC_FPU
   unsigned int fpscr;
   int fpexc_mode;
 @@ -698,12 +723,21 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
 kvm_vcpu *vcpu)
 
   kvmppc_load_guest_fp(vcpu);
 #endif
 + /* Switch to guest debug context */
 +	thread.debug = vcpu->arch.shadow_dbg_reg;
 +	switch_booke_debug_regs(&thread);
 +	thread.debug = current->thread.debug;
 +	current->thread.debug = vcpu->arch.shadow_dbg_reg;
 
   ret = __kvmppc_vcpu_run(kvm_run, vcpu);
 
   /* No need for kvm_guest_exit. It's done in handle_exit.
  We also get here with interrupts enabled. */

RE: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support

2013-06-24 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, June 24, 2013 4:13 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Wood Scott-B07421;
 tiejun.c...@windriver.com; Bhushan Bharat-R65777
 Subject: Re: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support
 
 
 On 24.06.2013, at 11:08, Bharat Bhushan wrote:
 
  This patch adds the debug stub support on booke/bookehv.
  Now QEMU debug stub can use hw breakpoint, watchpoint and software
  breakpoint to debug guest.
 
  This is how we save/restore debug register context when switching
  between guest, userspace and kernel user-process:
 
  When QEMU is running
  - thread->debug_reg == QEMU debug register context.
  - Kernel will handle switching the debug register on context switch.
  - no vcpu_load() called
 
  QEMU makes ioctls (except RUN)
  - This will call vcpu_load()
  - should not change context.
  - Some ioctls can change vcpu debug register, context saved in
    vcpu->debug_regs
 
  QEMU Makes RUN ioctl
  - Save thread->debug_reg on STACK
  - Store thread->debug_reg == vcpu->debug_reg
  - load thread->debug_reg
  - RUN VCPU ( So thread points to vcpu context )
 
  Context switch happens When VCPU running
  - makes vcpu_load() should not load any context
  - kernel loads the vcpu context as thread->debug_regs points to vcpu context.
 
  On heavyweight_exit
  - Load the context saved on stack in thread->debug_reg
 
  Currently we do not support debug resource emulation to guest, On
  debug exception, always exit to user space irrespective of user space
  is expecting the debug exception or not. If this is unexpected
  exception (breakpoint/watchpoint event not set by
  userspace) then let us leave the action on user space. This is similar
  to what it was before, only thing is that now we have proper exit
  state available to user space.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  arch/powerpc/include/asm/kvm_host.h |3 +
  arch/powerpc/include/uapi/asm/kvm.h |1 +
  arch/powerpc/kvm/booke.c|  233 
  ---
  arch/powerpc/kvm/booke.h|5 +
  4 files changed, 224 insertions(+), 18 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/kvm_host.h
  b/arch/powerpc/include/asm/kvm_host.h
  index 838a577..aeb490d 100644
  --- a/arch/powerpc/include/asm/kvm_host.h
  +++ b/arch/powerpc/include/asm/kvm_host.h
  @@ -524,7 +524,10 @@ struct kvm_vcpu_arch {
  u32 eptcfg;
  u32 epr;
  u32 crit_save;
  +   /* guest debug registers*/
  struct debug_reg dbg_reg;
  +   /* hardware visible debug registers when in guest state */
  +   struct debug_reg shadow_dbg_reg;
  #endif
  gpa_t paddr_accessed;
  gva_t vaddr_accessed;
  diff --git a/arch/powerpc/include/uapi/asm/kvm.h
  b/arch/powerpc/include/uapi/asm/kvm.h
  index ded0607..f5077c2 100644
  --- a/arch/powerpc/include/uapi/asm/kvm.h
  +++ b/arch/powerpc/include/uapi/asm/kvm.h
  @@ -27,6 +27,7 @@
  #define __KVM_HAVE_PPC_SMT
  #define __KVM_HAVE_IRQCHIP
  #define __KVM_HAVE_IRQ_LINE
  +#define __KVM_HAVE_GUEST_DEBUG
 
  struct kvm_regs {
  __u64 pc;
  diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index
  3e9fc1d..8be3502 100644
  --- a/arch/powerpc/kvm/booke.c
  +++ b/arch/powerpc/kvm/booke.c
  @@ -133,6 +133,29 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu
  *vcpu) #endif }
 
   +static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu)
   +{
   +	/* Synchronize guest's desire to get debug interrupts into shadow MSR */
   +#ifndef CONFIG_KVM_BOOKE_HV
   +	vcpu->arch.shadow_msr &= ~MSR_DE;
   +	vcpu->arch.shadow_msr |= vcpu->arch.shared->msr & MSR_DE;
   +#endif
   +
   +	/* Force enable debug interrupts when user space wants to debug */
   +	if (vcpu->guest_debug) {
   +#ifdef CONFIG_KVM_BOOKE_HV
   +	/*
   +	 * Since there is no shadow MSR, sync MSR_DE into the guest
   +	 * visible MSR.
   +	 */
   +	vcpu->arch.shared->msr |= MSR_DE;
   +#else
   +	vcpu->arch.shadow_msr |= MSR_DE;
   +	vcpu->arch.shared->msr &= ~MSR_DE;
   +#endif
   +	}
   +}
  +
  /*
   * Helper function for full MSR writes.  No need to call this if
  only
   * EE/CE/ME/DE/RI are changing.
  @@ -150,6 +173,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
  kvmppc_mmu_msr_notify(vcpu, old_msr);
  kvmppc_vcpu_sync_spe(vcpu);
  kvmppc_vcpu_sync_fpu(vcpu);
  +   kvmppc_vcpu_sync_debug(vcpu);
  }
 
  static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu, @@
  -655,6 +679,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
  int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) {
  int ret, s;
  +   struct thread_struct thread;
  #ifdef CONFIG_PPC_FPU
  unsigned int fpscr;
  int fpexc_mode;
  @@ -698,12 +723,21 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run,
  struct kvm_vcpu *vcpu)
 
  kvmppc_load_guest_fp(vcpu);
  #endif
  +   /* Switch to 

Re: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support

2013-06-24 Thread Alexander Graf

On 24.06.2013, at 13:22, Bhushan Bharat-R65777 wrote:

 
 
 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Monday, June 24, 2013 4:13 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Wood Scott-B07421;
 tiejun.c...@windriver.com; Bhushan Bharat-R65777
 Subject: Re: [PATCH 6/6 v5] KVM: PPC: Add userspace debug stub support
 
 
 On 24.06.2013, at 11:08, Bharat Bhushan wrote:
 
 This patch adds the debug stub support on booke/bookehv.
 Now QEMU debug stub can use hw breakpoint, watchpoint and software
 breakpoint to debug guest.
 
 This is how we save/restore debug register context when switching
 between guest, userspace and kernel user-process:
 
  When QEMU is running
  - thread->debug_reg == QEMU debug register context.
  - Kernel will handle switching the debug register on context switch.
  - no vcpu_load() called
 
  QEMU makes ioctls (except RUN)
  - This will call vcpu_load()
  - should not change context.
  - Some ioctls can change vcpu debug register, context saved in
    vcpu->debug_regs
 
  QEMU Makes RUN ioctl
  - Save thread->debug_reg on STACK
  - Store thread->debug_reg == vcpu->debug_reg
  - load thread->debug_reg
  - RUN VCPU ( So thread points to vcpu context )
 
  Context switch happens When VCPU running
  - makes vcpu_load() should not load any context
  - kernel loads the vcpu context as thread->debug_regs points to vcpu context.
 
  On heavyweight_exit
  - Load the context saved on stack in thread->debug_reg
 
 Currently we do not support debug resource emulation to guest, On
 debug exception, always exit to user space irrespective of user space
 is expecting the debug exception or not. If this is unexpected
 exception (breakpoint/watchpoint event not set by
 userspace) then let us leave the action on user space. This is similar
 to what it was before, only thing is that now we have proper exit
 state available to user space.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
 arch/powerpc/include/asm/kvm_host.h |3 +
 arch/powerpc/include/uapi/asm/kvm.h |1 +
 arch/powerpc/kvm/booke.c|  233 
 ---
 arch/powerpc/kvm/booke.h|5 +
 4 files changed, 224 insertions(+), 18 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm_host.h
 b/arch/powerpc/include/asm/kvm_host.h
 index 838a577..aeb490d 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -524,7 +524,10 @@ struct kvm_vcpu_arch {
 u32 eptcfg;
 u32 epr;
 u32 crit_save;
 +   /* guest debug registers*/
 struct debug_reg dbg_reg;
 +   /* hardware visible debug registers when in guest state */
 +   struct debug_reg shadow_dbg_reg;
 #endif
 gpa_t paddr_accessed;
 gva_t vaddr_accessed;
 diff --git a/arch/powerpc/include/uapi/asm/kvm.h
 b/arch/powerpc/include/uapi/asm/kvm.h
 index ded0607..f5077c2 100644
 --- a/arch/powerpc/include/uapi/asm/kvm.h
 +++ b/arch/powerpc/include/uapi/asm/kvm.h
 @@ -27,6 +27,7 @@
 #define __KVM_HAVE_PPC_SMT
 #define __KVM_HAVE_IRQCHIP
 #define __KVM_HAVE_IRQ_LINE
 +#define __KVM_HAVE_GUEST_DEBUG
 
 struct kvm_regs {
 __u64 pc;
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index
 3e9fc1d..8be3502 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -133,6 +133,29 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu
 *vcpu) #endif }
 
  +static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu)
  +{
  +	/* Synchronize guest's desire to get debug interrupts into shadow MSR */
  +#ifndef CONFIG_KVM_BOOKE_HV
  +	vcpu->arch.shadow_msr &= ~MSR_DE;
  +	vcpu->arch.shadow_msr |= vcpu->arch.shared->msr & MSR_DE;
  +#endif
  +
  +	/* Force enable debug interrupts when user space wants to debug */
  +	if (vcpu->guest_debug) {
  +#ifdef CONFIG_KVM_BOOKE_HV
  +	/*
  +	 * Since there is no shadow MSR, sync MSR_DE into the guest
  +	 * visible MSR.
  +	 */
  +	vcpu->arch.shared->msr |= MSR_DE;
  +#else
  +	vcpu->arch.shadow_msr |= MSR_DE;
  +	vcpu->arch.shared->msr &= ~MSR_DE;
  +#endif
  +	}
  +}
 +
 /*
 * Helper function for full MSR writes.  No need to call this if
 only
 * EE/CE/ME/DE/RI are changing.
 @@ -150,6 +173,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
 kvmppc_mmu_msr_notify(vcpu, old_msr);
 kvmppc_vcpu_sync_spe(vcpu);
 kvmppc_vcpu_sync_fpu(vcpu);
 +   kvmppc_vcpu_sync_debug(vcpu);
 }
 
 static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu, @@
 -655,6 +679,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
 int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) {
 int ret, s;
 +   struct thread_struct thread;
 #ifdef CONFIG_PPC_FPU
 unsigned int fpscr;
 int fpexc_mode;
 @@ -698,12 +723,21 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run,
 struct kvm_vcpu *vcpu)
 
 kvmppc_load_guest_fp(vcpu);
 #endif
 +   /* Switch to guest debug context */
 +   thread.debug =