Re: [PATCH RFC V9 0/19] Paravirtualized ticket spinlocks

2013-06-03 Thread Raghavendra K T

On 06/03/2013 07:10 AM, Raghavendra K T wrote:

On 06/02/2013 09:50 PM, Jiannan Ouyang wrote:

On Sun, Jun 2, 2013 at 1:07 AM, Gleb Natapov g...@redhat.com wrote:


High level question here. We have high hopes that the Preemptable Ticket
Spinlock patch series by Jiannan Ouyang will solve most, if not all, of
the ticket spinlock problems in overcommit scenarios without the need
for PV. So how does this patch series compare with his patches on
PLE-enabled processors?



No experimental results yet.

An error is reported on a 20-core VM. I'm in the middle of an
internship relocation, and will start work on it next week.


Preemptable spinlocks' testing update:
I hit the same softlockup problem that Andrew had reported, while
testing on a 32-core machine with 32 guest vcpus.

After that I started tuning TIMEOUT_UNIT, and when I went up to (1<<8),
things seemed to be manageable for undercommit cases.
But I still see degradation for undercommit w.r.t. the baseline itself
on the 32-core machine (after tuning): 37.5% degradation w.r.t. the
baseline. I can give the full report after all the tests complete.

For overcommit cases, I again started hitting softlockups (and the
degradation is worse). But as I said in the preemptable thread, the
concept of preemptable locks looks promising (though I am still not a
fan of the embedded TIMEOUT mechanism).

Here is my opinion on TODOs for making preemptable locks better (I
think I need to paste this in the preemptable thread too):

1. The current TIMEOUT_UNIT seems to be on the higher side, and it also
does not scale well with large guests or with overcommit. We need some
sort of adaptive mechanism, and better still would be different
TIMEOUT_UNITs for different types of lock. The hashing mechanism that
was used in Rik's spinlock backoff series probably fits better.

2. I do not think TIMEOUT_UNIT by itself would work well when we have a
big queue of waiters for a lock (large guests / overcommit).
One way out is to add a PV hook that issues a yield hypercall
immediately for waiters beyond some THRESHOLD, so that they do not burn
the CPU. (I can do a PoC to check whether that idea improves the
situation at some later point in time; a rough sketch follows.)
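
A minimal sketch of the TODO #2 idea, assuming a PV yield hook exists.
QUEUE_THRESHOLD and kvm_pv_yield() are hypothetical names, not part of
this series:

#define QUEUE_THRESHOLD	4	/* waiters beyond this yield right away */

static void ticket_wait(arch_spinlock_t *lock, __ticket_t my_ticket)
{
	for (;;) {
		__ticket_t head = ACCESS_ONCE(lock->tickets.head);

		if (head == my_ticket)
			return;			/* our turn */
		if ((__ticket_t)(my_ticket - head) > QUEUE_THRESHOLD)
			kvm_pv_yield();		/* deep in queue: don't burn CPU */
		else
			cpu_relax();		/* near the front: keep spinning */
	}
}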



Preemptable-lock results from my run with 2^8 TIMEOUT:

+----+------------+-----------+------------+-----------+--------------+
               ebizzy (records/sec) higher is better
+----+------------+-----------+------------+-----------+--------------+
            base       stdev      patched       stdev   %improvement
+----+------------+-----------+------------+-----------+--------------+
1x     5574.9000    237.4997    3484.2000    113.4449      -37.50202
2x     2741.5000    561.3090     351.5000    140.5420      -87.17855
3x     2146.2500    216.7718     194.8333     85.0303      -90.92215
4x     1663.0000    141.9235     101.0000     57.7853      -93.92664
+----+------------+-----------+------------+-----------+--------------+
+----+------------+-----------+------------+-----------+--------------+
                dbench (Throughput) higher is better
+----+------------+-----------+------------+-----------+--------------+
            base       stdev      patched       stdev   %improvement
+----+------------+-----------+------------+-----------+--------------+
1x    14111.5600    754.4525    3930.1602   2547.2369      -72.14936
2x     2481.6270     71.2665     181.1816     89.5368      -92.69908
3x     1510.2483     31.8634     104.7243     53.2470      -93.06576
4x     1029.4875     16.9166      72.3738     38.2432      -92.96992
+----+------------+-----------+------------+-----------+--------------+

Note: we cannot trust the overcommit results because of the soft lockups.



Re: [SeaBIOS] KVM call agenda for 2013-05-28

2013-06-03 Thread Paolo Bonzini
On 02/06/2013 17:05, Gleb Natapov wrote:
 Anthony requested that patches be made that generate the ACPI tables
 in QEMU for the upcoming hotplug work, so that they could be evaluated
 to see if they truly do need to live in QEMU or if the code could live
 in the firmware.  There were no objections.

 I volunteered to implement this.

 Why should hotplug generate ACPI code? It does not do so on real HW.

Hotplug can do a LoadTable and merge it into the existing ones.  But
then you do not need QEMU-time generation of tables to do the same thing
for cold-plug.

Paolo


Re: [PATCH] KVM: fix sil/dil/bpl/spl in the mod/rm fields

2013-06-03 Thread Paolo Bonzini
On 02/06/2013 20:12, Gleb Natapov wrote:
 On Thu, May 30, 2013 at 04:35:55PM +0200, Paolo Bonzini wrote:
 The x86-64 extended low-byte registers were fetched correctly from reg,
 but not from mod/rm.

 This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
 not enough.

 Did I miss the unit test patch? :)

I wanted to ask the GSoC student to do it.  If it doesn't come in a
couple of weeks, I'll send it.

Paolo

 Cc: gnata...@redhat.com
 Cc: kvm@vger.kernel.org
 Cc: sta...@vger.kernel.org # 3.9
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  arch/x86/kvm/emulate.c | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)

 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index aa68106..028b34f 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -1239,9 +1239,12 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
  	ctxt->modrm_seg = VCPU_SREG_DS;
  
  	if (ctxt->modrm_mod == 3) {
 +		int highbyte_regs = ctxt->rex_prefix == 0;
 +
  		op->type = OP_REG;
  		op->bytes = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
 -		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm, ctxt->d & ByteOp);
 +		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm,
 +					       highbyte_regs && (ctxt->d & ByteOp));
  		if (ctxt->d & Sse) {
  			op->type = OP_XMM;
  			op->bytes = 16;
 -- 
 1.8.1.4
 
 --
   Gleb.
 



Re: [PATCH kvm-unit-tests] pmu: fixes for Sandy Bridge hosts

2013-06-03 Thread Paolo Bonzini
On 02/06/2013 17:32, Gleb Natapov wrote:
 On Thu, May 30, 2013 at 07:43:07PM +0200, Paolo Bonzini wrote:
 This patch includes two fixes for SB:

 * the 3rd fixed counter (ref cpu cycles) can sometimes report
   less than the number of iterations

 Is it documented? It is strange for an architectural counter to behave
 differently on different architectures.

It just counts the CPU cycles.  If the CPU can optimize the loop better,
it will take fewer CPU cycles to execute it.

Paolo


Re: [PATCH kvm-unit-tests v2] pmu: fixes for Sandy Bridge hosts

2013-06-03 Thread Paolo Bonzini
On 02/06/2013 17:33, Gleb Natapov wrote:
 On Thu, May 30, 2013 at 07:47:18PM +0200, Paolo Bonzini wrote:
 This patch includes two fixes for SB:

 * the 3rd fixed counter (ref cpu cycles) can sometimes report
   less than the number of iterations

 * there is an 8th counter which causes out of bounds accesses
   to gp_event or check_counters_many's cnt array

 There is still a bug in KVM, because the pmu all counters-0
 test fails.  (It passes if you use any 6 of the 8 gp counters,
 fails if you use 7 or 8).

 Changelog?

Changelog is simply that the patch applies. :(

v1 was hand-edited and I did not regenerate it after testing the edit:

-@@ -395,6 +396,14 @@ int main(int ac, char **av)
+@@ -395,6 +396,10 @@ int main(int ac, char **av)

Paolo

 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  x86/pmu.c | 21 +++++++++++++++------
  1 file changed, 15 insertions(+), 6 deletions(-)

 diff --git a/x86/pmu.c b/x86/pmu.c
 index 2c46f31..dca753a 100644
 --- a/x86/pmu.c
 +++ b/x86/pmu.c
 @@ -88,9 +88,10 @@ struct pmu_event {
  }, fixed_events[] = {
  	{"fixed 1", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
  	{"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 30*N},
 -	{"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 1*N, 30*N}
 +	{"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N}
  };
  
 +static int num_counters;
  static int tests, failures;
  
  char *buf;
 @@ -237,7 +238,7 @@ static void check_gp_counter(struct pmu_event *evt)
  	};
  	int i;
  
 -	for (i = 0; i < eax.split.num_counters; i++, cnt.ctr++) {
 +	for (i = 0; i < num_counters; i++, cnt.ctr++) {
  		cnt.count = 0;
  		measure(&cnt, 1);
  		report(evt->name, i, verify_event(cnt.count, evt));
 @@ -276,7 +277,7 @@ static void check_counters_many(void)
  	pmu_counter_t cnt[10];
  	int i, n;
  
 -	for (i = 0, n = 0; n < eax.split.num_counters; i++) {
 +	for (i = 0, n = 0; n < num_counters; i++) {
  		if (ebx.full & (1 << i))
  			continue;
  
 @@ -316,10 +317,10 @@ static void check_counter_overflow(void)
  	/* clear status before test */
  	wrmsr(MSR_CORE_PERF_GLOBAL_OVF_CTRL,
  	      rdmsr(MSR_CORE_PERF_GLOBAL_STATUS));
  
 -	for (i = 0; i < eax.split.num_counters + 1; i++, cnt.ctr++) {
 +	for (i = 0; i < num_counters + 1; i++, cnt.ctr++) {
  		uint64_t status;
  		int idx;
 -		if (i == eax.split.num_counters)
 +		if (i == num_counters)
  			cnt.ctr = fixed_events[0].unit_sel;
  		if (i % 2)
  			cnt.config |= EVNTSEL_INT;
 @@ -355,7 +356,7 @@ static void check_rdpmc(void)
  	uint64_t val = 0x1f3456789ull;
  	int i;
  
 -	for (i = 0; i < eax.split.num_counters; i++) {
 +	for (i = 0; i < num_counters; i++) {
  		uint64_t x = (val & 0xffffffff) |
  			((1ull << (eax.split.bit_width - 32)) - 1) << 32;
  		wrmsr(MSR_IA32_PERFCTR0 + i, val);
 @@ -395,6 +396,10 @@ int main(int ac, char **av)
  	printf("Fixed counters:      %d\n", edx.split.num_counters_fixed);
  	printf("Fixed counter width: %d\n", edx.split.bit_width_fixed);
  
 +	num_counters = eax.split.num_counters;
 +	if (num_counters > ARRAY_SIZE(gp_events))
 +		num_counters = ARRAY_SIZE(gp_events);
 +
  	apic_write(APIC_LVTPC, PC_VECTOR);
  
  	check_gp_counters();
 -- 
 1.8.2.1

 
 --
   Gleb.
 



Re: [PATCH kvm-unit-tests] pmu: fixes for Sandy Bridge hosts

2013-06-03 Thread Gleb Natapov
On Mon, Jun 03, 2013 at 08:33:13AM +0200, Paolo Bonzini wrote:
 On 02/06/2013 17:32, Gleb Natapov wrote:
  On Thu, May 30, 2013 at 07:43:07PM +0200, Paolo Bonzini wrote:
  This patch includes two fixes for SB:
 
  * the 3rd fixed counter (ref cpu cycles) can sometimes report
less than the number of iterations
 
 Is it documented? It is strange for an architectural counter to behave
 differently on different architectures.
 
 It just counts the CPU cycles.  If the CPU can optimize the loop better,
 it will take fewer CPU cycles to execute it.
 
We should try to change the loop so that it will not be so easily
optimized. Making the test succeed if only 10% of the cycles were spent
on the loop may result in the test missing the case when the counter
counts something different.
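
One way to do that (a sketch only, not tested against the test's
expected bounds): chain the iterations through a serial data dependency
and keep the result live, so neither the compiler nor the CPU can
collapse the loop:

static volatile unsigned long sink;

static void measured_loop(unsigned long n)
{
	unsigned long i, x = 1;

	for (i = 0; i < n; i++)
		x = x * 29 + i;		/* each iteration depends on the last */
	sink = x;			/* keep the work from being dead code */
}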

--
Gleb.


Re: [PATCH kvm-unit-tests v2] pmu: fixes for Sandy Bridge hosts

2013-06-03 Thread Gleb Natapov
On Mon, Jun 03, 2013 at 08:35:21AM +0200, Paolo Bonzini wrote:
 On 02/06/2013 17:33, Gleb Natapov wrote:
  On Thu, May 30, 2013 at 07:47:18PM +0200, Paolo Bonzini wrote:
  This patch includes two fixes for SB:
 
  * the 3rd fixed counter (ref cpu cycles) can sometimes report
less than the number of iterations
 
  * there is an 8th counter which causes out of bounds accesses
to gp_event or check_counters_many's cnt array
 
  There is still a bug in KVM, because the pmu all counters-0
  test fails.  (It passes if you use any 6 of the 8 gp counters,
  fails if you use 7 or 8).
 
  Changelog?
 
 Changelog is simply that the patch applies. :(
 
Just say so; I tried to see how they differ, and failed.

 v1 was hand-edited and I did not regenerate it after testing the edit:
 
 -@@ -395,6 +396,14 @@ int main(int ac, char **av)
 +@@ -395,6 +396,10 @@ int main(int ac, char **av)
 
Yeah, hard to spot :)

 Paolo
 
  Signed-off-by: Paolo Bonzini pbonz...@redhat.com
  ---
   x86/pmu.c | 21 +++++++++++++++------
   1 file changed, 15 insertions(+), 6 deletions(-)
 
  diff --git a/x86/pmu.c b/x86/pmu.c
  index 2c46f31..dca753a 100644
  --- a/x86/pmu.c
  +++ b/x86/pmu.c
  @@ -88,9 +88,10 @@ struct pmu_event {
   }, fixed_events[] = {
   	{"fixed 1", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
   	{"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 30*N},
  -	{"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 1*N, 30*N}
  +	{"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N}
   };
   
  +static int num_counters;
   static int tests, failures;
   
   char *buf;
  @@ -237,7 +238,7 @@ static void check_gp_counter(struct pmu_event *evt)
   	};
   	int i;
   
  -	for (i = 0; i < eax.split.num_counters; i++, cnt.ctr++) {
  +	for (i = 0; i < num_counters; i++, cnt.ctr++) {
   		cnt.count = 0;
   		measure(&cnt, 1);
   		report(evt->name, i, verify_event(cnt.count, evt));
  @@ -276,7 +277,7 @@ static void check_counters_many(void)
   	pmu_counter_t cnt[10];
   	int i, n;
   
  -	for (i = 0, n = 0; n < eax.split.num_counters; i++) {
  +	for (i = 0, n = 0; n < num_counters; i++) {
   		if (ebx.full & (1 << i))
   			continue;
   
  @@ -316,10 +317,10 @@ static void check_counter_overflow(void)
   	/* clear status before test */
   	wrmsr(MSR_CORE_PERF_GLOBAL_OVF_CTRL,
   	      rdmsr(MSR_CORE_PERF_GLOBAL_STATUS));
   
  -	for (i = 0; i < eax.split.num_counters + 1; i++, cnt.ctr++) {
  +	for (i = 0; i < num_counters + 1; i++, cnt.ctr++) {
   		uint64_t status;
   		int idx;
  -		if (i == eax.split.num_counters)
  +		if (i == num_counters)
   			cnt.ctr = fixed_events[0].unit_sel;
   		if (i % 2)
   			cnt.config |= EVNTSEL_INT;
  @@ -355,7 +356,7 @@ static void check_rdpmc(void)
   	uint64_t val = 0x1f3456789ull;
   	int i;
   
  -	for (i = 0; i < eax.split.num_counters; i++) {
  +	for (i = 0; i < num_counters; i++) {
   		uint64_t x = (val & 0xffffffff) |
   			((1ull << (eax.split.bit_width - 32)) - 1) << 32;
   		wrmsr(MSR_IA32_PERFCTR0 + i, val);
  @@ -395,6 +396,10 @@ int main(int ac, char **av)
   	printf("Fixed counters:      %d\n", edx.split.num_counters_fixed);
   	printf("Fixed counter width: %d\n", edx.split.bit_width_fixed);
   
  +	num_counters = eax.split.num_counters;
  +	if (num_counters > ARRAY_SIZE(gp_events))
  +		num_counters = ARRAY_SIZE(gp_events);
  +
   	apic_write(APIC_LVTPC, PC_VECTOR);
   
   	check_gp_counters();
  -- 
  1.8.2.1
 
  
  --
  Gleb.
  

--
Gleb.


Re: [PATCH uq/master] fix double free the memslot in kvm_set_phys_mem

2013-06-03 Thread Gleb Natapov
On Fri, May 31, 2013 at 04:52:18PM +0800, Xiao Guangrong wrote:
 Luiz Capitulino reported that a guest refused to boot and qemu
 complained with:
 kvm_set_phys_mem: error unregistering overlapping slot: Invalid argument
 
 It is caused by commit 235e8982ad, which did a double free of the
 memslot, so that the second one raises the -EINVAL error.
 
 Fix it by resetting the memory size only when it is needed.
 
 Reported-by: Luiz Capitulino lcapitul...@redhat.com
 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
Thanks, applied.

 ---
  kvm-all.c |3 ++-
  1 files changed, 2 insertions(+), 1 deletions(-)
 
 diff --git a/kvm-all.c b/kvm-all.c
 index 8e7bbf8..405480e 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -206,7 +206,8 @@ static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
      if (s->migration_log) {
          mem.flags |= KVM_MEM_LOG_DIRTY_PAGES;
      }
 -    if (mem.flags & KVM_MEM_READONLY) {
 +
 +    if (slot->memory_size && mem.flags & KVM_MEM_READONLY) {
          /* Set the slot size to 0 before setting the slot to the desired
           * value. This is needed based on KVM commit 75d61fbc. */
          mem.memory_size = 0;
 -- 
 1.7.7.6

--
Gleb.


Re: [PATCH kvm-unit-tests] pmu: fixes for Sandy Bridge hosts

2013-06-03 Thread Paolo Bonzini
On 03/06/2013 08:38, Gleb Natapov wrote:
 On Mon, Jun 03, 2013 at 08:33:13AM +0200, Paolo Bonzini wrote:
 On 02/06/2013 17:32, Gleb Natapov wrote:
 On Thu, May 30, 2013 at 07:43:07PM +0200, Paolo Bonzini wrote:
 This patch includes two fixes for SB:

 * the 3rd fixed counter (ref cpu cycles) can sometimes report
   less than the number of iterations

  Is it documented? It is strange for an architectural counter to behave
  differently on different architectures.
 
 It just counts the CPU cycles.  If the CPU can optimize the loop better,
 it will take fewer CPU cycles to execute it.

 We should try to change the loop so that it will not be so easily
 optimized. Making the test succeed if only 10% of the cycles were spent
 on the loop may result in the test missing the case when the counter
 counts something different.

Any hard-to-optimize loop risks becoming wrong on the other side (e.g.
if something stalls the pipeline, a newer chip with a longer pipeline
will use more CPU cycles).

Turbo boost could also contribute to lowering the number of cycles; a
boosted processor has ref cpu cycles that are _longer_ than the regular
cycles (thus they count in smaller numbers).  Maybe that's why core
cycles didn't go below N.

The real result was something like 0.8*N (780-83).  I used 0.1*N
because it is used for the ref cpu cycles gp counter, which is not the
same but similar.  Should I change it to 0.5*N or so?

Paolo


Re: KVM call agenda for 2013-05-28

2013-06-03 Thread Jordan Justen
On Sun, Jun 2, 2013 at 2:43 AM, Michael S. Tsirkin m...@redhat.com wrote:
 On Fri, May 31, 2013 at 01:45:55PM +0200, Laszlo Ersek wrote:
 On 05/31/13 09:09, Jordan Justen wrote:

  Why is updating the ACPI tables in seabios viewed as such a burden?
  Either qemu does it, or seabios... (And, OVMF too, but I don't think
  you guys are concerned with that. :)

 I am :)

  On the flip side, why is moving the ACPI tables to QEMU such an issue?
  It seems like Xen and virtualbox both already do this. Why is running
  iasl not an issue for them?

 I think something was mentioned about iasl having problems on BE
 machines? I could easily be wrong, but I *guess* qemu's hosts x targets
 (emulate what on what) set is a proper superset of xen's and
 virtualbox's. Presumably if you want to run an x86 guest on a MIPS host,
 and also want to build qemu on the same MIPS (or SPARC) host, you'd have
 to run iasl there too.

 You guys should take a look at the patch series I posted.

 That's solved there by keeping the iasl output in the qemu git tree.
 configure checks for a working iasl and enables/disables
 use of this pre-processed output accordingly.
 Everyone developing ASL code would still need a working iasl,
 but that's already the case today.

I'm sorry that I haven't had time to review your series yet. But from
what you're saying about it in this thread, it sounds like a good plan.

-Jordan


Re: [PATCH kvm-unit-tests] pmu: fixes for Sandy Bridge hosts

2013-06-03 Thread Gleb Natapov
On Mon, Jun 03, 2013 at 09:08:46AM +0200, Paolo Bonzini wrote:
 On 03/06/2013 08:38, Gleb Natapov wrote:
  On Mon, Jun 03, 2013 at 08:33:13AM +0200, Paolo Bonzini wrote:
  On 02/06/2013 17:32, Gleb Natapov wrote:
  On Thu, May 30, 2013 at 07:43:07PM +0200, Paolo Bonzini wrote:
  This patch includes two fixes for SB:
 
  * the 3rd fixed counter (ref cpu cycles) can sometimes report
less than the number of iterations
 
  Is it documented? It is strange for an architectural counter to behave
  differently on different architectures.
 
  It just counts the CPU cycles.  If the CPU can optimize the loop better,
  it will take fewer CPU cycles to execute it.
 
  We should try to change the loop so that it will not be so easily
  optimized. Making the test succeed if only 10% of the cycles were
  spent on the loop may result in the test missing the case when the
  counter counts something different.
 
 Any hard-to-optimize loop risks becoming wrong on the other side (e.g.
 if something stalls the pipeline, a newer chip with longer pipeline will
 use more CPU cycles).
 
 Turbo boost could also contribute to lowering the number of cycles; a
 boosted processor has ref cpu cycles that are _longer_ than the regular
 cycles (thus they count in smaller numbers).  Maybe that's why core
 cycles didn't go below N.
 
Core cycles are subject to Turbo boost changes, not ref cycles. Since
instructions are executed at the core frequency, the ref cpu cycles
count may indeed be smaller.
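
A back-of-envelope illustration with invented numbers: with a 2.6 GHz
TSC/reference clock and a core boosted to 3.2 GHz, work that takes
1,000,000 core cycles spans only ~812,500 ref cycles, so the ref
counter can legitimately read below N:

static double ref_cycles(double core_cycles, double ref_ghz, double core_ghz)
{
	/* invented numbers: 1e6 * 2.6 / 3.2 = 812500 */
	return core_cycles * (ref_ghz / core_ghz);
}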

 The real result was something like 0.8*N (780-83).  I used 0.1*N
 because it is used for the ref cpu cycles gp counter, which is not the
 same but similar.  Should I change it to 0.5*N or so?
 
For cpus with constant_tsc they should be the same. OK, let's make gp
and fixed use the same boundaries.

--
Gleb.


Re: [PATCH kvm-unit-tests] pmu: fixes for Sandy Bridge hosts

2013-06-03 Thread Paolo Bonzini
On 03/06/2013 09:38, Gleb Natapov wrote:
  Turbo boost could also contribute to lowering the number of cycles; a
  boosted processor has ref cpu cycles that are _longer_ than the regular
  cycles (thus they count in smaller numbers).  Maybe that's why core
  cycles didn't go below N.
  
  Core cycles are subject to Turbo boost changes, not ref cycles. Since
  instructions are executed at the core frequency, the ref cpu cycles
  count may indeed be smaller.

Yes, that's what I was trying to say. :)

Paolo



Re: [PATCH] KVM: s390: Add devname:kvm alias.

2013-06-03 Thread Paolo Bonzini
On 28/05/2013 12:44, Paolo Bonzini wrote:
 On 27/05/2013 18:42, Cornelia Huck wrote:
 Providing a devname:kvm module alias enables automatic loading of
 the kvm module when /dev/kvm is opened.

 Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
 ---
  arch/s390/kvm/kvm-s390.c | 9 +++++++++
  1 file changed, 9 insertions(+)

 diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
 index 93444c4..3b597e5 100644
 --- a/arch/s390/kvm/kvm-s390.c
 +++ b/arch/s390/kvm/kvm-s390.c
 @@ -1138,3 +1138,12 @@ static void __exit kvm_s390_exit(void)
  
  module_init(kvm_s390_init);
  module_exit(kvm_s390_exit);
 +
 +/*
 + * Enable autoloading of the kvm module.
 + * Note that we add the module alias here instead of virt/kvm/kvm_main.c
 + * since x86 takes a different approach.
 + */
 +#include <linux/miscdevice.h>
 +MODULE_ALIAS_MISCDEV(KVM_MINOR);
 +MODULE_ALIAS("devname:kvm");

 
 Applied, thanks.

After discussion with Gleb, we have decided to postpone this patch to
3.11.  Thanks for your understanding. :)

I'll push pending patches to master and queue soon.

Paolo



Re: [PATCH] KVM: fix sil/dil/bpl/spl in the mod/rm fields

2013-06-03 Thread Gleb Natapov
On Mon, Jun 03, 2013 at 08:27:57AM +0200, Paolo Bonzini wrote:
 On 02/06/2013 20:12, Gleb Natapov wrote:
  On Thu, May 30, 2013 at 04:35:55PM +0200, Paolo Bonzini wrote:
  The x86-64 extended low-byte registers were fetched correctly from reg,
  but not from mod/rm.
 
  This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
  not enough.
 
  Did I miss the unit test patch? :)
 
 I wanted to ask the GSoC student to do it.  If it doesn't come in a
 couple of weeks, I'll send it.
 
Which instruction did you see the bug happening with? Is this a 3.10 regression?

 Paolo
 
  Cc: gnata...@redhat.com
Please use my other email :)

  Cc: kvm@vger.kernel.org
  Cc: sta...@vger.kernel.org # 3.9
  Signed-off-by: Paolo Bonzini pbonz...@redhat.com
  ---
   arch/x86/kvm/emulate.c | 5 ++++-
   1 file changed, 4 insertions(+), 1 deletion(-)
 
  diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
  index aa68106..028b34f 100644
  --- a/arch/x86/kvm/emulate.c
  +++ b/arch/x86/kvm/emulate.c
  @@ -1239,9 +1239,12 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
   	ctxt->modrm_seg = VCPU_SREG_DS;
   
   	if (ctxt->modrm_mod == 3) {
  +		int highbyte_regs = ctxt->rex_prefix == 0;
  +
   		op->type = OP_REG;
   		op->bytes = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
  -		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm, ctxt->d & ByteOp);
  +		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm,
  +					       highbyte_regs && (ctxt->d & ByteOp));
   		if (ctxt->d & Sse) {
   			op->type = OP_XMM;
   			op->bytes = 16;
  -- 
  1.8.1.4
  
  --
  Gleb.
  

--
Gleb.


Re: [PATCH] KVM: fix sil/dil/bpl/spl in the mod/rm fields

2013-06-03 Thread Paolo Bonzini
On 03/06/2013 10:04, Gleb Natapov wrote:
 On Mon, Jun 03, 2013 at 08:27:57AM +0200, Paolo Bonzini wrote:
 On 02/06/2013 20:12, Gleb Natapov wrote:
 On Thu, May 30, 2013 at 04:35:55PM +0200, Paolo Bonzini wrote:
 The x86-64 extended low-byte registers were fetched correctly from reg,
 but not from mod/rm.

 This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
 not enough.

 Did I miss the unit test patch? :)

 I wanted to ask the GSoC student to do it.  If it doesn't come in a
 couple of weeks, I'll send it.

 Which instruction did you see the bug happening with? Is this a 3.10 regression?

cmp $0x1f, %bpl

Like the NOP, it is a regression introduced in the switch of
emulate_invalid_guest_state from 0 to 1.
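
For the record, a sketch of the decoding difference the patch fixes;
the register tables are architectural, the helper itself is only
illustrative and not the kvm code:

/* Without a REX prefix, byte-op modrm.rm values 4-7 name the legacy
 * high-byte registers; with any REX prefix they name SPL/BPL/SIL/DIL.
 * "cmp $0x1f, %bpl" encodes rm=5 with a REX prefix, so decoding it
 * with the high-byte table would wrongly pick CH. */
static const char *byte_reg(unsigned rm, int has_rex)
{
	static const char *no_rex[] = { "al", "cl", "dl", "bl",
					"ah", "ch", "dh", "bh" };
	static const char *rex[]    = { "al", "cl", "dl", "bl",
					"spl", "bpl", "sil", "dil" };
	return has_rex ? rex[rm & 7] : no_rex[rm & 7];
}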

Paolo

 
 Paolo

 Cc: gnata...@redhat.com
 Please use my other email :)
 
 Cc: kvm@vger.kernel.org
 Cc: sta...@vger.kernel.org # 3.9
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  arch/x86/kvm/emulate.c | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)

 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index aa68106..028b34f 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -1239,9 +1239,12 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
  	ctxt->modrm_seg = VCPU_SREG_DS;
  
  	if (ctxt->modrm_mod == 3) {
 +		int highbyte_regs = ctxt->rex_prefix == 0;
 +
  		op->type = OP_REG;
  		op->bytes = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
 -		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm, ctxt->d & ByteOp);
 +		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm,
 +					       highbyte_regs && (ctxt->d & ByteOp));
  		if (ctxt->d & Sse) {
  			op->type = OP_XMM;
  			op->bytes = 16;
 -- 
 1.8.1.4

 --
 Gleb.

 
 --
   Gleb.
 



Re: [PATCH] KVM: Emulate multibyte NOP

2013-06-03 Thread Gleb Natapov
On Thu, May 30, 2013 at 01:22:39PM +0200, Paolo Bonzini wrote:
 This is encountered when booting RHEL5.9 64-bit.  There is another bug
 after this one that is not a simple emulation failure, but this one lets
 the boot proceed a bit.
 
 Cc: sta...@vger.kernel.org # 3.9
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Applied to master, thanks.

 ---
  arch/x86/kvm/emulate.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index 48e6abd..aa68106 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -3987,7 +3987,8 @@ static const struct opcode twobyte_table[256] = {
  	DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
  	N, D(ImplicitOps | ModRM), N, N,
  	/* 0x10 - 0x1F */
 -	N, N, N, N, N, N, N, N, D(ImplicitOps | ModRM), N, N, N, N, N, N, N,
 +	N, N, N, N, N, N, N, N,
 +	D(ImplicitOps | ModRM), N, N, N, N, N, N, D(ImplicitOps | ModRM),
  	/* 0x20 - 0x2F */
  	DIP(ModRM | DstMem | Priv | Op3264, cr_read, check_cr_read),
  	DIP(ModRM | DstMem | Priv | Op3264, dr_read, check_dr_read),
 @@ -4822,6 +4823,7 @@ twobyte_insn:
  	case 0x08:	/* invd */
  	case 0x0d:	/* GrpP (prefetch) */
  	case 0x18:	/* Grp16 (prefetch/nop) */
 +	case 0x1f:	/* nop */
  		break;
  	case 0x20: /* mov cr, reg */
  		ctxt->dst.val = ops->get_cr(ctxt, ctxt->modrm_reg);
 -- 
 1.8.1.4

--
Gleb.


Re: [PATCH] KVM: fix sil/dil/bpl/spl in the mod/rm fields

2013-06-03 Thread Gleb Natapov
On Thu, May 30, 2013 at 04:35:55PM +0200, Paolo Bonzini wrote:
 The x86-64 extended low-byte registers were fetched correctly from reg,
 but not from mod/rm.
 
 This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
 not enough.
 
 Cc: gnata...@redhat.com
 Cc: kvm@vger.kernel.org
 Cc: sta...@vger.kernel.org # 3.9
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Applied to master, thanks.

 ---
  arch/x86/kvm/emulate.c | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index aa68106..028b34f 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -1239,9 +1239,12 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
  	ctxt->modrm_seg = VCPU_SREG_DS;
  
  	if (ctxt->modrm_mod == 3) {
 +		int highbyte_regs = ctxt->rex_prefix == 0;
 +
  		op->type = OP_REG;
  		op->bytes = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
 -		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm, ctxt->d & ByteOp);
 +		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm,
 +					       highbyte_regs && (ctxt->d & ByteOp));
  		if (ctxt->d & Sse) {
  			op->type = OP_XMM;
  			op->bytes = 16;
 -- 
 1.8.1.4

--
Gleb.


Re: [PATCH kvm-unit-tests v2] pmu: fixes for Sandy Bridge hosts

2013-06-03 Thread Gleb Natapov
On Thu, May 30, 2013 at 07:47:18PM +0200, Paolo Bonzini wrote:
 This patch includes two fixes for SB:
 
 * the 3rd fixed counter (ref cpu cycles) can sometimes report
   less than the number of iterations
 
 * there is an 8th counter which causes out of bounds accesses
   to gp_event or check_counters_many's cnt array
 
 There is still a bug in KVM, because the pmu all counters-0
 test fails.  (It passes if you use any 6 of the 8 gp counters,
 fails if you use 7 or 8).
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Applied, thanks.

 ---
  x86/pmu.c | 21 +++++++++++++++------
  1 file changed, 15 insertions(+), 6 deletions(-)
 
 diff --git a/x86/pmu.c b/x86/pmu.c
 index 2c46f31..dca753a 100644
 --- a/x86/pmu.c
 +++ b/x86/pmu.c
 @@ -88,9 +88,10 @@ struct pmu_event {
  }, fixed_events[] = {
  	{"fixed 1", MSR_CORE_PERF_FIXED_CTR0, 10*N, 10.2*N},
  	{"fixed 2", MSR_CORE_PERF_FIXED_CTR0 + 1, 1*N, 30*N},
 -	{"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 1*N, 30*N}
 +	{"fixed 3", MSR_CORE_PERF_FIXED_CTR0 + 2, 0.1*N, 30*N}
  };
  
 +static int num_counters;
  static int tests, failures;
  
  char *buf;
 @@ -237,7 +238,7 @@ static void check_gp_counter(struct pmu_event *evt)
  	};
  	int i;
  
 -	for (i = 0; i < eax.split.num_counters; i++, cnt.ctr++) {
 +	for (i = 0; i < num_counters; i++, cnt.ctr++) {
  		cnt.count = 0;
  		measure(&cnt, 1);
  		report(evt->name, i, verify_event(cnt.count, evt));
 @@ -276,7 +277,7 @@ static void check_counters_many(void)
  	pmu_counter_t cnt[10];
  	int i, n;
  
 -	for (i = 0, n = 0; n < eax.split.num_counters; i++) {
 +	for (i = 0, n = 0; n < num_counters; i++) {
  		if (ebx.full & (1 << i))
  			continue;
  
 @@ -316,10 +317,10 @@ static void check_counter_overflow(void)
  	/* clear status before test */
  	wrmsr(MSR_CORE_PERF_GLOBAL_OVF_CTRL,
  	      rdmsr(MSR_CORE_PERF_GLOBAL_STATUS));
  
 -	for (i = 0; i < eax.split.num_counters + 1; i++, cnt.ctr++) {
 +	for (i = 0; i < num_counters + 1; i++, cnt.ctr++) {
  		uint64_t status;
  		int idx;
 -		if (i == eax.split.num_counters)
 +		if (i == num_counters)
  			cnt.ctr = fixed_events[0].unit_sel;
  		if (i % 2)
  			cnt.config |= EVNTSEL_INT;
 @@ -355,7 +356,7 @@ static void check_rdpmc(void)
  	uint64_t val = 0x1f3456789ull;
  	int i;
  
 -	for (i = 0; i < eax.split.num_counters; i++) {
 +	for (i = 0; i < num_counters; i++) {
  		uint64_t x = (val & 0xffffffff) |
  			((1ull << (eax.split.bit_width - 32)) - 1) << 32;
  		wrmsr(MSR_IA32_PERFCTR0 + i, val);
 @@ -395,6 +396,10 @@ int main(int ac, char **av)
  	printf("Fixed counters:      %d\n", edx.split.num_counters_fixed);
  	printf("Fixed counter width: %d\n", edx.split.bit_width_fixed);
  
 +	num_counters = eax.split.num_counters;
 +	if (num_counters > ARRAY_SIZE(gp_events))
 +		num_counters = ARRAY_SIZE(gp_events);
 +
  	apic_write(APIC_LVTPC, PC_VECTOR);
  
  	check_gp_counters();
 -- 
 1.8.2.1
 

--
Gleb.


Re: [PATCH] kvm: add detail error message when fail to add ioeventfd

2013-06-03 Thread Gleb Natapov
On Wed, May 22, 2013 at 12:57:35PM +0800, Amos Kong wrote:
 I tried to hotplug 28 * 8 multiple-function devices to a guest with an
 old host kernel; the ioeventfds in the host kernel were exhausted, and
 qemu then failed to allocate ioeventfds for blk/nic devices.
 
 It's better to add a detailed error message here.
 
Applied, thanks.

 Signed-off-by: Amos Kong ak...@redhat.com
 ---
  kvm-all.c |    4 ++++
  1 files changed, 4 insertions(+), 0 deletions(-)
 
 diff --git a/kvm-all.c b/kvm-all.c
 index 8222729..3d5f7b7 100644
 --- a/kvm-all.c
 +++ b/kvm-all.c
 @@ -813,6 +813,8 @@ static void kvm_mem_ioeventfd_add(MemoryListener *listener,
      r = kvm_set_ioeventfd_mmio(fd, section->offset_within_address_space,
                                 data, true, section->size, match_data);
      if (r < 0) {
 +        fprintf(stderr, "%s: error adding ioeventfd: %s\n",
 +                __func__, strerror(-r));
          abort();
      }
  }
 @@ -843,6 +845,8 @@ static void kvm_io_ioeventfd_add(MemoryListener *listener,
      r = kvm_set_ioeventfd_pio(fd, section->offset_within_address_space,
                                data, true, section->size, match_data);
      if (r < 0) {
 +        fprintf(stderr, "%s: error adding ioeventfd: %s\n",
 +                __func__, strerror(-r));
          abort();
      }
  }
 -- 
 1.7.1

--
Gleb.


Re: Redirections from virtual interfaces.

2013-06-03 Thread Stefan Hajnoczi
On Fri, May 31, 2013 at 11:10:24AM -0300, Targino SIlveira wrote:
 I have a server with only one NIC. This NIC has a public IP, and the
 server is located in a data center. I can't have more than one NIC,
 but I can have many IPs, so I would like to know if I can redirect
 packets from virtual interfaces to my VMs?
 
 Examples:
 
 eth0:1 xxx.xx.xxx.xxx redirect all traffic to 192.168.122.200
 eth0:2 xxx.xx.xxx.xxy redirect all traffic to 192.168.122.150
 eth0:3 xxx.xx.xxx.xxz redirect all traffic to 192.168.122.180
 
 I'm using /etc/libvirt/hooks/qemu to write iptables rules.

Yes, look at NAT.  A lot of material covers NAT behind one public IP;
in this case you actually need to map public addresses onto private
addresses 1:1.

A web search for linux nat should turn up howtos.  Or check on
libvirt.org whether there is a libvirt configuration that automatically
sets this up for you.

Stefan


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-03 Thread Michael S. Tsirkin
On Mon, Jun 03, 2013 at 09:56:15AM +0930, Rusty Russell wrote:
 Michael S. Tsirkin m...@redhat.com writes:
  On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote:
  Rusty Russell ru...@rustcorp.com.au writes:
  
   Anthony Liguori aligu...@us.ibm.com writes:
   Forcing a guest driver change is a really big
   deal and I see no reason to do that unless there's a compelling reason
   to.
  
   So we're stuck with the 1.0 config layout for a very long time.
  
   We definitely must not force a guest change.  The explicit aim of the
   standard is that legacy and 1.0 be backward compatible.  One
   deliverable is a document detailing how this is done (effectively a
   summary of changes between what we have and 1.0).
  
  If 2.0 is fully backwards compatible, great.  It seems like such a
  difference that that would be impossible but I need to investigate
  further.
  
  Regards,
  
  Anthony Liguori
 
  If you look at my patches you'll see how it works.
  Basically old guests use BAR0 and new ones don't, so
  it's easy: a BAR0 access means a legacy guest.
  I've only started testing, but things seem to work
  fine with old guests so far.
 
  I think we need a spec, not just driver code.
 
  Rusty what's the plan? Want me to write it?
 
 We need both, of course, but the spec work will happen in the OASIS WG.
 A draft is good, but let's not commit anything to upstream QEMU until we
 get the spec finalized.  And that is proposed to be late this year.

Well, that would be quite sad really.

This means we can't make virtio a spec-compliant PCI Express device,
and we can't add any more feature bits, so no
flexible buffer optimizations for virtio-net.

There are probably more projects that will be blocked.

So how about we keep extending the legacy layout for a bit longer:
- add a way to access the device with MMIO
- use feature bit 31 to signal 64-bit features
  (and shift the device config accordingly)

No endianness rework, no per-queue enable, etc.

Then when we start discussions we will have working
express and working 64-bit feature support,
and it will be that much easier to make it pretty.
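
Roughly, a guest would then locate the device config like this (a
sketch only; VIRTIO_F_FEATURES_HI and the +8 shift are illustrative
placeholders, nothing here is final):

#define VIRTIO_PCI_LEGACY_CONFIG(msix_on)	((msix_on) ? 24 : 20)
#define VIRTIO_F_FEATURES_HI			(1u << 31)	/* hypothetical */

static unsigned device_config_offset(uint32_t features, int msix_on)
{
	unsigned off = VIRTIO_PCI_LEGACY_CONFIG(msix_on);

	if (features & VIRTIO_F_FEATURES_HI)
		off += 8;	/* assumed room for the high feature dwords */
	return off;
}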


 Since I'm going to have to reformat the spec and adapt it into OASIS
 style anyway, perhaps you should prepare a description as a standalone
 text document.  Easier to email and work with...
 
  Now, the idea is that if you want to support 0.9 and 1.0 (or whatever we
  call them; I used the term "legacy" for existing implementations in the
  OASIS WG proposal), you add capabilities and don't point them into (the
  start of?) BAR0.  Old drivers use BAR0 as now.
 
 One trick to note: while drivers shouldn't use both old and new style on
 the same device, you need to allow it for kexec, particularly reset via
 BAR0.
 
 Cheers,
 Rusty.


Re: [PATCH] KVM: fix sil/dil/bpl/spl in the mod/rm fields

2013-06-03 Thread Gleb Natapov
On Thu, May 30, 2013 at 05:34:21PM +0200, Paolo Bonzini wrote:
 On 30/05/2013 16:35, Paolo Bonzini wrote:
  The x86-64 extended low-byte registers were fetched correctly from reg,
  but not from mod/rm.
  
  This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
  not enough.
 
 Well, it is enough but it takes 2 minutes to reach the point where
 hardware virtualization is used.  It is doing a lot of stuff in
 emulation mode because FS and GS have leftovers from the A20 test:
 
 FS =0000 00000000 0000ffff 00009300 DPL=0 DS16 [-WA]
 GS =ffff 000ffff0 0000ffff 00009300 DPL=0 DS16 [-WA]
 
 0x000113be:  in     $0x92,%al
 0x000113c0:  or     $0x2,%al
 0x000113c2:  out    %al,$0x92
 0x000113c4:  xor    %ax,%ax
 0x000113c6:  mov    %ax,%fs
 0x000113c8:  dec    %ax
 0x000113c9:  mov    %ax,%gs
 0x000113cb:  inc    %ax
 0x000113cc:  mov    %ax,%fs:0x200
 0x000113d0:  cmp    %gs:0x210,%ax
 0x000113d5:  je     0x113cb
 
This is 16-bit code that sets them up. So the 32-bit transition code
does not reload them?

 The DPL < RPL test fails.  Any ideas?  Should we introduce a new
 intermediate value for emulate_invalid_guest_state (0=none, 1=some, 2=full)?
 
 Paolo
 
  Cc: gnata...@redhat.com
  Cc: kvm@vger.kernel.org
  Cc: sta...@vger.kernel.org # 3.9
  Signed-off-by: Paolo Bonzini pbonz...@redhat.com
  ---
   arch/x86/kvm/emulate.c | 5 ++++-
   1 file changed, 4 insertions(+), 1 deletion(-)
  
  diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
  index aa68106..028b34f 100644
  --- a/arch/x86/kvm/emulate.c
  +++ b/arch/x86/kvm/emulate.c
  @@ -1239,9 +1239,12 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
   	ctxt->modrm_seg = VCPU_SREG_DS;
   
   	if (ctxt->modrm_mod == 3) {
  +		int highbyte_regs = ctxt->rex_prefix == 0;
  +
   		op->type = OP_REG;
   		op->bytes = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
  -		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm, ctxt->d & ByteOp);
  +		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm,
  +					       highbyte_regs && (ctxt->d & ByteOp));
   		if (ctxt->d & Sse) {
   			op->type = OP_XMM;
   			op->bytes = 16;
  

--
Gleb.


[Bug 58921] [nested virt] L2 Windows guest can't boot up ('-cpu host' to start L1)

2013-06-03 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=58921


Nadav Har'El n...@math.technion.ac.il changed:

           What    |Removed                     |Added
----------------------------------------------------------------
                 CC|                            |n...@math.technion.ac.il




--- Comment #1 from Nadav Har'El n...@math.technion.ac.il  2013-06-03 
10:48:19 ---
This is a dup of bug 53641



Re: [PATCH] KVM: fix sil/dil/bpl/spl in the mod/rm fields

2013-06-03 Thread Paolo Bonzini
On 03/06/2013 12:25, Gleb Natapov wrote:
 On Thu, May 30, 2013 at 05:34:21PM +0200, Paolo Bonzini wrote:
 On 30/05/2013 16:35, Paolo Bonzini wrote:
 The x86-64 extended low-byte registers were fetched correctly from reg,
 but not from mod/rm.

 This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
 not enough.

 Well, it is enough but it takes 2 minutes to reach the point where
 hardware virtualization is used.  It is doing a lot of stuff in
 emulation mode because FS and GS have leftovers from the A20 test:

  FS =0000 00000000 0000ffff 00009300 DPL=0 DS16 [-WA]
  GS =ffff 000ffff0 0000ffff 00009300 DPL=0 DS16 [-WA]
 
  0x000113be:  in     $0x92,%al
  0x000113c0:  or     $0x2,%al
  0x000113c2:  out    %al,$0x92
  0x000113c4:  xor    %ax,%ax
  0x000113c6:  mov    %ax,%fs
  0x000113c8:  dec    %ax
  0x000113c9:  mov    %ax,%gs
  0x000113cb:  inc    %ax
  0x000113cc:  mov    %ax,%fs:0x200
  0x000113d0:  cmp    %gs:0x210,%ax
  0x000113d5:  je     0x113cb

 This is 16-bit code that sets them up. So the 32-bit transition code
 does not reload them?

Yes.  It does this:

	movw	$1, %ax			# protected mode (PE) bit
	lmsw	%ax			# This is it!
	jmp	flush_instr

flush_instr:
	xorw	%bx, %bx		# Flag to indicate a boot
	xorl	%esi, %esi		# Pointer to real-mode code
	movw	%cs, %si
	subw	$DELTA_INITSEG, %si
	shll	$4, %esi		# Convert to 32-bit pointer
	.byte	0x66, 0xea		# prefix + jmpi-opcode
code32:	.long	0x1000			# will be set to 0x100000
					# for big kernels
	.word	__KERNEL_CS

which jumps to boot/compressed/head.S:

startup_32:
	cld
	cli
	movl	$(__KERNEL_DS), %eax
	movl	%eax, %ds
	movl	%eax, %es
	movl	%eax, %ss

and totally ignores fs/gs.  Much later there is this (in kernel/head.S):

/*
 * We don't really need to load %fs or %gs, but load them anyway
 * to kill any stale realmode selectors.  This allows execution
 * under VT hardware.
 */
movl %eax,%fs
movl %eax,%gs
 
but the whole decompression is run under emulation.
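
(The check that keeps us in emulation is essentially this one, shown
here simplified from data_segment_valid() in arch/x86/kvm/vmx.c; the
helper below is a sketch, not the kernel code:

static bool data_segment_dpl_ok(u16 selector, unsigned int dpl)
{
	unsigned int rpl = selector & 3;

	/* GS.sel = 0xffff gives RPL 3 against the leftover DPL 0 */
	return dpl >= rpl;
}

so the stale GS selector fails guest-state validation on every entry.)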

Paolo

  The DPL < RPL test fails.  Any ideas?  Should we introduce a new
  intermediate value for emulate_invalid_guest_state (0=none, 1=some, 2=full)?

 Paolo

 Cc: gnata...@redhat.com
 Cc: kvm@vger.kernel.org
 Cc: sta...@vger.kernel.org # 3.9
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  arch/x86/kvm/emulate.c | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)

 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
 index aa68106..028b34f 100644
 --- a/arch/x86/kvm/emulate.c
 +++ b/arch/x86/kvm/emulate.c
 @@ -1239,9 +1239,12 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
  	ctxt->modrm_seg = VCPU_SREG_DS;
  
  	if (ctxt->modrm_mod == 3) {
 +		int highbyte_regs = ctxt->rex_prefix == 0;
 +
  		op->type = OP_REG;
  		op->bytes = (ctxt->d & ByteOp) ? 1 : ctxt->op_bytes;
 -		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm, ctxt->d & ByteOp);
 +		op->addr.reg = decode_register(ctxt, ctxt->modrm_rm,
 +					       highbyte_regs && (ctxt->d & ByteOp));
  		if (ctxt->d & Sse) {
  			op->type = OP_XMM;
  			op->bytes = 16;

 
 --
   Gleb.
 



[PATCH v2 5/7] target-arm: Initialize cpreg list from KVM when using KVM

2013-06-03 Thread Peter Maydell
When using KVM, use the kernel's initial state to set up the
cpreg list, and sync to and from the kernel when doing
migration.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 target-arm/Makefile.objs |1 +
 target-arm/kvm-stub.c|   23 +++
 target-arm/kvm.c |  164 +-
 target-arm/kvm_arm.h |   33 ++
 target-arm/machine.c |   30 +++--
 5 files changed, 245 insertions(+), 6 deletions(-)
 create mode 100644 target-arm/kvm-stub.c

diff --git a/target-arm/Makefile.objs b/target-arm/Makefile.objs
index d89b57c..4a6e52e 100644
--- a/target-arm/Makefile.objs
+++ b/target-arm/Makefile.objs
@@ -1,5 +1,6 @@
 obj-y += arm-semi.o
 obj-$(CONFIG_SOFTMMU) += machine.o
 obj-$(CONFIG_KVM) += kvm.o
+obj-$(CONFIG_NO_KVM) += kvm-stub.o
 obj-y += translate.o op_helper.o helper.o cpu.o
 obj-y += neon_helper.o iwmmxt_helper.o
diff --git a/target-arm/kvm-stub.c b/target-arm/kvm-stub.c
new file mode 100644
index 0000000..cd1849f
--- /dev/null
+++ b/target-arm/kvm-stub.c
@@ -0,0 +1,23 @@
+/*
+ * QEMU KVM ARM specific function stubs
+ *
+ * Copyright Linaro Limited 2013
+ *
+ * Author: Peter Maydell peter.mayd...@linaro.org
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#include "qemu-common.h"
+#include "kvm_arm.h"
+
+bool write_kvmstate_to_list(ARMCPU *cpu)
+{
+abort();
+}
+
+bool write_list_to_kvmstate(ARMCPU *cpu)
+{
+abort();
+}
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 4aea7c3..746ae02 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -50,12 +50,35 @@ unsigned long kvm_arch_vcpu_id(CPUState *cpu)
 return cpu->cpu_index;
 }
 
+static bool reg_syncs_via_tuple_list(uint64_t regidx)
+{
+/* Return true if the regidx is a register we should synchronize
+ * via the cpreg_tuples array (ie is not a core reg we sync by
+ * hand in kvm_arch_get/put_registers())
+ */
+switch (regidx & KVM_REG_ARM_COPROC_MASK) {
+case KVM_REG_ARM_CORE:
+case KVM_REG_ARM_VFP:
+return false;
+default:
+return true;
+}
+}
+
+static int compare_u64(const void *a, const void *b)
+{
+return *(uint64_t *)a - *(uint64_t *)b;
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
 struct kvm_vcpu_init init;
-int ret;
+int i, ret, arraylen;
 uint64_t v;
 struct kvm_one_reg r;
+struct kvm_reg_list rl;
+struct kvm_reg_list *rlp;
+ARMCPU *cpu = ARM_CPU(cs);
 
 init.target = KVM_ARM_TARGET_CORTEX_A15;
 memset(init.features, 0, sizeof(init.features));
@@ -74,6 +97,73 @@ int kvm_arch_init_vcpu(CPUState *cs)
 if (ret == -ENOENT) {
 return -EINVAL;
 }
+
+/* Populate the cpreg list based on the kernel's idea
+ * of what registers exist (and throw away the TCG-created list).
+ */
+rl.n = 0;
+ret = kvm_vcpu_ioctl(cs, KVM_GET_REG_LIST, &rl);
+if (ret != -E2BIG) {
+return ret;
+}
+rlp = g_malloc(sizeof(struct kvm_reg_list) + rl.n * sizeof(uint64_t));
+rlp->n = rl.n;
+ret = kvm_vcpu_ioctl(cs, KVM_GET_REG_LIST, rlp);
+if (ret) {
+goto out;
+}
+/* Sort the list we get back from the kernel, since cpreg_tuples
+ * must be in strictly ascending order.
+ */
+qsort(rlp->reg, rlp->n, sizeof(rlp->reg[0]), compare_u64);
+
+for (i = 0, arraylen = 0; i < rlp->n; i++) {
+if (!reg_syncs_via_tuple_list(rlp->reg[i])) {
+continue;
+}
+switch (rlp->reg[i] & KVM_REG_SIZE_MASK) {
+case KVM_REG_SIZE_U32:
+case KVM_REG_SIZE_U64:
+break;
+default:
+fprintf(stderr, "Can't handle size of register in kernel list\n");
+ret = -EINVAL;
+goto out;
+}
+
+arraylen++;
+}
+
+cpu->cpreg_indexes = g_renew(uint64_t, cpu->cpreg_indexes, arraylen);
+cpu->cpreg_values = g_renew(uint64_t, cpu->cpreg_values, arraylen);
+cpu->cpreg_vmstate_indexes = g_renew(uint64_t, cpu->cpreg_vmstate_indexes,
+ arraylen);
+cpu->cpreg_vmstate_values = g_renew(uint64_t, cpu->cpreg_vmstate_values,
+arraylen);
+cpu->cpreg_array_len = arraylen;
+cpu->cpreg_vmstate_array_len = arraylen;
+
+for (i = 0, arraylen = 0; i < rlp->n; i++) {
+uint64_t regidx = rlp->reg[i];
+if (!reg_syncs_via_tuple_list(regidx)) {
+continue;
+}
+cpu->cpreg_indexes[arraylen] = regidx;
+arraylen++;
+}
+assert(cpu->cpreg_array_len == arraylen);
+
+if (!write_kvmstate_to_list(cpu)) {
+/* Shouldn't happen unless kernel is inconsistent about
+ * what registers exist.
+ */
+fprintf(stderr, "Initial read of kernel register state failed\n");
+ret = -EINVAL;
+goto out;
+}
+
+out:
+g_free(rlp);
 return ret;
 }
 
@@ -163,6 +253,78 @@ void 

[PATCH v2 6/7] target-arm: Reinitialize all KVM VCPU registers on reset

2013-06-03 Thread Peter Maydell
Since the ARM KVM API doesn't include a "reset this VCPU"
ioctl, we have to capture the initial values of every
register it knows about so that we can reset the VCPU
by feeding those values back again.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 target-arm/cpu-qom.h |    6 +++++-
 target-arm/kvm.c     |   16 ++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
index 2242eee..25239b8 100644
--- a/target-arm/cpu-qom.h
+++ b/target-arm/cpu-qom.h
@@ -72,7 +72,11 @@ typedef struct ARMCPU {
 uint64_t *cpreg_indexes;
 /* Values of the registers (cpreg_indexes[i]'s value is cpreg_values[i]) */
 uint64_t *cpreg_values;
-/* Length of the indexes, values arrays */
+/* When using KVM, keeps a copy of the initial state of the VCPU,
+ * so that on reset we can feed the reset values back into the kernel.
+ */
+uint64_t *cpreg_reset_values;
+/* Length of the indexes, values, reset_values arrays */
 int32_t cpreg_array_len;
 /* These are used only for migration: incoming data arrives in
  * these fields and is sanity checked in post_load before copying
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 746ae02..f4a835d 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -162,6 +162,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
 goto out;
 }
 
+/* Save a copy of the initial register values so that we can
+ * feed it back to the kernel on VCPU reset.
+ */
+cpu->cpreg_reset_values = g_memdup(cpu->cpreg_values,
+   cpu->cpreg_array_len *
+   sizeof(cpu->cpreg_values[0]));
+
 out:
 g_free(rlp);
 return ret;
@@ -603,6 +610,15 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 
 void kvm_arch_reset_vcpu(CPUState *cs)
 {
+/* Feed the kernel back its initial register state */
+ARMCPU *cpu = ARM_CPU(cs);
+
+memmove(cpu->cpreg_values, cpu->cpreg_reset_values,
+cpu->cpreg_array_len * sizeof(cpu->cpreg_values[0]));
+
+if (!write_list_to_kvmstate(cpu)) {
+abort();
+}
 }
 
 bool kvm_arch_stop_on_emulation_error(CPUState *cs)
-- 
1.7.9.5



[PATCH v2 2/7] target-arm: Add raw_readfn and raw_writefn to ARMCPRegInfo

2013-06-03 Thread Peter Maydell
For reading and writing register values from the kernel for KVM,
we need to provide accessor functions which are guaranteed to succeed
and don't impose access checks, mask out unwritable bits, etc.
Define new fields raw_readfn and raw_writefn for this purpose;
these only need to be provided if there is a readfn or writefn
already and it is not suitable.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 target-arm/cpu.h|   18 +-
 target-arm/helper.c |   13 +
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 737c00c..1d8eba5 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -434,19 +434,22 @@ void armv7m_nvic_complete_irq(void *opaque, int irq);
  * a register definition to override a previous definition for the
  * same (cp, is64, crn, crm, opc1, opc2) tuple: either the new or the
  * old must have the OVERRIDE bit set.
+ * NO_MIGRATE indicates that this register should be ignored for migration;
+ * (eg because any state is accessed via some other coprocessor register).
  */
 #define ARM_CP_SPECIAL 1
 #define ARM_CP_CONST 2
 #define ARM_CP_64BIT 4
 #define ARM_CP_SUPPRESS_TB_END 8
 #define ARM_CP_OVERRIDE 16
+#define ARM_CP_NO_MIGRATE 32
 #define ARM_CP_NOP (ARM_CP_SPECIAL | (1 << 8))
 #define ARM_CP_WFI (ARM_CP_SPECIAL | (2 << 8))
 #define ARM_LAST_SPECIAL ARM_CP_WFI
 /* Used only as a terminator for ARMCPRegInfo lists */
 #define ARM_CP_SENTINEL 0xffff
 /* Mask of only the flag bits in a type field */
-#define ARM_CP_FLAG_MASK 0x1f
+#define ARM_CP_FLAG_MASK 0x3f
 
 /* Return true if cptype is a valid type field. This is used to try to
  * catch errors where the sentinel has been accidentally left off the end
@@ -562,6 +565,19 @@ struct ARMCPRegInfo {
  * by fieldoffset.
  */
 CPWriteFn *writefn;
+/* Function for doing a raw read; used when we need to copy
+ * coprocessor state to the kernel for KVM or out for
+ * migration. This only needs to be provided if there is also a
+ * readfn and it makes an access permission check.
+ */
+CPReadFn *raw_readfn;
+/* Function for doing a raw write; used when we need to copy KVM
+ * kernel coprocessor state into userspace, or for inbound
+ * migration. This only needs to be provided if there is also a
+ * writefn and it makes an access permission check or masks out
+ * unwritable bits or has write-one-to-clear or similar behaviour.
+ */
+CPWriteFn *raw_writefn;
 /* Function for resetting the register. If NULL, then reset will be done
  * by writing resetvalue to the field specified in fieldoffset. If
  * fieldoffset is 0 then no reset will be done.
diff --git a/target-arm/helper.c b/target-arm/helper.c
index fd055e8..2585d59 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -1392,6 +1392,19 @@ void define_one_arm_cp_reg_with_opaque(ARMCPU *cpu,
 r2->crm = crm;
 r2->opc1 = opc1;
 r2->opc2 = opc2;
+/* By convention, for wildcarded registers only the first
+ * entry is used for migration; the others are marked as
+ * NO_MIGRATE so we don't try to transfer the register
+ * multiple times. Special registers (ie NOP/WFI) are
+ * never migratable.
+ */
+if ((r->type & ARM_CP_SPECIAL) ||
+    ((r->crm == CP_ANY) && crm != 0) ||
+    ((r->opc1 == CP_ANY) && opc1 != 0) ||
+    ((r->opc2 == CP_ANY) && opc2 != 0)) {
+r2->type |= ARM_CP_NO_MIGRATE;
+}
+
 /* Overriding of an existing definition must be explicitly
  * requested.
  */
-- 
1.7.9.5



[PATCH v2 0/7] target-arm: cpregs list for migration, kvm reset

2013-06-03 Thread Peter Maydell
This patch series overhauls how we handle ARM coprocessor registers,
so that we use a consistent approach for migration, reset and
QEMU-KVM synchronisation, driven by the kernel's list of supported
registers.

The basic principle here is that we trust the kernel's list of what
registers it knows about, and that QEMU doesn't have to have specific
knowledge of a coprocessor register to support running and migrating
a KVM session on a kernel that does support that register.

We maintain a list of cp registers, which is initialized either from
the current cpreg hashtable (for TCG), or by querying the kernel (for
KVM).  For migration we simply send the lists of register indexes and
values; migration fails if there's a register the destination kernel
is unaware of, or if the value can't be set as required, but isn't
gated on whether source or destination QEMU know about the register.
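As a rough sketch of that rule (the helper names here are invented for
illustration, not the series' actual functions), the destination side boils
down to:

static bool sketch_apply_incoming_cpregs(ARMCPU *cpu)
{
    int i;

    for (i = 0; i < cpu->cpreg_vmstate_array_len; i++) {
        uint64_t idx = cpu->cpreg_vmstate_indexes[i];
        uint64_t val = cpu->cpreg_vmstate_values[i];

        if (!sketch_cpreg_known(cpu, idx)) {
            return false;   /* destination kernel has no such register */
        }
        if (!sketch_cpreg_set(cpu, idx, val)) {
            return false;   /* value could not be set as required */
        }
    }
    return true;            /* migration may proceed */
}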

We also use the register list to properly reset the vcpu by simply
feeding it back the initial set of register values; this fixes a bug
where we weren't resetting everything we should have (though Linux
guests don't care about most reset values).

Note that vm save/load with KVM requires that you run with -machine
kernel_irqchip=off, because the kernel doesn't currently support
save/load of either the VGIC or virtual timer state.  It may also be
necessary to nobble the device tree blob to remove the armv7-timer
node so the guest doesn't try to use the vtimers.  Migration between
TCG and KVM is not supported at the moment (it would require us to
add a lot of registers to TCG, which I may do at some point, but this
is a bit of an obscure usecase IMHO).

Changes v1->v2:
 * added raw write accessors for regs which do a tlb_flush()
   in their write function (CONTEXTIDR and others)
 * added kvm-stub.h accidentally omitted in v1

(Remembered to cc kvm list this time around...)

Peter Maydell (7):
  target-arm: Allow special cpregs to have flags set
  target-arm: Add raw_readfn and raw_writefn to ARMCPRegInfo
  target-arm: mark up cpregs for no-migrate or raw access
  target-arm: Convert TCG to using (index,value) list for cp migration
  target-arm: Initialize cpreg list from KVM when using KVM
  target-arm: Reinitialize all KVM VCPU registers on reset
  target-arm: Use tuple list to sync cp regs with KVM

 target-arm/Makefile.objs |1 +
 target-arm/cpu-qom.h |   24 
 target-arm/cpu.c |2 +
 target-arm/cpu.h |   89 -
 target-arm/helper.c  |  327 +++---
 target-arm/kvm-stub.c|   23 
 target-arm/kvm.c |  292 +++--
 target-arm/kvm_arm.h |   33 +
 target-arm/machine.c |  134 ---
 9 files changed, 759 insertions(+), 166 deletions(-)
 create mode 100644 target-arm/kvm-stub.c

-- 
1.7.9.5



[PATCH v2 1/7] target-arm: Allow special cpregs to have flags set

2013-06-03 Thread Peter Maydell
Relax the "is this a valid ARMCPRegInfo type value?" check to permit
special cpregs to have flags other than ARM_CP_SPECIAL set. At
the moment none of the other flags are relevant for special regs,
but the migration related flag we're about to introduce can apply
here too.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 target-arm/cpu.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 5438444..737c00c 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -456,7 +456,7 @@ static inline bool cptype_valid(int cptype)
 {
 return ((cptype & ~ARM_CP_FLAG_MASK) == 0)
 || ((cptype & ARM_CP_SPECIAL) &&
-(cptype <= ARM_LAST_SPECIAL));
+((cptype & ~ARM_CP_FLAG_MASK) <= ARM_LAST_SPECIAL));
 }
 
 /* Access rights:
-- 
1.7.9.5



[PATCH v2 4/7] target-arm: Convert TCG to using (index,value) list for cp migration

2013-06-03 Thread Peter Maydell
Convert the TCG ARM target to using an (index,value) list for migrating
coprocessors. The primary benefit of the (index,value) list is for
passing state between KVM and QEMU, but it works for TCG-to-TCG
migration as well and is a useful self-contained first step.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 target-arm/cpu-qom.h |   20 ++
 target-arm/cpu.c |2 +
 target-arm/cpu.h |   69 
 target-arm/helper.c  |  174 ++
 target-arm/kvm.c |9 +++
 target-arm/machine.c |  114 +++--
 6 files changed, 341 insertions(+), 47 deletions(-)

diff --git a/target-arm/cpu-qom.h b/target-arm/cpu-qom.h
index 12fcefe..2242eee 100644
--- a/target-arm/cpu-qom.h
+++ b/target-arm/cpu-qom.h
@@ -62,6 +62,25 @@ typedef struct ARMCPU {
 
 /* Coprocessor information */
 GHashTable *cp_regs;
+/* For marshalling (mostly coprocessor) register state between the
+ * kernel and QEMU (for KVM) and between two QEMUs (for migration),
+ * we use these arrays.
+ */
+/* List of register indexes managed via these arrays; (full KVM style
+ * 64 bit indexes, not CPRegInfo 32 bit indexes)
+ */
+uint64_t *cpreg_indexes;
+/* Values of the registers (cpreg_indexes[i]'s value is cpreg_values[i]) */
+uint64_t *cpreg_values;
+/* Length of the indexes, values arrays */
+int32_t cpreg_array_len;
+/* These are used only for migration: incoming data arrives in
+ * these fields and is sanity checked in post_load before copying
+ * to the working data structures above.
+ */
+uint64_t *cpreg_vmstate_indexes;
+uint64_t *cpreg_vmstate_values;
+int32_t cpreg_vmstate_array_len;
 
 /* The instance init functions for implementation-specific subclasses
  * set these fields to specify the implementation-dependent values of
@@ -116,6 +135,7 @@ extern const struct VMStateDescription vmstate_arm_cpu;
 #endif
 
 void register_cp_regs_for_features(ARMCPU *cpu);
+void init_cpreg_list(ARMCPU *cpu);
 
 void arm_cpu_do_interrupt(CPUState *cpu);
 void arm_v7m_cpu_do_interrupt(CPUState *cpu);
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 496a59f..241f032 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -204,6 +204,8 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
**errp)
 register_cp_regs_for_features(cpu);
 arm_cpu_register_gdb_regs_for_features(cpu);
 
+init_cpreg_list(cpu);
+
 cpu_reset(CPU(cpu));
 qemu_init_vcpu(env);
 
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 1d8eba5..abcc0b4 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -424,6 +424,43 @@ void armv7m_nvic_complete_irq(void *opaque, int irq);
 (((cp) << 16) | ((is64) << 15) | ((crn) << 11) |\
  ((crm) << 7) | ((opc1) << 3) | (opc2))
 
+/* Note that these must line up with the KVM/ARM register
+ * ID field definitions (kvm.c will check this, but we
+ * can't just use the KVM defines here as the kvm headers
+ * are unavailable to non-KVM-specific files)
+ */
+#define CP_REG_SIZE_SHIFT 52
+#define CP_REG_SIZE_MASK   0x00f0000000000000ULL
+#define CP_REG_SIZE_U32    0x0020000000000000ULL
+#define CP_REG_SIZE_U64    0x0030000000000000ULL
+#define CP_REG_ARM         0x4000000000000000ULL
+
+/* Convert a full 64 bit KVM register ID to the truncated 32 bit
+ * version used as a key for the coprocessor register hashtable
+ */
+static inline uint32_t kvm_to_cpreg_id(uint64_t kvmid)
+{
+uint32_t cpregid = kvmid;
+if ((kvmid & CP_REG_SIZE_MASK) == CP_REG_SIZE_U64) {
+cpregid |= (1 << 15);
+}
+return cpregid;
+}
+
+/* Convert a truncated 32 bit hashtable key into the full
+ * 64 bit KVM register ID.
+ */
+static inline uint64_t cpreg_to_kvm_id(uint32_t cpregid)
+{
+uint64_t kvmid = cpregid & ~(1 << 15);
+if (cpregid & (1 << 15)) {
+kvmid |= CP_REG_SIZE_U64 | CP_REG_ARM;
+} else {
+kvmid |= CP_REG_SIZE_U32 | CP_REG_ARM;
+}
+return kvmid;
+}
+
 /* ARMCPRegInfo type field bits. If the SPECIAL bit is set this is a
  * special-behaviour cp reg and bits [15..8] indicate what behaviour
  * it has. Otherwise it is a simple cp reg, where CONST indicates that
@@ -621,6 +658,38 @@ static inline bool cp_access_ok(CPUARMState *env,
 return (ri->access >> ((arm_current_pl(env) * 2) + isread)) & 1;
 }
 
+/**
+ * write_list_to_cpustate
+ * @cpu: ARMCPU
+ *
+ * For each register listed in the ARMCPU cpreg_indexes list, write
+ * its value from the cpreg_values list into the ARMCPUState structure.
+ * This updates TCG's working data structures from KVM data or
+ * from incoming migration state.
+ *
+ * Returns: true if all register values were updated correctly,
+ * false if some register was unknown or could not be written.
+ * Note that we do not stop early on failure -- we will attempt
+ * writing all registers in the list.
+ */
+bool write_list_to_cpustate(ARMCPU *cpu);
+
+/**
+ * 

[PATCH v2 3/7] target-arm: mark up cpregs for no-migrate or raw access

2013-06-03 Thread Peter Maydell
Mark up coprocessor register definitions to add raw access
functions or mark the register as non-migratable where necessary.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 target-arm/helper.c |  140 ++-
 1 file changed, 94 insertions(+), 46 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 2585d59..baf7576 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -64,6 +64,20 @@ static int vfp_gdb_set_reg(CPUARMState *env, uint8_t *buf, 
int reg)
 return 0;
 }
 
+static int raw_read(CPUARMState *env, const ARMCPRegInfo *ri,
+uint64_t *value)
+{
+*value = CPREG_FIELD32(env, ri);
+return 0;
+}
+
+static int raw_write(CPUARMState *env, const ARMCPRegInfo *ri,
+ uint64_t value)
+{
+CPREG_FIELD32(env, ri) = value;
+return 0;
+}
+
 static int dacr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
 {
 env->cp15.c3 = value;
@@ -139,13 +153,13 @@ static const ARMCPRegInfo cp_reginfo[] = {
 { .name = "DACR", .cp = 15,
   .crn = 3, .crm = CP_ANY, .opc1 = CP_ANY, .opc2 = CP_ANY,
   .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.c3),
-  .resetvalue = 0, .writefn = dacr_write },
+  .resetvalue = 0, .writefn = dacr_write, .raw_writefn = raw_write, },
 { .name = "FCSEIDR", .cp = 15, .crn = 13, .crm = 0, .opc1 = 0, .opc2 = 0,
   .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.c13_fcse),
-  .resetvalue = 0, .writefn = fcse_write },
+  .resetvalue = 0, .writefn = fcse_write, .raw_writefn = raw_write, },
 { .name = "CONTEXTIDR", .cp = 15, .crn = 13, .crm = 0, .opc1 = 0, .opc2 = 1,
   .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.c13_context),
-  .resetvalue = 0, .writefn = contextidr_write },
+  .resetvalue = 0, .writefn = contextidr_write, .raw_writefn = raw_write, },
 /* ??? This covers not just the impdef TLB lockdown registers but also
  * some v7VMSA registers relating to TEX remap, so it is overly broad.
  */
@@ -155,13 +169,17 @@ static const ARMCPRegInfo cp_reginfo[] = {
  * the unified TLB ops but also the dside/iside/inner-shareable variants.
  */
 { .name = "TLBIALL", .cp = 15, .crn = 8, .crm = CP_ANY,
-  .opc1 = CP_ANY, .opc2 = 0, .access = PL1_W, .writefn = tlbiall_write, },
+  .opc1 = CP_ANY, .opc2 = 0, .access = PL1_W, .writefn = tlbiall_write,
+  .type = ARM_CP_NO_MIGRATE },
 { .name = "TLBIMVA", .cp = 15, .crn = 8, .crm = CP_ANY,
-  .opc1 = CP_ANY, .opc2 = 1, .access = PL1_W, .writefn = tlbimva_write, },
+  .opc1 = CP_ANY, .opc2 = 1, .access = PL1_W, .writefn = tlbimva_write,
+  .type = ARM_CP_NO_MIGRATE },
 { .name = "TLBIASID", .cp = 15, .crn = 8, .crm = CP_ANY,
-  .opc1 = CP_ANY, .opc2 = 2, .access = PL1_W, .writefn = tlbiasid_write, },
+  .opc1 = CP_ANY, .opc2 = 2, .access = PL1_W, .writefn = tlbiasid_write,
+  .type = ARM_CP_NO_MIGRATE },
 { .name = "TLBIMVAA", .cp = 15, .crn = 8, .crm = CP_ANY,
-  .opc1 = CP_ANY, .opc2 = 3, .access = PL1_W, .writefn = tlbimvaa_write, },
+  .opc1 = CP_ANY, .opc2 = 3, .access = PL1_W, .writefn = tlbimvaa_write,
+  .type = ARM_CP_NO_MIGRATE },
 /* Cache maintenance ops; some of this space may be overridden later. */
 { .name = "CACHEMAINT", .cp = 15, .crn = 7, .crm = CP_ANY,
   .opc1 = 0, .opc2 = CP_ANY, .access = PL1_W,
@@ -196,7 +214,8 @@ static const ARMCPRegInfo not_v7_cp_reginfo[] = {
   .resetvalue = 0 },
 /* v6 doesn't have the cache ID registers but Linux reads them anyway */
 { .name = "DUMMY", .cp = 15, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = CP_ANY,
-  .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },
+  .access = PL1_R, .type = ARM_CP_CONST | ARM_CP_NO_MIGRATE,
+  .resetvalue = 0 },
 REGINFO_SENTINEL
 };
 
@@ -235,6 +254,7 @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
 REGINFO_SENTINEL
 };
 
+
 static int pmreg_read(CPUARMState *env, const ARMCPRegInfo *ri,
   uint64_t *value)
 {
@@ -366,13 +386,16 @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
 { .name = "PMCNTENSET", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 1,
   .access = PL0_RW, .resetvalue = 0,
   .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcnten),
-  .readfn = pmreg_read, .writefn = pmcntenset_write },
+  .readfn = pmreg_read, .writefn = pmcntenset_write,
+  .raw_readfn = raw_read, .raw_writefn = raw_write },
 { .name = "PMCNTENCLR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 2,
   .access = PL0_RW, .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcnten),
-  .readfn = pmreg_read, .writefn = pmcntenclr_write },
+  .readfn = pmreg_read, .writefn = pmcntenclr_write,
+  .type = ARM_CP_NO_MIGRATE },
 { .name = "PMOVSR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 3,
   .access = PL0_RW, .fieldoffset = offsetof(CPUARMState, cp15.c9_pmovsr),
-  

[PATCH v2 7/7] target-arm: Use tuple list to sync cp regs with KVM

2013-06-03 Thread Peter Maydell
Use the tuple list of cp registers for syncing KVM state to QEMU,
rather than only syncing a very minimal set by hand.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
 target-arm/kvm.c |  103 +-
 1 file changed, 33 insertions(+), 70 deletions(-)

diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index f4a835d..5c91ab7 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -344,17 +344,6 @@ typedef struct Reg {
 offsetof(CPUARMState, QEMUFIELD) \
 }
 
-#define CP15REG(CRN, CRM, OPC1, OPC2, QEMUFIELD) \
-{\
-KVM_REG_ARM | KVM_REG_SIZE_U32 | \
-(15 << KVM_REG_ARM_COPROC_SHIFT) |   \
-((CRN) << KVM_REG_ARM_32_CRN_SHIFT) |\
-((CRM) << KVM_REG_ARM_CRM_SHIFT) |   \
-((OPC1) << KVM_REG_ARM_OPC1_SHIFT) | \
-((OPC2) << KVM_REG_ARM_32_OPC2_SHIFT),   \
-offsetof(CPUARMState, QEMUFIELD) \
-}
-
 #define VFPSYSREG(R)   \
 {  \
 KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_VFP | \
@@ -403,12 +392,6 @@ static const Reg regs[] = {
 COREREG(fiq_regs[7], banked_spsr[5]),
 /* R15 */
 COREREG(usr_regs.uregs[15], regs[15]),
-/* A non-comprehensive set of cp15 registers.
- * TODO: drive this from the cp_regs hashtable instead.
- */
-CP15REG(1, 0, 0, 0, cp15.c1_sys), /* SCTLR */
-CP15REG(2, 0, 0, 2, cp15.c2_control), /* TTBCR */
-CP15REG(3, 0, 0, 0, cp15.c3), /* DACR */
 /* VFP system registers */
 VFPSYSREG(FPSID),
 VFPSYSREG(MVFR1),
@@ -426,7 +409,6 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 int mode, bn;
 int ret, i;
 uint32_t cpsr, fpscr;
-uint64_t ttbr;
 
 /* Make sure the banked regs are properly set */
 mode = env->uncached_cpsr & CPSR_M;
@@ -460,26 +442,6 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return ret;
 }
 
-/* TTBR0: cp15 crm=2 opc1=0 */
-ttbr = ((uint64_t)env->cp15.c2_base0_hi << 32) | env->cp15.c2_base0;
-r.id = KVM_REG_ARM | KVM_REG_SIZE_U64 | (15 << KVM_REG_ARM_COPROC_SHIFT) |
-(2 << KVM_REG_ARM_CRM_SHIFT) | (0 << KVM_REG_ARM_OPC1_SHIFT);
-r.addr = (uintptr_t)(&ttbr);
-ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &r);
-if (ret) {
-return ret;
-}
-
-/* TTBR1: cp15 crm=2 opc1=1 */
-ttbr = ((uint64_t)env->cp15.c2_base1_hi << 32) | env->cp15.c2_base1;
-r.id = KVM_REG_ARM | KVM_REG_SIZE_U64 | (15 << KVM_REG_ARM_COPROC_SHIFT) |
-(2 << KVM_REG_ARM_CRM_SHIFT) | (1 << KVM_REG_ARM_OPC1_SHIFT);
-r.addr = (uintptr_t)(&ttbr);
-ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &r);
-if (ret) {
-return ret;
-}
-
 /* VFP registers */
 r.id = KVM_REG_ARM | KVM_REG_SIZE_U64 | KVM_REG_ARM_VFP;
 for (i = 0; i < 32; i++) {
@@ -496,6 +458,31 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 fpscr = vfp_get_fpscr(env);
 r.addr = (uintptr_t)&fpscr;
 ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &r);
+if (ret) {
+return ret;
+}
+
+/* Note that we do not call write_cpustate_to_list()
+ * here, so we are only writing the tuple list back to
+ * KVM. This is safe because nothing can change the
+ * CPUARMState cp15 fields (in particular gdb accesses cannot)
+ * and so there are no changes to sync. In fact syncing would
+ * be wrong at this point: for a constant register where TCG and
+ * KVM disagree about its value, the preceding write_list_to_cpustate()
+ * would not have had any effect on the CPUARMState value (since the
+ * register is read-only), and a write_cpustate_to_list() here would
+ * then try to write the TCG value back into KVM -- this would either
+ * fail or incorrectly change the value the guest sees.
+ *
+ * If we ever want to allow the user to modify cp15 registers via
+ * the gdb stub, we would need to be more clever here (for instance
+ * tracking the set of registers kvm_arch_get_registers() successfully
+ * managed to update the CPUARMState with, and only allowing those
+ * to be written back up into the kernel).
+ */
+if (!write_list_to_kvmstate(cpu)) {
+return EINVAL;
+}
 
 return ret;
 }
@@ -508,7 +495,6 @@ int kvm_arch_get_registers(CPUState *cs)
 int mode, bn;
 int ret, i;
 uint32_t cpsr, fpscr;
-uint64_t ttbr;
 
 for (i = 0; i < ARRAY_SIZE(regs); i++) {
 r.id = regs[i].id;
@@ -529,28 +515,6 @@ int kvm_arch_get_registers(CPUState *cs)
 }
 cpsr_write(env, cpsr, 0xffffffff);
 
-/* TTBR0: cp15 crm=2 opc1=0 */
-r.id = KVM_REG_ARM | KVM_REG_SIZE_U64 | (15 << KVM_REG_ARM_COPROC_SHIFT) |
-(2 << KVM_REG_ARM_CRM_SHIFT) | (0 << KVM_REG_ARM_OPC1_SHIFT);
-r.addr = (uintptr_t)(&ttbr);
-ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &r);
-if (ret) {
-return ret;
- 

Re: [PATCH RFC V9 2/19] x86/ticketlock: Don't inline _spin_unlock when using paravirt spinlocks

2013-06-03 Thread Konrad Rzeszutek Wilk
On Sun, Jun 02, 2013 at 12:52:09AM +0530, Raghavendra K T wrote:
 x86/ticketlock: Don't inline _spin_unlock when using paravirt spinlocks
 
 From: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 
 The code size expands somewhat, and it's better to just call
 a function rather than inline it.
 
 Thanks Jeremy for original version of ARCH_NOINLINE_SPIN_UNLOCK config patch,
 which is simplified.
 
 Suggested-by: Linus Torvalds torva...@linux-foundation.org
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com

Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 ---
  arch/x86/Kconfig |1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
 index 685692c..80fcc4b 100644
 --- a/arch/x86/Kconfig
 +++ b/arch/x86/Kconfig
 @@ -621,6 +621,7 @@ config PARAVIRT_DEBUG
  config PARAVIRT_SPINLOCKS
  bool "Paravirtualization layer for spinlocks"
  depends on PARAVIRT && SMP
 + select UNINLINE_SPIN_UNLOCK
   ---help---
 Paravirtualized spinlocks allow a pvops backend to replace the
 spinlock implementation with something virtualization-friendly
 


Re: [PATCH RFC V9 3/19] x86/ticketlock: Collapse a layer of functions

2013-06-03 Thread Konrad Rzeszutek Wilk
On Sun, Jun 02, 2013 at 12:52:29AM +0530, Raghavendra K T wrote:
 x86/ticketlock: Collapse a layer of functions
 
 From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
 
 Now that the paravirtualization layer doesn't exist at the spinlock
 level any more, we can collapse the __ticket_ functions into the arch_
 functions.
 
 Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
 Tested-by: Attilio Rao attilio@citrix.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com

Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 ---
  arch/x86/include/asm/spinlock.h |   35 +--
  1 file changed, 5 insertions(+), 30 deletions(-)
 
 diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
 index 4d54244..7442410 100644
 --- a/arch/x86/include/asm/spinlock.h
 +++ b/arch/x86/include/asm/spinlock.h
 @@ -76,7 +76,7 @@ static __always_inline void __ticket_unlock_kick(struct 
 arch_spinlock *lock,
  * in the high part, because a wide xadd increment of the low part would carry
   * up and contaminate the high part.
   */
 -static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
 +static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
  {
   register struct __raw_tickets inc = { .tail = 1 };
  
 @@ -96,7 +96,7 @@ static __always_inline void __ticket_spin_lock(struct 
 arch_spinlock *lock)
  out: barrier();  /* make sure nothing creeps before the lock is taken */
  }
  
 -static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 +static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
  {
   arch_spinlock_t old, new;
  
 @@ -110,7 +110,7 @@ static __always_inline int 
 __ticket_spin_trylock(arch_spinlock_t *lock)
  return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
  }
  
 -static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 +static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
  {
  __ticket_t next = lock->tickets.head + 1;
  
 @@ -118,46 +118,21 @@ static __always_inline void 
 __ticket_spin_unlock(arch_spinlock_t *lock)
   __ticket_unlock_kick(lock, next);
  }
  
 -static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
 +static inline int arch_spin_is_locked(arch_spinlock_t *lock)
  {
  struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
  
   return tmp.tail != tmp.head;
  }
  
 -static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
 +static inline int arch_spin_is_contended(arch_spinlock_t *lock)
  {
  struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
  
  return (__ticket_t)(tmp.tail - tmp.head) > 1;
  }
 -
 -static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 -{
 - return __ticket_spin_is_locked(lock);
 -}
 -
 -static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 -{
 - return __ticket_spin_is_contended(lock);
 -}
  #define arch_spin_is_contended   arch_spin_is_contended
  
 -static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
 -{
 - __ticket_spin_lock(lock);
 -}
 -
 -static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 -{
 - return __ticket_spin_trylock(lock);
 -}
 -
 -static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 -{
 - __ticket_spin_unlock(lock);
 -}
 -
  static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
 unsigned long flags)
  {
 


Re: [PATCH] KVM: fix sil/dil/bpl/spl in the mod/rm fields

2013-06-03 Thread Avi Kivity
On Thu, May 30, 2013 at 7:34 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 30/05/2013 17:34, Paolo Bonzini ha scritto:
 Il 30/05/2013 16:35, Paolo Bonzini ha scritto:
 The x86-64 extended low-byte registers were fetched correctly from reg,
 but not from mod/rm.

 This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
 not enough.

 Well, it is enough but it takes 2 minutes to reach the point where
 hardware virtualization is used.  It is doing a lot of stuff in
 emulation mode because FS and GS have leftovers from the A20 test:

 FS =   9300 DPL=0 DS16 [-WA]
 GS = 0000  9300 DPL=0 DS16 [-WA]

 0x000113be:  in     $0x92,%al
 0x000113c0:  or     $0x2,%al
 0x000113c2:  out    %al,$0x92
 0x000113c4:  xor    %ax,%ax
 0x000113c6:  mov    %ax,%fs
 0x000113c8:  dec    %ax
 0x000113c9:  mov    %ax,%gs
 0x000113cb:  inc    %ax
 0x000113cc:  mov    %ax,%fs:0x200
 0x000113d0:  cmp    %gs:0x210,%ax
 0x000113d5:  je     0x113cb

 The DPL < RPL test fails. Any ideas? Should we introduce a new
 intermediate value for emulate_invalid_guest_state (0=none, 1=some, 2=full)?

 One idea could be to replace invalid descriptors with NULL ones.  Then
 you can intercept this in the #GP handler and trigger emulation for that
 instruction only.

Won't work, vmx won't let you enter in such a configuration.

Maybe you can detect the exact code sequence (%eip, some instructions,
register state) and clear %fs and %gs.


Re: [PATCH RFC V9 8/19] x86/pvticketlock: When paravirtualizing ticket locks, increment by 2

2013-06-03 Thread Konrad Rzeszutek Wilk
On Sun, Jun 02, 2013 at 12:54:02AM +0530, Raghavendra K T wrote:
 x86/pvticketlock: When paravirtualizing ticket locks, increment by 2
 
 From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
 
 Increment ticket head/tails by 2 rather than 1 to leave the LSB free
 to store a "is in slowpath state" bit.  This halves the number
 of possible CPUs for a given ticket size, but this shouldn't matter
 in practice - kernels built for 32k+ CPU systems are probably
 specially built for the hardware rather than a generic distro
 kernel.
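To illustrate the resulting encoding (TICKET_SLOWPATH_FLAG is the name used
later in this series; the helpers are only a sketch):

#define TICKET_SLOWPATH_FLAG	((__ticket_t)1)	/* the freed-up LSB */

/* With TICKET_LOCK_INC == 2, head/tail advance in steps of two, so bit 0
 * never carries ticket information and can hold the slowpath flag.
 */
static inline bool sketch_in_slowpath(__ticket_t tail)
{
	return tail & TICKET_SLOWPATH_FLAG;
}

static inline __ticket_t sketch_ticket_number(__ticket_t t)
{
	return t & ~TICKET_SLOWPATH_FLAG;
}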
 
 Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
 Tested-by: Attilio Rao attilio@citrix.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com

Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 ---
  arch/x86/include/asm/spinlock.h   |   10 +-
  arch/x86/include/asm/spinlock_types.h |   10 +-
  2 files changed, 14 insertions(+), 6 deletions(-)
 
 diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
 index 7442410..04a5cd5 100644
 --- a/arch/x86/include/asm/spinlock.h
 +++ b/arch/x86/include/asm/spinlock.h
 @@ -78,7 +78,7 @@ static __always_inline void __ticket_unlock_kick(struct 
 arch_spinlock *lock,
   */
  static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
  {
 - register struct __raw_tickets inc = { .tail = 1 };
 + register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
  
  inc = xadd(&lock->tickets, inc);
  
 @@ -104,7 +104,7 @@ static __always_inline int 
 arch_spin_trylock(arch_spinlock_t *lock)
   if (old.tickets.head != old.tickets.tail)
   return 0;
  
 - new.head_tail = old.head_tail + (1 << TICKET_SHIFT);
 + new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
  
   /* cmpxchg is a full barrier, so nothing can move before it */
  return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
 @@ -112,9 +112,9 @@ static __always_inline int 
 arch_spin_trylock(arch_spinlock_t *lock)
  
  static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
  {
 - __ticket_t next = lock->tickets.head + 1;
 + __ticket_t next = lock->tickets.head + TICKET_LOCK_INC;
  
 - __add(&lock->tickets.head, 1, UNLOCK_LOCK_PREFIX);
 + __add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
   __ticket_unlock_kick(lock, next);
  }
  
 @@ -129,7 +129,7 @@ static inline int arch_spin_is_contended(arch_spinlock_t 
 *lock)
  {
  struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
  
 - return (__ticket_t)(tmp.tail - tmp.head) > 1;
 + return (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC;
  }
  #define arch_spin_is_contended   arch_spin_is_contended
  
 diff --git a/arch/x86/include/asm/spinlock_types.h 
 b/arch/x86/include/asm/spinlock_types.h
 index 83fd3c7..e96fcbd 100644
 --- a/arch/x86/include/asm/spinlock_types.h
 +++ b/arch/x86/include/asm/spinlock_types.h
 @@ -3,7 +3,13 @@
  
  #include <linux/types.h>
  
 -#if (CONFIG_NR_CPUS < 256)
 +#ifdef CONFIG_PARAVIRT_SPINLOCKS
 +#define __TICKET_LOCK_INC 2
 +#else
 +#define __TICKET_LOCK_INC 1
 +#endif
 +
 +#if (CONFIG_NR_CPUS < (256 / __TICKET_LOCK_INC))
  typedef u8  __ticket_t;
  typedef u16 __ticketpair_t;
  #else
 @@ -11,6 +17,8 @@ typedef u16 __ticket_t;
  typedef u32 __ticketpair_t;
  #endif
  
 +#define TICKET_LOCK_INC  ((__ticket_t)__TICKET_LOCK_INC)
 +
  #define TICKET_SHIFT (sizeof(__ticket_t) * 8)
  
  typedef struct arch_spinlock {
 


Re: [PATCH RFC V9 9/19] Split out rate limiting from jump_label.h

2013-06-03 Thread Konrad Rzeszutek Wilk
On Sun, Jun 02, 2013 at 12:54:22AM +0530, Raghavendra K T wrote:
 Split jumplabel ratelimit

I would change the title a bit, perhaps prefix it with: jump_label: 
 
 From: Andrew Jones drjo...@redhat.com
 
 Commit b202952075f62603bea9bfb6ebc6b0420db11949 introduced rate limiting

Also please add right after the git id this:

(perf, core: Rate limit perf_sched_events jump_label patching)

 for jump label disabling. The changes were made in the jump label code
 in order to be more widely available and to keep things tidier. This is
 all fine, except now jump_label.h includes linux/workqueue.h, which
 makes it impossible to include jump_label.h from anything that
 workqueue.h needs. For example, it's now impossible to include
 jump_label.h from asm/spinlock.h, which is done in proposed
 pv-ticketlock patches. This patch splits out the rate limiting related
 changes from jump_label.h into a new file, jump_label_ratelimit.h, to
 resolve the issue.
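As an illustration of the dependency problem being solved (the chain shown
here is a simplification, not taken from the patch):

/* Before the split:
 *   asm/spinlock.h -> linux/jump_label.h -> linux/workqueue.h,
 * and workqueue.h itself ultimately needs spinlocks -- a cycle.
 *
 * After the split, plain static keys stay lightweight; only users of the
 * deferred/rate-limited keys pull in the workqueue dependency:
 */
#include <linux/jump_label.h>		/* static_key only */
#include <linux/jump_label_ratelimit.h>	/* static_key_deferred + workqueue */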
 
 Signed-off-by: Andrew Jones drjo...@redhat.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com

Otherwise looks fine to me:

Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 ---
  include/linux/jump_label.h   |   26 +-
  include/linux/jump_label_ratelimit.h |   34 ++
  include/linux/perf_event.h   |1 +
  kernel/jump_label.c  |1 +
  4 files changed, 37 insertions(+), 25 deletions(-)
  create mode 100644 include/linux/jump_label_ratelimit.h
 
 diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
 index 0976fc4..53cdf89 100644
 --- a/include/linux/jump_label.h
 +++ b/include/linux/jump_label.h
 @@ -48,7 +48,6 @@
  
  #include <linux/types.h>
  #include <linux/compiler.h>
 -#include <linux/workqueue.h>
  
  #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
  
 @@ -61,12 +60,6 @@ struct static_key {
  #endif
  };
  
 -struct static_key_deferred {
 - struct static_key key;
 - unsigned long timeout;
 - struct delayed_work work;
 -};
 -
  # include <asm/jump_label.h>
  # define HAVE_JUMP_LABEL
  #endif   /* CC_HAVE_ASM_GOTO && CONFIG_JUMP_LABEL */
 @@ -119,10 +112,7 @@ extern void arch_jump_label_transform_static(struct 
 jump_entry *entry,
  extern int jump_label_text_reserved(void *start, void *end);
  extern void static_key_slow_inc(struct static_key *key);
  extern void static_key_slow_dec(struct static_key *key);
 -extern void static_key_slow_dec_deferred(struct static_key_deferred *key);
  extern void jump_label_apply_nops(struct module *mod);
 -extern void
 -jump_label_rate_limit(struct static_key_deferred *key, unsigned long rl);
  
  #define STATIC_KEY_INIT_TRUE ((struct static_key) \
   { .enabled = ATOMIC_INIT(1), .entries = (void *)1 })
 @@ -141,10 +131,6 @@ static __always_inline void jump_label_init(void)
  {
  }
  
 -struct static_key_deferred {
 - struct static_key  key;
 -};
 -
  static __always_inline bool static_key_false(struct static_key *key)
  {
  if (unlikely(atomic_read(&key->enabled)) > 0)
 @@ -169,11 +155,6 @@ static inline void static_key_slow_dec(struct static_key 
 *key)
  atomic_dec(&key->enabled);
  }
  
 -static inline void static_key_slow_dec_deferred(struct static_key_deferred *key)
 -{
 - static_key_slow_dec(&key->key);
 -}
 -
  static inline int jump_label_text_reserved(void *start, void *end)
  {
   return 0;
 @@ -187,12 +168,6 @@ static inline int jump_label_apply_nops(struct module 
 *mod)
   return 0;
  }
  
 -static inline void
 -jump_label_rate_limit(struct static_key_deferred *key,
 - unsigned long rl)
 -{
 -}
 -
  #define STATIC_KEY_INIT_TRUE ((struct static_key) \
   { .enabled = ATOMIC_INIT(1) })
  #define STATIC_KEY_INIT_FALSE ((struct static_key) \
 @@ -203,6 +178,7 @@ jump_label_rate_limit(struct static_key_deferred *key,
  #define STATIC_KEY_INIT STATIC_KEY_INIT_FALSE
  #define jump_label_enabled static_key_enabled
  
 +static inline int atomic_read(const atomic_t *v);
  static inline bool static_key_enabled(struct static_key *key)
  {
  return (atomic_read(&key->enabled) > 0);
 diff --git a/include/linux/jump_label_ratelimit.h 
 b/include/linux/jump_label_ratelimit.h
 new file mode 100644
 index 0000000..1137883
 --- /dev/null
 +++ b/include/linux/jump_label_ratelimit.h
 @@ -0,0 +1,34 @@
 +#ifndef _LINUX_JUMP_LABEL_RATELIMIT_H
 +#define _LINUX_JUMP_LABEL_RATELIMIT_H
 +
 +#include <linux/jump_label.h>
 +#include <linux/workqueue.h>
 +
 +#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
 +struct static_key_deferred {
 + struct static_key key;
 + unsigned long timeout;
 + struct delayed_work work;
 +};
 +#endif
 +
 +#ifdef HAVE_JUMP_LABEL
 +extern void static_key_slow_dec_deferred(struct static_key_deferred *key);
 +extern void
 +jump_label_rate_limit(struct static_key_deferred *key, unsigned long rl);
 +
 +#else/* !HAVE_JUMP_LABEL */
 +struct static_key_deferred {
 + struct static_key  key;
 

Re: [PATCH RFC V9 12/19] xen: Enable PV ticketlocks on HVM Xen

2013-06-03 Thread Konrad Rzeszutek Wilk
On Sun, Jun 02, 2013 at 12:55:03AM +0530, Raghavendra K T wrote:
 xen: Enable PV ticketlocks on HVM Xen

There is more to it. You should also revert 
70dd4998cb85f0ecd6ac892cc7232abefa432efb

 
 From: Stefano Stabellini stefano.stabell...@eu.citrix.com
 
 Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
 Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
  arch/x86/xen/smp.c |1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
 index dcdc91c..8d2abf7 100644
 --- a/arch/x86/xen/smp.c
 +++ b/arch/x86/xen/smp.c
 @@ -682,4 +682,5 @@ void __init xen_hvm_smp_init(void)
   smp_ops.cpu_die = xen_hvm_cpu_die;
   smp_ops.send_call_func_ipi = xen_smp_send_call_function_ipi;
   smp_ops.send_call_func_single_ipi = 
 xen_smp_send_call_function_single_ipi;
 + xen_init_spinlocks();
  }
 


Re: [PATCH RFC V9 16/19] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-06-03 Thread Konrad Rzeszutek Wilk
On Sun, Jun 02, 2013 at 12:55:57AM +0530, Raghavendra K T wrote:
 kvm : Paravirtual ticketlocks support for linux guests running on KVM 
 hypervisor
 
 From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
 
 During smp_boot_cpus, a paravirtualized KVM guest detects whether the
 hypervisor has the required feature (KVM_FEATURE_PV_UNHALT) to support
 pv-ticketlocks. If so, support for pv-ticketlocks is registered via
 pv_lock_ops.
 
 Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu.
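The registration flow amounts to the following sketch (the actual
kvm_spinlock_init() in this patch is longer; take this as an approximation
of the hook assignments):

static void __init sketch_kvm_spinlock_init(void)
{
	if (!kvm_para_available())
		return;
	/* Only hook pv_lock_ops if the host advertises the feature */
	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
		return;

	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
	pv_lock_ops.unlock_kick = kvm_unlock_kick;	/* KVM_HC_KICK_CPU */
}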
 
 Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
 Signed-off-by: Suzuki Poulose suz...@in.ibm.com
 [Raghu: check_zero race fix, enum for kvm_contention_stat
 jumplabel related changes ]
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
  arch/x86/include/asm/kvm_para.h |   14 ++
  arch/x86/kernel/kvm.c   |  256 +++
  2 files changed, 268 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
 index 695399f..427afcb 100644
 --- a/arch/x86/include/asm/kvm_para.h
 +++ b/arch/x86/include/asm/kvm_para.h
 @@ -118,10 +118,20 @@ void kvm_async_pf_task_wait(u32 token);
  void kvm_async_pf_task_wake(u32 token);
  u32 kvm_read_and_reset_pf_reason(void);
  extern void kvm_disable_steal_time(void);
 -#else
 -#define kvm_guest_init() do { } while (0)
 +
 +#ifdef CONFIG_PARAVIRT_SPINLOCKS
 +void __init kvm_spinlock_init(void);
 +#else /* !CONFIG_PARAVIRT_SPINLOCKS */
 +static inline void kvm_spinlock_init(void)
 +{
 +}
 +#endif /* CONFIG_PARAVIRT_SPINLOCKS */
 +
 +#else /* CONFIG_KVM_GUEST */
 +#define kvm_guest_init() do {} while (0)
  #define kvm_async_pf_task_wait(T) do {} while(0)
  #define kvm_async_pf_task_wake(T) do {} while(0)
 +
  static inline u32 kvm_read_and_reset_pf_reason(void)
  {
   return 0;
 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
 index cd6d9a5..2715b92 100644
 --- a/arch/x86/kernel/kvm.c
 +++ b/arch/x86/kernel/kvm.c
 @@ -34,6 +34,7 @@
  #include <linux/sched.h>
  #include <linux/slab.h>
  #include <linux/kprobes.h>
 +#include <linux/debugfs.h>
  #include <asm/timer.h>
  #include <asm/cpu.h>
  #include <asm/traps.h>
 @@ -419,6 +420,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
  WARN_ON(kvm_register_clock("primary cpu clock"));
   kvm_guest_cpu_init();
   native_smp_prepare_boot_cpu();
 + kvm_spinlock_init();
  }
  
  static void __cpuinit kvm_guest_cpu_online(void *dummy)
 @@ -523,3 +525,257 @@ static __init int activate_jump_labels(void)
   return 0;
  }
  arch_initcall(activate_jump_labels);
 +
 +/* Kick a cpu by its apicid. Used to wake up a halted vcpu */
 +void kvm_kick_cpu(int cpu)
 +{
 + int apicid;
 +
 + apicid = per_cpu(x86_cpu_to_apicid, cpu);
 + kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
 +}
 +
 +#ifdef CONFIG_PARAVIRT_SPINLOCKS
 +
 +enum kvm_contention_stat {
 + TAKEN_SLOW,
 + TAKEN_SLOW_PICKUP,
 + RELEASED_SLOW,
 + RELEASED_SLOW_KICKED,
 + NR_CONTENTION_STATS
 +};
 +
 +#ifdef CONFIG_KVM_DEBUG_FS
 +#define HISTO_BUCKETS 30
 +
 +static struct kvm_spinlock_stats
 +{
 + u32 contention_stats[NR_CONTENTION_STATS];
 + u32 histo_spin_blocked[HISTO_BUCKETS+1];
 + u64 time_blocked;
 +} spinlock_stats;
 +
 +static u8 zero_stats;
 +
 +static inline void check_zero(void)
 +{
 + u8 ret;
 + u8 old;
 +
 + old = ACCESS_ONCE(zero_stats);
 + if (unlikely(old)) {
 + ret = cmpxchg(&zero_stats, old, 0);
 + /* This ensures only one fellow resets the stat */
 + if (ret == old)
 + memset(&spinlock_stats, 0, sizeof(spinlock_stats));
 + }
 +}
 +
 +static inline void add_stats(enum kvm_contention_stat var, u32 val)
 +{
 + check_zero();
 + spinlock_stats.contention_stats[var] += val;
 +}
 +
 +
 +static inline u64 spin_time_start(void)
 +{
 + return sched_clock();
 +}
 +
 +static void __spin_time_accum(u64 delta, u32 *array)
 +{
 + unsigned index;
 +
 + index = ilog2(delta);
 + check_zero();
 +
 + if (index < HISTO_BUCKETS)
 + array[index]++;
 + else
 + array[HISTO_BUCKETS]++;
 +}
 +
 +static inline void spin_time_accum_blocked(u64 start)
 +{
 + u32 delta;
 +
 + delta = sched_clock() - start;
 + __spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
 + spinlock_stats.time_blocked += delta;
 +}
 +
 +static struct dentry *d_spin_debug;
 +static struct dentry *d_kvm_debug;
 +
 +struct dentry *kvm_init_debugfs(void)
 +{
 + d_kvm_debug = debugfs_create_dir("kvm", NULL);
 + if (!d_kvm_debug)
 + printk(KERN_WARNING "Could not create 'kvm' debugfs directory\n");
 +
 + return d_kvm_debug;
 +}
 +
 +static int __init kvm_spinlock_debugfs(void)
 +{
 + struct dentry *d_kvm;
 +
 + d_kvm = kvm_init_debugfs();
 + if (d_kvm == NULL)
 + return -ENOMEM;
 +
 + d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
 +
 + 

Re: [PATCH RFC V9 5/19] xen/pvticketlock: Xen implementation for PV ticket locks

2013-06-03 Thread Konrad Rzeszutek Wilk
On Sat, Jun 01, 2013 at 12:23:14PM -0700, Raghavendra K T wrote:
 xen/pvticketlock: Xen implementation for PV ticket locks
 
 From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
 
 Replace the old Xen implementation of PV spinlocks with an implementation
 of xen_lock_spinning and xen_unlock_kick.
 
 xen_lock_spinning simply registers the cpu in its entry in lock_waiting,
 adds itself to the waiting_cpus set, and blocks on an event channel
 until the channel becomes pending.
 
 xen_unlock_kick searches the cpus in waiting_cpus looking for the one
 which next wants this lock with the next ticket, if any.  If found,
 it kicks it by making its event channel pending, which wakes it up.
 
 We need to make sure interrupts are disabled while we're relying on the
 contents of the per-cpu lock_waiting values, otherwise an interrupt
 handler could come in, try to take some other lock, block, and overwrite
 our values.
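Condensed, the waiting side described above looks roughly like this (per-cpu
setup, stats and the ticket re-check are trimmed; see the diff below for the
real code):

static void sketch_xen_lock_spinning(struct arch_spinlock *lock,
				     __ticket_t want)
{
	int irq = __this_cpu_read(lock_kicker_irq);
	struct xen_lock_waiting *w = &__get_cpu_var(lock_waiting);
	unsigned long flags;

	local_irq_save(flags);		/* protect the per-cpu slot */
	w->want = want;			/* which ticket we are waiting for */
	w->lock = lock;			/* which lock it belongs to */
	cpumask_set_cpu(smp_processor_id(), &waiting_cpus);

	xen_poll_irq(irq);		/* block until the event channel fires */

	cpumask_clear_cpu(smp_processor_id(), &waiting_cpus);
	w->lock = NULL;
	local_irq_restore(flags);
}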
 
 Raghu: use function + enum instead of macro, cmpxchg for zero status reset
 
 Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
 Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
  arch/x86/xen/spinlock.c |  347 +++
  1 file changed, 78 insertions(+), 269 deletions(-)
 
 diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
 index d6481a9..860e190 100644
 --- a/arch/x86/xen/spinlock.c
 +++ b/arch/x86/xen/spinlock.c
 @@ -16,45 +16,44 @@
  #include xen-ops.h
  #include debugfs.h
  
 -#ifdef CONFIG_XEN_DEBUG_FS
 -static struct xen_spinlock_stats
 -{
 - u64 taken;
 - u32 taken_slow;
 - u32 taken_slow_nested;
 - u32 taken_slow_pickup;
 - u32 taken_slow_spurious;
 - u32 taken_slow_irqenable;
 +enum xen_contention_stat {
 + TAKEN_SLOW,
 + TAKEN_SLOW_PICKUP,
 + TAKEN_SLOW_SPURIOUS,
 + RELEASED_SLOW,
 + RELEASED_SLOW_KICKED,
 + NR_CONTENTION_STATS
 +};
  
 - u64 released;
 - u32 released_slow;
 - u32 released_slow_kicked;
  
 +#ifdef CONFIG_XEN_DEBUG_FS
  #define HISTO_BUCKETS 30
 - u32 histo_spin_total[HISTO_BUCKETS+1];
 - u32 histo_spin_spinning[HISTO_BUCKETS+1];
 +static struct xen_spinlock_stats
 +{
 + u32 contention_stats[NR_CONTENTION_STATS];
   u32 histo_spin_blocked[HISTO_BUCKETS+1];
 -
 - u64 time_total;
 - u64 time_spinning;
   u64 time_blocked;
  } spinlock_stats;
  
  static u8 zero_stats;
  
 -static unsigned lock_timeout = 1 << 10;
 -#define TIMEOUT lock_timeout
 -
  static inline void check_zero(void)
  {
 - if (unlikely(zero_stats)) {
 - memset(&spinlock_stats, 0, sizeof(spinlock_stats));
 - zero_stats = 0;
 + u8 ret;
 + u8 old = ACCESS_ONCE(zero_stats);
 + if (unlikely(old)) {
 + ret = cmpxchg(&zero_stats, old, 0);
 + /* This ensures only one fellow resets the stat */
 + if (ret == old)
 + memset(&spinlock_stats, 0, sizeof(spinlock_stats));
   }
  }
  
 -#define ADD_STATS(elem, val) \
 - do { check_zero(); spinlock_stats.elem += (val); } while(0)
 +static inline void add_stats(enum xen_contention_stat var, u32 val)
 +{
 + check_zero();
 + spinlock_stats.contention_stats[var] += val;
 +}
  
  static inline u64 spin_time_start(void)
  {
 @@ -73,22 +72,6 @@ static void __spin_time_accum(u64 delta, u32 *array)
   array[HISTO_BUCKETS]++;
  }
  
 -static inline void spin_time_accum_spinning(u64 start)
 -{
 - u32 delta = xen_clocksource_read() - start;
 -
 - __spin_time_accum(delta, spinlock_stats.histo_spin_spinning);
 - spinlock_stats.time_spinning += delta;
 -}
 -
 -static inline void spin_time_accum_total(u64 start)
 -{
 - u32 delta = xen_clocksource_read() - start;
 -
 - __spin_time_accum(delta, spinlock_stats.histo_spin_total);
 - spinlock_stats.time_total += delta;
 -}
 -
  static inline void spin_time_accum_blocked(u64 start)
  {
   u32 delta = xen_clocksource_read() - start;
 @@ -98,19 +81,15 @@ static inline void spin_time_accum_blocked(u64 start)
  }
  #else  /* !CONFIG_XEN_DEBUG_FS */
 #define TIMEOUT  (1 << 10)
 -#define ADD_STATS(elem, val) do { (void)(val); } while(0)
 +static inline void add_stats(enum xen_contention_stat var, u32 val)
 +{
 +}
  
  static inline u64 spin_time_start(void)
  {
   return 0;
  }
  
 -static inline void spin_time_accum_total(u64 start)
 -{
 -}
 -static inline void spin_time_accum_spinning(u64 start)
 -{
 -}
  static inline void spin_time_accum_blocked(u64 start)
  {
  }
 @@ -133,229 +112,82 @@ typedef u16 xen_spinners_t;
  asm(LOCK_PREFIX " decw %0" : "+m" ((xl)->spinners) : : "memory");
  #endif
  
 -struct xen_spinlock {
 - unsigned char lock; /* 0 -> free; 1 -> locked */
 - xen_spinners_t spinners; /* count of waiting cpus */
 +struct xen_lock_waiting {
 + struct arch_spinlock *lock;
 + 

Re: [PATCH RFC V9 19/19] kvm hypervisor: Add directed yield in vcpu block path

2013-06-03 Thread Konrad Rzeszutek Wilk
On Sun, Jun 02, 2013 at 12:56:45AM +0530, Raghavendra K T wrote:
 kvm hypervisor: Add directed yield in vcpu block path
 
 From: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 
 We use the improved PLE handler logic in vcpu block patch for
 scheduling rather than plain schedule, so that we can make
 intelligent decisions

You are missing '.' there, and

 
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
  arch/ia64/include/asm/kvm_host.h|5 +
  arch/powerpc/include/asm/kvm_host.h |5 +
  arch/s390/include/asm/kvm_host.h|5 +
  arch/x86/include/asm/kvm_host.h |2 +-
  arch/x86/kvm/x86.c  |8 
  include/linux/kvm_host.h|2 +-
  virt/kvm/kvm_main.c |6 --
  7 files changed, 29 insertions(+), 4 deletions(-)
 
 diff --git a/arch/ia64/include/asm/kvm_host.h 
 b/arch/ia64/include/asm/kvm_host.h
 index 989dd3f..999ab15 100644
 --- a/arch/ia64/include/asm/kvm_host.h
 +++ b/arch/ia64/include/asm/kvm_host.h
 @@ -595,6 +595,11 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu);
  int kvm_pal_emul(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
  void kvm_sal_emul(struct kvm_vcpu *vcpu);
  
 +static inline void kvm_do_schedule(struct kvm_vcpu *vcpu)
 +{
 + schedule();
 +}
 +
  #define __KVM_HAVE_ARCH_VM_ALLOC 1
  struct kvm *kvm_arch_alloc_vm(void);
  void kvm_arch_free_vm(struct kvm *kvm);
 diff --git a/arch/powerpc/include/asm/kvm_host.h 
 b/arch/powerpc/include/asm/kvm_host.h
 index af326cd..1aeecc0 100644
 --- a/arch/powerpc/include/asm/kvm_host.h
 +++ b/arch/powerpc/include/asm/kvm_host.h
 @@ -628,4 +628,9 @@ struct kvm_vcpu_arch {
  #define __KVM_HAVE_ARCH_WQP
  #define __KVM_HAVE_CREATE_DEVICE
  
 +static inline void kvm_do_schedule(struct kvm_vcpu *vcpu)
 +{
 + schedule();
 +}
 +
  #endif /* __POWERPC_KVM_HOST_H__ */
 diff --git a/arch/s390/include/asm/kvm_host.h 
 b/arch/s390/include/asm/kvm_host.h
 index 16bd5d1..db09a56 100644
 --- a/arch/s390/include/asm/kvm_host.h
 +++ b/arch/s390/include/asm/kvm_host.h
 @@ -266,4 +266,9 @@ struct kvm_arch{
  };
  
  extern int sie64a(struct kvm_s390_sie_block *, u64 *);
 +static inline void kvm_do_schedule(struct kvm_vcpu *vcpu)
 +{
 + schedule();
 +}
 +
  #endif
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index 95702de..72ff791 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -1042,5 +1042,5 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct 
 msr_data *msr_info);
  int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
  void kvm_handle_pmu_event(struct kvm_vcpu *vcpu);
  void kvm_deliver_pmi(struct kvm_vcpu *vcpu);
 -
 +void kvm_do_schedule(struct kvm_vcpu *vcpu);
  #endif /* _ASM_X86_KVM_HOST_H */
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index b963c86..d26c4be 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -7281,6 +7281,14 @@ bool kvm_arch_can_inject_async_page_present(struct 
 kvm_vcpu *vcpu)
  kvm_x86_ops->interrupt_allowed(vcpu);
  }
  
 +void kvm_do_schedule(struct kvm_vcpu *vcpu)
 +{
 + /* We try to yield to a kikced vcpu else do a schedule */

s/kikced/kicked/

 + if (kvm_vcpu_on_spin(vcpu) <= 0)
 + schedule();
 +}
 +EXPORT_SYMBOL_GPL(kvm_do_schedule);
 +
  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index f0eea07..39efc18 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -565,7 +565,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct 
 kvm_memory_slot *memslot,
  void kvm_vcpu_block(struct kvm_vcpu *vcpu);
  void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
  bool kvm_vcpu_yield_to(struct kvm_vcpu *target);
 -void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
 +bool kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
  void kvm_resched(struct kvm_vcpu *vcpu);
  void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
  void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 302681c..8387247 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -1685,7 +1685,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
   if (signal_pending(current))
   break;
  
 - schedule();
 + kvm_do_schedule(vcpu);
   }
  
  finish_wait(&vcpu->wq, &wait);
 @@ -1786,7 +1786,7 @@ bool kvm_vcpu_eligible_for_directed_yield(struct 
 kvm_vcpu *vcpu)
  }
  #endif
  
 -void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 +bool kvm_vcpu_on_spin(struct kvm_vcpu *me)
  {
  struct kvm *kvm = me->kvm;
   struct kvm_vcpu *vcpu;
 @@ -1835,6 +1835,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
  
   /* Ensure vcpu is not eligible during next spinloop */
   kvm_vcpu_set_dy_eligible(me, false);
 +
 + return yielded;
  }
  

Re: [PATCH RFC V9 18/19] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock

2013-06-03 Thread Konrad Rzeszutek Wilk
On Sun, Jun 02, 2013 at 12:56:24AM +0530, Raghavendra K T wrote:
 Documentation/kvm : Add documentation on Hypercalls and features used for PV 
 spinlock
 
 From: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 
 KVM_HC_KICK_CPU  hypercall added to wakeup halted vcpu in paravirtual spinlock
 enabled guest.
 
 KVM_FEATURE_PV_UNHALT enables guest to check whether pv spinlock can be
 enabled in guest.
 
 Thanks Vatsa for rewriting KVM_HC_KICK_CPU
 
 Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
  Documentation/virtual/kvm/cpuid.txt  |4 
  Documentation/virtual/kvm/hypercalls.txt |   13 +
  2 files changed, 17 insertions(+)
 
 diff --git a/Documentation/virtual/kvm/cpuid.txt 
 b/Documentation/virtual/kvm/cpuid.txt
 index 83afe65..654f43c 100644
 --- a/Documentation/virtual/kvm/cpuid.txt
 +++ b/Documentation/virtual/kvm/cpuid.txt
  KVM_FEATURE_CLOCKSOURCE2   || 3 || kvmclock available at msrs
  KVM_FEATURE_ASYNC_PF   || 4 || async pf can be enabled by
 ||   || writing to msr 0x4b564d02
  
 --
 +KVM_FEATURE_PV_UNHALT  || 6 || guest checks this feature bit
 +   ||   || before enabling paravirtualized
 +   ||   || spinlock support.
 +--
  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side
 ||   || per-cpu warps are expected in
 ||   || kvmclock.
 diff --git a/Documentation/virtual/kvm/hypercalls.txt 
 b/Documentation/virtual/kvm/hypercalls.txt
 index ea113b5..2a4da11 100644
 --- a/Documentation/virtual/kvm/hypercalls.txt
 +++ b/Documentation/virtual/kvm/hypercalls.txt
 @@ -64,3 +64,16 @@ Purpose: To enable communication between the hypervisor 
 and guest there is a
  shared page that contains parts of supervisor visible register state.
  The guest can map this shared page to access its supervisor register through
  memory using this hypercall.
 +
 +5. KVM_HC_KICK_CPU
 +
 +Architecture: x86
 +Status: active
 +Purpose: Hypercall used to wakeup a vcpu from HLT state
 +Usage example : A vcpu of a paravirtualized guest that is busywaiting in guest
 +kernel mode for an event to occur (ex: a spinlock to become available) can
 +execute HLT instruction once it has busy-waited for more than a threshold
 +time-interval. Execution of HLT instruction would cause the hypervisor to put
 +the vcpu to sleep until occurrence of an appropriate event. Another vcpu of the
 +same guest can wakeup the sleeping vcpu by issuing KVM_HC_KICK_CPU hypercall,
 +specifying APIC ID of the vcpu to be wokenup.

woken up.
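To make the documented flow concrete, a small sketch (kvm_kick_cpu() mirrors
patch 16/19 of this series; the waiter-side predicates are invented for the
example):

/* Kicker side: wake the halted vcpu identified by its APIC ID. */
static void sketch_kick(int cpu)
{
	int apicid = per_cpu(x86_cpu_to_apicid, cpu);

	kvm_hypercall1(KVM_HC_KICK_CPU, apicid);
}

/* Waiter side: spin up to a threshold, then HLT until kicked. */
static void sketch_wait(void)
{
	while (!lock_is_free()) {		/* invented predicate */
		if (spun_past_threshold()) {	/* invented predicate */
			safe_halt();		/* HLT with irqs enabled */
			break;
		}
		cpu_relax();
	}
}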
 


Re: [PATCH] KVM: fix sil/dil/bpl/spl in the mod/rm fields

2013-06-03 Thread Gleb Natapov
On Mon, Jun 03, 2013 at 06:42:11PM +0300, Avi Kivity wrote:
 On Thu, May 30, 2013 at 7:34 PM, Paolo Bonzini pbonz...@redhat.com wrote:
  Il 30/05/2013 17:34, Paolo Bonzini ha scritto:
  Il 30/05/2013 16:35, Paolo Bonzini ha scritto:
  The x86-64 extended low-byte registers were fetched correctly from reg,
  but not from mod/rm.
 
  This fixes another bug in the boot of RHEL5.9 64-bit, but it is still
  not enough.
 
  Well, it is enough but it takes 2 minutes to reach the point where
  hardware virtualization is used.  It is doing a lot of stuff in
  emulation mode because FS and GS have leftovers from the A20 test:
 
  FS =   9300 DPL=0 DS16 [-WA]
  GS = 0000  9300 DPL=0 DS16 [-WA]
 
  0x000113be:  in     $0x92,%al
  0x000113c0:  or     $0x2,%al
  0x000113c2:  out    %al,$0x92
  0x000113c4:  xor    %ax,%ax
  0x000113c6:  mov    %ax,%fs
  0x000113c8:  dec    %ax
  0x000113c9:  mov    %ax,%gs
  0x000113cb:  inc    %ax
  0x000113cc:  mov    %ax,%fs:0x200
  0x000113d0:  cmp    %gs:0x210,%ax
  0x000113d5:  je     0x113cb
 
  The DPL < RPL test fails. Any ideas? Should we introduce a new
  intermediate value for emulate_invalid_guest_state (0=none, 1=some, 
  2=full)?
 
  One idea could be to replace invalid descriptors with NULL ones.  Then
  you can intercept this in the #GP handler and trigger emulation for that
  instruction only.
 
 Won't work, vmx won't let you enter in such a configuration.
 
Why? It is possible to have a NULL descriptor in 32bit mode with vmx. But
we do not usually intercept #GP while executing 32bit mode, so we will
have to track if there is an artificial NULL selector, enable #GP
interception, and then emulate on every #GP.

 Maybe you can detect the exact code sequence (%eip, some instructions,
 register state) and clear %fs and %gs.
Maybe we can set dpl to rpl unconditionally on a switch from 16 to 32
bit. The only problem I can see with it is that if a guest enters user
mode without explicitly reloading the segment, it will be accessible by
user mode code; but I am not sure it is well defined what the dpl of a 16
bit segment is after a transition to 32 bit mode anyway, so it would be
crazy to do so.

--
Gleb.


VirtIO and BSOD On Windows Server 2003

2013-06-03 Thread Aaron Clausen
I've been merrily running a Windows Server 2003 VM with Exchange under
an older version of KVM (0.12.5) on Debian 6 (Squeeze) for over a
year.

I recently built a new kvm server with Debian Wheezy, which comes with
KVM 1.1.2, and when I moved this guest over I immediately started
getting BSODs (0x007). I disabled the virtio block driver and then
attempted to upgrade to the latest drivers, with no luck.

Right now I have it running under IDE, and because the new server is
pretty spunky, I don't see any performance issues. But I have another
Server 2003 VM, a file server, to move over, and I'm concerned that
once I have a few IDE-emulated guests things will start to get ugly.

Oddly enough, I have a Server 2012 (x64) running on virtio that moved
over without a hitch, so this is clearly a Server 2003 issue.

The most obvious solution at the moment is to downgrade to Debian
Squeeze, and that's the course I may take for the time being, but
that's not much of a long term solution.

I've done some research and this does seem to be an issue with Windows
XP/Server 2003 guests, but clearly the issue here is not just the virtio
drivers themselves, but the interaction between the drivers and a newer
version of kvm.

Does anybody have any thoughts or workarounds?

--
Aaron Clausen
mightymartia...@gmail.com


Re: [PATCH] KVM: fix sil/dil/bpl/spl in the mod/rm fields

2013-06-03 Thread Paolo Bonzini
Il 03/06/2013 18:40, Gleb Natapov ha scritto:
  Won't work, vmx won't let you enter in such a configuration.
 
 Why? It is possible to have a NULL descriptor in 32bit mode with vmx. But
 we do not usually intercept #GP while executing 32bit mode, so we will
 have to track if there is an artificial NULL selector, enable #GP
 interception, and then emulate on every #GP.

Yes, that's what I had in mind.  Of course for invalid CS you do have to
emulate.

  Maybe you can detect the exact code sequence (%eip, some instructions,
  register state) and clear %fs and %gs.
 Maybe we can set dpl to rpl unconditionally on a switch from 16 to 32
 bit. The only problem I can see with it is that if a guest enters user
 mode without explicitly reloading the segment, it will be accessible by
 user mode code; but I am not sure it is well defined what the dpl of a 16
 bit segment is after a transition to 32 bit mode anyway, so it would be
 crazy to do so.

That too, or just set it to 3.  But perhaps the #GP interception
wouldn't be too hard.
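
Something along these lines, perhaps (untested sketch; the helper name
is invented, though vmcs_read32/vmcs_write32, EXCEPTION_BITMAP and
GP_VECTOR are the existing vmx.c / kvm_host.h names):

/* Untested sketch: add #GP to the VMX exception bitmap so a #GP taken
 * through an artificial NULL selector exits to the host, where the
 * faulting instruction can be emulated and the segment restored. */
static void vmx_intercept_gp(struct kvm_vcpu *vcpu)
{
	u32 eb = vmcs_read32(EXCEPTION_BITMAP);

	eb |= 1u << GP_VECTOR;	/* vector 13, #GP */
	vmcs_write32(EXCEPTION_BITMAP, eb);
}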

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: fix sil/dil/bpl/spl in the mod/rm fields

2013-06-03 Thread Gleb Natapov
On Mon, Jun 03, 2013 at 08:30:18PM +0300, Avi Kivity wrote:
 On Jun 3, 2013 7:41 PM, Gleb Natapov g...@redhat.com wrote:
 
  On Mon, Jun 03, 2013 at 06:42:11PM +0300, Avi Kivity wrote:
   On Thu, May 30, 2013 at 7:34 PM, Paolo Bonzini pbonz...@redhat.com
 wrote:
Il 30/05/2013 17:34, Paolo Bonzini ha scritto:
Il 30/05/2013 16:35, Paolo Bonzini ha scritto:
The x86-64 extended low-byte registers were fetched correctly from
 reg,
but not from mod/rm.
   
This fixes another bug in the boot of RHEL5.9 64-bit, but it is
 still
not enough.
   
Well, it is enough but it takes 2 minutes to reach the point where
hardware virtualization is used.  It is doing a lot of stuff in
emulation mode because FS and GS have leftovers from the A20 test:
   
FS =   9300 DPL=0 DS16 [-WA]
GS = 0000  9300 DPL=0 DS16 [-WA]
   
0x000113be:  in $0x92,%al
0x000113c0:  or $0x2,%al
0x000113c2:  out%al,$0x92
0x000113c4:  xor%ax,%ax
0x000113c6:  mov%ax,%fs
0x000113c8:  dec%ax
0x000113c9:  mov%ax,%gs
0x000113cb:  inc%ax
0x000113cc:  mov%ax,%fs:0x200
0x000113d0:  cmp%gs:0x210,%ax
0x000113d5:  je 0x113cb
   
The DPL >= RPL test fails.  Any ideas?  Should we introduce a new
intermediate value for emulate_invalid_guest_state (0=none, 1=some,
 2=full)?
   
One idea could be to replace invalid descriptors with NULL ones.  Then
you can intercept this in the #GP handler and trigger emulation for
 that
instruction only.
  
   Won't work, vmx won't let you enter in such a configuration.
  
  Why? It is possible to have a NULL descriptor in 32-bit mode with vmx. But
  we do not usually intercept #GP while executing 32-bit mode, so we would
  have to track whether there is an artificial NULL selector, enable #GP
  interception, and then emulate on every #GP.
 
 Sorry, was thinking of virtual-8086 mode. It should work.
 
 
   Maybe you can detect the exact code sequence (%eip, some instructions,
   register state) and clear %fs and %gs.
  Maybe we can set dpl to rpl unconditionally on a switch from 16-bit to
  32-bit mode. The only problem I can see with it is that if a guest enters
  user mode without explicitly reloading the segment, it will be accessible
  by user-mode code; but I am not sure it is well defined what the dpl of a
  16-bit segment is after a transition to 32-bit mode anyway, so it would be
  crazy to do so.
 
 The problem is that you cannot detect a segment reload if you do that.
 Trapping #GP preserves correctness in all cases (at the cost of some
 complexity).
 
I do not see why I would want to detect a reload. Setting the segment to
NULL has the disadvantage that if the guest reads the selector it will get
the wrong value, but maybe we can leave the selector alone and mark the
segment unusable. I always wondered what VMX has the unusable attribute
for; maybe this is it.
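
An untested sketch of that, reusing the existing vmx.c segment
accessors (only the wrapper below is invented):

/* Untested sketch: keep the guest-visible selector but set the VMX
 * "unusable" access-rights bit, so reads of the selector still return
 * the real value while the segment itself cannot be used. */
static void seg_mark_unusable(struct kvm_vcpu *vcpu, int seg)
{
	struct kvm_segment var;

	vmx_get_segment(vcpu, &var, seg);
	var.unusable = 1;
	vmx_set_segment(vcpu, &var, seg);
}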

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 2/6] KVM: PPC: Book3E: Refactor SPE_FP exit handling

2013-06-03 Thread Mihai Caraman
SPE_FP interrupts are shared with ALTIVEC. Refactor SPE_FP exit handling
to detect KVM support for the featured unit at run-time, in order to
accommodate ALTIVEC later.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
 arch/powerpc/kvm/booke.c |   80 ++
 1 files changed, 59 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1020119..d082bbc 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -822,6 +822,15 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
}
 }
 
+static inline bool kvmppc_supports_spe(void)
+{
+#ifdef CONFIG_SPE
+   if (cpu_has_feature(CPU_FTR_SPE))
+   return true;
+#endif
+   return false;
+}
+
 /**
  * kvmppc_handle_exit
  *
@@ -931,42 +940,71 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
r = RESUME_GUEST;
break;
 
-#ifdef CONFIG_SPE
case BOOKE_INTERRUPT_SPE_UNAVAIL: {
-   if (vcpu->arch.shared->msr & MSR_SPE)
-   kvmppc_vcpu_enable_spe(vcpu);
-   else
-   kvmppc_booke_queue_irqprio(vcpu,
-  BOOKE_IRQPRIO_SPE_UNAVAIL);
+   /*
+* The interrupt is shared, KVM support for the featured unit
+* is detected at run-time.
+*/
+   bool handled = false;
+
+   if (kvmppc_supports_spe()) {
+#ifdef CONFIG_SPE
+   if (cpu_has_feature(CPU_FTR_SPE))
+   if (vcpu->arch.shared->msr & MSR_SPE) {
+   kvmppc_vcpu_enable_spe(vcpu);
+   handled = true;
+   }
+#endif
+   if (!handled)
+   kvmppc_booke_queue_irqprio(vcpu,
+   BOOKE_IRQPRIO_SPE_UNAVAIL);
+   } else {
+   /* 
+* Guest wants SPE, but host kernel doesn't support it.
+* Send an unimplemented operation program check to
+* the guest.
+*/
+   kvmppc_core_queue_program(vcpu, ESR_PUO | ESR_SPV);
+   }
+
r = RESUME_GUEST;
break;
}
 
case BOOKE_INTERRUPT_SPE_FP_DATA:
-   kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_DATA);
-   r = RESUME_GUEST;
+   /*
+* The interrupt is shared, KVM support for the featured unit
+* is detected at run-time.
+*/
+   if (kvmppc_supports_spe()) {
+   kvmppc_booke_queue_irqprio(vcpu,
+   BOOKE_IRQPRIO_SPE_FP_DATA);
+   r = RESUME_GUEST;
+   } else {
+   /*
+* These really should never happen without CONFIG_SPE,
+* as we should never enable the real MSR[SPE] in the
+* guest.
+*/
+   printk(KERN_CRIT "%s: unexpected SPE interrupt %u at \
+   %08lx\n", __func__, exit_nr, vcpu->arch.pc);
+   run-hw.hardware_exit_reason = exit_nr;
+   r = RESUME_HOST;
+   }
+
break;
 
case BOOKE_INTERRUPT_SPE_FP_ROUND:
+#ifdef CONFIG_SPE
kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_ROUND);
r = RESUME_GUEST;
break;
 #else
-   case BOOKE_INTERRUPT_SPE_UNAVAIL:
/*
-* Guest wants SPE, but host kernel doesn't support it.  Send
-* an unimplemented operation program check to the guest.
+* These really should never happen without CONFIG_SPE,
+* as we should never enable the real MSR[SPE] in the
+* guest.
 */
-   kvmppc_core_queue_program(vcpu, ESR_PUO | ESR_SPV);
-   r = RESUME_GUEST;
-   break;
-
-   /*
-* These really should never happen without CONFIG_SPE,
-* as we should never enable the real MSR[SPE] in the guest.
-*/
-   case BOOKE_INTERRUPT_SPE_FP_DATA:
-   case BOOKE_INTERRUPT_SPE_FP_ROUND:
	printk(KERN_CRIT "%s: unexpected SPE interrupt %u at %08lx\n",
	   __func__, exit_nr, vcpu->arch.pc);
run-hw.hardware_exit_reason = exit_nr;
-- 
1.7.4.1


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 6/6] KVM: PPC: Book3E: Enhance FPU laziness

2013-06-03 Thread Mihai Caraman
Adopt the AltiVec approach to increase laziness by calling
kvmppc_load_guest_fp() just before returning to the guest instead of on
each sched-in.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
 arch/powerpc/kvm/booke.c  |1 +
 arch/powerpc/kvm/e500mc.c |2 --
 2 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 019496d..5382238 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1258,6 +1258,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
} else {
kvmppc_lazy_ee_enable();
kvmppc_load_guest_altivec(vcpu);
+   kvmppc_load_guest_fp(vcpu);
}
}
 
diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index 9d7f38e..29cf97a 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -138,8 +138,6 @@ void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
	if (vcpu->arch.oldpir != mfspr(SPRN_PIR))
kvmppc_e500_tlbil_all(vcpu_e500);
-
-   kvmppc_load_guest_fp(vcpu);
 }
 
 void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
-- 
1.7.4.1


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 1/6] KVM: PPC: Book3E: Fix AltiVec interrupt numbers and build breakage

2013-06-03 Thread Mihai Caraman
Interrupt numbers defined for Book3E follow the IVOR definitions. Align
BOOKE_INTERRUPT_ALTIVEC_UNAVAIL and BOOKE_INTERRUPT_ALTIVEC_ASSIST to this
rule, which also fixes the build breakage.
IVORs 32 and 33 are shared, so reflect this in the interrupt naming.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
 arch/powerpc/include/asm/kvm_asm.h |   16 ++--
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h 
b/arch/powerpc/include/asm/kvm_asm.h
index b9dd382..851bac7 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -54,8 +54,16 @@
 #define BOOKE_INTERRUPT_DEBUG 15
 
 /* E500 */
-#define BOOKE_INTERRUPT_SPE_UNAVAIL 32
-#define BOOKE_INTERRUPT_SPE_FP_DATA 33
+#define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
+#define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
+/*
+ * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same defines
+ */
+#define BOOKE_INTERRUPT_SPE_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
+#define BOOKE_INTERRUPT_SPE_FP_DATA BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
+#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
+#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
+   BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
 #define BOOKE_INTERRUPT_SPE_FP_ROUND 34
 #define BOOKE_INTERRUPT_PERFORMANCE_MONITOR 35
 #define BOOKE_INTERRUPT_DOORBELL 36
@@ -67,10 +75,6 @@
 #define BOOKE_INTERRUPT_HV_SYSCALL 40
 #define BOOKE_INTERRUPT_HV_PRIV 41
 
-/* altivec */
-#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL 42
-#define BOOKE_INTERRUPT_ALTIVEC_ASSIST 43
-
 /* book3s */
 
 #define BOOK3S_INTERRUPT_SYSTEM_RESET  0x100
-- 
1.7.4.1


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 4/6] KVM: PPC: Book3E: Add AltiVec support

2013-06-03 Thread Mihai Caraman
KVM Book3E FPU support gracefully reuses the host infrastructure, so we do
the same for AltiVec. To keep AltiVec lazy, call kvmppc_load_guest_altivec()
just before returning to the guest instead of on each sched-in.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
 arch/powerpc/kvm/booke.c  |   74 +++-
 arch/powerpc/kvm/e500mc.c |8 +
 2 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index c08b04b..01eb635 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -134,6 +134,23 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
 }
 
 /*
+ * Simulate an AltiVec unavailable fault to load guest state
+ * from the thread to the AltiVec unit.
+ * It must be called with preemption disabled.
+ */
+static inline void kvmppc_load_guest_altivec(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_ALTIVEC
+   if (cpu_has_feature(CPU_FTR_ALTIVEC)) {
+   if (!(current->thread.regs->msr & MSR_VEC)) {
+   load_up_altivec(NULL);
+   current->thread.regs->msr |= MSR_VEC;
+   }
+   }
+#endif
+}
+
+/*
  * Helper function for full MSR writes.  No need to call this if only
  * EE/CE/ME/DE/RI are changing.
  */
@@ -661,6 +678,12 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
u64 fpr[32];
 #endif
 
+#ifdef CONFIG_ALTIVEC
+   vector128 vr[32];
+   vector128 vscr;
+   int used_vr = 0;
+#endif
+
	if (!vcpu->arch.sane) {
	kvm_run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
return -EINVAL;
@@ -699,6 +722,22 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
kvmppc_load_guest_fp(vcpu);
 #endif
 
+#ifdef CONFIG_ALTIVEC
+   if (cpu_has_feature(CPU_FTR_ALTIVEC)) {
+   /* Save userspace VEC state in stack */
+   enable_kernel_altivec();
+   memcpy(vr, current->thread.vr, sizeof(current->thread.vr));
+   vscr = current->thread.vscr;
+   used_vr = current->thread.used_vr;
+
+   /* Restore guest VEC state to thread */
+   memcpy(current->thread.vr, vcpu->arch.vr, sizeof(vcpu->arch.vr));
+   current->thread.vscr = vcpu->arch.vscr;
+
+   kvmppc_load_guest_altivec(vcpu);
+   }
+#endif
+
ret = __kvmppc_vcpu_run(kvm_run, vcpu);
 
/* No need for kvm_guest_exit. It's done in handle_exit.
@@ -719,6 +758,23 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
current-thread.fpexc_mode = fpexc_mode;
 #endif
 
+#ifdef CONFIG_ALTIVEC
+   if (cpu_has_feature(CPU_FTR_ALTIVEC)) {
+   /* Save AltiVec state to thread */
+   if (current->thread.regs->msr & MSR_VEC)
+   giveup_altivec(current);
+
+   /* Save guest state */
+   memcpy(vcpu->arch.vr, current->thread.vr, sizeof(vcpu->arch.vr));
+   vcpu->arch.vscr = current->thread.vscr;
+
+   /* Restore userspace state */
+   memcpy(current->thread.vr, vr, sizeof(current->thread.vr));
+   current->thread.vscr = vscr;
+   current->thread.used_vr = used_vr;
+   }
+#endif
+
 out:
	vcpu->mode = OUTSIDE_GUEST_MODE;
return ret;
@@ -822,6 +878,19 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
}
 }
 
+/*
+ * Always returns true if the AltiVec unit is present; see
+ * kvmppc_core_check_processor_compat().
+ */
+static inline bool kvmppc_supports_altivec(void)
+{
+#ifdef CONFIG_ALTIVEC
+   if (cpu_has_feature(CPU_FTR_ALTIVEC))
+   return true;
+#endif
+   return false;
+}
+
 static inline bool kvmppc_supports_spe(void)
 {
 #ifdef CONFIG_SPE
@@ -947,7 +1016,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 */
bool handled = false;
 
-   if (kvmppc_supports_spe()) {
+   if (kvmppc_supports_altivec() || kvmppc_supports_spe()) {
 #ifdef CONFIG_SPE
if (cpu_has_feature(CPU_FTR_SPE))
	if (vcpu->arch.shared->msr & MSR_SPE) {
@@ -976,7 +1045,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 * The interrupt is shared, KVM support for the featured unit
 * is detected at run-time.
 */
-   if (kvmppc_supports_spe()) {
+   if (kvmppc_supports_altivec() || kvmppc_supports_spe()) {
kvmppc_booke_queue_irqprio(vcpu,
BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST);
r = RESUME_GUEST;
@@ -1188,6 +1257,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
	r = (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
} else {
kvmppc_lazy_ee_enable();
+ 

[RFC PATCH 0/6] KVM: PPC: Book3E: AltiVec support

2013-06-03 Thread Mihai Caraman
Mihai Caraman (6):
  KVM: PPC: Book3E: Fix AltiVec interrupt numbers and build breakage
  KVM: PPC: Book3E: Refactor SPE_FP exit handling
  KVM: PPC: Book3E: Rename IRQPRIO names to accommodate ALTIVEC
  KVM: PPC: Book3E: Add AltiVec support
  KVM: PPC: Book3E: Add ONE_REG AltiVec support
  KVM: PPC: Book3E: Enhance FPU laziness

 arch/powerpc/include/asm/kvm_asm.h|   16 ++-
 arch/powerpc/kvm/booke.c  |  189 
 arch/powerpc/kvm/booke.h  |4 +-
 arch/powerpc/kvm/bookehv_interrupts.S |8 +-
 arch/powerpc/kvm/e500.c   |   10 +-
 arch/powerpc/kvm/e500_emulate.c   |8 +-
 arch/powerpc/kvm/e500mc.c |   10 ++-
 7 files changed, 199 insertions(+), 46 deletions(-)

-- 
1.7.4.1


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 3/6] KVM: PPC: Book3E: Rename IRQPRIO names to accommodate ALTIVEC

2013-06-03 Thread Mihai Caraman
Rename BOOKE_IRQPRIO_SPE_UNAVAIL and BOOKE_IRQPRIO_SPE_FP_DATA names
to accommodate ALTIVEC. Replace BOOKE_INTERRUPT_SPE_UNAVAIL and
BOOKE_INTERRUPT_SPE_FP_DATA with the common version.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
 arch/powerpc/kvm/booke.c  |   12 ++--
 arch/powerpc/kvm/booke.h  |4 ++--
 arch/powerpc/kvm/bookehv_interrupts.S |8 
 arch/powerpc/kvm/e500.c   |   10 ++
 arch/powerpc/kvm/e500_emulate.c   |8 
 5 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index d082bbc..c08b04b 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -362,8 +362,8 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu 
*vcpu,
case BOOKE_IRQPRIO_ITLB_MISS:
case BOOKE_IRQPRIO_SYSCALL:
case BOOKE_IRQPRIO_FP_UNAVAIL:
-   case BOOKE_IRQPRIO_SPE_UNAVAIL:
-   case BOOKE_IRQPRIO_SPE_FP_DATA:
+   case BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL:
+   case BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST:
case BOOKE_IRQPRIO_SPE_FP_ROUND:
case BOOKE_IRQPRIO_AP_UNAVAIL:
allowed = 1;
@@ -940,7 +940,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu 
*vcpu,
r = RESUME_GUEST;
break;
 
-   case BOOKE_INTERRUPT_SPE_UNAVAIL: {
+   case BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL: {
/*
 * The interrupt is shared, KVM support for the featured unit
 * is detected at run-time.
@@ -957,7 +957,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu 
*vcpu,
 #endif
if (!handled)
kvmppc_booke_queue_irqprio(vcpu,
-   BOOKE_IRQPRIO_SPE_UNAVAIL);
+   BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL);
} else {
/* 
 * Guest wants SPE, but host kernel doesn't support it.
@@ -971,14 +971,14 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
break;
}
 
-   case BOOKE_INTERRUPT_SPE_FP_DATA:
+   case BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST:
/*
 * The interrupt is shared, KVM support for the featured unit
 * is detected at run-time.
 */
if (kvmppc_supports_spe()) {
kvmppc_booke_queue_irqprio(vcpu,
-   BOOKE_IRQPRIO_SPE_FP_DATA);
+   BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST);
r = RESUME_GUEST;
} else {
/*
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index 5fd1ba6..9e92006 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -32,8 +32,8 @@
 #define BOOKE_IRQPRIO_ALIGNMENT 2
 #define BOOKE_IRQPRIO_PROGRAM 3
 #define BOOKE_IRQPRIO_FP_UNAVAIL 4
-#define BOOKE_IRQPRIO_SPE_UNAVAIL 5
-#define BOOKE_IRQPRIO_SPE_FP_DATA 6
+#define BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL 5
+#define BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST 6
 #define BOOKE_IRQPRIO_SPE_FP_ROUND 7
 #define BOOKE_IRQPRIO_SYSCALL 8
 #define BOOKE_IRQPRIO_AP_UNAVAIL 9
diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
b/arch/powerpc/kvm/bookehv_interrupts.S
index e8ed7d6..8d35dc0 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -295,9 +295,9 @@ kvm_handler BOOKE_INTERRUPT_DTLB_MISS, EX_PARAMS_TLB, \
SPRN_SRR0, SPRN_SRR1, (NEED_EMU | NEED_DEAR | NEED_ESR)
 kvm_handler BOOKE_INTERRUPT_ITLB_MISS, EX_PARAMS_TLB, \
SPRN_SRR0, SPRN_SRR1, 0
-kvm_handler BOOKE_INTERRUPT_SPE_UNAVAIL, EX_PARAMS(GEN), \
+kvm_handler BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL, EX_PARAMS(GEN), \
SPRN_SRR0, SPRN_SRR1, 0
-kvm_handler BOOKE_INTERRUPT_SPE_FP_DATA, EX_PARAMS(GEN), \
+kvm_handler BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST, EX_PARAMS(GEN), \
SPRN_SRR0, SPRN_SRR1, 0
 kvm_handler BOOKE_INTERRUPT_SPE_FP_ROUND, EX_PARAMS(GEN), \
SPRN_SRR0, SPRN_SRR1, 0
@@ -398,8 +398,8 @@ kvm_lvl_handler BOOKE_INTERRUPT_WATCHDOG, \
 kvm_handler BOOKE_INTERRUPT_DTLB_MISS, \
SPRN_SRR0, SPRN_SRR1, (NEED_EMU | NEED_DEAR | NEED_ESR)
 kvm_handler BOOKE_INTERRUPT_ITLB_MISS, SPRN_SRR0, SPRN_SRR1, 0
-kvm_handler BOOKE_INTERRUPT_SPE_UNAVAIL, SPRN_SRR0, SPRN_SRR1, 0
-kvm_handler BOOKE_INTERRUPT_SPE_FP_DATA, SPRN_SRR0, SPRN_SRR1, 0
+kvm_handler BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL, SPRN_SRR0, SPRN_SRR1, 0
+kvm_handler BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST, SPRN_SRR0, SPRN_SRR1, 0
 kvm_handler BOOKE_INTERRUPT_SPE_FP_ROUND, SPRN_SRR0, SPRN_SRR1, 0
 kvm_handler BOOKE_INTERRUPT_PERFORMANCE_MONITOR, SPRN_SRR0, SPRN_SRR1, 0
 kvm_handler BOOKE_INTERRUPT_DOORBELL, SPRN_SRR0, SPRN_SRR1, 0
diff --git a/arch/powerpc/kvm/e500.c 

[RFC PATCH 5/6] KVM: PPC: Book3E: Add ONE_REG AltiVec support

2013-06-03 Thread Mihai Caraman
Add ONE_REG support for AltiVec on Book3E.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
 arch/powerpc/kvm/booke.c |   32 
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 01eb635..019496d 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1570,6 +1570,22 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, 
struct kvm_one_reg *reg)
case KVM_REG_PPC_DEBUG_INST:
	val = get_reg_val(reg->id, KVMPPC_INST_EHPRIV);
break;
+#ifdef CONFIG_ALTIVEC
+   case KVM_REG_PPC_VR0 ... KVM_REG_PPC_VR31:
+   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
+   r = -ENXIO;
+   break;
+   }
+   val.vval = vcpu->arch.vr[reg->id - KVM_REG_PPC_VR0];
+   break;
+   case KVM_REG_PPC_VSCR:
+   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
+   r = -ENXIO;
+   break;
+   }
+   val = get_reg_val(reg->id, vcpu->arch.vscr.u[3]);
+   break;
+#endif /* CONFIG_ALTIVEC */
default:
	r = kvmppc_get_one_reg(vcpu, reg->id, &val);
break;
@@ -1643,6 +1659,22 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, 
struct kvm_one_reg *reg)
kvmppc_set_tcr(vcpu, tcr);
break;
}
+#ifdef CONFIG_ALTIVEC
+   case KVM_REG_PPC_VR0 ... KVM_REG_PPC_VR31:
+   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
+   r = -ENXIO;
+   break;
+   }
+   vcpu->arch.vr[reg->id - KVM_REG_PPC_VR0] = val.vval;
+   break;
+   case KVM_REG_PPC_VSCR:
+   if (!cpu_has_feature(CPU_FTR_ALTIVEC)) {
+   r = -ENXIO;
+   break;
+   }
+   vcpu->arch.vscr.u[3] = set_reg_val(reg->id, val);
+   break;
+#endif /* CONFIG_ALTIVEC */
default:
	r = kvmppc_set_one_reg(vcpu, reg->id, &val);
break;
-- 
1.7.4.1
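
For reference, userspace reaches these registers through the generic
ONE_REG ioctls; a rough sketch (illustrative only, error handling
omitted; KVM_GET_ONE_REG and KVM_REG_PPC_VR0 are the real interface
names, the helper is invented):

#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Illustrative sketch (not part of the patch): fetch VR0 of a vcpu
 * through the ONE_REG interface this patch wires up for Book3E. */
static int get_vr0(int vcpu_fd, unsigned char vval[16])
{
	struct kvm_one_reg reg;

	memset(&reg, 0, sizeof(reg));
	reg.id = KVM_REG_PPC_VR0;	/* 128-bit vector register 0 */
	reg.addr = (uintptr_t)vval;	/* kernel copies 16 bytes here */
	return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}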


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

