Re: A question about "CONFIG_KVM_DEVICE_ASSIGNMENT" configuration

2015-09-10 Thread Chen, Tiejun

On 9/10/2015 11:08 PM, Alex Williamson wrote:

On Thu, 2015-09-10 at 18:22 +0800, Nan Xiao wrote:

Hi all,

When building kernel, it prompts "CONFIG_KVM_DEVICE_ASSIGNMENT" is "deprecated".
But it is still used in kernel code. E.g.:
"kvm_vm_ioctl_check_extension" function:
{
...
#ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
case KVM_CAP_ASSIGN_DEV_IRQ:
case KVM_CAP_PCI_2_3:
#endif
r = 1;
break;
   ...
#ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
case KVM_CAP_IOMMU:
r = iommu_present(_bus_type);
break;
#endif
   ...
}

If not configure this option, the following code will execute failed:

ret = ioctl(dev, KVM_CHECK_EXTENSION, KVM_CAP_IOMMU);

So does it mean to use KVM assigned device feature, the
"CONFIG_KVM_DEVICE_ASSIGNMENT"
is not "deprecated"?


Legacy KVM device assignment is deprecated, it is fully replaced by


Why do we remove this legacy assignment completely? Its always leading 
to this confusion :)


Thanks
Tiejun


VFIO-based device assignment.  The intention is to deprecate legacy KVM
device assignment now, so all users can transition away from it and at
some point remove it from the kernel.  When using QEMU and libvirt, the
default is already to use VFIO instead of legacy KVM device assignment.
Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [[PATCH 2/2] kvm: enable preemption to register/unregister preempt notifier

2015-07-05 Thread Chen, Tiejun

On 2015/7/3 19:23, Paolo Bonzini wrote:

On 03/07/2015 10:56, Tiejun Chen wrote:

After commit 1cde2930e154 (sched/preempt: Add static_key() to
preempt_notifiers) is introduced, preempt_notifier_{register, unregister}
always hold a mutex, jump_label_mutex. So in current case this shouldn't
work further under the circumstance of disabled preemption, and its also
safe since we're just handling a per-vcpu stuff with holding vcpu-mutex.
Otherwise, some warning messages are posted like this,

BUG: scheduling while atomic: qemu-system-x86/17177/0x0002
2 locks held by qemu-system-x86/17177:
  #0:  (vcpu-mutex){+.+.+.}, at: [a035fb48] vcpu_load+0x28/0xf0 
[kvm]
  #1:  (jump_label_mutex){+.+.+.}, at: [81244b54] 
static_key_slow_inc+0xc4/0x140
Modules linked in: x86_pkg_temp_thermal kvm_intel kvm
Preemption disabled at:[a035fd3e] kvm_vcpu_ioctl+0x7e/0xeb0 [kvm]


Thanks for your work Tiejun.  However, the original patch is crap.  I've
asked to revert it.



Yeah, its better to revert that commit since finally this also trigger a 
bug 100671: vmwrite error in vmx_vcpu_run.


Thanks
Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 06/12] KVM: mark kvm-buses as empty once they were destroyed

2015-03-26 Thread Chen, Tiejun

On 2015/3/27 9:31, Marcelo Tosatti wrote:

On Wed, Mar 25, 2015 at 05:09:13PM +, Marc Zyngier wrote:

On 23/03/15 15:58, Andre Przywara wrote:

In kvm_destroy_vm() we call kvm_io_bus_destroy() pretty early,
especially before calling kvm_arch_destroy_vm(). To avoid
unregistering devices from the already destroyed bus, let's mark
the bus with NULL to let other users know it has been destroyed
already.
This avoids a crash on a VM shutdown with the VGIC using the
kvm_io_bus later (the unregistering is in there to be able to roll
back a faulting init).

Signed-off-by: Andre Przywara andre.przyw...@arm.com


That seems sensible, but I don't see why nobody else hits that. What are
we doing differently?

Otherwise,

Reviewed-by: Marc Zyngier marc.zyng...@arm.com

Paolo, Marcelo, can we have your Ack on this?

Thanks,

M.


---
  virt/kvm/kvm_main.c |4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8c7ab0b..6f164eb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -604,8 +604,10 @@ static void kvm_destroy_vm(struct kvm *kvm)
list_del(kvm-vm_list);
spin_unlock(kvm_lock);
kvm_free_irq_routing(kvm);
-   for (i = 0; i  KVM_NR_BUSES; i++)
+   for (i = 0; i  KVM_NR_BUSES; i++) {
kvm_io_bus_destroy(kvm-buses[i]);
+   kvm-buses[i] = NULL;


Could we fold this line into a common like,

@@ -596,7 +597,6 @@ static void kvm_destroy_devices(struct kvm *kvm)

 static void kvm_destroy_vm(struct kvm *kvm)
 {
-   int i;
struct mm_struct *mm = kvm-mm;

kvm_arch_sync_events(kvm);
@@ -604,8 +604,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
list_del(kvm-vm_list);
spin_unlock(kvm_lock);
kvm_free_irq_routing(kvm);
-   for (i = 0; i  KVM_NR_BUSES; i++)
-   kvm_io_bus_destroy(kvm-buses[i]);
+   kvm_destroy_all_io_bus(kvm);
kvm_coalesced_mmio_free(kvm);
 #if defined(CONFIG_MMU_NOTIFIER)  defined(KVM_ARCH_WANT_MMU_NOTIFIER)
mmu_notifier_unregister(kvm-mmu_notifier, kvm-mm);
@@ -2943,6 +2942,16 @@ static void kvm_io_bus_destroy(struct kvm_io_bus 
*bus)

kfree(bus);
 }

+static void kvm_destroy_all_io_bus(struct kvm *kvm)
+{
+   int i;
+
+   for (i = 0; i  KVM_NR_BUSES; i++) {
+   kvm_io_bus_destroy(kvm-buses[i]);
+   kvm-buses[i] = NULL;
+   }
+}
+
 static inline int kvm_io_bus_cmp(const struct kvm_io_range *r1,
  const struct kvm_io_range *r2)
 {

Thanks
Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: fix to update memslots properly

2015-03-10 Thread Chen, Tiejun

On 2015/3/10 4:54, Marcelo Tosatti wrote:

On Sat, Dec 27, 2014 at 09:41:45PM +0100, Paolo Bonzini wrote:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f528343..6e52f3f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots,
WARN_ON(mslots[i].id != id);
if (!new-npages) {
new-base_gfn = 0;
+   new-flags = 0;
if (mslots[i].npages)
slots-used_slots--;
} else {


This should not be necessary.  The part of the mslots array that has
base_gfn == npages == 0 is entirely unused, and such a slot can never
be returned by search_memslots because this:

 if (gfn = memslots[slot].base_gfn 
 gfn  memslots[slot].base_gfn + memslots[slot].npages)

can never be true.


@@ -688,7 +689,9 @@ static void update_memslots(struct kvm_memslots *slots,
i++;
}
while (i  0 
-  new-base_gfn  mslots[i - 1].base_gfn) {
+  ((new-base_gfn  mslots[i - 1].base_gfn) ||
+   (!new-base_gfn 
+!mslots[i - 1].base_gfn  !mslots[i - 1].npages))) {
mslots[i] = mslots[i - 1];
slots-id_to_index[mslots[i].id] = i;
i--;



You should have explained _why_ this fixes the bug, and what invariant
is not being respected, something like this:

 kvm: fix sorting of memslots with base_gfn == 0

 Before commit 0e60b0799fed (kvm: change memslot sorting rule from size
 to GFN, 2014-12-01), the memslots' sorting key was npages, meaning
 that a valid memslot couldn't have its sorting key equal to zero.
 On the other hand, a valid memslot can have base_gfn == 0, and invalid
 memslots are identified by base_gfn == npages == 0.

 Because of this, commit 0e60b0799fed broke the invariant that invalid
 memslots are at the end of the mslots array.  When a memslot with
 base_gfn == 0 was created, any invalid memslot before it were left
 in place.

This suggests another fix.  We can change the insertion to use a =
comparison, as in your first patch.  Alone it is not correct, but we
only need to take some care and avoid breaking the case of deleting a
memslot.

It's enough to wrap the second loop (that you patched) with
if (new-npages).  In the new-npages == 0 case the first loop has
already set i to the right value, and moving i back would be wrong:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f5283438ee05..050974c051b5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -687,11 +687,23 @@ static void update_memslots(struct kvm_memslots *slots,
slots-id_to_index[mslots[i].id] = i;
i++;
}
-   while (i  0 
-  new-base_gfn  mslots[i - 1].base_gfn) {
-   mslots[i] = mslots[i - 1];
-   slots-id_to_index[mslots[i].id] = i;
-   i--;
+
+   /*
+* The = is needed when creating a slot with base_gfn == 0,
+* so that it moves before all those with base_gfn == npages == 0.
+*
+* On the other hand, if new-npages is zero, the above loop has
+* already left i pointing to the beginning of the empty part of
+* mslots, and the = would move the hole backwards in this
+* case---which is wrong.  So skip the loop when deleting a slot.
+*/
+   if (new-npages) {
+   while (i  0 
+  new-base_gfn = mslots[i - 1].base_gfn) {
+   mslots[i] = mslots[i - 1];
+   slots-id_to_index[mslots[i].id] = i;
+   i--;
+   }
}

mslots[i] = *new;

Paolo


Paolo,

Can you include a proper changelog for this patch?



But this is already applied long time ago...

Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ÞATCH v2] kvmtool, mips: Support more than 256 MB guest memory

2015-01-13 Thread Chen, Tiejun

On 2015/1/13 18:19, Andreas Herrmann wrote:

Two guest memory regions need to be defined and two mem= parameters
need to be passed to guest kernel to support more than 256 MB.

Cc: Chen, Tiejun tiejun.c...@intel.com


Looks fine to me.

Thanks
Tiejun


Signed-off-by: Andreas Herrmann andreas.herrm...@caviumnetworks.com
---
  tools/kvm/mips/include/kvm/kvm-arch.h |   10 +
  tools/kvm/mips/kvm.c  |   36 +++--
  2 files changed, 40 insertions(+), 6 deletions(-)

This version uses correct macros during calculation of memory
parameters.


Andreas


diff --git a/tools/kvm/mips/include/kvm/kvm-arch.h 
b/tools/kvm/mips/include/kvm/kvm-arch.h
index 7eadbf4..97bbf34 100644
--- a/tools/kvm/mips/include/kvm/kvm-arch.h
+++ b/tools/kvm/mips/include/kvm/kvm-arch.h
@@ -1,10 +1,20 @@
  #ifndef KVM__KVM_ARCH_H
  #define KVM__KVM_ARCH_H

+
+/*
+ * Guest memory map is:
+ *   0x-0x0fff : System RAM
+ *   0x1000-0x1fff : I/O (defined by KVM_MMIO_START and KVM_MMIO_SIZE)
+ *   0x2000-...: System RAM
+ * See also kvm__init_ram().
+ */
+
  #define KVM_MMIO_START0x1000
  #define KVM_PCI_CFG_AREA  KVM_MMIO_START
  #define KVM_PCI_MMIO_AREA (KVM_MMIO_START + 0x100)
  #define KVM_VIRTIO_MMIO_AREA  (KVM_MMIO_START + 0x200)
+#define KVM_MMIO_SIZE  0x1000

  /*
   * Just for reference. This and the above corresponds to what's used
diff --git a/tools/kvm/mips/kvm.c b/tools/kvm/mips/kvm.c
index fc0428b..1925f38 100644
--- a/tools/kvm/mips/kvm.c
+++ b/tools/kvm/mips/kvm.c
@@ -22,11 +22,28 @@ void kvm__init_ram(struct kvm *kvm)
u64 phys_start, phys_size;
void*host_mem;

-   phys_start = 0;
-   phys_size  = kvm-ram_size;
-   host_mem   = kvm-ram_start;
+   if (kvm-ram_size = KVM_MMIO_START) {
+   /* one region for all memory */
+   phys_start = 0;
+   phys_size  = kvm-ram_size;
+   host_mem   = kvm-ram_start;

-   kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+   kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+   } else {
+   /* one region for memory that fits below MMIO range */
+   phys_start = 0;
+   phys_size  = KVM_MMIO_START;
+   host_mem   = kvm-ram_start;
+
+   kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+
+   /* one region for rest of memory */
+   phys_start = KVM_MMIO_START + KVM_MMIO_SIZE;
+   phys_size  = kvm-ram_size - KVM_MMIO_START;
+   host_mem   = kvm-ram_start + KVM_MMIO_START;
+
+   kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+   }
  }

  void kvm__arch_delete_ram(struct kvm *kvm)
@@ -108,8 +125,15 @@ static void kvm__mips_install_cmdline(struct kvm *kvm)
u64 argv_offset = argv_start;
u64 argc = 0;

-   sprintf(p + cmdline_offset, mem=0x%llx@0 ,
-(unsigned long long)kvm-ram_size);
+
+   if ((u64) kvm-ram_size = KVM_MMIO_START)
+   sprintf(p + cmdline_offset, mem=0x%llx@0 ,
+   (unsigned long long)kvm-ram_size);
+   else
+   sprintf(p + cmdline_offset, mem=0x%llx@0 mem=0x%llx@0x%llx ,
+   (unsigned long long)KVM_MMIO_START,
+   (unsigned long long)kvm-ram_size - KVM_MMIO_START,
+   (unsigned long long)(KVM_MMIO_START + KVM_MMIO_SIZE));

strcat(p + cmdline_offset, kvm-cfg.real_cmdline); /* maximum size is 
2K */



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ÞATCH] kvmtool, mips: Support more than 256 MB guest memory

2015-01-07 Thread Chen, Tiejun

On 2015/1/6 21:15, Andreas Herrmann wrote:

Two guest memory regions need to be defined and two mem= parameters
need to be passed to guest kernel to support more than 256 MB.

Signed-off-by: Andreas Herrmann andreas.herrm...@caviumnetworks.com
---
  tools/kvm/mips/include/kvm/kvm-arch.h |   10 +
  tools/kvm/mips/kvm.c  |   36 +++--
  2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/tools/kvm/mips/include/kvm/kvm-arch.h 
b/tools/kvm/mips/include/kvm/kvm-arch.h
index 7eadbf4..97bbf34 100644
--- a/tools/kvm/mips/include/kvm/kvm-arch.h
+++ b/tools/kvm/mips/include/kvm/kvm-arch.h
@@ -1,10 +1,20 @@
  #ifndef KVM__KVM_ARCH_H
  #define KVM__KVM_ARCH_H

+
+/*
+ * Guest memory map is:
+ *   0x-0x0fff : System RAM
+ *   0x1000-0x1fff : I/O (defined by KVM_MMIO_START and KVM_MMIO_SIZE)
+ *   0x2000-...: System RAM
+ * See also kvm__init_ram().
+ */
+
  #define KVM_MMIO_START0x1000
  #define KVM_PCI_CFG_AREA  KVM_MMIO_START
  #define KVM_PCI_MMIO_AREA (KVM_MMIO_START + 0x100)
  #define KVM_VIRTIO_MMIO_AREA  (KVM_MMIO_START + 0x200)
+#define KVM_MMIO_SIZE  0x1000

  /*
   * Just for reference. This and the above corresponds to what's used
diff --git a/tools/kvm/mips/kvm.c b/tools/kvm/mips/kvm.c
index fc0428b..10b441b 100644
--- a/tools/kvm/mips/kvm.c
+++ b/tools/kvm/mips/kvm.c
@@ -22,11 +22,28 @@ void kvm__init_ram(struct kvm *kvm)
u64 phys_start, phys_size;
void*host_mem;

-   phys_start = 0;
-   phys_size  = kvm-ram_size;
-   host_mem   = kvm-ram_start;
+   if (kvm-ram_size = KVM_MMIO_START) {
+   /* one region for all memory */
+   phys_start = 0;
+   phys_size  = kvm-ram_size;
+   host_mem   = kvm-ram_start;

-   kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+   kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+   } else {
+   /* one region for memory that fits below MMIO range */
+   phys_start = 0;
+   phys_size  = KVM_MMIO_SIZE;


Here phys_size = KVM_MMIO_START is better to make more sense.


+   host_mem   = kvm-ram_start;
+
+   kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+
+   /* one region for rest of memory */
+   phys_start = KVM_MMIO_START + KVM_MMIO_SIZE;
+   phys_size  = kvm-ram_size - 0x1000;


and, phys_size = kvm-ram_size - KVM_MMIO_START.

Tiejun


+   host_mem   = kvm-ram_start + KVM_MMIO_SIZE;
+
+   kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+   }
  }

  void kvm__arch_delete_ram(struct kvm *kvm)
@@ -108,8 +125,15 @@ static void kvm__mips_install_cmdline(struct kvm *kvm)
u64 argv_offset = argv_start;
u64 argc = 0;

-   sprintf(p + cmdline_offset, mem=0x%llx@0 ,
-(unsigned long long)kvm-ram_size);
+
+   if ((u64) kvm-ram_size = KVM_MMIO_SIZE)
+   sprintf(p + cmdline_offset, mem=0x%llx@0 ,
+   (unsigned long long)kvm-ram_size);
+   else
+   sprintf(p + cmdline_offset, mem=0x%llx@0 mem=0x%llx@0x%llx ,
+   (unsigned long long)KVM_MMIO_START,
+   (unsigned long long)kvm-ram_size - KVM_MMIO_START,
+   (unsigned long long)(KVM_MMIO_START + KVM_MMIO_SIZE));

strcat(p + cmdline_offset, kvm-cfg.real_cmdline); /* maximum size is 
2K */



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: fix to update memslots properly

2014-12-28 Thread Chen, Tiejun

On 2014/12/28 4:41, Paolo Bonzini wrote:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f528343..6e52f3f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots,
WARN_ON(mslots[i].id != id);
if (!new-npages) {
new-base_gfn = 0;
+   new-flags = 0;
if (mslots[i].npages)
slots-used_slots--;
} else {


This should not be necessary.  The part of the mslots array that has
base_gfn == npages == 0 is entirely unused, and such a slot can never
be returned by search_memslots because this:

 if (gfn = memslots[slot].base_gfn 
 gfn  memslots[slot].base_gfn + memslots[slot].npages)

can never be true.


Yeah, but its really a little ugly to see some slots, 
base_gfn:npages:falgs = 0:0:(!0), to resort again when debug something 
inside of update_memslots().





@@ -688,7 +689,9 @@ static void update_memslots(struct kvm_memslots *slots,
i++;
}
while (i  0 
-  new-base_gfn  mslots[i - 1].base_gfn) {
+  ((new-base_gfn  mslots[i - 1].base_gfn) ||
+   (!new-base_gfn 
+!mslots[i - 1].base_gfn  !mslots[i - 1].npages))) {
mslots[i] = mslots[i - 1];
slots-id_to_index[mslots[i].id] = i;
i--;



You should have explained _why_ this fixes the bug, and what invariant


Yeah.


is not being respected, something like this:

 kvm: fix sorting of memslots with base_gfn == 0

 Before commit 0e60b0799fed (kvm: change memslot sorting rule from size
 to GFN, 2014-12-01), the memslots' sorting key was npages, meaning
 that a valid memslot couldn't have its sorting key equal to zero.
 On the other hand, a valid memslot can have base_gfn == 0, and invalid
 memslots are identified by base_gfn == npages == 0.

 Because of this, commit 0e60b0799fed broke the invariant that invalid
 memslots are at the end of the mslots array.  When a memslot with
 base_gfn == 0 was created, any invalid memslot before it were left
 in place.

This suggests another fix.  We can change the insertion to use a =
comparison, as in your first patch.  Alone it is not correct, but we
only need to take some care and avoid breaking the case of deleting a
memslot.

It's enough to wrap the second loop (that you patched) with
if (new-npages).  In the new-npages == 0 case the first loop has
already set i to the right value, and moving i back would be wrong:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f5283438ee05..050974c051b5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -687,11 +687,23 @@ static void update_memslots(struct kvm_memslots *slots,
slots-id_to_index[mslots[i].id] = i;
i++;
}
-   while (i  0 
-  new-base_gfn  mslots[i - 1].base_gfn) {
-   mslots[i] = mslots[i - 1];
-   slots-id_to_index[mslots[i].id] = i;
-   i--;
+
+   /*
+* The = is needed when creating a slot with base_gfn == 0,
+* so that it moves before all those with base_gfn == npages == 0.
+*
+* On the other hand, if new-npages is zero, the above loop has
+* already left i pointing to the beginning of the empty part of
+* mslots, and the = would move the hole backwards in this
+* case---which is wrong.  So skip the loop when deleting a slot.
+*/
+   if (new-npages) {
+   while (i  0 
+  new-base_gfn = mslots[i - 1].base_gfn) {
+   mslots[i] = mslots[i - 1];
+   slots-id_to_index[mslots[i].id] = i;
+   i--;
+   }
}

mslots[i] = *new;



This looks better.

Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] kvm_irqchip_assign_irqfd: just set irqfd in case of kvm_irqfds_enabled()

2014-12-28 Thread Chen, Tiejun


On 2014/12/27 22:52, Paolo Bonzini wrote:



On 26/12/2014 18:59, Peter Maydell wrote:

Mm, but once you're into such microoptimisations as this you really
need to have a good justification for the effort, in the form of
profiling measurements that indicate that this is a hot path.
In this case that seems pretty unlikely, because I'd expect all
the systems where we care about performance will support irqfds,
so we won't be taking the early-exit code path anyhow. (And
not supporting irqfds is leaving much more performance on the
table than we could possibly be talking about in this function.)


Also, it's even possible for a compiler to figure this out.  All in all,
I don't see any advantage to this patch...



Indeed, its just a cleanup to make codes readable and comprehensible 
since oftentimes we don't initially write such a subsequent code just 
because we have this possibility to figure them out by the compiler, or 
others. And this is why I'm CCing Qemu trivial.


Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bisected] KVM in 3.19-rc1 is completely broken

2014-12-25 Thread Chen, Tiejun



On 2014/12/25 8:55, Chen, Tiejun wrote:



On 2014/12/25 1:11, Andy Lutomirski wrote:

On Wed, Dec 24, 2014 at 12:23 AM, Chen, Tiejun tiejun.c...@intel.com
wrote:

On 2014/12/24 5:29, Andy Lutomirski wrote:


On Tue, Dec 23, 2014 at 1:13 PM, Paolo Bonzini pbonz...@redhat.com
wrote:




I can reproduce it using the same steps on a Sandy Bridge laptop,
with
whatever QEMU is packaged in Fedora 21.  I attached the config.

I also submitted a virtme update for Fedora Rawhide and 21 (20 is
still building) in case it helps.  The build is here:

http://koji.fedoraproject.org/koji/buildinfo?buildID=600732



The other reporter bisected it to
0e60b0799fedc495a5c57dbd669de3c10d72edd2.  Can you try its parent?



That's what I bisected it to.  The parent works.



Also, does it break with 3.18 host and 3.19-rc1 guest, or with
3.19-rc1 host and 3.18 guest?  (Sorry I should do this myself
but I'm a bit swamped due to vacation until Jan 6th).



The breakage is with 3.17.7-something L0 and the same test kernel as
L1 and L2.  I think it breaks the same way with 3.19-rc1 as host and
guest without any nesting, but that's awkward to test right now.




Andy,

Could you try this?

Signed-off-by: Tiejun Chen tiejun.c...@intel.com


I applied it by hand, and it survives extremely light testing.

Tested-by: Andy Lutomirski l...@amacapital.net



Looks good so thanks for your validation.



I refine that I posted that fix in another thread since looks that will 
broken !next case. And I myself already run those test instructions you 
showed previously, now looks good.


Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes

2014-12-25 Thread Chen, Tiejun

On 2014/12/25 8:52, Nadav Amit wrote:

Although pop sreg updates RSP according to the operand size, only 2 bytes are
read.  The current behavior may result in incorrect #GP or #PF exceptions.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
  arch/x86/kvm/emulate.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e5a84be..702da5e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt)
unsigned long selector;
int rc;



Looks we just should do similar thing to em_push_sreg(),

unsigned long selector;
int rc;

+   if (ctxt-op_bytes == 4) {
+   rsp_increment(ctxt, -2);
+   ctxt-op_bytes = 2;
+   }
rc = emulate_pop(ctxt, selector, ctxt-op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;

Right?

Thanks
Tiejun


-   rc = emulate_pop(ctxt, selector, ctxt-op_bytes);
+   rc = emulate_pop(ctxt, selector, 2);
if (rc != X86EMUL_CONTINUE)
return rc;

if (ctxt-modrm_reg == VCPU_SREG_SS)
ctxt-interruptibility = KVM_X86_SHADOW_INT_MOV_SS;
+   if (ctxt-op_bytes  2)
+   rsp_increment(ctxt, ctxt-op_bytes - 2);

rc = load_segment_descriptor(ctxt, (u16)selector, seg);
return rc;


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: regression bisected; KVM: entry failed, hardware error 0x80000021

2014-12-25 Thread Chen, Tiejun

On 2014/12/25 18:52, Jamie Heilman wrote:

Chen, Tiejun wrote:

On 2014/12/24 19:02, Jamie Heilman wrote:

Chen, Tiejun wrote:

On 2014/12/23 15:26, Jamie Heilman wrote:

Chen, Tiejun wrote:

On 2014/12/23 9:50, Chen, Tiejun wrote:

On 2014/12/22 17:23, Jamie Heilman wrote:

KVM internal error. Suberror: 1
emulation failure
EAX=000de494 EBX= ECX= EDX=0cfd
ESI=0059 EDI= EBP= ESP=6fb4
EIP=000f15c1 EFL=00010016 [AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010   00c09300 DPL=0 DS   [-WA]
CS =0008   00c09b00 DPL=0 CS32 [-RA]
SS =0010   00c09300 DPL=0 DS   [-WA]
DS =0010   00c09300 DPL=0 DS   [-WA]
FS =0010   00c09300 DPL=0 DS   [-WA]
GS =0010   00c09300 DPL=0 DS   [-WA]
LDT=   8200 DPL=0 LDT
TR =   8b00 DPL=0 TSS32-busy
GDT= 000f6be8 0037
IDT= 000f6c26 
CR0=6011 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=e8 ae fc ff ff 89 f2 a8 10 89 d8 75 0a b9 41 15 ff ff ff d1 5b
5e c3 5b 5e e9 76 ff ff ff b0 11 e6 20 e6 a0 b0 08 e6 21 b0 70 e6 a1
b0 04 e6 21 b0 02

FWIW, I get the same thing with 34a1cd60d17 reverted.  Maybe there are
two bugs, maybe there's more to this first one.  I can repro this


So if my understanding is correct, this is probably another bug. And
especially, I already saw the same log in another thread, Cleaning up
the KVM clock. Maybe you can continue to `git bisect` to locate that
bad commit.



Looks just now Andy found that commit,
0e60b0799fedc495a5c57dbd669de3c10d72edd2 kvm: change memslot sorting rule

from size to GFN, maybe you can try to revert this to try yours again.

That doesn't revert cleanly for me, and I don't have much time to
fiddle with it until the 24th---so checked out the commit before it
(d4ae84a0), applied your patch, built, and yes, everything works fine
at that point.  I'll probably have time for another full bisection
later, assuming things aren't ironed out already by then.


3.18.0-rc3-00120-gd4ae84a0 + vmx reorder msr writes patch = OK
3.18.0-rc3-00121-g0e60b07 + vmx reorder msr writes patch = emulation failure

So that certainly points to 0e60b0799fedc495a5c57dbd669de3c10d72edd2
as well.


Could you try this to fix your last error?


Running qemu-system-x86_64 -machine pc,accel=kvm -nodefaults works,
my real (headless) kvm guests work, but this new patch makes running
qemu-system-x86_64 -machine pc,accel=kvm fail again, this time with


Are you sure? From my test based on 3.19-rc1 that it owns top commit,

aa39477b5692611b91ac9455ae588738852b3f60

just plus my previous patch, kvm: x86: vmx: reorder some msr writing

I already can execute such a command successfully,

qemu-system-x86_64 -machine pc,accel=kvm -m 2048 -smp 2 -hda ubuntu.img

And your log below seems not to relate mem_slot issue we're discussing, I
guess you need to update qemu as well.


Yes, I'm sure.


But I also found my new patch just work out Andy's next case, its really
bringing a new issue in !next case. So I tried to refine that patch again as
follows,


This latest patch (again, after fixing all the whitespace so it actually


Next time I guess I need to post that as a attached file :)


applies), does the trick.  Both
qemu-system-x86_64 -machine pc,accel=kvm and
qemu-system-x86_64 -machine pc,accel=kvm -nodefaults work for me
now without any of the aforementioned warnings from the host.


Sounds great and thanks for your test again.

Tiejun





Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
  virt/kvm/kvm_main.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f528343..910bc48 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots,
 WARN_ON(mslots[i].id != id);
 if (!new-npages) {
 new-base_gfn = 0;
+   new-flags = 0;
 if (mslots[i].npages)
 slots-used_slots--;
 } else {
@@ -688,7 +689,9 @@ static void update_memslots(struct kvm_memslots *slots,
 i++;
 }
 while (i  0 
-  new-base_gfn  mslots[i - 1].base_gfn) {
+  ((new-base_gfn  mslots[i - 1].base_gfn) ||
+(!new-base_gfn 
+ !mslots[i - 1].base_gfn  !mslots[i - 1].npages))) {
 mslots[i] = mslots[i - 1];
 slots-id_to_index[mslots[i].id] = i;
 i--;



Tiejun


errors in the host to the tune of:

[ cut here ]
WARNING: CPU: 1 PID: 3901 at arch/x86/kvm/x86.c:6575 
kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm]()
Modules linked in: nfsv4 cpufreq_userspace cpufreq_stats cpufreq_powersave 
cpufreq_ondemand

Re: [PATCH 2/8] KVM: x86: pop sreg accesses only 2 bytes

2014-12-25 Thread Chen, Tiejun

On 2014/12/25 17:55, Nadav Amit wrote:

Tiejun tiejun.c...@intel.com wrote:


On 2014/12/25 8:52, Nadav Amit wrote:

Although pop sreg updates RSP according to the operand size, only 2 bytes are
read.  The current behavior may result in incorrect #GP or #PF exceptions.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
  arch/x86/kvm/emulate.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e5a84be..702da5e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1830,12 +1830,14 @@ static int em_pop_sreg(struct x86_emulate_ctxt *ctxt)
unsigned long selector;
int rc;


Looks we just should do similar thing to em_push_sreg(),

unsigned long selector;
int rc;

+   if (ctxt-op_bytes == 4) {
+   rsp_increment(ctxt, -2);
+   ctxt-op_bytes = 2;
+   }
rc = emulate_pop(ctxt, selector, ctxt-op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;

Right?

I don’t think so. It seems the behaviour of push and pop is a bit different.
For push: “If the source operand is a segment register (16 bits) and the
operand size is 64-bits, a zero-extended value is pushed on the stack; if
the operand size is 32-bits ... all recent Core and Atom processors perform
a 16-bit move, leaving the upper portion of the stack location unmodified.”

Therefore, for push in the case of op_bytes==8 we push zero-extended value.

For pop the behaviour is not well-documented, but experimentally it appears
only the first two bytes are accessed. I cannot see why it would be


Maybe we can comment something here, like /* Just force 2 byte 
destination to already work well in most cases. */.



different when opsize is 8, since it is not like the push case, where the
segment register value was zero extended.


Thanks for your explanation.



If you feel strongly about it, I’ll create a unit test.


Based on your description I think I can stand with you at this point.

Tiejun



Nadav



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bisected] KVM in 3.19-rc1 is completely broken

2014-12-24 Thread Chen, Tiejun

On 2014/12/24 5:29, Andy Lutomirski wrote:

On Tue, Dec 23, 2014 at 1:13 PM, Paolo Bonzini pbonz...@redhat.com wrote:



I can reproduce it using the same steps on a Sandy Bridge laptop, with
whatever QEMU is packaged in Fedora 21.  I attached the config.

I also submitted a virtme update for Fedora Rawhide and 21 (20 is
still building) in case it helps.  The build is here:

http://koji.fedoraproject.org/koji/buildinfo?buildID=600732


The other reporter bisected it to
0e60b0799fedc495a5c57dbd669de3c10d72edd2.  Can you try its parent?


That's what I bisected it to.  The parent works.



Also, does it break with 3.18 host and 3.19-rc1 guest, or with
3.19-rc1 host and 3.18 guest?  (Sorry I should do this myself
but I'm a bit swamped due to vacation until Jan 6th).



The breakage is with 3.17.7-something L0 and the same test kernel as
L1 and L2.  I think it breaks the same way with 3.19-rc1 as host and
guest without any nesting, but that's awkward to test right now.




Andy,

Could you try this?

Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
 virt/kvm/kvm_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f528343..a2d928c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots,
WARN_ON(mslots[i].id != id);
if (!new-npages) {
new-base_gfn = 0;
+   new-flags = 0;
if (mslots[i].npages)
slots-used_slots--;
} else {
@@ -688,7 +689,7 @@ static void update_memslots(struct kvm_memslots *slots,
i++;
}
while (i  0 
-  new-base_gfn  mslots[i - 1].base_gfn) {
+  new-base_gfn = mslots[i - 1].base_gfn) {
mslots[i] = mslots[i - 1];
slots-id_to_index[mslots[i].id] = i;
i--;
--
1.9.1

Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: regression bisected; KVM: entry failed, hardware error 0x80000021

2014-12-24 Thread Chen, Tiejun

On 2014/12/23 15:26, Jamie Heilman wrote:

Chen, Tiejun wrote:

On 2014/12/23 9:50, Chen, Tiejun wrote:

On 2014/12/22 17:23, Jamie Heilman wrote:

Chen, Tiejun wrote:

On 2014/12/21 20:46, Jamie Heilman wrote:

With v3.19-rc1 when I run qemu-system-x86_64 -machine pc,accel=kvm I
get:

KVM: entry failed, hardware error 0x8021


Looks some MSR writing issues such a failed entry.


If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an
invalid
state for Intel VT. For example, the guest maybe running in big real
mode
which is not supported on less recent Intel processors.

EAX= EBX= ECX= EDX=0663
ESI= EDI= EBP= ESP=
EIP=e05b EFL=00010002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =   9300
CS =f000 000f  9b00
SS =   9300
DS =   9300
FS =   9300
GS =   9300
LDT=   8200
TR =   8b00
GDT=  
IDT=  
CR0=6010 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=


And I don't see any obvious wrong as well. Any valuable info from dmesg?


With the simple qemu command above, on 3.18.1 I see:

kern.info: kvm: zapping shadow pages for mmio generation wraparound

when I fire up a full guest that's actually useful I get:

kern.info: kvm: zapping shadow pages for mmio generation wraparound
kern.err: kvm [4073]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x

On 3.18.0-rc3-00042-g34a1cd6 nothing appears in the dmesg, just the
message I mention above to stderr.  Same thing with a stock
3.19.0-rc1.  Once I apply your patch the simple test command produces
the same zapping shadow pages messages as 3.18.1, and a test guest of
a Debian Jessie image (w/stock distro kernel) produces the same thing
with disabled perfctr wrmsr message.  However, it doesn't look like


Sorry I'm not sure if I understood current status. Looks 3.19-rc1  my
patch just fix that error above,

KVM: entry failed, hardware error 0x8021
...

Right?


I'm entirely out of the woods, because one of my other guest VMs with a
custom kernel that works great under 3.18.1 now fails to run.  Nothing
in dmesg, but here's the stderr:


But even you revert 34a1cd60d17 or just apply my patch, something else
introduced between 3.18.1 and 3.19-rc1 led this error below, right?



KVM internal error. Suberror: 1
emulation failure
EAX=000de494 EBX= ECX= EDX=0cfd
ESI=0059 EDI= EBP= ESP=6fb4
EIP=000f15c1 EFL=00010016 [AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010   00c09300 DPL=0 DS   [-WA]
CS =0008   00c09b00 DPL=0 CS32 [-RA]
SS =0010   00c09300 DPL=0 DS   [-WA]
DS =0010   00c09300 DPL=0 DS   [-WA]
FS =0010   00c09300 DPL=0 DS   [-WA]
GS =0010   00c09300 DPL=0 DS   [-WA]
LDT=   8200 DPL=0 LDT
TR =   8b00 DPL=0 TSS32-busy
GDT= 000f6be8 0037
IDT= 000f6c26 
CR0=6011 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=e8 ae fc ff ff 89 f2 a8 10 89 d8 75 0a b9 41 15 ff ff ff d1 5b
5e c3 5b 5e e9 76 ff ff ff b0 11 e6 20 e6 a0 b0 08 e6 21 b0 70 e6 a1
b0 04 e6 21 b0 02

FWIW, I get the same thing with 34a1cd60d17 reverted.  Maybe there are
two bugs, maybe there's more to this first one.  I can repro this


So if my understanding is correct, this is probably another bug. And
especially, I already saw the same log in another thread, Cleaning up
the KVM clock. Maybe you can continue to `git bisect` to locate that
bad commit.



Looks just now Andy found that commit,
0e60b0799fedc495a5c57dbd669de3c10d72edd2 kvm: change memslot sorting rule
from size to GFN, maybe you can try to revert this to try yours again.


That doesn't revert cleanly for me, and I don't have much time to
fiddle with it until the 24th---so checked out the commit before it
(d4ae84a0), applied your patch, built, and yes, everything works fine
at that point.  I'll probably have time for another full bisection
later, assuming things aren't ironed out already by then.


Could you try this to fix your last error?

Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
 virt/kvm/kvm_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f528343..a2d928c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots,
WARN_ON

Re: [bisected] KVM in 3.19-rc1 is completely broken

2014-12-24 Thread Chen, Tiejun



On 2014/12/25 1:11, Andy Lutomirski wrote:

On Wed, Dec 24, 2014 at 12:23 AM, Chen, Tiejun tiejun.c...@intel.com wrote:

On 2014/12/24 5:29, Andy Lutomirski wrote:


On Tue, Dec 23, 2014 at 1:13 PM, Paolo Bonzini pbonz...@redhat.com
wrote:




I can reproduce it using the same steps on a Sandy Bridge laptop, with
whatever QEMU is packaged in Fedora 21.  I attached the config.

I also submitted a virtme update for Fedora Rawhide and 21 (20 is
still building) in case it helps.  The build is here:

http://koji.fedoraproject.org/koji/buildinfo?buildID=600732



The other reporter bisected it to
0e60b0799fedc495a5c57dbd669de3c10d72edd2.  Can you try its parent?



That's what I bisected it to.  The parent works.



Also, does it break with 3.18 host and 3.19-rc1 guest, or with
3.19-rc1 host and 3.18 guest?  (Sorry I should do this myself
but I'm a bit swamped due to vacation until Jan 6th).



The breakage is with 3.17.7-something L0 and the same test kernel as
L1 and L2.  I think it breaks the same way with 3.19-rc1 as host and
guest without any nesting, but that's awkward to test right now.




Andy,

Could you try this?

Signed-off-by: Tiejun Chen tiejun.c...@intel.com


I applied it by hand, and it survives extremely light testing.

Tested-by: Andy Lutomirski l...@amacapital.net



Looks good so thanks for your validation.

Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: regression bisected; KVM: entry failed, hardware error 0x80000021

2014-12-24 Thread Chen, Tiejun

On 2014/12/24 19:02, Jamie Heilman wrote:

Chen, Tiejun wrote:

On 2014/12/23 15:26, Jamie Heilman wrote:

Chen, Tiejun wrote:

On 2014/12/23 9:50, Chen, Tiejun wrote:

On 2014/12/22 17:23, Jamie Heilman wrote:

KVM internal error. Suberror: 1
emulation failure
EAX=000de494 EBX= ECX= EDX=0cfd
ESI=0059 EDI= EBP= ESP=6fb4
EIP=000f15c1 EFL=00010016 [AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010   00c09300 DPL=0 DS   [-WA]
CS =0008   00c09b00 DPL=0 CS32 [-RA]
SS =0010   00c09300 DPL=0 DS   [-WA]
DS =0010   00c09300 DPL=0 DS   [-WA]
FS =0010   00c09300 DPL=0 DS   [-WA]
GS =0010   00c09300 DPL=0 DS   [-WA]
LDT=   8200 DPL=0 LDT
TR =   8b00 DPL=0 TSS32-busy
GDT= 000f6be8 0037
IDT= 000f6c26 
CR0=6011 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=e8 ae fc ff ff 89 f2 a8 10 89 d8 75 0a b9 41 15 ff ff ff d1 5b
5e c3 5b 5e e9 76 ff ff ff b0 11 e6 20 e6 a0 b0 08 e6 21 b0 70 e6 a1
b0 04 e6 21 b0 02

FWIW, I get the same thing with 34a1cd60d17 reverted.  Maybe there are
two bugs, maybe there's more to this first one.  I can repro this


So if my understanding is correct, this is probably another bug. And
especially, I already saw the same log in another thread, Cleaning up
the KVM clock. Maybe you can continue to `git bisect` to locate that
bad commit.



Looks just now Andy found that commit,
0e60b0799fedc495a5c57dbd669de3c10d72edd2 kvm: change memslot sorting rule

from size to GFN, maybe you can try to revert this to try yours again.

That doesn't revert cleanly for me, and I don't have much time to
fiddle with it until the 24th---so checked out the commit before it
(d4ae84a0), applied your patch, built, and yes, everything works fine
at that point.  I'll probably have time for another full bisection
later, assuming things aren't ironed out already by then.


3.18.0-rc3-00120-gd4ae84a0 + vmx reorder msr writes patch = OK
3.18.0-rc3-00121-g0e60b07 + vmx reorder msr writes patch = emulation failure

So that certainly points to 0e60b0799fedc495a5c57dbd669de3c10d72edd2
as well.


Could you try this to fix your last error?


Running qemu-system-x86_64 -machine pc,accel=kvm -nodefaults works,
my real (headless) kvm guests work, but this new patch makes running
qemu-system-x86_64 -machine pc,accel=kvm fail again, this time with


Are you sure? From my test based on 3.19-rc1 that it owns top commit,

aa39477b5692611b91ac9455ae588738852b3f60

just plus my previous patch, kvm: x86: vmx: reorder some msr writing

I already can execute such a command successfully,

qemu-system-x86_64 -machine pc,accel=kvm -m 2048 -smp 2 -hda ubuntu.img

And your log below seems not to relate mem_slot issue we're discussing, 
I guess you need to update qemu as well.


But I also found my new patch just work out Andy's next case, its really 
bringing a new issue in !next case. So I tried to refine that patch 
again as follows,


Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
 virt/kvm/kvm_main.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f528343..910bc48 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -672,6 +672,7 @@ static void update_memslots(struct kvm_memslots *slots,
WARN_ON(mslots[i].id != id);
if (!new-npages) {
new-base_gfn = 0;
+   new-flags = 0;
if (mslots[i].npages)
slots-used_slots--;
} else {
@@ -688,7 +689,9 @@ static void update_memslots(struct kvm_memslots *slots,
i++;
}
while (i  0 
-  new-base_gfn  mslots[i - 1].base_gfn) {
+  ((new-base_gfn  mslots[i - 1].base_gfn) ||
+(!new-base_gfn 
+ !mslots[i - 1].base_gfn  !mslots[i - 1].npages))) {
mslots[i] = mslots[i - 1];
slots-id_to_index[mslots[i].id] = i;
i--;



Tiejun


errors in the host to the tune of:

[ cut here ]
WARNING: CPU: 1 PID: 3901 at arch/x86/kvm/x86.c:6575 
kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm]()
Modules linked in: nfsv4 cpufreq_userspace cpufreq_stats cpufreq_powersave 
cpufreq_ondemand cpufreq_conservative autofs4 fan nfsd auth_rpcgss nfs lockd 
grace fscache sunrpc bridge stp llc vhost_net tun vhost macvtap macvlan fuse 
cbc dm_crypt usb_storage snd_hda_codec_analog snd_hda_codec_generic kvm_intel 
kvm tg3 ptp pps_core sr_mod snd_hda_intel snd_hda_controller snd_hda_codec 
snd_hwdep snd_pcm snd_timer snd sg dcdbas cdrom psmouse soundcore floppy evdev 
xfs dm_mod raid1 md_mod
CPU: 1 PID: 3901 Comm: qemu-system-x86 Not tainted 
3.19.0-rc1-00011-g53262d1-dirty #1
Hardware

Re: regression bisected; KVM: entry failed, hardware error 0x80000021

2014-12-24 Thread Chen, Tiejun

On 2014/12/24 19:11, Paolo Bonzini wrote:



On 24/12/2014 12:02, Jamie Heilman wrote:

Running qemu-system-x86_64 -machine pc,accel=kvm -nodefaults works,
my real (headless) kvm guests work, but this new patch makes running
qemu-system-x86_64 -machine pc,accel=kvm fail again, this time with
errors in the host to the tune of:

[ cut here ]
WARNING: CPU: 1 PID: 3901 at arch/x86/kvm/x86.c:6575 
kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm]()
Modules linked in: nfsv4 cpufreq_userspace cpufreq_stats cpufreq_powersave 
cpufreq_ondemand cpufreq_conservative autofs4 fan nfsd auth_rpcgss nfs lockd 
grace fscache sunrpc bridge stp llc vhost_net tun vhost macvtap macvlan fuse 
cbc dm_crypt usb_storage snd_hda_codec_analog snd_hda_codec_generic kvm_intel 
kvm tg3 ptp pps_core sr_mod snd_hda_intel snd_hda_controller snd_hda_codec 
snd_hwdep snd_pcm snd_timer snd sg dcdbas cdrom psmouse soundcore floppy evdev 
xfs dm_mod raid1 md_mod
CPU: 1 PID: 3901 Comm: qemu-system-x86 Not tainted 
3.19.0-rc1-00011-g53262d1-dirty #1
Hardware name: Dell Inc. Precision WorkStation T3400  /0TP412, BIOS A14 
04/30/2012
   7e052328 8800c25ffcf8 813defbe
    8800c25ffd38 8103b517
  8800c25ffd28 a019bdec 8800caf1d000 8800c2774800
Call Trace:
  [813defbe] dump_stack+0x4c/0x6e
  [8103b517] warn_slowpath_common+0x97/0xb1
  [a019bdec] ? kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm]
  [8103b60b] warn_slowpath_null+0x15/0x17
  [a019bdec] kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm]
  [a02308b9] ? vmcs_load+0x20/0x62 [kvm_intel]
  [a0231e03] ? vmx_vcpu_load+0x140/0x16a [kvm_intel]
  [a0196ba3] ? kvm_arch_vcpu_load+0x15c/0x161 [kvm]
  [a018d8b1] kvm_vcpu_ioctl+0x189/0x4bd [kvm]
  [8104647a] ? do_sigtimedwait+0x12f/0x189
  [810ea316] do_vfs_ioctl+0x370/0x436
  [810f24f2] ? __fget+0x67/0x72
  [810ea41b] SyS_ioctl+0x3f/0x5e
  [813e34d2] system_call_fastpath+0x12/0x17
---[ end trace 46abac932fb3b4a1 ]---
[ cut here ]
WARNING: CPU: 1 PID: 3901 at arch/x86/kvm/x86.c:6575 
kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm]()
Modules linked in: nfsv4 cpufreq_userspace cpufreq_stats cpufreq_powersave 
cpufreq_ondemand cpufreq_conservative autofs4 fan nfsd auth_rpcgss nfs lockd 
grace fscache sunrpc bridge stp llc vhost_net tun vhost macvtap macvlan fuse 
cbc dm_crypt usb_storage snd_hda_codec_analog snd_hda_codec_generic kvm_intel 
kvm tg3 ptp pps_core sr_mod snd_hda_intel snd_hda_controller snd_hda_codec 
snd_hwdep snd_pcm snd_timer snd sg dcdbas cdrom psmouse soundcore floppy evdev 
xfs dm_mod raid1 md_mod
CPU: 1 PID: 3901 Comm: qemu-system-x86 Tainted: GW  
3.19.0-rc1-00011-g53262d1-dirty #1
Hardware name: Dell Inc. Precision WorkStation T3400  /0TP412, BIOS A14 
04/30/2012
   7e052328 8800c25ffcf8 813defbe
    8800c25ffd38 8103b517
  8800c25ffd28 a019bdec 8800caf1d000 8800c2774800
Call Trace:
  [813defbe] dump_stack+0x4c/0x6e
  [8103b517] warn_slowpath_common+0x97/0xb1
  [a019bdec] ? kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm]
  [8103b60b] warn_slowpath_null+0x15/0x17
  [a019bdec] kvm_arch_vcpu_ioctl_run+0xd63/0xe5b [kvm]
  [a02308b9] ? vmcs_load+0x20/0x62 [kvm_intel]
  [a0231e03] ? vmx_vcpu_load+0x140/0x16a [kvm_intel]
  [a0196ba3] ? kvm_arch_vcpu_load+0x15c/0x161 [kvm]
  [a018d8b1] kvm_vcpu_ioctl+0x189/0x4bd [kvm]
  [8104647a] ? do_sigtimedwait+0x12f/0x189
  [810ea316] do_vfs_ioctl+0x370/0x436
  [810f24f2] ? __fget+0x67/0x72
  [810ea41b] SyS_ioctl+0x3f/0x5e
  [813e34d2] system_call_fastpath+0x12/0x17
---[ end trace 46abac932fb3b4a2 ]---

over and over and over ad nauseum, or until I kill the qemu command,
it also eats a core's worth of cpu.


Such a message above seems to be out of our mem_slot issue, I'm not 100% 
sure but actually I can run this case,


qemu-system-x86_64 -machine pc,accel=kvm -m 2048 -smp 2 -hda ubuntu.img

Just one patch, kvm: x86: vmx: reorder some msr writing, is needed 
here. So I guess you guy can try to pull your 3.19-rc1 + that patch, and 
also pull qemu.




Yeah, I'm fairly sure that the second hunk of Tiejun's patch is not
correct, but he's on the right track.  I hope to post a fix today, else


Yeah, looks that will broken !next case then I regenerate that again 
post into another email. Now at lease I myself can run Andy's next case 
and a normal case, qemu-system-x86_64 -machine pc,accel=kvm, at the 
same time. But if I'm missing something please correct me directly :)


Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [bisected] KVM in 3.19-rc1 is completely broken

2014-12-23 Thread Chen, Tiejun

On 2014/12/23 9:38, Andy Lutomirski wrote:

I can't get it to work at all.  It fails with:



How can we reproduce this? I just try the following command,

qemu-system-x86_64 -machine pc,accel=kvm -m 2048 -smp 2 -hda ubuntu.img

but at least I can boot that successfully and already live for a while. 
Here I installed ubuntu14.04 as a guest, and


Linux commit: aa39477b5692611b91ac9455ae588738852b3f60
Qemu commit: 7e58e2ac7778cca3234c33387e49577bb7732714

So any specific Kconfigs need to be enabled?

Tiejun


KVM internal error. Suberror: 1
emulation failure
EAX=000dee58 EBX= ECX= EDX=0cfd
ESI=0059 EDI= EBP= ESP=6fc4
EIP=000f17f4 EFL=00010012 [A--] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010   00c09300 DPL=0 DS   [-WA]
CS =0008   00c09b00 DPL=0 CS32 [-RA]
SS =0010   00c09300 DPL=0 DS   [-WA]
DS =0010   00c09300 DPL=0 DS   [-WA]
FS =0010   00c09300 DPL=0 DS   [-WA]
GS =0010   00c09300 DPL=0 DS   [-WA]
LDT=   8200 DPL=0 LDT
TR =   8b00 DPL=0 TSS32-busy
GDT= 000f6c58 0037
IDT= 000f6c96 
CR0=6011 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=e8 75 fc ff ff 89 f2 a8 10 89 d8 75 0a b9 74 17 ff ff ff d1 5b
5e c3 5b 5e e9 76 ff ff ff 57 56 53 8b 35 38 65 0f 00 85 f6 0f 88 be
00 00 00 0f b7 f6

The problem started with:

commit 0e60b0799fedc495a5c57dbd669de3c10d72edd2
Author: Igor Mammedov imamm...@redhat.com
Date:   Mon Dec 1 17:29:26 2014 +

 kvm: change memslot sorting rule from size to GFN

 it will allow to use binary search for GFN - memslot
 lookups, reducing lookup cost with large slots amount.

 Signed-off-by: Igor Mammedov imamm...@redhat.com
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com


--Andy
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: x86: remove vmx_vm_has_apicv() outside of hwapic_isr_update()

2014-12-22 Thread Chen, Tiejun

On 2014/12/19 19:12, Paolo Bonzini wrote:



On 19/12/2014 03:32, Chen, Tiejun wrote:


Are you saying something below?

if (enable_apicv)
 ...
else {
 kvm_x86_ops-hwapic_irr_update = NULL;


Yes.


But this means we have to revise hadware_setup() to get 'kvm' inside,


This would not even be possible, since hardware_setup() is only called once.


Yeah.



However, for the only caller of hwapic_isr_update (but presumably all of
them, as is the case for hwapic_irr_update), you already know that


There are two other callers to hwapic_isr_update(), apic_set_isr() and 
apic_clear_isr(), but I think they still work here.



irqchip_in_kernel(kvm) is true.  You are in kvm_apic_post_state_restore,
which takes a kvm_lapic_state, and no lapic exists if
!irqchip_in_kernel(kvm).

(Yes, irqchip_in_kernel) is a bit weird and tests pic_irqchip(kvm)
instead, but it's the same.  It tests pic_irqchip(kvm) only because the
LAPIC is per-cpu and irqchip_in_kernel takes a struct kvm).

So it's possible to NULL out hwapic_isr_update in hardware_setup.  It
simply shouldn't happen that you call hwapic_isr_update without the
in-kernel irqchip.  The kernel knows nothing about ISR/IRR without the
in-kernel irqchip.


Thanks for your kind elaboration which always benefits me.

What about this revision as follows?

 kvm: x86: vmx: NULL out hwapic_isr_update() in case of !enable_apicv

In most cases calling hwapic_isr_update(), we always check if
kvm_apic_vid_enabled() == 1, but actually,
kvm_apic_vid_enabled()
- kvm_x86_ops-vm_has_apicv()
- vmx_vm_has_apicv() or '0' in svm case
- return enable_apicv  irqchip_in_kernel(kvm)

So its a little cost to recall vmx_vm_has_apicv() inside
hwapic_isr_update(), here just NULL out hwapic_isr_update() in
case of !enable_apicv inside hardware_setup() then make all
related stuffs follow this. Note we don't check this under that
condition of irqchip_in_kernel() since we should make sure
definitely any caller don't work  without in-kernel irqchip.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
 arch/x86/kvm/lapic.c | 7 ---
 arch/x86/kvm/vmx.c   | 4 +---
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4f0c0b9..eb4cd5e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -402,7 +402,7 @@ static inline void apic_set_isr(int vec, struct 
kvm_lapic *apic)

 * because the processor can modify ISR under the hood.  Instead
 * just set SVI.
 */
-   if (unlikely(kvm_apic_vid_enabled(vcpu-kvm)))
+   if (unlikely(kvm_x86_ops-hwapic_isr_update))
kvm_x86_ops-hwapic_isr_update(vcpu-kvm, vec);
else {
++apic-isr_count;
@@ -450,7 +450,7 @@ static inline void apic_clear_isr(int vec, struct 
kvm_lapic *apic)

 * on the other hand isr_count and highest_isr_cache are unused
 * and must be left alone.
 */
-   if (unlikely(kvm_apic_vid_enabled(vcpu-kvm)))
+   if (unlikely(kvm_x86_ops-hwapic_isr_update))
kvm_x86_ops-hwapic_isr_update(vcpu-kvm,

apic_find_highest_isr(apic));
else {
@@ -1742,7 +1742,8 @@ void kvm_apic_post_state_restore(struct kvm_vcpu 
*vcpu,

if (kvm_x86_ops-hwapic_irr_update)
kvm_x86_ops-hwapic_irr_update(vcpu,
apic_find_highest_irr(apic));
-   kvm_x86_ops-hwapic_isr_update(vcpu-kvm, 
apic_find_highest_isr(apic));

+   if (kvm_x86_ops-hwapic_isr_update)
+   kvm_x86_ops-hwapic_isr_update(vcpu-kvm, 
apic_find_highest_isr(apic));

kvm_make_request(KVM_REQ_EVENT, vcpu);
kvm_rtc_eoi_tracking_restore_one(vcpu);
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 96c84a8..e378dff 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5895,6 +5895,7 @@ static __init int hardware_setup(void)
kvm_x86_ops-update_cr8_intercept = NULL;
else {
kvm_x86_ops-hwapic_irr_update = NULL;
+   kvm_x86_ops-hwapic_isr_update = NULL;
kvm_x86_ops-deliver_posted_interrupt = NULL;
kvm_x86_ops-sync_pir_to_irr = vmx_sync_pir_to_irr_dummy;
}
@@ -7471,9 +7472,6 @@ static void vmx_hwapic_isr_update(struct kvm *kvm, 
int isr)

u16 status;
u8 old;

-   if (!vmx_vm_has_apicv(kvm))
-   return;
-
if (isr == -1)
isr = 0;

--
1.9.1

I can send out as a patch if we have on any objections.

Thanks
Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: regression bisected; KVM: entry failed, hardware error 0x80000021

2014-12-22 Thread Chen, Tiejun

On 2014/12/22 17:23, Jamie Heilman wrote:

Chen, Tiejun wrote:

On 2014/12/21 20:46, Jamie Heilman wrote:

With v3.19-rc1 when I run qemu-system-x86_64 -machine pc,accel=kvm I
get:

KVM: entry failed, hardware error 0x8021


Looks some MSR writing issues such a failed entry.


If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.

EAX= EBX= ECX= EDX=0663
ESI= EDI= EBP= ESP=
EIP=e05b EFL=00010002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =   9300
CS =f000 000f  9b00
SS =   9300
DS =   9300
FS =   9300
GS =   9300
LDT=   8200
TR =   8b00
GDT=  
IDT=  
CR0=6010 CR2= CR3= CR4=
DR0= DR1= DR2= 
DR3=
DR6=0ff0 DR7=0400
EFER=


And I don't see any obvious wrong as well. Any valuable info from dmesg?


With the simple qemu command above, on 3.18.1 I see:

kern.info: kvm: zapping shadow pages for mmio generation wraparound

when I fire up a full guest that's actually useful I get:

kern.info: kvm: zapping shadow pages for mmio generation wraparound
kern.err: kvm [4073]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x

On 3.18.0-rc3-00042-g34a1cd6 nothing appears in the dmesg, just the
message I mention above to stderr.  Same thing with a stock
3.19.0-rc1.  Once I apply your patch the simple test command produces
the same zapping shadow pages messages as 3.18.1, and a test guest of
a Debian Jessie image (w/stock distro kernel) produces the same thing
with disabled perfctr wrmsr message.  However, it doesn't look like


Sorry I'm not sure if I understood current status. Looks 3.19-rc1  my 
patch just fix that error above,


KVM: entry failed, hardware error 0x8021
...

Right?


I'm entirely out of the woods, because one of my other guest VMs with a
custom kernel that works great under 3.18.1 now fails to run.  Nothing
in dmesg, but here's the stderr:


But even you revert 34a1cd60d17 or just apply my patch, something else 
introduced between 3.18.1 and 3.19-rc1 led this error below, right?




KVM internal error. Suberror: 1
emulation failure
EAX=000de494 EBX= ECX= EDX=0cfd
ESI=0059 EDI= EBP= ESP=6fb4
EIP=000f15c1 EFL=00010016 [AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010   00c09300 DPL=0 DS   [-WA]
CS =0008   00c09b00 DPL=0 CS32 [-RA]
SS =0010   00c09300 DPL=0 DS   [-WA]
DS =0010   00c09300 DPL=0 DS   [-WA]
FS =0010   00c09300 DPL=0 DS   [-WA]
GS =0010   00c09300 DPL=0 DS   [-WA]
LDT=   8200 DPL=0 LDT
TR =   8b00 DPL=0 TSS32-busy
GDT= 000f6be8 0037
IDT= 000f6c26 
CR0=6011 CR2= CR3= CR4=
DR0= DR1= DR2= 
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=e8 ae fc ff ff 89 f2 a8 10 89 d8 75 0a b9 41 15 ff ff ff d1 5b 5e c3 5b 
5e e9 76 ff ff ff b0 11 e6 20 e6 a0 b0 08 e6 21 b0 70 e6 a1 b0 04 e6 21 b0 02

FWIW, I get the same thing with 34a1cd60d17 reverted.  Maybe there are
two bugs, maybe there's more to this first one.  I can repro this


So if my understanding is correct, this is probably another bug. And 
especially, I already saw the same log in another thread, Cleaning up 
the KVM clock. Maybe you can continue to `git bisect` to locate that 
bad commit.


Thanks
Tiejun


error with the command: qemu-system-x86_64 -machine pc,accel=kvm -nodefaults


This is with QEMU emulator version 2.1.2 (Debian 1:2.1+dfsg-11),
Copyright (c) 2003-2008 Fabrice Bellard

The host system is:

cpu family  : 6
model   : 23
model name  : Intel(R) Core(TM)2 Duo CPU E8400  @ 3.00GHz
stepping: 10
microcode   : 0xa0b
...
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor 
ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm 
tpr_shadow vnmi flexpriority

I bisected this back to:

commit 34a1cd60d17f62c1f077c1478a6c2ca8c3d17af4
Author: Tiejun Chen tiejun.c...@intel.com
Date:   Tue Oct 28 10:14:48 2014 +0800

 kvm: x86: vmx: move some vmx setting from vmx_init() to hardware_setup()

 Instead of vmx_init(), actually it would make reasonable sense

Re: regression bisected; KVM: entry failed, hardware error 0x80000021

2014-12-22 Thread Chen, Tiejun

On 2014/12/23 9:50, Chen, Tiejun wrote:

On 2014/12/22 17:23, Jamie Heilman wrote:

Chen, Tiejun wrote:

On 2014/12/21 20:46, Jamie Heilman wrote:

With v3.19-rc1 when I run qemu-system-x86_64 -machine pc,accel=kvm I
get:

KVM: entry failed, hardware error 0x8021


Looks some MSR writing issues such a failed entry.


If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an
invalid
state for Intel VT. For example, the guest maybe running in big real
mode
which is not supported on less recent Intel processors.

EAX= EBX= ECX= EDX=0663
ESI= EDI= EBP= ESP=
EIP=e05b EFL=00010002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =   9300
CS =f000 000f  9b00
SS =   9300
DS =   9300
FS =   9300
GS =   9300
LDT=   8200
TR =   8b00
GDT=  
IDT=  
CR0=6010 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=


And I don't see any obvious wrong as well. Any valuable info from dmesg?


With the simple qemu command above, on 3.18.1 I see:

kern.info: kvm: zapping shadow pages for mmio generation wraparound

when I fire up a full guest that's actually useful I get:

kern.info: kvm: zapping shadow pages for mmio generation wraparound
kern.err: kvm [4073]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x

On 3.18.0-rc3-00042-g34a1cd6 nothing appears in the dmesg, just the
message I mention above to stderr.  Same thing with a stock
3.19.0-rc1.  Once I apply your patch the simple test command produces
the same zapping shadow pages messages as 3.18.1, and a test guest of
a Debian Jessie image (w/stock distro kernel) produces the same thing
with disabled perfctr wrmsr message.  However, it doesn't look like


Sorry I'm not sure if I understood current status. Looks 3.19-rc1  my
patch just fix that error above,

KVM: entry failed, hardware error 0x8021
...

Right?


I'm entirely out of the woods, because one of my other guest VMs with a
custom kernel that works great under 3.18.1 now fails to run.  Nothing
in dmesg, but here's the stderr:


But even you revert 34a1cd60d17 or just apply my patch, something else
introduced between 3.18.1 and 3.19-rc1 led this error below, right?



KVM internal error. Suberror: 1
emulation failure
EAX=000de494 EBX= ECX= EDX=0cfd
ESI=0059 EDI= EBP= ESP=6fb4
EIP=000f15c1 EFL=00010016 [AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010   00c09300 DPL=0 DS   [-WA]
CS =0008   00c09b00 DPL=0 CS32 [-RA]
SS =0010   00c09300 DPL=0 DS   [-WA]
DS =0010   00c09300 DPL=0 DS   [-WA]
FS =0010   00c09300 DPL=0 DS   [-WA]
GS =0010   00c09300 DPL=0 DS   [-WA]
LDT=   8200 DPL=0 LDT
TR =   8b00 DPL=0 TSS32-busy
GDT= 000f6be8 0037
IDT= 000f6c26 
CR0=6011 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=e8 ae fc ff ff 89 f2 a8 10 89 d8 75 0a b9 41 15 ff ff ff d1 5b
5e c3 5b 5e e9 76 ff ff ff b0 11 e6 20 e6 a0 b0 08 e6 21 b0 70 e6 a1
b0 04 e6 21 b0 02

FWIW, I get the same thing with 34a1cd60d17 reverted.  Maybe there are
two bugs, maybe there's more to this first one.  I can repro this


So if my understanding is correct, this is probably another bug. And
especially, I already saw the same log in another thread, Cleaning up
the KVM clock. Maybe you can continue to `git bisect` to locate that
bad commit.



Looks just now Andy found that commit, 
0e60b0799fedc495a5c57dbd669de3c10d72edd2 kvm: change memslot sorting 
rule from size to GFN, maybe you can try to revert this to try yours again.


Thanks
Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: regression bisected; KVM: entry failed, hardware error 0x80000021

2014-12-22 Thread Chen, Tiejun

On 2014/12/23 15:26, Jamie Heilman wrote:

Chen, Tiejun wrote:

On 2014/12/23 9:50, Chen, Tiejun wrote:

On 2014/12/22 17:23, Jamie Heilman wrote:

Chen, Tiejun wrote:

On 2014/12/21 20:46, Jamie Heilman wrote:

With v3.19-rc1 when I run qemu-system-x86_64 -machine pc,accel=kvm I
get:

KVM: entry failed, hardware error 0x8021


Looks some MSR writing issues such a failed entry.


If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an
invalid
state for Intel VT. For example, the guest maybe running in big real
mode
which is not supported on less recent Intel processors.

EAX= EBX= ECX= EDX=0663
ESI= EDI= EBP= ESP=
EIP=e05b EFL=00010002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =   9300
CS =f000 000f  9b00
SS =   9300
DS =   9300
FS =   9300
GS =   9300
LDT=   8200
TR =   8b00
GDT=  
IDT=  
CR0=6010 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=


And I don't see any obvious wrong as well. Any valuable info from dmesg?


With the simple qemu command above, on 3.18.1 I see:

kern.info: kvm: zapping shadow pages for mmio generation wraparound

when I fire up a full guest that's actually useful I get:

kern.info: kvm: zapping shadow pages for mmio generation wraparound
kern.err: kvm [4073]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0x

On 3.18.0-rc3-00042-g34a1cd6 nothing appears in the dmesg, just the
message I mention above to stderr.  Same thing with a stock
3.19.0-rc1.  Once I apply your patch the simple test command produces
the same zapping shadow pages messages as 3.18.1, and a test guest of
a Debian Jessie image (w/stock distro kernel) produces the same thing
with disabled perfctr wrmsr message.  However, it doesn't look like


Sorry I'm not sure if I understood current status. Looks 3.19-rc1  my
patch just fix that error above,

KVM: entry failed, hardware error 0x8021
...

Right?


I'm entirely out of the woods, because one of my other guest VMs with a
custom kernel that works great under 3.18.1 now fails to run.  Nothing
in dmesg, but here's the stderr:


But even you revert 34a1cd60d17 or just apply my patch, something else
introduced between 3.18.1 and 3.19-rc1 led this error below, right?



KVM internal error. Suberror: 1
emulation failure
EAX=000de494 EBX= ECX= EDX=0cfd
ESI=0059 EDI= EBP= ESP=6fb4
EIP=000f15c1 EFL=00010016 [AP-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010   00c09300 DPL=0 DS   [-WA]
CS =0008   00c09b00 DPL=0 CS32 [-RA]
SS =0010   00c09300 DPL=0 DS   [-WA]
DS =0010   00c09300 DPL=0 DS   [-WA]
FS =0010   00c09300 DPL=0 DS   [-WA]
GS =0010   00c09300 DPL=0 DS   [-WA]
LDT=   8200 DPL=0 LDT
TR =   8b00 DPL=0 TSS32-busy
GDT= 000f6be8 0037
IDT= 000f6c26 
CR0=6011 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=e8 ae fc ff ff 89 f2 a8 10 89 d8 75 0a b9 41 15 ff ff ff d1 5b
5e c3 5b 5e e9 76 ff ff ff b0 11 e6 20 e6 a0 b0 08 e6 21 b0 70 e6 a1
b0 04 e6 21 b0 02

FWIW, I get the same thing with 34a1cd60d17 reverted.  Maybe there are
two bugs, maybe there's more to this first one.  I can repro this


So if my understanding is correct, this is probably another bug. And
especially, I already saw the same log in another thread, Cleaning up
the KVM clock. Maybe you can continue to `git bisect` to locate that
bad commit.



Looks just now Andy found that commit,
0e60b0799fedc495a5c57dbd669de3c10d72edd2 kvm: change memslot sorting rule
from size to GFN, maybe you can try to revert this to try yours again.


That doesn't revert cleanly for me, and I don't have much time to


Yeah, I guess all associated commits should be reverted gradually.


fiddle with it until the 24th---so checked out the commit before it
(d4ae84a0), applied your patch, built, and yes, everything works fine


Thanks for your test.

I think I can submit this patch to fix one of yours problems and I'd 
like to add you as Reported-by  Tested-by.


Then we can step into another issue. And I'm trying to fetch 3.19-rc1 
(because I'm always working on kvm/next.) to take a look at that but 
maybe Paolo is already going on that.


Tiejun


at that point.  I'll probably have time for another full bisection
later, assuming things aren't ironed out already

Re: regression bisected; KVM: entry failed, hardware error 0x80000021

2014-12-21 Thread Chen, Tiejun

On 2014/12/21 20:46, Jamie Heilman wrote:

With v3.19-rc1 when I run qemu-system-x86_64 -machine pc,accel=kvm I
get:

KVM: entry failed, hardware error 0x8021


Looks some MSR writing issues such a failed entry.



If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.

EAX= EBX= ECX= EDX=0663
ESI= EDI= EBP= ESP=
EIP=e05b EFL=00010002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =   9300
CS =f000 000f  9b00
SS =   9300
DS =   9300
FS =   9300
GS =   9300
LDT=   8200
TR =   8b00
GDT=  
IDT=  
CR0=6010 CR2= CR3= CR4=
DR0= DR1= DR2= 
DR3=
DR6=0ff0 DR7=0400
EFER=


And I don't see any obvious wrong as well. Any valuable info from dmesg?


Code=85 00 87 00 89 00 8b 00 00 00 86 00 88 00 8a 00 8c 00 00 90 2e 66 83 3e 
30 6c 00 0f 85 e7 f2 31 c0 8e d0 66 bc 00 70 00 00 66 ba 31 2e 0f 00 e9 45 f1

This is with QEMU emulator version 2.1.2 (Debian 1:2.1+dfsg-11),
Copyright (c) 2003-2008 Fabrice Bellard

The host system is:

cpu family  : 6
model   : 23
model name  : Intel(R) Core(TM)2 Duo CPU E8400  @ 3.00GHz
stepping: 10
microcode   : 0xa0b
...
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor 
ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm 
tpr_shadow vnmi flexpriority

I bisected this back to:

commit 34a1cd60d17f62c1f077c1478a6c2ca8c3d17af4
Author: Tiejun Chen tiejun.c...@intel.com
Date:   Tue Oct 28 10:14:48 2014 +0800

 kvm: x86: vmx: move some vmx setting from vmx_init() to hardware_setup()

 Instead of vmx_init(), actually it would make reasonable sense to do
 anything specific to vmx hardware setting in vmx_x86_ops-hardware_setup().



This commit just reorders something but some MSR writing depend on 
previous status.


Could you try this?

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index feb852b..96c84a8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5840,49 +5840,6 @@ static __init int hardware_setup(void)
memset(vmx_msr_bitmap_legacy, 0xff, PAGE_SIZE);
memset(vmx_msr_bitmap_longmode, 0xff, PAGE_SIZE);

-   vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
-   vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
-   vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
-   vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false);
-   vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false);
-   vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
-   vmx_disable_intercept_for_msr(MSR_IA32_BNDCFGS, true);
-
-   memcpy(vmx_msr_bitmap_legacy_x2apic,
-   vmx_msr_bitmap_legacy, PAGE_SIZE);
-   memcpy(vmx_msr_bitmap_longmode_x2apic,
-   vmx_msr_bitmap_longmode, PAGE_SIZE);
-
-   if (enable_apicv) {
-   for (msr = 0x800; msr = 0x8ff; msr++)
-   vmx_disable_intercept_msr_read_x2apic(msr);
-
-   /* According SDM, in x2apic mode, the whole id reg is used.
-* But in KVM, it only use the highest eight bits. Need to
-* intercept it */
-   vmx_enable_intercept_msr_read_x2apic(0x802);
-   /* TMCCT */
-   vmx_enable_intercept_msr_read_x2apic(0x839);
-   /* TPR */
-   vmx_disable_intercept_msr_write_x2apic(0x808);
-   /* EOI */
-   vmx_disable_intercept_msr_write_x2apic(0x80b);
-   /* SELF-IPI */
-   vmx_disable_intercept_msr_write_x2apic(0x83f);
-   }
-
-   if (enable_ept) {
-   kvm_mmu_set_mask_ptes(0ull,
-   (enable_ept_ad_bits) ? VMX_EPT_ACCESS_BIT : 0ull,
-   (enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
-   0ull, VMX_EPT_EXECUTABLE_MASK);
-   ept_set_mmio_spte_mask();
-   kvm_enable_tdp();
-   } else
-   kvm_disable_tdp();
-
-   update_ple_window_actual_max();
-
if (setup_vmcs_config(vmcs_config)  0) {
r = -EIO;
goto out7;
@@ -5945,6 +5902,49 @@ static __init int hardware_setup(void)
if (nested)
nested_vmx_setup_ctls_msrs();

+   

Re: [PATCH] kvm: x86: remove vmx_vm_has_apicv() outside of hwapic_isr_update()

2014-12-18 Thread Chen, Tiejun

On 2014/12/1 19:43, Paolo Bonzini wrote:



On 01/12/2014 10:28, Tiejun Chen wrote:

In most cases calling hwapic_isr_update(), actually we always
check if kvm_apic_vid_enabled() == 1, and also actually,
kvm_apic_vid_enabled()
 - kvm_x86_ops-vm_has_apicv()
 - vmx_vm_has_apicv() or '0' in svm case

So its unnecessary to recall this inside hwapic_isr_update(), here
just remove vmx_vm_has_apicv() out and follow others.


If you want to do this, please NULL out the function pointer instead, as
KVM already does for hwapic_irr_update.


Are you saying something below?

if (enable_apicv)
...
else {
kvm_x86_ops-hwapic_irr_update = NULL;

But there's a little bit difference to NULL out hwapic_isr_update(),

static int vmx_vm_has_apicv(struct kvm *kvm)
{
return enable_apicv  irqchip_in_kernel(kvm);
}

Yes, I can do something like this,

static __init int hadware_setup(void)
{
...
if (enable_apicv) {
...
if (!irqchip_in_kernel(kvm))
kvm_x86_ops-hwapic_isr_update = NULL;   
} else {
...
kvm_x86_ops-hwapic_isr_update = NULL;

But this means we have to revise hadware_setup() to get 'kvm' inside, 
then rebase other callers to hwapic_isr_update(), is it really good?


Here what I will intend to do is trying to reduce some cost (reduplicate 
check) with a little code, so its may not be worth changing much more.


Tiejun



Paolo


Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
  arch/x86/kvm/lapic.c | 3 ++-
  arch/x86/kvm/vmx.c   | 3 ---
  2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e0e5642..2ddc426 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1739,7 +1739,8 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
if (kvm_x86_ops-hwapic_irr_update)
kvm_x86_ops-hwapic_irr_update(vcpu,
apic_find_highest_irr(apic));
-   kvm_x86_ops-hwapic_isr_update(vcpu-kvm, apic_find_highest_isr(apic));
+   if (kvm_apic_vid_enabled(vcpu-kvm))
+   kvm_x86_ops-hwapic_isr_update(vcpu-kvm, 
apic_find_highest_isr(apic));
kvm_make_request(KVM_REQ_EVENT, vcpu);
kvm_rtc_eoi_tracking_restore_one(vcpu);
  }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6a951d8..f0c16a9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7406,9 +7406,6 @@ static void vmx_hwapic_isr_update(struct kvm *kvm, int 
isr)
u16 status;
u8 old;

-   if (!vmx_vm_has_apicv(kvm))
-   return;
-
if (isr == -1)
isr = 0;



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: iommu: Add cond_resched to legacy device assignment code

2014-12-16 Thread Chen, Tiejun

On 2014/12/16 23:47, Joerg Roedel wrote:

From: Joerg Roedel jroe...@suse.de

When assigning devices to large memory guests (=128GB guest
memory in the failure case) the functions to create the
IOMMU page-tables for the whole guest might run for a very
long time. On non-preemptible kernels this might cause
Soft-Lockup warnings. Fix these by adding a cond_resched()
to the mapping and unmapping loops.

Signed-off-by: Joerg Roedel jroe...@suse.de
---
  virt/kvm/iommu.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index c1e6ae9..ac427e8 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c


This file is already gone after one latest commit c274e03af705, kvm: 
x86: move assigned-dev.c and iommu.c to arch/x86/ is introduced, so you 
need to pull your tree firstly :)


Tiejun



@@ -137,7 +137,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct 
kvm_memory_slot *slot)

gfn += page_size  PAGE_SHIFT;

-
+   cond_resched();
}

return 0;
@@ -311,6 +311,8 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
kvm_unpin_pages(kvm, pfn, unmap_pages);

gfn += unmap_pages;
+
+   cond_resched();
}
  }



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: coalesced_mmio: remove one redundant check inside of coalesced_mmio_in_range()

2014-12-11 Thread Chen, Tiejun

On 2014/12/11 19:29, Paolo Bonzini wrote:



On 11/12/2014 04:02, Tiejun Chen wrote:

We already check 'len' above to make sure it already isn't
negative here, so indeed, (addr + len  addr) should never be happened.


... except if there is an overflow.


Sorry, I'm confused. 'addr' is u64 and now 'len' would always be '=0', 
what's your a so-called overflow here? And we also have such a check 
below, (addr + len  dev-zone.addr + dev-zone.size), so can this 
guarantee an overflow?


Thanks
Tiejun



Paolo


Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
  virt/kvm/coalesced_mmio.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 00d8642..60f59cd 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -30,8 +30,6 @@ static int coalesced_mmio_in_range(struct 
kvm_coalesced_mmio_dev *dev,
 */
if (len  0)
return 0;
-   if (addr + len  addr)
-   return 0;
if (addr  dev-zone.addr)
return 0;
if (addr + len  dev-zone.addr + dev-zone.size)


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: coalesced_mmio: remove one redundant check inside of coalesced_mmio_in_range()

2014-12-11 Thread Chen, Tiejun

On 2014/12/12 9:02, Chen, Tiejun wrote:

On 2014/12/11 19:29, Paolo Bonzini wrote:



On 11/12/2014 04:02, Tiejun Chen wrote:

We already check 'len' above to make sure it already isn't
negative here, so indeed, (addr + len  addr) should never be happened.


... except if there is an overflow.




I think now I can understand what you mean.

Thanks
Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [question] what is virtio-1 device?

2014-11-26 Thread Chen, Tiejun

On 2014/11/27 9:26, Zhang Haoyu wrote:

Hi,
what is virtio-1 device?



Are you mean subsystem device id is '1'? That should be a network card.

Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [question] what is virtio-1 device?

2014-11-26 Thread Chen, Tiejun

On 2014/11/27 9:46, Zhang Haoyu wrote:

Hi,
what is virtio-1 device?



Are you mean subsystem device id is '1'? That should be a network card.


In the mail qemu: towards virtio-1 host support, virtio: allow virtio-1 queue 
layout,
dataplane: allow virtio-1 devices, etc., virtio-1 devices was mentioned,
I don't know which devices it incorporates?



That should mean they're carrying forward virtio based on latest Virtual 
I/O Device (VIRTIO) Version 1.0.


Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: ia64: remove

2014-11-19 Thread Chen, Tiejun

On 2014/11/20 5:05, Paolo Bonzini wrote:

KVM for ia64 has been marked as broken not just once, but twice even,
and the last patch from the maintainer is now roughly 5 years old.
Time for it to rest in piece.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---


I think we also need to sync this in Documentation:

Documentation: virtual: kvm: remove ia64

kvm/ia64 would be gone so just clean Documentation.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
 Documentation/ia64/kvm.txt  | 83 
-

 Documentation/ia64/paravirt_ops.txt |  8 ++--
 Documentation/virtual/kvm/api.txt   | 45 ++--
 3 files changed, 26 insertions(+), 110 deletions(-)
 delete mode 100644 Documentation/ia64/kvm.txt

diff --git a/Documentation/ia64/kvm.txt b/Documentation/ia64/kvm.txt
deleted file mode 100644
index ffb5c80..000
--- a/Documentation/ia64/kvm.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-Currently, kvm module is in EXPERIMENTAL stage on IA64. This means that
-interfaces are not stable enough to use. So, please don't run critical
-applications in virtual machine.
-We will try our best to improve it in future versions!
-
-   Guide: How to boot up guests on kvm/ia64
-
-This guide is to describe how to enable kvm support for IA-64 systems.
-
-1. Get the kvm source from git.kernel.org.
-   Userspace source:
-   git clone 
git://git.kernel.org/pub/scm/virt/kvm/kvm-userspace.git

-   Kernel Source:
-   git clone 
git://git.kernel.org/pub/scm/linux/kernel/git/xiantao/kvm-ia64.git

-
-2. Compile the source code.
-   2.1 Compile userspace code:
-   (1)cd ./kvm-userspace
-   (2)./configure
-   (3)cd kernel
-   (4)make sync LINUX= $kernel_dir (kernel_dir is the 
directory of kernel source.)

-   (5)cd ..
-   (6)make qemu
-   (7)cd qemu; make install
-
-   2.2 Compile kernel source code:
-   (1) cd ./$kernel_dir
-   (2) Make menuconfig
-   (3) Enter into virtualization option, and choose kvm.
-   (4) make
-   (5) Once (4) done, make modules_install
-   (6) Make initrd, and use new kernel to reboot up host 
machine.

-   (7) Once (6) done, cd $kernel_dir/arch/ia64/kvm
-   (8) insmod kvm.ko; insmod kvm-intel.ko
-
-Note: For step 2, please make sure that host page size == 
TARGET_PAGE_SIZE of qemu, otherwise, may fail.

-
-3. Get Guest Firmware named as Flash.fd, and put it under right place:
-   (1) If you have the guest firmware (binary) released by Intel 
Corp for Xen, use it directly.

-
-   (2) If you have no firmware at hand, Please download its source from
-   hg clone http://xenbits.xensource.com/ext/efi-vfirmware.hg
-   you can get the firmware's binary in the directory of 
efi-vfirmware.hg/binaries.

-
-   (3) Rename the firmware you owned to Flash.fd, and copy it to 
/usr/local/share/qemu

-
-4. Boot up Linux or Windows guests:
-   4.1 Create or install a image for guest boot. If you have xen 
experience, it should be easy.

-
-   4.2 Boot up guests use the following command.
-   /usr/local/bin/qemu-system-ia64 -smp xx -m 512 -hda 
$your_image
-   (xx is the number of virtual processors for the guest, 
now the maximum value is 4)

-
-5. Known possible issue on some platforms with old Firmware.
-
-In the event of strange host crash issues, try to solve it through 
either of the following ways:

-
-(1): Upgrade your Firmware to the latest one.
-
-(2): Applying the below patch to kernel source.
-diff --git a/arch/ia64/kernel/pal.S b/arch/ia64/kernel/pal.S
-index 0b53344..f02b0f7 100644
 a/arch/ia64/kernel/pal.S
-+++ b/arch/ia64/kernel/pal.S
-@@ -84,7 +84,8 @@ GLOBAL_ENTRY(ia64_pal_call_static)
-   mov ar.pfs = loc1
-   mov rp = loc0
-   ;;
--  srlz.d  // serialize restoration of psr.l
-+  srlz.i  // serialize restoration of psr.l
-+  ;;
-   br.ret.sptk.many b0
- END(ia64_pal_call_static)
-
-6. Bug report:
-   If you found any issues when use kvm/ia64, Please post the bug 
info to kvm-ia64-devel mailing list.

-   https://lists.sourceforge.net/lists/listinfo/kvm-ia64-devel/
-
-Thanks for your interest! Let's work together, and make kvm/ia64 
stronger and stronger!

-
-
-   Xiantao 
Zhang xiantao.zh...@intel.com
- 
2008.3.10
diff --git a/Documentation/ia64/paravirt_ops.txt 
b/Documentation/ia64/paravirt_ops.txt

index 39ded02..3a1f36b 100644
--- a/Documentation/ia64/paravirt_ops.txt
+++ b/Documentation/ia64/paravirt_ops.txt
@@ -12,7 +12,7 @@ paravirt_ops (pv_ops in short) is a way for 
virtualization support of

 Linux kernel on x86. Several ways for virtualization support were
 proposed, paravirt_ops is the winner.

Re: [v2][PATCH] kvm: x86: mmio: fix setting the present bit of mmio spte

2014-11-18 Thread Chen, Tiejun

On 2014/11/17 19:40, Paolo Bonzini wrote:



On 17/11/2014 12:31, Tiejun Chen wrote:

In non-ept 64-bit of PAE case maxphyaddr may be 52bit as well,


There is no such thing as 64-bit PAE.



Definitely.


On 32-bit PAE hosts, PTEs have bit 62 reserved, as in your patch:



Do you mean just one reserved bit is fine enough in this case?

Thanks
Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH 1/2] kvm: x86: mmu: return zero if s e in rsvd_bits()

2014-11-17 Thread Chen, Tiejun

On 2014/11/17 17:22, Paolo Bonzini wrote:



On 17/11/2014 02:34, Chen, Tiejun wrote:

On 2014/11/14 18:06, Paolo Bonzini wrote:



On 14/11/2014 10:31, Tiejun Chen wrote:

In some real scenarios 'start' may not be less than 'end' like
maxphyaddr = 52.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
   arch/x86/kvm/mmu.h | 2 ++
   1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index bde8ee7..0e98b5e 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -58,6 +58,8 @@

   static inline u64 rsvd_bits(int s, int e)
   {
+if (unlikely(s  e))
+return 0;
   return ((1ULL  (e - s + 1)) - 1)  s;
   }




s == e + 1 is supported:

 (1ULL  (e - (e + 1) + 1)) - 1)  s ==


(1ULL  (e - (e + 1) + 1)) - 1)  s
 = (1ULL  (e - e - 1) + 1)) - 1)  s
 = (1ULL  (-1) + 1)) - 1)  s


no,


You're right since I'm seeing () wrongly.

Sorry to bother you.

Thanks
Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH 1/2] kvm: x86: mmu: return zero if s e in rsvd_bits()

2014-11-16 Thread Chen, Tiejun

On 2014/11/14 18:06, Paolo Bonzini wrote:



On 14/11/2014 10:31, Tiejun Chen wrote:

In some real scenarios 'start' may not be less than 'end' like
maxphyaddr = 52.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
  arch/x86/kvm/mmu.h | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index bde8ee7..0e98b5e 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -58,6 +58,8 @@

  static inline u64 rsvd_bits(int s, int e)
  {
+   if (unlikely(s  e))
+   return 0;
return ((1ULL  (e - s + 1)) - 1)  s;
  }




s == e + 1 is supported:

(1ULL  (e - (e + 1) + 1)) - 1)  s ==


(1ULL  (e - (e + 1) + 1)) - 1)  s
= (1ULL  (e - e - 1) + 1)) - 1)  s
= (1ULL  (-1) + 1)) - 1)  s
= (1ULL  (0) - 1)  s
= (1ULL  (- 1)  s

Am I missing something?

Thanks
Tiejun


(1ULL  0)  s ==
0

Is there any case where s is even bigger?

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH 2/2] kvm: x86: mmio: fix setting the present bit of mmio spte

2014-11-16 Thread Chen, Tiejun

On 2014/11/14 18:11, Paolo Bonzini wrote:



On 14/11/2014 10:31, Tiejun Chen wrote:

In PAE case maxphyaddr may be 52bit as well, we also need to
disable mmio page fault. Here we can check MMIO_SPTE_GEN_HIGH_SHIFT
directly to determine if we should set the present bit, and
bring a little cleanup.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
  arch/x86/include/asm/kvm_host.h |  1 +
  arch/x86/kvm/mmu.c  | 23 +++
  arch/x86/kvm/x86.c  | 30 --
  3 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dc932d3..667f2b6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -809,6 +809,7 @@ void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 struct kvm_memory_slot *slot,
 gfn_t gfn_offset, unsigned long mask);
  void kvm_mmu_zap_all(struct kvm *kvm);
+void kvm_set_mmio_spte_mask(void);
  void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ac1c4de..8e4be36 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -295,6 +295,29 @@ static bool check_mmio_spte(struct kvm *kvm, u64 spte)
return likely(kvm_gen == spte_gen);
  }

+/*
+ * Set the reserved bits and the present bit of an paging-structure
+ * entry to generate page fault with PFER.RSV = 1.
+ */
+void kvm_set_mmio_spte_mask(void)
+{
+   u64 mask;
+   int maxphyaddr = boot_cpu_data.x86_phys_bits;
+
+   /* Mask the reserved physical address bits. */
+   mask = rsvd_bits(maxphyaddr, MMIO_SPTE_GEN_HIGH_SHIFT - 1);
+
+   /* Magic bits are always reserved for 32bit host. */
+   mask |= 0x3ull  62;


This should be enough to trigger the page fault on PAE systems.

The problem is specific to non-EPT 64-bit hosts, where the PTEs have no
reserved bits beyond 51:MAXPHYADDR.  On EPT we use WX- permissions to
trigger EPT misconfig, on 32-bit systems we have bit 62.


Thanks for your explanation.




+   /* Set the present bit to enable mmio page fault. */
+   if (maxphyaddr  MMIO_SPTE_GEN_HIGH_SHIFT)
+   mask = PT_PRESENT_MASK;


Shouldn't this be |= anyway, instead of =?



Yeah, just miss this. Thanks a lot, I will fix this in next revision.

Thanks
Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: reset RVI upon system reset

2014-11-05 Thread Chen, Tiejun

On 2014/11/5 15:39, Wang, Wei W wrote:

On 05/11/2014 2:14, Tiejun Chen wrote:

A bug was reported as follows: when running Windows 7 32-bit guests on
qemu-kvm, sometimes the guests run into blue screen during reboot. The
problem was that a guest's RVI was not cleared when it rebooted. This

patch has fixed the problem.


Signed-off-by: Wei Wang wei.w.w...@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
Tested-by: Rongrong Liu rongrongx@intel.com, Da Chun
ng...@qq.com
---
   arch/x86/kvm/lapic.c |3 +++
   arch/x86/kvm/vmx.c   |   12 ++--
   2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index
66dd173..6942742 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1712,6 +1712,9 @@ void kvm_apic_post_state_restore(struct

kvm_vcpu *vcpu,

apic-isr_count = kvm_apic_vid_enabled(vcpu-kvm) ?
1 : count_vectors(apic-regs + APIC_ISR);
apic-highest_isr_cache = -1;
+   if (kvm_x86_ops-hwapic_irr_update)
+   kvm_x86_ops-hwapic_irr_update(vcpu,
+   apic_find_highest_irr(apic));


Could we pass 0 directly? Because looks we just need to clear RVI.

kvm_x86_ops-hwapic_irr_update(vcpu, 0);

I think this already makes sense based on your description.

Thanks
Tiejun


No. This is a restore function, and we cannot assume that the callers always 
need to reset to the initial state.


Okay. Maybe I'm confused by the following change.



Wei



kvm_x86_ops-hwapic_isr_update(vcpu-kvm,

apic_find_highest_isr(apic));

kvm_make_request(KVM_REQ_EVENT, vcpu);
kvm_rtc_eoi_tracking_restore_one(vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
fe4d2f4..d632548 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7292,19 +7292,19 @@ static void vmx_set_rvi(int vector)
   static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
   {
if (max_irr == -1)
+   max_irr = 0;
+
+   if (!is_guest_mode(vcpu)) {
+   vmx_set_rvi(max_irr);
return;
+   }

/*
 * If a vmexit is needed, vmx_check_nested_events handles it.
 */
-   if (is_guest_mode(vcpu)  nested_exit_on_intr(vcpu))
+   if ((is_guest_mode(vcpu)  nested_exit_on_intr(vcpu)) || max_irr

==

+0)


Its really not readable to modify max_irr as 0 and check that here, and 
especially when you read the original comment.


So what about this?

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0cd99d8..bc4558b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7280,6 +7280,9 @@ static void vmx_set_rvi(int vector)
u16 status;
u8 old;

+   if (vector == -1)
+   vector = 0;
+
status = vmcs_read16(GUEST_INTR_STATUS);
old = (u8)status  0xff;
if ((u8)vector != old) {
@@ -7291,9 +7294,6 @@ static void vmx_set_rvi(int vector)

 static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
 {
-   if (max_irr == -1)
-   return;
-
/*
 * If a vmexit is needed, vmx_check_nested_events handles it.
 */
@@ -7305,6 +7305,9 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu 
*vcpu, int max_irr)

return;
}

+   if (max_irr == -1)
+   return;
+
/*
 * Fall back to pre-APICv interrupt injection since L2
 * is run without virtual interrupt delivery.


Thanks
Tiejun


return;

-   if (!is_guest_mode(vcpu)) {
-   vmx_set_rvi(max_irr);
-   return;
-   }
-
/*
 * Fall back to pre-APICv interrupt injection since L2
 * is run without virtual interrupt delivery.





--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: reset RVI upon system reset

2014-11-05 Thread Chen, Tiejun

On 2014/11/5 16:50, Wang, Wei W wrote:

On 05/11/2014 4:07, Tiejun Chen wrote:

A bug was reported as follows: when running Windows 7 32-bit guests
on qemu-kvm, sometimes the guests run into blue screen during
reboot. The problem was that a guest's RVI was not cleared when it
rebooted. This

patch has fixed the problem.


Signed-off-by: Wei Wang wei.w.w...@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
Tested-by: Rongrong Liu rongrongx@intel.com, Da Chun
ng...@qq.com
---
arch/x86/kvm/lapic.c |3 +++
arch/x86/kvm/vmx.c   |   12 ++--
2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index
66dd173..6942742 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1712,6 +1712,9 @@ void kvm_apic_post_state_restore(struct

kvm_vcpu *vcpu,

apic-isr_count = kvm_apic_vid_enabled(vcpu-kvm) ?
1 : count_vectors(apic-regs + APIC_ISR);
apic-highest_isr_cache = -1;
+   if (kvm_x86_ops-hwapic_irr_update)
+   kvm_x86_ops-hwapic_irr_update(vcpu,
+   apic_find_highest_irr(apic));


Could we pass 0 directly? Because looks we just need to clear RVI.

kvm_x86_ops-hwapic_irr_update(vcpu, 0);

I think this already makes sense based on your description.

Thanks
Tiejun


No. This is a restore function, and we cannot assume that the callers

always need to reset to the initial state.

Okay. Maybe I'm confused by the following change.



Wei



kvm_x86_ops-hwapic_isr_update(vcpu-kvm,

apic_find_highest_isr(apic));

kvm_make_request(KVM_REQ_EVENT, vcpu);
kvm_rtc_eoi_tracking_restore_one(vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
fe4d2f4..d632548 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7292,19 +7292,19 @@ static void vmx_set_rvi(int vector)
static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
{
if (max_irr == -1)
+   max_irr = 0;
+
+   if (!is_guest_mode(vcpu)) {
+   vmx_set_rvi(max_irr);
return;
+   }

/*
 * If a vmexit is needed, vmx_check_nested_events handles it.
 */
-   if (is_guest_mode(vcpu)  nested_exit_on_intr(vcpu))
+   if ((is_guest_mode(vcpu)  nested_exit_on_intr(vcpu)) || max_irr

==

+0)


Its really not readable to modify max_irr as 0 and check that here, and
especially when you read the original comment.

So what about this?



I think both are ok.
If we zero max_irr in vmx_set_rvi(), we still need this check:
if ((is_guest_mode(vcpu)  nested_exit_on_intr(vcpu)) || max_irr == -1)


No, I don't think we need to add this.



Let's see if Paolo has any comments, then send out a second version.

Wei


diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
0cd99d8..bc4558b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7280,6 +7280,9 @@ static void vmx_set_rvi(int vector)
  u16 status;
  u8 old;

+   if (vector == -1)
+   vector = 0;
+
  status = vmcs_read16(GUEST_INTR_STATUS);
  old = (u8)status  0xff;
  if ((u8)vector != old) {
@@ -7291,9 +7294,6 @@ static void vmx_set_rvi(int vector)

   static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
   {
-   if (max_irr == -1)
-   return;
-
  /*
   * If a vmexit is needed, vmx_check_nested_events handles it.
   */
@@ -7305,6 +7305,9 @@ static void vmx_hwapic_irr_update(struct
kvm_vcpu *vcpu, int max_irr)
  return;
  }

+   if (max_irr == -1)
+   return;
+


Did you see this?

Tiejun


  /*
   * Fall back to pre-APICv interrupt injection since L2
   * is run without virtual interrupt delivery.


Thanks
Tiejun


return;

-   if (!is_guest_mode(vcpu)) {
-   vmx_set_rvi(max_irr);
-   return;
-   }
-
/*
 * Fall back to pre-APICv interrupt injection since L2
 * is run without virtual interrupt delivery.








--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: x86: lapic: remove one redundant judging condition

2014-11-05 Thread Chen, Tiejun

On 2014/11/5 18:22, Paolo Bonzini wrote:

On 05/11/2014 10:03, Tiejun Chen wrote:

Finally we always return highest_irr so its unnecessary to return -1
after check if highest_irr == -1.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
  arch/x86/kvm/lapic.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5f574b4..e6a7eb6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1638,8 +1638,7 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)

apic_update_ppr(apic);
highest_irr = apic_find_highest_irr(apic);
-   if ((highest_irr == -1) ||
-   ((highest_irr  0xF0) = kvm_apic_get_reg(apic, APIC_PROCPRI)))
+   if ((highest_irr  0xF0) = kvm_apic_get_reg(apic, APIC_PROCPRI))
return -1;
return highest_irr;
  }


I think the code is clearer without this change.

The two returns mean:

- return -1: no interrupt to inject

- return highest_irr: inject this interrupt

With IRR equal to all zeroes (highest_irr = -1), your patch would make
the if always false (current PPR is low, can inject the interrupt),
but computing highest_irr  0xF0 would make no sense if highest_irr ==
-1.


Yeah, you're right so here is just a little confusion to read.

Actually what this code is doing looks like,

@@ -1638,7 +1638,7 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)

apic_update_ppr(apic);
highest_irr = apic_find_highest_irr(apic);
-   if ((highest_irr == -1) ||
+   if ((highest_irr != -1) 
((highest_irr  0xF0) = kvm_apic_get_reg(apic, APIC_PROCPRI)))
return -1;
return highest_irr;

But it's really no big deal so we can keep the original alive.

Thanks
Tiejun



To put it another way, imagine the code looked like this:

static inline int int_prio(int vector)
{
WARN_ON(vector == -1);
return vector  0xF0;
}
...

apic_update_ppr(apic);
highest_irr = apic_find_highest_irr(apic);
if (highest_irr == -1 ||
int_prio(highest_irr) = kvm_apic_get_reg(apic, APIC_PROCPRI))
return -1;
return highest_irr;

Then removing the check on highest_irr == -1 would trigger a warning.

Paolo


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: x86: vmx: avoid returning bool to distinguish success from error

2014-11-04 Thread Chen, Tiejun

On 2014/11/5 1:33, Paolo Bonzini wrote:

Return a negative error code instead, and WARN() when we should be covering
the entire 2-bit space of vmcs_field_type's return value.  For increased
robustness, add a BUILD_BUG_ON checking the range of vmcs_field_to_offset.

Suggested-by: Tiejun Chen tiejun.c...@intel.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
  arch/x86/kvm/vmx.c | 51 +++
  1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 12f6f9a8dd8d..0ee148f1687f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -720,12 +720,15 @@ static const unsigned short vmcs_field_to_offset_table[] 
= {
FIELD(HOST_RSP, host_rsp),
FIELD(HOST_RIP, host_rip),
  };
-static const int max_vmcs_field = ARRAY_SIZE(vmcs_field_to_offset_table);

  static inline short vmcs_field_to_offset(unsigned long field)
  {
-   if (field = max_vmcs_field || vmcs_field_to_offset_table[field] == 0)
-   return -1;
+   BUILD_BUG_ON(ARRAY_SIZE(vmcs_field_to_offset_table)  SHRT_MAX);
+
+   if (field = ARRAY_SIZE(vmcs_field_to_offset_table) ||
+   vmcs_field_to_offset_table[field] == 0)
+   return -ENOENT;
+
return vmcs_field_to_offset_table[field];
  }

@@ -6492,58 +6495,60 @@ static inline int vmcs_field_readonly(unsigned long 
field)
   * some of the bits we return here (e.g., on 32-bit guests, only 32 bits of
   * 64-bit fields are to be returned).
   */
-static inline bool vmcs12_read_any(struct kvm_vcpu *vcpu,
-   unsigned long field, u64 *ret)
+static inline int vmcs12_read_any(struct kvm_vcpu *vcpu,
+ unsigned long field, u64 *ret)
  {
short offset = vmcs_field_to_offset(field);
char *p;

if (offset  0)
-   return 0;
+   return offset;

p = ((char *)(get_vmcs12(vcpu))) + offset;

switch (vmcs_field_type(field)) {
case VMCS_FIELD_TYPE_NATURAL_WIDTH:
*ret = *((natural_width *)p);
-   return 1;
+   return 0;
case VMCS_FIELD_TYPE_U16:
*ret = *((u16 *)p);
-   return 1;
+   return 0;
case VMCS_FIELD_TYPE_U32:
*ret = *((u32 *)p);
-   return 1;
+   return 0;
case VMCS_FIELD_TYPE_U64:
*ret = *((u64 *)p);
-   return 1;
+   return 0;
default:
-   return 0; /* can never happen. */
+   WARN_ON(1);
+   return -ENOENT;
}
  }


-static inline bool vmcs12_write_any(struct kvm_vcpu *vcpu,
-   unsigned long field, u64 field_value){
+static inline int vmcs12_write_any(struct kvm_vcpu *vcpu,
+  unsigned long field, u64 field_value){
short offset = vmcs_field_to_offset(field);
char *p = ((char *) get_vmcs12(vcpu)) + offset;
if (offset  0)
-   return false;
+   return offset;

switch (vmcs_field_type(field)) {
case VMCS_FIELD_TYPE_U16:
*(u16 *)p = field_value;
-   return true;
+   return 0;
case VMCS_FIELD_TYPE_U32:
*(u32 *)p = field_value;
-   return true;
+   return 0;
case VMCS_FIELD_TYPE_U64:
*(u64 *)p = field_value;
-   return true;
+   return 0;
case VMCS_FIELD_TYPE_NATURAL_WIDTH:
*(natural_width *)p = field_value;
-   return true;
+   return 0;
default:
-   return false; /* can never happen. */
+   WARN_ON(1);
+   return -ENOENT;
}

  }
@@ -6576,6 +6581,9 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
case VMCS_FIELD_TYPE_NATURAL_WIDTH:
field_value = vmcs_readl(field);
break;
+   default:
+   WARN_ON(1);
+   continue;


'continue' versus 'break'?

Thanks
Tiejun


}
vmcs12_write_any(vmx-vcpu, field, field_value);
}
@@ -6621,6 +6629,9 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
case VMCS_FIELD_TYPE_NATURAL_WIDTH:
vmcs_writel(field, (long)field_value);
break;
+   default:
+   WARN_ON(1);
+   break;
}
}
}
@@ -6659,7 +6670,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
/* Decode instruction info and find the field to read */
field = kvm_register_readl(vcpu, (((vmx_instruction_info)  28)  
0xf));
/* Read the field, 

Re: [PATCH] kvm: x86: vmx: avoid returning bool to distinguish success from error

2014-11-04 Thread Chen, Tiejun

On 2014/11/5 9:43, Chen, Tiejun wrote:

On 2014/11/5 1:33, Paolo Bonzini wrote:

Return a negative error code instead, and WARN() when we should be
covering
the entire 2-bit space of vmcs_field_type's return value.  For increased
robustness, add a BUILD_BUG_ON checking the range of
vmcs_field_to_offset.

Suggested-by: Tiejun Chen tiejun.c...@intel.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
  arch/x86/kvm/vmx.c | 51


[snip]


@@ -6576,6 +6581,9 @@ static void copy_shadow_to_vmcs12(struct
vcpu_vmx *vmx)
  case VMCS_FIELD_TYPE_NATURAL_WIDTH:
  field_value = vmcs_readl(field);
  break;
+default:
+WARN_ON(1);
+continue;


'continue' versus 'break'?

Thanks
Tiejun


  }
  vmcs12_write_any(vmx-vcpu, field, field_value);
  }
@@ -6621,6 +6629,9 @@ static void copy_vmcs12_to_shadow(struct
vcpu_vmx *vmx)
  case VMCS_FIELD_TYPE_NATURAL_WIDTH:
  vmcs_writel(field, (long)field_value);
  break;
+default:
+WARN_ON(1);
+break;
  }
  }
  }
@@ -6659,7 +6670,7 @@ static int handle_vmread(struct kvm_vcpu *vcpu)
  /* Decode instruction info and find the field to read */
  field = kvm_register_readl(vcpu, (((vmx_instruction_info)  28)
 0xf));
  /* Read the field, zero-extended to a u64 field_value */
-if (!vmcs12_read_any(vcpu, field, field_value)) {
+if (vmcs12_read_any(vcpu, field, field_value)  0) {
  nested_vmx_failValid(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT);
  skip_emulated_instruction(vcpu);
  return 1;



Looks we're missing another place,

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6601,7 +6601,7 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu)
return 1;
}

-   if (!vmcs12_write_any(vcpu, field, field_value)) {
+   if (vmcs12_write_any(vcpu, field, field_value)  0) {
nested_vmx_failValid(vcpu, 
VMXERR_UNSUPPORTED_VMCS_COMPONENT);

skip_emulated_instruction(vcpu);
return 1;
Thanks
Tiejun
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: x86: vmx: return 'bool' type from vmcs12_read_any()

2014-11-04 Thread Chen, Tiejun

On 2014/11/5 1:34, Paolo Bonzini wrote:

On 31/10/2014 07:45, Tiejun Chen wrote:

Return value should be as bool type as this function declaration,
static inline bool vmcs12_read_any().


Actually, bool return values are in general a bad idea if you mean
success/fail, especially if you can use POSIX error codes such as in
this case ENOENT.



Yeah.


I've sent a patch that changes the return value to int.


Cool! I just have two minimal comments inline.

Thanks
Tiejun



Paolo


Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
  arch/x86/kvm/vmx.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9fa1f46..9d44d6e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6499,25 +6499,25 @@ static inline bool vmcs12_read_any(struct kvm_vcpu 
*vcpu,
char *p;

if (offset  0)
-   return 0;
+   return false;

p = ((char *)(get_vmcs12(vcpu))) + offset;

switch (vmcs_field_type(field)) {
case VMCS_FIELD_TYPE_NATURAL_WIDTH:
*ret = *((natural_width *)p);
-   return 1;
+   return true;
case VMCS_FIELD_TYPE_U16:
*ret = *((u16 *)p);
-   return 1;
+   return true;
case VMCS_FIELD_TYPE_U32:
*ret = *((u32 *)p);
-   return 1;
+   return true;
case VMCS_FIELD_TYPE_U64:
*ret = *((u64 *)p);
-   return 1;
+   return true;
default:
-   return 0; /* can never happen. */
+   return false; /* can never happen. */
}
  }





--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: reset RVI upon system reset

2014-11-04 Thread Chen, Tiejun

On 2014/11/5 10:53, Wei Wang wrote:

A bug was reported as follows: when running Windows 7 32-bit guests on qemu-kvm,
sometimes the guests run into blue screen during reboot. The problem was that a
guest's RVI was not cleared when it rebooted. This patch has fixed the problem.

Signed-off-by: Wei Wang wei.w.w...@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
Tested-by: Rongrong Liu rongrongx@intel.com, Da Chun ng...@qq.com
---
  arch/x86/kvm/lapic.c |3 +++
  arch/x86/kvm/vmx.c   |   12 ++--
  2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 66dd173..6942742 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1712,6 +1712,9 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
apic-isr_count = kvm_apic_vid_enabled(vcpu-kvm) ?
1 : count_vectors(apic-regs + APIC_ISR);
apic-highest_isr_cache = -1;
+   if (kvm_x86_ops-hwapic_irr_update)
+   kvm_x86_ops-hwapic_irr_update(vcpu,
+   apic_find_highest_irr(apic));


Could we pass 0 directly? Because looks we just need to clear RVI.

kvm_x86_ops-hwapic_irr_update(vcpu, 0);

I think this already makes sense based on your description.

Thanks
Tiejun


kvm_x86_ops-hwapic_isr_update(vcpu-kvm, apic_find_highest_isr(apic));
kvm_make_request(KVM_REQ_EVENT, vcpu);
kvm_rtc_eoi_tracking_restore_one(vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fe4d2f4..d632548 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7292,19 +7292,19 @@ static void vmx_set_rvi(int vector)
  static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
  {
if (max_irr == -1)
+   max_irr = 0;
+
+   if (!is_guest_mode(vcpu)) {
+   vmx_set_rvi(max_irr);
return;
+   }

/*
 * If a vmexit is needed, vmx_check_nested_events handles it.
 */
-   if (is_guest_mode(vcpu)  nested_exit_on_intr(vcpu))
+   if ((is_guest_mode(vcpu)  nested_exit_on_intr(vcpu)) || max_irr == 0)
return;

-   if (!is_guest_mode(vcpu)) {
-   vmx_set_rvi(max_irr);
-   return;
-   }
-
/*
 * Fall back to pre-APICv interrupt injection since L2
 * is run without virtual interrupt delivery.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: x86: fix access memslots w/o hold srcu read lock

2014-10-31 Thread Chen, Tiejun

On 2014/10/31 14:26, Wanpeng Li wrote:

The srcu read lock must be held while accessing memslots (e.g.
when using gfn_to_* functions), however, commit c24ae0dcd3e8
(kvm: x86: Unpin and remove kvm_arch-apic_access_page) call
gfn_to_page() in kvm_vcpu_reload_apic_access_page() w/o hold it in
vmx_vcpu_reset() path which leads to suspicious rcu_dereference_check()
usage warning. This patch fix it by holding srcu read lock when call
gfn_to_page() in vmx_vcpu_reset() path.


[ INFO: suspicious RCU usage. ]
3.18.0-rc2-test2+ #70 Not tainted
---
include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-x86/2371:
  #0:  (vcpu-mutex){+.+...}, at: [a037d800] vcpu_load+0x20/0xd0 
[kvm]

stack backtrace:
CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70
Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
  0001 880209983ca8 816f514f 
  8802099b8990 880209983cd8 810bd687 000fee00
  880208a2c000 880208a1 88020ef50040 880209983d08
Call Trace:
  [816f514f] dump_stack+0x4e/0x71
  [810bd687] lockdep_rcu_suspicious+0xe7/0x120
  [a037d055] gfn_to_memslot+0xd5/0xe0 [kvm]
  [a03807d3] __gfn_to_pfn+0x33/0x60 [kvm]
  [a0380885] gfn_to_page+0x25/0x90 [kvm]
  [a038aeec] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm]
  [a08f0a9c] vmx_vcpu_reset+0x20c/0x460 [kvm_intel]
  [a039ab8e] kvm_vcpu_reset+0x15e/0x1b0 [kvm]
  [a039ac0c] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
  [a037f7e0] kvm_vm_ioctl+0x1d0/0x780 [kvm]
  [810bc664] ? __lock_is_held+0x54/0x80
  [812231f0] do_vfs_ioctl+0x300/0x520
  [8122ee45] ? __fget+0x5/0x250
  [8122f0fa] ? __fget_light+0x2a/0xe0
  [81223491] SyS_ioctl+0x81/0xa0
  [816fed6d] system_call_fastpath+0x16/0x1b

Reported-by: Takashi Iwai ti...@suse.de
Reported-by: Alexei Starovoitov alexei.starovoi...@gmail.com
Suggested-by: Tiejun Chen tiejun.c...@intel.com


Reviewed-by: Tiejun Chen tiejun.c...@intel.com


Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com
---
  arch/x86/kvm/vmx.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a0f78db..bd9be01 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4512,6 +4512,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
  {
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct msr_data apic_base_msr;
+   int idx;

vmx-rmode.vm86_active = 0;

@@ -4579,7 +4580,9 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
vmcs_write32(TPR_THRESHOLD, 0);
}

+   idx = srcu_read_lock(vcpu-kvm-srcu);
kvm_vcpu_reload_apic_access_page(vcpu);
+   srcu_read_unlock(vcpu-kvm-srcu, idx);

if (vmx_vm_has_apicv(vcpu-kvm))
memset(vmx-pi_desc, 0, sizeof(struct pi_desc));


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: fix access memslots w/o hold srcu read lock

2014-10-30 Thread Chen, Tiejun

On 2014/10/31 12:33, Wanpeng Li wrote:

The srcu read lock must be held while accessing memslots (e.g.
when using gfn_to_* functions), however, commit c24ae0dcd3e8
(kvm: x86: Unpin and remove kvm_arch-apic_access_page) call
gfn_to_page() in kvm_vcpu_reload_apic_access_page() w/o hold it
which leads to suspicious rcu_dereference_check() usage warning.
This patch fix it by holding srcu read lock when call gfn_to_page()
in kvm_vcpu_reload_apic_access_page() function.


[ INFO: suspicious RCU usage. ]
3.18.0-rc2-test2+ #70 Not tainted
---
include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-x86/2371:
  #0:  (vcpu-mutex){+.+...}, at: [a037d800] vcpu_load+0x20/0xd0 
[kvm]

stack backtrace:
CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70
Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
  0001 880209983ca8 816f514f 
  8802099b8990 880209983cd8 810bd687 000fee00
  880208a2c000 880208a1 88020ef50040 880209983d08
Call Trace:
  [816f514f] dump_stack+0x4e/0x71
  [810bd687] lockdep_rcu_suspicious+0xe7/0x120
  [a037d055] gfn_to_memslot+0xd5/0xe0 [kvm]
  [a03807d3] __gfn_to_pfn+0x33/0x60 [kvm]
  [a0380885] gfn_to_page+0x25/0x90 [kvm]
  [a038aeec] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm]
  [a08f0a9c] vmx_vcpu_reset+0x20c/0x460 [kvm_intel]
  [a039ab8e] kvm_vcpu_reset+0x15e/0x1b0 [kvm]
  [a039ac0c] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
  [a037f7e0] kvm_vm_ioctl+0x1d0/0x780 [kvm]
  [810bc664] ? __lock_is_held+0x54/0x80
  [812231f0] do_vfs_ioctl+0x300/0x520
  [8122ee45] ? __fget+0x5/0x250
  [8122f0fa] ? __fget_light+0x2a/0xe0
  [81223491] SyS_ioctl+0x81/0xa0
  [816fed6d] system_call_fastpath+0x16/0x1b

Reported-by: Takashi Iwai ti...@suse.de
Reported-by: Alexei Starovoitov alexei.starovoi...@gmail.com
Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com
---
  arch/x86/kvm/x86.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0033df3..2d97329 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6059,6 +6059,7 @@ static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu)
  void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
  {
struct page *page = NULL;
+   int idx;

if (!irqchip_in_kernel(vcpu-kvm))
return;
@@ -6066,7 +6067,9 @@ void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu 
*vcpu)
if (!kvm_x86_ops-set_apic_access_page_addr)
return;

+   idx = srcu_read_lock(vcpu-kvm-srcu);


There's another scenario that we already hold srcu before call 
kvm_vcpu_reload_apic_access_page(),


__vcpu_run()
|
+ vcpu-srcu_idx = srcu_read_lock(kvm-srcu);
+ r = vcpu_enter_guest(vcpu);
|
+ kvm_vcpu_reload_apic_access_page(vcpu);

So according to backtrace I think we should fix as follows:

kvm: x86: vmx: hold kvm-srcu while reload apic access page

kvm_vcpu_reload_apic_access_page() needs to access memslots via
gfn_to_page(), so its necessary to hold kvm-srcu.

Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
 arch/x86/kvm/vmx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b25a588..9fa1f46 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4442,6 +4442,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct msr_data apic_base_msr;
+   int idx;

vmx-rmode.vm86_active = 0;

@@ -4509,7 +4510,9 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
vmcs_write32(TPR_THRESHOLD, 0);
}

+   idx = srcu_read_lock(vcpu-kvm-srcu);
kvm_vcpu_reload_apic_access_page(vcpu);
+   srcu_read_unlock(vcpu-kvm-srcu, idx);

if (vmx_vm_has_apicv(vcpu-kvm))
memset(vmx-pi_desc, 0, sizeof(struct pi_desc));
--
1.9.1

Thanks
Tiejun

page = gfn_to_page(vcpu-kvm, APIC_DEFAULT_PHYS_BASE  PAGE_SHIFT);
+   srcu_read_unlock(vcpu-kvm-srcu, idx);
kvm_x86_ops-set_apic_access_page_addr(vcpu, page_to_phys(page));

/*


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [x86, kvm] WARNING: at arch/x86/kernel/pvclock.c:182 pvclock_init_vsyscall()

2014-09-30 Thread Chen, Tiejun

On 2014/9/30 15:59, Fengguang Wu wrote:

Greetings,

0day kernel testing robot got the below dmesg and the first bad commit is

commit 3dc4f7cfb7441e5e0fed3a02fc81cdaabd28300a
Author: Marcelo Tosatti mtosa...@redhat.com
AuthorDate: Tue Nov 27 23:28:56 2012 -0200
Commit: Marcelo Tosatti mtosa...@redhat.com
CommitDate: Tue Nov 27 23:29:10 2012 -0200

 x86: kvm guest: pvclock vsyscall support

 Hook into generic pvclock vsyscall code, with the aim to
 allow userspace to have visibility into pvclock data.

 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

+--++++
|  | 71056ae22d | 
3dc4f7cfb7 | d778df51c0 |
+--++++
| boot_successes   | 141| 0 
 | 0  |
| boot_failures| 0  | 
47 | 11 |
| WARNING:at_arch/x86/kernel/pvclock.c:pvclock_init_vsyscall() | 0  | 
47 | 11 |
| backtrace:pvclock_init_vsyscall  | 0  | 
47 | 11 |
| backtrace:warn_slowpath_null | 0  | 
47 | 11 |
| backtrace:kvm_setup_vsyscall_timeinfo| 0  | 
47 | 11 |
| backtrace:kvm_guest_init | 0  | 
47 | 11 |
| Kernel_panic-not_syncing:Attempted_to_kill_init_exitcode=| 0  | 0 
 | 7  |
| BUG:kernel_boot_hang | 0  | 0 
 | 4  |
+--++++

[0.00] mapped IOAPIC to ff5f9000 (fec0)
[0.00] nr_irqs_gsi: 40
[0.00] [ cut here ]
[0.00] WARNING: at arch/x86/kernel/pvclock.c:182 
pvclock_init_vsyscall+0x41/0x8e()
[0.00] Hardware name: Standard PC (i440FX + PIIX, 1996)
[0.00] Modules linked in:
[0.00] Pid: 0, comm: swapper Not tainted 3.7.0-rc3-00112-g3dc4f7c #1


Looks you're working with old kernel, so its worth trying the latest again.

Tiejun


[0.00] Call Trace:
[0.00]  [8104f750] warn_slowpath_common+0x70/0xa0
[0.00]  [8104f83a] warn_slowpath_null+0x1a/0x20
[0.00]  [8219a6af] pvclock_init_vsyscall+0x41/0x8e
[0.00]  [8219a623] kvm_setup_vsyscall_timeinfo+0x48/0x78
[0.00]  [8219a397] kvm_guest_init+0x98/0xe1
[0.00]  [82190eb8] setup_arch+0xa9b/0xb10
[0.00]  [818eeae8] ? printk+0x4f/0x57
[0.00]  [8218e856] start_kernel+0x93/0x388
[0.00]  [8218e120] ? early_idt_handlers+0x120/0x120
[0.00]  [8218e2b4] x86_64_start_reservations+0xb0/0xb3
[0.00]  [8218e3b9] x86_64_start_kernel+0x102/0x10f
[0.00] ---[ end trace c9f7d63dd24af7e7 ]---
[0.00] KVM setup async PF for cpu 0

git bisect start v3.8 v3.7 --
git bisect  bad 8d91a42e54eebc43f4d8f6064751ccba73528275  # 21:30  0-  
5  Merge tag 'omap-late-cleanups' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect  bad 770b6cb4d21fb3e3df2a7a51e186a3c14db1ec30  # 21:35  0-  
1  ARM: OMAP: Fix drivers to depend on omap for internal devices
git bisect good 9977d9b379cb77e0f67bd6f4563618106e58e11d  # 21:40 30+  
0  Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
git bisect  bad e777d192ffb9f2929d547a2f8a5f65b7db7a9552  # 21:44  4- 
26  Merge tag 'scsi-misc' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect good 8b0cab14951fbf8126795ab301835a8f8126a988  # 22:00 30+  
0  Merge tag 'regulator-3.8' of 
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
git bisect good c7708fac5a878d6e0f2de0aa19f9749cff4f707f  # 22:11 30+  
0  Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
git bisect  bad e05a1c6397a73d09389e033b6b2c25c954d2177c  # 22:17  0-  
1  Merge tag 'ktest-v3.8' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
git bisect good 896ea17d3da5f44b2625c9cda9874d7dfe447393  # 22:36 30+  
0  Merge tag 'stable/for-linus-3.8-rc0-tag' of 
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
git bisect  bad 66cdd0ceaf65a18996f561b770eedde1d123b019  # 22:41  3-  
3  Merge tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect good 8455d79e2163997e479931b8d5b7e60a92cd2b86  # 23:04 30+  
0  KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run
git bisect  bad 

Re: [PATCH v2] KVM: x86: sync old tmr with ioapic to update

2014-08-27 Thread Chen, Tiejun

On 2014/8/27 22:05, Wei Wang wrote:

kvm_ioapic_scan_entry() needs to update tmr. The previous lapic tmr value
(old_tmr) needs to sync with ioapic to get an accurate updated tmr
value before the updating work.

Tested-by: Rongrong Liu rongrongx@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
Signed-off-by: Wei Wang wei.w.w...@intel.com
---
  arch/x86/kvm/lapic.c |   19 +--
  arch/x86/kvm/x86.c   |2 +-
  2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 08e8a89..8c1162d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -518,10 +518,25 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
  void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr)
  {
struct kvm_lapic *apic = vcpu-arch.apic;
+   u32 irr;
+   u32 isr;
+   u32 old_tmr, new_tmr;
int i;

-   for (i = 0; i  8; i++)
-   apic_set_reg(apic, APIC_TMR + 0x10 * i, tmr[i]);
+   /*
+* The updated tmr value comes from level-triggerd interrupts that


s/level-triggerd/level-triggered


+* have already been delieverd to lapic and new coming ones which


s/delieverd/delivered


+* are pending in ioapic. According to the x86 spec, tmr is valid
+* when irr or isr is set.
+*/
+   for (i = 0; i  8; i++) {
+   irr = kvm_apic_get_reg(apic, APIC_IRR + 0x10 * i);
+   isr = kvm_apic_get_reg(apic, APIC_ISR + 0x10 * i);
+   old_tmr = kvm_apic_get_reg(apic, APIC_TMR + 0x10 * i);
+   new_tmr = (~(irr | isr)  tmr[i])
+   | ((irr | isr)  old_tmr);
+   apic_set_reg(apic, APIC_TMR + 0x10 * i, new_tmr);
+   }
  }

  static void apic_update_ppr(struct kvm_lapic *apic)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f5edb6..d401684 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5991,8 +5991,8 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
memset(tmr, 0, 32);

kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr);
-   kvm_x86_ops-load_eoi_exitmap(vcpu, eoi_exit_bitmap);
kvm_apic_update_tmr(vcpu, tmr);
+   kvm_x86_ops-load_eoi_exitmap(vcpu, eoi_exit_bitmap);
  }

  /*


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: keep EOI exit bitmap accurate before loading it.

2014-08-26 Thread Chen, Tiejun

On 2014/8/27 0:27, Wei Wang wrote:

Guest may mask the IOAPIC entry before issue EOI. In such case,
EOI will not be intercepted by hypervisor, since the corrensponding


Looks this is always missed :)

s/corrensponding/corresponding

Tiejun


bit in eoi exit bitmap is not set after the masking of IOAPIC entry.

The solution here is to OR EOI_exit_bitmap with tmr.

Tested-by: Rongrong Liu rongrongx@intel.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
Signed-off-by: Wei Wang wei.w.w...@intel.com
---
  arch/x86/kvm/lapic.c |9 +
  arch/x86/kvm/lapic.h |2 ++
  arch/x86/kvm/x86.c   |1 +
  virt/kvm/ioapic.c|6 +++---
  4 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 93c2e93..759d24e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -533,6 +533,15 @@ void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr)
}
  }

+void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
+   u32 *tmr)
+{
+   u32 i;
+
+   for (i = 0; i  8; i++)
+   *((u32 *)eoi_exit_bitmap + i) |= tmr[i];
+}
+
  static void apic_update_ppr(struct kvm_lapic *apic)
  {
u32 tpr, isrv, ppr, old_ppr;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6a11845..eda7be7 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -55,6 +55,8 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu);

  void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr);
  void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
+void kvm_apic_update_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
+   u32 *tmr);
  int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
  int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
  int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d401684..4042bc0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5992,6 +5992,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)

kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr);
kvm_apic_update_tmr(vcpu, tmr);
+   kvm_apic_update_eoi_exitmap(vcpu, eoi_exit_bitmap, tmr);
kvm_x86_ops-load_eoi_exitmap(vcpu, eoi_exit_bitmap);
  }

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index e8ce34c..ea5f697 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -254,9 +254,9 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 
*eoi_exit_bitmap,
spin_lock(ioapic-lock);
for (index = 0; index  IOAPIC_NUM_PINS; index++) {
e = ioapic-redirtbl[index];
-   if (e-fields.trig_mode == IOAPIC_LEVEL_TRIG ||
-   kvm_irq_has_notifier(ioapic-kvm, KVM_IRQCHIP_IOAPIC, 
index) ||
-   index == RTC_GSI) {
+   if (!e-fields.mask  e-fields.trig_mode == IOAPIC_LEVEL_TRIG
+   || kvm_irq_has_notifier(ioapic-kvm, KVM_IRQCHIP_IOAPIC,
+   index) || index == RTC_GSI) {
if (kvm_apic_match_dest(vcpu, NULL, 0,
e-fields.dest_id, e-fields.dest_mode)) {
__set_bit(e-fields.vector,


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] KVM: x86: check ISR and TMR to construct eoi exit bitmap

2014-08-15 Thread Chen, Tiejun

On 2014/8/14 3:16, Wei Wang wrote:

From: Yang Zhang yang.z.zh...@intel.com

Guest may mask the IOAPIC entry before issue EOI. In such case,
EOI will not be intercepted by hypervisor due to the corrensponding


s/corrensponding/corresponding


bit in eoi exit bitmap is not setting.

The solution is to check ISR + TMR to construct the EOI exit bitmap.

This patch is a better fixing for the issue that commit 0f6c0a740b
tries to solve.

Tested-by: Alex Williamson alex.william...@redhat.com
Signed-off-by: Yang Zhang yang.z.zh...@intel.com
Signed-off-by: Wei Wang wei.w.w...@intel.com
---
  arch/x86/kvm/lapic.c |   17 +
  arch/x86/kvm/lapic.h |2 ++
  arch/x86/kvm/x86.c   |9 +
  virt/kvm/ioapic.c|7 ---
  4 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 08e8a89..0ed4bcb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -515,6 +515,23 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
__clear_bit(KVM_APIC_PV_EOI_PENDING, vcpu-arch.apic_attention);
  }

+void kvm_apic_zap_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
+   u32 *tmr)
+{
+   u32 i, reg_off, intr_in_service;
+   struct kvm_lapic *apic = vcpu-arch.apic;
+
+   for (i = 0; i  8; i++) {
+   reg_off = 0x10 * i;
+   intr_in_service = apic_read_reg(apic, APIC_ISR + reg_off) 
+   kvm_apic_get_reg(apic, APIC_TMR + reg_off);
+   if (intr_in_service) {
+   *((u32 *)eoi_exit_bitmap + i) |= intr_in_service;
+   tmr[i] |= intr_in_service;
+   }
+   }
+}
+
  void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr)
  {
struct kvm_lapic *apic = vcpu-arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6a11845..4ee3d70 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -53,6 +53,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
  u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
  void kvm_apic_set_version(struct kvm_vcpu *vcpu);

+void kvm_apic_zap_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
+   u32 *tmr);
  void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr);
  void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
  int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 204422d..755b556 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6005,6 +6005,15 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
memset(tmr, 0, 32);

kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr);
+   /*
+* Guest may mask the IOAPIC entry before issue EOI. In such case,
+* EOI will not be intercepted by hypervisor due to the corrensponding


s/corrensponding/corresponding


+* bit in eoi exit bitmap is not setting.
+*
+* The solution is to check ISR + TMR to construct the EOI exit bitmap.
+*/
+   kvm_apic_zap_eoi_exitmap(vcpu, eoi_exit_bitmap, tmr);
+
kvm_x86_ops-load_eoi_exitmap(vcpu, eoi_exit_bitmap);
kvm_apic_update_tmr(vcpu, tmr);
  }
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index e8ce34c..2458a1d 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -254,9 +254,10 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 
*eoi_exit_bitmap,
spin_lock(ioapic-lock);
for (index = 0; index  IOAPIC_NUM_PINS; index++) {
e = ioapic-redirtbl[index];
-   if (e-fields.trig_mode == IOAPIC_LEVEL_TRIG ||
-   kvm_irq_has_notifier(ioapic-kvm, KVM_IRQCHIP_IOAPIC, 
index) ||
-   index == RTC_GSI) {
+   if (!e-fields.mask 
+   (e-fields.trig_mode == IOAPIC_LEVEL_TRIG ||
+kvm_irq_has_notifier(ioapic-kvm, KVM_IRQCHIP_IOAPIC,
+index) || index == RTC_GSI)) {
if (kvm_apic_match_dest(vcpu, NULL, 0,
e-fields.dest_id, e-fields.dest_mode)) {
__set_bit(e-fields.vector,


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: check ISR and TMR to construct eoi exit bitmap

2014-08-08 Thread Chen, Tiejun

On 2014/8/8 14:02, Yang Zhang wrote:

From: Yang Zhang yang.z.zh...@intel.com

Guest may mask the IOAPIC entry before issue EOI. In such case,
EOI will not be intercepted by hypervisor due to the corrensponding
bit in eoi exit bitmap is not setting.

The solution is to check ISR + TMR to construct the EOI exit bitmap.

This patch is a better fixing for the issue that commit 0f6c0a740b
tries to solve.

Signed-off-by: Yang Zhang yang.z.zh...@intel.com
---
  arch/x86/kvm/lapic.c |   22 ++
  arch/x86/kvm/lapic.h |2 ++
  arch/x86/kvm/x86.c   |9 +
  virt/kvm/ioapic.c|7 ---
  4 files changed, 37 insertions(+), 3 deletions(-)

hi, alex

This patch is not tested since i don't have the environment to do it. Can you
help to test it? thanks.

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 08e8a89..d2f9a6e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -71,6 +71,11 @@
  #define VEC_POS(v) ((v)  (32 - 1))
  #define REG_POS(v) (((v)  5)  4)

+static inline u32 apic_read_reg(struct kvm_lapic *apic, int reg_off)
+{
+   return *((u32 *) (apic-regs + reg_off));
+}
+


I think we already define the same in the arch/x86/kvm/lapic.h file,

static inline u32 kvm_apic_get_reg(struct kvm_lapic *apic, int reg_off)
{
return *((u32 *) (apic-regs + reg_off));
}

Thanks
Tiejun


  static inline void apic_set_reg(struct kvm_lapic *apic, int reg_off, u32 val)
  {
*((u32 *) (apic-regs + reg_off)) = val;
@@ -515,6 +520,23 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
__clear_bit(KVM_APIC_PV_EOI_PENDING, vcpu-arch.apic_attention);
  }

+void kvm_apic_zap_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
+   u32 *tmr)
+{
+   u32 i, reg_off, intr_in_service[8];
+   struct kvm_lapic *apic = vcpu-arch.apic;
+
+   for (i = 0; i  8; i++) {
+   reg_off = 0x10 * i;
+   intr_in_service[i] = apic_read_reg(apic, APIC_ISR + reg_off) 
+   apic_read_reg(apic, APIC_TMR + reg_off);
+   if (intr_in_service[i]) {
+   *((u32 *)eoi_exit_bitmap + i) |= intr_in_service[i];
+   tmr[i] |= intr_in_service[i];
+   }
+   }
+}
+
  void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr)
  {
struct kvm_lapic *apic = vcpu-arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6a11845..4ee3d70 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -53,6 +53,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
  u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
  void kvm_apic_set_version(struct kvm_vcpu *vcpu);

+void kvm_apic_zap_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
+   u32 *tmr);
  void kvm_apic_update_tmr(struct kvm_vcpu *vcpu, u32 *tmr);
  void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir);
  int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 204422d..755b556 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6005,6 +6005,15 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
memset(tmr, 0, 32);

kvm_ioapic_scan_entry(vcpu, eoi_exit_bitmap, tmr);
+   /*
+* Guest may mask the IOAPIC entry before issue EOI. In such case,
+* EOI will not be intercepted by hypervisor due to the corrensponding
+* bit in eoi exit bitmap is not setting.
+*
+* The solution is to check ISR + TMR to construct the EOI exit bitmap.
+*/
+   kvm_apic_zap_eoi_exitmap(vcpu, eoi_exit_bitmap, tmr);
+
kvm_x86_ops-load_eoi_exitmap(vcpu, eoi_exit_bitmap);
kvm_apic_update_tmr(vcpu, tmr);
  }
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index e8ce34c..2458a1d 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -254,9 +254,10 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 
*eoi_exit_bitmap,
spin_lock(ioapic-lock);
for (index = 0; index  IOAPIC_NUM_PINS; index++) {
e = ioapic-redirtbl[index];
-   if (e-fields.trig_mode == IOAPIC_LEVEL_TRIG ||
-   kvm_irq_has_notifier(ioapic-kvm, KVM_IRQCHIP_IOAPIC, 
index) ||
-   index == RTC_GSI) {
+   if (!e-fields.mask 
+   (e-fields.trig_mode == IOAPIC_LEVEL_TRIG ||
+kvm_irq_has_notifier(ioapic-kvm, KVM_IRQCHIP_IOAPIC,
+index) || index == RTC_GSI)) {
if (kvm_apic_match_dest(vcpu, NULL, 0,
e-fields.dest_id, e-fields.dest_mode)) {
__set_bit(e-fields.vector,


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

2013-05-09 Thread Chen, Tiejun
 -Original Message-
 From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org] 
 Sent: Thursday, May 09, 2013 8:38 PM
 To: Chen, Tiejun
 Cc: Bhushan Bharat-R65777; Caraman Mihai Claudiu-B02008; Wood 
 Scott-B07421; linuxppc-...@lists.ozlabs.org; ag...@suse.de; 
 kvm-...@vger.kernel.org; kvm@vger.kernel.org
 Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: 
 soft-disable interrupts
 
 On Thu, 2013-05-09 at 17:44 +0800, tiejun.chen wrote:
  
  Actually in the case GS=1 even if EE=0, EXT/DEC/DBELL still 
 occur as I 
  recall.
 
 Only if directed to the hypervisor.

Yes, this is our current booehv design with EPCR[EXTGS] = 0.

 
   Case 1)
 - Local_irq_disable()  will set soft_enabled = 0
 - Now Externel interrupt happens, there we set PACA_IRQ_EE in
  irq_happened, Also clears EE in SRR1 and rfi. So interrupts 
 are hard 
  disabled. No more other interrupt gated by MSR.EE can happen. Looks 
  like the idea here is to not let a device keep on inserting 
 interrupt 
  till the interrupt condition on device is cleared, right?
 
 The external interrupt line is level sensitive normally, so 
 we have to mask MSR:EE in that case or the interrupt would 
 keep re-occurring (note that FSL has this concept of 
 auto-acked interrupts via the on die MPIC for which you can 
 potentially use PACA_IRQ_EE_EDGE instead and avoid having to 
 mask MSR:EE).
 
  I don't understand the interrupt condition on device is cleared
  here.
 
 Well, the external interrupt line will remain asserted until 
 the device (via the PIC) stops interrupting the processor :-) 

Yes, the device keeps on interrupting the interrupt until the device clear its 
interrupted condition.

 Either because we mask at the PIC level (or raise the 
 priority) or because the condition goes away.
  
  I think regardless if you clear the device interrupt status, the 
  system still  receive a pending interrupt once EE or GS = 1.
 
 Why ? Depends on your PIC actually but if it's a sane one, it 
 won't remember a level interrupt that is gone. An edge 
 interrupt is a different matter and is latched.

But the interrupt controller like MPIC should leave this irq as pending if we 
don't ACK/EOI this irq at all if we just clear the device, right?

 
 Some MPIC implementations tend to generate a spurrious IRQ in 
 the case of level IRQs going away. IE. they still remember an 
 event occurred and interrupt the processor, but on IACK 
 return the spurious vector. However that isn't guaranteed to 

Yes, this is just what I mean. No matter if this vector is still valid, the 
external interrupt exception always occur once EE = 1 again.

 be the case and it is perfectly ok (and a good
 idea) for the PIC to just stop asserting the external 
 interrupt line if the original device interrupt goes away 
 (and it's configured as level sensitive).

I don't remember MPIC can work with this way. So I'd like to look the manual 
again :)

Tiejun--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html