Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-20 Thread Laszlo Ersek
On 11/20/14 00:32, Mario Smarduch wrote:
 Hi Laszlo,
 
 couple observations.
 
  I'm wondering if access from qemu and guest won't
 result in mixed memory attributes and if that's acceptable
 to the CPU.

Normally this would be a problem I think (Jon raised the topic of live
migration). However, for flash programming specifically, I think the
guest's access pattern ensures that we'll see things OK.

When the guest issues the first write access, the memslot is deleted,
and everything is forwarded to qemu, both reads and writes. In response
qemu modifies the array that *otherwise* backs the flash. These
modifications by qemu end up in the dcache mostly. When the guest is
done programming, it writes a special command (read array mode) at
which point the memslot is recreated (as read-only) and flushed / set up
for flushing during demand paging.

So from the emulated flash POV, the memslot either doesn't exist at all
(and then qemu serves all accesses just fine), or it exists r/o, at
which point qemu (host userspace) will have stopped writing to it, and
will have set it up for flushing before and during guest read accesses.

 Also, if you update memory from qemu you may break
 dirty page logging/migration.

Very probably. Jon said the same thing.

 Unless there is some other way
 you keep track. Of course it may not be applicable in your
 case (i.e. flash unused after boot).

The flash *is* used after boot, because the UEFI runtime variable
services *are* exercised by the guest kernel. However those use the same
access pattern (it's the same set of UEFI services just called by a
different client).

*Uncoordinated* access from guest and host in parallel will be a big
problem; but we're not that far yet, and we need to get the flash
problem sorted, so that we can at least boot and work on the basic
stuff. The flash programming dance happens to provide coordination; the
flash mode changes (which are equivalent to the teardown and the
recreation of the memslot) can be considered barriers.
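
The coordination described above can be pictured as a two-state machine. What follows is only a minimal sketch under stated assumptions, not QEMU or KVM code: every name in it (flash_dev, flash_guest_write, flash_host_may_write, CMD_READ_ARRAY) is hypothetical, the first guest write is treated as a command rather than data, and the cache flush is modelled as a plain counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; none of these names come from QEMU or KVM. */
enum flash_mode {
    FLASH_READ_ARRAY,   /* r/o memslot exists, guest reads it directly */
    FLASH_PROGRAMMING   /* memslot deleted, every access traps to qemu */
};

struct flash_dev {
    enum flash_mode mode;
    uint8_t array[16];  /* backing store that qemu modifies */
    unsigned flushes;   /* how often the backing store was flushed */
};

#define CMD_READ_ARRAY 0xff /* assumed command value, illustration only */

void flash_init(struct flash_dev *f)
{
    assert(f != NULL);
    f->mode = FLASH_READ_ARRAY;
    for (size_t i = 0; i < sizeof(f->array); i++)
        f->array[i] = 0xff;
    f->flushes = 0;
}

/* Guest write: the first one acts as the barrier that tears down the slot. */
void flash_guest_write(struct flash_dev *f, size_t off, uint8_t val)
{
    assert(f != NULL);
    if (f->mode == FLASH_READ_ARRAY) {
        f->mode = FLASH_PROGRAMMING;    /* memslot deleted here */
        return;                         /* treated as a command, not data */
    }
    if (val == CMD_READ_ARRAY) {
        f->flushes++;                   /* flush, then recreate r/o slot */
        f->mode = FLASH_READ_ARRAY;
        return;
    }
    f->array[off % sizeof(f->array)] = val; /* qemu updates the array */
}

/* Host userspace only writes the array while the memslot is gone. */
bool flash_host_may_write(const struct flash_dev *f)
{
    return f->mode == FLASH_PROGRAMMING;
}
```

In this model the host never writes while the read-only slot exists, which is the property the mode changes are relied on to provide.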

I hope this is acceptable for the time being...

Thanks
Laszlo

 
 - Mario
 
 On 11/17/2014 07:49 AM, Laszlo Ersek wrote:
 On 11/17/14 16:29, Paolo Bonzini wrote:


 On 17/11/2014 15:58, Ard Biesheuvel wrote:
 Readonly memslots are often used to implement emulation of ROMs and
 NOR flashes, in which case the guest may legally map these regions as
 uncached.
 To deal with the incoherency associated with uncached guest mappings,
 treat all readonly memslots as incoherent, and ensure that pages that
 belong to regions tagged as such are flushed to DRAM before being passed
 to the guest.

 On x86, the processor combines the cacheability values from the two
 levels of page tables.  Is there no way to do the same on ARM?

 Combining occurs on ARMv8 too. The Stage1 (guest) mapping is very strict
 (Device non-Gathering, non-Reordering, no Early Write Acknowledgement --
 for EFI_MEMORY_UC), which basically overrides the Stage2 (very lax
 host) memory attributes.
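
 As a rough illustration of the combining described above, the sketch below assumes that the more restrictive of the Stage1 and Stage2 attributes wins. The enum, its ordering and combine_memattr() are hypothetical simplifications for this note, not definitions from the ARM architecture manual or from KVM.

```c
#include <assert.h>

/* Memory types ordered from most to least restrictive (simplified). */
enum memattr {
    ATTR_DEVICE_nGnRnE = 0,
    ATTR_DEVICE_nGnRE,
    ATTR_DEVICE_nGRE,
    ATTR_DEVICE_GRE,
    ATTR_NORMAL_NC,
    ATTR_NORMAL_WT,
    ATTR_NORMAL_WB
};

/* Assumption: the effective attribute is the stricter of the two stages. */
enum memattr combine_memattr(enum memattr stage1, enum memattr stage2)
{
    assert(stage1 <= ATTR_NORMAL_WB && stage2 <= ATTR_NORMAL_WB);
    return stage1 < stage2 ? stage1 : stage2;
}
```

 With a Device-nGnRnE guest mapping, the result stays Device-nGnRnE whatever the host side uses, which is why the guest's reads bypass the dcache.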

 When qemu writes, as part of emulating the flash programming commands,
 to the RAMBlock that *otherwise* backs the flash range (as a r/o
 memslot), those writes (from host userspace) tend to end up in dcache.

 But, when the guest flips back the flash to romd mode, and tries to read
 back the values from the flash as plain ROM, the dcache is completely
 bypassed due to the strict stage1 mapping, and the guest goes directly
 to DRAM.

 Where qemu's earlier writes are not yet / necessarily visible.

 Please see my original patch (which was incomplete) in the attachment,
 it has a very verbose commit message.

 Anyway, I'll let others explain; they can word it better than I can :)

 FWIW,

 Series
 Reviewed-by: Laszlo Ersek ler...@redhat.com

 I ported this series to a 3.17.0+ based kernel, and tested it. It works
 fine. The ROM-like view of the NOR flash now reflects the previously
 programmed contents.

 Series
 Tested-by: Laszlo Ersek ler...@redhat.com

 Thanks!
 Laszlo



 ___
 kvmarm mailing list
 kvm...@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH v1 0/2] Define some VFIO interfaces for VT-d Posted-Interrupts

2014-11-20 Thread Feng Wu
VT-d Posted-Interrupts is an enhancement to CPU-side Posted-Interrupts.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention while the guest is running in non-root mode.

You can find the VT-d Posted-Interrupts spec at the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

This patch set does the following things:

- Define a new VFIO group KVM_DEV_VFIO_INTERRUPT and its attribute
KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ. Qemu can use this interface to
configure VT-d PI when the guest updates the interrupt configuration
(MSI/MSI-X configuration).

- Define a new VFIO API, vfio_msi_get_irq(), which KVM can use
to get the host irq of the assigned devices. KVM can then update the
associated IRTE for VT-d PI.
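
To make the first item more concrete, here is a hedged sketch of how userspace might size and fill the variable-length argument. The demo_ structures are local stand-ins that mirror the layout proposed in patch 1/2; demo_posted_intr_alloc() and the 2048-entry sanity limit are assumptions made for this illustration and are not part of the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-ins mirroring the proposed layout; not the kernel headers. */
struct demo_virq_info {
    uint32_t index; /* index of the msi/msix entry */
    int virq;       /* virq of the interrupt */
};

struct demo_posted_intr {
    uint32_t fd;    /* file descriptor of the VFIO device */
    uint32_t count; /* number of entries in virq_info[] */
    bool msix;
    struct demo_virq_info virq_info[]; /* C99 flexible array member */
};

/* Hypothetical helper: allocate header plus 'count' trailing entries. */
struct demo_posted_intr *demo_posted_intr_alloc(uint32_t fd, uint32_t count,
                                                bool msix)
{
    struct demo_posted_intr *pi;

    assert(count <= 2048); /* assumed cap; MSI-X tables hold at most 2048 */
    pi = calloc(1, sizeof(*pi) + (size_t)count * sizeof(pi->virq_info[0]));
    if (!pi)
        return NULL;
    pi->fd = fd;
    pi->count = count;
    pi->msix = msix;
    for (uint32_t i = 0; i < count; i++)
        pi->virq_info[i].index = i; /* virq left at 0 for the caller */
    return pi;
}
```

The address of such a buffer is what kvm_device_attr.addr would carry for the KVM_DEV_VFIO_INTERRUPT group, presumably through the existing KVM_SET_DEVICE_ATTR ioctl.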

Feng Wu (2):
  vfio: Add new interrupt group for VFIO
  vfio: Add VFIO API vfio_msi_get_irq

 Documentation/virtual/kvm/devices/vfio.txt |    8 ++++++++
 drivers/vfio/pci/vfio_pci.c                |   10 ++++++++++
 include/linux/vfio.h                       |    2 ++
 include/uapi/linux/kvm.h                   |   14 ++++++++++++++
 4 files changed, 34 insertions(+), 0 deletions(-)



[RFC PATCH v1 2/2] vfio: Add VFIO API vfio_msi_get_irq

2014-11-20 Thread Feng Wu
This API returns the host irq for the MSI/MSI-X interrupts.
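
The lookup this describes can be tried outside the kernel with small stand-in types. This is only a sketch: the demo_ structures are hypothetical reductions of the kernel ones, and the bounds check is an addition for the example that the patch itself does not contain.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced stand-ins for the kernel structures; hypothetical names. */
struct demo_msix_entry { unsigned int vector; };
struct demo_pci_dev { unsigned int irq; }; /* first irq when MSI is enabled */
struct demo_vfio_pci_device {
    struct demo_pci_dev *pdev;
    struct demo_msix_entry *msix;
    int num_vectors;
};

/* MSI-X: per-entry irq from the table. MSI: irqs are base + vector. */
unsigned int demo_msi_get_irq(const struct demo_vfio_pci_device *vdev,
                              int vector, bool msix)
{
    assert(vector >= 0 && vector < vdev->num_vectors); /* added check */
    return msix ? vdev->msix[vector].vector
                : vdev->pdev->irq + (unsigned int)vector;
}
```

The MSI branch relies on multi-vector MSI irqs being numbered consecutively from pdev->irq.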

Signed-off-by: Feng Wu feng...@intel.com
---
 drivers/vfio/pci/vfio_pci.c |   10 ++++++++++
 include/linux/vfio.h        |    2 ++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 9558da3..4fb9828 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1009,6 +1009,16 @@ put_devs:
 	kfree(devs.devices);
 }
 
+unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, bool msix)
+{
+	struct vfio_pci_device *vdev =
+		(struct vfio_pci_device *)vfio_device_data(device);
+	struct pci_dev *pdev = vdev->pdev;
+
+	return msix ? vdev->msix[vector].vector : pdev->irq + vector;
+}
+EXPORT_SYMBOL_GPL(vfio_msi_get_irq);
+
 static void __exit vfio_pci_cleanup(void)
 {
 	pci_unregister_driver(&vfio_pci_driver);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index d320411..007ca55 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -92,6 +92,8 @@ extern void vfio_unregister_iommu_driver(
 /*
  * External user API
  */
+extern unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector,
+ bool msix);
 extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
 extern void vfio_group_put_external_user(struct vfio_group *group);
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
-- 
1.7.1



[RFC PATCH v1 1/2] vfio: Add new interrupt group for VFIO

2014-11-20 Thread Feng Wu
Add a new group, KVM_DEV_VFIO_INTERRUPT, and its attribute
KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ.

This is used for VT-d Posted-Interrupts setup.

Signed-off-by: Feng Wu feng...@intel.com
---
 Documentation/virtual/kvm/devices/vfio.txt |    8 ++++++++
 include/uapi/linux/kvm.h                   |   14 ++++++++++++++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
index ef51740..bd99176 100644
--- a/Documentation/virtual/kvm/devices/vfio.txt
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -13,6 +13,7 @@ VFIO-group is held by KVM.
 
 Groups:
   KVM_DEV_VFIO_GROUP
+  KVM_DEV_VFIO_INTERRUPT
 
 KVM_DEV_VFIO_GROUP attributes:
   KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
@@ -20,3 +21,10 @@ KVM_DEV_VFIO_GROUP attributes:
 
 For each, kvm_device_attr.addr points to an int32_t file descriptor
 for the VFIO group.
+
+KVM_DEV_VFIO_INTERRUPT attributes:
+  KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ: Set up the interrupt configuration for
+    VT-d Posted-Interrupts
+
+For each, kvm_device_attr.addr points to struct kvm_posted_intr, which
+includes the needed information for VT-d Posted-Interrupts setup.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6076882..5544fcc 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -943,9 +943,23 @@ struct kvm_device_attr {
__u64   addr;   /* userspace address of attr data */
 };
 
+struct virq_info {
+   __u32   index;  /* index of the msi/msix entry */
+   int virq;   /* virq of the interrupt */
+};
+
+struct kvm_posted_intr {
+   __u32   fd; /* file descriptor of the VFIO device */
+   __u32   count;
+   bool    msix;
+   struct virq_info virq_info[0];
+};
+
 #define  KVM_DEV_VFIO_GROUP                    1
 #define   KVM_DEV_VFIO_GROUP_ADD               1
 #define   KVM_DEV_VFIO_GROUP_DEL               2
+#define  KVM_DEV_VFIO_INTERRUPT                2
+#define   KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ   1
 
 enum kvm_device_type {
    KVM_DEV_TYPE_FSL_MPIC_20    = 1,
-- 
1.7.1



Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions

2014-11-20 Thread Aneesh Kumar K.V
Suresh E. Warrier warr...@linux.vnet.ibm.com writes:

 This patch adds trace points in the guest entry and exit code and also
 for exceptions handled by the host in kernel mode - hypercalls and page
 faults. The new events are added to /sys/kernel/debug/tracing/events
 under a new subsystem called kvm_hv.



   /* Set this explicitly in case thread 0 doesn't have a vcpu */
 @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
  
   vc->vcore_state = VCORE_RUNNING;
   preempt_disable();
 +
 + trace_kvmppc_run_core(vc, 0);
 +
   spin_unlock(&vc->lock);

Do we really want to call a tracepoint with the spin lock held? Is that a
good thing to do?

-aneesh



[PATCH 1/1] vfio: put off the allocation of minor in vfio_create_group

2014-11-20 Thread Zhen Lei
The list_for_each_entry loop that follows does not depend on minor. With
this patch the minor is allocated after the loop, so it no longer has to be
freed inside list_for_each_entry. There is no functional change.
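
The effect of the reordering can be seen in a small user-space model. This is a sketch only: the demo_ names, the fixed-size table and the allocation counter are hypothetical and stand in for the VFIO group list and the minor allocator.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_GROUPS 8

struct demo_group { int iommu_id; int minor; };

static struct demo_group demo_groups[DEMO_MAX_GROUPS];
static size_t demo_ngroups;
static int demo_next_minor;
int demo_minor_allocs; /* counts allocator calls, for illustration */

static int demo_alloc_minor(void)
{
    demo_minor_allocs++;
    return demo_next_minor++;
}

/* Look for an existing group first; allocate a minor only if none exists. */
struct demo_group *demo_create_group(int iommu_id)
{
    for (size_t i = 0; i < demo_ngroups; i++)
        if (demo_groups[i].iommu_id == iommu_id)
            return &demo_groups[i]; /* lost the race: nothing to free */

    assert(demo_ngroups < DEMO_MAX_GROUPS);
    demo_groups[demo_ngroups].iommu_id = iommu_id;
    demo_groups[demo_ngroups].minor = demo_alloc_minor();
    return &demo_groups[demo_ngroups++];
}
```

Because the lookup runs before the allocation, the path that returns an existing group has no minor to release.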

Signed-off-by: Zhen Lei thunder.leiz...@huawei.com
---
 drivers/vfio/vfio.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index f018d8d..737eb468 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -225,22 +225,21 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group)

 	mutex_lock(&vfio.group_lock);

-	minor = vfio_alloc_group_minor(group);
-	if (minor < 0) {
-		vfio_group_unlock_and_free(group);
-		return ERR_PTR(minor);
-	}
-
 	/* Did we race creating this group? */
 	list_for_each_entry(tmp, &vfio.group_list, vfio_next) {
 		if (tmp->iommu_group == iommu_group) {
 			vfio_group_get(tmp);
-			vfio_free_group_minor(minor);
 			vfio_group_unlock_and_free(group);
 			return tmp;
 		}
 	}

+	minor = vfio_alloc_group_minor(group);
+	if (minor < 0) {
+		vfio_group_unlock_and_free(group);
+		return ERR_PTR(minor);
+	}
+
 	dev = device_create(vfio.class, NULL,
 			    MKDEV(MAJOR(vfio.group_devt), minor),
 			    group, "%d", iommu_group_id(iommu_group));
--
1.8.0




Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds

2014-11-20 Thread Christian Borntraeger
Am 10.11.2014 um 22:07 schrieb Linus Torvalds:
[...]
 So before blacklisting any compilers, let's first see if
 
  (a) we can actually make it a real rule that we only use ACCESS_ONCE on scalars
  (b) we can somehow enforce this with a compiler warning/error for mis-uses
 
 For example, the attached patch works for some cases, but shows how we
 use ACCESS_ONCE() on pointers to pte_t's etc, so it doesn't come even
 close to compiling the whole kernel. But I wonder how painful that
 would be to change.. The places where it complains are actually
 somewhat debatable to begin with, like:
 
  - handle_pte_fault(.. pte_t *pte ..):
 
 entry = ACCESS_ONCE(*pte);
 
 and the thing is, "pte" is actually possibly an 8-byte entity on
 x86-32, and that ACCESS_ONCE() fundamentally will be two 32-bit
 reads.
 
 So there is a very valid argument for saying "well, you shouldn't do
 that, then", and that we might be better off cleaning up our
 ACCESS_ONCE() uses, than to just blindly blacklist compilers.
 
 NOTE! I'm not at all advocating the attached patch. I'm sending it out
 white-space damaged on purpose, it's more of a "hey, something like
 this might be the direction we want to go in", with the spinlock.h
 part of the patch also acting as an example of the kind of changes the
 "ACCESS_ONCE() only works on scalars" rule would require.

So I tried to see if I can come up with some results on how often this problem 
happens...

[...]


 diff --git a/include/linux/compiler.h b/include/linux/compiler.h
 index d5ad7b1118fc..63e82f1dfc1a 100644
 --- a/include/linux/compiler.h
 +++ b/include/linux/compiler.h
 @@ -378,7 +378,11 @@ void ftrace_likely_update(struct
 ftrace_branch_data *f, int val, int expect);
   * use is to mediate communication between process-level code and irq/NMI
   * handlers, all running on the same CPU.
   */
 -#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
 +#define get_scalar_volatile_pointer(x) ({ \
 +   typeof(x) *__p = &(x); \
 +   volatile typeof(x) *__vp = __p; \
 +   (void)(long)*__p; __vp; })
 +#define ACCESS_ONCE(x) (*get_scalar_volatile_pointer(x))

...and just took this patch. On s390 the build is pretty much clean with
allyesconfig. In fact, with the siif lock changed, only the pte/pmd cases you
mentioned trigger a compile error:

mm/memory.c: In function 'handle_pte_fault':
mm/memory.c:3203:2: error: aggregate value used where an integer was expected
  entry = ACCESS_ONCE(*pte);

mm/rmap.c: In function 'mm_find_pmd':
mm/rmap.c:584:2: error: aggregate value used where an integer was expected
  pmde = ACCESS_ONCE(*pmd);
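
The check behind these errors can be reproduced in user space. The sketch below is a renamed rendition of the quoted macro, not the kernel header; it relies on GNU C extensions (__typeof__ and statement expressions), so it assumes gcc or clang, and demo_read_flag() and struct demo_pte are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>

/* The cast to long is what rejects structs and unions at compile time. */
#define demo_scalar_volatile_pointer(x) ({          \
        __typeof__(x) *__p = &(x);                  \
        volatile __typeof__(x) *__vp = __p;         \
        (void)(long)*__p;                           \
        __vp; })
#define DEMO_ACCESS_ONCE(x) (*demo_scalar_volatile_pointer(x))

struct demo_pte { unsigned long lo, hi; };

/* Scalar operand: the cast to long is valid, so this compiles. */
int demo_read_flag(int *flag)
{
    assert(flag != NULL);
    return DEMO_ACCESS_ONCE(*flag);
}

/*
 * Aggregate operand: uncommenting this reproduces the error above,
 * because a struct cannot be cast to long.
 *
 * struct demo_pte demo_read_pte(struct demo_pte *p)
 * {
 *         return DEMO_ACCESS_ONCE(*p);
 * }
 */
```

The volatile pointer still provides the single access; the extra cast only exists to make non-scalar operands fail to build.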


Here a barrier() might be a good solution as well, I guess.
On x86 allyesconfig it's almost the same.
- we need your spinlock changes (well, something different to make it compile)
- we need to fix pmd and pte
- we have gup_get_pte in arch/x86/mm/gup.c getting a ptep

So it looks like we could make a change to ACCESS_ONCE. Would something like

CONFIG_ARCH_SCALAR_ACCESS_ONCE be a good start?

This would boil down to
Patch1: Provide stricter ACCESS_ONCE if CONFIG_ARCH_SCALAR_ACCESS_ONCE is set +
        documentation update + comments
Patch2: Change mm/* to barriers
Patch3: Change x86 locks
Patch4: Change x86 gup
Patch5: Enable CONFIG_ARCH_SCALAR_ACCESS_ONCE for s390x and x86

Makes sense?

Christian



Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions

2014-11-20 Thread Alexander Graf


On 19.11.14 22:54, Suresh E. Warrier wrote:
 
 
 On 11/14/2014 04:56 AM, Alexander Graf wrote:



 Am 14.11.2014 um 00:29 schrieb Suresh E. Warrier 
 warr...@linux.vnet.ibm.com:

 This patch adds trace points in the guest entry and exit code and also
 for exceptions handled by the host in kernel mode - hypercalls and page
 faults. The new events are added to /sys/kernel/debug/tracing/events
 under a new subsystem called kvm_hv.

 Acked-by: Paul Mackerras pau...@samba.org
 Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com
 ---
 arch/powerpc/kvm/book3s_64_mmu_hv.c |  12 +-
 arch/powerpc/kvm/book3s_hv.c|  19 ++
 arch/powerpc/kvm/trace_hv.h | 497 
 
 3 files changed, 525 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/kvm/trace_hv.h

 diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
 b/arch/powerpc/kvm/book3s_64_mmu_hv.c
 index 70feb7b..20cbad1 100644
 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
 +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
 @@ -38,6 +38,7 @@
 #include <asm/cputable.h>

 #include "book3s_hv_cma.h"
 +#include "trace_hv.h"

 /* POWER7 has 10-bit LPIDs, PPC970 has 6-bit LPIDs */
 #define MAX_LPID_970    63
 @@ -627,6 +628,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
 struct kvm_vcpu *vcpu,
gfn = gpa >> PAGE_SHIFT;
memslot = gfn_to_memslot(kvm, gfn);

 +trace_kvm_page_fault_enter(vcpu, hpte, memslot, ea, dsisr);
 +
/* No memslot means it's an emulated MMIO region */
if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
return kvmppc_hv_emulate_mmio(run, vcpu, gpa, ea,
 @@ -639,6 +642,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
 struct kvm_vcpu *vcpu,
mmu_seq = kvm->mmu_notifier_seq;
smp_rmb();

 +ret = -EFAULT;
is_io = 0;
pfn = 0;
page = NULL;
 @@ -662,7 +666,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
 struct kvm_vcpu *vcpu,
}
up_read(&current->mm->mmap_sem);
if (!pfn)
 -return -EFAULT;
 +goto out_put;
} else {
page = pages[0];
if (PageHuge(page)) {
 @@ -690,14 +694,14 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
 struct kvm_vcpu *vcpu,
pfn = page_to_pfn(page);
}

 -ret = -EFAULT;
if (psize < pte_size)
goto out_put;

/* Check WIMG vs. the actual page we're accessing */
if (!hpte_cache_flags_ok(r, is_io)) {
if (is_io)
 -return -EFAULT;
 +goto out_put;
 +
/*
 * Allow guest to map emulated device memory as
 * uncacheable, but actually make it cacheable.
 @@ -753,6 +757,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
 struct kvm_vcpu *vcpu,
SetPageDirty(page);

  out_put:
 +trace_kvm_page_fault_exit(vcpu, hpte, ret);
 +
if (page) {
/*
 * We drop pages[0] here, not page because page might
 diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
 index 69d4085..5143d17 100644
 --- a/arch/powerpc/kvm/book3s_hv.c
 +++ b/arch/powerpc/kvm/book3s_hv.c
 @@ -57,6 +57,9 @@

 #include "book3s.h"

 +#define CREATE_TRACE_POINTS
 +#include "trace_hv.h"
 +
 /* #define EXIT_DEBUG */
 /* #define EXIT_DEBUG_SIMPLE */
 /* #define EXIT_DEBUG_INT */
 @@ -1679,6 +1682,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
kvmppc_start_thread(vcpu);
kvmppc_create_dtl_entry(vcpu, vc);
 +trace_kvm_guest_enter(vcpu);
}

/* Set this explicitly in case thread 0 doesn't have a vcpu */
 @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)

vc->vcore_state = VCORE_RUNNING;
preempt_disable();
 +
 +trace_kvmppc_run_core(vc, 0);
 +
spin_unlock(&vc->lock);

kvm_guest_enter();
 @@ -1732,6 +1739,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
kvmppc_core_pending_dec(vcpu))
kvmppc_core_dequeue_dec(vcpu);

 +trace_kvm_guest_exit(vcpu);
 +
ret = RESUME_GUEST;
if (vcpu->arch.trap)
ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
 @@ -1757,6 +1766,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
wake_up(&vcpu->arch.cpu_run);
}
}
 +
 +trace_kvmppc_run_core(vc, 1);
 }

 /*
 @@ -1783,11 +1794,13 @@ static void kvmppc_vcore_blocked(struct 
 kvmppc_vcore *vc)

prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
vc->vcore_state = VCORE_SLEEPING;
 +trace_kvmppc_vcore_blocked(vc, 0);
spin_unlock(&vc->lock);
schedule();
finish_wait(&vc->wq, &wait);
spin_lock(&vc->lock);
vc->vcore_state = VCORE_INACTIVE;
 +trace_kvmppc_vcore_blocked(vc, 1);
 }

 static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 @@ -1796,6 +1809,8 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
 struct kvm_vcpu *vcpu)
struct kvmppc_vcore *vc;
struct kvm_vcpu *v, *vn;

 +

Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions

2014-11-20 Thread Alexander Graf


On 20.11.14 11:40, Aneesh Kumar K.V wrote:
 Suresh E. Warrier warr...@linux.vnet.ibm.com writes:
 
 This patch adds trace points in the guest entry and exit code and also
 for exceptions handled by the host in kernel mode - hypercalls and page
 faults. The new events are added to /sys/kernel/debug/tracing/events
 under a new subsystem called kvm_hv.
 
 
 
  /* Set this explicitly in case thread 0 doesn't have a vcpu */
 @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
  
  vc->vcore_state = VCORE_RUNNING;
  preempt_disable();
 +
 +trace_kvmppc_run_core(vc, 0);
 +
  spin_unlock(&vc->lock);
 
 Do we really want to call a tracepoint with the spin lock held? Is that a
 good thing to do?

I thought it was safe to call tracepoints inside of spin lock regions?
Steve?


Alex


Re: [PATCH] KVM: ia64: remove

2014-11-20 Thread Radim Krčmář
2014-11-19 22:05+0100, Paolo Bonzini:
 KVM for ia64 has been marked as broken not just once, but twice even,
 and the last patch from the maintainer is now roughly 5 years old.
 Time for it to rest in peace.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---

Nice, if only every diffstat was like that!

I propose another removal.
(The reasoning below wasn't confirmed with an ia64 compiler.
 I'd remove the ioctls even if they worked.)

---8<---
KVM: remove buggy ia64 specific ioctls

IA64 is no longer present so new applications shouldn't use them.

The main problem is that they most likely didn't work even before,
because we have misused ioctl

  #define KVM_SET_GUEST_DEBUG   _IOW(KVMIO,  0x9b, struct kvm_guest_debug)
  #define KVM_IA64_VCPU_SET_STACK   _IOW(KVMIO,  0x9b, void *)

as

  struct kvm_guest_debug {
__u32 control;
__u32 pad;
struct kvm_guest_debug_arch arch;
  };

and

  struct kvm_guest_debug_arch {
  };

mean that

  sizeof(struct kvm_guest_debug) == sizeof(void *) == 8

thus

  KVM_SET_GUEST_DEBUG == KVM_IA64_VCPU_SET_STACK

and KVM_SET_GUEST_DEBUG is handled before KVM_IA64_VCPU_SET_STACK.
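
The collision can be checked on a 64-bit Linux host without an ia64 compiler. This is a sketch under assumptions: demo_guest_debug stands in for the ia64 layout (only the two 32-bit fields, since the arch part was empty there), the DEMO_ names are local to the example, and 0xAE is the KVMIO value from linux/kvm.h.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h> /* _IOW(); Linux-only header */

#define DEMO_KVMIO 0xAE  /* same value as KVMIO in <linux/kvm.h> */

/* Stand-in for the ia64 struct kvm_guest_debug: 8 bytes in total. */
struct demo_guest_debug {
    uint32_t control;
    uint32_t pad;
};

#define DEMO_SET_GUEST_DEBUG     _IOW(DEMO_KVMIO, 0x9b, struct demo_guest_debug)
#define DEMO_IA64_VCPU_SET_STACK _IOW(DEMO_KVMIO, 0x9b, void *)

/* Returns nonzero when both requests encode to the same number. */
int demo_ioctls_collide(void)
{
    assert(sizeof(struct demo_guest_debug) == 8);
    /* The argument size is part of the encoding, so equal direction,
     * type, nr and size give identical request numbers. */
    return DEMO_SET_GUEST_DEBUG == DEMO_IA64_VCPU_SET_STACK;
}
```

On a host where sizeof(void *) is 8 the function returns 1; on a 32-bit host the sizes differ and so do the numbers.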

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 include/uapi/linux/kvm.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6d59e5b..a37fd12 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1099,9 +1099,6 @@ struct kvm_s390_ucas_mapping {
 #define KVM_X86_SETUP_MCE _IOW(KVMIO,  0x9c, __u64)
 #define KVM_X86_GET_MCE_CAP_SUPPORTED _IOR(KVMIO,  0x9d, __u64)
 #define KVM_X86_SET_MCE   _IOW(KVMIO,  0x9e, struct kvm_x86_mce)
-/* IA64 stack access */
-#define KVM_IA64_VCPU_GET_STACK   _IOR(KVMIO,  0x9a, void *)
-#define KVM_IA64_VCPU_SET_STACK   _IOW(KVMIO,  0x9b, void *)
 /* Available with KVM_CAP_VCPU_EVENTS */
 #define KVM_GET_VCPU_EVENTS   _IOR(KVMIO,  0x9f, struct kvm_vcpu_events)
 #define KVM_SET_VCPU_EVENTS   _IOW(KVMIO,  0xa0, struct kvm_vcpu_events)
-- 
2.1.0



[PATCH] kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/

2014-11-20 Thread Paolo Bonzini
ia64 does not need them anymore.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch/x86/include/asm/kvm_host.h   | 16 
 arch/x86/kvm/Makefile |  5 ++---
 {virt => arch/x86}/kvm/ioapic.c   |  0
 {virt => arch/x86}/kvm/ioapic.h   |  1 -
 {virt => arch/x86}/kvm/irq_comm.c |  4 ++--
 arch/x86/kvm/x86.c|  1 +
 include/linux/kvm_host.h  | 22 --
 virt/kvm/eventfd.c|  7 ---
 virt/kvm/kvm_main.c   |  3 ---
 9 files changed, 29 insertions(+), 30 deletions(-)
 rename {virt => arch/x86}/kvm/ioapic.c (100%)
 rename {virt => arch/x86}/kvm/ioapic.h (98%)
 rename {virt => arch/x86}/kvm/irq_comm.c (98%)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 769db36a3001..76ff3e2d8fd2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -603,6 +603,9 @@ struct kvm_arch {
 
struct kvm_xen_hvm_config xen_hvm_config;
 
+   /* reads protected by irq_srcu, writes by irq_lock */
+   struct hlist_head mask_notifier_list;
+
/* fields used by HYPER-V emulation */
u64 hv_guest_os_id;
u64 hv_hypercall;
@@ -819,6 +822,19 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
  const void *val, int bytes);
 u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
 
+struct kvm_irq_mask_notifier {
+   void (*func)(struct kvm_irq_mask_notifier *kimn, bool masked);
+   int irq;
+   struct hlist_node link;
+};
+
+void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq,
+   struct kvm_irq_mask_notifier *kimn);
+void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq,
+ struct kvm_irq_mask_notifier *kimn);
+void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin,
+bool mask);
+
 extern bool tdp_enabled;
 
 u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 25d22b2d6509..ee1cd92b03be 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,14 +7,13 @@ CFLAGS_vmx.o := -I.
 
 KVM := ../../../virt/kvm
 
-kvm-y  += $(KVM)/kvm_main.o $(KVM)/ioapic.o \
-   $(KVM)/coalesced_mmio.o $(KVM)/irq_comm.o \
+kvm-y  += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
 kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)+= $(KVM)/assigned-dev.o $(KVM)/iommu.o
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
 
 kvm-y  += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
-  i8254.o cpuid.o pmu.o
+  i8254.o ioapic.o irq_comm.o cpuid.o pmu.o
 kvm-intel-y+= vmx.o
 kvm-amd-y  += svm.o
 
diff --git a/virt/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
similarity index 100%
rename from virt/kvm/ioapic.c
rename to arch/x86/kvm/ioapic.c
diff --git a/virt/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
similarity index 98%
rename from virt/kvm/ioapic.h
rename to arch/x86/kvm/ioapic.h
index dc3baa3a538f..deac8d509f2a 100644
--- a/virt/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -96,7 +96,6 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq, unsigned long *dest_map);
 int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
 int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
-void kvm_vcpu_request_scan_ioapic(struct kvm *kvm);
 void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap,
u32 *tmr);
 
diff --git a/virt/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
similarity index 98%
rename from virt/kvm/irq_comm.c
rename to arch/x86/kvm/irq_comm.c
index 1345bde064f5..e9c135b639aa 100644
--- a/virt/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -234,7 +234,7 @@ void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq,
 {
 	mutex_lock(&kvm->irq_lock);
 	kimn->irq = irq;
-	hlist_add_head_rcu(&kimn->link, &kvm->mask_notifier_list);
+	hlist_add_head_rcu(&kimn->link, &kvm->arch.mask_notifier_list);
 	mutex_unlock(&kvm->irq_lock);
 }
 
@@ -256,7 +256,7 @@ void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin,
 	idx = srcu_read_lock(&kvm->irq_srcu);
 	gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin);
 	if (gsi != -1)
-		hlist_for_each_entry_rcu(kimn, &kvm->mask_notifier_list, link)
+		hlist_for_each_entry_rcu(kimn, &kvm->arch.mask_notifier_list, link)
 			if (kimn->irq == gsi)
 				kimn->func(kimn, mask);
 	srcu_read_unlock(&kvm->irq_srcu, idx);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a8f53a6960fd..5337039427c8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ 

Re: [PATCH] KVM: ia64: remove

2014-11-20 Thread Paolo Bonzini


On 20/11/2014 14:38, Radim Krčmář wrote:
 I propose another removal.
 (The reasoning below wasn't confirmed with ia64 compiler.
 I'd remove the ioctls even if they worked.)

Checked and applied, thanks.

Paolo

 ---8<---
 KVM: remove buggy ia64 specific ioctls
 
 IA64 is no longer present so new applications shouldn't use them.
 
 The main problem is that they most likely didn't work even before,
 because we have misused ioctl
 
   #define KVM_SET_GUEST_DEBUG   _IOW(KVMIO,  0x9b, struct kvm_guest_debug)
   #define KVM_IA64_VCPU_SET_STACK   _IOW(KVMIO,  0x9b, void *)
 
 as
 
   struct kvm_guest_debug {
   __u32 control;
   __u32 pad;
   struct kvm_guest_debug_arch arch;
   };
 
 and
 
   struct kvm_guest_debug_arch {
   };
 
 mean that
 
   sizeof(struct kvm_guest_debug) == sizeof(void *) == 8
 
 thus
 
   KVM_SET_GUEST_DEBUG == KVM_IA64_VCPU_SET_STACK
 
 and KVM_SET_GUEST_DEBUG is handled before KVM_IA64_VCPU_SET_STACK.
 
 Signed-off-by: Radim Krčmář rkrc...@redhat.com
 ---
  include/uapi/linux/kvm.h | 3 ---
  1 file changed, 3 deletions(-)
 
 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 6d59e5b..a37fd12 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -1099,9 +1099,6 @@ struct kvm_s390_ucas_mapping {
  #define KVM_X86_SETUP_MCE _IOW(KVMIO,  0x9c, __u64)
  #define KVM_X86_GET_MCE_CAP_SUPPORTED _IOR(KVMIO,  0x9d, __u64)
  #define KVM_X86_SET_MCE   _IOW(KVMIO,  0x9e, struct kvm_x86_mce)
 -/* IA64 stack access */
 -#define KVM_IA64_VCPU_GET_STACK   _IOR(KVMIO,  0x9a, void *)
 -#define KVM_IA64_VCPU_SET_STACK   _IOW(KVMIO,  0x9b, void *)
  /* Available with KVM_CAP_VCPU_EVENTS */
  #define KVM_GET_VCPU_EVENTS   _IOR(KVMIO,  0x9f, struct kvm_vcpu_events)
  #define KVM_SET_VCPU_EVENTS   _IOW(KVMIO,  0xa0, struct kvm_vcpu_events)
 


Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions

2014-11-20 Thread Steven Rostedt
On Thu, 20 Nov 2014 13:10:12 +0100
Alexander Graf ag...@suse.de wrote:

 
 
 On 20.11.14 11:40, Aneesh Kumar K.V wrote:
  Suresh E. Warrier warr...@linux.vnet.ibm.com writes:
  
  This patch adds trace points in the guest entry and exit code and also
  for exceptions handled by the host in kernel mode - hypercalls and page
  faults. The new events are added to /sys/kernel/debug/tracing/events
  under a new subsystem called kvm_hv.
  
  
  
 /* Set this explicitly in case thread 0 doesn't have a vcpu */
  @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
   
 vc->vcore_state = VCORE_RUNNING;
 preempt_disable();
  +
  +  trace_kvmppc_run_core(vc, 0);
  +
 spin_unlock(&vc->lock);
  
  Do we really want to call a tracepoint with the spin lock held? Is that a
  good thing to do?
 
 I thought it was safe to call tracepoints inside of spin lock regions?
 Steve?
 

There are tracepoints in the guts of the scheduler where the rq lock is held.
Don't worry about it. The tracing system is lockless.

-- Steve


Re: [PATCH v9 01/19] vfio/platform: initial skeleton of VFIO support for platform devices

2014-11-20 Thread Antonios Motakis
On Wed, Nov 12, 2014 at 5:49 PM, Alex Williamson
alex.william...@redhat.com wrote:

 On Wed, 2014-11-12 at 11:05 +0100, Eric Auger wrote:
  Hi Antonios,
 
  On 10/27/2014 07:07 PM, Antonios Motakis wrote:
   This patch forms the common skeleton code for platform devices support
   with VFIO. This will include the core functionality of VFIO_PLATFORM,
   however binding to the device and discovering the device resources will
   be done with the help of a separate file where any Linux platform bus
   specific code will reside.
  
   This will allow us to implement support for also discovering AMBA devices
   and their resources, but still reuse a large part of the VFIO_PLATFORM
   implementation.
  
   Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
   ---
drivers/vfio/platform/vfio_platform_common.c  | 126 
   ++
drivers/vfio/platform/vfio_platform_private.h |  36 
2 files changed, 162 insertions(+)
create mode 100644 drivers/vfio/platform/vfio_platform_common.c
create mode 100644 drivers/vfio/platform/vfio_platform_private.h
  
   diff --git a/drivers/vfio/platform/vfio_platform_common.c 
   b/drivers/vfio/platform/vfio_platform_common.c
   new file mode 100644
   index 000..e0fdbc8
   --- /dev/null
   +++ b/drivers/vfio/platform/vfio_platform_common.c
   @@ -0,0 +1,126 @@
   +/*
   + * Copyright (C) 2013 - Virtual Open Systems
   + * Author: Antonios Motakis a.mota...@virtualopensystems.com
   + *
   + * This program is free software; you can redistribute it and/or modify
   + * it under the terms of the GNU General Public License, version 2, as
   + * published by the Free Software Foundation.
   + *
   + * This program is distributed in the hope that it will be useful,
   + * but WITHOUT ANY WARRANTY; without even the implied warranty of
   + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   + * GNU General Public License for more details.
   + */
   +
   +#include <linux/device.h>
   +#include <linux/interrupt.h>
   +#include <linux/iommu.h>
   +#include <linux/module.h>
   +#include <linux/mutex.h>
   +#include <linux/notifier.h>
   +#include <linux/pm_runtime.h>
   +#include <linux/slab.h>
   +#include <linux/types.h>
   +#include <linux/uaccess.h>
   +#include <linux/vfio.h>
   +#include <linux/io.h>
  Not sure all of the above includes are needed at this stage.
   +
   +#include "vfio_platform_private.h"
   +
   +static void vfio_platform_release(void *device_data)
   +{
   +   module_put(THIS_MODULE);
   +}
   +
   +static int vfio_platform_open(void *device_data)
   +{
   +   if (!try_module_get(THIS_MODULE))
   +   return -ENODEV;
   +
   +   return 0;
   +}
   +
   +static long vfio_platform_ioctl(void *device_data,
   +  unsigned int cmd, unsigned long arg)
  A minor style comment/question that applies to the whole series: shouldn't
  subsequent argument lines be aligned with the first character after
  the '(', as done in the PCI code?

 It's also my preferred style to indent to just after the open paren on
 wrapped lines where possible, but I don't think there are hard rules in
 CodingStyle or checkpatch that enforce this, so I often let it slide.
 Thanks,


You're right that there are no hard coding style rules for this, but I
also like this style so I'll apply it more consistently.
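
To make the style concrete, here is a small stand-alone illustration. It is
not taken from the series; demo_ioctl and its return values are made up
purely to show the layout, with the wrapped argument line starting just
after the open paren:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical function, only here to show the layout: the wrapped
 * argument line is aligned with the first character after '('.
 */
static long demo_ioctl(void *device_data,
                       unsigned int cmd, unsigned long arg)
{
	(void)arg;
	assert(device_data != NULL);

	/* Command 0 is "known" in this toy example; everything else is not. */
	return cmd == 0 ? 0 : -ENOTTY;
}
```

As noted above, neither CodingStyle nor checkpatch enforces this, so it is
a readability preference rather than a rule.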

 Alex

   +{
   +   if (cmd == VFIO_DEVICE_GET_INFO)
   +   return -EINVAL;
   +
   +   else if (cmd == VFIO_DEVICE_GET_REGION_INFO)
   +   return -EINVAL;
   +
   +   else if (cmd == VFIO_DEVICE_GET_IRQ_INFO)
   +   return -EINVAL;
   +
   +   else if (cmd == VFIO_DEVICE_SET_IRQS)
   +   return -EINVAL;
   +
   +   else if (cmd == VFIO_DEVICE_RESET)
   +   return -EINVAL;
   +
   +   return -ENOTTY;
   +}
   +
   +static ssize_t vfio_platform_read(void *device_data, char __user *buf,
   + size_t count, loff_t *ppos)
   +{
   +   return -EINVAL;
   +}
   +
   +static ssize_t vfio_platform_write(void *device_data, const char __user 
   *buf,
   +  size_t count, loff_t *ppos)
   +{
   +   return -EINVAL;
   +}
   +
   +static int vfio_platform_mmap(void *device_data, struct vm_area_struct 
   *vma)
   +{
   +   return -EINVAL;
   +}
   +
   +static const struct vfio_device_ops vfio_platform_ops = {
   +   .name   = "vfio-platform",
   +   .open   = vfio_platform_open,
   +   .release= vfio_platform_release,
   +   .ioctl  = vfio_platform_ioctl,
   +   .read   = vfio_platform_read,
   +   .write  = vfio_platform_write,
   +   .mmap   = vfio_platform_mmap,
   +};
   +
   +int vfio_platform_probe_common(struct vfio_platform_device *vdev,
   +  struct device *dev)
   +{
   +   struct iommu_group *group;
   +   int ret;
   +
   +   if (!vdev)
   +   return -EINVAL;
   +
   +   group = iommu_group_get(dev);
   +   if (!group) {
   +   pr_err(VFIO: No 

Re: [PATCH v9 06/19] vfio/platform: return info for bound device

2014-11-20 Thread Antonios Motakis
On Wed, Nov 12, 2014 at 5:36 PM, Alex Williamson
alex.william...@redhat.com wrote:
 On Wed, 2014-11-12 at 11:32 +0100, Eric Auger wrote:
 On 10/27/2014 07:07 PM, Antonios Motakis wrote:
  A VFIO userspace driver will start by opening the VFIO device
  that corresponds to an IOMMU group, and will use the ioctl interface
  to get the basic device info, such as number of memory regions and
  interrupts, and their properties. This patch enables the
  VFIO_DEVICE_GET_INFO ioctl call.
 
  Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
  ---
   drivers/vfio/platform/vfio_platform_common.c | 23 ---
   1 file changed, 20 insertions(+), 3 deletions(-)
 
  diff --git a/drivers/vfio/platform/vfio_platform_common.c 
  b/drivers/vfio/platform/vfio_platform_common.c
  index e0fdbc8..cb20526 100644
  --- a/drivers/vfio/platform/vfio_platform_common.c
  +++ b/drivers/vfio/platform/vfio_platform_common.c
  @@ -43,10 +43,27 @@ static int vfio_platform_open(void *device_data)
   static long vfio_platform_ioctl(void *device_data,
 unsigned int cmd, unsigned long arg)
   {
  -   if (cmd == VFIO_DEVICE_GET_INFO)
  -   return -EINVAL;
  +   struct vfio_platform_device *vdev = device_data;
  +   unsigned long minsz;
  +
  +   if (cmd == VFIO_DEVICE_GET_INFO) {
  +   struct vfio_device_info info;
  +
  +   minsz = offsetofend(struct vfio_device_info, num_irqs);
  +
  +   if (copy_from_user(&info, (void __user *)arg, minsz))
  +   return -EFAULT;
  +
  +   if (info.argsz < minsz)
  +   return -EINVAL;
  +
  +   info.flags = vdev->flags;
  +   info.num_regions = 0;
  +   info.num_irqs = 0;
 Seems a bit weird to me to enable the modality but returning zeroed
 values. Shouldn't we put that patch after VFIO_DEVICE_GET_REGION_INFO
 and VFIO_DEVICE_GET_IRQ_INFO ones?

 I actually like how Antonios has started from a base framework, exposing
 a device but none of the resources and then incrementally adds each
 component.  It's also a good showcase of the VFIO ABI that we can do
 things like this.  Thanks,

I also agree with Alex on this. But of course I'm not married to
any particular splitting style, in case we decide to change this.
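
For readers following the hunk above, the argsz/minsz handshake can be
sketched in user space roughly as follows. This is a toy model under stated
assumptions: struct demo_device_info is a stand-in whose field names follow
the quoted patch, demo_offsetofend mimics the kernel's offsetofend(), and
plain memcpy() stands in for copy_from_user()/copy_to_user().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct vfio_device_info. */
struct demo_device_info {
	uint32_t argsz;
	uint32_t flags;
	uint32_t num_regions;
	uint32_t num_irqs;
};

/* Offset of the first byte past MEMBER, like the kernel's offsetofend(). */
#define demo_offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* user_buf/user_len model the caller's buffer; dev_flags models vdev->flags. */
static int demo_get_info(void *user_buf, size_t user_len, uint32_t dev_flags)
{
	struct demo_device_info info;
	size_t minsz = demo_offsetofend(struct demo_device_info, num_irqs);

	assert(user_buf != NULL);

	if (user_len < minsz)
		return -EFAULT;		/* models a faulting copy_from_user() */
	memcpy(&info, user_buf, minsz);

	if (info.argsz < minsz)		/* caller's struct is too small */
		return -EINVAL;

	info.flags = dev_flags;
	info.num_regions = 0;		/* no resources exposed at this step */
	info.num_irqs = 0;

	memcpy(user_buf, &info, minsz);	/* models copy_to_user() */
	return 0;
}
```

The point of the pattern is that the caller announces how large its struct
is, so fields can be appended later without breaking older callers.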


 Alex



Re: [PATCH v9 03/19] vfio: platform: add the VFIO PLATFORM module to Kconfig

2014-11-20 Thread Antonios Motakis
On Thu, Nov 13, 2014 at 9:05 AM, Hongbo Zhang hongbo.zh...@linaro.org wrote:
 On 12 November 2014 17:57, Antonios Motakis
 a.mota...@virtualopensystems.com wrote:
 Hello Hongbo,

 On Wed, Nov 12, 2014 at 10:52 AM, Hongbo Zhang hongbo.zh...@linaro.org 
 wrote:
 On 28 October 2014 02:07, Antonios Motakis
 a.mota...@virtualopensystems.com wrote:

 Enable building the VFIO PLATFORM driver that allows to use Linux platform
 devices with VFIO.

 Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
 ---
  drivers/vfio/Kconfig   | 1 +
  drivers/vfio/Makefile  | 1 +
  drivers/vfio/platform/Kconfig  | 9 +
  drivers/vfio/platform/Makefile | 4 
  4 files changed, 15 insertions(+)
  create mode 100644 drivers/vfio/platform/Kconfig
  create mode 100644 drivers/vfio/platform/Makefile

 diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
 index a0abe04..962fb80 100644
 --- a/drivers/vfio/Kconfig
 +++ b/drivers/vfio/Kconfig
 @@ -27,3 +27,4 @@ menuconfig VFIO
   If you don't know what to do here, say N.

  source "drivers/vfio/pci/Kconfig"
 +source "drivers/vfio/platform/Kconfig"
 diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
 index 0b035b1..dadf0ca 100644
 --- a/drivers/vfio/Makefile
 +++ b/drivers/vfio/Makefile
 @@ -3,3 +3,4 @@ obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
  obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
  obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
  obj-$(CONFIG_VFIO_PCI) += pci/
 +obj-$(CONFIG_VFIO_PLATFORM) += platform/
 diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
 new file mode 100644
 index 000..c51af17
 --- /dev/null
 +++ b/drivers/vfio/platform/Kconfig
 @@ -0,0 +1,9 @@
 +config VFIO_PLATFORM
 +   tristate "VFIO support for platform devices"
 +   depends on VFIO && EVENTFD && ARM

 Hi Antonios,
 Is this only for ARM? how about X86 and PowerPC?
 On Freescale's PowerPC platform, the IOMMU is called PAMU (Peripheral
 Access Management Unit), and I am trying to use this VFIO framework on
 it.


 In principle it should be working on any platform with such devices;
 as long as you have a VFIO IOMMU driver for the PAMU (on ARM we use
 VFIO PLATFORM for the device, with VFIO IOMMU TYPE1 for the IOMMU).


 Antonios,
 As far as you know, on which ARM platform can I apply your patches directly?
 My purpose is to apply your patches[1], and then implement a user space
 driver to evaluate the performance.


In principle, if your target has a working IOMMU in front of a
platform or AMBA device, then you should be able to use this.

In practice, I have tested this on various fast models, and less
extensively on Arndale in the past. Linaro as far as I know has tested
this series on Calxeda devices with an xgmac NIC.

 [1] Preferably without having to merge conflicts/dependencies manually;
 I am a vfio-platform user, not an IOMMU expert.

 So if you have a suitable IOMMU driver for your target, feel free to
 test it, and let us know of the results.


 +   help
 + Support for platform devices with VFIO. This is required to make
 + use of platform devices present on the system using the VFIO
 + framework.
 +
 + If you don't know what to do here, say N.
 diff --git a/drivers/vfio/platform/Makefile 
 b/drivers/vfio/platform/Makefile
 new file mode 100644
 index 000..279862b
 --- /dev/null
 +++ b/drivers/vfio/platform/Makefile
 @@ -0,0 +1,4 @@
 +
 +vfio-platform-y := vfio_platform.o vfio_platform_common.o
 +
 +obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
 --
 2.1.1

 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v9 06/19] vfio/platform: return info for bound device

2014-11-20 Thread Eric Auger
On 11/20/2014 03:10 PM, Antonios Motakis wrote:
 On Wed, Nov 12, 2014 at 5:36 PM, Alex Williamson
 alex.william...@redhat.com wrote:
 On Wed, 2014-11-12 at 11:32 +0100, Eric Auger wrote:
 On 10/27/2014 07:07 PM, Antonios Motakis wrote:
 A VFIO userspace driver will start by opening the VFIO device
 that corresponds to an IOMMU group, and will use the ioctl interface
 to get the basic device info, such as number of memory regions and
 interrupts, and their properties. This patch enables the
 VFIO_DEVICE_GET_INFO ioctl call.

 Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
 ---
  drivers/vfio/platform/vfio_platform_common.c | 23 ---
  1 file changed, 20 insertions(+), 3 deletions(-)

 diff --git a/drivers/vfio/platform/vfio_platform_common.c 
 b/drivers/vfio/platform/vfio_platform_common.c
 index e0fdbc8..cb20526 100644
 --- a/drivers/vfio/platform/vfio_platform_common.c
 +++ b/drivers/vfio/platform/vfio_platform_common.c
 @@ -43,10 +43,27 @@ static int vfio_platform_open(void *device_data)
  static long vfio_platform_ioctl(void *device_data,
unsigned int cmd, unsigned long arg)
  {
 -   if (cmd == VFIO_DEVICE_GET_INFO)
 -   return -EINVAL;
 +   struct vfio_platform_device *vdev = device_data;
 +   unsigned long minsz;
 +
 +   if (cmd == VFIO_DEVICE_GET_INFO) {
 +   struct vfio_device_info info;
 +
 +   minsz = offsetofend(struct vfio_device_info, num_irqs);
 +
 +   if (copy_from_user(&info, (void __user *)arg, minsz))
 +   return -EFAULT;
 +
 +   if (info.argsz < minsz)
 +   return -EINVAL;
 +
 +   info.flags = vdev->flags;
 +   info.num_regions = 0;
 +   info.num_irqs = 0;
 Seems a bit weird to me to enable the modality but returning zeroed
 values. Shouldn't we put that patch after VFIO_DEVICE_GET_REGION_INFO
 and VFIO_DEVICE_GET_IRQ_INFO ones?

 I actually like how Antonios has started from a base framework, exposing
 a device but none of the resources and then incrementally adds each
 component.  It's also a good showcase of the VFIO ABI that we can do
 things like this.  Thanks,
 
 I also agree with Alex on this. But of course I'm not married to
 any particular splitting style, in case we decide to change this.

Hi Antonios,
please keep as is. I also learn each day about splitting style ;-)
Best Regards
Eric
 

 Alex




Re: [RFC PATCH 0/6] ARM64: KVM: PMU infrastructure support

2014-11-20 Thread Anup Patel
On Wed, Nov 19, 2014 at 8:59 PM, Christoffer Dall
christoffer.d...@linaro.org wrote:
 On Tue, Nov 11, 2014 at 02:48:25PM +0530, Anup Patel wrote:
 Hi All,

 I have second thoughts about rebasing KVM PMU patches
 to Marc's irq-forwarding patches.

 The PMU IRQs (when virtualized by KVM) are not exactly
 forwarded IRQs because they are shared between Host
 and Guest.

 Scenario1
 -

 We might have perf running on Host and no KVM guest
 running. In this scenario, we won't get interrupts on Host
 because the kvm_pmu_hyp_init() (similar to the function
 kvm_timer_hyp_init() of Marc's IRQ-forwarding
 implementation) has put all host PMU IRQs in forwarding
 mode.

 The only way to solve this problem is to not set forwarding
 mode for PMU IRQs in kvm_pmu_hyp_init() and instead
 have special routines to turn on and turn off the forwarding
 mode of PMU IRQs. These routines will be called from
 kvm_arch_vcpu_ioctl_run() for toggling the PMU IRQ
 forwarding state.

 Scenario2
 -

 We might have perf running on Host and Guest simultaneously
 which means it is quite likely that PMU HW trigger IRQ meant
 for Host between ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 and kvm_pmu_sync_hwstate(vcpu); (similar to timer sync routine
 of Marc's patchset which is called before local_irq_enable()).

 In this scenario, the updated kvm_pmu_sync_hwstate(vcpu)
 will accidentally forward IRQ meant for Host to Guest unless
 we put additional checks to inspect VCPU PMU state.

 Am I missing any detail about IRQ forwarding for above
 scenarios?

 Hi Anup,

Hi Christoffer,


 I briefly discussed this with Marc.  What I don't understand is how it
 would be possible to get an interrupt for the host while running the
 guest?

 The rationale behind my question is that whenever you're running the
 guest, the PMU should be programmed exclusively with guest state, and
 since the PMU is per core, any interrupts should be for the guest, where
 it would always be pending.

Yes, that's right: the PMU is programmed exclusively for the guest when
the guest is running and for the host when the host is running.

Let us assume a situation (Scenario2 mentioned previously)
where both host and guest are using the PMU. While the guest is
running, we come back to host mode for a variety of reasons
(stage2 fault, guest IO, regular host interrupt, host interrupt
meant for guest, and so on), which means we return from the
ret = kvm_call_hyp(__kvm_vcpu_run, vcpu); statement in the
kvm_arch_vcpu_ioctl_run() function with local IRQs disabled.
At this point we have restored the host PMU context, and
any PMU counter used by the host can trigger a PMU overflow interrupt
for the host. Next we have kvm_pmu_sync_hwstate(vcpu);
in the kvm_arch_vcpu_ioctl_run() function (similar to the
kvm_timer_sync_hwstate() of Marc's IRQ forwarding patchset),
which tries to detect the PMU IRQ forwarding state in the GIC. Hence it
can accidentally find a PMU IRQ pending for the guest while this
PMU IRQ is actually meant for the host.

The above-mentioned situation does not happen for the timer
because virtual timer interrupts are used exclusively for the guest.
That exclusive use ensures that
the function kvm_timer_sync_hwstate() will always see the correct
state of the virtual timer IRQ from the GIC.
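
The difference can be shown with a toy model. Everything below is
hypothetical (the names, the "source" field, the two sync variants) and is
not KVM code; it only illustrates why looking at the pending bit alone
misattributes a host PMU interrupt, whereas an extra check of the PMU state
does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; this is a toy model, not KVM code. */
enum demo_owner { DEMO_HOST, DEMO_GUEST };

struct demo_pmu_irq {
	bool pending;		/* what the GIC would report for the PMU IRQ */
	enum demo_owner source;	/* whose counter actually overflowed */
};

/* Models a host counter overflowing after the host PMU context is restored. */
static void demo_host_counter_overflow(struct demo_pmu_irq *irq)
{
	assert(irq != NULL);
	irq->pending = true;
	irq->source = DEMO_HOST;
}

/*
 * Naive sync: looks only at the pending bit, so every pending PMU IRQ
 * is treated as a guest IRQ. Returns true if it would inject into the guest.
 */
static bool demo_sync_naive(const struct demo_pmu_irq *irq)
{
	return irq->pending;
}

/* Checked sync: additionally inspects the (modelled) PMU state first. */
static bool demo_sync_checked(const struct demo_pmu_irq *irq)
{
	return irq->pending && irq->source == DEMO_GUEST;
}
```

With a host overflow pending, demo_sync_naive() would inject into the
guest while demo_sync_checked() would not. The timer case never needs the
second check because there the IRQ has only one possible owner.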


 When migrating a VM with a pending PMU interrupt away for a CPU core, we
 also capture the active state (the forwarding patches already handle
 this), and obviously the PMU state along with it.

Yes, the migration of PMU state and PMU interrupt state is
quite clear.


 Does this address your concern?

I hope the above description gives you an idea of the concern
I raised.


 -Christoffer

Regards,
Anup


Re: [PATCH 1/1] vfio: put off the allocation of minor in vfio_create_group

2014-11-20 Thread Alex Williamson
On Thu, 2014-11-20 at 19:25 +0800, Zhen Lei wrote:
 The list_for_each_entry code fragment that follows does not depend on
 minor. With this patch, the freeing of minor inside list_for_each_entry
 can be dropped, and there is no functional change.

A reasonable micro-optimization, but I'm curious if you're actually
seeing some measurable overhead from this.  It seems like we'd need to
have multiple devices, all within the same IOMMU group, all probed by
vfio-pci at the same time to exercise the race condition.  Thanks,

Alex
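
For illustration, the reordering amounts to "look up first, allocate only
if the lookup misses". A minimal user-space sketch, assuming a made-up
fixed-size table and counter in place of the real group list and minor
allocator (and leaving out the locking that vfio.group_lock provides):

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_GROUPS 8

/* Hypothetical stand-ins for the vfio group list and minor allocator. */
struct demo_group {
	int iommu_id;
	int minor;
};

static struct demo_group demo_groups[DEMO_MAX_GROUPS];
static size_t demo_ngroups;
static int demo_next_minor;	/* counts every minor ever handed out */

static int demo_alloc_minor(void)
{
	return demo_next_minor++;
}

static struct demo_group *demo_create_group(int iommu_id)
{
	struct demo_group *g;
	size_t i;

	assert(iommu_id >= 0);

	/* Did someone else already create this group? Then reuse it. */
	for (i = 0; i < demo_ngroups; i++)
		if (demo_groups[i].iommu_id == iommu_id)
			return &demo_groups[i];

	if (demo_ngroups == DEMO_MAX_GROUPS)
		return NULL;

	/* Only now allocate the minor, so the lookup path never has to free one. */
	g = &demo_groups[demo_ngroups++];
	g->iommu_id = iommu_id;
	g->minor = demo_alloc_minor();
	return g;
}
```

Creating the same group twice returns the existing entry and consumes only
one minor, which is the saving the patch is after.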

 Signed-off-by: Zhen Lei thunder.leiz...@huawei.com
 ---
  drivers/vfio/vfio.c | 13 ++---
  1 file changed, 6 insertions(+), 7 deletions(-)
 
 diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
 index f018d8d..737eb468 100644
 --- a/drivers/vfio/vfio.c
 +++ b/drivers/vfio/vfio.c
 @@ -225,22 +225,21 @@ static struct vfio_group *vfio_create_group(struct 
 iommu_group *iommu_group)
 
   mutex_lock(&vfio.group_lock);
 
 - minor = vfio_alloc_group_minor(group);
 - if (minor < 0) {
 - vfio_group_unlock_and_free(group);
 - return ERR_PTR(minor);
 - }
 -
   /* Did we race creating this group? */
   list_for_each_entry(tmp, &vfio.group_list, vfio_next) {
   if (tmp->iommu_group == iommu_group) {
   vfio_group_get(tmp);
 - vfio_free_group_minor(minor);
   vfio_group_unlock_and_free(group);
   return tmp;
   }
   }
 
 + minor = vfio_alloc_group_minor(group);
 + if (minor < 0) {
 + vfio_group_unlock_and_free(group);
 + return ERR_PTR(minor);
 + }
 +
   dev = device_create(vfio.class, NULL,
   MKDEV(MAJOR(vfio.group_devt), minor),
   group, "%d", iommu_group_id(iommu_group));
 --
 1.8.0
 
 





Re: [RFC PATCH v1 1/2] vfio: Add new interrupt group for VFIO

2014-11-20 Thread Alex Williamson
On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote:
 Add new group KVM_DEV_VFIO_INTERRUPT and command
 KVM_DEV_VFIO_DEVIE_POSTING_IRQ related to it.
 
 This is used for VT-d Posted-Interrupts setup.

Eric proposed an interface for ARM forwarded interrupts[1] using group
KVM_DEV_VFIO_DEVICE with attributes KVM_DEV_VFIO_DEVICE_ASSIGN_IRQ and
KVM_DEV_VFIO_DEVICE_DEASSIGN_IRQ.  Why are we proposing yet another
group and attributes here?  Why can't we re-use the ones Eric proposes?

[1] https://lkml.org/lkml/2014/8/25/258

 Signed-off-by: Feng Wu feng...@intel.com
 ---
  Documentation/virtual/kvm/devices/vfio.txt |8 
  include/uapi/linux/kvm.h   |   14 ++
  2 files changed, 22 insertions(+), 0 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
 b/Documentation/virtual/kvm/devices/vfio.txt
 index ef51740..bd99176 100644
 --- a/Documentation/virtual/kvm/devices/vfio.txt
 +++ b/Documentation/virtual/kvm/devices/vfio.txt
 @@ -13,6 +13,7 @@ VFIO-group is held by KVM.
  
  Groups:
KVM_DEV_VFIO_GROUP
 +  KVM_DEV_VFIO_INTERRUPT
  
  KVM_DEV_VFIO_GROUP attributes:
KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
 @@ -20,3 +21,10 @@ KVM_DEV_VFIO_GROUP attributes:
  
  For each, kvm_device_attr.addr points to an int32_t file descriptor
  for the VFIO group.
 +
 +KVM_DEV_VFIO_INTERRUPT attributes:
 +  KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ: Set up the interrupt configuration for
 +VT-d Posted-Interrrupts
 +
 +For each, kvm_device_attr.addr points to struct kvm_posted_intr, which
 +include the needed information for VT-d Posted-Interrupts setup.
 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 6076882..5544fcc 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -943,9 +943,23 @@ struct kvm_device_attr {
   __u64   addr;   /* userspace address of attr data */
  };
  
 +struct virq_info {
 + __u32   index;  /* index of the msi/msix entry */
 + int virq;   /* virq of the interrupt */
 +};
 +
 +struct kvm_posted_intr {
 + __u32   fd; /* file descriptor of the VFIO device */
 + __u32   count;
 + boolmsix;

Note that MSI-X (as opposed to MSI) is a PCI concept.  Being a VFIO
interface this should operate at VFIO IRQ index and sub-index.

 + struct virq_info virq_info[0];
 +};
 +
  #define  KVM_DEV_VFIO_GROUP  1
  #define   KVM_DEV_VFIO_GROUP_ADD 1
  #define   KVM_DEV_VFIO_GROUP_DEL 2
 +#define  KVM_DEV_VFIO_INTERRUPT  2
 +#define   KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ 1
  
  enum kvm_device_type {
   KVM_DEV_TYPE_FSL_MPIC_20= 1,





Re: [RFC PATCH v1 1/2] vfio: Add new interrupt group for VFIO

2014-11-20 Thread Eric Auger
On 11/20/2014 04:53 PM, Alex Williamson wrote:
 On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote:
 Add new group KVM_DEV_VFIO_INTERRUPT and command
 KVM_DEV_VFIO_DEVIE_POSTING_IRQ related to it.

 This is used for VT-d Posted-Interrupts setup.
 
 Eric proposed an interface for ARM forwarded interrupts[1] using group
 KVM_DEV_VFIO_DEVICE with attributes KVM_DEV_VFIO_DEVICE_ASSIGN_IRQ and
 KVM_DEV_VFIO_DEVICE_DEASSIGN_IRQ.  Why are we proposing yet another
 group and attributes here?  Why can't we re-use the ones Eric proposes?

Hi Alex, Feng

I share your point of view about the KVM_DEV_VFIO_DEVICE group. For the
attribute (renamed KVM_DEV_VFIO_DEVICE_FORWARD_IRQ in RFC v2) the issue
is I specify the kvm_device_attr.addr points to a kvm_arch_forwarded_irq
struct. Feng needs another struct - kvm_posted_intr -. An alternative is
to merge both structs if it makes sense.

Best Regards

Eric
 
 [1] https://lkml.org/lkml/2014/8/25/258
 
 Signed-off-by: Feng Wu feng...@intel.com
 ---
  Documentation/virtual/kvm/devices/vfio.txt |8 
  include/uapi/linux/kvm.h   |   14 ++
  2 files changed, 22 insertions(+), 0 deletions(-)

 diff --git a/Documentation/virtual/kvm/devices/vfio.txt 
 b/Documentation/virtual/kvm/devices/vfio.txt
 index ef51740..bd99176 100644
 --- a/Documentation/virtual/kvm/devices/vfio.txt
 +++ b/Documentation/virtual/kvm/devices/vfio.txt
 @@ -13,6 +13,7 @@ VFIO-group is held by KVM.
  
  Groups:
KVM_DEV_VFIO_GROUP
 +  KVM_DEV_VFIO_INTERRUPT
  
  KVM_DEV_VFIO_GROUP attributes:
KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
 @@ -20,3 +21,10 @@ KVM_DEV_VFIO_GROUP attributes:
  
  For each, kvm_device_attr.addr points to an int32_t file descriptor
  for the VFIO group.
 +
 +KVM_DEV_VFIO_INTERRUPT attributes:
 +  KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ: Set up the interrupt configuration for
 +VT-d Posted-Interrrupts
 +
 +For each, kvm_device_attr.addr points to struct kvm_posted_intr, which
 +include the needed information for VT-d Posted-Interrupts setup.
 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 6076882..5544fcc 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -943,9 +943,23 @@ struct kvm_device_attr {
  __u64   addr;   /* userspace address of attr data */
  };
  
 +struct virq_info {
 +__u32   index;  /* index of the msi/msix entry */
 +int virq;   /* virq of the interrupt */
 +};
 +
 +struct kvm_posted_intr {
 +__u32   fd; /* file descriptor of the VFIO device */
 +__u32   count;
 +boolmsix;
 
 Note that MSI-X (as opposed to MSI) is a PCI concept.  Being a VFIO
 interface this should operate at VFIO IRQ index and sub-index.
 
 +struct virq_info virq_info[0];
 +};
 +
  #define  KVM_DEV_VFIO_GROUP 1
  #define   KVM_DEV_VFIO_GROUP_ADD1
  #define   KVM_DEV_VFIO_GROUP_DEL2
 +#define  KVM_DEV_VFIO_INTERRUPT 2
 +#define   KVM_DEV_VFIO_INTERRUPT_POSTING_IRQ1
  
  enum kvm_device_type {
  KVM_DEV_TYPE_FSL_MPIC_20= 1,
 
 
 



Re: [RFC PATCH v1 2/2] vfio: Add VFIO API vfio_msi_get_irq

2014-11-20 Thread Alex Williamson
On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote:
 This API returns the host irq for the MSI/MSI-X interrupts.
 
 Signed-off-by: Feng Wu feng...@intel.com
 ---
  drivers/vfio/pci/vfio_pci.c |   10 ++
  include/linux/vfio.h|2 ++
  2 files changed, 12 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
 index 9558da3..4fb9828 100644
 --- a/drivers/vfio/pci/vfio_pci.c
 +++ b/drivers/vfio/pci/vfio_pci.c
 @@ -1009,6 +1009,16 @@ put_devs:
   kfree(devs.devices);
  }
  
 +unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, bool 
 msix)
 +{
 + struct vfio_pci_device *vdev =
 + (struct vfio_pci_device *)vfio_device_data(device);
 + struct pci_dev *pdev = vdev->pdev;
 +
 + return msix ? vdev->msix[vector].vector : pdev->irq + vector;
 +}
 +EXPORT_SYMBOL_GPL(vfio_msi_get_irq);
 +

It's difficult to see what you're going to do with this, but it's making
the assumption that the vfio_device is a vfio_pci_device, which is the
wrong starting point.  Eric proposed[1] some VFIO external user
interface extensions, that have since been refined in discussion that
allows the external user (KVM in this case) access to the struct device
where it can get the IRQ vectors itself.  We either need to follow that
same path for posted interrupts, teaching KVM about struct pci_device in
order to extract the IRQ info, or we need to create another vfio
external user interface extension that can return IRQ info regardless of
the device type.  Thanks,

Alex


[1] https://lkml.org/lkml/2014/8/25/257

  static void __exit vfio_pci_cleanup(void)
  {
   pci_unregister_driver(&vfio_pci_driver);
 diff --git a/include/linux/vfio.h b/include/linux/vfio.h
 index d320411..007ca55 100644
 --- a/include/linux/vfio.h
 +++ b/include/linux/vfio.h
 @@ -92,6 +92,8 @@ extern void vfio_unregister_iommu_driver(
  /*
   * External user API
   */
 +extern unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector,
 +   bool msix);
  extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
  extern void vfio_group_put_external_user(struct vfio_group *group);
  extern int vfio_external_user_iommu_id(struct vfio_group *group);





Re: [RFC PATCH v1 2/2] vfio: Add VFIO API vfio_msi_get_irq

2014-11-20 Thread Eric Auger
On 11/20/2014 05:05 PM, Alex Williamson wrote:
 On Thu, 2014-11-20 at 17:05 +0800, Feng Wu wrote:
 This API returns the host irq for the MSI/MSI-X interrupts.

 Signed-off-by: Feng Wu feng...@intel.com
 ---
  drivers/vfio/pci/vfio_pci.c |   10 ++
  include/linux/vfio.h|2 ++
  2 files changed, 12 insertions(+), 0 deletions(-)

 diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
 index 9558da3..4fb9828 100644
 --- a/drivers/vfio/pci/vfio_pci.c
 +++ b/drivers/vfio/pci/vfio_pci.c
 @@ -1009,6 +1009,16 @@ put_devs:
  kfree(devs.devices);
  }
  
 +unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector, bool 
 msix)
 +{
 +struct vfio_pci_device *vdev =
 +(struct vfio_pci_device *)vfio_device_data(device);
 +struct pci_dev *pdev = vdev->pdev;
 +
 +return msix ? vdev->msix[vector].vector : pdev->irq + vector;
 +}
 +EXPORT_SYMBOL_GPL(vfio_msi_get_irq);
 +
 
 It's difficult to see what you're going to do with this, but it's making
 the assumption that the vfio_device is a vfio_pci_device, which is the
 wrong starting point.  Eric proposed[1] some VFIO external user
 interface extensions, that have since been refined in discussion that
 allows the external user (KVM in this case) access to the struct device
 where it can get the IRQ vectors itself.  We either need to follow that
 same path for posted interrupts, teaching KVM about struct pci_device in
 order to extract the IRQ info, or we need to create another vfio
 external user interface extension that can return IRQ info regardless of
 the device type.  Thanks,
 
 Alex
 
 
 [1] https://lkml.org/lkml/2014/8/25/257

Here is the v2: https://lkml.org/lkml/2014/9/1/347

Best Regards

Eric
 http://lwn.net/Articles/610087/
  static void __exit vfio_pci_cleanup(void)
  {
  pci_unregister_driver(&vfio_pci_driver);
 diff --git a/include/linux/vfio.h b/include/linux/vfio.h
 index d320411..007ca55 100644
 --- a/include/linux/vfio.h
 +++ b/include/linux/vfio.h
 @@ -92,6 +92,8 @@ extern void vfio_unregister_iommu_driver(
  /*
   * External user API
   */
 +extern unsigned int vfio_msi_get_irq(struct vfio_device *device, int vector,
 +  bool msix);
  extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
  extern void vfio_group_put_external_user(struct vfio_group *group);
  extern int vfio_external_user_iommu_id(struct vfio_group *group);
 
 
 



KVM causes #GP on XRSTORS

2014-11-20 Thread Nadav Amit
Fenghua,

I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears to 
get a #GP when it is trying to load the guest FPU.
One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu, but I 
am not sure it is the only problem.
Was KVM ever tested with XRSTORS?

Thanks,
Nadav
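
For context, the XCOMP_BV condition mentioned above can be checked on a raw
XSAVE image roughly as below. This is a user-space sketch, not KVM code: the
helper names are made up, and the layout constants (512-byte legacy region,
XCOMP_BV in header bytes 15:8, bit 63 marking the compacted format that
XRSTORS requires) reflect my reading of the Intel SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout constants as described in the lead-in; the names are made up. */
#define DEMO_XSAVE_LEGACY_SIZE	512				/* legacy FXSAVE region */
#define DEMO_XCOMP_BV_OFFSET	(DEMO_XSAVE_LEGACY_SIZE + 8)	/* header bytes 15:8 */
#define DEMO_XCOMP_BV_COMPACTED	(UINT64_C(1) << 63)

/* Read XCOMP_BV from a raw XSAVE image held in ordinary memory. */
static uint64_t demo_read_xcomp_bv(const uint8_t *xsave_area)
{
	uint64_t v;

	assert(xsave_area != NULL);
	memcpy(&v, xsave_area + DEMO_XCOMP_BV_OFFSET, sizeof(v));
	return v;
}

/* True if the image is marked as compacted, i.e. XCOMP_BV[63] is set. */
static bool demo_xrstors_format_ok(const uint8_t *xsave_area)
{
	return (demo_read_xcomp_bv(xsave_area) & DEMO_XCOMP_BV_COMPACTED) != 0;
}

/* Mark the image as compacted for the given feature bitmap. */
static void demo_mark_compacted(uint8_t *xsave_area, uint64_t features)
{
	uint64_t v = DEMO_XCOMP_BV_COMPACTED | features;

	assert(xsave_area != NULL);
	memcpy(xsave_area + DEMO_XCOMP_BV_OFFSET, &v, sizeof(v));
}
```

A zero-initialised buffer fails demo_xrstors_format_ok(), which matches the
zeroed XCOMP_BV[63] described above; whether that is the only cause of the
#GP is still open.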



Re: can I make this work… (Foundation for accessibility project)

2014-11-20 Thread Eric S. Johansson


On 11/18/2014 9:57 AM, Eric S. Johansson wrote:


That's great to know. I will spin up a version of Windows 7 and give
it a try. Given that I'm not looking at it, I can strip it down to the
barest user interface elements and improve performance significantly.


I tried it and it took me approximately 10 to 12 hours to install 
Windows 7 twice and I didn't even finish installing the last time.


Here's what happened. The first time I installed it, it was a naïve
install: I took all the defaults, just set up the ISO, and let the install
run. Then I installed all the updates. Hours went by and it kind of came
up and ran, but then I tried to install the virtio drivers and the
Windows installation lost its mind. I did some reading on how to make
performance better and on using the virtio drivers in Windows.


So I started the second install with the same size disk (25 GB) and the 
same amount of RAM (1 GB), and installed the ethernet, disk and balloon 
drivers at the right time. I also changed the cache to none, I/O 
something to native, and I think that's about it. Anyway, that was not 
really any improvement. It was still incredibly slow, and this time it 
was complaining about running out of memory and the package installs 
never finished. They just kept going and going. iptraf reported network 
I/O in the 3 kbit to 100 kbit range when the updates were running.


I'm accustomed to lesser performance on virtual machines. That's the 
hazard of running on an old and slow laptop (Dell E6400, 2.2 GHz core 
duo, 8 GB RAM [1]), but even VirtualBox is not this slow. So what am I 
doing wrong? It would be nice to use a slow machine like this, as many 
handcrips don't have a whole lot of resources for buying newer/faster 
machines. On the other hand, many of them use desktops and work from one 
place, whereas someone like me is all over the map (quite literally).


--- eric
[1] Part of the reason I don't bother upgrading machines all that often 
is because no matter how fast the CPU runs or how much memory I have, 
Windows always runs at about the same speed.





Re: [PATCH] KVM: PPC: Book3S HV: Add missing HPTE unlock

2014-11-20 Thread Alexander Graf


On 05.11.14 02:21, Paul Mackerras wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 In kvm_test_clear_dirty(), if we find an invalid HPTE we move on to the
 next HPTE without unlocking the invalid one.  In fact we should never
 find an invalid and unlocked HPTE in the rmap chain, but for robustness
 we should unlock it.  This adds the missing unlock.
 
 Reported-by: Benjamin Herrenschmidt b...@kernel.crashing.org
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 Signed-off-by: Paul Mackerras pau...@samba.org

Thanks, applied to kvm-ppc-queue.


Alex


Re: [PATCH] KVM: PPC: Book3S HV: ptes are big endian

2014-11-20 Thread Alexander Graf


On 03.11.14 16:35, Cédric Le Goater wrote:
 When being restored from qemu, the kvm_get_htab_header are in native
 endian, but the ptes are big endian. 
 
 This patch fixes restore on a KVM LE host. Qemu also needs a fix for
 this :
 
  http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg8.html
 
 Signed-off-by: Cédric Le Goater c...@fr.ibm.com
 Cc: Paul Mackerras pau...@samba.org
 Cc: Alexey Kardashevskiy a...@ozlabs.ru
 Cc: Gregory Kurz gk...@linux.vnet.ibm.com
 
 ---
 
  Tested on 3.17-rc7 with LE and BE host.
 
  
 
  arch/powerpc/kvm/book3s_64_mmu_hv.c |2 ++
  1 file changed, 2 insertions(+)
 
 Index: linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c
 ===
 --- linux-3.18-hv.git.orig/arch/powerpc/kvm/book3s_64_mmu_hv.c
 +++ linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c
 @@ -1542,6 +1542,8 @@ static ssize_t kvm_htab_write(struct fil
   err = -EFAULT;
   if (__get_user(v, lbuf) || __get_user(r, lbuf + 1))
   goto out;
 + v = be64_to_cpu(v);
 + r = be64_to_cpu(r);

This will trigger warnings with sparse. Please introduce new be64
variables that you do get_user on and that you then use as source for v
and r.
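
To illustrate the pattern I mean, here is a rough userspace sketch, not the
actual fix. The names be64_t, be64_to_native() and read_hpte_pair() are made
up for this example: be64_t stands in for the kernel's __be64, and memcpy()
stands in for the __get_user() calls. The point is only that the raw
big-endian words get their own type and are converted exactly once.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's __be64: a distinct type, so the
 * compiler rejects accidental use of the raw value as a native integer. */
typedef struct { uint8_t bytes[8]; } be64_t;

/* Convert a big-endian 64-bit value to host byte order, byte by byte. */
static uint64_t be64_to_native(be64_t v)
{
	uint64_t out = 0;

	for (int i = 0; i < 8; i++)
		out = (out << 8) | v.bytes[i];
	return out;
}

/* Fetch the two raw words first, then derive the native v and r from them.
 * memcpy() plays the role of the user copy in this sketch. */
static int read_hpte_pair(const void *ubuf, uint64_t *v, uint64_t *r)
{
	be64_t raw_v, raw_r;

	memcpy(&raw_v, ubuf, sizeof(raw_v));
	memcpy(&raw_r, (const uint8_t *)ubuf + sizeof(raw_v), sizeof(raw_r));
	*v = be64_to_native(raw_v);
	*r = be64_to_native(raw_r);
	return 0;
}
```

With typed raw variables the conversion cannot be silently skipped or
applied twice, which is the kind of mistake sparse is meant to flag.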


Alex


Re: [PATCH 00/17] RFC: userfault v2

2014-11-20 Thread Andrea Arcangeli
Hi,

On Fri, Oct 31, 2014 at 12:39:32PM -0700, Peter Feiner wrote:
 On Fri, Oct 31, 2014 at 11:29:49AM +0800, zhanghailiang wrote:
  Agreed, but for doing live memory snapshot (VM is running when doing the
  snapshot), we have to do this (block the write action), because we have
  to save the page before it is dirtied by the write action. This is the
  difference, compared to pre-copy migration.
 
 Ah ha, I understand the difference now. I suppose that you have considered
 doing a traditional pre-copy migration (that is, passes over memory saving
 dirty pages, followed by a pause and a final dump of remaining dirty pages) to
 a file. Your approach has the advantage of having the VM pause time bounded by
 the time it takes to handle the userfault and do the write, as opposed to
 pre-copy migration which has a pause time bounded by the time it takes to do
 the final dump of dirty pages, which, in the worst case, is the time it takes
 to dump all of the guest memory!

It sounds like a really similar issue to live migration: one can implement
a precopy live snapshot, a precopy+postcopy live snapshot, or a pure
postcopy live snapshot.

The decision on the amount of precopy done before engaging postcopy
(zero passes, 1 pass, or more passes) would have similar tradeoffs
too, except instead of having to re-transmit the re-dirtied pages over
the wire, it would need to overwrite them to disk.

The more precopy passes, the longer it takes for the live snapshotting
process to finish and the more I/O there will be (for live migration it'd
be network bandwidth usage instead of amount of I/O), but the shorter
the postcopy runtime will be (and the shorter the postcopy runtime is, the
fewer userfaults will end up triggering on writes, in turn reducing
the slowdown and the artificial fault latency introduced to the guest
runtime). But the more precopy passes, the more overwriting will happen
during the longer precopy stage and the more overall load there will
be for the host (the otherwise idle part of the host).

For the postcopy live snapshot the wrprotect faults are quite
equivalent to the not-present faults of postcopy live migration logic.

 You could use the old fork & dump trick. Given that the guest's memory is
 backed by private VMA (as of a year ago when I last looked, is always the case
 for QEMU), you can have the kernel do the write protection for you.
 Essentially, you fork Qemu and, in the child process, dump the guest memory
 then exit. If the parent (including the guest) writes to guest memory, then it
 will fault and the kernel will copy the page. 
 
 The fork & dump approach will give you the best performance w.r.t. guest pause
 times (i.e., just pausing for the COW fault handler), but it does have the
 distinct disadvantage of potentially using 2x the guest memory (i.e., if the
 parent process races ahead and writes to all of the pages before you finish
 the dump). To mitigate memory copying, you could madvise MADV_DONTNEED the
 child memory as you copy it.

This is a very good point. fork must be evaluated first because it
literally already provides you a readonly memory snapshot of the guest
memory.
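
A toy sketch of that property, under loud assumptions: guest_ram is just a
static buffer standing in for private guest memory, fork_snapshot_demo() is
a made-up name, and nothing here is qemu code. The parent dirties the page
after fork() and the child checks that it still sees the pre-fork contents.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static char guest_ram[4096];	/* stand-in for private guest memory */

/* Returns 1 if the child kept the pre-fork view, 0 if not, -1 on error. */
static int fork_snapshot_demo(void)
{
	int go[2];
	char token = 'x';
	int status;
	ssize_t w;
	pid_t pid;

	memset(guest_ram, 'A', sizeof(guest_ram));
	if (pipe(go) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(go[0]);
		close(go[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: wait until the parent has written, then "dump". */
		close(go[1]);
		if (read(go[0], &token, 1) != 1)
			_exit(2);
		_exit(guest_ram[0] == 'A' ? 0 : 1);
	}

	/* Parent (the "running guest"): this write takes a COW fault. */
	close(go[0]);
	guest_ram[0] = 'B';
	w = write(go[1], &token, 1);
	close(go[1]);

	if (waitpid(pid, &status, 0) < 0 || w != 1 || !WIFEXITED(status))
		return -1;
	if (WEXITSTATUS(status) > 1)
		return -1;
	return WEXITSTATUS(status) == 0;
}
```

The pipe only orders the parent's write before the child's read; the
snapshot itself comes for free from the copy-on-write done by fork().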

The memory cons mentioned above could lead to -ENOMEM if too many
guests run live snapshots at the same time on the same host, unless
overcommit_memory is set to 1 (0 by default). Even then, if too many
live snapshots are running in parallel you could hit the OOM killer if
there are just a bit too many faults at the same time, or you could
hit heavy swapping, which isn't ideal either.

In fact the -ENOMEM avoidance (with qemu failing) is one of the two
critical reasons why qemu always sets the guest memory as
MADV_DONTFORK. But that's not the only reason.
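
For reference, a minimal sketch of what MADV_DONTFORK does to a mapping. It
is Linux-specific and illustrative only: dontfork_demo() is a made-up name,
the anonymous mapping stands in for guest RAM, and msync() is used here
merely as a way to probe whether the range is mapped in the child.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child did not inherit the region, 0 if it did,
 * -1 on error. */
static int dontfork_demo(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	int status;
	pid_t pid;
	void *ram = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (ram == MAP_FAILED)
		return -1;
	if (madvise(ram, len, MADV_DONTFORK) < 0) {
		munmap(ram, len);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		munmap(ram, len);
		return -1;
	}
	if (pid == 0) {
		/* msync() fails with ENOMEM when the range is not mapped. */
		int rc = msync(ram, len, MS_ASYNC);

		_exit(rc < 0 && errno == ENOMEM ? 0 : 1);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
		munmap(ram, len);
		return -1;
	}
	munmap(ram, len);
	return WEXITSTATUS(status) == 0;
}
```

Since the range is simply absent in the child, fork() has nothing to account
for or copy there, which is also why the fork trick cannot dump it without
undoing the advice first.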

To use the fork() trick you'd need to undo the MADV_DONTFORK first but
that would open another problem: there's a race condition between
fork() O_DIRECT and 4k hardblocksize of virtio-blk. If there's any
read() syscall with O_DIRECT with len=512 while fork() is running
(think if the aio runs in parallel with the live snapshot thread that
forks the child to dump the snapshot) and if the guest writes with the
CPU to any 512 fragment of the same page that is the destination
buffer of the write(len=512) (on two different 512bytes area of the
same guest page) the O_DIRECT write will get lost.

So to use fork we'd need to fix this longstanding race (I tried but in
the end we declared it a userland issue because it's not exploitable
to bypass permissions or corrupt kernel or unrelated memory). Or you'd
need to add locking between the dataplane/aio threads and the live
snapshot thread to ensure no direct-io I/O is ever in-flight while
fork runs.

The O_DIRECT however would only help if it's qemu TCG, if it's KVM
it's not even enough to stop O_DIRECT reads. KVM would use
gup(write=1) from the async-pf all the time... and then the shadow
pagetables would go out of sync (it won't destabilize the host of
course, but the guest memory would be corrupt then and guest would

Re: [PATCH 2/5] KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI

2014-11-20 Thread Alexander Graf


On 03.11.14 05:51, Paul Mackerras wrote:
 From: Mahesh Salgaonkar mah...@linux.vnet.ibm.com
 
 When we get an HMI (hypervisor maintenance interrupt) while in a
 guest, we see that guest enters into paused state.  The reason is, in
 kvmppc_handle_exit_hv it falls through default path and returns to
 host instead of resuming guest.  This causes guest to enter into
 paused state.  HMI is a hypervisor only interrupt and it is safe to
 resume the guest since the host has handled it already.  This patch
 adds a switch case to resume the guest.
 
 Without this patch we see guest entering into paused state with following
 console messages:
 
 [ 3003.329351] Severe Hypervisor Maintenance interrupt [Recovered]
 [ 3003.329356]  Error detail: Timer facility experienced an error
 [ 3003.329359]HMER: 0840
 [ 3003.329360]TFMR: 4a12000980a84000
 [ 3003.329366] vcpu c007c35094c0 (40):
 [ 3003.329368] pc  = c00c2ba0  msr = 80009032  trap = e60
 [ 3003.329370] r 0 = c021ddc0  r16 = 0046
 [ 3003.329372] r 1 = c0007a02bbd0  r17 = 327d5d98
 [ 3003.329375] r 2 = c10980b8  r18 = 1fc9a0b0
 [ 3003.329377] r 3 = c142d6b8  r19 = c142d6b8
 [ 3003.329379] r 4 = 0002  r20 = 
 [ 3003.329381] r 5 = c524a110  r21 = 
 [ 3003.329383] r 6 = 0001  r22 = 
 [ 3003.329386] r 7 =   r23 = c524a110
 [ 3003.329388] r 8 =   r24 = 0001
 [ 3003.329391] r 9 = 0001  r25 = c0007c31da38
 [ 3003.329393] r10 = c14280b8  r26 = 0002
 [ 3003.329395] r11 = 746f6f6c2f68656c  r27 = c524a110
 [ 3003.329397] r12 = 28004484  r28 = c0007c31da38
 [ 3003.329399] r13 = cfe01400  r29 = 0002
 [ 3003.329401] r14 = 0046  r30 = c3011e00
 [ 3003.329403] r15 = ffba  r31 = 0002
 [ 3003.329404] ctr = c041a670  lr  = c0272520
 [ 3003.329405] srr0 = c007e8d8 srr1 = 90001002
 [ 3003.329406] sprg0 =  sprg1 = cfe01400
 [ 3003.329407] sprg2 = cfe01400 sprg3 = 0005
 [ 3003.329408] cr = 48004482  xer = 2000  dsisr = 4200
 [ 3003.329409] dar = 010015020048
 [ 3003.329410] fault dar = 010015020048 dsisr = 4200
 [ 3003.329411] SLB (8 entries):
 [ 3003.329412]   ESID = c800 VSID = 40016e7779000510
 [ 3003.329413]   ESID = d801 VSID = 400142add1000510
 [ 3003.329414]   ESID = f804 VSID = 4000eb1a81000510
 [ 3003.329415]   ESID = 1f00080b VSID = 40004fda0a000d90
 [ 3003.329416]   ESID = 3f00080c VSID = 400039f536000d90
 [ 3003.329417]   ESID = 180d VSID = 0001251b35150d90
 [ 3003.329417]   ESID = 0100080e VSID = 4001e4609d90
 [ 3003.329418]   ESID = d8000819 VSID = 40013d349c000400
 [ 3003.329419] lpcr = c04881847001 sdr1 = 001b1906 last_inst = 
 
 [ 3003.329421] trap=0xe60 | pc=0xc00c2ba0 | msr=0x80009032
 [ 3003.329524] Severe Hypervisor Maintenance interrupt [Recovered]
 [ 3003.329526]  Error detail: Timer facility experienced an error
 [ 3003.329527]HMER: 0840
 [ 3003.329527]TFMR: 4a12000980a94000
 [ 3006.359786] Severe Hypervisor Maintenance interrupt [Recovered]
 [ 3006.359792]  Error detail: Timer facility experienced an error
 [ 3006.359795]HMER: 0840
 [ 3006.359797]TFMR: 4a12000980a84000
 
  Id  Name    State
 
  2   guest2  running
  3   guest3  paused
  4   guest4  running
 
 Signed-off-by: Mahesh Salgaonkar mah...@linux.vnet.ibm.com
 Signed-off-by: Paul Mackerras pau...@samba.org

Do we need this for PR running on bare metal as well?


Alex


Exposing host debug capabilities to userspace

2014-11-20 Thread Alex Bennée
Hi,

I've almost finished the ARMv8 guest debug support but I have one
problem left to solve. Userspace needs to know how many hardware debug
registers are available for GDB to use. This information is available
from the ID_AA64DFR0_EL1 register. Currently I abuse GET_ONE_REG to
fetch its value; however, semantically this is poor as its API is for
getting guest state, not host state, and they could theoretically have
different values.
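
For context, this is roughly what userspace does with the raw value once it
has it, whichever interface ends up providing it. The helper names are made
up for this sketch; the field positions (BRPs in bits [15:12], WRPs in bits
[23:20], each holding the count minus one) are as I read them in the ARMv8
ARM.

```c
#include <stdint.h>

/* Extract one 4-bit ID field from ID_AA64DFR0_EL1. */
static unsigned int dfr0_field(uint64_t dfr0, unsigned int shift)
{
	return (unsigned int)((dfr0 >> shift) & 0xf);
}

/* BRPs, bits [15:12]: number of hardware breakpoints minus one. */
static unsigned int hw_breakpoints(uint64_t dfr0)
{
	return dfr0_field(dfr0, 12) + 1;
}

/* WRPs, bits [23:20]: number of hardware watchpoints minus one. */
static unsigned int hw_watchpoints(uint64_t dfr0)
{
	return dfr0_field(dfr0, 20) + 1;
}
```

So all GDB really needs are those two small integers, not the whole
register.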

So far the options I've examined are:

* KVM ioctl GET_ONE_REG(ID_AA64DFR0_EL1)

As explained above, abusing a guest state API for host configuration.

* ptrace(PTRACE_GETREGSET, NT_ARM_HW_WATCH)

This is used by GDB to access the host details in debug-monitors.
However the ptrace API really wants you to attach to a process before
calling PTRACE_GETREGSET. Currently I've tried attaching to the
thread_id of the vCPU but this fails with EPERM, I suspect because
attaching to your own threads likely upsets the kernel.

* KVM ioctl KVM_GET_DEBUGREGS

This is currently x86 only and looks like it's more aimed at debug
registers than capability stuff. Also I'm not sure what the state of
this ioctl is compared to KVM_SET_GUEST_DEBUG. Do these APIs overlap or
is one an older deprecated x86 only API?

* Export the information via sysfs

I suppose the correct canonical non-subsystem specific way to make this
information available is to expose the data in some sort of sysfs node?
However I don't see any existing sysfs structure for the CPU.

* Expand /proc/cpuinfo

I suspect adding extra text to be badly parsed by userspace is just
horrid and unacceptable behaviour ;-)

* Add another KVM ioctl?

This would have the downside of being specific to KVM and of course
proliferating the API space again.

I'm open to any suggestions and look forward to your valued feedback ;-)

-- 
Alex Bennée


Re: [PATCH 5/5] KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked

2014-11-20 Thread Alexander Graf


On 03.11.14 05:52, Paul Mackerras wrote:
 From: Suresh E. Warrier warr...@linux.vnet.ibm.com
 
 The kvmppc_vcore_blocked() code does not check for the wait condition
 after putting the process on the wait queue. This means that it is
 possible for an external interrupt to become pending, but the vcpu to
 remain asleep until the next decrementer interrupt.  The fix is to
 make one last check for pending exceptions and ceded state before
 calling schedule().
 
 Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com
 Signed-off-by: Paul Mackerras pau...@samba.org

I don't understand the race you're fixing here. Can you please explain it?


Alex

 ---
  arch/powerpc/kvm/book3s_hv.c | 20 
  1 file changed, 20 insertions(+)
 
 diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
 index cd7e030..1a7a281 100644
 --- a/arch/powerpc/kvm/book3s_hv.c
 +++ b/arch/powerpc/kvm/book3s_hv.c
 @@ -1828,9 +1828,29 @@ static void kvmppc_wait_for_exec(struct kvm_vcpu *vcpu, int wait_state)
   */
  static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
  {
 + struct kvm_vcpu *vcpu;
 + int do_sleep = 1;
 +
   DEFINE_WAIT(wait);
  
   prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
 +
 + /*
 +  * Check one last time for pending exceptions and ceded state after
 +  * we put ourselves on the wait queue
 +  */
 + list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
 + if (vcpu->arch.pending_exceptions || !vcpu->arch.ceded) {
 + do_sleep = 0;
 + break;
 + }
 + }
 +
 + if (!do_sleep) {
 + finish_wait(&vc->wq, &wait);
 + return;
 + }
 +
   vc->vcore_state = VCORE_SLEEPING;
   spin_unlock(&vc->lock);
   schedule();
 


Re: [PATCH 0/5] Some fixes for HV KVM on PPC

2014-11-20 Thread Alexander Graf


On 03.11.14 05:51, Paul Mackerras wrote:
 Here are fixes for five bugs which were found in the testing of our
 PowerKVM product.  The bugs range from guest performance issues to
 guest crashes and memory corruption.  Please apply.

Thanks, applied patches 1-4 to kvm-ppc-queue.


Alex


Re: [PATCH 00/17] RFC: userfault v2

2014-11-20 Thread Andrea Arcangeli
Hi,

On Thu, Nov 20, 2014 at 10:54:29AM +0800, zhanghailiang wrote:
 Yes, you are right. This is what i really want, bypass all non-present faults
 and only track strict wrprotect faults. ;)
 
 So, do you plan to support that in the userfault API?

Yes, I think it's a good idea to support wrprotect/COW faults too.

I just wanted to understand if there was any other reason why you
needed only wrprotect faults, because the non-present faults didn't
look like a big performance concern if they triggered in addition to
wrprotect faults, but it's certainly ok to optimize them away so it's
fully optimal.

All it takes to differentiate the behavior should be one more bit
during registration so you can select non-present, wrprotect faults or
both. postcopy live migration would select only non-present faults,
postcopy live snapshot would select only wrprotect faults, anything
like distributed shared memory supporting shared readonly access and
exclusive write access, would select both flags.
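
As a sketch of that registration bit: the flag and enum names below are
hypothetical, the userfault API in this RFC does not define them, and
uf_mode_for() only encodes the mapping from use case to fault types that I
described above.

```c
/* Hypothetical registration flags, one bit per fault type to track. */
enum uf_track {
	UF_TRACK_MISSING   = 1u << 0,	/* non-present faults */
	UF_TRACK_WRPROTECT = 1u << 1,	/* write-protect faults */
};

/* The three users mentioned above. */
enum uf_user {
	UF_POSTCOPY_MIGRATION,
	UF_POSTCOPY_SNAPSHOT,
	UF_DSM,			/* distributed shared memory */
};

/* Pick the fault types a given user would register for. */
static unsigned int uf_mode_for(enum uf_user user)
{
	switch (user) {
	case UF_POSTCOPY_MIGRATION:
		return UF_TRACK_MISSING;
	case UF_POSTCOPY_SNAPSHOT:
		return UF_TRACK_WRPROTECT;
	case UF_DSM:
		return UF_TRACK_MISSING | UF_TRACK_WRPROTECT;
	}
	return 0;
}
```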

I just sent an (unfortunately) longish but way more detailed email
about live snapshotting with userfaultfd but I just wanted to give a
shorter answer here too :).

Thanks,
Andrea


Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-20 Thread Mario Smarduch
On 11/20/2014 12:08 AM, Laszlo Ersek wrote:
 On 11/20/14 00:32, Mario Smarduch wrote:
 Hi Laszlo,

 couple observations.

  I'm wondering if access from qemu and guest won't
 result in mixed memory attributes and if that's acceptable
 to the CPU.
 
 Normally this would be a problem I think (Jon raised the topic of live
 migration). However, for flash programming specifically, I think the
 guest's access pattern ensures that we'll see things OK.
 
 When the guest issues the first write access, the memslot is deleted,
 and everything is forwarded to qemu, both reads and writes. In response
 qemu modifies the array that *otherwise* backs the flash. These
 modifications by qemu end up in the dcache mostly. When the guest is
 done programming, it writes a special command (read array mode) at
 which point the memslot is recreated (as read-only) and flushed / set up
 for flushing during demand paging.
 
 So from the emulated flash POV, the memslot either doesn't exist at all
 (and then qemu serves all accesses just fine), or it exists r/o, at
 which point qemu (host userspace) will have stopped writing to it, and
 will have set it up for flushing before and during guest read accesses.

I think beyond consistency, there should be no double mappings with
conflicting attributes at any time, or CPU state is undefined. At least
that's what I recall for cases where the identity mapping was cacheable and
user mmap'ed regions uncacheable. Side effects were things like a CPU hard
stop or victim invalidate of a dirty cache line. With virtualization
extensions maybe the behavior is different. I guess if you're not seeing
lock ups or crashes then it appears to work :) Probably more senior
folks in the ARM community are in a better position to address this,
but I thought I'd raise a flag.

 
 Also is if you update memory from qemu you may break
 dirty page logging/migration.
 
 Very probably. Jon said the same thing.
 
 Unless there is some other way
 you keep track. Of course it may not be applicable in your
 case (i.e. flash unused after boot).
 
 The flash *is* used after boot, because the UEFI runtime variable
 services *are* exercised by the guest kernel. However those use the same
 access pattern (it's the same set of UEFI services just called by a
 different client).
 
 *Uncoordinated* access from guest and host in parallel will be a big
 problem; but we're not that far yet, and we need to get the flash
 problem sorted, so that we can at least boot and work on the basic
 stuff. The flash programming dance happens to provide coordination; the
 flash mode changes (which are equivalent to the teardown and the
 recreation of the memslot) can be considered barriers.
 
 I hope this is acceptable for the time being...

Yeah, I understand you have a more immediate requirement to support; the
migration issue is more of an FYI. Thanks for the details, they help to
understand the context.

- Mario
 
 Thanks
 Laszlo
 

 - Mario

 On 11/17/2014 07:49 AM, Laszlo Ersek wrote:
 On 11/17/14 16:29, Paolo Bonzini wrote:


 On 17/11/2014 15:58, Ard Biesheuvel wrote:
 Readonly memslots are often used to implement emulation of ROMs and
 NOR flashes, in which case the guest may legally map these regions as
 uncached.
 To deal with the incoherency associated with uncached guest mappings,
 treat all readonly memslots as incoherent, and ensure that pages that
 belong to regions tagged as such are flushed to DRAM before being passed
 to the guest.

 On x86, the processor combines the cacheability values from the two
 levels of page tables.  Is there no way to do the same on ARM?

 Combining occurs on ARMv8 too. The Stage1 (guest) mapping is very strict
 (Device non-Gathering, non-Reordering, no Early Write Acknowledgement --
 for EFI_MEMORY_UC), which basically overrides the Stage2 (very lax
 host) memory attributes.

 When qemu writes, as part of emulating the flash programming commands,
 to the RAMBlock that *otherwise* backs the flash range (as a r/o
 memslot), those writes (from host userspace) tend to end up in dcache.

 But, when the guest flips back the flash to romd mode, and tries to read
 back the values from the flash as plain ROM, the dcache is completely
 bypassed due to the strict stage1 mapping, and the guest goes directly
 to DRAM.

 Where qemu's earlier writes are not yet / necessarily visible.

 Please see my original patch (which was incomplete) in the attachment,
 it has a very verbose commit message.

 Anyway, I'll let others explain; they can word it better than I can :)

 FWIW,

 Series
 Reviewed-by: Laszlo Ersek ler...@redhat.com

 I ported this series to a 3.17.0+ based kernel, and tested it. It works
 fine. The ROM-like view of the NOR flash now reflects the previously
 programmed contents.

 Series
 Tested-by: Laszlo Ersek ler...@redhat.com

 Thanks!
 Laszlo



 ___
 kvmarm mailing list
 kvm...@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


 


Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-20 Thread Peter Maydell
On 20 November 2014 18:35, Mario Smarduch m.smard...@samsung.com wrote:
 I think beyond consistency, there should be no double mappings with
 conflicting attributes at any time or CPU state is undefined.

The situation is not so bleak as this. See section B2.9 Mismatched
memory attributes in the ARMv8 ARM ARM (DDI0487A.d), which lays
out in some detail what the results of mismatched attributes are
(generally, you lose ordering or coherency guarantees you might
have hoped to have). They're not pretty, but it's not as bad
as completely UNPREDICTABLE behaviour.

thanks
-- PMM


Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-20 Thread Mario Smarduch
On 11/20/2014 10:40 AM, Peter Maydell wrote:
 On 20 November 2014 18:35, Mario Smarduch m.smard...@samsung.com wrote:
 I think beyond consistency, there should be no double mappings with
 conflicting attributes at any time or CPU state is undefined.
 
 The situation is not so bleak as this. See section B2.9 Mismatched
 memory attributes in the ARMv8 ARM ARM (DDI0487A.d), which lays
 out in some detail what the results of mismatched attributes are
 (generally, you lose ordering or coherency guarantees you might
 have hoped to have). They're not pretty, but it's not as bad
 as completely UNPREDICTABLE behaviour.
 
 thanks
 -- PMM
 
Hi Peter,
  thanks for digging that up, quite a list to navigate
but it does provide detailed guidance.

- Mario


Re: [PATCH 5/5] KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked

2014-11-20 Thread Suresh E. Warrier


On 11/20/2014 11:36 AM, Alexander Graf wrote:
 
 
 On 03.11.14 05:52, Paul Mackerras wrote:
 From: Suresh E. Warrier warr...@linux.vnet.ibm.com

 The kvmppc_vcore_blocked() code does not check for the wait condition
 after putting the process on the wait queue. This means that it is
 possible for an external interrupt to become pending, but the vcpu to
 remain asleep until the next decrementer interrupt.  The fix is to
 make one last check for pending exceptions and ceded state before
 calling schedule().

 Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com
 Signed-off-by: Paul Mackerras pau...@samba.org
 
 I don't understand the race you're fixing here. Can you please explain it?
 

When a virtual interrupt needs to be delivered to the guest, and the
virtual ICS state for the interrupt and virtual ICP state for the VCPU
allow for the VCPU to be immediately interrupted, we
1. Set the BOOK3S_INTERRUPT_EXTERNAL_LEVEL bit in pending_exceptions.
2. Call kvmppc_fast_vcpu_kick_hv(), which checks the wait queue at vcpu->wq
   to wake the VCPU up.

The caller of kvmppc_vcore_blocked() does the check for pending exceptions, but
there is a race condition here and we do need to check again after the VCPU
is put on the wait queue.
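
The re-check the patch adds boils down to the decision below. This is a
simplified userspace model, not the kernel code: struct toy_vcpu and
vcore_may_sleep() are made-up names, and the wait queue and locking are left
out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the per-vcpu state the patch looks at. */
struct toy_vcpu {
	unsigned long pending_exceptions;	/* bitmask of pending interrupts */
	bool ceded;				/* vcpu has ceded (is idle) */
};

/* The vcore may sleep only if every runnable vcpu has ceded and none
 * of them has an exception pending. */
static bool vcore_may_sleep(const struct toy_vcpu *vcpus, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (vcpus[i].pending_exceptions || !vcpus[i].ceded)
			return false;
	}
	return true;
}
```

The important part is where this runs: after prepare_to_wait(), so that a
kick arriving between the caller's check and the sleep is seen either by
this check or by the wait queue wakeup.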

-suresh

 
 Alex
 
 ---
  arch/powerpc/kvm/book3s_hv.c | 20 
  1 file changed, 20 insertions(+)

 diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
 index cd7e030..1a7a281 100644
 --- a/arch/powerpc/kvm/book3s_hv.c
 +++ b/arch/powerpc/kvm/book3s_hv.c
 @@ -1828,9 +1828,29 @@ static void kvmppc_wait_for_exec(struct kvm_vcpu *vcpu, int wait_state)
   */
  static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
  {
 +struct kvm_vcpu *vcpu;
 +int do_sleep = 1;
 +
  DEFINE_WAIT(wait);
  
  prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
 +
 +/*
 + * Check one last time for pending exceptions and ceded state after
 + * we put ourselves on the wait queue
 + */
 +list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
 +if (vcpu->arch.pending_exceptions || !vcpu->arch.ceded) {
 +do_sleep = 0;
 +break;
 +}
 +}
 +
 +if (!do_sleep) {
 +finish_wait(&vc->wq, &wait);
 +return;
 +}
 +
  vc->vcore_state = VCORE_SLEEPING;
  spin_unlock(&vc->lock);
  schedule();

 



Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-20 Thread Jon Masters
On 11/20/2014 01:40 PM, Peter Maydell wrote:
 On 20 November 2014 18:35, Mario Smarduch m.smard...@samsung.com wrote:
 I think beyond consistency, there should be no double mappings with
 conflicting attributes at any time or CPU state is undefined.
 
 The situation is not so bleak as this. See section B2.9 Mismatched
 memory attributes in the ARMv8 ARM ARM (DDI0487A.d), which lays
 out in some detail what the results of mismatched attributes are
 (generally, you lose ordering or coherency guarantees you might
 have hoped to have). They're not pretty, but it's not as bad
 as completely UNPREDICTABLE behaviour.

Quick side note that I did raise exactly this issue with the ARM
Architecture team several years ago (that of mismatched memory
attributes between a guest and hypervisor) and it is a known concern.
I'm personally concerned about a couple of things that I won't go into
here but will follow up on what the longer term plan might be.

Jon.




Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-20 Thread Peter Maydell
On 20 November 2014 19:49, Jon Masters j...@redhat.com wrote:
 On 11/20/2014 01:40 PM, Peter Maydell wrote:
 The situation is not so bleak as this. See section B2.9 Mismatched
 memory attributes in the ARMv8 ARM ARM (DDI0487A.d), which lays
 out in some detail what the results of mismatched attributes are
 (generally, you lose ordering or coherency guarantees you might
 have hoped to have). They're not pretty, but it's not as bad
 as completely UNPREDICTABLE behaviour.

 Quick side note that I did raise exactly this issue with the ARM
 Architecture team several years ago (that of mismatched memory
 attributes between a guest and hypervisor) and it is a known concern.

I think in practice for a well-behaved guest we can arrange
that everything is fine (roughly, the guest has to treat
DMA-capable devices as doing coherent-dma, which we can tell
them to do via DT bindings or ACPI[*], plus the special
case handling we already have for bootup), and naughty guests
will only confuse themselves. But I need to think a bit more
about it (and we should probably write down how it works
somewhere :-)).

[*] We should be able to emulate non-coherent-DMA devices but
would need an extra API from KVM so userspace can do clean
dcache to point of coherency. And in practice I'm not sure
we need to emulate those devices...

-- PMM


Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds

2014-11-20 Thread Linus Torvalds
On Thu, Nov 20, 2014 at 3:39 AM, Christian Borntraeger
borntrae...@de.ibm.com wrote:

 So It looks like we could make a change to ACCESS_ONCE. Would something like

 CONFIG_ARCH_SCALAR_ACCESS_ONCE be a good start?

No, if it's just a handful of places to be fixed, let's not add config
options for broken cases.

 This would boil down to
 Patch1: Provide stricter ACCESS_ONCE if CONFIG_ARCH_SCALAR_ACCESS_ONCE is set 
 + docu update + comments
 Patch2: Change mm/* to barriers
 Patch3: Change x86 locks
 Patch4: Change x86 gup
 Patch4: Enable CONFIG_ARCH_SCALAR_ACCESS_ONCE for s390x and x86

Just do patches 2-4 first, and then patch 1 unconditionally.

Obviously you'd need to spread the word on linux-arch to see how bad
it is for other cases, but if other architectures are at all like x86
and s390, and just require a few trivial patches, let's not make this
some config option.
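
For anyone following along, a rough userspace sketch of what a scalar-only
ACCESS_ONCE could look like. This is an assumption for illustration, not the
patch under discussion: the macro and function names are made up, and it
relies on GNU C __typeof__ and statement expressions.

```c
/* Classic form: force one access through a volatile-qualified lvalue. */
#define ACCESS_ONCE_LOOSE(x) (*(volatile __typeof__(x) *)&(x))

/*
 * Stricter sketch: casting 0 to typeof(x) only compiles for scalar types
 * (integers, pointers, floating point), so passing a struct or union is
 * rejected at build time instead of hitting the compiler bug.
 */
#define ACCESS_ONCE_SCALAR_PTR(x) ({				\
	__typeof__(x) __scalar_check = (__typeof__(x))0;	\
	(void)__scalar_check;					\
	(volatile __typeof__(x) *)&(x); })
#define ACCESS_ONCE_SCALAR(x) (*ACCESS_ONCE_SCALAR_PTR(x))

/* Small usage example: one volatile load, then one volatile store. */
static int access_once_demo(void)
{
	int counter = 5;
	int seen = ACCESS_ONCE_SCALAR(counter);

	ACCESS_ONCE_SCALAR(counter) = seen + 2;
	return counter;
}
```

Anything that currently passes an aggregate would then have to be converted
to barriers or to a scalar member access, which is what patches 2-4 are
about.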

   Linus


Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-20 Thread Laszlo Ersek
On 11/20/14 21:10, Peter Maydell wrote:
 On 20 November 2014 19:49, Jon Masters j...@redhat.com wrote:
 On 11/20/2014 01:40 PM, Peter Maydell wrote:
 The situation is not so bleak as this. See section B2.9 Mismatched
 memory attributes in the ARMv8 ARM ARM (DDI0487A.d), which lays
 out in some detail what the results of mismatched attributes are
 (generally, you lose ordering or coherency guarantees you might
 have hoped to have). They're not pretty, but it's not as bad
 as completely UNPREDICTABLE behaviour.

 Quick side note that I did raise exactly this issue with the ARM
 Architecture team several years ago (that of mismatched memory
 attributes between a guest and hypervisor) and it is a known concern.
 
 I think in practice for a well-behaved guest we can arrange
 that everything is fine (roughly, the guest has to treat
 DMA-capable devices as doing coherent-dma, which we can tell
 them to do via DT bindings or ACPI[*], plus the special
 case handling we already have for bootup), and naughty guests
 will only confuse themselves. But I need to think a bit more
 about it (and we should probably write down how it works
 somewhere :-)).
 
 [*] We should be able to emulate non-coherent-DMA devices but
 would need an extra API from KVM so userspace can do clean
 dcache to point of coherency. And in practice I'm not sure
 we need to emulate those devices...

This basically means that virtio transfers should just use normal memory
in the guest (treating virtio transfers as coherent DMA), right?

We're actually doing that in the ArmVirtualizationQemu.dsc build of
edk2, and Things Work Great (TM) in guests on the Mustang.

Thanks
Laszlo


Re: can I make this work… (Foundation for accessibility project)

2014-11-20 Thread Paolo Bonzini


On 20/11/2014 17:28, Eric S. Johansson wrote:
 I'm accustomed to lesser performance on virtual machines. That's the
 hazard of running on an old and slow laptop (dell e6400 (2.2ghz core
 duo, 8gb ram)[1]), but even VirtualBox is not this slow.  So what am I
 doing wrong? It would be nice to use a slow machine like this as many
 handcrips don't have a whole lot of resources for buying newer/faster
 machines. On the other hand, many of them use desktops and work from one
 place whereas someone like me is all over the map (quite literally).

How did you start the virtual machine?  Perhaps you're not using KVM but
emulation?  I have a fast machine but slow disk (a NAS on 100 MBit
ethernet) and I can do about 15 automated installations in less than 6
hours.

Are you using libvirt or directly invoking QEMU?

Paolo


Re: [PATCH 3/3] arm, arm64: KVM: handle potential incoherency of readonly memslots

2014-11-20 Thread Peter Maydell
On 20 November 2014 21:13, Laszlo Ersek ler...@redhat.com wrote:
 On 11/20/14 21:10, Peter Maydell wrote:
 I think in practice for a well-behaved guest we can arrange
 that everything is fine (roughly, the guest has to treat
 DMA-capable devices as doing coherent-dma, which we can tell
 them to do via DT bindings or ACPI[*], plus the special
 case handling we already have for bootup), and naughty guests
 will only confuse themselves. But I need to think a bit more
 about it (and we should probably write down how it works
 somewhere :-)).

 This basically means that virtio transfers should just use normal memory
 in the guest (treating virtio transfers as coherent DMA), right?

Normal *cacheable*, but yes.

-- PMM


RE: KVM causes #GP on XRSTORS

2014-11-20 Thread Yu, Fenghua
 From: Nadav Amit [mailto:nadav.a...@gmail.com]
 Sent: Thursday, November 20, 2014 8:34 AM
 To: Paolo Bonzini; Yu, Fenghua
 Cc: kvm list
 Subject: KVM causes #GP on XRSTORS
 
 Fenghua,
 
 I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears
 to get a #GP when it is trying to load the guest FPU.
 One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu,
 but I am not sure it is the only problem.
 Was KVM ever tested with XRSTORS?

In dmesg, do you see a line like "xsave: ... using compacted form"?

If you boot the kernel with noxsaves, does KVM work fine?

Thanks.

-Fenghua


Re: can I make this work… (Foundation for accessibility project)

2014-11-20 Thread Eric S. Johansson


On 11/20/2014 4:48 PM, Paolo Bonzini wrote:


On 20/11/2014 17:28, Eric S. Johansson wrote:

I'm accustomed to lesser performance on virtual machines. That's the
hazard of running on an old and slow laptop (dell e6400 (2.2ghz core
duo, 8gb ram)[1]), but even VirtualBox is not this slow.  So what am I
doing wrong? It would be nice to use a slow machine like this as many
handcrips don't have a whole lot of resources for buying newer/faster
machines. On the other hand, many of them use desktops and work from one
place whereas someone like me is all over the map (quite literally).

How did you start the virtual machine?  Perhaps you're not using KVM but
emulation?  I have a fast machine but slow disk (a NAS on 100 MBit
ethernet) and I can do about 15 automated installations in less than 6
hours.

Are you using libvirt or directly invoking QEMU?


I was using one of the GUIs (less hand stress than trying to assemble a 
command line). Unfortunately I'm in Windows 8 right now because I'm 
writing. I'm very sure the GUI was http://virt-manager.org/. I tried a 
different one, but it kept telling me I only had QEMU. I thought, silly 
program, that can't be right. Someday I will not argue with software or 
small electronic boxes. They don't care who wins and they are much more 
stubborn than I am.


I'll be able to run some tests in about 2 to 3 hours, after I finish this 
document. Let me know what I should look at. On a side note, a pointer 
to an automated install process would be wonderful.



[PATCH v2] KVM: PPC: Book3S HV: ptes are big endian

2014-11-20 Thread Cédric Le Goater
When being restored from qemu, the kvm_get_htab_header are in native
endian, but the ptes are big endian. 

This patch fixes restore on a KVM LE host. Qemu also needs a fix for
this :

 http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg8.html

Signed-off-by: Cédric Le Goater c...@fr.ibm.com
Cc: Paul Mackerras pau...@samba.org
Cc: Alexey Kardashevskiy a...@ozlabs.ru
Cc: Gregory Kurz gk...@linux.vnet.ibm.com

---

Tested on 3.18-rc5 with LE and BE host.

v2: add be64 local variables to be friendly with sparse
 

 arch/powerpc/kvm/book3s_64_mmu_hv.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Index: linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c
===================================================================
--- linux-3.18-hv.git.orig/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1539,9 +1539,15 @@ static ssize_t kvm_htab_write(struct fil
 		hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
 		lbuf = (unsigned long __user *)buf;
 		for (j = 0; j < hdr.n_valid; ++j) {
+			__be64 hpte_v;
+			__be64 hpte_r;
+
 			err = -EFAULT;
-			if (__get_user(v, lbuf) || __get_user(r, lbuf + 1))
+			if (__get_user(hpte_v, lbuf) ||
+			    __get_user(hpte_r, lbuf + 1))
 				goto out;
+			v = be64_to_cpu(hpte_v);
+			r = be64_to_cpu(hpte_r);
 			err = -EINVAL;
 			if (!(v & HPTE_V_VALID))
 				goto out;



Re: KVM causes #GP on XRSTORS

2014-11-20 Thread Wanpeng Li
Hi Nadav,
On Thu, Nov 20, 2014 at 06:34:04PM +0200, Nadav Amit wrote:
Fenghua,

I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears to 
get a #GP when it is trying to load the guest FPU.
One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu, but I 
am not sure it is the only problem.
Was KVM ever tested with XRSTORS?

Current KVM and QEMU use the standard XSAVE format, whereas xsaves/xrstors
require the compacted format. In addition, VMX support for xsaves/xrstors
is not yet enabled in KVM.

Regards,
Wanpeng Li 


Thanks,
Nadav



Re: KVM causes #GP on XRSTORS

2014-11-20 Thread Nadav Amit

 On Nov 21, 2014, at 02:10, Wanpeng Li wanpeng...@linux.intel.com wrote:
 
 Hi Nadav,
 On Thu, Nov 20, 2014 at 06:34:04PM +0200, Nadav Amit wrote:
 Fenghua,
 
 I got KVM (v3.17) crashing on a machine that supports XRSTORS - It appears 
 to get a #GP when it is trying to load the guest FPU.
 One reason for the #GP is that XCOMP_BV[63] is zeroed on the guest_fpu, but 
 I am not sure it is the only problem.
 Was KVM ever tested with XRSTORS?
 
 Current kvm and qemu use standard format and xsaves/xrstors should use
 compact format, in addition, vmx is still not enabled for xsaves/xrstors 
 in kvm.

Sorry, but I don’t quite understand. Should KVM work on a machine that supports 
xsaves/xrstors?
I am not referring to whether KVM exposes the capability to the guest, but to 
whether KVM should work at all.

Thanks,
Nadav
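
The XCOMP_BV[63] condition mentioned in this thread can be checked directly on a saved buffer. XRSTORS only accepts the compacted format, which is flagged by bit 63 of the XCOMP_BV field in the XSAVE header (the header follows the 512-byte legacy region; XCOMP_BV is its second quadword). The sketch below is userspace C under those layout assumptions. The function and macro names are made up for illustration and are not KVM or kernel identifiers.

```c
#include <stdint.h>
#include <string.h>

/* Layout assumptions: the XSAVE header starts after the 512-byte legacy
 * region; XSTATE_BV is header bytes 0-7, XCOMP_BV is header bytes 8-15. */
#define XSAVE_HDR_OFFSET   512
#define XCOMP_BV_OFFSET    (XSAVE_HDR_OFFSET + 8)
#define XCOMP_BV_COMPACTED (1ULL << 63)

/* Read XCOMP_BV from a raw XSAVE area (host byte order, as on x86). */
static uint64_t xsave_read_xcomp_bv(const uint8_t *area)
{
	uint64_t v;

	memcpy(&v, area + XCOMP_BV_OFFSET, sizeof(v));
	return v;
}

/* Nonzero if the area is marked as compacted, i.e. XCOMP_BV[63] is set. */
static int xsave_area_is_compacted(const uint8_t *area)
{
	return (xsave_read_xcomp_bv(area) & XCOMP_BV_COMPACTED) != 0;
}

/* Set XCOMP_BV[63], leaving the other bits of the field untouched. */
static void xsave_area_mark_compacted(uint8_t *area)
{
	uint64_t v = xsave_read_xcomp_bv(area) | XCOMP_BV_COMPACTED;

	memcpy(area + XCOMP_BV_OFFSET, &v, sizeof(v));
}
```

A zero-initialised buffer, like the guest_fpu described in the report, fails the `xsave_area_is_compacted()` check.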




Re: [PATCH 1/1] vfio: put off the allocation of minor in vfio_create_group

2014-11-20 Thread leizhen
On 2014/11/20 23:37, Alex Williamson wrote:
 On Thu, 2014-11-20 at 19:25 +0800, Zhen Lei wrote:
 The next code fragment, list_for_each_entry, does not depend on minor. With
 this patch, the freeing of minor inside list_for_each_entry can be dropped,
 and there is no functional change.
 
 A reasonable micro-optimization, but I'm curious if you're actually
 seeing some measurable overhead from this.  It seems like we'd need to

Oh, I just found this by code review. It's better to do pure checks first and
resource allocation later if the two are order-independent, isn't it?
I don't think it will improve performance; it only saves a little code space
and reads better.

 have multiple devices, all within the same IOMMU group, all probed by
 vfio-pci at the same time to exercise the race condition.  Thanks,

I think vfio_create_group is not invoked frequently, so the performance of
this function is not a major consideration, and we don't care about a slight
performance change.

 
 Alex
 
 Signed-off-by: Zhen Lei thunder.leiz...@huawei.com
 ---
  drivers/vfio/vfio.c | 13 ++++++-------
  1 file changed, 6 insertions(+), 7 deletions(-)

 diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
 index f018d8d..737eb468 100644
 --- a/drivers/vfio/vfio.c
 +++ b/drivers/vfio/vfio.c
 @@ -225,22 +225,21 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group)

  	mutex_lock(&vfio.group_lock);

 -	minor = vfio_alloc_group_minor(group);
 -	if (minor < 0) {
 -		vfio_group_unlock_and_free(group);
 -		return ERR_PTR(minor);
 -	}
 -
  	/* Did we race creating this group? */
  	list_for_each_entry(tmp, &vfio.group_list, vfio_next) {
  		if (tmp->iommu_group == iommu_group) {
  			vfio_group_get(tmp);
 -			vfio_free_group_minor(minor);
  			vfio_group_unlock_and_free(group);
  			return tmp;
  		}
  	}

 +	minor = vfio_alloc_group_minor(group);
 +	if (minor < 0) {
 +		vfio_group_unlock_and_free(group);
 +		return ERR_PTR(minor);
 +	}
 +
  	dev = device_create(vfio.class, NULL,
  			    MKDEV(MAJOR(vfio.group_devt), minor),
  			    group, "%d", iommu_group_id(iommu_group));
 --
 1.8.0
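
The reordering in the quoted patch (look for an existing group first, allocate the minor only if none is found) can be shown with a small self-contained model. This is a sketch, not the vfio code: the table, the `toy_` names and the counter are all invented for illustration, and the mutex that the real function holds around both steps is left out.

```c
#include <stddef.h>

#define MAX_GROUPS 4

/* Toy stand-in for struct vfio_group. */
struct toy_group {
	int iommu_id;
	int minor;
	int in_use;
};

static struct toy_group group_table[MAX_GROUPS];
static int minors_allocated;	/* number of minors handed out so far */

/* Toy allocator: returns the next free minor, or -1 when exhausted. */
static int toy_alloc_minor(void)
{
	if (minors_allocated >= MAX_GROUPS)
		return -1;
	return minors_allocated++;
}

static struct toy_group *toy_create_group(int iommu_id)
{
	int minor;

	/* Pure check first: reuse an existing group for this IOMMU id. */
	for (size_t i = 0; i < MAX_GROUPS; i++) {
		if (group_table[i].in_use && group_table[i].iommu_id == iommu_id)
			return &group_table[i];
	}

	/* Allocate only when needed, so the lookup path has nothing to undo. */
	minor = toy_alloc_minor();
	if (minor < 0)
		return NULL;

	group_table[minor].iommu_id = iommu_id;
	group_table[minor].minor = minor;
	group_table[minor].in_use = 1;
	return &group_table[minor];
}
```

Creating a group twice for the same id returns the same entry and consumes a single minor, which is the property the patch relies on to drop the vfio_free_group_minor() call.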




Re: [PATCH v4] KVM: x86: fix access memslots w/o hold srcu read lock

2014-11-20 Thread Wanpeng Li
Hi Tang,
On Fri, Nov 21, 2014 at 02:41:57PM +0800, Tang Chen wrote:
Hi Wanpeng,

Sorry, it is about this problem again.

I booted 3.18.0-rc2, without Paolo's patch. lockdep and RCU debug
were all opened.

Then I started a qemu vm with the following options:

/usr/libexec/qemu-kvm -hda rhel7.0ga-x64.qcow2 -m 512M -cpu
host,-x2apic -serial stdio

I added printk() in kvm_vcpu_reload_apic_access_page(), and it was
printed out.
So I think I can confirm that I have run into
kvm_vcpu_reload_apic_access_page().

But I still didn't see any warning.

Is there anything else I should do ?
Would you please share your qemu command with me ?

I tested it on another person's Ivytown machine using the qemu command line
he had set up, so I don't remember the exact command line used that day.

Paolo also reproduced the bug. Paolo, ping.

Regards,
Wanpeng Li 


Thanks. :)

On 11/14/2014 07:39 AM, Wanpeng Li wrote:
Hi Tang,
On Tue, Nov 11, 2014 at 01:35:29PM +0800, Tang Chen wrote:
Hi Wanpeng,

Sorry for the late reply.

I think I have totally missed this thread.
I opened lockdep and RCU debug, and tried on 3.18-rc1. But I didn't
get the warning.
I also enabled lockdep and RCU debug, and tried 3.18.0-rc2 on an Ivy
Bridge machine; the warning is triggered immediately after starting qemu.
There is no need to try any hotplug-related stuff.

In addition, Paolo's patch is merged upstream to fix this.

commit a73896cb5bbdce672945745db8224352a689f580
Author: Paolo Bonzini pbonz...@redhat.com
Date:   Sun Nov 2 07:54:30 2014 +0100

KVM: vmx: defer load of APIC access page address during reset

Regards,
Wanpeng Li

My steps are:

1. Use numactl to bind a qemu process to node1.
2. Offline all node1 memory. And the qemu process is still running.

Would you please tell me how did you reproduce it ?

Thanks.

On 11/02/2014 03:07 PM, Wanpeng Li wrote:
The srcu read lock must be held while accessing memslots (e.g.
when using gfn_to_* functions). However, commit c24ae0dcd3e8
(kvm: x86: Unpin and remove kvm_arch->apic_access_page) calls
gfn_to_page() in kvm_vcpu_reload_apic_access_page() without holding it in
the vmx_vcpu_reset() path, which leads to a suspicious rcu_dereference_check()
usage warning. This patch fixes it by holding the srcu read lock in all
kvm_vcpu_reset() call paths.


[ INFO: suspicious RCU usage. ]
3.18.0-rc2-test2+ #70 Not tainted
---
include/linux/kvm_host.h:474 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
1 lock held by qemu-system-x86/2371:
  #0:  (vcpu-mutex){+.+...}, at: [a037d800] vcpu_load+0x20/0xd0 
 [kvm]

stack backtrace:
CPU: 4 PID: 2371 Comm: qemu-system-x86 Not tainted 3.18.0-rc2-test2+ #70
Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013
  0001 880209983ca8 816f514f 
  8802099b8990 880209983cd8 810bd687 000fee00
  880208a2c000 880208a1 88020ef50040 880209983d08
Call Trace:
  [816f514f] dump_stack+0x4e/0x71
  [810bd687] lockdep_rcu_suspicious+0xe7/0x120
  [a037d055] gfn_to_memslot+0xd5/0xe0 [kvm]
  [a03807d3] __gfn_to_pfn+0x33/0x60 [kvm]
  [a0380885] gfn_to_page+0x25/0x90 [kvm]
  [a038aeec] kvm_vcpu_reload_apic_access_page+0x3c/0x80 [kvm]
  [a08f0a9c] vmx_vcpu_reset+0x20c/0x460 [kvm_intel]
  [a039ab8e] kvm_vcpu_reset+0x15e/0x1b0 [kvm]
  [a039ac0c] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
  [a037f7e0] kvm_vm_ioctl+0x1d0/0x780 [kvm]
  [810bc664] ? __lock_is_held+0x54/0x80
  [812231f0] do_vfs_ioctl+0x300/0x520
  [8122ee45] ? __fget+0x5/0x250
  [8122f0fa] ? __fget_light+0x2a/0xe0
  [81223491] SyS_ioctl+0x81/0xa0
  [816fed6d] system_call_fastpath+0x16/0x1b

Reported-by: Takashi Iwai ti...@suse.de
Reported-by: Alexei Starovoitov alexei.starovoi...@gmail.com
Suggested-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com
---
v3 - v4:
  * bypass the problem altogether by kvm_make_request
v2 - v3:
  * take care all vmx_vcpu_reset call path
v1 - v2:
  * just fix hold the srcu read lock in vmx_vcpu_reset path

  arch/x86/kvm/vmx.c |    2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a0f78db..3e556c6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4579,7 +4579,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 		vmcs_write32(TPR_THRESHOLD, 0);
 	}
 
-	kvm_vcpu_reload_apic_access_page(vcpu);
+	kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
 
 	if (vmx_vm_has_apicv(vcpu->kvm))
 		memset(&vmx->pi_desc, 0, sizeof(struct pi_desc));
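
The one-line change above swaps a direct call for a deferred request. A minimal userspace model of that pattern follows, assuming (as the v4 changelog implies) that pending requests are processed later in a context where the srcu read lock is held. Every `toy_` name and the `srcu_held` flag are stand-ins invented for this sketch, not KVM identifiers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request bit; the patch itself uses KVM_REQ_APIC_PAGE_RELOAD. */
#define REQ_APIC_PAGE_RELOAD 0

struct toy_vcpu {
	atomic_ulong requests;	/* pending request bits */
	bool srcu_held;		/* stands in for the srcu read lock */
	int reloads;		/* reloads done with the lock held */
	int unsafe_reloads;	/* reloads attempted without the lock */
};

/* Record that work is needed; safe to call from any context. */
static void toy_make_request(struct toy_vcpu *v, int req)
{
	atomic_fetch_or(&v->requests, 1UL << req);
}

/* Test and clear a request bit; returns true if it was pending. */
static bool toy_check_request(struct toy_vcpu *v, int req)
{
	unsigned long bit = 1UL << req;

	return (atomic_fetch_and(&v->requests, ~bit) & bit) != 0;
}

/* The memslot access that needs the lock. */
static void toy_reload_apic_page(struct toy_vcpu *v)
{
	if (v->srcu_held)
		v->reloads++;
	else
		v->unsafe_reloads++;
}

/* Reset path: the lock is not held here, so only queue the request. */
static void toy_vcpu_reset(struct toy_vcpu *v)
{
	toy_make_request(v, REQ_APIC_PAGE_RELOAD);
}

/* Guest-entry path: the lock is held, so pending work can run safely. */
static void toy_vcpu_enter_guest(struct toy_vcpu *v)
{
	v->srcu_held = true;
	if (toy_check_request(v, REQ_APIC_PAGE_RELOAD))
		toy_reload_apic_page(v);
	v->srcu_held = false;
}
```

In this model the reset path never touches the protected data, so it cannot trigger the equivalent of the lockdep splat; the reload happens exactly once, on the next guest entry.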


Re: [PATCH 00/17] RFC: userfault v2

2014-11-20 Thread zhanghailiang

On 2014/11/21 1:38, Andrea Arcangeli wrote:

Hi,

On Thu, Nov 20, 2014 at 10:54:29AM +0800, zhanghailiang wrote:

Yes, you are right. This is what i really want, bypass all non-present faults
and only track strict wrprotect faults. ;)

So, do you plan to support that in the userfault API?


Yes I think it's good idea to support wrprotect/COW faults too.



Great! Then i can expect your patches. ;)


I just wanted to understand if there was any other reason why you
needed only wrprotect faults, because the non-present faults didn't
look like a big performance concern if they triggered in addition to
wrprotect faults, but it's certainly ok to optimize them away so it's
fully optimal.



Er, you have got the answer, no special, it's only for optimality.


All it takes to differentiate the behavior should be one more bit
during registration so you can select non-present, wrprotect faults or
both. postcopy live migration would select only non-present faults,
postcopy live snapshot would select only wrprotect faults, anything
like distributed shared memory supporting shared readonly access and
exclusive write access, would select both flags.



It is really flexible in this way.
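
The "one more bit during registration" idea can be sketched as a pair of mode flags and a filter. The flag names and the `should_notify()` helper below are hypothetical and chosen only for this illustration; they are not taken from the userfault patches.

```c
/* Hypothetical registration flags, one bit per fault type to track. */
#define UF_TRACK_MISSING   (1u << 0)	/* non-present faults */
#define UF_TRACK_WRPROTECT (1u << 1)	/* write-protect faults */

enum fault_kind {
	FAULT_NOT_PRESENT,
	FAULT_WRPROTECT
};

/* Decide whether a fault is reported to userspace for a given mode. */
static int should_notify(unsigned int mode, enum fault_kind kind)
{
	switch (kind) {
	case FAULT_NOT_PRESENT:
		return (mode & UF_TRACK_MISSING) != 0;
	case FAULT_WRPROTECT:
		return (mode & UF_TRACK_WRPROTECT) != 0;
	}
	return 0;
}
```

With these flags, postcopy live migration would register with UF_TRACK_MISSING only, a live snapshot with UF_TRACK_WRPROTECT only, and a distributed shared memory user with both.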


I just sent an (unfortunately) longish but way more detailed email
about live snapshotting with userfaultfd but I just wanted to give a
shorter answer here too :).



Thanks for your explanation, and your patience. It is really useful;
now I know more details about why the 'fork & dump live snapshot' scenario
is not acceptable. Thanks :)


Thanks,
Andrea



Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions

2014-11-20 Thread Aneesh Kumar K.V
Suresh E. Warrier warr...@linux.vnet.ibm.com writes:

 This patch adds trace points in the guest entry and exit code and also
 for exceptions handled by the host in kernel mode - hypercalls and page
 faults. The new events are added to /sys/kernel/debug/tracing/events
 under a new subsystem called kvm_hv.



   /* Set this explicitly in case thread 0 doesn't have a vcpu */
 @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
  
  	vc->vcore_state = VCORE_RUNNING;
  	preempt_disable();
 +
 +	trace_kvmppc_run_core(vc, 0);
 +
  	spin_unlock(&vc->lock);

Do we really want to call a tracepoint with a spin lock held? Is that a good
thing to do?

-aneesh



Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions

2014-11-20 Thread Alexander Graf


On 20.11.14 11:40, Aneesh Kumar K.V wrote:
 Suresh E. Warrier warr...@linux.vnet.ibm.com writes:
 
 This patch adds trace points in the guest entry and exit code and also
 for exceptions handled by the host in kernel mode - hypercalls and page
 faults. The new events are added to /sys/kernel/debug/tracing/events
 under a new subsystem called kvm_hv.
 
 
 
  /* Set this explicitly in case thread 0 doesn't have a vcpu */
 @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
  
  	vc->vcore_state = VCORE_RUNNING;
  	preempt_disable();
 +
 +	trace_kvmppc_run_core(vc, 0);
 +
  	spin_unlock(&vc->lock);
 
 Do we really want to call tracepoint with spin lock held ? Is that a good
 thing to do ?. 

I thought it was safe to call tracepoints inside of spin lock regions?
Steve?


Alex


Re: [PATCH] KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions

2014-11-20 Thread Steven Rostedt
On Thu, 20 Nov 2014 13:10:12 +0100
Alexander Graf ag...@suse.de wrote:

 
 
 On 20.11.14 11:40, Aneesh Kumar K.V wrote:
  Suresh E. Warrier warr...@linux.vnet.ibm.com writes:
  
  This patch adds trace points in the guest entry and exit code and also
  for exceptions handled by the host in kernel mode - hypercalls and page
  faults. The new events are added to /sys/kernel/debug/tracing/events
  under a new subsystem called kvm_hv.
  
  
  
 /* Set this explicitly in case thread 0 doesn't have a vcpu */
  @@ -1687,6 +1691,9 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
   
   	vc->vcore_state = VCORE_RUNNING;
   	preempt_disable();
  +
  +	trace_kvmppc_run_core(vc, 0);
  +
   	spin_unlock(&vc->lock);
  
  Do we really want to call tracepoint with spin lock held ? Is that a good
  thing to do ?. 
 
 I thought it was safe to call tracepoints inside of spin lock regions?
 Steve?
 

There's tracepoints in the guts of the scheduler where rq lock is held.
Don't worry about it. The tracing system is lockless.

-- Steve


Re: [PATCH] KVM: PPC: Book3S HV: Add missing HPTE unlock

2014-11-20 Thread Alexander Graf


On 05.11.14 02:21, Paul Mackerras wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 In kvm_test_clear_dirty(), if we find an invalid HPTE we move on to the
 next HPTE without unlocking the invalid one.  In fact we should never
 find an invalid and unlocked HPTE in the rmap chain, but for robustness
 we should unlock it.  This adds the missing unlock.
 
 Reported-by: Benjamin Herrenschmidt b...@kernel.crashing.org
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 Signed-off-by: Paul Mackerras pau...@samba.org

Thanks, applied to kvm-ppc-queue.


Alex


Re: [PATCH] KVM: PPC: Book3S HV: ptes are big endian

2014-11-20 Thread Alexander Graf


On 03.11.14 16:35, Cédric Le Goater wrote:
 When being restored from qemu, the kvm_get_htab_header are in native
 endian, but the ptes are big endian. 
 
 This patch fixes restore on a KVM LE host. Qemu also needs a fix for
 this :
 
  http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg8.html
 
 Signed-off-by: Cédric Le Goater c...@fr.ibm.com
 Cc: Paul Mackerras pau...@samba.org
 Cc: Alexey Kardashevskiy a...@ozlabs.ru
 Cc: Gregory Kurz gk...@linux.vnet.ibm.com
 
 ---
 
  Tested on 3.17-rc7 with LE and BE host.
 
  
 
  arch/powerpc/kvm/book3s_64_mmu_hv.c |    2 ++
  1 file changed, 2 insertions(+)
 
 Index: linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c
 ===================================================================
 --- linux-3.18-hv.git.orig/arch/powerpc/kvm/book3s_64_mmu_hv.c
 +++ linux-3.18-hv.git/arch/powerpc/kvm/book3s_64_mmu_hv.c
 @@ -1542,6 +1542,8 @@ static ssize_t kvm_htab_write(struct fil
   err = -EFAULT;
   if (__get_user(v, lbuf) || __get_user(r, lbuf + 1))
   goto out;
 + v = be64_to_cpu(v);
 + r = be64_to_cpu(r);

This will trigger warnings with sparse. Please introduce new be64
variables that you do get_user on and that you then use as source for v
and r.
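
The point of the request is to keep big-endian data in a type distinct from native integers and to convert it exactly once. A rough userspace analogue is sketched below: the `be64_t` wrapper and both helpers are hypothetical names (the kernel itself uses `__be64` and `be64_to_cpu()`), and the valid-bit value is an assumption made for this illustration.

```c
#include <stdint.h>

/* Hypothetical wrapper: a 64-bit value stored big-endian. Wrapping the
 * bytes in a struct stops it being mixed with native integers by accident,
 * which is roughly what sparse enforces for __be64. */
typedef struct {
	uint8_t bytes[8];
} be64_t;

/* Convert to host byte order; correct on both LE and BE hosts. */
static uint64_t be64_to_native(be64_t v)
{
	uint64_t out = 0;

	for (int i = 0; i < 8; i++)
		out = (out << 8) | v.bytes[i];
	return out;
}

/* Assumed for illustration: the valid flag is the lowest bit of the
 * first HPTE doubleword. */
#define TOY_HPTE_V_VALID 0x1ULL

/* Test the valid bit only after converting from big-endian. */
static int toy_hpte_is_valid(be64_t raw_v)
{
	return (be64_to_native(raw_v) & TOY_HPTE_V_VALID) != 0;
}
```

Testing the flag on the raw big-endian value would look at the wrong byte on a little-endian host, which is the failure the patch addresses for restore on an LE host.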


Alex
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI

2014-11-20 Thread Alexander Graf


On 03.11.14 05:51, Paul Mackerras wrote:
 From: Mahesh Salgaonkar mah...@linux.vnet.ibm.com
 
 When we get an HMI (hypervisor maintenance interrupt) while in a
 guest, we see that guest enters into paused state.  The reason is, in
 kvmppc_handle_exit_hv it falls through default path and returns to
 host instead of resuming guest.  This causes guest to enter into
 paused state.  HMI is a hypervisor only interrupt and it is safe to
 resume the guest since the host has handled it already.  This patch
 adds a switch case to resume the guest.
 
 Without this patch we see guest entering into paused state with following
 console messages:
 
 [ 3003.329351] Severe Hypervisor Maintenance interrupt [Recovered]
 [ 3003.329356]  Error detail: Timer facility experienced an error
 [ 3003.329359]HMER: 0840
 [ 3003.329360]TFMR: 4a12000980a84000
 [ 3003.329366] vcpu c007c35094c0 (40):
 [ 3003.329368] pc  = c00c2ba0  msr = 80009032  trap = e60
 [ 3003.329370] r 0 = c021ddc0  r16 = 0046
 [ 3003.329372] r 1 = c0007a02bbd0  r17 = 327d5d98
 [ 3003.329375] r 2 = c10980b8  r18 = 1fc9a0b0
 [ 3003.329377] r 3 = c142d6b8  r19 = c142d6b8
 [ 3003.329379] r 4 = 0002  r20 = 
 [ 3003.329381] r 5 = c524a110  r21 = 
 [ 3003.329383] r 6 = 0001  r22 = 
 [ 3003.329386] r 7 =   r23 = c524a110
 [ 3003.329388] r 8 =   r24 = 0001
 [ 3003.329391] r 9 = 0001  r25 = c0007c31da38
 [ 3003.329393] r10 = c14280b8  r26 = 0002
 [ 3003.329395] r11 = 746f6f6c2f68656c  r27 = c524a110
 [ 3003.329397] r12 = 28004484  r28 = c0007c31da38
 [ 3003.329399] r13 = cfe01400  r29 = 0002
 [ 3003.329401] r14 = 0046  r30 = c3011e00
 [ 3003.329403] r15 = ffba  r31 = 0002
 [ 3003.329404] ctr = c041a670  lr  = c0272520
 [ 3003.329405] srr0 = c007e8d8 srr1 = 90001002
 [ 3003.329406] sprg0 =  sprg1 = cfe01400
 [ 3003.329407] sprg2 = cfe01400 sprg3 = 0005
 [ 3003.329408] cr = 48004482  xer = 2000  dsisr = 4200
 [ 3003.329409] dar = 010015020048
 [ 3003.329410] fault dar = 010015020048 dsisr = 4200
 [ 3003.329411] SLB (8 entries):
 [ 3003.329412]   ESID = c800 VSID = 40016e7779000510
 [ 3003.329413]   ESID = d801 VSID = 400142add1000510
 [ 3003.329414]   ESID = f804 VSID = 4000eb1a81000510
 [ 3003.329415]   ESID = 1f00080b VSID = 40004fda0a000d90
 [ 3003.329416]   ESID = 3f00080c VSID = 400039f536000d90
 [ 3003.329417]   ESID = 180d VSID = 0001251b35150d90
 [ 3003.329417]   ESID = 0100080e VSID = 4001e4609d90
 [ 3003.329418]   ESID = d8000819 VSID = 40013d349c000400
 [ 3003.329419] lpcr = c04881847001 sdr1 = 001b1906 last_inst = 
 
 [ 3003.329421] trap=0xe60 | pc=0xc00c2ba0 | msr=0x80009032
 [ 3003.329524] Severe Hypervisor Maintenance interrupt [Recovered]
 [ 3003.329526]  Error detail: Timer facility experienced an error
 [ 3003.329527]HMER: 0840
 [ 3003.329527]TFMR: 4a12000980a94000
 [ 3006.359786] Severe Hypervisor Maintenance interrupt [Recovered]
 [ 3006.359792]  Error detail: Timer facility experienced an error
 [ 3006.359795]HMER: 0840
 [ 3006.359797]TFMR: 4a12000980a84000
 
  Id    Name                           State
 ----------------------------------------------------
  2     guest2                         running
  3     guest3                         paused
  4     guest4                         running
 
 Signed-off-by: Mahesh Salgaonkar mah...@linux.vnet.ibm.com
 Signed-off-by: Paul Mackerras pau...@samba.org

Do we need this for PR running on bare metal as well?


Alex


Re: [PATCH 0/5] Some fixes for HV KVM on PPC

2014-11-20 Thread Alexander Graf


On 03.11.14 05:51, Paul Mackerras wrote:
 Here are fixes for five bugs which were found in the testing of our
 PowerKVM product.  The bugs range from guest performance issues to
 guest crashes and memory corruption.  Please apply.

Thanks, applied patches 1-4 to kvm-ppc-queue.


Alex


Re: [PATCH 5/5] KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked

2014-11-20 Thread Alexander Graf


On 03.11.14 05:52, Paul Mackerras wrote:
 From: Suresh E. Warrier warr...@linux.vnet.ibm.com
 
 The kvmppc_vcore_blocked() code does not check for the wait condition
 after putting the process on the wait queue. This means that it is
 possible for an external interrupt to become pending, but the vcpu to
 remain asleep until the next decrementer interrupt.  The fix is to
 make one last check for pending exceptions and ceded state before
 calling schedule().
 
 Signed-off-by: Suresh Warrier warr...@linux.vnet.ibm.com
 Signed-off-by: Paul Mackerras pau...@samba.org

I don't understand the race you're fixing here. Can you please explain it?


Alex

 ---
  arch/powerpc/kvm/book3s_hv.c | 20 ++++++++++++++++++++
  1 file changed, 20 insertions(+)
 
 diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
 index cd7e030..1a7a281 100644
 --- a/arch/powerpc/kvm/book3s_hv.c
 +++ b/arch/powerpc/kvm/book3s_hv.c
 @@ -1828,9 +1828,29 @@ static void kvmppc_wait_for_exec(struct kvm_vcpu *vcpu, int wait_state)
    */
  static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
  {
 +	struct kvm_vcpu *vcpu;
 +	int do_sleep = 1;
 +
  	DEFINE_WAIT(wait);
  
  	prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
 +
 +	/*
 +	 * Check one last time for pending exceptions and ceded state after
 +	 * we put ourselves on the wait queue
 +	 */
 +	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
 +		if (vcpu->arch.pending_exceptions || !vcpu->arch.ceded) {
 +			do_sleep = 0;
 +			break;
 +		}
 +	}
 +
 +	if (!do_sleep) {
 +		finish_wait(&vc->wq, &wait);
 +		return;
 +	}
 +
  	vc->vcore_state = VCORE_SLEEPING;
  	spin_unlock(&vc->lock);
  	schedule();
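
Read in isolation, the check the quoted patch adds is a predicate over the runnable vcpus: sleep only if every one of them has ceded and has nothing pending. The sketch below restates that predicate in userspace C, with a plain array in place of the kernel list. The struct and function names are invented for illustration. As I read the commit message, the re-check matters because an interrupt can become pending after the caller's earlier check, so the decision has to be made again once the task is on the wait queue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the two per-vcpu fields the patch inspects. */
struct toy_vcpu_state {
	unsigned long pending_exceptions;	/* nonzero: work is waiting */
	bool ceded;				/* vcpu gave up the CPU voluntarily */
};

/* True only if no vcpu has pending work and all of them have ceded. */
static bool vcore_should_sleep(const struct toy_vcpu_state *vcpus, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (vcpus[i].pending_exceptions || !vcpus[i].ceded)
			return false;
	}
	return true;
}
```

In the patch this evaluation happens between prepare_to_wait() and schedule(); when it comes out false, the function calls finish_wait() and returns instead of sleeping.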
 

