Re: [PATCH v2] KVM: PPC: Exit guest upon fatal machine check exception

2015-12-16 Thread Thomas Huth
On 16/12/15 06:56, Aravinda Prasad wrote:
> This patch modifies KVM to cause a guest exit with
> KVM_EXIT_NMI instead of immediately delivering a 0x200
> interrupt to guest upon machine check exception in
> guest address. Exiting the guest enables QEMU to build
> error log and deliver machine check exception to guest
> OS (either via guest OS registered machine check
> handler or via 0x200 guest OS interrupt vector).
> 
> This approach simplifies the delivering of machine
> check exception to guest OS compared to the earlier approach
> of KVM directly invoking 0x200 guest interrupt vector.
> In the earlier approach QEMU patched the 0x200 interrupt
> vector during boot. The patched code at 0x200 issued a
> private hcall to pass the control to QEMU to build the
> error log.
> 
> This design/approach is based on the feedback for the
> QEMU patches to handle machine check exception. Details
> of earlier approach of handling machine check exception
> in QEMU and related discussions can be found at:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
> 
> This patch also introduces a new KVM capability to
> control how KVM behaves on machine check exception.
> Without this capability, KVM redirects machine check
> exceptions to guest's 0x200 vector if the address in
> error belongs to guest. With this capability KVM
> causes a guest exit with NMI exit reason.
> 
> This is required to avoid problems if a new kernel/KVM
> is used with an old QEMU for guests that don't issue
> "ibm,nmi-register". As old QEMU does not understand the
> NMI exit type, it treats it as a fatal error. However,
> the guest could have handled the machine check error
> if the exception was delivered to guest's 0x200 interrupt
> vector instead of NMI exit in case of old QEMU.
> 
> Change Log v2:
>   - Added KVM capability
> 
> Signed-off-by: Aravinda Prasad 
> ---
>  arch/powerpc/include/asm/kvm_host.h |1 +
>  arch/powerpc/kernel/asm-offsets.c   |1 +
>  arch/powerpc/kvm/book3s_hv.c|   12 +++---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   37 
> +++
>  arch/powerpc/kvm/powerpc.c  |7 ++
>  include/uapi/linux/kvm.h|1 +
>  6 files changed, 31 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index 827a38d..8a652ba 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -243,6 +243,7 @@ struct kvm_arch {
>   int hpt_cma_alloc;
>   struct dentry *debugfs_dir;
>   struct dentry *htab_dentry;
> + u8 fwnmi_enabled;

Here you declare the variable as 8-bits ...

>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
>  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>   struct mutex hpt_mutex;
> diff --git a/arch/powerpc/kernel/asm-offsets.c 
> b/arch/powerpc/kernel/asm-offsets.c
> index 221d584..6a4e81a 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -506,6 +506,7 @@ int main(void)
>   DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
>   DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
>   DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
> + DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled));

... then define an asm-offset for it ...

>   DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
>   DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
>   DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index b98889e..f43c124 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
[...]
> @@ -2381,24 +2377,27 @@ machine_check_realmode:
>   ld  r9, HSTATE_KVM_VCPU(r13)
>   li  r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>   /*
> -  * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
> -  * machine check interrupt (set HSRR0 to 0x200). And for handled
> -  * errors (no-fatal), just go back to guest execution with current
> -  * HSRR0 instead of exiting guest. This new approach will inject
> -  * machine check to guest for fatal error causing guest to crash.
> -  *
> -  * The old code used to return to host for unhandled errors which
> -  * was causing guest to hang with soft lockups inside guest and
> -  * makes it difficult to recover guest instance.
> +  * Deliver unhandled/fatal (e.g. UE) MCE errors to guest
> +  * by exiting the guest with KVM_EXIT_NMI exit reason (exit
> +  * reason set later based on trap). For handled errors
> +  * (no-fatal), go back to guest execution with current HSRR0
> +  * instead of exiting the guest. This approach will cause
> +  * the guest to exit in case of fatal 

Re: [PATCH v2] KVM: PPC: Exit guest upon fatal machine check exception

2015-12-16 Thread Thomas Huth
On 16/12/15 06:56, Aravinda Prasad wrote:
> This patch modifies KVM to cause a guest exit with
> KVM_EXIT_NMI instead of immediately delivering a 0x200
> interrupt to guest upon machine check exception in
> guest address. Exiting the guest enables QEMU to build
> error log and deliver machine check exception to guest
> OS (either via guest OS registered machine check
> handler or via 0x200 guest OS interrupt vector).
> 
> This approach simplifies the delivering of machine
> check exception to guest OS compared to the earlier approach
> of KVM directly invoking 0x200 guest interrupt vector.
> In the earlier approach QEMU patched the 0x200 interrupt
> vector during boot. The patched code at 0x200 issued a
> private hcall to pass the control to QEMU to build the
> error log.
> 
> This design/approach is based on the feedback for the
> QEMU patches to handle machine check exception. Details
> of earlier approach of handling machine check exception
> in QEMU and related discussions can be found at:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html
> 
> This patch also introduces a new KVM capability to
> control how KVM behaves on machine check exception.
> Without this capability, KVM redirects machine check
> exceptions to guest's 0x200 vector if the address in
> error belongs to guest. With this capability KVM
> causes a guest exit with NMI exit reason.
> 
> This is required to avoid problems if a new kernel/KVM
> is used with an old QEMU for guests that don't issue
> "ibm,nmi-register". As old QEMU does not understand the
> NMI exit type, it treats it as a fatal error. However,
> the guest could have handled the machine check error
> if the exception was delivered to guest's 0x200 interrupt
> vector instead of NMI exit in case of old QEMU.
> 
> Change Log v2:
>   - Added KVM capability
> 
> Signed-off-by: Aravinda Prasad 
> ---
>  arch/powerpc/include/asm/kvm_host.h |1 +
>  arch/powerpc/kernel/asm-offsets.c   |1 +
>  arch/powerpc/kvm/book3s_hv.c|   12 +++---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   37 
> +++
>  arch/powerpc/kvm/powerpc.c  |7 ++
>  include/uapi/linux/kvm.h|1 +
>  6 files changed, 31 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index 827a38d..8a652ba 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -243,6 +243,7 @@ struct kvm_arch {
>   int hpt_cma_alloc;
>   struct dentry *debugfs_dir;
>   struct dentry *htab_dentry;
> + u8 fwnmi_enabled;

Here you declare the variable as 8-bits ...

>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
>  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>   struct mutex hpt_mutex;
> diff --git a/arch/powerpc/kernel/asm-offsets.c 
> b/arch/powerpc/kernel/asm-offsets.c
> index 221d584..6a4e81a 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -506,6 +506,7 @@ int main(void)
>   DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
>   DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
>   DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
> + DEFINE(KVM_FWNMI, offsetof(struct kvm, arch.fwnmi_enabled));

... then define an asm-offset for it ...

>   DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
>   DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
>   DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index b98889e..f43c124 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
[...]
> @@ -2381,24 +2377,27 @@ machine_check_realmode:
>   ld  r9, HSTATE_KVM_VCPU(r13)
>   li  r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>   /*
> -  * Deliver unhandled/fatal (e.g. UE) MCE errors to guest through
> -  * machine check interrupt (set HSRR0 to 0x200). And for handled
> -  * errors (no-fatal), just go back to guest execution with current
> -  * HSRR0 instead of exiting guest. This new approach will inject
> -  * machine check to guest for fatal error causing guest to crash.
> -  *
> -  * The old code used to return to host for unhandled errors which
> -  * was causing guest to hang with soft lockups inside guest and
> -  * makes it difficult to recover guest instance.
> +  * Deliver unhandled/fatal (e.g. UE) MCE errors to guest
> +  * by exiting the guest with KVM_EXIT_NMI exit reason (exit
> +  * reason set later based on trap). For handled errors
> +  * (no-fatal), go back to guest execution with current HSRR0
> +  * instead of exiting the guest. This approach will cause
> +  * the guest to exit in case of fatal 

Re: [PATCH 1/1] KVM: PPC: Increase memslots to 320

2015-12-09 Thread Thomas Huth
On 09/12/15 04:28, Paul Mackerras wrote:
> On Wed, Nov 04, 2015 at 10:03:48AM +0100, Thomas Huth wrote:
>> Only using 32 memslots for KVM on powerpc is way too low, you can
>> nowadays hit this limit quite fast by adding a couple of PCI devices
>> and/or pluggable memory DIMMs to the guest.
>> x86 already increased the limit to 512 in total, to satisfy 256
>> pluggable DIMM slots, 3 private slots and 253 slots for other things
>> like PCI devices. On powerpc, we only have 32 pluggable DIMMs in
> 
> I agree with increasing the limit.  Is there a reason we have only 32
> pluggable DIMMs in QEMU on powerpc, not more?  Should we be increasing
> that limit too?  If so, maybe we should increase the number of memory
> slots to 512?

H ... the comment before the #define in QEMU reads:

/*
 * This defines the maximum number of DIMM slots we can have for sPAPR
 * guest. This is not defined by sPAPR but we are defining it to 32 slots
 * based on default number of slots provided by PowerPC kernel.
 */
#define SPAPR_MAX_RAM_SLOTS 32

So as far as I can see, there's indeed a possibility that we'll
increase this value once the kernel can handle more slots!

So I guess you're right and we should play save and use more slots
right from the start. I'll send a new patch with 512 instead.

>> QEMU, not 256, so we likely do not as much slots as on x86. Thus
> 
> "so we likely do not need as many slots as on x86" would be better
> English.

Oops, I'm sure that was my original intention ... anyway, thanks for
pointing it out, it's always good to get some feedback as a non-native
English speaker.

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] KVM: PPC: Increase memslots to 320

2015-12-09 Thread Thomas Huth
On 09/12/15 04:28, Paul Mackerras wrote:
> On Wed, Nov 04, 2015 at 10:03:48AM +0100, Thomas Huth wrote:
>> Only using 32 memslots for KVM on powerpc is way too low, you can
>> nowadays hit this limit quite fast by adding a couple of PCI devices
>> and/or pluggable memory DIMMs to the guest.
>> x86 already increased the limit to 512 in total, to satisfy 256
>> pluggable DIMM slots, 3 private slots and 253 slots for other things
>> like PCI devices. On powerpc, we only have 32 pluggable DIMMs in
> 
> I agree with increasing the limit.  Is there a reason we have only 32
> pluggable DIMMs in QEMU on powerpc, not more?  Should we be increasing
> that limit too?  If so, maybe we should increase the number of memory
> slots to 512?

H ... the comment before the #define in QEMU reads:

/*
 * This defines the maximum number of DIMM slots we can have for sPAPR
 * guest. This is not defined by sPAPR but we are defining it to 32 slots
 * based on default number of slots provided by PowerPC kernel.
 */
#define SPAPR_MAX_RAM_SLOTS 32

So as far as I can see, there's indeed a possibility that we'll
increase this value once the kernel can handle more slots!

So I guess you're right and we should play save and use more slots
right from the start. I'll send a new patch with 512 instead.

>> QEMU, not 256, so we likely do not as much slots as on x86. Thus
> 
> "so we likely do not need as many slots as on x86" would be better
> English.

Oops, I'm sure that was my original intention ... anyway, thanks for
pointing it out, it's always good to get some feedback as a non-native
English speaker.

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: Increase memslots to 512

2015-12-09 Thread Thomas Huth
Only using 32 memslots for KVM on powerpc is way too low, you can
nowadays hit this limit quite fast by adding a couple of PCI devices
and/or pluggable memory DIMMs to the guest.

x86 already increased the KVM_USER_MEM_SLOTS to 509, to satisfy 256
pluggable DIMM slots, 3 private slots and 253 slots for other things
like PCI devices (i.e. resulting in 256 + 3 + 253 = 512 slots in
total). We should do something similar for powerpc, and since we do
not use private slots here, we can set the value to 512 directly.

While we're at it, also remove the KVM_MEM_SLOTS_NUM definition
from the powerpc-specific header since this gets defined in the
generic kvm_host.h header anyway.

Signed-off-by: Thomas Huth <th...@redhat.com>
---
 arch/powerpc/include/asm/kvm_host.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 887c259..2c96a72 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -38,8 +38,7 @@
 
 #define KVM_MAX_VCPUS  NR_CPUS
 #define KVM_MAX_VCORES NR_CPUS
-#define KVM_USER_MEM_SLOTS 32
-#define KVM_MEM_SLOTS_NUM KVM_USER_MEM_SLOTS
+#define KVM_USER_MEM_SLOTS 512
 
 #ifdef CONFIG_KVM_MMIO
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: Increase memslots to 512

2015-12-09 Thread Thomas Huth
Only using 32 memslots for KVM on powerpc is way too low, you can
nowadays hit this limit quite fast by adding a couple of PCI devices
and/or pluggable memory DIMMs to the guest.

x86 already increased the KVM_USER_MEM_SLOTS to 509, to satisfy 256
pluggable DIMM slots, 3 private slots and 253 slots for other things
like PCI devices (i.e. resulting in 256 + 3 + 253 = 512 slots in
total). We should do something similar for powerpc, and since we do
not use private slots here, we can set the value to 512 directly.

While we're at it, also remove the KVM_MEM_SLOTS_NUM definition
from the powerpc-specific header since this gets defined in the
generic kvm_host.h header anyway.

Signed-off-by: Thomas Huth <th...@redhat.com>
---
 arch/powerpc/include/asm/kvm_host.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 887c259..2c96a72 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -38,8 +38,7 @@
 
 #define KVM_MAX_VCPUS  NR_CPUS
 #define KVM_MAX_VCORES NR_CPUS
-#define KVM_USER_MEM_SLOTS 32
-#define KVM_MEM_SLOTS_NUM KVM_USER_MEM_SLOTS
+#define KVM_USER_MEM_SLOTS 512
 
 #ifdef CONFIG_KVM_MMIO
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8

2015-12-07 Thread Thomas Huth
On 20/11/15 09:11, Thomas Huth wrote:
> In the old DABR register, the BT (Breakpoint Translation) bit
> is bit number 61. In the new DAWRX register, the WT (Watchpoint
> Translation) bit is bit number 59. So to move the DABR-BT bit
> into the position of the DAWRX-WT bit, it has to be shifted by
> two, not only by one. This fixes hardware watchpoints in gdb of
> older guests that only use the H_SET_DABR/X interface instead
> of the new H_SET_MODE interface.
> 
> Signed-off-by: Thomas Huth <th...@redhat.com>
> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index b98889e..3983b87 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -2143,7 +2143,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  
>   /* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
>  2:   rlwimi  r5, r4, 5, DAWRX_DR | DAWRX_DW
> - rlwimi  r5, r4, 1, DAWRX_WT
> + rlwimi  r5, r4, 2, DAWRX_WT
>   clrrdi  r4, r4, 3
>   std r4, VCPU_DAWR(r3)
>   std r5, VCPU_DAWRX(r3)

Ping?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8

2015-12-07 Thread Thomas Huth
On 20/11/15 09:11, Thomas Huth wrote:
> In the old DABR register, the BT (Breakpoint Translation) bit
> is bit number 61. In the new DAWRX register, the WT (Watchpoint
> Translation) bit is bit number 59. So to move the DABR-BT bit
> into the position of the DAWRX-WT bit, it has to be shifted by
> two, not only by one. This fixes hardware watchpoints in gdb of
> older guests that only use the H_SET_DABR/X interface instead
> of the new H_SET_MODE interface.
> 
> Signed-off-by: Thomas Huth <th...@redhat.com>
> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index b98889e..3983b87 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -2143,7 +2143,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  
>   /* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
>  2:   rlwimi  r5, r4, 5, DAWRX_DR | DAWRX_DW
> - rlwimi  r5, r4, 1, DAWRX_WT
> + rlwimi  r5, r4, 2, DAWRX_WT
>   clrrdi  r4, r4, 3
>   std r4, VCPU_DAWR(r3)
>   std r5, VCPU_DAWRX(r3)

Ping?

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8

2015-11-20 Thread Thomas Huth
In the old DABR register, the BT (Breakpoint Translation) bit
is bit number 61. In the new DAWRX register, the WT (Watchpoint
Translation) bit is bit number 59. So to move the DABR-BT bit
into the position of the DAWRX-WT bit, it has to be shifted by
two, not only by one. This fixes hardware watchpoints in gdb of
older guests that only use the H_SET_DABR/X interface instead
of the new H_SET_MODE interface.

Signed-off-by: Thomas Huth <th...@redhat.com>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b98889e..3983b87 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2143,7 +2143,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
/* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
 2: rlwimi  r5, r4, 5, DAWRX_DR | DAWRX_DW
-   rlwimi  r5, r4, 1, DAWRX_WT
+   rlwimi  r5, r4, 2, DAWRX_WT
clrrdi  r4, r4, 3
std r4, VCPU_DAWR(r3)
std r5, VCPU_DAWRX(r3)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8

2015-11-20 Thread Thomas Huth
In the old DABR register, the BT (Breakpoint Translation) bit
is bit number 61. In the new DAWRX register, the WT (Watchpoint
Translation) bit is bit number 59. So to move the DABR-BT bit
into the position of the DAWRX-WT bit, it has to be shifted by
two, not only by one. This fixes hardware watchpoints in gdb of
older guests that only use the H_SET_DABR/X interface instead
of the new H_SET_MODE interface.

Signed-off-by: Thomas Huth <th...@redhat.com>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b98889e..3983b87 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2143,7 +2143,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
/* Emulate H_SET_DABR/X on P8 for the sake of compat mode guests */
 2: rlwimi  r5, r4, 5, DAWRX_DR | DAWRX_DW
-   rlwimi  r5, r4, 1, DAWRX_WT
+   rlwimi  r5, r4, 2, DAWRX_WT
clrrdi  r4, r4, 3
std r4, VCPU_DAWR(r3)
std r5, VCPU_DAWRX(r3)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] KVM: Provide function for VCPU lookup by id

2015-11-19 Thread Thomas Huth
On 19/11/15 09:37, Christian Borntraeger wrote:
> From: David Hildenbrand 
> 
> Let's provide a function to lookup a VCPU by id.
> 
> Reviewed-by: Christian Borntraeger 
> Reviewed-by: Dominik Dingel 
> Signed-off-by: David Hildenbrand 
> Signed-off-by: Christian Borntraeger 
> [split patch from refactoring patch]
> ---
>  include/linux/kvm_host.h | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 5706a21..b9f345f 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -460,6 +460,17 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm 
> *kvm, int i)
>(vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
>idx++)
>  
> +static inline struct kvm_vcpu *kvm_lookup_vcpu(struct kvm *kvm, int id)
> +{
> + struct kvm_vcpu *vcpu;
> + int i;
> +
> + kvm_for_each_vcpu(i, vcpu, kvm)
> + if (vcpu->vcpu_id == id)
> + return vcpu;
> + return NULL;
> +}
> +

Any chance that you name this function differently? Otherwise we've got
two functions that sound very similar and also have similar prototypes:

- kvm_get_vcpu(struct kvm *kvm, int i)
- kvm_lookup_vcpu(struct kvm *kvm, int id)

I'm pretty sure this will cause confusion in the future!
==> Could you maybe name the new function something like
"kvm_lookup_vcpu_by_id" or "kvm_get_vcpu_by_id" instead?

Also a short comment before the function would be nice to make the
reader aware of the difference (e.g. when hot-plugging) between the
vcpu-id and array-id.

Otherwise: +1 for finally having a proper common function for this!

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] KVM: Provide function for VCPU lookup by id

2015-11-19 Thread Thomas Huth
On 19/11/15 09:37, Christian Borntraeger wrote:
> From: David Hildenbrand 
> 
> Let's provide a function to lookup a VCPU by id.
> 
> Reviewed-by: Christian Borntraeger 
> Reviewed-by: Dominik Dingel 
> Signed-off-by: David Hildenbrand 
> Signed-off-by: Christian Borntraeger 
> [split patch from refactoring patch]
> ---
>  include/linux/kvm_host.h | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 5706a21..b9f345f 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -460,6 +460,17 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm 
> *kvm, int i)
>(vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
>idx++)
>  
> +static inline struct kvm_vcpu *kvm_lookup_vcpu(struct kvm *kvm, int id)
> +{
> + struct kvm_vcpu *vcpu;
> + int i;
> +
> + kvm_for_each_vcpu(i, vcpu, kvm)
> + if (vcpu->vcpu_id == id)
> + return vcpu;
> + return NULL;
> +}
> +

Any chance that you name this function differently? Otherwise we've got
two functions that sound very similar and also have similar prototypes:

- kvm_get_vcpu(struct kvm *kvm, int i)
- kvm_lookup_vcpu(struct kvm *kvm, int id)

I'm pretty sure this will cause confusion in the future!
==> Could you maybe name the new function something like
"kvm_lookup_vcpu_by_id" or "kvm_get_vcpu_by_id" instead?

Also a short comment before the function would be nice to make the
reader aware of the difference (e.g. when hot-plugging) between the
vcpu-id and array-id.

Otherwise: +1 for finally having a proper common function for this!

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] KVM: PPC: Increase memslots to 320

2015-11-18 Thread Thomas Huth
On 04/11/15 10:03, Thomas Huth wrote:
> Only using 32 memslots for KVM on powerpc is way too low, you can
> nowadays hit this limit quite fast by adding a couple of PCI devices
> and/or pluggable memory DIMMs to the guest.
> x86 already increased the limit to 512 in total, to satisfy 256
> pluggable DIMM slots, 3 private slots and 253 slots for other things
> like PCI devices. On powerpc, we only have 32 pluggable DIMMs in
> QEMU, not 256, so we likely do not as much slots as on x86. Thus
> setting the slot limit to 320 sounds like a good value for the
> time being (until we have some code in the future to resize the
> memslot array dynamically).
> And while we're at it, also remove the KVM_MEM_SLOTS_NUM definition
> from the powerpc-specific header since this gets defined in the
> generic kvm_host.h header anyway.
> 
> Signed-off-by: Thomas Huth <th...@redhat.com>
> ---
>  arch/powerpc/include/asm/kvm_host.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index 887c259..89d387a 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -38,8 +38,7 @@
>  
>  #define KVM_MAX_VCPUSNR_CPUS
>  #define KVM_MAX_VCORES   NR_CPUS
> -#define KVM_USER_MEM_SLOTS 32
> -#define KVM_MEM_SLOTS_NUM KVM_USER_MEM_SLOTS
> +#define KVM_USER_MEM_SLOTS 320
>  
>  #ifdef CONFIG_KVM_MMIO
>  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
> 

Ping! ... any comments on this one?

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-12 Thread Thomas Huth
On 13/11/15 07:26, Aravinda Prasad wrote:
> 
> On Friday 13 November 2015 07:20 AM, David Gibson wrote:
>> On Thu, Nov 12, 2015 at 11:22:29PM +0530, Aravinda Prasad wrote:
[...]
>>> So thinking whether qemu should explicitly enable the new NMI
>>> behavior.
>>
>> So, I think the reasoning above tends towards having qemu control the
>> MC behaviour.  If qemu does nothing, MCs are delivered direct to
>> 0x200, if it enables the new handling, they cause a KVM exit and qemu
>> will deliver the MC.
> 
> This essentially requires qemu to control how KVM behaves as KVM does
> the actual redirection of MC either to guest's 0x200 vector or to exit
> guest. So, if we are running new qemu, then KVM should exit guest and if
> we are running old qemu, KVM should redirect MC to 0x200. Is there any
> way to communicate this to KVM? ioctl?

Simply introduce a KVM capability that can be enabled by userspace.
See kvm_vcpu_ioctl_enable_cap() in arch/powerpc/kvm/powerpc.c.

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Exit guest upon fatal machine check exception

2015-11-12 Thread Thomas Huth
On 13/11/15 07:26, Aravinda Prasad wrote:
> 
> On Friday 13 November 2015 07:20 AM, David Gibson wrote:
>> On Thu, Nov 12, 2015 at 11:22:29PM +0530, Aravinda Prasad wrote:
[...]
>>> So thinking whether qemu should explicitly enable the new NMI
>>> behavior.
>>
>> So, I think the reasoning above tends towards having qemu control the
>> MC behaviour.  If qemu does nothing, MCs are delivered direct to
>> 0x200, if it enables the new handling, they cause a KVM exit and qemu
>> will deliver the MC.
> 
> This essentially requires qemu to control how KVM behaves as KVM does
> the actual redirection of MC either to guest's 0x200 vector or to exit
> guest. So, if we are running new qemu, then KVM should exit guest and if
> we are running old qemu, KVM should redirect MC to 0x200. Is there any
> way to communicate this to KVM? ioctl?

Simply introduce a KVM capability that can be enabled by userspace.
See kvm_vcpu_ioctl_enable_cap() in arch/powerpc/kvm/powerpc.c.

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH v2 02/19] trivial: lib: fail hard on failed mallocs

2015-11-09 Thread Thomas Huth
On 09/11/15 21:53, Andrew Jones wrote:
> It's pretty safe to not even bother checking for NULL when
> using malloc and friends, but if we do check, then fail
> hard.
> 
> Signed-off-by: Andrew Jones <drjo...@redhat.com>
> ---
> v2: no code in asserts [Thomas Huth]
> 
>  lib/virtio-mmio.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/virtio-mmio.c b/lib/virtio-mmio.c
> index 043832299174e..5ccbd193a264a 100644
> --- a/lib/virtio-mmio.c
> +++ b/lib/virtio-mmio.c
> @@ -54,8 +54,7 @@ static struct virtqueue *vm_setup_vq(struct virtio_device 
> *vdev,
>  
>   vq = calloc(1, sizeof(*vq));
>   queue = memalign(PAGE_SIZE, VIRTIO_MMIO_QUEUE_SIZE_MIN);
> - if (!vq || !queue)
> - return NULL;
> + assert(vq && queue);
>  
>   writel(index, vm_dev->base + VIRTIO_MMIO_QUEUE_SEL);
>  
> @@ -162,8 +161,7 @@ static struct virtio_device *virtio_mmio_dt_bind(u32 
> devid)
>   return NULL;
>  
>   vm_dev = calloc(1, sizeof(*vm_dev));
> - if (!vm_dev)
> - return NULL;
> + assert(vm_dev != NULL);
>  
>   vm_dev->base = info.base;
>   vm_device_init(vm_dev);

Reviewed-by: Thomas Huth <th...@redhat.com>


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 02/18] trivial: lib: fail hard on failed mallocs

2015-11-06 Thread Thomas Huth
On 06/11/15 01:24, Andrew Jones wrote:
> It's pretty safe to not even bother checking for NULL when
> using malloc and friends, but if we do check, then fail
> hard.
> 
> Signed-off-by: Andrew Jones 
> ---
>  lib/virtio-mmio.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/virtio-mmio.c b/lib/virtio-mmio.c
> index 043832299174e..1b6f0cc378b79 100644
> --- a/lib/virtio-mmio.c
> +++ b/lib/virtio-mmio.c
...
> @@ -161,9 +160,7 @@ static struct virtio_device *virtio_mmio_dt_bind(u32 
> devid)
>   if (node == -FDT_ERR_NOTFOUND)
>   return NULL;
>  
> - vm_dev = calloc(1, sizeof(*vm_dev));
> - if (!vm_dev)
> - return NULL;
> + assert((vm_dev = calloc(1, sizeof(*vm_dev))) != NULL);

Could you maybe write that as two statements? assert() is classically a
macro which could also be disabled, so if somebody introduces a switch
to "#define assert(...) /*nothing*/" in the future, that calloc call
would be completely gone!

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/1] KVM: PPC: Increase memslots to 320

2015-11-04 Thread Thomas Huth
Only using 32 memslots for KVM on powerpc is way too low, you can
nowadays hit this limit quite fast by adding a couple of PCI devices
and/or pluggable memory DIMMs to the guest.
x86 already increased the limit to 512 in total, to satisfy 256
pluggable DIMM slots, 3 private slots and 253 slots for other things
like PCI devices. On powerpc, we only have 32 pluggable DIMMs in
QEMU, not 256, so we likely do not as much slots as on x86. Thus
setting the slot limit to 320 sounds like a good value for the
time being (until we have some code in the future to resize the
memslot array dynamically).
And while we're at it, also remove the KVM_MEM_SLOTS_NUM definition
from the powerpc-specific header since this gets defined in the
generic kvm_host.h header anyway.

Signed-off-by: Thomas Huth <th...@redhat.com>
---
 arch/powerpc/include/asm/kvm_host.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 887c259..89d387a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -38,8 +38,7 @@
 
 #define KVM_MAX_VCPUS  NR_CPUS
 #define KVM_MAX_VCORES NR_CPUS
-#define KVM_USER_MEM_SLOTS 32
-#define KVM_MEM_SLOTS_NUM KVM_USER_MEM_SLOTS
+#define KVM_USER_MEM_SLOTS 320
 
 #ifdef CONFIG_KVM_MMIO
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/1] KVM: PPC: Increase memslots to 320

2015-11-04 Thread Thomas Huth
Only using 32 memslots for KVM on powerpc is way too low, you can
nowadays hit this limit quite fast by adding a couple of PCI devices
and/or pluggable memory DIMMs to the guest.
x86 already increased the limit to 512 in total, to satisfy 256
pluggable DIMM slots, 3 private slots and 253 slots for other things
like PCI devices. On powerpc, we only have 32 pluggable DIMMs in
QEMU, not 256, so we likely do not as much slots as on x86. Thus
setting the slot limit to 320 sounds like a good value for the
time being (until we have some code in the future to resize the
memslot array dynamically).
And while we're at it, also remove the KVM_MEM_SLOTS_NUM definition
from the powerpc-specific header since this gets defined in the
generic kvm_host.h header anyway.

Signed-off-by: Thomas Huth <th...@redhat.com>
---
 arch/powerpc/include/asm/kvm_host.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 887c259..89d387a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -38,8 +38,7 @@
 
 #define KVM_MAX_VCPUS  NR_CPUS
 #define KVM_MAX_VCORES NR_CPUS
-#define KVM_USER_MEM_SLOTS 32
-#define KVM_MEM_SLOTS_NUM KVM_USER_MEM_SLOTS
+#define KVM_USER_MEM_SLOTS 320
 
 #ifdef CONFIG_KVM_MMIO
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/1] KVM: PPC: Increase memslots

2015-11-04 Thread Thomas Huth
Now that the patch from Nikunj to support the KVM_CAP_NR_MEMSLOTS
capability on powerpc has been merged to kvm/next, we can/should
increase the amount of memslots on ppc, too.
Since nobody else sent a patch yet (as far as I can see), I'd like
to suggest to increase the slot number to 320 now. Why 320? Well,
x86 uses 509 (+3 internal slots), to give enough space for
256 pluggable DIMMs and 253 other slots (for PCI etc.).
On powerpc, QEMU only supports 32 pluggable DIMMs, so I think we
should be fine by using something around 253 + 32 slots + some few
extra ... so 320 sounds like a good value to me for the time
being (but in the long run, we should really make this dynamically
instead, I think).

Thomas Huth (1):
  KVM: PPC: Increase memslots to 320

 arch/powerpc/include/asm/kvm_host.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/1] KVM: PPC: Increase memslots

2015-11-04 Thread Thomas Huth
Now that the patch from Nikunj to support the KVM_CAP_NR_MEMSLOTS
capability on powerpc has been merged to kvm/next, we can/should
increase the amount of memslots on ppc, too.
Since nobody else sent a patch yet (as far as I can see), I'd like
to suggest to increase the slot number to 320 now. Why 320? Well,
x86 uses 509 (+3 internal slots), to give enough space for
256 pluggable DIMMs and 253 other slots (for PCI etc.).
On powerpc, QEMU only supports 32 pluggable DIMMs, so I think we
should be fine by using something around 253 + 32 slots + some few
extra ... so 320 sounds like a good value to me for the time
being (but in the long run, we should really make this dynamically
instead, I think).

Thomas Huth (1):
  KVM: PPC: Increase memslots to 320

 arch/powerpc/include/asm/kvm_host.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 00/14] ppc64: initial drop

2015-11-02 Thread Thomas Huth
On 03/08/15 16:41, Andrew Jones wrote:
> This series is the first series of a series of series that will
> bring support to kvm-unit-tests for ppc64, and eventually ppc64le.

 Hi Andrew,

may I ask about the current state of ppc64 support in the
kvm-unit-tests? Is there a newer version available than the one you
posted three months ago?

 Thanks,
  Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 00/14] ppc64: initial drop

2015-11-02 Thread Thomas Huth
On 03/08/15 16:41, Andrew Jones wrote:
> This series is the first series of a series of series that will
> bring support to kvm-unit-tests for ppc64, and eventually ppc64le.

 Hi Andrew,

may I ask about the current state of ppc64 support in the
kvm-unit-tests? Is there a newer version available than the one you
posted three months ago?

 Thanks,
  Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Book3S HV: Synthesize segment fault if SLB lookup fails

2015-11-02 Thread Thomas Huth
On 27/10/15 06:13, Paul Mackerras wrote:
> When handling a hypervisor data or instruction storage interrupt (HDSI
> or HISI), we look up the SLB entry for the address being accessed in
> order to translate the effective address to a virtual address which can
> be looked up in the guest HPT.  This lookup can occasionally fail due
> to the guest replacing an SLB entry without invalidating the evicted
> SLB entry.  In this situation an ERAT (effective to real address
> translation cache) entry can persist and be used by the hardware even
> though there is no longer a corresponding SLB entry.
> 
> Previously we would just deliver a data or instruction storage interrupt
> (DSI or ISI) to the guest in this case.  However, this is not correct
> and has been observed to cause guests to crash, typically with a
> data storage protection interrupt on a store to the vmemmap area.
> 
> Instead, what we do now is to synthesize a data or instruction segment
> interrupt.  That should cause the guest to reload an appropriate entry
> into the SLB and retry the faulting instruction.  If it still faults,
> we should find an appropriate SLB entry next time and be able to handle
> the fault.
> 
> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 20 
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index b1dab8d..3c6badc 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1749,7 +1749,8 @@ kvmppc_hdsi:
>   beq 3f
>   clrrdi  r0, r4, 28
>   PPC_SLBFEE_DOT(R5, R0)  /* if so, look up SLB */
> - bne 1f  /* if no SLB entry found */
> + li  r0, BOOK3S_INTERRUPT_DATA_SEGMENT
> + bne 7f  /* if no SLB entry found */
>  4:   std r4, VCPU_FAULT_DAR(r9)
>   stw r6, VCPU_FAULT_DSISR(r9)
>  
> @@ -1768,14 +1769,15 @@ kvmppc_hdsi:
>   cmpdi   r3, -2  /* MMIO emulation; need instr word */
>   beq 2f
>  
> - /* Synthesize a DSI for the guest */
> + /* Synthesize a DSI (or DSegI) for the guest */
>   ld  r4, VCPU_FAULT_DAR(r9)
>   mr  r6, r3
> -1:   mtspr   SPRN_DAR, r4
> +1:   li  r0, BOOK3S_INTERRUPT_DATA_STORAGE
>   mtspr   SPRN_DSISR, r6
> +7:   mtspr   SPRN_DAR, r4

I first though "hey, you're not setting DSISR for the DsegI now" ... but
then I saw in the PowerISA that DSISR is "undefined" for the DSegI, so
this should be fine, I think.

>   mtspr   SPRN_SRR0, r10
>   mtspr   SPRN_SRR1, r11
> - li  r10, BOOK3S_INTERRUPT_DATA_STORAGE
> + mr  r10, r0
>   bl  kvmppc_msr_interrupt
>  fast_interrupt_c_return:
>  6:   ld  r7, VCPU_CTR(r9)
> @@ -1823,7 +1825,8 @@ kvmppc_hisi:
>   beq 3f
>   clrrdi  r0, r10, 28
>   PPC_SLBFEE_DOT(R5, R0)  /* if so, look up SLB */
> - bne 1f  /* if no SLB entry found */
> + li  r0, BOOK3S_INTERRUPT_INST_SEGMENT
> + bne 7f  /* if no SLB entry found */
>  4:
>   /* Search the hash table. */
>   mr  r3, r9  /* vcpu pointer */
> @@ -1840,11 +1843,12 @@ kvmppc_hisi:
>   cmpdi   r3, -1  /* handle in kernel mode */
>   beq guest_exit_cont
>  
> - /* Synthesize an ISI for the guest */
> + /* Synthesize an ISI (or ISegI) for the guest */
>   mr  r11, r3
> -1:   mtspr   SPRN_SRR0, r10
> +1:   li  r0, BOOK3S_INTERRUPT_INST_STORAGE
> +7:   mtspr   SPRN_SRR0, r10
>   mtspr   SPRN_SRR1, r11
> - li  r10, BOOK3S_INTERRUPT_INST_STORAGE
> + mr  r10, r0
>   bl  kvmppc_msr_interrupt
>   b   fast_interrupt_c_return

As far as I can see (but I'm really not an expert here), the patch seems
to be fine. And we have a test running with it for almost 6 days now and
did not see any further guest crashes, so it seems to me like this is
the right fix for this issue! So if you like, you can add my
"Reviewed-by" or "Tested-by" to this patch.

 Thanks a lot for fixing this!
  Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Implement extension to report number of memslots

2015-10-27 Thread Thomas Huth
On 26/10/15 06:15, Paul Mackerras wrote:
> On Fri, Oct 16, 2015 at 08:41:31AM +0200, Thomas Huth wrote:
>> Yes, we'll likely need this soon! 32 slots are not enough...
> 
> Would anyone object if I raised the limit for PPC to 512 slots?

In the long run we should really make this somehow dynamically instead.
But as a first step, that should IMHO be fine. Question is whether we
immediately need 512 on PPC, too? QEMU for x86 features 256 pluggable
memory DIMM slots ("#define ACPI_MAX_RAM_SLOTS 256"), while PPC only has
32 ("#define SPAPR_MAX_RAM_SLOTS 32"). So maybe we are fine with less
memory slots on PPC, e.g. 300 or so?

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Implement extension to report number of memslots

2015-10-27 Thread Thomas Huth
On 26/10/15 06:15, Paul Mackerras wrote:
> On Fri, Oct 16, 2015 at 08:41:31AM +0200, Thomas Huth wrote:
>> Yes, we'll likely need this soon! 32 slots are not enough...
> 
> Would anyone object if I raised the limit for PPC to 512 slots?

In the long run we should really make this somehow dynamically instead.
But as a first step, that should IMHO be fine. Question is whether we
immediately need 512 on PPC, too? QEMU for x86 features 256 pluggable
memory DIMM slots ("#define ACPI_MAX_RAM_SLOTS 256"), while PPC only has
32 ("#define SPAPR_MAX_RAM_SLOTS 32"). So maybe we are fine with less
memory slots on PPC, e.g. 300 or so?

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Implement extension to report number of memslots

2015-10-16 Thread Thomas Huth
On 16/10/15 06:57, Nikunj A Dadhania wrote:
> QEMU assumes 32 memslots if this extension is not implemented. Although,
> current value of KVM_USER_MEM_SLOTS is 32, once KVM_USER_MEM_SLOTS
> changes QEMU would take a wrong value.
> 
> Signed-off-by: Nikunj A Dadhania <nik...@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kvm/powerpc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 2e51289..6fd2405 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -559,6 +559,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   else
>   r = num_online_cpus();
>   break;
> + case KVM_CAP_NR_MEMSLOTS:
> + r = KVM_USER_MEM_SLOTS;
> + break;
>   case KVM_CAP_MAX_VCPUS:
>   r = KVM_MAX_VCPUS;
>   break;

Yes, we'll likely need this soon! 32 slots are not enough...

Reviewed-by: Thomas Huth <th...@redhat.com>

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: PPC: Implement extension to report number of memslots

2015-10-16 Thread Thomas Huth
On 16/10/15 06:57, Nikunj A Dadhania wrote:
> QEMU assumes 32 memslots if this extension is not implemented. Although,
> current value of KVM_USER_MEM_SLOTS is 32, once KVM_USER_MEM_SLOTS
> changes QEMU would take a wrong value.
> 
> Signed-off-by: Nikunj A Dadhania <nik...@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kvm/powerpc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 2e51289..6fd2405 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -559,6 +559,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   else
>   r = num_online_cpus();
>   break;
> + case KVM_CAP_NR_MEMSLOTS:
> + r = KVM_USER_MEM_SLOTS;
> + break;
>   case KVM_CAP_MAX_VCPUS:
>   r = KVM_MAX_VCPUS;
>   break;

Yes, we'll likely need this soon! 32 slots are not enough...

Reviewed-by: Thomas Huth <th...@redhat.com>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] ppc/spapr: Implement H_RANDOM hypercall in QEMU

2015-09-21 Thread Thomas Huth
On 21/09/15 04:10, David Gibson wrote:
> On Fri, Sep 18, 2015 at 11:05:52AM +0200, Greg Kurz wrote:
>> On Thu, 17 Sep 2015 10:49:41 +0200
>> Thomas Huth <th...@redhat.com> wrote:
>>
>>> The PAPR interface defines a hypercall to pass high-quality
>>> hardware generated random numbers to guests. Recent kernels can
>>> already provide this hypercall to the guest if the right hardware
>>> random number generator is available. But in case the user wants
>>> to use another source like EGD, or QEMU is running with an older
>>> kernel, we should also have this call in QEMU, so that guests that
>>> do not support virtio-rng yet can get good random numbers, too.
>>>
>>> This patch now adds a new pseudo-device to QEMU that either
>>> directly provides this hypercall to the guest or is able to
>>> enable the in-kernel hypercall if available. The in-kernel
>>> hypercall can be enabled with the use-kvm property, e.g.:
>>>
>>>  qemu-system-ppc64 -device spapr-rng,use-kvm=true
>>>
>>> For handling the hypercall in QEMU instead, a "RngBackend" is
>>> required since the hypercall should provide "good" random data
>>> instead of pseudo-random (like from a "simple" library function
>>> like rand() or g_random_int()). Since there are multiple RngBackends
>>> available, the user must select an appropriate back-end via the
>>> "rng" property of the device, e.g.:
>>>
>>>  qemu-system-ppc64 -object rng-random,filename=/dev/hwrng,id=gid0 \
>>>-device spapr-rng,rng=gid0 ...
>>>
>>> See http://wiki.qemu-project.org/Features-Done/VirtIORNG for
>>> other example of specifying RngBackends.
>>>
>>> Signed-off-by: Thomas Huth <th...@redhat.com>
>>> ---
>>
>> It is a good thing that the user can choose between in-kernel and backend,
>> and this patch does the work.
>>
>> This being said, I am not sure about the use case where a user has a hwrng
>> capable platform and wants to run guests without any hwrng support at all is
>> an appropriate default behavior... I guess we will find more users that want
>> in-kernel being the default if it is available.
>>
>> The patch below modifies yours to do just this: the pseudo-device is only
>> created if hwrng is present and not already created.
> 
> I have mixed feelings about this.  On the one hand, I agree that it
> would be nice to allow H_RANDOM support by default.  On the other hand
> the patch below leaves no way to turn it off for testing purposes.  It
> also adds another place where the guest hardware depends on the host
> configuration, which adds to the already substantial mess of ensuring
> that source and destination hardware configuration matches for
> migration.

I thought about this question on the weekend and came to the same
conclusion. I think if we want to enable this by default, it likely
should rather be done at the libvirt level instead?

 Thomas





signature.asc
Description: OpenPGP digital signature


Re: [PATCH] KVM: PPC: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()

2015-09-21 Thread Thomas Huth
On 21/09/15 03:37, David Gibson wrote:
> On Fri, Sep 18, 2015 at 08:57:28AM +0200, Thomas Huth wrote:
>> Access to the kvm->buses (like with the kvm_io_bus_read() and -write()
>> functions) has to be protected via the kvm->srcu lock.
>> The kvmppc_h_logical_ci_load() and -store() functions are missing
>> this lock so far, so let's add it there, too.
>> This fixes the problem that the kernel reports "suspicious RCU usage"
>> when lock debugging is enabled.
>>
>> Fixes: 99342cf8044420eebdf9297ca03a14cb6a7085a1
>> Signed-off-by: Thomas Huth <th...@redhat.com>
> 
> Nice catch.  Looks like I missed this because the places
> kvm_io_bus_{read,write}() are called on x86 are buried about 5 layers
> below where the srcu lock is taken :/.

AFAIK the philosophy for taking the srcu lock is completely different
between powerpc and x86. On powerpc it is only taken when needed (and
released immediately afterwards), while the x86 code tries to hold it
the whole time while not being in the guest and not being in userspace.
See vcpu_enter_guest() in the x86 code for example, the lock is dropped
before entering the guest, and taken again before leaving this function.

 Thomas




signature.asc
Description: OpenPGP digital signature


Re: [PATCH] KVM: PPC: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()

2015-09-21 Thread Thomas Huth
On 21/09/15 03:37, David Gibson wrote:
> On Fri, Sep 18, 2015 at 08:57:28AM +0200, Thomas Huth wrote:
>> Access to the kvm->buses (like with the kvm_io_bus_read() and -write()
>> functions) has to be protected via the kvm->srcu lock.
>> The kvmppc_h_logical_ci_load() and -store() functions are missing
>> this lock so far, so let's add it there, too.
>> This fixes the problem that the kernel reports "suspicious RCU usage"
>> when lock debugging is enabled.
>>
>> Fixes: 99342cf8044420eebdf9297ca03a14cb6a7085a1
>> Signed-off-by: Thomas Huth <th...@redhat.com>
> 
> Nice catch.  Looks like I missed this because the places
> kvm_io_bus_{read,write}() are called on x86 are buried about 5 layers
> below where the srcu lock is taken :/.

AFAIK the philosophy for taking the srcu lock is completely different
between powerpc and x86. On powerpc it is only taken when needed (and
released immediately afterwards), while the x86 code tries to hold it
the whole time while not being in the guest and not being in userspace.
See vcpu_enter_guest() in the x86 code for example, the lock is dropped
before entering the guest, and taken again before leaving this function.

 Thomas




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v4] ppc/spapr: Implement H_RANDOM hypercall in QEMU

2015-09-21 Thread Thomas Huth
On 21/09/15 10:01, Greg Kurz wrote:
> On Mon, 21 Sep 2015 12:10:00 +1000
> David Gibson <da...@gibson.dropbear.id.au> wrote:
> 
>> On Fri, Sep 18, 2015 at 11:05:52AM +0200, Greg Kurz wrote:
>>> On Thu, 17 Sep 2015 10:49:41 +0200
>>> Thomas Huth <th...@redhat.com> wrote:
>>>
>>>> The PAPR interface defines a hypercall to pass high-quality
>>>> hardware generated random numbers to guests. Recent kernels can
>>>> already provide this hypercall to the guest if the right hardware
>>>> random number generator is available. But in case the user wants
>>>> to use another source like EGD, or QEMU is running with an older
>>>> kernel, we should also have this call in QEMU, so that guests that
>>>> do not support virtio-rng yet can get good random numbers, too.
>>>>
>>>> This patch now adds a new pseudo-device to QEMU that either
>>>> directly provides this hypercall to the guest or is able to
>>>> enable the in-kernel hypercall if available.
...
>>> It is a good thing that the user can choose between in-kernel and backend,
>>> and this patch does the work.
>>>
>>> This being said, I am not sure about the use case where a user has a hwrng
>>> capable platform and wants to run guests without any hwrng support at all is
>>> an appropriate default behavior... I guess we will find more users that want
>>> in-kernel being the default if it is available.
>>>
>>> The patch below modifies yours to do just this: the pseudo-device is only
>>> created if hwrng is present and not already created.
>>
>> I have mixed feelings about this.  On the one hand, I agree that it
>> would be nice to allow H_RANDOM support by default.  On the other hand
>> the patch below leaves no way to turn it off for testing purposes.  It
>> also adds another place where the guest hardware depends on the host
>> configuration, which adds to the already substantial mess of ensuring
>> that source and destination hardware configuration matches for
>> migration.
> 
> Yeah, describing the guest hw is really essential for migration... this
> is best addressed at the libvirt level with a full XML description of
> the machine... but FWIW if we are talking about running pseries on a
> POWER8 or newer host, I am not aware about "hwrng-less" boards... but
> I am probably missing something :)

Maybe it would be at least ok to enable it by default as long as
"-nodefaults" has not been specified as command line option?

> Back to Thomas' patch, it does the job and brings H_RANDOM, which is
> currently missing.
> 
> Acked-by: Greg Kurz <gk...@linux.vnet.ibm.com>
> 
> I could test both use-kvm and backend flavors (including migration).
> 
> Tested-by: Greg Kurz <gk...@linux.vnet.ibm.com>

Thanks!

 Thomas




signature.asc
Description: OpenPGP digital signature


Re: [PATCH] KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs

2015-09-18 Thread Thomas Huth
On 18/09/15 08:57, Paul Mackerras wrote:
> This fixes a bug which results in stale vcore pointers being left in
> the per-cpu preempted vcore lists when a VM is destroyed.  The result
> of the stale vcore pointers is usually either a crash or a lockup
> inside collect_piggybacks() when another VM is run.  A typical
> lockup message looks like:
> 
> [  472.161074] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! 
> [qemu-system-ppc:7039]
> [  472.161204] Modules linked in: kvm_hv kvm_pr kvm xt_CHECKSUM 
> ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT 
> nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc 
> ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 
> nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter 
> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
> nf_conntrack iptable_mangle iptable_security iptable_raw ses enclosure shpchp 
> rtc_opal i2c_opal powernv_rng binfmt_misc dm_service_time scsi_dh_alua radeon 
> i2c_algo_bit drm_kms_helper ttm drm tg3 ptp pps_core cxgb3 ipr i2c_core mdio 
> dm_multipath [last unloaded: kvm_hv]
> [  472.162111] CPU: 24 PID: 7039 Comm: qemu-system-ppc Not tainted 4.2.0-kvm+ 
> #49
> [  472.162187] task: c01e38512750 ti: c01e41bfc000 task.ti: 
> c01e41bfc000
> [  472.162262] NIP: c096b094 LR: c096b08c CTR: 
> c030
> [  472.162337] REGS: c01e41bff520 TRAP: 0901   Not tainted  (4.2.0-kvm+)
> [  472.162399] MSR: 90019033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24848844  
> XER: 
> [  472.162588] CFAR: c096b0ac SOFTE: 1
> GPR00: c070 c01e41bff7a0 c127df00 0001
> GPR04: 0003 0001  00874821
> GPR08: c01e41bff8e0 0001  defde740
> GPR12: c030 cfdae400
> [  472.163053] NIP [c096b094] _raw_spin_lock_irqsave+0xa4/0x130
> [  472.163117] LR [c096b08c] _raw_spin_lock_irqsave+0x9c/0x130
> [  472.163179] Call Trace:
> [  472.163206] [c01e41bff7a0] [c01e41bff7f0] 0xc01e41bff7f0 
> (unreliable)
> [  472.163295] [c01e41bff7e0] [c070] __wake_up+0x40/0x90
> [  472.163375] [c01e41bff830] [defd6fc0] 
> kvmppc_run_core+0x1240/0x1950 [kvm_hv]
> [  472.163465] [c01e41bffa30] [defd8510] 
> kvmppc_vcpu_run_hv+0x5a0/0xd90 [kvm_hv]
> [  472.163559] [c01e41bffb70] [de9318a4] 
> kvmppc_vcpu_run+0x44/0x60 [kvm]
> [  472.163653] [c01e41bffba0] [de92e674] 
> kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
> [  472.163745] [c01e41bffbe0] [de9263a8] 
> kvm_vcpu_ioctl+0x538/0x7b0 [kvm]
> [  472.163834] [c01e41bffd40] [c02d0f50] do_vfs_ioctl+0x480/0x7c0
> [  472.163910] [c01e41bffde0] [c02d1364] SyS_ioctl+0xd4/0xf0
> [  472.163986] [c01e41bffe30] [c0009260] system_call+0x38/0xd0
> [  472.164060] Instruction dump:
> [  472.164098] ebc1fff0 ebe1fff8 7c0803a6 4e800020 6000 6000 6042 
> 8bad02e2
> [  472.164224] 7fc3f378 4b6a57c1 6000 7c210b78  89290009 
> 792affe3 40820070
> 
> The bug is that kvmppc_run_vcpu does not correctly handle the case
> where a vcpu task receives a signal while its guest vcpu is executing
> in the guest as a result of being piggy-backed onto the execution of
> another vcore.  In that case we need to wait for the vcpu to finish
> executing inside the guest, and then remove this vcore from the
> preempted vcores list.  That way, we avoid leaving this vcpu's vcore
> on the preempted vcores list when the vcpu gets interrupted.
> 
> Fixes: ec2571650826
> Reported-by: Thomas Huth <th...@redhat.com>
> Signed-off-by: Paul Mackerras <pau...@samba.org>
> ---
>  arch/powerpc/kvm/book3s_hv.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 9754e68..2280497 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -2692,9 +2692,13 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
> struct kvm_vcpu *vcpu)
>  
>   while (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE &&
>  (vc->vcore_state == VCORE_RUNNING ||
> - vc->vcore_state == VCORE_EXITING))
> + vc->vcore_state == VCORE_EXITING ||
> + vc->vcore_state == VCORE_PIGGYBACK))
>   kvmppc_wait_for_exec(vc, vcpu, TASK_UNINTERRUPTIBLE);
>  
> + if (vc->vcore_state == VCORE_PREEMPT && vc->runner == NULL)
> + kvmppc_vcore_end_preempt(vc);
> +
>   if (vcpu-&

Re: [PATCH] KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs

2015-09-18 Thread Thomas Huth
On 18/09/15 08:57, Paul Mackerras wrote:
> This fixes a bug which results in stale vcore pointers being left in
> the per-cpu preempted vcore lists when a VM is destroyed.  The result
> of the stale vcore pointers is usually either a crash or a lockup
> inside collect_piggybacks() when another VM is run.  A typical
> lockup message looks like:
> 
> [  472.161074] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! 
> [qemu-system-ppc:7039]
> [  472.161204] Modules linked in: kvm_hv kvm_pr kvm xt_CHECKSUM 
> ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT 
> nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc 
> ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 
> nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter 
> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
> nf_conntrack iptable_mangle iptable_security iptable_raw ses enclosure shpchp 
> rtc_opal i2c_opal powernv_rng binfmt_misc dm_service_time scsi_dh_alua radeon 
> i2c_algo_bit drm_kms_helper ttm drm tg3 ptp pps_core cxgb3 ipr i2c_core mdio 
> dm_multipath [last unloaded: kvm_hv]
> [  472.162111] CPU: 24 PID: 7039 Comm: qemu-system-ppc Not tainted 4.2.0-kvm+ 
> #49
> [  472.162187] task: c01e38512750 ti: c01e41bfc000 task.ti: 
> c01e41bfc000
> [  472.162262] NIP: c096b094 LR: c096b08c CTR: 
> c030
> [  472.162337] REGS: c01e41bff520 TRAP: 0901   Not tainted  (4.2.0-kvm+)
> [  472.162399] MSR: 90019033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24848844  
> XER: 
> [  472.162588] CFAR: c096b0ac SOFTE: 1
> GPR00: c070 c01e41bff7a0 c127df00 0001
> GPR04: 0003 0001  00874821
> GPR08: c01e41bff8e0 0001  defde740
> GPR12: c030 cfdae400
> [  472.163053] NIP [c096b094] _raw_spin_lock_irqsave+0xa4/0x130
> [  472.163117] LR [c096b08c] _raw_spin_lock_irqsave+0x9c/0x130
> [  472.163179] Call Trace:
> [  472.163206] [c01e41bff7a0] [c01e41bff7f0] 0xc01e41bff7f0 
> (unreliable)
> [  472.163295] [c01e41bff7e0] [c070] __wake_up+0x40/0x90
> [  472.163375] [c01e41bff830] [defd6fc0] 
> kvmppc_run_core+0x1240/0x1950 [kvm_hv]
> [  472.163465] [c01e41bffa30] [defd8510] 
> kvmppc_vcpu_run_hv+0x5a0/0xd90 [kvm_hv]
> [  472.163559] [c01e41bffb70] [de9318a4] 
> kvmppc_vcpu_run+0x44/0x60 [kvm]
> [  472.163653] [c01e41bffba0] [de92e674] 
> kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
> [  472.163745] [c01e41bffbe0] [de9263a8] 
> kvm_vcpu_ioctl+0x538/0x7b0 [kvm]
> [  472.163834] [c01e41bffd40] [c02d0f50] do_vfs_ioctl+0x480/0x7c0
> [  472.163910] [c01e41bffde0] [c02d1364] SyS_ioctl+0xd4/0xf0
> [  472.163986] [c01e41bffe30] [c0009260] system_call+0x38/0xd0
> [  472.164060] Instruction dump:
> [  472.164098] ebc1fff0 ebe1fff8 7c0803a6 4e800020 6000 6000 6042 
> 8bad02e2
> [  472.164224] 7fc3f378 4b6a57c1 6000 7c210b78  89290009 
> 792affe3 40820070
> 
> The bug is that kvmppc_run_vcpu does not correctly handle the case
> where a vcpu task receives a signal while its guest vcpu is executing
> in the guest as a result of being piggy-backed onto the execution of
> another vcore.  In that case we need to wait for the vcpu to finish
> executing inside the guest, and then remove this vcore from the
> preempted vcores list.  That way, we avoid leaving this vcpu's vcore
> on the preempted vcores list when the vcpu gets interrupted.
> 
> Fixes: ec2571650826
> Reported-by: Thomas Huth <th...@redhat.com>
> Signed-off-by: Paul Mackerras <pau...@samba.org>
> ---
>  arch/powerpc/kvm/book3s_hv.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 9754e68..2280497 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -2692,9 +2692,13 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, 
> struct kvm_vcpu *vcpu)
>  
>   while (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE &&
>  (vc->vcore_state == VCORE_RUNNING ||
> - vc->vcore_state == VCORE_EXITING))
> + vc->vcore_state == VCORE_EXITING ||
> + vc->vcore_state == VCORE_PIGGYBACK))
>   kvmppc_wait_for_exec(vc, vcpu, TASK_UNINTERRUPTIBLE);
>  
> + if (vc->vcore_state == VCORE_PREEMPT && vc->runner == NULL)
> + kvmppc_vcore_end_preempt(vc);
> +
>   if (vc

[PATCH] KVM: PPC: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()

2015-09-18 Thread Thomas Huth
Access to the kvm->buses (like with the kvm_io_bus_read() and -write()
functions) has to be protected via the kvm->srcu lock.
The kvmppc_h_logical_ci_load() and -store() functions are missing
this lock so far, so let's add it there, too.
This fixes the problem that the kernel reports "suspicious RCU usage"
when lock debugging is enabled.

Fixes: 99342cf8044420eebdf9297ca03a14cb6a7085a1
Signed-off-by: Thomas Huth <th...@redhat.com>
---
 arch/powerpc/kvm/book3s.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index d75bf32..096e5eb 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -828,12 +828,15 @@ int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu)
unsigned long size = kvmppc_get_gpr(vcpu, 4);
unsigned long addr = kvmppc_get_gpr(vcpu, 5);
u64 buf;
+   int srcu_idx;
int ret;
 
if (!is_power_of_2(size) || (size > sizeof(buf)))
return H_TOO_HARD;
 
+   srcu_idx = srcu_read_lock(>kvm->srcu);
ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, size, );
+   srcu_read_unlock(>kvm->srcu, srcu_idx);
if (ret != 0)
return H_TOO_HARD;
 
@@ -868,6 +871,7 @@ int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
unsigned long addr = kvmppc_get_gpr(vcpu, 5);
unsigned long val = kvmppc_get_gpr(vcpu, 6);
u64 buf;
+   int srcu_idx;
int ret;
 
switch (size) {
@@ -891,7 +895,9 @@ int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
return H_TOO_HARD;
}
 
+   srcu_idx = srcu_read_lock(>kvm->srcu);
ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, size, );
+   srcu_read_unlock(>kvm->srcu, srcu_idx);
if (ret != 0)
return H_TOO_HARD;
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()

2015-09-18 Thread Thomas Huth
Access to the kvm->buses (like with the kvm_io_bus_read() and -write()
functions) has to be protected via the kvm->srcu lock.
The kvmppc_h_logical_ci_load() and -store() functions are missing
this lock so far, so let's add it there, too.
This fixes the problem that the kernel reports "suspicious RCU usage"
when lock debugging is enabled.

Fixes: 99342cf8044420eebdf9297ca03a14cb6a7085a1
Signed-off-by: Thomas Huth <th...@redhat.com>
---
 arch/powerpc/kvm/book3s.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index d75bf32..096e5eb 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -828,12 +828,15 @@ int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu)
unsigned long size = kvmppc_get_gpr(vcpu, 4);
unsigned long addr = kvmppc_get_gpr(vcpu, 5);
u64 buf;
+   int srcu_idx;
int ret;
 
if (!is_power_of_2(size) || (size > sizeof(buf)))
return H_TOO_HARD;
 
+   srcu_idx = srcu_read_lock(>kvm->srcu);
ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, size, );
+   srcu_read_unlock(>kvm->srcu, srcu_idx);
if (ret != 0)
return H_TOO_HARD;
 
@@ -868,6 +871,7 @@ int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
unsigned long addr = kvmppc_get_gpr(vcpu, 5);
unsigned long val = kvmppc_get_gpr(vcpu, 6);
u64 buf;
+   int srcu_idx;
int ret;
 
switch (size) {
@@ -891,7 +895,9 @@ int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
return H_TOO_HARD;
}
 
+   srcu_idx = srcu_read_lock(>kvm->srcu);
ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, size, );
+   srcu_read_unlock(>kvm->srcu, srcu_idx);
if (ret != 0)
return H_TOO_HARD;
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: suspicious RCU usage with kvm_pr

2015-09-18 Thread Thomas Huth
On 16/09/15 12:59, Denis Kirjanov wrote:
> On 9/16/15, Thomas Huth <th...@redhat.com> wrote:
>> On 16/09/15 10:51, Denis Kirjanov wrote:
>>> Hi,
>>>
>>> I see the following trace on qemu startup (ps700 blade):
>>>
>>> v4.2-11169-g64d1def
>>>
>>>
>>> [  143.369638] ===
>>> [  143.369640] [ INFO: suspicious RCU usage. ]
>>> [  143.369643] 4.2.0-11169-g64d1def #10 Tainted: G S
>>> [  143.369645] ---
>>> [  143.369647] arch/powerpc/kvm/../../../virt/kvm/kvm_main.c:3310
>>> suspicious rcu_dereference_check() usage!
>>> [  143.369649]
>>> other info that might help us debug this:
>>>
>>> [  143.369652]
>>> rcu_scheduler_active = 1, debug_locks = 1
>>> [  143.369655] 1 lock held by qemu-system-ppc/2292:
>>> [  143.369656]  #0:  (>mutex){+.+.+.}, at: []
>>> .vcpu_load+0x2c/0xb0 [kvm]
>>> [  143.369672]
>>> stack backtrace:
>>> [  143.369675] CPU: 12 PID: 2292 Comm: qemu-system-ppc Tainted: G S
>>>   4.2.0-11169-g64d1def #10
>>> [  143.369677] Call Trace:
>>> [  143.369682] [c001d08bf200] [c0816dd0]
>>> .dump_stack+0x98/0xd4 (unreliable)
>>> [  143.369687] [c001d08bf280] [c00f7058]
>>> .lockdep_rcu_suspicious+0x108/0x170
>>> [  143.369696] [c001d08bf310] [d42296d8]
>>> .kvm_io_bus_read+0x1d8/0x220 [kvm]
>>> [  143.369705] [c001d08bf3c0] [d422f980]
>>> .kvmppc_h_logical_ci_load+0x60/0xe0 [kvm]
>>
>> Could it be that we need to srcu_read_lock(>kvm->srcu) before
>> calling the kvm_io_bus_read/write() function in the
>> kvmppc_h_logical_ci_load/store() function?
> 
> I haven't had time to dig into this. I'll try it.

FYI, I had the same problem with kvm_hv, so I tried to come up with a patch:

https://patchwork.ozlabs.org/patch/519143/

Sorry, forgot to CC: you there, but it would be great if you could give
it a try!

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: suspicious RCU usage with kvm_pr

2015-09-18 Thread Thomas Huth
On 16/09/15 12:59, Denis Kirjanov wrote:
> On 9/16/15, Thomas Huth <th...@redhat.com> wrote:
>> On 16/09/15 10:51, Denis Kirjanov wrote:
>>> Hi,
>>>
>>> I see the following trace on qemu startup (ps700 blade):
>>>
>>> v4.2-11169-g64d1def
>>>
>>>
>>> [  143.369638] ===
>>> [  143.369640] [ INFO: suspicious RCU usage. ]
>>> [  143.369643] 4.2.0-11169-g64d1def #10 Tainted: G S
>>> [  143.369645] ---
>>> [  143.369647] arch/powerpc/kvm/../../../virt/kvm/kvm_main.c:3310
>>> suspicious rcu_dereference_check() usage!
>>> [  143.369649]
>>> other info that might help us debug this:
>>>
>>> [  143.369652]
>>> rcu_scheduler_active = 1, debug_locks = 1
>>> [  143.369655] 1 lock held by qemu-system-ppc/2292:
>>> [  143.369656]  #0:  (>mutex){+.+.+.}, at: []
>>> .vcpu_load+0x2c/0xb0 [kvm]
>>> [  143.369672]
>>> stack backtrace:
>>> [  143.369675] CPU: 12 PID: 2292 Comm: qemu-system-ppc Tainted: G S
>>>   4.2.0-11169-g64d1def #10
>>> [  143.369677] Call Trace:
>>> [  143.369682] [c001d08bf200] [c0816dd0]
>>> .dump_stack+0x98/0xd4 (unreliable)
>>> [  143.369687] [c001d08bf280] [c00f7058]
>>> .lockdep_rcu_suspicious+0x108/0x170
>>> [  143.369696] [c001d08bf310] [d42296d8]
>>> .kvm_io_bus_read+0x1d8/0x220 [kvm]
>>> [  143.369705] [c001d08bf3c0] [d422f980]
>>> .kvmppc_h_logical_ci_load+0x60/0xe0 [kvm]
>>
>> Could it be that we need to srcu_read_lock(>kvm->srcu) before
>> calling the kvm_io_bus_read/write() function in the
>> kvmppc_h_logical_ci_load/store() function?
> 
> I haven't had time to dig into this. I'll try it.

FYI, I had the same problem with kvm_hv, so I tried to come up with a patch:

https://patchwork.ozlabs.org/patch/519143/

Sorry, forgot to CC: you there, but it would be great if you could give
it a try!

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4] ppc/spapr: Implement H_RANDOM hypercall in QEMU

2015-09-17 Thread Thomas Huth
The PAPR interface defines a hypercall to pass high-quality
hardware generated random numbers to guests. Recent kernels can
already provide this hypercall to the guest if the right hardware
random number generator is available. But in case the user wants
to use another source like EGD, or QEMU is running with an older
kernel, we should also have this call in QEMU, so that guests that
do not support virtio-rng yet can get good random numbers, too.

This patch now adds a new pseudo-device to QEMU that either
directly provides this hypercall to the guest or is able to
enable the in-kernel hypercall if available. The in-kernel
hypercall can be enabled with the use-kvm property, e.g.:

 qemu-system-ppc64 -device spapr-rng,use-kvm=true

For handling the hypercall in QEMU instead, a "RngBackend" is
required since the hypercall should provide "good" random data
instead of pseudo-random (like from a "simple" library function
like rand() or g_random_int()). Since there are multiple RngBackends
available, the user must select an appropriate back-end via the
"rng" property of the device, e.g.:

 qemu-system-ppc64 -object rng-random,filename=/dev/hwrng,id=gid0 \
   -device spapr-rng,rng=gid0 ...

See http://wiki.qemu-project.org/Features-Done/VirtIORNG for
other example of specifying RngBackends.

Signed-off-by: Thomas Huth <th...@redhat.com>
---
 hw/ppc/Makefile.objs   |   2 +-
 hw/ppc/spapr.c |   8 +++
 hw/ppc/spapr_rng.c | 186 +
 include/hw/ppc/spapr.h |   4 ++
 target-ppc/kvm.c   |   9 +++
 target-ppc/kvm_ppc.h   |   5 ++
 6 files changed, 213 insertions(+), 1 deletion(-)
 create mode 100644 hw/ppc/spapr_rng.c

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index c8ab06e..c1ffc77 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -3,7 +3,7 @@ obj-y += ppc.o ppc_booke.o
 # IBM pSeries (sPAPR)
 obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
-obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o
+obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index bf0c64f..34e7d24 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -768,6 +768,14 @@ static void spapr_finalize_fdt(sPAPRMachineState *spapr,
 exit(1);
 }
 
+if (object_resolve_path_type("", TYPE_SPAPR_RNG, NULL)) {
+ret = spapr_rng_populate_dt(fdt);
+if (ret < 0) {
+fprintf(stderr, "could not set up rng device in the fdt\n");
+exit(1);
+}
+}
+
 QLIST_FOREACH(phb, >phbs, list) {
 ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt);
 }
diff --git a/hw/ppc/spapr_rng.c b/hw/ppc/spapr_rng.c
new file mode 100644
index 000..ed43d5e
--- /dev/null
+++ b/hw/ppc/spapr_rng.c
@@ -0,0 +1,186 @@
+/*
+ * QEMU sPAPR random number generator "device" for H_RANDOM hypercall
+ *
+ * Copyright 2015 Thomas Huth, Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/error-report.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/device_tree.h"
+#include "sysemu/rng.h"
+#include "hw/ppc/spapr.h"
+#include "kvm_ppc.h"
+
+#define SPAPR_RNG(obj) \
+OBJECT_CHECK(sPAPRRngState, (obj), TYPE_SPAPR_RNG)
+
+struct sPAPRRngState {
+/*< private >*/
+DeviceState ds;
+RngBackend *backend;
+bool use_kvm;
+};
+typedef struct sPAPRRngState sPAPRRngState;
+
+struct HRandomData {
+QemuSemaphore sem;
+union {
+uint64_t v64;
+uint8_t v8[8];
+} val;
+int received;
+};
+typedef struct HRandomData HRandomData;
+
+/* Callback function for the RngBackend */
+static void random_recv(void *dest, const void *src, size_t size)
+{
+HRandomData *hrdp = dest;
+
+if (src && size > 0) {
+assert(size + hrdp->received <= sizeof(hrdp->val.v8));
+memcpy(>val.v8[hrdp->received], src, size);
+hrdp->received += size;
+}
+
+qemu_sem_post(>sem);
+}
+
+/* Handler for the H_RANDOM hypercall */
+static

Re: suspicious RCU usage with kvm_pr

2015-09-16 Thread Thomas Huth
On 16/09/15 10:51, Denis Kirjanov wrote:
> Hi,
> 
> I see the following trace on qemu startup (ps700 blade):
> 
> v4.2-11169-g64d1def
> 
> 
> [  143.369638] ===
> [  143.369640] [ INFO: suspicious RCU usage. ]
> [  143.369643] 4.2.0-11169-g64d1def #10 Tainted: G S
> [  143.369645] ---
> [  143.369647] arch/powerpc/kvm/../../../virt/kvm/kvm_main.c:3310
> suspicious rcu_dereference_check() usage!
> [  143.369649]
> other info that might help us debug this:
> 
> [  143.369652]
> rcu_scheduler_active = 1, debug_locks = 1
> [  143.369655] 1 lock held by qemu-system-ppc/2292:
> [  143.369656]  #0:  (>mutex){+.+.+.}, at: []
> .vcpu_load+0x2c/0xb0 [kvm]
> [  143.369672]
> stack backtrace:
> [  143.369675] CPU: 12 PID: 2292 Comm: qemu-system-ppc Tainted: G S
>   4.2.0-11169-g64d1def #10
> [  143.369677] Call Trace:
> [  143.369682] [c001d08bf200] [c0816dd0]
> .dump_stack+0x98/0xd4 (unreliable)
> [  143.369687] [c001d08bf280] [c00f7058]
> .lockdep_rcu_suspicious+0x108/0x170
> [  143.369696] [c001d08bf310] [d42296d8]
> .kvm_io_bus_read+0x1d8/0x220 [kvm]
> [  143.369705] [c001d08bf3c0] [d422f980]
> .kvmppc_h_logical_ci_load+0x60/0xe0 [kvm]

Could it be that we need to srcu_read_lock(>kvm->srcu) before
calling the kvm_io_bus_read/write() function in the
kvmppc_h_logical_ci_load/store() function?

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: suspicious RCU usage with kvm_pr

2015-09-16 Thread Thomas Huth
On 16/09/15 10:51, Denis Kirjanov wrote:
> Hi,
> 
> I see the following trace on qemu startup (ps700 blade):
> 
> v4.2-11169-g64d1def
> 
> 
> [  143.369638] ===
> [  143.369640] [ INFO: suspicious RCU usage. ]
> [  143.369643] 4.2.0-11169-g64d1def #10 Tainted: G S
> [  143.369645] ---
> [  143.369647] arch/powerpc/kvm/../../../virt/kvm/kvm_main.c:3310
> suspicious rcu_dereference_check() usage!
> [  143.369649]
> other info that might help us debug this:
> 
> [  143.369652]
> rcu_scheduler_active = 1, debug_locks = 1
> [  143.369655] 1 lock held by qemu-system-ppc/2292:
> [  143.369656]  #0:  (>mutex){+.+.+.}, at: []
> .vcpu_load+0x2c/0xb0 [kvm]
> [  143.369672]
> stack backtrace:
> [  143.369675] CPU: 12 PID: 2292 Comm: qemu-system-ppc Tainted: G S
>   4.2.0-11169-g64d1def #10
> [  143.369677] Call Trace:
> [  143.369682] [c001d08bf200] [c0816dd0]
> .dump_stack+0x98/0xd4 (unreliable)
> [  143.369687] [c001d08bf280] [c00f7058]
> .lockdep_rcu_suspicious+0x108/0x170
> [  143.369696] [c001d08bf310] [d42296d8]
> .kvm_io_bus_read+0x1d8/0x220 [kvm]
> [  143.369705] [c001d08bf3c0] [d422f980]
> .kvmppc_h_logical_ci_load+0x60/0xe0 [kvm]

Could it be that we need to srcu_read_lock(>kvm->srcu) before
calling the kvm_io_bus_read/write() function in the
kvmppc_h_logical_ci_load/store() function?

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] ppc/spapr: Implement H_RANDOM hypercall in QEMU

2015-09-14 Thread Thomas Huth
On 14/09/15 04:15, David Gibson wrote:
> On Fri, Sep 11, 2015 at 11:17:01AM +0200, Thomas Huth wrote:
>> The PAPR interface defines a hypercall to pass high-quality
>> hardware generated random numbers to guests. Recent kernels can
>> already provide this hypercall to the guest if the right hardware
>> random number generator is available. But in case the user wants
>> to use another source like EGD, or QEMU is running with an older
>> kernel, we should also have this call in QEMU, so that guests that
>> do not support virtio-rng yet can get good random numbers, too.
>>
>> This patch now adds a new pseude-device to QEMU that either
>> directly provides this hypercall to the guest or is able to
>> enable the in-kernel hypercall if available. The in-kernel
>> hypercall can be enabled with the use-kvm property, e.g.:
>>
>>  qemu-system-ppc64 -device spapr-rng,use-kvm=true
>>
>> For handling the hypercall in QEMU instead, a RngBackend is required
>> since the hypercall should provide "good" random data instead of
>> pseudo-random (like from a "simple" library function like rand()
>> or g_random_int()). Since there are multiple RngBackends available,
>> the user must select an appropriate backend via the "backend"
>> property of the device, e.g.:
>>
>>  qemu-system-ppc64 -object rng-random,filename=/dev/hwrng,id=rng0 \
>>-device spapr-rng,backend=rng0 ...
>>
>> See http://wiki.qemu-project.org/Features-Done/VirtIORNG for
>> other example of specifying RngBackends.
...
>> +
>> +#include "qemu/error-report.h"
>> +#include "sysemu/sysemu.h"
>> +#include "sysemu/device_tree.h"
>> +#include "sysemu/rng.h"
>> +#include "hw/ppc/spapr.h"
>> +#include "kvm_ppc.h"
>> +
>> +#define SPAPR_RNG(obj) \
>> +OBJECT_CHECK(sPAPRRngState, (obj), TYPE_SPAPR_RNG)
>> +
>> +typedef struct sPAPRRngState {
>> +/*< private >*/
>> +DeviceState ds;
>> +RngBackend *backend;
>> +bool use_kvm;
>> +} sPAPRRngState;
>> +
>> +typedef struct HRandomData {
>> +QemuSemaphore sem;
>> +union {
>> +uint64_t v64;
>> +uint8_t v8[8];
>> +} val;
>> +int received;
>> +} HRandomData;
>> +
>> +/* Callback function for the RngBackend */
>> +static void random_recv(void *dest, const void *src, size_t size)
>> +{
>> +HRandomData *hrdp = dest;
>> +
>> +if (src && size > 0) {
>> +assert(size + hrdp->received <= sizeof(hrdp->val.v8));
>> +memcpy(>val.v8[hrdp->received], src, size);
>> +hrdp->received += size;
>> +}
>> +
>> +qemu_sem_post(>sem);
> 
> I'm assuming qemu_sem_post() includes the necessary memory barrier to
> make sure the requesting thread actually sees the data.

Not sure whether I fully got your point here... both callback function
and main thread are calling an extern C-function, so the compiler should
not assume that the memory stays the same in the main thread...?

Anyway, I've tested the hypercall by implementing it in SLOF and calling
it a couple of times there to see that all bits in the result behave
randomly, so for me this is working fine.

>> +}
>> +
>> +/* Handler for the H_RANDOM hypercall */
>> +static target_ulong h_random(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>> + target_ulong opcode, target_ulong *args)
>> +{
>> +sPAPRRngState *rngstate;
>> +HRandomData hrdata;
>> +
>> +rngstate = SPAPR_RNG(object_resolve_path_type("", TYPE_SPAPR_RNG, 
>> NULL));
>> +
>> +if (!rngstate || !rngstate->backend) {
>> +return H_HARDWARE;
>> +}
>> +
>> +qemu_sem_init(, 0);
>> +hrdata.val.v64 = 0;
>> +hrdata.received = 0;
>> +
>> +qemu_mutex_unlock_iothread();
>> +while (hrdata.received < 8) {
>> +rng_backend_request_entropy(rngstate->backend, 8 - hrdata.received,
>> +random_recv, );
>> +qemu_sem_wait();
>> +}
>> +qemu_mutex_lock_iothread();
>> +
>> +qemu_sem_destroy();
>> +args[0] = hrdata.val.v64;
>> +
>> +return H_SUCCESS;
>> +}
>> +
>> +static void spapr_rng_instance_init(Object *obj)
>> +{
>> +sPAPRRngState *rngstate = SPAPR_RNG(obj);
>> +
>> +if (object_resolve_path_type("", TYPE_SPAPR_RNG, NULL) != 

[PATCH v3] ppc/spapr: Implement H_RANDOM hypercall in QEMU

2015-09-11 Thread Thomas Huth
The PAPR interface defines a hypercall to pass high-quality
hardware generated random numbers to guests. Recent kernels can
already provide this hypercall to the guest if the right hardware
random number generator is available. But in case the user wants
to use another source like EGD, or QEMU is running with an older
kernel, we should also have this call in QEMU, so that guests that
do not support virtio-rng yet can get good random numbers, too.

This patch now adds a new pseude-device to QEMU that either
directly provides this hypercall to the guest or is able to
enable the in-kernel hypercall if available. The in-kernel
hypercall can be enabled with the use-kvm property, e.g.:

 qemu-system-ppc64 -device spapr-rng,use-kvm=true

For handling the hypercall in QEMU instead, a RngBackend is required
since the hypercall should provide "good" random data instead of
pseudo-random (like from a "simple" library function like rand()
or g_random_int()). Since there are multiple RngBackends available,
the user must select an appropriate backend via the "backend"
property of the device, e.g.:

 qemu-system-ppc64 -object rng-random,filename=/dev/hwrng,id=rng0 \
   -device spapr-rng,backend=rng0 ...

See http://wiki.qemu-project.org/Features-Done/VirtIORNG for
other example of specifying RngBackends.

Signed-off-by: Thomas Huth <th...@redhat.com>
---
 v3:
 - Completely reworked the patch set accordingly to discussion
   on the mailing list, so that the code is now encapsulated
   as a QEMU device in a separate file.

 hw/ppc/Makefile.objs   |   2 +-
 hw/ppc/spapr.c |   8 +++
 hw/ppc/spapr_rng.c | 178 +
 include/hw/ppc/spapr.h |   4 ++
 target-ppc/kvm.c   |   9 +++
 target-ppc/kvm_ppc.h   |   5 ++
 6 files changed, 205 insertions(+), 1 deletion(-)
 create mode 100644 hw/ppc/spapr_rng.c

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index c8ab06e..c1ffc77 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -3,7 +3,7 @@ obj-y += ppc.o ppc_booke.o
 # IBM pSeries (sPAPR)
 obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
-obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o
+obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
 ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
 obj-y += spapr_pci_vfio.o
 endif
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index bf0c64f..34e7d24 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -768,6 +768,14 @@ static void spapr_finalize_fdt(sPAPRMachineState *spapr,
 exit(1);
 }
 
+if (object_resolve_path_type("", TYPE_SPAPR_RNG, NULL)) {
+ret = spapr_rng_populate_dt(fdt);
+if (ret < 0) {
+fprintf(stderr, "couldn't setup rng device in fdt\n");
+exit(1);
+}
+}
+
 QLIST_FOREACH(phb, >phbs, list) {
 ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt);
 }
diff --git a/hw/ppc/spapr_rng.c b/hw/ppc/spapr_rng.c
new file mode 100644
index 000..d4923bc
--- /dev/null
+++ b/hw/ppc/spapr_rng.c
@@ -0,0 +1,178 @@
+/*
+ * QEMU sPAPR random number generator "device" for H_RANDOM hypercall
+ *
+ * Copyright 2015 Thomas Huth, Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/error-report.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/device_tree.h"
+#include "sysemu/rng.h"
+#include "hw/ppc/spapr.h"
+#include "kvm_ppc.h"
+
+#define SPAPR_RNG(obj) \
+OBJECT_CHECK(sPAPRRngState, (obj), TYPE_SPAPR_RNG)
+
+typedef struct sPAPRRngState {
+/*< private >*/
+DeviceState ds;
+RngBackend *backend;
+bool use_kvm;
+} sPAPRRngState;
+
+typedef struct HRandomData {
+QemuSemaphore sem;
+union {
+uint64_t v64;
+uint8_t v8[8];
+} val;
+int received;
+} HRandomData;
+
+/* Callback function for the RngBackend */
+static void random_recv(void *dest, const void *src, size_t size)
+{
+HRandomData *hrdp = dest;
+
+if (src && size > 0) {
+assert(size + hrdp->received <= sizeof(hrdp->val.v8));
+memcpy(>val.v8[hrdp->received], src, size);
+hrdp-&g

Re: [Qemu-ppc] KVM memory slots limit on powerpc

2015-09-08 Thread Thomas Huth
On 07/09/15 16:31, Igor Mammedov wrote:
> On Fri, 4 Sep 2015 12:04:41 +0200
> Alexander Graf <ag...@suse.de> wrote:
> 
>>
>>
>> On 04.09.15 11:59, Christian Borntraeger wrote:
>>> Am 04.09.2015 um 11:35 schrieb Thomas Huth:
>>>>
>>>>  Hi all,
>>>>
>>>> now that we get memory hotplugging for the spapr machine on qemu-ppc,
>>>> too, it seems like we easily can hit the amount of KVM-internal memory
>>>> slots now ("#define KVM_USER_MEM_SLOTS 32" in
>>>> arch/powerpc/include/asm/kvm_host.h). For example, start
>>>> qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
>>>> 4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
>>>> you'll see that it aborts way earlier already.
>>>>
>>>> The x86 code already increased the amount of KVM_USER_MEM_SLOTS to 509
>>>> already (+3 internal slots = 512) ... maybe we should now increase the
>>>> amount of slots on powerpc, too? Since we don't use internal slots on
>>>> POWER, would 512 be a good value? Or would less be sufficient, too?
>>>
>>> When you are at it, the s390 value should also be increased I guess.
>>
>> That constant defines the array size for the memslot array in struct kvm
>> which in turn again gets allocated by kzalloc, so it's pinned kernel
>> memory that is physically contiguous. Doing big allocations can turn
>> into problems during runtime.
>>
>> So maybe there is another way? Can we extend the memslot array size
>> dynamically somehow? Allocate it separately? How much memory does the
>> memslot array use up with 512 entries?
> 
> KVM switched memslots allocation to kvm_kvzalloc(), so it would fallback to 
> vmalloc
>  commit 744961341d472db6272ed9b42319a90f5a2aa7c4
>  kvm: avoid page allocation failure in kvm_set_memory_region()

Good hint, thanks for pointing that out! ... so increasing the array
size should not cause too much trouble :-)

 Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-ppc] KVM memory slots limit on powerpc

2015-09-08 Thread Thomas Huth
On 07/09/15 16:31, Igor Mammedov wrote:
> On Fri, 4 Sep 2015 12:04:41 +0200
> Alexander Graf <ag...@suse.de> wrote:
> 
>>
>>
>> On 04.09.15 11:59, Christian Borntraeger wrote:
>>> Am 04.09.2015 um 11:35 schrieb Thomas Huth:
>>>>
>>>>  Hi all,
>>>>
>>>> now that we get memory hotplugging for the spapr machine on qemu-ppc,
>>>> too, it seems like we easily can hit the amount of KVM-internal memory
>>>> slots now ("#define KVM_USER_MEM_SLOTS 32" in
>>>> arch/powerpc/include/asm/kvm_host.h). For example, start
>>>> qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
>>>> 4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
>>>> you'll see that it aborts way earlier already.
>>>>
>>>> The x86 code already increased the amount of KVM_USER_MEM_SLOTS to 509
>>>> already (+3 internal slots = 512) ... maybe we should now increase the
>>>> amount of slots on powerpc, too? Since we don't use internal slots on
>>>> POWER, would 512 be a good value? Or would less be sufficient, too?
>>>
>>> When you are at it, the s390 value should also be increased I guess.
>>
>> That constant defines the array size for the memslot array in struct kvm
>> which in turn again gets allocated by kzalloc, so it's pinned kernel
>> memory that is physically contiguous. Doing big allocations can turn
>> into problems during runtime.
>>
>> So maybe there is another way? Can we extend the memslot array size
>> dynamically somehow? Allocate it separately? How much memory does the
>> memslot array use up with 512 entries?
> 
> KVM switched memslots allocation to kvm_kvzalloc(), so it would fallback to 
> vmalloc
>  commit 744961341d472db6272ed9b42319a90f5a2aa7c4
>  kvm: avoid page allocation failure in kvm_set_memory_region()

Good hint, thanks for pointing that out! ... so increasing the array
size should not cause too much trouble :-)

 Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-ppc] KVM memory slots limit on powerpc

2015-09-08 Thread Thomas Huth
On 08/09/15 09:11, Christian Borntraeger wrote:
> Am 08.09.2015 um 08:05 schrieb Thomas Huth:
>> On 07/09/15 16:31, Igor Mammedov wrote:
>>> On Fri, 4 Sep 2015 12:04:41 +0200
>>> Alexander Graf <ag...@suse.de> wrote:
>>>> On 04.09.15 11:59, Christian Borntraeger wrote:
>>>>> Am 04.09.2015 um 11:35 schrieb Thomas Huth:
>>>>>>
>>>>>> now that we get memory hotplugging for the spapr machine on qemu-ppc,
>>>>>> too, it seems like we easily can hit the amount of KVM-internal memory
>>>>>> slots now ("#define KVM_USER_MEM_SLOTS 32" in
>>>>>> arch/powerpc/include/asm/kvm_host.h). For example, start
>>>>>> qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
>>>>>> 4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
>>>>>> you'll see that it aborts way earlier already.
>>>>>>
>>>>>> The x86 code already increased the amount of KVM_USER_MEM_SLOTS to 509
>>>>>> already (+3 internal slots = 512) ... maybe we should now increase the
>>>>>> amount of slots on powerpc, too? Since we don't use internal slots on
>>>>>> POWER, would 512 be a good value? Or would less be sufficient, too?
>>>>>
>>>>> When you are at it, the s390 value should also be increased I guess.
>>>>
>>>> That constant defines the array size for the memslot array in struct kvm
>>>> which in turn again gets allocated by kzalloc, so it's pinned kernel
>>>> memory that is physically contiguous. Doing big allocations can turn
>>>> into problems during runtime.
>>>>
>>>> So maybe there is another way? Can we extend the memslot array size
>>>> dynamically somehow? Allocate it separately? How much memory does the
>>>> memslot array use up with 512 entries?
>>>
>>> KVM switched memslots allocation to kvm_kvzalloc(), so it would fallback to 
>>> vmalloc
>>>  commit 744961341d472db6272ed9b42319a90f5a2aa7c4
>>>  kvm: avoid page allocation failure in kvm_set_memory_region()
>>
>> Good hint, thanks for pointing that out! ... so increasing the array
>> size should not cause too much trouble :-)
> 
> Changing the allocation of the memslots to a growing structure seems like a
> good idea nevertheless. Any chance to do this as well when you are at it?

I'd like to finish some other stuff first, so if you want to have a try,
feel free to do so ... if not, I'll have a closer look at this later.

 Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM memory slots limit on powerpc

2015-09-04 Thread Thomas Huth

 Hi all,

now that we get memory hotplugging for the spapr machine on qemu-ppc,
too, it seems like we easily can hit the amount of KVM-internal memory
slots now ("#define KVM_USER_MEM_SLOTS 32" in
arch/powerpc/include/asm/kvm_host.h). For example, start
qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
you'll see that it aborts way earlier already.

The x86 code already increased the amount of KVM_USER_MEM_SLOTS to 509
already (+3 internal slots = 512) ... maybe we should now increase the
amount of slots on powerpc, too? Since we don't use internal slots on
POWER, would 512 be a good value? Or would less be sufficient, too?

 Thomas
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM memory slots limit on powerpc

2015-09-04 Thread Thomas Huth

 Hi all,

now that we get memory hotplugging for the spapr machine on qemu-ppc,
too, it seems like we easily can hit the amount of KVM-internal memory
slots now ("#define KVM_USER_MEM_SLOTS 32" in
arch/powerpc/include/asm/kvm_host.h). For example, start
qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
you'll see that it aborts way earlier already.

The x86 code already increased the amount of KVM_USER_MEM_SLOTS to 509
already (+3 internal slots = 512) ... maybe we should now increase the
amount of slots on powerpc, too? Since we don't use internal slots on
POWER, would 512 be a good value? Or would less be sufficient, too?

 Thomas
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-ppc] KVM memory slots limit on powerpc

2015-09-04 Thread Thomas Huth
On 04/09/15 12:07, Christian Borntraeger wrote:
> Am 04.09.2015 um 12:04 schrieb Alexander Graf:
>>
>>
>> On 04.09.15 11:59, Christian Borntraeger wrote:
>>> Am 04.09.2015 um 11:35 schrieb Thomas Huth:
>>>>
>>>>  Hi all,
>>>>
>>>> now that we get memory hotplugging for the spapr machine on qemu-ppc,
>>>> too, it seems like we easily can hit the amount of KVM-internal memory
>>>> slots now ("#define KVM_USER_MEM_SLOTS 32" in
>>>> arch/powerpc/include/asm/kvm_host.h). For example, start
>>>> qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
>>>> 4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
>>>> you'll see that it aborts way earlier already.
>>>>
>>>> The x86 code already increased the amount of KVM_USER_MEM_SLOTS to 509
>>>> already (+3 internal slots = 512) ... maybe we should now increase the
>>>> amount of slots on powerpc, too? Since we don't use internal slots on
>>>> POWER, would 512 be a good value? Or would less be sufficient, too?
>>>
>>> When you are at it, the s390 value should also be increased I guess.
>>
>> That constant defines the array size for the memslot array in struct kvm
>> which in turn again gets allocated by kzalloc, so it's pinned kernel
>> memory that is physically contiguous. Doing big allocations can turn
>> into problems during runtime.
>>
>> So maybe there is another way? Can we extend the memslot array size
>> dynamically somehow? Allocate it separately? How much memory does the
>> memslot array use up with 512 entries?
> 
> Maybe some rcu protected scheme that doubles the amount of memslots for
> each overrun? Yes, that would be good and even reduce the footprint for
> systems with only a small number of memslots.

Seems like Alex Williamson already posted a patchset for growable
memslots a couple of years ago:

 http://www.spinics.net/lists/kvm/msg50491.html

But I didn't quite spot the result in that thread why it never has been
included upstream. Alex (W.), do you remember the outcome?

 Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-ppc] KVM memory slots limit on powerpc

2015-09-04 Thread Thomas Huth
On 04/09/15 12:07, Christian Borntraeger wrote:
> Am 04.09.2015 um 12:04 schrieb Alexander Graf:
>>
>>
>> On 04.09.15 11:59, Christian Borntraeger wrote:
>>> Am 04.09.2015 um 11:35 schrieb Thomas Huth:
>>>>
>>>>  Hi all,
>>>>
>>>> now that we get memory hotplugging for the spapr machine on qemu-ppc,
>>>> too, it seems like we easily can hit the amount of KVM-internal memory
>>>> slots now ("#define KVM_USER_MEM_SLOTS 32" in
>>>> arch/powerpc/include/asm/kvm_host.h). For example, start
>>>> qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
>>>> 4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
>>>> you'll see that it aborts way earlier already.
>>>>
>>>> The x86 code already increased the amount of KVM_USER_MEM_SLOTS to 509
>>>> already (+3 internal slots = 512) ... maybe we should now increase the
>>>> amount of slots on powerpc, too? Since we don't use internal slots on
>>>> POWER, would 512 be a good value? Or would less be sufficient, too?
>>>
>>> When you are at it, the s390 value should also be increased I guess.
>>
>> That constant defines the array size for the memslot array in struct kvm
>> which in turn again gets allocated by kzalloc, so it's pinned kernel
>> memory that is physically contiguous. Doing big allocations can turn
>> into problems during runtime.
>>
>> So maybe there is another way? Can we extend the memslot array size
>> dynamically somehow? Allocate it separately? How much memory does the
>> memslot array use up with 512 entries?
> 
> Maybe some rcu protected scheme that doubles the amount of memslots for
> each overrun? Yes, that would be good and even reduce the footprint for
> systems with only a small number of memslots.

Seems like Alex Williamson already posted a patchset for growable
memslots a couple of years ago:

 http://www.spinics.net/lists/kvm/msg50491.html

But I didn't quite spot the result in that thread why it never has been
included upstream. Alex (W.), do you remember the outcome?

 Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-ppc] KVM memory slots limit on powerpc

2015-09-04 Thread Thomas Huth
On 04/09/15 12:04, Alexander Graf wrote:
> 
> 
> On 04.09.15 11:59, Christian Borntraeger wrote:
>> Am 04.09.2015 um 11:35 schrieb Thomas Huth:
>>>
>>>  Hi all,
>>>
>>> now that we get memory hotplugging for the spapr machine on qemu-ppc,
>>> too, it seems like we easily can hit the amount of KVM-internal memory
>>> slots now ("#define KVM_USER_MEM_SLOTS 32" in
>>> arch/powerpc/include/asm/kvm_host.h). For example, start
>>> qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
>>> 4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
>>> you'll see that it aborts way earlier already.
>>>
>>> The x86 code already increased the amount of KVM_USER_MEM_SLOTS to 509
>>> already (+3 internal slots = 512) ... maybe we should now increase the
>>> amount of slots on powerpc, too? Since we don't use internal slots on
>>> POWER, would 512 be a good value? Or would less be sufficient, too?
>>
>> When you are at it, the s390 value should also be increased I guess.
> 
> That constant defines the array size for the memslot array in struct kvm
> which in turn again gets allocated by kzalloc, so it's pinned kernel
> memory that is physically contiguous. Doing big allocations can turn
> into problems during runtime.

FWIW, I've just checked sizeof(struct kvm) with the current ppc64 kernel
build from master branch, and it is 34144 bytes.
So on a system that is using PAGE_SIZE = 64kB, there should be plenty of
space left before we're getting into trouble.

And even assuming the worst case, that we're on a system which still
uses PAGE_SIZE = 4kB, the last page of the 34144 bytes is only filled
with 1376 bytes, leaving 2720 bytes free right now.
sizeof(struct kvm_memory_slot) is 48 bytes right now on powerpc, and you
need two additional bytes per entry for the id_to_index array in
struct kvm_memslots, i.e. we need 50 additional bytes per entry on ppc.
That means we could increase KVM_USER_MEM_SLOTS by 2720 / 50 = 54
entries without getting into further trouble.

I think we should leave some more additional bytes left in that last
4k page of the struct kvm region, ... so what about increasing
KVM_USER_MEM_SLOTS to 32 + 48 = 80 now (instead of 32 + 54 = 86) to
ease the memslot situation at least a little bit 'till we figured out
a really final solution like growable memslots?

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-ppc] KVM memory slots limit on powerpc

2015-09-04 Thread Thomas Huth
On 04/09/15 12:04, Alexander Graf wrote:
> 
> 
> On 04.09.15 11:59, Christian Borntraeger wrote:
>> Am 04.09.2015 um 11:35 schrieb Thomas Huth:
>>>
>>>  Hi all,
>>>
>>> now that we get memory hotplugging for the spapr machine on qemu-ppc,
>>> too, it seems like we easily can hit the amount of KVM-internal memory
>>> slots now ("#define KVM_USER_MEM_SLOTS 32" in
>>> arch/powerpc/include/asm/kvm_host.h). For example, start
>>> qemu-system-ppc64 with a couple of "-device secondary-vga" and "-m
>>> 4G,slots=32,maxmem=40G" and then try to hot-plug all 32 DIMMs ... and
>>> you'll see that it aborts way earlier already.
>>>
>>> The x86 code already increased the amount of KVM_USER_MEM_SLOTS to 509
>>> already (+3 internal slots = 512) ... maybe we should now increase the
>>> amount of slots on powerpc, too? Since we don't use internal slots on
>>> POWER, would 512 be a good value? Or would less be sufficient, too?
>>
>> When you are at it, the s390 value should also be increased I guess.
> 
> That constant defines the array size for the memslot array in struct kvm
> which in turn again gets allocated by kzalloc, so it's pinned kernel
> memory that is physically contiguous. Doing big allocations can turn
> into problems during runtime.

FWIW, I've just checked sizeof(struct kvm) with the current ppc64 kernel
build from master branch, and it is 34144 bytes.
So on a system that is using PAGE_SIZE = 64kB, there should be plenty of
space left before we're getting into trouble.

And even assuming the worst case, that we're on a system which still
uses PAGE_SIZE = 4kB, the last page of the 34144 bytes is only filled
with 1376 bytes, leaving 2720 bytes free right now.
sizeof(struct kvm_memory_slot) is 48 bytes right now on powerpc, and you
need two additional bytes per entry for the id_to_index array in
struct kvm_memslots, i.e. we need 50 additional bytes per entry on ppc.
That means we could increase KVM_USER_MEM_SLOTS by 2720 / 50 = 54
entries without getting into further trouble.

I think we should leave some more additional bytes left in that last
4k page of the struct kvm region, ... so what about increasing
KVM_USER_MEM_SLOTS to 32 + 48 = 80 now (instead of 32 + 54 = 86) to
ease the memslot situation at least a little bit 'till we figured out
a really final solution like growable memslots?

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-02 Thread Thomas Huth
On 02/09/15 00:24, Paul Mackerras wrote:
> On Tue, Sep 01, 2015 at 11:41:18PM +0200, Thomas Huth wrote:
>> The size of the Problem State Priority Boost Register is only
>> 32 bits, so let's change the type of the corresponding variable
>> accordingly to avoid future trouble.
> 
> Since we're already using lwz/stw in the assembly code in
> book3s_hv_rmhandlers.S, this is actually a bug fix, isn't it?
> How did you find it?  Did you observe a failure of some kind, or did
> you just find it by code inspection?

Code inspection. I was looking for similar problems like the issue with
the XER register that we've hit recently
(https://patchwork.ozlabs.org/patch/476872/) while I was trying to debug
a similar problem.

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-02 Thread Thomas Huth
On 02/09/15 00:24, Paul Mackerras wrote:
> On Tue, Sep 01, 2015 at 11:41:18PM +0200, Thomas Huth wrote:
>> The size of the Problem State Priority Boost Register is only
>> 32 bits, so let's change the type of the corresponding variable
>> accordingly to avoid future trouble.
> 
> Since we're already using lwz/stw in the assembly code in
> book3s_hv_rmhandlers.S, this is actually a bug fix, isn't it?
> How did you find it?  Did you observe a failure of some kind, or did
> you just find it by code inspection?

Code inspection. I was looking for similar problems like the issue with
the XER register that we've hit recently
(https://patchwork.ozlabs.org/patch/476872/) while I was trying to debug
a similar problem.

 Thomas

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] KVM: ppc: Fix size of the PSPB register

2015-09-02 Thread Thomas Huth
The size of the Problem State Priority Boost Register is only
32 bits, but the kvm_vcpu_arch->pspb variable is declared as
"ulong", ie. 64-bit. However, the assembler code accesses this
variable with 32-bit accesses, and the KVM_REG_PPC_PSPB macro
is defined with SIZE_U32, too, so that the current code is
broken on big endian hosts: kvmppc_get_one_reg_hv() will only
return zero for this register since it is using the wrong half
of the pspb variable. Let's fix this problem by adjusting the
size of the pspb field in the kvm_vcpu_arch structure.

Signed-off-by: Thomas Huth <th...@redhat.com>
---
 v2: Updated patch description

 arch/powerpc/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index d91f65b..c825f3a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -473,7 +473,7 @@ struct kvm_vcpu_arch {
ulong ciabr;
ulong cfar;
ulong ppr;
-   ulong pspb;
+   u32 pspb;
ulong fscr;
ulong shadow_fscr;
ulong ebbhr;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-02 Thread Thomas Huth
On 02/09/15 10:26, Alexander Graf wrote:
> 
>> Am 02.09.2015 um 09:26 schrieb Thomas Huth <th...@redhat.com>:
>>
>>> On 02/09/15 00:55, Benjamin Herrenschmidt wrote:
>>>> On Wed, 2015-09-02 at 08:45 +1000, Paul Mackerras wrote:
>>>> On Wed, Sep 02, 2015 at 08:25:05AM +1000, Benjamin Herrenschmidt
>>>> wrote:
>>>>> On Tue, 2015-09-01 at 23:41 +0200, Thomas Huth wrote:
>>>>>> The size of the Problem State Priority Boost Register is only
>>>>>> 32 bits, so let's change the type of the corresponding variable
>>>>>> accordingly to avoid future trouble.
>>>>>
>>>>> It's not future trouble, it's broken today for LE and this should
>>>>> fix
>>>>> it BUT 
>>>>
>>>> No, it's broken today for BE hosts, which will always see 0 for the
>>>> PSPB register value.  LE hosts are fine.
>>
>> Right ... I just meant that nobody really experienced trouble with this
>> today yet, but the bug is already present now already of course.
> 
> Sounds like a great candidate for kvm-unit-tests then, no? ;)

I'm certainly looking forward to seeing powerpc support in there :-)

 Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] KVM: ppc: Fix size of the PSPB register

2015-09-02 Thread Thomas Huth
The size of the Problem State Priority Boost Register is only
32 bits, but the kvm_vcpu_arch->pspb variable is declared as
"ulong", ie. 64-bit. However, the assembler code accesses this
variable with 32-bit accesses, and the KVM_REG_PPC_PSPB macro
is defined with SIZE_U32, too, so that the current code is
broken on big endian hosts: kvmppc_get_one_reg_hv() will only
return zero for this register since it is using the wrong half
of the pspb variable. Let's fix this problem by adjusting the
size of the pspb field in the kvm_vcpu_arch structure.

Signed-off-by: Thomas Huth <th...@redhat.com>
---
 v2: Updated patch description

 arch/powerpc/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index d91f65b..c825f3a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -473,7 +473,7 @@ struct kvm_vcpu_arch {
ulong ciabr;
ulong cfar;
ulong ppr;
-   ulong pspb;
+   u32 pspb;
ulong fscr;
ulong shadow_fscr;
ulong ebbhr;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-02 Thread Thomas Huth
On 02/09/15 00:55, Benjamin Herrenschmidt wrote:
> On Wed, 2015-09-02 at 08:45 +1000, Paul Mackerras wrote:
>> On Wed, Sep 02, 2015 at 08:25:05AM +1000, Benjamin Herrenschmidt
>> wrote:
>>> On Tue, 2015-09-01 at 23:41 +0200, Thomas Huth wrote:
>>>> The size of the Problem State Priority Boost Register is only
>>>> 32 bits, so let's change the type of the corresponding variable
>>>> accordingly to avoid future trouble.
>>>
>>> It's not future trouble, it's broken today for LE and this should
>>> fix
>>> it BUT 
>>
>> No, it's broken today for BE hosts, which will always see 0 for the
>> PSPB register value.  LE hosts are fine.

Right ... I just meant that nobody really experienced trouble with this
today yet, but the bug is already present now already of course.

>>> The asm accesses it using lwz/stw and C accesses it as a ulong. On
>>> LE
>>> that will mean that userspace will see the value << 32
>>
>> No, that will happen on BE, and since KVM_REG_PPC_PSPB says it's a
>> 32-bit register, we'll just pass 0 back to userspace when it reads
>> it.
> 
> Ah ok, I missed that bit about KVM_REG_PPC_PSPB
> 
>>> Now "fixing" it might break migration if that field is already
>>> stored/loaded in its "broken" form. We may have to keep the
>>> "broken"
>>> behaviour and document that qemu sees a value shifted by 32.
>>
>> It will be being set to 0 on BE hosts across migration today
>> (fortunately 0 is a benign value for PSPB).  If we fix this on both
>> the source and destination host, then the value will get migrated
>> across correctly.
> 
> Ok, I missed the part where KVM_REG_PPC_PSPB passed it down as a 32
> -bit. That means Thomas patch should work indeed.

... and if I get the QEMU source code right, the register is currently
not migrated at all - or at least I was not able to find the spot in the
source code that migrates this register.

>> I think Thomas's patch is fine, it just needs a stronger patch
>> description saying that it fixes an actual bug.

Ok, I'll resend with a better patch description.

 Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-02 Thread Thomas Huth
On 02/09/15 10:26, Alexander Graf wrote:
> 
>> Am 02.09.2015 um 09:26 schrieb Thomas Huth <th...@redhat.com>:
>>
>>> On 02/09/15 00:55, Benjamin Herrenschmidt wrote:
>>>> On Wed, 2015-09-02 at 08:45 +1000, Paul Mackerras wrote:
>>>> On Wed, Sep 02, 2015 at 08:25:05AM +1000, Benjamin Herrenschmidt
>>>> wrote:
>>>>> On Tue, 2015-09-01 at 23:41 +0200, Thomas Huth wrote:
>>>>>> The size of the Problem State Priority Boost Register is only
>>>>>> 32 bits, so let's change the type of the corresponding variable
>>>>>> accordingly to avoid future trouble.
>>>>>
>>>>> It's not future trouble, it's broken today for LE and this should
>>>>> fix
>>>>> it BUT 
>>>>
>>>> No, it's broken today for BE hosts, which will always see 0 for the
>>>> PSPB register value.  LE hosts are fine.
>>
>> Right ... I just meant that nobody really experienced trouble with this
>> today yet, but the bug is already present now already of course.
> 
> Sounds like a great candidate for kvm-unit-tests then, no? ;)

I'm certainly looking forward to seeing powerpc support in there :-)

 Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-02 Thread Thomas Huth
On 02/09/15 00:55, Benjamin Herrenschmidt wrote:
> On Wed, 2015-09-02 at 08:45 +1000, Paul Mackerras wrote:
>> On Wed, Sep 02, 2015 at 08:25:05AM +1000, Benjamin Herrenschmidt
>> wrote:
>>> On Tue, 2015-09-01 at 23:41 +0200, Thomas Huth wrote:
>>>> The size of the Problem State Priority Boost Register is only
>>>> 32 bits, so let's change the type of the corresponding variable
>>>> accordingly to avoid future trouble.
>>>
>>> It's not future trouble, it's broken today for LE and this should
>>> fix
>>> it BUT 
>>
>> No, it's broken today for BE hosts, which will always see 0 for the
>> PSPB register value.  LE hosts are fine.

Right ... I just meant that nobody really experienced trouble with this
today yet, but the bug is already present now already of course.

>>> The asm accesses it using lwz/stw and C accesses it as a ulong. On
>>> LE
>>> that will mean that userspace will see the value << 32
>>
>> No, that will happen on BE, and since KVM_REG_PPC_PSPB says it's a
>> 32-bit register, we'll just pass 0 back to userspace when it reads
>> it.
> 
> Ah ok, I missed that bit about KVM_REG_PPC_PSPB
> 
>>> Now "fixing" it might break migration if that field is already
>>> stored/loaded in its "broken" form. We may have to keep the
>>> "broken"
>>> behaviour and document that qemu sees a value shifted by 32.
>>
>> It will be being set to 0 on BE hosts across migration today
>> (fortunately 0 is a benign value for PSPB).  If we fix this on both
>> the source and destination host, then the value will get migrated
>> across correctly.
> 
> Ok, I missed the part where KVM_REG_PPC_PSPB passed it down as a 32
> -bit. That means Thomas patch should work indeed.

... and if I get the QEMU source code right, the register is currently
not migrated at all - or at least I was not able to find the spot in the
source code that migrates this register.

>> I think Thomas's patch is fine, it just needs a stronger patch
>> description saying that it fixes an actual bug.

Ok, I'll resend with a better patch description.

 Thomas


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-01 Thread Thomas Huth
The size of the Problem State Priority Boost Register is only
32 bits, so let's change the type of the corresponding variable
accordingly to avoid future trouble.

Signed-off-by: Thomas Huth <th...@redhat.com>
---
 arch/powerpc/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index d91f65b..c825f3a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -473,7 +473,7 @@ struct kvm_vcpu_arch {
ulong ciabr;
ulong cfar;
ulong ppr;
-   ulong pspb;
+   u32 pspb;
ulong fscr;
ulong shadow_fscr;
ulong ebbhr;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: ppc: Fix size of the PSPB register

2015-09-01 Thread Thomas Huth
The size of the Problem State Priority Boost Register is only
32 bits, so let's change the type of the corresponding variable
accordingly to avoid future trouble.

Signed-off-by: Thomas Huth <th...@redhat.com>
---
 arch/powerpc/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index d91f65b..c825f3a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -473,7 +473,7 @@ struct kvm_vcpu_arch {
ulong ciabr;
ulong cfar;
ulong ppr;
-   ulong pspb;
+   u32 pspb;
ulong fscr;
ulong shadow_fscr;
ulong ebbhr;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers

2015-07-29 Thread Thomas Huth
- Original Message -
 The EPOW interrupt handler uses rtas_get_sensor(), which in turn
 uses rtas_busy_delay() to wait for RTAS becoming ready in case it
 is necessary. But rtas_busy_delay() is annotated with might_sleep()
 and thus may not be used by interrupts handlers like the EPOW handler!
 This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
 enabled:
 
  BUG: sleeping function called from invalid context at
  arch/powerpc/kernel/rtas.c:496
  in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
  Call Trace:
  [c0007ffe7b90] [c0807670] dump_stack+0xa0/0xdc (unreliable)
  [c0007ffe7bc0] [c00e1f14] ___might_sleep+0x134/0x180
  [c0007ffe7c20] [c002aec0] rtas_busy_delay+0x30/0xd0
  [c0007ffe7c50] [c002bde4] rtas_get_sensor+0x74/0xe0
  [c0007ffe7ce0] [c0083264] ras_epow_interrupt+0x44/0x450
  [c0007ffe7d90] [c0120260] handle_irq_event_percpu+0xa0/0x300
  [c0007ffe7e70] [c0120524] handle_irq_event+0x64/0xc0
  [c0007ffe7eb0] [c0124dbc] handle_fasteoi_irq+0xec/0x260
  [c0007ffe7ef0] [c011f4f0] generic_handle_irq+0x50/0x80
  [c0007ffe7f20] [c0010f3c] __do_irq+0x8c/0x200
  [c0007ffe7f90] [c00236cc] call_do_irq+0x14/0x24
  [c0007e6f39e0] [c0011144] do_IRQ+0x94/0x110
  [c0007e6f3a30] [c0002594] hardware_interrupt_common+0x114/0x180
 
 Fix this issue by introducing a new rtas_get_sensor_fast() function
 that does not use rtas_busy_delay() - and thus can only be used for
 sensors that do not cause a BUSY condition (which should be the case
 for the sensor that is queried by the EPOW IRQ handler).
 
 Signed-off-by: Thomas Huth th...@redhat.com
 ---
  arch/powerpc/include/asm/rtas.h  |  1 +
  arch/powerpc/kernel/rtas.c   | 17 +
  arch/powerpc/platforms/pseries/ras.c |  3 ++-
  3 files changed, 20 insertions(+), 1 deletion(-)
 
 diff --git a/arch/powerpc/include/asm/rtas.h
 b/arch/powerpc/include/asm/rtas.h
 index 7a4ede1..b77ef36 100644
 --- a/arch/powerpc/include/asm/rtas.h
 +++ b/arch/powerpc/include/asm/rtas.h
 @@ -343,6 +343,7 @@ extern void rtas_power_off(void);
  extern void rtas_halt(void);
  extern void rtas_os_term(char *str);
  extern int rtas_get_sensor(int sensor, int index, int *state);
 +extern int rtas_get_sensor_fast(int sensor, int index, int *state);
  extern int rtas_get_power_level(int powerdomain, int *level);
  extern int rtas_set_power_level(int powerdomain, int level, int *setlevel);
  extern bool rtas_indicator_present(int token, int *maxindex);
 diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
 index 7a488c1..caffb10 100644
 --- a/arch/powerpc/kernel/rtas.c
 +++ b/arch/powerpc/kernel/rtas.c
 @@ -584,6 +584,23 @@ int rtas_get_sensor(int sensor, int index, int *state)
  }
  EXPORT_SYMBOL(rtas_get_sensor);
  
 +int rtas_get_sensor_fast(int sensor, int index, int *state)
 +{
 + int token = rtas_token(get-sensor-state);
 + int rc;
 +
 + if (token == RTAS_UNKNOWN_SERVICE)
 + return -ENOENT;
 +
 + rc = rtas_call(token, 2, 2, state, sensor, index);
 + WARN_ON(rc == RTAS_BUSY || (rc = RTAS_EXTENDED_DELAY_MIN 
 + rc = RTAS_EXTENDED_DELAY_MAX));
 +
 + if (rc  0)
 + return rtas_error_rc(rc);
 + return rc;
 +}
 +
  bool rtas_indicator_present(int token, int *maxindex)
  {
   int proplen, count, i;
 diff --git a/arch/powerpc/platforms/pseries/ras.c
 b/arch/powerpc/platforms/pseries/ras.c
 index 02e4a17..3b6647e 100644
 --- a/arch/powerpc/platforms/pseries/ras.c
 +++ b/arch/powerpc/platforms/pseries/ras.c
 @@ -189,7 +189,8 @@ static irqreturn_t ras_epow_interrupt(int irq, void
 *dev_id)
   int state;
   int critical;
  
 - status = rtas_get_sensor(EPOW_SENSOR_TOKEN, EPOW_SENSOR_INDEX, state);
 + status = rtas_get_sensor_fast(EPOW_SENSOR_TOKEN, EPOW_SENSOR_INDEX,
 +   state);
  
   if (state  3)
   critical = 1;   /* Time Critical */
 --
 1.8.3.1

*ping*

Michael, do you think this patch is OK for fixing this problem?
Or shall I rather send a patch to simply revert 587f83e8dd50d instead?

 Thomas
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers

2015-07-22 Thread Thomas Huth
On 22/07/15 13:25, Michael Ellerman wrote:
 On Fri, 2015-17-07 at 10:46:58 UTC, Thomas Huth wrote:
 The EPOW interrupt handler uses rtas_get_sensor(), which in turn
 uses rtas_busy_delay() to wait for RTAS becoming ready in case it
 is necessary. But rtas_busy_delay() is annotated with might_sleep()
 and thus may not be used by interrupts handlers like the EPOW handler!
 This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
 enabled:
 
 When did we break this?

 Hi Michael,

the bug has been introduced by commit 587f83e8dd50d22bc0c62e32ec49fd31
(powerpc/pseries: Use rtas_get_sensor in RAS code) which switched the
EPOW handler to use rtas_get_sensor() instead of using rtas_call directly.

Also have a look at this thread here:
http://www.spinics.net/lists/kvm-ppc/msg10768.html

 Thomas


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers

2015-07-17 Thread Thomas Huth
The EPOW interrupt handler uses rtas_get_sensor(), which in turn
uses rtas_busy_delay() to wait for RTAS becoming ready in case it
is necessary. But rtas_busy_delay() is annotated with might_sleep()
and thus may not be used by interrupts handlers like the EPOW handler!
This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
enabled:

 BUG: sleeping function called from invalid context at 
arch/powerpc/kernel/rtas.c:496
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
 Call Trace:
 [c0007ffe7b90] [c0807670] dump_stack+0xa0/0xdc (unreliable)
 [c0007ffe7bc0] [c00e1f14] ___might_sleep+0x134/0x180
 [c0007ffe7c20] [c002aec0] rtas_busy_delay+0x30/0xd0
 [c0007ffe7c50] [c002bde4] rtas_get_sensor+0x74/0xe0
 [c0007ffe7ce0] [c0083264] ras_epow_interrupt+0x44/0x450
 [c0007ffe7d90] [c0120260] handle_irq_event_percpu+0xa0/0x300
 [c0007ffe7e70] [c0120524] handle_irq_event+0x64/0xc0
 [c0007ffe7eb0] [c0124dbc] handle_fasteoi_irq+0xec/0x260
 [c0007ffe7ef0] [c011f4f0] generic_handle_irq+0x50/0x80
 [c0007ffe7f20] [c0010f3c] __do_irq+0x8c/0x200
 [c0007ffe7f90] [c00236cc] call_do_irq+0x14/0x24
 [c0007e6f39e0] [c0011144] do_IRQ+0x94/0x110
 [c0007e6f3a30] [c0002594] hardware_interrupt_common+0x114/0x180

Fix this issue by introducing a new rtas_get_sensor_fast() function
that does not use rtas_busy_delay() - and thus can only be used for
sensors that do not cause a BUSY condition (which should be the case
for the sensor that is queried by the EPOW IRQ handler).

Signed-off-by: Thomas Huth th...@redhat.com
---
 arch/powerpc/include/asm/rtas.h  |  1 +
 arch/powerpc/kernel/rtas.c   | 17 +
 arch/powerpc/platforms/pseries/ras.c |  3 ++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 7a4ede1..b77ef36 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -343,6 +343,7 @@ extern void rtas_power_off(void);
 extern void rtas_halt(void);
 extern void rtas_os_term(char *str);
 extern int rtas_get_sensor(int sensor, int index, int *state);
+extern int rtas_get_sensor_fast(int sensor, int index, int *state);
 extern int rtas_get_power_level(int powerdomain, int *level);
 extern int rtas_set_power_level(int powerdomain, int level, int *setlevel);
 extern bool rtas_indicator_present(int token, int *maxindex);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 7a488c1..caffb10 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -584,6 +584,23 @@ int rtas_get_sensor(int sensor, int index, int *state)
 }
 EXPORT_SYMBOL(rtas_get_sensor);
 
+int rtas_get_sensor_fast(int sensor, int index, int *state)
+{
+   int token = rtas_token(get-sensor-state);
+   int rc;
+
+   if (token == RTAS_UNKNOWN_SERVICE)
+   return -ENOENT;
+
+   rc = rtas_call(token, 2, 2, state, sensor, index);
+   WARN_ON(rc == RTAS_BUSY || (rc = RTAS_EXTENDED_DELAY_MIN 
+   rc = RTAS_EXTENDED_DELAY_MAX));
+
+   if (rc  0)
+   return rtas_error_rc(rc);
+   return rc;
+}
+
 bool rtas_indicator_present(int token, int *maxindex)
 {
int proplen, count, i;
diff --git a/arch/powerpc/platforms/pseries/ras.c 
b/arch/powerpc/platforms/pseries/ras.c
index 02e4a17..3b6647e 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -189,7 +189,8 @@ static irqreturn_t ras_epow_interrupt(int irq, void *dev_id)
int state;
int critical;
 
-   status = rtas_get_sensor(EPOW_SENSOR_TOKEN, EPOW_SENSOR_INDEX, state);
+   status = rtas_get_sensor_fast(EPOW_SENSOR_TOKEN, EPOW_SENSOR_INDEX,
+ state);
 
if (state  3)
critical = 1;   /* Time Critical */
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: sleeping function called from ras_epow_interrupt context

2015-07-16 Thread Thomas Huth
On 07/15/2015 09:58 PM, Nathan Fontenot wrote:
 On 07/15/2015 09:35 AM, Thomas Huth wrote:
 On 07/14/2015 11:22 PM, Benjamin Herrenschmidt wrote:
 On Tue, 2015-07-14 at 20:43 +0200, Thomas Huth wrote:
 Any suggestions how to fix this? Simply revert 587f83e8dd50d? Use
 mdelay() instead of msleep() in rtas_busy_delay()? Something more
 fancy?

 A proper fix would be more fancy, the get_sensor should happen in a
 kernel thread instead.

 I'm not very familiar with this stuff, but isn't the EPOW interrupt
 something that is very time-critical? Moving parts of the handler into a
 kernel thread then does not sound like a very good idea to me...

 Another question: Can it happen at all that this get-sensor call results
 in a sleep condition? Looking at commit ID
 81b73dd92b97423b8f5324a59044da478c04f4c4 (Fix might-sleep warning on
 removing cpus), which apparently fixed a similar issue for CPU
 hot-plugging, indicates that at least some of the rtas calls are never
 returning the busy code? In that case we could fix this by introducing a
 similar rtas_get_sensor_fast() function? (or simply revert 587f83e8dd50d
 which would be quite similar, I think)

 
 Looking at the PAPR, the get-sensor-state rtas call for the EPOW sensor
 is listed as a fast call and should not return a busy indication.

Great, good to know, thanks for looking that up! So IMHO we should
either introduce a rtas_get_sensor_fast() function or revert
587f83e8dd50d ... any preferences? Shall I come up with a patch?

 I'm curious as to why we're getting a busy return indication when
 making this call.

Looking at the code again, rtas_busy_delay() likely never slept ... it's
likely just the might_sleep() annotation in that function that causes
the BUG.

 Thomas




signature.asc
Description: OpenPGP digital signature


Re: [PATCH v3 1/1] KVM: PPC: Book3S: correct width in XER handling

2015-07-16 Thread Thomas Huth
On 05/27/2015 01:56 AM, Sam Bobroff wrote:
 In 64 bit kernels, the Fixed Point Exception Register (XER) is a 64
 bit field (e.g. in kvm_regs and kvm_vcpu_arch) and in most places it is
 accessed as such.
 
 This patch corrects places where it is accessed as a 32 bit field by a
 64 bit kernel.  In some cases this is via a 32 bit load or store
 instruction which, depending on endianness, will cause either the
 lower or upper 32 bits to be missed.  In another case it is cast as a
 u32, causing the upper 32 bits to be cleared.
 
 This patch corrects those places by extending the access methods to
 64 bits.
 
 Signed-off-by: Sam Bobroff sam.bobr...@au1.ibm.com

Reviewed-by: Thomas Huth th...@redhat.com

Actually this patch also fixes a bug that SLOF sometimes crashes when a
vCPU gets kicked out of kernel mode (see the following URL for details:
https://bugzilla.redhat.com/show_bug.cgi?id=1178502 ), and I've just
tested that this bug does not occur with this patch anymore, so also:

Tested-by: Thomas Huth th...@redhat.com

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: sleeping function called from ras_epow_interrupt context

2015-07-15 Thread Thomas Huth
On 07/14/2015 11:22 PM, Benjamin Herrenschmidt wrote:
 On Tue, 2015-07-14 at 20:43 +0200, Thomas Huth wrote:
 Any suggestions how to fix this? Simply revert 587f83e8dd50d? Use
 mdelay() instead of msleep() in rtas_busy_delay()? Something more
 fancy?
 
 A proper fix would be more fancy, the get_sensor should happen in a
 kernel thread instead.

I'm not very familiar with this stuff, but isn't the EPOW interrupt
something that is very time-critical? Moving parts of the handler into a
kernel thread then does not sound like a very good idea to me...

Another question: Can it happen at all that this get-sensor call results
in a sleep condition? Looking at commit ID
81b73dd92b97423b8f5324a59044da478c04f4c4 (Fix might-sleep warning on
removing cpus), which apparently fixed a similar issue for CPU
hot-plugging, indicates that at least some of the rtas calls are never
returning the busy code? In that case we could fix this by introducing a
similar rtas_get_sensor_fast() function? (or simply revert 587f83e8dd50d
which would be quite similar, I think)

 Thomas




signature.asc
Description: OpenPGP digital signature


BUG: sleeping function called from ras_epow_interrupt context

2015-07-14 Thread Thomas Huth

 Hi all!

A colleague recently ran into some kernel BUG messages that happen when
hot-plugging a virtio disk to a KVM guest on powerpc (with virsh
attach-disk), and IIRC CONFIG_DEBUG_ATOMIC_SLEEP enabled. I've tried to
re-create the problem with an up-to-date kernel (4.2.0-rc2) and the
problem still seems to be there:

The hotplug action triggers the ras_epow_interrupt() in
arch/powerpc/platforms/pseries/ras.c, which again calls
rtas_get_sensor(). That function then uses rtas_busy_delay() to wait in
case the RTAS call did not succeed immediately. But rtas_busy_delay()
uses msleep() for sleeping - which is forbidden during an atomic
interrupt context!

Following backtrace is printed out by the kernel:

[   33.920528] BUG: sleeping function called from invalid context at
/home/thuth/devel/linux-up/arch/powerpc/kernel/rtas.c:496
[   33.920590] in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
[   33.920624] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #1
[   33.920657] Call Trace:
[   33.920677] [c0007ffe79b0] [c07e43f4]
.dump_stack+0x98/0xd4 (unreliable)
[   33.920729] [c0007ffe7a30] [c00dcc78]
.___might_sleep+0x128/0x170
[   33.920769] [c0007ffe7aa0] [c0029f38]
.rtas_busy_delay+0x28/0xe0
[   33.920809] [c0007ffe7b20] [c002adb4]
.rtas_get_sensor+0x74/0xe0
[   33.920850] [c0007ffe7bc0] [c007ff58]
.ras_epow_interrupt+0x48/0x450
[   33.920896] [c0007ffe7c80] [c0119d94]
.handle_irq_event_percpu+0xa4/0x310
[   33.920942] [c0007ffe7d70] [c011a05c]
.handle_irq_event+0x5c/0xa0
[   33.920982] [c0007ffe7e00] [c011e7a8]
.handle_fasteoi_irq+0xe8/0x270
[   33.921028] [c0007ffe7e90] [c01190bc]
.generic_handle_irq+0x4c/0x80
[   33.921074] [c0007ffe7f10] [c0010a48] .__do_irq+0x88/0x1f0
[   33.921115] [c0007ffe7f90] [c0022a0c] .call_do_irq+0x14/0x24
[   33.921155] [c0007e6f37e0] [c0010c3c] .do_IRQ+0x8c/0x100
[   33.921195] [c0007e6f3880] [c0002594]
hardware_interrupt_common+0x114/0x180
[   33.921243] --- interrupt: 501 at .plpar_hcall_norets+0x14/0x20
[   33.921243] LR = .check_and_cede_processor+0x24/0x40
[   33.921300] [c0007e6f3b70] []   (null)
(unreliable)
[   33.921347] [c0007e6f3be0] [c0628068]
.shared_cede_loop+0x58/0x160
[   33.921393] [c0007e6f3c70] [c06259ac]
.cpuidle_enter_state+0xbc/0x3b0
[   33.921439] [c0007e6f3d30] [c00fe32c] .call_cpuidle+0x4c/0xa0
[   33.921479] [c0007e6f3db0] [c00fe700]
.cpu_startup_entry+0x380/0x4a0
[   33.921526] [c0007e6f3ed0] [c0043110]
.start_secondary+0x320/0x350
[   33.921571] [c0007e6f3f90] [c0008b6c]
start_secondary_prolog+0x10/0x14

I think that bug might have been introduced by commit
587f83e8dd50d22bc0c62 (Use rtas_get_sensor in RAS code) since the
rtas_busy_delay() was not called before that commit, as far as I can see.

Any suggestions how to fix this? Simply revert 587f83e8dd50d? Use
mdelay() instead of msleep() in rtas_busy_delay()? Something more fancy?

 Thanks,
  Thomas
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Thomas Huth
On Thu, 9 Jul 2015 16:07:47 +0300
Michael S. Tsirkin m...@redhat.com wrote:

 On Thu, Jul 09, 2015 at 02:57:33PM +0200, Paolo Bonzini wrote:
  
  
  On 09/07/2015 11:48, Laurent Vivier wrote:
   
   
   On 09/07/2015 09:49, Thomas Huth wrote:
   The option for supporting cross-endianness legacy guests in
   the vhost and tun code should only be available on systems
   that support cross-endian guests.
   
   I'm sure I misunderstand something, but what happens if we use QEMU with
   TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64
   little endian host ?
  
  TCG does not yet support irqfd/ioeventfd, so it cannot be used with vhost.
  
  Paolo
 
 vhost does not require irqfd anymore.  I think ioeventfd actually works
 fine though I didn't try, it would be easy to support.

That's an interesting issue, thanks for pointing this out, Laurent! So
do we now rather want to leave everything as it currently is, in case
somebody wants to use vhost-net with a cross-endian TCG guest one day?

Or do we assume that either
a) TCG is so slow anyway that nobody wants to accelerate it with vhost
or
b) TCG vhost likely won't happen that soon so we hope that everybody
will already be using virtio 1.0 at that point in time (with a fixed
endianness) 
?
... then I think we should go on and include this patch.

 Thomas
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Thomas Huth
On Thu, 9 Jul 2015 16:07:47 +0300
Michael S. Tsirkin m...@redhat.com wrote:

 On Thu, Jul 09, 2015 at 02:57:33PM +0200, Paolo Bonzini wrote:
  
  
  On 09/07/2015 11:48, Laurent Vivier wrote:
   
   
   On 09/07/2015 09:49, Thomas Huth wrote:
   The option for supporting cross-endianness legacy guests in
   the vhost and tun code should only be available on systems
   that support cross-endian guests.
   
   I'm sure I misunderstand something, but what happens if we use QEMU with
   TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64
   little endian host ?
  
  TCG does not yet support irqfd/ioeventfd, so it cannot be used with vhost.
  
  Paolo
 
 vhost does not require irqfd anymore.  I think ioeventfd actually works
 fine though I didn't try, it would be easy to support.

That's an interesting issue, thanks for pointing this out, Laurent! So
do we now rather want to leave everything as it currently is, in case
somebody wants to use vhost-net with a cross-endian TCG guest one day?

Or do we assume that either
a) TCG is so slow anyway that nobody wants to accelerate it with vhost
or
b) TCG vhost likely won't happen that soon so we hope that everybody
will already be using virtio 1.0 at that point in time (with a fixed
endianness) 
?
... then I think we should go on and include this patch.

 Thomas
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Thomas Huth
The option for supporting cross-endianness legacy guests in
the vhost and tun code should only be available on systems
that support cross-endian guests.

Signed-off-by: Thomas Huth th...@redhat.com
---
 arch/arm/kvm/Kconfig | 1 +
 arch/arm64/kvm/Kconfig   | 1 +
 arch/powerpc/kvm/Kconfig | 1 +
 drivers/net/Kconfig  | 1 +
 drivers/vhost/Kconfig| 1 +
 virt/kvm/Kconfig | 3 +++
 6 files changed, 8 insertions(+)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index bfb915d..9d8f363 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select KVM_CROSS_ENDIAN_GUESTS
depends on ARM_VIRT_EXT  ARM_LPAE  ARM_ARCH_TIMER
---help---
  Support hosting virtualized guest machines.
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index bfffe8f..9af39fe 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select KVM_CROSS_ENDIAN_GUESTS
---help---
  Support hosting virtualized guest machines.
 
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 3caec2c..e028710 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV
select KVM_BOOK3S_HV_POSSIBLE
select MMU_NOTIFIER
select CMA
+   select KVM_CROSS_ENDIAN_GUESTS
---help---
  Support running unmodified book3s_64 guest kernels in
  virtual machines on POWER7 and PPC970 processors that have
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index c18f9e6..0c4ce47 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -261,6 +261,7 @@ config TUN
 config TUN_VNET_CROSS_LE
bool Support for cross-endian vnet headers on little-endian kernels
default n
+   depends on KVM_CROSS_ENDIAN_GUESTS
---help---
  This option allows TUN/TAP and MACVTAP device drivers in a
  little-endian kernel to parse vnet headers that come from a
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 533eaf0..4d8ae6b 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -35,6 +35,7 @@ config VHOST
 
 config VHOST_CROSS_ENDIAN_LEGACY
bool Cross-endian support for vhost
+   depends on KVM_CROSS_ENDIAN_GUESTS
default n
---help---
  This option allows vhost to support guests with a different byte
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index e2c876d..cc7b28a 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT
 config KVM_COMPAT
def_bool y
depends on COMPAT  !S390
+
+config KVM_CROSS_ENDIAN_GUESTS
+   bool
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Thomas Huth
The option for supporting cross-endianness legacy guests in
the vhost and tun code should only be available on systems
that support cross-endian guests.

Signed-off-by: Thomas Huth th...@redhat.com
---
 arch/arm/kvm/Kconfig | 1 +
 arch/arm64/kvm/Kconfig   | 1 +
 arch/powerpc/kvm/Kconfig | 1 +
 drivers/net/Kconfig  | 1 +
 drivers/vhost/Kconfig| 1 +
 virt/kvm/Kconfig | 3 +++
 6 files changed, 8 insertions(+)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index bfb915d..9d8f363 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select KVM_CROSS_ENDIAN_GUESTS
depends on ARM_VIRT_EXT  ARM_LPAE  ARM_ARCH_TIMER
---help---
  Support hosting virtualized guest machines.
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index bfffe8f..9af39fe 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select KVM_CROSS_ENDIAN_GUESTS
---help---
  Support hosting virtualized guest machines.
 
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 3caec2c..e028710 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV
select KVM_BOOK3S_HV_POSSIBLE
select MMU_NOTIFIER
select CMA
+   select KVM_CROSS_ENDIAN_GUESTS
---help---
  Support running unmodified book3s_64 guest kernels in
  virtual machines on POWER7 and PPC970 processors that have
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index c18f9e6..0c4ce47 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -261,6 +261,7 @@ config TUN
 config TUN_VNET_CROSS_LE
bool Support for cross-endian vnet headers on little-endian kernels
default n
+   depends on KVM_CROSS_ENDIAN_GUESTS
---help---
  This option allows TUN/TAP and MACVTAP device drivers in a
  little-endian kernel to parse vnet headers that come from a
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 533eaf0..4d8ae6b 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -35,6 +35,7 @@ config VHOST
 
 config VHOST_CROSS_ENDIAN_LEGACY
bool Cross-endian support for vhost
+   depends on KVM_CROSS_ENDIAN_GUESTS
default n
---help---
  This option allows vhost to support guests with a different byte
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index e2c876d..cc7b28a 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT
 config KVM_COMPAT
def_bool y
depends on COMPAT  !S390
+
+config KVM_CROSS_ENDIAN_GUESTS
+   bool
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PULL] virtio/vhost: cross endian support

2015-07-07 Thread Thomas Huth
On Thu, 2 Jul 2015 11:32:52 +0200
Michael S. Tsirkin m...@redhat.com wrote:

 On Thu, Jul 02, 2015 at 11:12:56AM +0200, Greg Kurz wrote:
  On Thu, 2 Jul 2015 08:01:28 +0200
  Michael S. Tsirkin m...@redhat.com wrote:
...
   Yea, well - support for legacy BE guests on the new LE hosts is
   exactly the motivation for this.
   
   I dislike it too, but there are two redeeming properties that
   made me merge this:
   
   1.  It's a trivial amount of code: since we wrap host/guest accesses
   anyway, almost all of it is well hidden from drivers.
   
   2.  Sane platforms would never set flags like VHOST_CROSS_ENDIAN_LEGACY -
   and when it's clear, there's zero overhead (as some point it was
   tested by compiling with and without the patches, got the same
   stripped binary).
   
   Maybe we could create a Kconfig symbol to enforce point (2): prevent
   people from enabling it e.g. on x86. I will look into this - but it can
   be done by a patch on top, so I think this can be merged as is.
   
  
  This cross-endian *oddity* is targeting PowerPC book3s_64 processors... I
  am not aware of any other users. Maybe create a symbol that would
  be only selected by PPC_BOOK3S_64 ?
 
 I think some ARM systems are trying to support cross-endian
 configurations as well.
 
 Besides that, yes, this is more or less what I had in mind.

Would something simple like this already do the job:

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -35,6 +35,7 @@ config VHOST
 
 config VHOST_CROSS_ENDIAN_LEGACY
bool Cross-endian support for vhost
+   depends on KVM_BOOK3S_64 || KVM_ARM_HOST
default n
---help---
  This option allows vhost to support guests with a different byte

?

If that looks acceptable, I can submit a proper patch if you like.

 Thomas
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: Fix warnings from sparse

2015-05-22 Thread Thomas Huth
When compiling the KVM code for POWER with make C=1, sparse
complains about functions missing proper prototypes and a 64-bit
constant missing the ULL prefix. Let's fix this by making the
functions static or by including the proper header with the
prototypes, and by appending a ULL prefix to the constant
PPC_MPPE_ADDRESS_MASK.

Signed-off-by: Thomas Huth th...@redhat.com
---
 arch/powerpc/include/asm/ppc-opcode.h| 2 +-
 arch/powerpc/kvm/book3s.c| 3 ++-
 arch/powerpc/kvm/book3s_32_mmu_host.c| 1 +
 arch/powerpc/kvm/book3s_64_mmu_host.c| 1 +
 arch/powerpc/kvm/book3s_emulate.c| 1 +
 arch/powerpc/kvm/book3s_hv.c | 8 
 arch/powerpc/kvm/book3s_paired_singles.c | 2 +-
 arch/powerpc/kvm/powerpc.c   | 2 +-
 8 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 5c93f69..05ac8b81 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -285,7 +285,7 @@
 
 /* POWER8 Micro Partition Prefetch (MPP) parameters */
 /* Address mask is common for LOGMPP instruction and MPPR SPR */
-#define PPC_MPPE_ADDRESS_MASK 0xc000
+#define PPC_MPPE_ADDRESS_MASK 0xc000ULL
 
 /* Bits 60 and 61 of MPP SPR should be set to one of the following */
 /* Aborting the fetch is indeed setting 00 in the table size bits */
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 453a8a4..0e7d606 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -240,7 +240,8 @@ void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, 
ulong flags)
kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_INST_STORAGE);
 }
 
-int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
+static int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu,
+unsigned int priority)
 {
int deliver = 1;
int vec = 0;
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c 
b/arch/powerpc/kvm/book3s_32_mmu_host.c
index 2035d16..d5c9bfe 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -26,6 +26,7 @@
 #include asm/machdep.h
 #include asm/mmu_context.h
 #include asm/hw_irq.h
+#include book3s.h
 
 /* #define DEBUG_MMU */
 /* #define DEBUG_SR */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index b982d92..79ad35a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -28,6 +28,7 @@
 #include asm/mmu_context.h
 #include asm/hw_irq.h
 #include trace_pr.h
+#include book3s.h
 
 #define PTE_SIZE 12
 
diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index 5a2bc4b..2afdb9c 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -23,6 +23,7 @@
 #include asm/reg.h
 #include asm/switch_to.h
 #include asm/time.h
+#include book3s.h
 
 #define OP_19_XOP_RFID 18
 #define OP_19_XOP_RFI  50
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 48d3c5d..5c34f7d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -214,12 +214,12 @@ static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 
msr)
kvmppc_end_cede(vcpu);
 }
 
-void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
+static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
 {
vcpu-arch.pvr = pvr;
 }
 
-int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
+static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
 {
unsigned long pcr = 0;
struct kvmppc_vcore *vc = vcpu-arch.vcore;
@@ -259,7 +259,7 @@ int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 
arch_compat)
return 0;
 }
 
-void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
+static void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
 {
int r;
 
@@ -292,7 +292,7 @@ void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
   vcpu-arch.last_inst);
 }
 
-struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id)
+static struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id)
 {
int r;
struct kvm_vcpu *v, *ret = NULL;
diff --git a/arch/powerpc/kvm/book3s_paired_singles.c 
b/arch/powerpc/kvm/book3s_paired_singles.c
index bd6ab16..a759d9a 100644
--- a/arch/powerpc/kvm/book3s_paired_singles.c
+++ b/arch/powerpc/kvm/book3s_paired_singles.c
@@ -352,7 +352,7 @@ static inline u32 inst_get_field(u32 inst, int msb, int lsb)
return kvmppc_get_field(inst, msb + 32, lsb + 32);
 }
 
-bool kvmppc_inst_is_paired_single(struct kvm_vcpu *vcpu, u32 inst)
+static bool kvmppc_inst_is_paired_single(struct kvm_vcpu *vcpu, u32 inst)
 {
if (!(vcpu-arch.hflags  BOOK3S_HFLAG_PAIRED_SINGLE))
return false;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 8cd1f80..b509e2e

[PATCH] KVM: PPC: Fix warnings from sparse

2015-05-22 Thread Thomas Huth
When compiling the KVM code for POWER with make C=1, sparse
complains about functions missing proper prototypes and a 64-bit
constant missing the ULL prefix. Let's fix this by making the
functions static or by including the proper header with the
prototypes, and by appending a ULL prefix to the constant
PPC_MPPE_ADDRESS_MASK.

Signed-off-by: Thomas Huth th...@redhat.com
---
 arch/powerpc/include/asm/ppc-opcode.h| 2 +-
 arch/powerpc/kvm/book3s.c| 3 ++-
 arch/powerpc/kvm/book3s_32_mmu_host.c| 1 +
 arch/powerpc/kvm/book3s_64_mmu_host.c| 1 +
 arch/powerpc/kvm/book3s_emulate.c| 1 +
 arch/powerpc/kvm/book3s_hv.c | 8 
 arch/powerpc/kvm/book3s_paired_singles.c | 2 +-
 arch/powerpc/kvm/powerpc.c   | 2 +-
 8 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 5c93f69..05ac8b81 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -285,7 +285,7 @@
 
 /* POWER8 Micro Partition Prefetch (MPP) parameters */
 /* Address mask is common for LOGMPP instruction and MPPR SPR */
-#define PPC_MPPE_ADDRESS_MASK 0xc000
+#define PPC_MPPE_ADDRESS_MASK 0xc000ULL
 
 /* Bits 60 and 61 of MPP SPR should be set to one of the following */
 /* Aborting the fetch is indeed setting 00 in the table size bits */
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 453a8a4..0e7d606 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -240,7 +240,8 @@ void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, 
ulong flags)
kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_INST_STORAGE);
 }
 
-int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
+static int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu,
+unsigned int priority)
 {
int deliver = 1;
int vec = 0;
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c 
b/arch/powerpc/kvm/book3s_32_mmu_host.c
index 2035d16..d5c9bfe 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -26,6 +26,7 @@
 #include asm/machdep.h
 #include asm/mmu_context.h
 #include asm/hw_irq.h
+#include book3s.h
 
 /* #define DEBUG_MMU */
 /* #define DEBUG_SR */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index b982d92..79ad35a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -28,6 +28,7 @@
 #include asm/mmu_context.h
 #include asm/hw_irq.h
 #include trace_pr.h
+#include book3s.h
 
 #define PTE_SIZE 12
 
diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index 5a2bc4b..2afdb9c 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -23,6 +23,7 @@
 #include asm/reg.h
 #include asm/switch_to.h
 #include asm/time.h
+#include book3s.h
 
 #define OP_19_XOP_RFID 18
 #define OP_19_XOP_RFI  50
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 48d3c5d..5c34f7d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -214,12 +214,12 @@ static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 
msr)
kvmppc_end_cede(vcpu);
 }
 
-void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
+static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
 {
vcpu-arch.pvr = pvr;
 }
 
-int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
+static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
 {
unsigned long pcr = 0;
struct kvmppc_vcore *vc = vcpu-arch.vcore;
@@ -259,7 +259,7 @@ int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 
arch_compat)
return 0;
 }
 
-void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
+static void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
 {
int r;
 
@@ -292,7 +292,7 @@ void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
   vcpu-arch.last_inst);
 }
 
-struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id)
+static struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id)
 {
int r;
struct kvm_vcpu *v, *ret = NULL;
diff --git a/arch/powerpc/kvm/book3s_paired_singles.c 
b/arch/powerpc/kvm/book3s_paired_singles.c
index bd6ab16..a759d9a 100644
--- a/arch/powerpc/kvm/book3s_paired_singles.c
+++ b/arch/powerpc/kvm/book3s_paired_singles.c
@@ -352,7 +352,7 @@ static inline u32 inst_get_field(u32 inst, int msb, int lsb)
return kvmppc_get_field(inst, msb + 32, lsb + 32);
 }
 
-bool kvmppc_inst_is_paired_single(struct kvm_vcpu *vcpu, u32 inst)
+static bool kvmppc_inst_is_paired_single(struct kvm_vcpu *vcpu, u32 inst)
 {
if (!(vcpu-arch.hflags  BOOK3S_HFLAG_PAIRED_SINGLE))
return false;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 8cd1f80..b509e2e

[PATCH] KVM: PPC: Remove PPC970 from KVM_BOOK3S_64_HV text in Kconfig

2015-05-22 Thread Thomas Huth
Since the PPC970 support has been removed from the kvm-hv kernel
module recently, we should also reflect this change in the help
text of the corresponding Kconfig option.

Signed-off-by: Thomas Huth th...@redhat.com
---
 arch/powerpc/kvm/Kconfig | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 3caec2c..c2024ac 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -74,14 +74,14 @@ config KVM_BOOK3S_64
  If unsure, say N.
 
 config KVM_BOOK3S_64_HV
-   tristate KVM support for POWER7 and PPC970 using hypervisor mode in 
host
+   tristate KVM for POWER7 and later using hypervisor mode in host
depends on KVM_BOOK3S_64  PPC_POWERNV
select KVM_BOOK3S_HV_POSSIBLE
select MMU_NOTIFIER
select CMA
---help---
  Support running unmodified book3s_64 guest kernels in
- virtual machines on POWER7 and PPC970 processors that have
+ virtual machines on POWER7 and newer processors that have
  hypervisor mode available to the host.
 
  If you say Y here, KVM will use the hardware virtualization
@@ -89,8 +89,8 @@ config KVM_BOOK3S_64_HV
  guest operating systems will run at full hardware speed
  using supervisor and user modes.  However, this also means
  that KVM is not usable under PowerVM (pHyp), is only usable
- on POWER7 (or later) processors and PPC970-family processors,
- and cannot emulate a different processor from the host processor.
+ on POWER7 or later processors, and cannot emulate a
+ different processor from the host processor.
 
  If unsure, say N.
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: Remove PPC970 from KVM_BOOK3S_64_HV text in Kconfig

2015-05-22 Thread Thomas Huth
Since the PPC970 support has been removed from the kvm-hv kernel
module recently, we should also reflect this change in the help
text of the corresponding Kconfig option.

Signed-off-by: Thomas Huth th...@redhat.com
---
 arch/powerpc/kvm/Kconfig | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 3caec2c..c2024ac 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -74,14 +74,14 @@ config KVM_BOOK3S_64
  If unsure, say N.
 
 config KVM_BOOK3S_64_HV
-   tristate KVM support for POWER7 and PPC970 using hypervisor mode in 
host
+   tristate KVM for POWER7 and later using hypervisor mode in host
depends on KVM_BOOK3S_64  PPC_POWERNV
select KVM_BOOK3S_HV_POSSIBLE
select MMU_NOTIFIER
select CMA
---help---
  Support running unmodified book3s_64 guest kernels in
- virtual machines on POWER7 and PPC970 processors that have
+ virtual machines on POWER7 and newer processors that have
  hypervisor mode available to the host.
 
  If you say Y here, KVM will use the hardware virtualization
@@ -89,8 +89,8 @@ config KVM_BOOK3S_64_HV
  guest operating systems will run at full hardware speed
  using supervisor and user modes.  However, this also means
  that KVM is not usable under PowerVM (pHyp), is only usable
- on POWER7 (or later) processors and PPC970-family processors,
- and cannot emulate a different processor from the host processor.
+ on POWER7 or later processors, and cannot emulate a
+ different processor from the host processor.
 
  If unsure, say N.
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: Book3S HV: Replace kvmppc_find_vcpu() with kvm_get_vcpu()

2015-05-07 Thread Thomas Huth
Both functions are doing the same thing - looking up the struct
kvm_vcpu pointer for a given vCPU ID. So there's no need for the
kvmppc_find_vcpu() function, simply use the common function instead.

Signed-off-by: Thomas Huth th...@redhat.com
---
 arch/powerpc/kvm/book3s_hv.c | 22 +++---
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 48d3c5d..78e62cf 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -292,22 +292,6 @@ void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
   vcpu-arch.last_inst);
 }
 
-struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id)
-{
-   int r;
-   struct kvm_vcpu *v, *ret = NULL;
-
-   mutex_lock(kvm-lock);
-   kvm_for_each_vcpu(r, v, kvm) {
-   if (v-vcpu_id == id) {
-   ret = v;
-   break;
-   }
-   }
-   mutex_unlock(kvm-lock);
-   return ret;
-}
-
 static void init_vpa(struct kvm_vcpu *vcpu, struct lppaca *vpa)
 {
vpa-__old_status |= LPPACA_OLD_SHARED_PROC;
@@ -358,7 +342,7 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu 
*vcpu,
int subfunc;
struct kvmppc_vpa *vpap;
 
-   tvcpu = kvmppc_find_vcpu(kvm, vcpuid);
+   tvcpu = kvm_get_vcpu(kvm, vcpuid);
if (!tvcpu)
return H_PARAMETER;
 
@@ -678,7 +662,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
break;
case H_PROD:
target = kvmppc_get_gpr(vcpu, 4);
-   tvcpu = kvmppc_find_vcpu(vcpu-kvm, target);
+   tvcpu = kvm_get_vcpu(vcpu-kvm, target);
if (!tvcpu) {
ret = H_PARAMETER;
break;
@@ -696,7 +680,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
target = kvmppc_get_gpr(vcpu, 4);
if (target == -1)
break;
-   tvcpu = kvmppc_find_vcpu(vcpu-kvm, target);
+   tvcpu = kvm_get_vcpu(vcpu-kvm, target);
if (!tvcpu) {
ret = H_PARAMETER;
break;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 1/8] virtio: introduce virtio_is_little_endian() helper

2015-04-23 Thread Thomas Huth
Am Thu, 23 Apr 2015 17:26:20 +0200
schrieb Greg Kurz gk...@linux.vnet.ibm.com:

 Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
 ---
  include/linux/virtio_config.h |   17 +++--
  1 file changed, 11 insertions(+), 6 deletions(-)
 
 diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
 index ca3ed78..bd1a582 100644
 --- a/include/linux/virtio_config.h
 +++ b/include/linux/virtio_config.h
 @@ -205,35 +205,40 @@ int virtqueue_set_affinity(struct virtqueue *vq, int 
 cpu)
   return 0;
  }
  
 +static inline bool virtio_is_little_endian(struct virtio_device *vdev)
 +{
 + return virtio_has_feature(vdev, VIRTIO_F_VERSION_1);
 +}

So this function returns false when _not_ using version 1, but running
on a little endian host + guest? Sounds confusing. Maybe you could name
it virtio_is_v1() or so instead?

 Thomas
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 1/8] virtio: introduce virtio_is_little_endian() helper

2015-04-23 Thread Thomas Huth
Am Thu, 23 Apr 2015 19:22:15 +0200
schrieb Thomas Huth th...@redhat.com:

 Am Thu, 23 Apr 2015 17:26:20 +0200
 schrieb Greg Kurz gk...@linux.vnet.ibm.com:
 
  Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
  ---
   include/linux/virtio_config.h |   17 +++--
   1 file changed, 11 insertions(+), 6 deletions(-)
  
  diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
  index ca3ed78..bd1a582 100644
  --- a/include/linux/virtio_config.h
  +++ b/include/linux/virtio_config.h
  @@ -205,35 +205,40 @@ int virtqueue_set_affinity(struct virtqueue *vq, int 
  cpu)
  return 0;
   }
   
  +static inline bool virtio_is_little_endian(struct virtio_device *vdev)
  +{
  +   return virtio_has_feature(vdev, VIRTIO_F_VERSION_1);
  +}
 
 So this function returns false when _not_ using version 1, but running
 on a little endian host + guest? Sounds confusing. Maybe you could name
 it virtio_is_v1() or so instead?

Ah, never mind, I should have looked at patch 6 first, then it makes
sense. (maybe you could put a note to the later patch in this patch
description?)

 Thomas
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 1/8] virtio: introduce virtio_is_little_endian() helper

2015-04-23 Thread Thomas Huth
On Thu, 23 Apr 2015 17:26:20 +0200
Greg Kurz gk...@linux.vnet.ibm.com wrote:

 Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
 ---
  include/linux/virtio_config.h |   17 +++--
  1 file changed, 11 insertions(+), 6 deletions(-)
 
 diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
 index ca3ed78..bd1a582 100644
 --- a/include/linux/virtio_config.h
 +++ b/include/linux/virtio_config.h
 @@ -205,35 +205,40 @@ int virtqueue_set_affinity(struct virtqueue *vq, int 
 cpu)
   return 0;
  }
  
 +static inline bool virtio_is_little_endian(struct virtio_device *vdev)
 +{
 + return virtio_has_feature(vdev, VIRTIO_F_VERSION_1);
 +}
 +
  /* Memory accessors */
  static inline u16 virtio16_to_cpu(struct virtio_device *vdev, __virtio16 val)
  {
 - return __virtio16_to_cpu(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), 
 val);
 + return __virtio16_to_cpu(virtio_is_little_endian(vdev), val);
  }
  
  static inline __virtio16 cpu_to_virtio16(struct virtio_device *vdev, u16 val)
  {
 - return __cpu_to_virtio16(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), 
 val);
 + return __cpu_to_virtio16(virtio_is_little_endian(vdev), val);
  }
  
  static inline u32 virtio32_to_cpu(struct virtio_device *vdev, __virtio32 val)
  {
 - return __virtio32_to_cpu(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), 
 val);
 + return __virtio32_to_cpu(virtio_is_little_endian(vdev), val);
  }
  
  static inline __virtio32 cpu_to_virtio32(struct virtio_device *vdev, u32 val)
  {
 - return __cpu_to_virtio32(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), 
 val);
 + return __cpu_to_virtio32(virtio_is_little_endian(vdev), val);
  }
  
  static inline u64 virtio64_to_cpu(struct virtio_device *vdev, __virtio64 val)
  {
 - return __virtio64_to_cpu(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), 
 val);
 + return __virtio64_to_cpu(virtio_is_little_endian(vdev), val);
  }
  
  static inline __virtio64 cpu_to_virtio64(struct virtio_device *vdev, u64 val)
  {
 - return __cpu_to_virtio64(virtio_has_feature(vdev, VIRTIO_F_VERSION_1), 
 val);
 + return __cpu_to_virtio64(virtio_is_little_endian(vdev), val);
  }

Reviewed-by: Thomas Huth th...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 2/8] tun: add tun_is_little_endian() helper

2015-04-23 Thread Thomas Huth
On Thu, 23 Apr 2015 17:26:30 +0200
Greg Kurz gk...@linux.vnet.ibm.com wrote:

 Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
 ---
  drivers/net/tun.c |9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/net/tun.c b/drivers/net/tun.c
 index 857dca4..3c3d6c0 100644
 --- a/drivers/net/tun.c
 +++ b/drivers/net/tun.c
 @@ -206,14 +206,19 @@ struct tun_struct {
   u32 flow_count;
  };
  
 +static inline bool tun_is_little_endian(struct tun_struct *tun)
 +{
 + return tun-flags  TUN_VNET_LE;
 +}
 +
  static inline u16 tun16_to_cpu(struct tun_struct *tun, __virtio16 val)
  {
 - return __virtio16_to_cpu(tun-flags  TUN_VNET_LE, val);
 + return __virtio16_to_cpu(tun_is_little_endian(tun), val);
  }
  
  static inline __virtio16 cpu_to_tun16(struct tun_struct *tun, u16 val)
  {
 - return __cpu_to_virtio16(tun-flags  TUN_VNET_LE, val);
 + return __cpu_to_virtio16(tun_is_little_endian(tun), val);
  }

Reviewed-by: Thomas Huth th...@redhat.com

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/8] vringh: introduce vringh_is_little_endian() helper

2015-04-23 Thread Thomas Huth
On Thu, 23 Apr 2015 17:26:52 +0200
Greg Kurz gk...@linux.vnet.ibm.com wrote:

 Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
 ---
  include/linux/vringh.h |   17 +++--
  1 file changed, 11 insertions(+), 6 deletions(-)
 
 diff --git a/include/linux/vringh.h b/include/linux/vringh.h
 index a3fa537..3ed62ef 100644
 --- a/include/linux/vringh.h
 +++ b/include/linux/vringh.h
 @@ -226,33 +226,38 @@ static inline void vringh_notify(struct vringh *vrh)
   vrh-notify(vrh);
  }
  
 +static inline bool vringh_is_little_endian(const struct vringh *vrh)
 +{
 + return vrh-little_endian;
 +}
 +
  static inline u16 vringh16_to_cpu(const struct vringh *vrh, __virtio16 val)
  {
 - return __virtio16_to_cpu(vrh-little_endian, val);
 + return __virtio16_to_cpu(vringh_is_little_endian(vrh), val);
  }
  
  static inline __virtio16 cpu_to_vringh16(const struct vringh *vrh, u16 val)
  {
 - return __cpu_to_virtio16(vrh-little_endian, val);
 + return __cpu_to_virtio16(vringh_is_little_endian(vrh), val);
  }
  
  static inline u32 vringh32_to_cpu(const struct vringh *vrh, __virtio32 val)
  {
 - return __virtio32_to_cpu(vrh-little_endian, val);
 + return __virtio32_to_cpu(vringh_is_little_endian(vrh), val);
  }
  
  static inline __virtio32 cpu_to_vringh32(const struct vringh *vrh, u32 val)
  {
 - return __cpu_to_virtio32(vrh-little_endian, val);
 + return __cpu_to_virtio32(vringh_is_little_endian(vrh), val);
  }
  
  static inline u64 vringh64_to_cpu(const struct vringh *vrh, __virtio64 val)
  {
 - return __virtio64_to_cpu(vrh-little_endian, val);
 + return __virtio64_to_cpu(vringh_is_little_endian(vrh), val);
  }
  
  static inline __virtio64 cpu_to_vringh64(const struct vringh *vrh, u64 val)
  {
 - return __cpu_to_virtio64(vrh-little_endian, val);
 + return __cpu_to_virtio64(vringh_is_little_endian(vrh), val);
  }
  #endif /* _LINUX_VRINGH_H */

Reviewed-by: Thomas Huth th...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 5/8] vhost: introduce vhost_is_little_endian() helper

2015-04-23 Thread Thomas Huth
On Thu, 23 Apr 2015 17:27:05 +0200
Greg Kurz gk...@linux.vnet.ibm.com wrote:

 Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
 ---
  drivers/vhost/vhost.h |   17 +++--
  1 file changed, 11 insertions(+), 6 deletions(-)
 
 diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
 index 8c1c792..6a49960 100644
 --- a/drivers/vhost/vhost.h
 +++ b/drivers/vhost/vhost.h
 @@ -173,34 +173,39 @@ static inline bool vhost_has_feature(struct 
 vhost_virtqueue *vq, int bit)
   return vq-acked_features  (1ULL  bit);
  }
  
 +static inline bool vhost_is_little_endian(struct vhost_virtqueue *vq)
 +{
 + return vhost_has_feature(vq, VIRTIO_F_VERSION_1);
 +}
 +
  /* Memory accessors */
  static inline u16 vhost16_to_cpu(struct vhost_virtqueue *vq, __virtio16 val)
  {
 - return __virtio16_to_cpu(vhost_has_feature(vq, VIRTIO_F_VERSION_1), 
 val);
 + return __virtio16_to_cpu(vhost_is_little_endian(vq), val);
  }
  
  static inline __virtio16 cpu_to_vhost16(struct vhost_virtqueue *vq, u16 val)
  {
 - return __cpu_to_virtio16(vhost_has_feature(vq, VIRTIO_F_VERSION_1), 
 val);
 + return __cpu_to_virtio16(vhost_is_little_endian(vq), val);
  }
  
  static inline u32 vhost32_to_cpu(struct vhost_virtqueue *vq, __virtio32 val)
  {
 - return __virtio32_to_cpu(vhost_has_feature(vq, VIRTIO_F_VERSION_1), 
 val);
 + return __virtio32_to_cpu(vhost_is_little_endian(vq), val);
  }
  
  static inline __virtio32 cpu_to_vhost32(struct vhost_virtqueue *vq, u32 val)
  {
 - return __cpu_to_virtio32(vhost_has_feature(vq, VIRTIO_F_VERSION_1), 
 val);
 + return __cpu_to_virtio32(vhost_is_little_endian(vq), val);
  }
  
  static inline u64 vhost64_to_cpu(struct vhost_virtqueue *vq, __virtio64 val)
  {
 - return __virtio64_to_cpu(vhost_has_feature(vq, VIRTIO_F_VERSION_1), 
 val);
 + return __virtio64_to_cpu(vhost_is_little_endian(vq), val);
  }
  
  static inline __virtio64 cpu_to_vhost64(struct vhost_virtqueue *vq, u64 val)
  {
 - return __cpu_to_virtio64(vhost_has_feature(vq, VIRTIO_F_VERSION_1), 
 val);
 + return __cpu_to_virtio64(vhost_is_little_endian(vq), val);
  }
  #endif

Reviewed-by: Thomas Huth th...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 3/8] macvtap: introduce macvtap_is_little_endian() helper

2015-04-23 Thread Thomas Huth
On Thu, 23 Apr 2015 17:26:41 +0200
Greg Kurz gk...@linux.vnet.ibm.com wrote:

 Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
 ---
  drivers/net/macvtap.c |9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
 index 27ecc5c..a2f2958 100644
 --- a/drivers/net/macvtap.c
 +++ b/drivers/net/macvtap.c
 @@ -49,14 +49,19 @@ struct macvtap_queue {
  
  #define MACVTAP_VNET_LE 0x8000
  
 +static inline bool macvtap_is_little_endian(struct macvtap_queue *q)
 +{
 + return q-flags  MACVTAP_VNET_LE;
 +}
 +
  static inline u16 macvtap16_to_cpu(struct macvtap_queue *q, __virtio16 val)
  {
 - return __virtio16_to_cpu(q-flags  MACVTAP_VNET_LE, val);
 + return __virtio16_to_cpu(macvtap_is_little_endian(q), val);
  }
  
  static inline __virtio16 cpu_to_macvtap16(struct macvtap_queue *q, u16 val)
  {
 - return __cpu_to_virtio16(q-flags  MACVTAP_VNET_LE, val);
 + return __cpu_to_virtio16(macvtap_is_little_endian(q), val);
  }

Reviewed-by: Thomas Huth th...@redhat.com
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 6/8] virtio: add explicit big-endian support to memory accessors

2015-04-23 Thread Thomas Huth
On Thu, 23 Apr 2015 17:29:06 +0200
Greg Kurz gk...@linux.vnet.ibm.com wrote:

 The current memory accessors logic is:
 - little endian if little_endian
 - native endian (i.e. no byteswap) if !little_endian
 
 If we want to fully support cross-endian vhost, we also need to be
 able to convert to big endian.
 
 Instead of changing the little_endian argument to some 3-value enum, this
 patch changes the logic to:
 - little endian if little_endian
 - big endian if !little_endian
 
 The native endian case is handled by all users with a trivial helper. This
 patch doesn't change any functionality, nor it does add overhead.
 
 Signed-off-by: Greg Kurz gk...@linux.vnet.ibm.com
 ---
 
 Changes since v4:
 - style fixes (I have chosen if ... else in most places to stay below
   80 columns, with the notable exception of the vhost helper which gets
   shorten in a later patch)
 
  drivers/net/macvtap.c|5 -
  drivers/net/tun.c|5 -
  drivers/vhost/vhost.h|2 +-
  include/linux/virtio_byteorder.h |   24 ++--
  include/linux/virtio_config.h|5 -
  include/linux/vringh.h   |2 +-
  6 files changed, 28 insertions(+), 15 deletions(-)
 
 diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
 index a2f2958..6cf6b3e 100644
 --- a/drivers/net/macvtap.c
 +++ b/drivers/net/macvtap.c
 @@ -51,7 +51,10 @@ struct macvtap_queue {
  
  static inline bool macvtap_is_little_endian(struct macvtap_queue *q)
  {
 - return q-flags  MACVTAP_VNET_LE;
 + if (q-flags  MACVTAP_VNET_LE)
 + return true;
 + else
 + return virtio_legacy_is_little_endian();

simply:

return (q-flags  MACVTAP_VNET_LE) ||
   virtio_legacy_is_little_endian();

?

  }
  
  static inline u16 macvtap16_to_cpu(struct macvtap_queue *q, __virtio16 val)
 diff --git a/drivers/net/tun.c b/drivers/net/tun.c
 index 3c3d6c0..5b044d4 100644
 --- a/drivers/net/tun.c
 +++ b/drivers/net/tun.c
 @@ -208,7 +208,10 @@ struct tun_struct {
  
  static inline bool tun_is_little_endian(struct tun_struct *tun)
  {
 - return tun-flags  TUN_VNET_LE;
 + if (tun-flags  TUN_VNET_LE)
 + return true;
 + else
 + return virtio_legacy_is_little_endian();

dito?

  }
  
  static inline u16 tun16_to_cpu(struct tun_struct *tun, __virtio16 val)
 diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
 index 6a49960..954c657 100644
 --- a/drivers/vhost/vhost.h
 +++ b/drivers/vhost/vhost.h
 @@ -175,7 +175,7 @@ static inline bool vhost_has_feature(struct 
 vhost_virtqueue *vq, int bit)
  
  static inline bool vhost_is_little_endian(struct vhost_virtqueue *vq)
  {
 - return vhost_has_feature(vq, VIRTIO_F_VERSION_1);
 + return vhost_has_feature(vq, VIRTIO_F_VERSION_1) ? true : 
 virtio_legacy_is_little_endian();
  }

That line is way longer than 80 characters ... may I suggest to switch
at least here to:

return vhost_has_feature(vq, VIRTIO_F_VERSION_1) ||
   virtio_legacy_is_little_endian();

?

Apart from the cosmetics, the patch looks good to me.

 Thomas
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv4] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM

2015-04-21 Thread Thomas Huth
Am Tue, 21 Apr 2015 10:41:51 +1000
schrieb David Gibson da...@gibson.dropbear.id.au:

 On POWER, storage caching is usually configured via the MMU - attributes
 such as cache-inhibited are stored in the TLB and the hashed page table.
 
 This makes correctly performing cache inhibited IO accesses awkward when
 the MMU is turned off (real mode).  Some CPU models provide special
 registers to control the cache attributes of real mode load and stores but
 this is not at all consistent.  This is a problem in particular for SLOF,
 the firmware used on KVM guests, which runs entirely in real mode, but
 which needs to do IO to load the kernel.
 
 To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD
 and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to
 a logical address (aka guest physical address).  SLOF uses these for IO.
 
 However, because these are implemented within qemu, not the host kernel,
 these bypass any IO devices emulated within KVM itself.  The simplest way
 to see this problem is to attempt to boot a KVM guest from a virtio-blk
 device with iothread / dataplane enabled.  The iothread code relies on an
 in kernel implementation of the virtio queue notification, which is not
 triggered by the IO hcalls, and so the guest will stall in SLOF unable to
 load the guest OS.
 
 This patch addresses this by providing in-kernel implementations of the
 2 hypercalls, which correctly scan the KVM IO bus.  Any access to an
 address not handled by the KVM IO bus will cause a VM exit, hitting the
 qemu implementation as before.
 
 Note that a userspace change is also required, in order to enable these
 new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL.
 
 Signed-off-by: David Gibson da...@gibson.dropbear.id.au
 ---
  arch/powerpc/include/asm/kvm_book3s.h |  3 ++
  arch/powerpc/kvm/book3s.c | 76 
 +++
  arch/powerpc/kvm/book3s_hv.c  | 12 ++
  arch/powerpc/kvm/book3s_pr_papr.c | 28 +
  4 files changed, 119 insertions(+)
...
 diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
 index cfbcdc6..453a8a4 100644
 --- a/arch/powerpc/kvm/book3s.c
 +++ b/arch/powerpc/kvm/book3s.c
 @@ -821,6 +821,82 @@ void kvmppc_core_destroy_vm(struct kvm *kvm)
  #endif
  }
  
 +int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu)
 +{
 + unsigned long size = kvmppc_get_gpr(vcpu, 4);
 + unsigned long addr = kvmppc_get_gpr(vcpu, 5);
 + u64 buf;
 + int ret;
 +
 + if (!is_power_of_2(size) || (size  sizeof(buf)))
 + return H_TOO_HARD;
 +
 + ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, size, buf);
 + if (ret != 0)
 + return H_TOO_HARD;
 +
 + switch (size) {
 + case 1:
 + kvmppc_set_gpr(vcpu, 4, *(u8 *)buf);
 + break;
 +

Most of the code in book3s.c seems not to use a empty line after a
break;, so may I suggest to remove these empty lines here, too, to
keep the coding style a little bit more consistent?

 + case 2:
 + kvmppc_set_gpr(vcpu, 4, be16_to_cpu(*(__be16 *)buf));
 + break;
 +
 + case 4:
 + kvmppc_set_gpr(vcpu, 4, be32_to_cpu(*(__be32 *)buf));
 + break;
 +
 + case 8:
 + kvmppc_set_gpr(vcpu, 4, be64_to_cpu(*(__be64 *)buf));
 + break;
 +
 + default:
 + BUG();

If I got the code right, a malicious guest could easily trigger this
BUG() statement, couldn't it? ... so a BUG() is maybe not the right
thing to do here. Would it be appropriate to return an error value to
the guest instead?

 + }
 +
 + return H_SUCCESS;
 +}
 +EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_load);

 Thomas
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv4] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM

2015-04-21 Thread Thomas Huth
Am Tue, 21 Apr 2015 10:41:51 +1000
schrieb David Gibson da...@gibson.dropbear.id.au:

 On POWER, storage caching is usually configured via the MMU - attributes
 such as cache-inhibited are stored in the TLB and the hashed page table.
 
 This makes correctly performing cache inhibited IO accesses awkward when
 the MMU is turned off (real mode).  Some CPU models provide special
 registers to control the cache attributes of real mode load and stores but
 this is not at all consistent.  This is a problem in particular for SLOF,
 the firmware used on KVM guests, which runs entirely in real mode, but
 which needs to do IO to load the kernel.
 
 To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD
 and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to
 a logical address (aka guest physical address).  SLOF uses these for IO.
 
 However, because these are implemented within qemu, not the host kernel,
 these bypass any IO devices emulated within KVM itself.  The simplest way
 to see this problem is to attempt to boot a KVM guest from a virtio-blk
 device with iothread / dataplane enabled.  The iothread code relies on an
 in kernel implementation of the virtio queue notification, which is not
 triggered by the IO hcalls, and so the guest will stall in SLOF unable to
 load the guest OS.
 
 This patch addresses this by providing in-kernel implementations of the
 2 hypercalls, which correctly scan the KVM IO bus.  Any access to an
 address not handled by the KVM IO bus will cause a VM exit, hitting the
 qemu implementation as before.
 
 Note that a userspace change is also required, in order to enable these
 new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL.
 
 Signed-off-by: David Gibson da...@gibson.dropbear.id.au
 ---
  arch/powerpc/include/asm/kvm_book3s.h |  3 ++
  arch/powerpc/kvm/book3s.c | 76 
 +++
  arch/powerpc/kvm/book3s_hv.c  | 12 ++
  arch/powerpc/kvm/book3s_pr_papr.c | 28 +
  4 files changed, 119 insertions(+)
...
 diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
 index cfbcdc6..453a8a4 100644
 --- a/arch/powerpc/kvm/book3s.c
 +++ b/arch/powerpc/kvm/book3s.c
 @@ -821,6 +821,82 @@ void kvmppc_core_destroy_vm(struct kvm *kvm)
  #endif
  }
  
 +int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu)
 +{
 + unsigned long size = kvmppc_get_gpr(vcpu, 4);
 + unsigned long addr = kvmppc_get_gpr(vcpu, 5);
 + u64 buf;
 + int ret;
 +
 + if (!is_power_of_2(size) || (size  sizeof(buf)))
 + return H_TOO_HARD;
 +
 + ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, size, buf);
 + if (ret != 0)
 + return H_TOO_HARD;
 +
 + switch (size) {
 + case 1:
 + kvmppc_set_gpr(vcpu, 4, *(u8 *)buf);
 + break;
 +

Most of the code in book3s.c seems not to use a empty line after a
break;, so may I suggest to remove these empty lines here, too, to
keep the coding style a little bit more consistent?

 + case 2:
 + kvmppc_set_gpr(vcpu, 4, be16_to_cpu(*(__be16 *)buf));
 + break;
 +
 + case 4:
 + kvmppc_set_gpr(vcpu, 4, be32_to_cpu(*(__be32 *)buf));
 + break;
 +
 + case 8:
 + kvmppc_set_gpr(vcpu, 4, be64_to_cpu(*(__be64 *)buf));
 + break;
 +
 + default:
 + BUG();

If I got the code right, a malicious guest could easily trigger this
BUG() statement, couldn't it? ... so a BUG() is maybe not the right
thing to do here. Would it be appropriate to return an error value to
the guest instead?

 + }
 +
 + return H_SUCCESS;
 +}
 +EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_load);

 Thomas
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv4] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM

2015-04-21 Thread Thomas Huth
Am Tue, 21 Apr 2015 16:51:21 +1000
schrieb David Gibson da...@gibson.dropbear.id.au:

 On Tue, Apr 21, 2015 at 08:37:02AM +0200, Thomas Huth wrote:
  Am Tue, 21 Apr 2015 10:41:51 +1000
  schrieb David Gibson da...@gibson.dropbear.id.au:
  
   On POWER, storage caching is usually configured via the MMU - attributes
   such as cache-inhibited are stored in the TLB and the hashed page table.
   
   This makes correctly performing cache inhibited IO accesses awkward when
   the MMU is turned off (real mode).  Some CPU models provide special
   registers to control the cache attributes of real mode load and stores but
   this is not at all consistent.  This is a problem in particular for SLOF,
   the firmware used on KVM guests, which runs entirely in real mode, but
   which needs to do IO to load the kernel.
   
   To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD
   and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to
   a logical address (aka guest physical address).  SLOF uses these for IO.
   
   However, because these are implemented within qemu, not the host kernel,
   these bypass any IO devices emulated within KVM itself.  The simplest way
   to see this problem is to attempt to boot a KVM guest from a virtio-blk
   device with iothread / dataplane enabled.  The iothread code relies on an
   in kernel implementation of the virtio queue notification, which is not
   triggered by the IO hcalls, and so the guest will stall in SLOF unable to
   load the guest OS.
   
   This patch addresses this by providing in-kernel implementations of the
   2 hypercalls, which correctly scan the KVM IO bus.  Any access to an
   address not handled by the KVM IO bus will cause a VM exit, hitting the
   qemu implementation as before.
   
   Note that a userspace change is also required, in order to enable these
   new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL.
   
   Signed-off-by: David Gibson da...@gibson.dropbear.id.au
   ---
arch/powerpc/include/asm/kvm_book3s.h |  3 ++
arch/powerpc/kvm/book3s.c | 76 
   +++
arch/powerpc/kvm/book3s_hv.c  | 12 ++
arch/powerpc/kvm/book3s_pr_papr.c | 28 +
4 files changed, 119 insertions(+)
  ...
   diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
   index cfbcdc6..453a8a4 100644
   --- a/arch/powerpc/kvm/book3s.c
   +++ b/arch/powerpc/kvm/book3s.c
   @@ -821,6 +821,82 @@ void kvmppc_core_destroy_vm(struct kvm *kvm)
#endif
}

   +int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu)
   +{
   + unsigned long size = kvmppc_get_gpr(vcpu, 4);
   + unsigned long addr = kvmppc_get_gpr(vcpu, 5);
   + u64 buf;
   + int ret;
   +
   + if (!is_power_of_2(size) || (size  sizeof(buf)))
   + return H_TOO_HARD;
   +
   + ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, size, buf);
   + if (ret != 0)
   + return H_TOO_HARD;
   +
   + switch (size) {
   + case 1:
   + kvmppc_set_gpr(vcpu, 4, *(u8 *)buf);
   + break;
   +
  
  Most of the code in book3s.c seems not to use a empty line after a
  break;, so may I suggest to remove these empty lines here, too, to
  keep the coding style a little bit more consistent?
 
 I don't think it's worth respinning just for that.
 
   + case 2:
   + kvmppc_set_gpr(vcpu, 4, be16_to_cpu(*(__be16 *)buf));
   + break;
   +
   + case 4:
   + kvmppc_set_gpr(vcpu, 4, be32_to_cpu(*(__be32 *)buf));
   + break;
   +
   + case 8:
   + kvmppc_set_gpr(vcpu, 4, be64_to_cpu(*(__be64 *)buf));
   + break;
   +
   + default:
   + BUG();
  
  If I got the code right, a malicious guest could easily trigger this
  BUG() statement, couldn't it? ... so a BUG() is maybe not the right
  thing to do here. Would it be appropriate to return an error value to
  the guest instead?
 
 Actually no - the test at the top of the function for
 is_power_of_2(size) etc. catches this safely before we get here.  The
 BUG() is just paranoia.

Ah, missed that, you're right, so the code should be fine!

 Thomas
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv4] kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM

2015-04-21 Thread Thomas Huth
Am Tue, 21 Apr 2015 16:51:21 +1000
schrieb David Gibson da...@gibson.dropbear.id.au:

 On Tue, Apr 21, 2015 at 08:37:02AM +0200, Thomas Huth wrote:
  Am Tue, 21 Apr 2015 10:41:51 +1000
  schrieb David Gibson da...@gibson.dropbear.id.au:
  
   On POWER, storage caching is usually configured via the MMU - attributes
   such as cache-inhibited are stored in the TLB and the hashed page table.
   
   This makes correctly performing cache inhibited IO accesses awkward when
   the MMU is turned off (real mode).  Some CPU models provide special
   registers to control the cache attributes of real mode load and stores but
   this is not at all consistent.  This is a problem in particular for SLOF,
   the firmware used on KVM guests, which runs entirely in real mode, but
   which needs to do IO to load the kernel.
   
   To simplify this qemu implements two special hypercalls, H_LOGICAL_CI_LOAD
   and H_LOGICAL_CI_STORE which simulate a cache-inhibited load or store to
   a logical address (aka guest physical address).  SLOF uses these for IO.
   
   However, because these are implemented within qemu, not the host kernel,
   these bypass any IO devices emulated within KVM itself.  The simplest way
   to see this problem is to attempt to boot a KVM guest from a virtio-blk
   device with iothread / dataplane enabled.  The iothread code relies on an
   in kernel implementation of the virtio queue notification, which is not
   triggered by the IO hcalls, and so the guest will stall in SLOF unable to
   load the guest OS.
   
   This patch addresses this by providing in-kernel implementations of the
   2 hypercalls, which correctly scan the KVM IO bus.  Any access to an
   address not handled by the KVM IO bus will cause a VM exit, hitting the
   qemu implementation as before.
   
   Note that a userspace change is also required, in order to enable these
   new hcall implementations with KVM_CAP_PPC_ENABLE_HCALL.
   
   Signed-off-by: David Gibson da...@gibson.dropbear.id.au
   ---
arch/powerpc/include/asm/kvm_book3s.h |  3 ++
arch/powerpc/kvm/book3s.c | 76 
   +++
arch/powerpc/kvm/book3s_hv.c  | 12 ++
arch/powerpc/kvm/book3s_pr_papr.c | 28 +
4 files changed, 119 insertions(+)
  ...
   diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
   index cfbcdc6..453a8a4 100644
   --- a/arch/powerpc/kvm/book3s.c
   +++ b/arch/powerpc/kvm/book3s.c
   @@ -821,6 +821,82 @@ void kvmppc_core_destroy_vm(struct kvm *kvm)
#endif
}

   +int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu)
   +{
   + unsigned long size = kvmppc_get_gpr(vcpu, 4);
   + unsigned long addr = kvmppc_get_gpr(vcpu, 5);
   + u64 buf;
   + int ret;
   +
   + if (!is_power_of_2(size) || (size  sizeof(buf)))
   + return H_TOO_HARD;
   +
   + ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, size, buf);
   + if (ret != 0)
   + return H_TOO_HARD;
   +
   + switch (size) {
   + case 1:
   + kvmppc_set_gpr(vcpu, 4, *(u8 *)buf);
   + break;
   +
  
  Most of the code in book3s.c seems not to use a empty line after a
  break;, so may I suggest to remove these empty lines here, too, to
  keep the coding style a little bit more consistent?
 
 I don't think it's worth respinning just for that.
 
   + case 2:
   + kvmppc_set_gpr(vcpu, 4, be16_to_cpu(*(__be16 *)buf));
   + break;
   +
   + case 4:
   + kvmppc_set_gpr(vcpu, 4, be32_to_cpu(*(__be32 *)buf));
   + break;
   +
   + case 8:
   + kvmppc_set_gpr(vcpu, 4, be64_to_cpu(*(__be64 *)buf));
   + break;
   +
   + default:
   + BUG();
  
  If I got the code right, a malicious guest could easily trigger this
  BUG() statement, couldn't it? ... so a BUG() is maybe not the right
  thing to do here. Would it be appropriate to return an error value to
  the guest instead?
 
 Actually no - the test at the top of the function for
 is_power_of_2(size) etc. catches this safely before we get here.  The
 BUG() is just paranoia.

Ah, missed that, you're right, so the code should be fine!

 Thomas
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC 4/9] KVM: s390: Add MEMOP ioctls for reading/writing guest memory

2015-03-19 Thread Thomas Huth
 Date: Wed, 18 Mar 2015 21:23:48 -0300
 From: Marcelo Tosatti mtosa...@redhat.com

 On Mon, Mar 16, 2015 at 09:51:40AM +0100, Christian Borntraeger wrote:
  From: Thomas Huth th...@linux.vnet.ibm.com
 
  On s390, we've got to make sure to hold the IPTE lock while accessing
  logical memory. So let's add an ioctl for reading and writing logical
  memory to provide this feature for userspace, too.
...

 What i was wondering is why you can't translate the address
 in the kernel and let userspace perform the actual read/write?

The idea here is to protect the read/write access with the ipte-lock, too.
That way, the whole address translation _and_ the read/write access are
protected together against invalidate-page-table operations from other
CPUs,
so the whole memory access looks atomic for other VCPUs. And since we do
not
want to expose the ipte lock directly to user space, both has to be done
in the kernel.

We already had a long internal discussion about this in our team, and
indeed, if the ipte-lock would be the only problem that we face on s390,
we might also come up with a solution where the memory read/write access
is done in userspace instead. However, for full architecture compliance,
we later have got to support the so-called storage keys during memory
accesses, too, and this can hardly be done accurately and safely from
userspace. So I'm afraid, it's somewhat ugly that we've got to provide an
ioctl here to read/write the guest memory, but it's the only feasible
solution that I could think of.

 Thomas

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Get rid of kvm_kvfree()

2015-03-09 Thread Thomas Huth
On Tue, 24 Feb 2015 21:29:25 +0100
Thomas Huth th...@linux.vnet.ibm.com wrote:

 kvm_kvfree() provides exactly the same functionality as the
 new common kvfree() function - so let's simply replace the
 kvm function with the common function.
 
 Signed-off-by: Thomas Huth th...@linux.vnet.ibm.com
 ---
  arch/x86/kvm/x86.c   |8 
  include/linux/kvm_host.h |1 -
  virt/kvm/kvm_main.c  |   10 +-
  3 files changed, 5 insertions(+), 14 deletions(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index bd7a70b..c5f7e03 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -7429,7 +7429,7 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct 
 kvm_memory_slot *free,
 
   for (i = 0; i  KVM_NR_PAGE_SIZES; ++i) {
   if (!dont || free-arch.rmap[i] != dont-arch.rmap[i]) {
 - kvm_kvfree(free-arch.rmap[i]);
 + kvfree(free-arch.rmap[i]);
   free-arch.rmap[i] = NULL;
   }
   if (i == 0)
 @@ -7437,7 +7437,7 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct 
 kvm_memory_slot *free,
 
   if (!dont || free-arch.lpage_info[i - 1] !=
dont-arch.lpage_info[i - 1]) {
 - kvm_kvfree(free-arch.lpage_info[i - 1]);
 + kvfree(free-arch.lpage_info[i - 1]);
   free-arch.lpage_info[i - 1] = NULL;
   }
   }
 @@ -7491,12 +7491,12 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct 
 kvm_memory_slot *slot,
 
  out_free:
   for (i = 0; i  KVM_NR_PAGE_SIZES; ++i) {
 - kvm_kvfree(slot-arch.rmap[i]);
 + kvfree(slot-arch.rmap[i]);
   slot-arch.rmap[i] = NULL;
   if (i == 0)
   continue;
 
 - kvm_kvfree(slot-arch.lpage_info[i - 1]);
 + kvfree(slot-arch.lpage_info[i - 1]);
   slot-arch.lpage_info[i - 1] = NULL;
   }
   return -ENOMEM;
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index d12b210..0f574eb 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -658,7 +658,6 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
 
  void *kvm_kvzalloc(unsigned long size);
 -void kvm_kvfree(const void *addr);
 
  #ifndef __KVM_HAVE_ARCH_VM_ALLOC
  static inline struct kvm *kvm_arch_alloc_vm(void)
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index a109370..6f1a9c2 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -539,20 +539,12 @@ void *kvm_kvzalloc(unsigned long size)
   return kzalloc(size, GFP_KERNEL);
  }
 
 -void kvm_kvfree(const void *addr)
 -{
 - if (is_vmalloc_addr(addr))
 - vfree(addr);
 - else
 - kfree(addr);
 -}
 -
  static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
  {
   if (!memslot-dirty_bitmap)
   return;
 
 - kvm_kvfree(memslot-dirty_bitmap);
 + kvfree(memslot-dirty_bitmap);
   memslot-dirty_bitmap = NULL;
  }
 

ping?

 Thomas

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] virtio-balloon: do not call blocking ops when !TASK_RUNNING

2015-02-25 Thread Thomas Huth
On Wed, 25 Feb 2015 16:11:27 +0100
Cornelia Huck cornelia.h...@de.ibm.com wrote:

 On Wed, 25 Feb 2015 15:36:02 +0100
 Michael S. Tsirkin m...@redhat.com wrote:
 
  virtio balloon has this code:
  wait_event_interruptible(vb-config_change,
   (diff = towards_target(vb)) != 0
   || vb-need_stats_update
   || kthread_should_stop()
   || freezing(current));
  
  Which is a problem because towards_target() call might block after
  wait_event_interruptible sets task state to TAST_INTERRUPTIBLE, causing
  the task_struct::state collision typical of nesting of sleeping
  primitives
  
  See also http://lwn.net/Articles/628628/ or Thomas's
  bug report
  http://article.gmane.org/gmane.linux.kernel.virtualization/24846
  for a fuller explanation.
  
  To fix, rewrite using wait_woken.
  
  Cc: sta...@vger.kernel.org
  Reported-by: Thomas Huth th...@linux.vnet.ibm.com
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
  
  changes from v1:
  remove wait_event_interruptible
  noticed by Cornelia Huck cornelia.h...@de.ibm.com
  
   drivers/virtio/virtio_balloon.c | 19 ++-
   1 file changed, 14 insertions(+), 5 deletions(-)
  
 
 I was able to reproduce Thomas' original problem and can confirm that
 it is gone with this patch.
 
 Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com

Right, I just applied the patch on my system, too, and the problem is
indeed gone! Thanks for the quick fix!

Tested-by: Thomas Huth th...@linux.vnet.ibm.com

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/9] Fix WARNING: quoted string split across lines in kvm_main.c

2015-02-25 Thread Thomas Huth
On Thu, 26 Feb 2015 14:58:18 +0800
Xiubo Li lixi...@cmss.chinamobile.com wrote:

 WARNING: quoted string split across lines
 + printk(KERN_INFO kvm: enabling virtualization on 
 +  CPU%d failed\n, cpu);
 
 When fails to enable virtualization on CPUx for kvm, this log will
 be output in only one line, and it will be a little confusing for us
 to grep this log in kernel source code.
 
 In some case, the user maybe using on script to searching the error
 log in kernel source code, if so, won't it always fail?
 
 So this patch fix this issue.
 
 Signed-off-by: Xiubo Li lixi...@cmss.chinamobile.com
 ---
  virt/kvm/kvm_main.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)
 
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index a109370..3f08716 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -2817,12 +2817,11 @@ static void hardware_enable_nolock(void *junk)
   cpumask_set_cpu(cpu, cpus_hardware_enabled);
 
   r = kvm_arch_hardware_enable();
 -
   if (r) {
   cpumask_clear_cpu(cpu, cpus_hardware_enabled);
   atomic_inc(hardware_enable_failed);
 - printk(KERN_INFO kvm: enabling virtualization on 
 -  CPU%d failed\n, cpu);
 + printk(KERN_INFO kvm: enabling virtualization on CPU%d 
 failed\n,
 + cpu);

You could use pr_info() here instead of printk(KERN_INFO, ...) to avoid
exceeding the 80 columns limit.

 Thomas

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio balloon: do not call blocking ops when !TASK_RUNNING

2015-02-25 Thread Thomas Huth
On Thu, 26 Feb 2015 11:50:42 +1030
Rusty Russell ru...@rustcorp.com.au wrote:

 Thomas Huth th...@linux.vnet.ibm.com writes:
   Hi all,
 
  with the recent kernel 3.19, I get a kernel warning when I start my
  KVM guest on s390 with virtio balloon enabled:
 
 The deeper problem is that virtio_ccw_get_config just silently fails on
 OOM.
 
 Neither get_config nor set_config are expected to fail.

AFAIK this is currently not a problem. According to
http://lwn.net/Articles/627419/ these kmalloc calls never
fail because they allocate less than a page.

 Thomas

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


virtio balloon: do not call blocking ops when !TASK_RUNNING

2015-02-25 Thread Thomas Huth

 Hi all,

with the recent kernel 3.19, I get a kernel warning when I start my
KVM guest on s390 with virtio balloon enabled:

[0.839687] do not call blocking ops when !TASK_RUNNING; state=1 set at
   [00174a1e] prepare_to_wait_event+0x7e/0x108
[0.839694] [ cut here ]
[0.839697] WARNING: at kernel/sched/core.c:7326
[0.839698] Modules linked in:
[0.839702] CPU: 0 PID: 46 Comm: vballoon Not tainted 3.19.0 #233
[0.839705] task: 021d ti: 021d8000 task.ti: 
021d8000
[0.839707] Krnl PSW : 0704c0018000 0015bf8e 
(__might_sleep+0x8e/0x98)
[0.839713]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 
EA:3
Krnl GPRS: 000d 021d 0071 0001
[0.839718]00675ace 01998c50  

[0.839720]00982134 0058f824 00a008a8 

[0.839722]04d9 007ea992 0015bf8a 
021dbc28
[0.839731] Krnl Code: 0015bf7e: c0200033e838larl
%r2,7d8fee
   0015bf84: c0e50028cd62   brasl   %r14,675a48
  #0015bf8a: a7f40001   brc 15,15bf8c
  0015bf8e: 9201a000   mvi 0(%r10),1
   0015bf92: a7f4ffe2   brc 15,15bf56
   0015bf96: 0707   bcr 0,%r7
   0015bf98: ebdff0800024   stmg%r13,%r15,128(%r15)
   0015bf9e: a7f13fe0   tmll%r15,16352
[0.839749] Call Trace:
[0.839751] ([0015bf8a] __might_sleep+0x8a/0x98)
[0.839756]  [0028a562] __kmalloc+0x272/0x350
[0.839759]  [0058f824] virtio_ccw_get_config+0x3c/0x100
[0.839762]  [0049fcb0] balloon+0x1b8/0x330
[0.839765]  [001529c8] kthread+0x120/0x138
[0.839767]  [00683c22] kernel_thread_starter+0x6/0xc
[0.839770]  [00683c1c] kernel_thread_starter+0x0/0xc
[0.839772] no locks held by vballoon/46.
[0.839773] Last Breaking-Event-Address:
[0.839776]  [0015bf8a] __might_sleep+0x8a/0x98
[0.839778] ---[ end trace d27fcdfa27273d7c ]---

The problem seems to be this code in balloon() in
drivers/virtio/virtio_balloon.c:

wait_event_interruptible(vb-config_change,
 (diff = towards_target(vb)) != 0
 || vb-need_stats_update
 || kthread_should_stop()
 || freezing(current));

wait_event_interruptible() sets the state of the current task to
TASK_INTERRUPTIBLE, then checks the condition. The condition contains
towards_target() which reads the virtio config space via virtio_cread().
On s390, this then triggers virtio_ccw_get_config() - and this function
calls some other functions again that might sleep (e.g. kzalloc or
wait_event in ccw_io_helper) ... and this causes the new kernel warning
message with kernel 3.19.

I think it would be quite difficult or at least ugly to rewrite
virtio_ccw_get_config() so that it does not call sleepable functions
anymore. So would it be feasible to rewrite the balloon() function that
it does not call the towards_target() in its wait_event condition
anymore? I am unfortunately not that familiar with the balloon code
semantics, so any help is very appreciated here!

 Thanks,
  Thomas

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: Get rid of kvm_kvfree()

2015-02-24 Thread Thomas Huth
kvm_kvfree() provides exactly the same functionality as the
new common kvfree() function - so let's simply replace the
kvm function with the common function.

Signed-off-by: Thomas Huth th...@linux.vnet.ibm.com
---
 arch/x86/kvm/x86.c   |8 
 include/linux/kvm_host.h |1 -
 virt/kvm/kvm_main.c  |   10 +-
 3 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bd7a70b..c5f7e03 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7429,7 +7429,7 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct 
kvm_memory_slot *free,
 
for (i = 0; i  KVM_NR_PAGE_SIZES; ++i) {
if (!dont || free-arch.rmap[i] != dont-arch.rmap[i]) {
-   kvm_kvfree(free-arch.rmap[i]);
+   kvfree(free-arch.rmap[i]);
free-arch.rmap[i] = NULL;
}
if (i == 0)
@@ -7437,7 +7437,7 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct 
kvm_memory_slot *free,
 
if (!dont || free-arch.lpage_info[i - 1] !=
 dont-arch.lpage_info[i - 1]) {
-   kvm_kvfree(free-arch.lpage_info[i - 1]);
+   kvfree(free-arch.lpage_info[i - 1]);
free-arch.lpage_info[i - 1] = NULL;
}
}
@@ -7491,12 +7491,12 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct 
kvm_memory_slot *slot,
 
 out_free:
for (i = 0; i  KVM_NR_PAGE_SIZES; ++i) {
-   kvm_kvfree(slot-arch.rmap[i]);
+   kvfree(slot-arch.rmap[i]);
slot-arch.rmap[i] = NULL;
if (i == 0)
continue;
 
-   kvm_kvfree(slot-arch.lpage_info[i - 1]);
+   kvfree(slot-arch.lpage_info[i - 1]);
slot-arch.lpage_info[i - 1] = NULL;
}
return -ENOMEM;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d12b210..0f574eb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -658,7 +658,6 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
 
 void *kvm_kvzalloc(unsigned long size);
-void kvm_kvfree(const void *addr);
 
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 static inline struct kvm *kvm_arch_alloc_vm(void)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a109370..6f1a9c2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -539,20 +539,12 @@ void *kvm_kvzalloc(unsigned long size)
return kzalloc(size, GFP_KERNEL);
 }
 
-void kvm_kvfree(const void *addr)
-{
-   if (is_vmalloc_addr(addr))
-   vfree(addr);
-   else
-   kfree(addr);
-}
-
 static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
 {
if (!memslot-dirty_bitmap)
return;
 
-   kvm_kvfree(memslot-dirty_bitmap);
+   kvfree(memslot-dirty_bitmap);
memslot-dirty_bitmap = NULL;
 }
 
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: s390: Add MEMOP ioctls for reading/writing guest memory

2015-02-19 Thread Thomas Huth
On Thu, 19 Feb 2015 10:47:29 +0100
Cornelia Huck cornelia.h...@de.ibm.com wrote:

 On Mon, 16 Feb 2015 13:16:41 +0100
 Thomas Huth th...@linux.vnet.ibm.com wrote:
 
  On s390, we've got to make sure to hold the IPTE lock while accessing
  logical memory. So let's add an ioctl for reading and writing logical
  memory to provide this feature for userspace, too.
  
  Signed-off-by: Thomas Huth th...@linux.vnet.ibm.com
  ---
   Documentation/virtual/kvm/api.txt |   45 +++
   arch/s390/kvm/gaccess.c   |   22 +++
   arch/s390/kvm/gaccess.h   |2 +
   arch/s390/kvm/kvm-s390.c  |   70 
  +
   include/uapi/linux/kvm.h  |   20 ++
   5 files changed, 159 insertions(+), 0 deletions(-)
 
 Looks good, except for one minor thing:
 
  +static long kvm_s390_guest_mem_op(struct kvm_vcpu *vcpu,
  + struct kvm_s390_mem_op *mop)
  +{
  +   void __user *uaddr = (void __user *)mop-buf;
  +   void *tmpbuf = NULL;
  +   int r, srcu_idx;
  +   const u64 supported_flags = KVM_S390_MEMOP_F_INJECT_EXCEPTION
  +   | KVM_S390_MEMOP_F_CHECK_ONLY;
  +
  +   if (mop-flags  ~supported_flags)
  +   return -EINVAL;
  +
  +   if (mop-size  16 * PAGE_SIZE)
  +   return -E2BIG;
 
 Do you want to document this limit? Or make it discoverable by having
 the capability check return the number of pages?

I like the idea with the capability ... I think I'll add that feature.

 Thomas

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: s390: Add MEMOP ioctls for reading/writing guest memory

2015-02-16 Thread Thomas Huth
On s390, we've got to make sure to hold the IPTE lock while accessing
logical memory. So let's add an ioctl for reading and writing logical
memory to provide this feature for userspace, too.

Signed-off-by: Thomas Huth th...@linux.vnet.ibm.com
---
 Documentation/virtual/kvm/api.txt |   45 +++
 arch/s390/kvm/gaccess.c   |   22 +++
 arch/s390/kvm/gaccess.h   |2 +
 arch/s390/kvm/kvm-s390.c  |   70 +
 include/uapi/linux/kvm.h  |   20 ++
 5 files changed, 159 insertions(+), 0 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index b112efc..8aa99f3 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2716,6 +2716,51 @@ The fields in each entry are defined as follows:
eax, ebx, ecx, edx: the values returned by the cpuid instruction for
  this function/index combination
 
+4.89 KVM_S390_MEM_OP
+
+Capability: KVM_CAP_S390_MEM_OP
+Architectures: s390
+Type: vcpu ioctl
+Parameters: struct kvm_s390_mem_op (in)
+Returns: = 0 on success,
+  0 on generic error (e.g. -EFAULT or -ENOMEM),
+  0 if an exception occurred while walking the page tables
+
+Read or write data from/to the logical (virtual) memory of a VPCU.
+
+Parameters are specified via the following structure:
+
+struct kvm_s390_mem_op {
+   __u64 gaddr;/* the guest address */
+   __u64 flags;/* arch specific flags */
+   __u32 size; /* amount of bytes */
+   __u32 op;   /* type of operation */
+   __u64 buf;  /* buffer in userspace */
+   __u8 reserved[32];  /* should be set to 0 */
+};
+
+The type of operation is specified in the op field. It is either
+KVM_S390_MEMOP_LOGICAL_READ for reading from logical memory space or
+KVM_S390_MEMOP_LOGICAL_WRITE for writing to logical memory space. The
+KVM_S390_MEMOP_F_CHECK_ONLY flag can be set in the flags field to check
+whether the corresponding memory access would create an access exception
+(without touching the data in the memory at the destination). In case an
+access exception occurred while walking the MMU tables of the guest, the
+ioctl returns a positive error number to indicate the type of exception.
+This exception is also raised directly at the corresponding VCPU if the
+flag KVM_S390_MEMOP_F_INJECT_EXCEPTION is set in the flags field.
+
+The start address of the memory region has to be specified in the gaddr
+field, and the length of the region in the size field. buf is the buffer
+supplied by the userspace application where the read data should be written
+to for KVM_S390_MEMOP_LOGICAL_READ, or where the data that should be written
+is stored for a KVM_S390_MEMOP_LOGICAL_WRITE. buf is unused and can be NULL
+when KVM_S390_MEMOP_F_CHECK_ONLY is specified.
+
+The reserved field is meant for future extensions. It is not used by
+KVM with the currently defined set of flags.
+
+
 5. The kvm_run structure
 
 
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 267523c..70fd956 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -697,6 +697,28 @@ int guest_translate_address(struct kvm_vcpu *vcpu, 
unsigned long gva,
 }
 
 /**
+ * check_gva_range - test a range of guest virtual addresses for accessibility
+ */
+int check_gva_range(struct kvm_vcpu *vcpu, unsigned long gva,
+   unsigned long length, int is_write)
+{
+   unsigned long gpa;
+   unsigned long currlen;
+   int rc = 0;
+
+   ipte_lock(vcpu);
+   while (length  0  !rc) {
+   currlen = min(length, PAGE_SIZE - (gva % PAGE_SIZE));
+   rc = guest_translate_address(vcpu, gva, gpa, is_write);
+   gva += currlen;
+   length -= currlen;
+   }
+   ipte_unlock(vcpu);
+
+   return rc;
+}
+
+/**
  * kvm_s390_check_low_addr_protection - check for low-address protection
  * @ga: Guest address
  *
diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
index 0149cf1..268beb7 100644
--- a/arch/s390/kvm/gaccess.h
+++ b/arch/s390/kvm/gaccess.h
@@ -157,6 +157,8 @@ int read_guest_lc(struct kvm_vcpu *vcpu, unsigned long gra, 
void *data,
 
 int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva,
unsigned long *gpa, int write);
+int check_gva_range(struct kvm_vcpu *vcpu, unsigned long gva,
+   unsigned long length, int is_write);
 
 int access_guest(struct kvm_vcpu *vcpu, unsigned long ga, void *data,
 unsigned long len, int write);
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 0c36239..fe9eb5a 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -25,6 +25,7 @@
 #include linux/random.h
 #include linux/slab.h
 #include linux/timer.h
+#include linux/vmalloc.h
 #include asm/asm-offsets.h
 #include

  1   2   >